
Peter Langkafel (Ed.) Big Data in Medical Science and Healthcare Management

Also of interest

Advanced Data Management
Lena Wiese, 2015
ISBN 978-3-11-044140-6, e-ISBN 978-3-11-044141-3, e-ISBN (EPUB) 978-3-11-043307-4

Healthcare
Mario Glowik, Slawomir Smyczek (Eds.), 2015
ISBN 978-3-11-041468-4, e-ISBN 978-3-11-041484-4, e-ISBN (EPUB) 978-3-11-041491-2, Set-ISBN 978-3-11-041485-1

Data Mining
Jürgen Cleve, Uwe Lämmel, 2014
ISBN 978-3-486-71391-6, e-ISBN 978-3-486-72034-1, e-ISBN (EPUB) 978-3-486-99071-3

Big Data Analysen
Sebastian Müller, 2017
ISBN 978-3-11-045552-6, e-ISBN 978-3-11-045780-3, e-ISBN (EPUB) 978-3-11-045561-8, Set-ISBN 978-3-11-045781-0

Big Data in Medical Science and Healthcare Management: Diagnosis, Therapy, Side Effects

Edited by Dr. med. Peter Langkafel, MBA

ISBN 978-3-11-044528-2
e-ISBN (PDF) 978-3-11-044574-9
Set-ISBN 978-3-11-044575-6

Library of Congress Cataloging-in-Publication Data
A CIP catalogue record for this book has been applied for at the Library of Congress.

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.

With kind permission of medhochzwei Verlag. The German edition of Langkafel, Peter (Ed.), “Big Data in Medizin und Gesundheitswirtschaft – Diagnosen, Therapien, Nebenwirkungen” has been published by medhochzwei Verlag, Alte Eppelheimer Straße 42/1, 69115 Heidelberg, Germany.
© 2014 medhochzwei Verlag GmbH, Heidelberg, www.medhochzwei-verlag.de
Translation: Thorsten D. Lonishen, CL-Communication GmbH, Germany: www.cl-communication.com
© 2016 Walter de Gruyter GmbH, Berlin/Boston
Cover image: Shironosov/iStock/Thinkstock
Typesetting: Lumina Datamatics
Printing and binding: CPI books GmbH, Leck
♾ Printed on acid-free paper
Printed in Germany
www.degruyter.com

For Gudrun, Arved, Napurga and Anio. You are my best Big Datas!

Autopilot and “Doctor Algorithm”?

The first flight with the help of an autopilot was shown at the world exhibition in Paris in the year 1914: both the vertical and the horizontal directions of flight were controlled by two gyroscopes, driven by a wind-powered generator located behind the propeller. What was a fascinating novelty at the time is today taken for granted. Nowadays, airline pilots on an average flight actually fly the plane themselves for only about 3 minutes.¹ The rest is handled by modern automation using a variety of sensors and onboard computers.

Most readers will probably not pilot modern aircraft very often – but autopilot-like assistance systems in modern cars have already become a part of daily life, and these days most drivers no longer want to do without them. Initially, digital maps may have led to some bizarre accidents – such as driving into a lake on a clear day. But in modern passenger cars Big Data is taking on an ever increasing share of driving behavior: speed, distance, braking behavior, directional stability, etc. The range of auxiliary functions is constantly growing, and so is their acceptance. A modern car today incorporates more information technology than the Apollo rocket that enabled a manned mission to the moon in 1969.

Medicine has evolved over the last 100 years – but medical autopilots are not available. Or should we say not yet available? For some, this may be an inspiring vision of the future – for others the damnable end of treatment with a “human touch.” Particularly in the field of medicine, a huge, unmanageable amount of information is generated every day – but computers as coaches and not as data servants: is that vision a long way off, or have we almost reached it already? The issue concerns much more than mere technical feasibility. If we want to understand Big Data in medicine, we have to take a broader view of the matter. In this book, more than 20 experts from the broader field of and around medicine have written articles on completely different aspects of the issue – including the conditions and possibilities of the “automation of everyday (medical) life.”

Roughly 100 years ago the philosopher and mathematician Alfred North Whitehead described civilization as something “that develops by increasing the number of important tasks that we can perform without thinking about them.”² The impact in the 21st century, however, is quite different: “Automation does not simply replace human activities but actually changes them, and often does so in a manner that was neither intended nor envisaged by its developers.”

1 Nicholas Carr: Die Herrschaft der Maschinen. In: Blätter für deutsche und internationale Politik, 2/2014.
2 Raja Parasuraman, quoted from: Carr: Die Herrschaft der Maschinen. In: Blätter für deutsche und internationale Politik, 2/2014.


Big Data – in medicine, too – may be changing the world more than we can yet understand or are willing to acknowledge. Special thanks go to all the authors for their contributions to this kaleidoscope; the texts will, I hope, also reflect one another. We often dismiss kaleidoscopes as a “mere” children’s toy. With a little technical skill, however, they can also be used as a “microscope” or “telescope” – though even here, software has already been developed that can simulate these effects…

I hope you enjoy the reading experience, and may your insights be small or Big!

Peter Langkafel

Contents

Peter Langkafel
1 Intro Big Data for Healthcare?

Josef Schepers and Martin Peuker
2 Information Management for Systems Medicine – on the Next Digital Threshold

Albrecht von Müller
3 Some Philosophical Thoughts on Big Data

Thomas Brunner
4 Big Data from a Health Insurance Company’s Point of View

Harald Kamps
5 Big Data and the Family Doctor

Alexander Pimperl, Birger Dittmann, Alexander Fischer, Timo Schulte, Pascal Wendel, Martin Wetzel and Helmut Hildebrandt
6 How Value is Created from Data: Experiences from the Integrated Health Care System, “Gesundes Kinzigtal” (Healthy Kinzigtal)

Rainer Röhrig and Markus A. Weigand
7 Ethics

Karola Pötter-Kirchner, Renate Höchstetter and Thilo Grüning
8 The New Data-Supported Quality Assurance of the Federal Joint Committee: Opportunities and Challenges

Werner Eberhardt
9 Big Data in Healthcare: Fields of Application and Benefits of SAP Technologies

Axel Wehmeier and Timo Baumann
10 Big Data – More Risks than Benefits for Healthcare?

Marcus Zimmermann-Rittereiser and Hartmut Schaper
11 Big Data – An Efficiency Boost in the Healthcare Sector

Thilo Weichert
12 Medical Big Data and Data Protection

Sebastian Krolop and Henri Souchon
13 Big Data in Healthcare from a Business Consulting (Accenture) Point of View

Peer Laslo
14 Influence of Big Pharma on Medicine, Logistics and Data Technology in a State of Transition

Michael Engelhorn
15 Semantics and Big Data: Semantic Methods for Data Processing and Searching Large Amounts of Data

Florian Schumacher
16 Quantified Self, Wearable Technologies and Personal Data

Axel Mühlbacher and Anika Kaczynski
17 “For the Benefit of the Patient” … What Does the Patient Say to That?

Peter Langkafel
18 Visualization – What Does Big Data Actually Look Like?

Peter Langkafel
19 The Digital Patient?

Publisher and Index of Authors
Glossary
Testimonials

Peter Langkafel

1 Intro Big Data for Healthcare?

Content
1 Big Data
1.1 … or “Little Sexy Numbers”?
1.2 The First Mister Big Data?
1.3 Mr. Big Data in Medicine?
1.4 Mr. Interpret Big Data
1.5 The Only Statistics You Can Trust, Are …
1.6 Mr. Understand Data
1.7 Datability
1.8 Big Data in the Bundestag
1.9 The Three Vs of Big Data
1.10 Can Medicine Learn from Other Industries?
2 Big Data Framework in Medicine
2.1 Fields of Action in Medicine
2.2 It’s Time for Big Data
2.3 The Example of Google Flu Trends
2.4 Big Data Adoption Model in Medicine
3 Big Data in the Context of Health – More Than Old Wine in New Bottles?
3.1 The Difference to Classic Clinical Analysis (Old Wine in New Bottles?)
3.2 Is Big Data in the Hands of Medical Controlling Degenerating to a DRG Optimization Technology?
3.3 Are Hospitals Prepared for Big Data?
3.4 Are Internal Hospital Big Data Analyses Data Protection Cliffs, and if so, Where Exactly?
4 Personalized Medicine and Big Data
4.1 The (Genetic) Human in Numbers
4.2 Personalized Medicine?
4.3 Statement by the German Ethics Council
4.4 23 and me?
4.5 Big Data, “Personalized Medicine” and IT
Literature


1 Big Data

Can you remember the first time you heard the term Big Data? This buzzword has recently found its way into event titles, panel discussions, articles and background discussions at a phenomenal rate. How is it possible that two little words could and still can celebrate such global success?

New terms always act as projection screens onto which the many different protagonists and players can project their version and vision of things. The wider and more emotional the interpretation of such neologisms, the more promising the word creation is in itself. Big Data means fear, fascination and overestimation all at the same time.

Perhaps the rapid rise of this word pair can best be explained if we try to figure out its etymology, i.e., the origin of the words: The little word “big” is generally a rather inaccurate term. Big is always relative and requires a comparison or a reference to actually make sense. Where there is big, there also has to be small. “Data” is just as imprecise. Those who already know a little about computer science might be able to distinguish between data, information and knowledge. For example: the number 41 as data actually says very little. Specified as a person’s body temperature in degrees Celsius,¹ however, it becomes information. This information in the right context provides the knowledge that this person may be in extreme danger (unless they are sitting in a sauna).

So the special thing about Big Data is not so much the two individual words, but their combination: Big Data – how big is the data that is stored about me, actually? And where and how is it stored? And how is it analyzed? What benefits and what damage can I expect? How far is “big brother” from Big Data?

In 2013 the scandals surrounding the NSA (National Security Agency) and the NSU (National Socialist Underground) caused a stir around the world – and there, too, the issue of Big Data was involved: Edward Snowden courageously made public the vast amounts of data gathered by organizations such as the NSA and the extent to which they are stored and interpreted. The scandal surrounding the NSU gang of murderers showed us that data may be at hand, but for various reasons may not be made available to the right people or appropriately correlated and connected. In the media, these two scandals are certainly an important subtext of the meteoric rise of Big Data.
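The data–information–knowledge distinction sketched above can be made concrete in a few lines of Python. This is a minimal illustration only; the values, field names and the 41-degree threshold are taken from the example in the text or invented, not from any clinical standard:

```python
# A bare value is data; a unit and a meaning turn it into information;
# context turns information into actionable knowledge.
raw_value = 41.0                       # data: a number with no meaning

reading = {"value": raw_value,         # information: the same number,
           "unit": "degrees Celsius",  # now bound to a unit ...
           "quantity": "body temperature"}  # ... and a quantity

def assess(reading, in_sauna=False):
    """Knowledge: interpret the information in its context."""
    if reading["quantity"] != "body temperature":
        return "no medical interpretation possible"
    if reading["value"] >= 41.0 and not in_sauna:
        return "extreme danger: critically high fever"
    return "no acute alarm from this single value"

print(assess(reading))                 # -> extreme danger ...
print(assess(reading, in_sauna=True))  # context changes the conclusion
```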

1.1 … or “Little Sexy Numbers”?

Its fascinating inaccuracy is therefore both an advantage and a disadvantage of this new expression. It is impossible to show the impact of Big Data² in all areas.

1 For the sake of better readability, gender-specific differentiation will be dispensed with. Gender-specific terms shall apply to both genders for the sake of equality.
2 Mayer-Schönberger/Cukier: Big Data: Die Revolution, die unser Leben verändern wird. 2013.


BITKOM, for example, calls Big Data the “raw material of the 21st century.” Again, this metaphor permits both interpretations: raw materials are the source of wealth – but raw materials have been and still are the cause of wars… Hype or hubris – both are contained in Big Data.

1.2 The First Mister Big Data?

Maybe it was just a little mishap that would go on to dramatically change the way we see the world: an Italian scholar named Galileo Galilei had acquired a telescope invented by Jan Lippershey thanks to his good contacts in what is now the Netherlands. This happened about 400 years ago, in the year 1609. He was marveling at the four-fold magnification of the world – when the instrument slipped out of his hands and fell to the ground. This isn’t historically documented, but it could have happened. He looked at the lenses and the construction, and while trying to reassemble the telescope, he had the idea of arranging the lenses differently and perhaps also of using differently curved lenses. After only a short time he had constructed a telescope that enabled a 33-fold magnification of the world.

Of course, he was also able to look into the windows of the houses and mansions of his neighbors – but that had little impact on the understanding of the world. Instead, he observed the night sky over the Veneto – including Jupiter with its various satellites, and of course the moon. Every night he looked at the moon and especially at the shadows cast by the hills and craters on its surface. He studied them with increasing excitement and began to sketch and chart them. When trying to explain the shadows on the moon, he began to have second thoughts and then became certain: they could only be explained if the sun did not revolve around the earth, but the earth revolved around the sun, and the moon, in turn, around the earth.

The shadows that became apparent to him not only caused him quite some personal trouble (“And yet it moves.”) – they also destroyed the Ptolemaic or geocentric view of the world and supported the heliocentric system of Nicolaus Copernicus, whose work “De revolutionibus orbium coelestium,” published in 1543 (in the year of his death, for safety’s sake), had the highest heretical potential. And now Galileo Galilei was actually able to prove that with some sketches of the moon?³ The millennia-old understanding of the earth as the center of the world was destroyed by a few small lenses. So was Galileo actually the first Mr. Big Data?

It wasn’t until November 2, 1992 that Galileo Galilei was formally rehabilitated by the Roman Catholic Church. If and when Edward Snowden will be rehabilitated remains to be seen. And yet this comparison of the two individuals may not meet with consensus everywhere.

3 Recommended reading on this topic: Bertolt Brecht: Leben des Galilei: Schauspiel. 1998.


Fig. 1.1: Drawing of the moon by Galileo Galilei (Source: Wikipedia)

Even at that time a certain Venetian named Conde is supposed to have recognized the military potential of this invention and wanted to secure that knowledge exclusively for a specific purpose: he imagined that, with the help of such a telescope on the lookout of his three-master, he could decide better and more quickly whether attacking an approaching fleet would be worthwhile and successful – or whether turning his ships around before they were discovered was the better move. Data analysis, knowledge and power with new technology – but the business acumen of other Venetians at the time was greater, and the profits from the sales of these “miracle glasses” higher. The telescopes therefore quickly turned out to be a real “blockbuster,” and not only for the noble houses of the world.

1.3 Mr. Big Data in Medicine?

If you turn a telescope around, you get a microscope. So in the following years, researchers tirelessly examined all kinds of living and non-living things under increasingly sophisticated microscopes. Thus not only the world of suns, planets and moons was unlocked, but also that of materials, cells and microbes – and medical interest often played a dominant role. In Berlin in 1882, Robert Koch, too, sketched with increasing enthusiasm the things he saw through his microscope:


Fig. 1.2: Drawing by Robert Koch, from “Die Aetiologie der Tuberkulose” (Source: Wikipedia)


In the cells he looked at, he saw small rods. Amazingly, these were no artifacts but actually even withstood treatment with acids. The discovery of this previously unknown inhabitant of the earth – namely the tuberculosis pathogen – brought Robert Koch world fame and the Nobel Prize in 1905.

In order to illustrate his discovery to the notables of the Berlin medical society, he used something of an early multimedia show: in a room in the Dorotheenstrasse in Berlin, he built an oval ring of rails from a toy railway set on a huge table especially built for that purpose (whether he borrowed the rails from his children is not known). He then set up his microscope, focused on the tuberculosis specimens, on a small train trailer, which he elegantly and interactively moved from one astonished observer to the next. Let’s just hope that the mighty moustaches – the “look” of the emperor, fashionable at the time – did not get caught in the little cogs of the microscope.

So how does the story go? Robert Koch discovered the tuberculosis pathogen, he invented tuberculin as an antidote and thus provided a medicine to rid humanity of the pathogen that had probably caused the death of millions of people around the world… But unfortunately that story is not true. Perhaps the most important and difficult part of Big Data, then as now, is not only collecting the data, but also interpreting it and drawing the right conclusions from it.

1.4 Mr. Interpret Big Data

Thomas McKeown was a British physician and medical historian. He put forward the theory that it was not so much the medical achievements that contributed to the prolongation of life. In 2008 the highly respected journal The Lancet wrote that “Tuberculosis lost about 75 % of its mortality from its known high before streptomycin was available.” Furthermore, the tuberculin with which Robert Koch wanted to become rich proved to be ineffective. McKeown asked the provocative question: “Does medicine matter?” Does medicine actually play a role? In the self-image of most physicians it certainly does. “Do you know the difference between a heart surgeon and God…?” The audience at a Big Data conference was recently asked that question. “Well…,” the answer goes, “God does not think he is a heart surgeon.”

Nobody in the 21st century would seriously try to question the many achievements of medicine. However, it took many years before we were able to understand and prove the value of refrigerators for the health of the world’s population. Neither then nor now would anyone seriously think of elevating Alexander Twinning, the man who commercially marketed refrigerators from 1834 onwards, to the Olympus of medicine.


But collecting and interpreting data fully and properly is something that will become ever more important in the future. Maybe someday a researcher will receive the Nobel Prize in Medicine for finding the right algorithm to correctly “calculate” or “anticipate” sepsis or some other disease. Perhaps our perspective is sometimes far too narrow – and we do not notice it? Is today’s focus on genetic data and the modifications thereof (“omics”) due to the fact that it is a new field and we can compute great amounts of data for it – which IT companies with their great machines are of course very happy about, and which is why they often like to join forces with researchers in the quest for external funds? Or can Big Data in the field of medicine help us to better understand not only very specific but also more fundamental problems of health and disease, and then to draw the appropriate conclusions?

For example, do you know what is probably the largest and most important public health problem? It is probably a lack of exercise: according to the “Bewegungsstudie” (survey on exercise) published by the health insurer TK in 2013, “Germany is sitting itself into sickness.” According to it, two-thirds of the population in Germany spend less than one hour per day moving or exercising – with far-reaching consequences for their weight, blood pressure and metabolism. Will Big Data show us that the best investment in health is cycling (individual effort) and a sound network of bicycle paths (political decision)? Or do we actually already know certain things, but find that changing them is a difficult task, since it involves not only technologies but also organizational and personnel changes?

1.5 The Only Statistics You Can Trust, Are …

Pretty much all publications on Big Data agree: the core competence is the correct interpretation of data. Significance, correlation and causality are not the big killjoys they are sometimes made out to be, but actually the essential preconditions of Big Data.

A historical example: once upon a time in the Far East or the South, the ruler of a great empire noticed that there were more deaths in a particular region than elsewhere. He had the phenomenon investigated. His experts came, counted, and concluded that there was a strikingly large number of doctors in that particular region. So the ruler had all the doctors killed. Admittedly, this little story is a rather drastic way of showing the difficulty of Big Data – in medicine, too: what is correlation, what is causality? Were the doctors really all quacks who did not know what they were doing? Or were there so many of them there because they were needed so much? Or did the deaths in the region have very little to do with medicine or the doctors? Is it a statistical artifact? And is it often better to trust our “gut feelings” than to interpret large amounts of data incorrectly?
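The ruler’s fallacy can be reproduced in a few lines. In this minimal sketch (all numbers invented for illustration), a hidden confounder – the local disease burden – drives both the number of doctors and the number of deaths, so the two correlate strongly even though the doctors cause no deaths at all:

```python
import random

random.seed(1)
regions = []
for _ in range(500):
    burden = random.uniform(0, 10)                     # hidden confounder
    doctors = 5 + 2.0 * burden + random.gauss(0, 1)    # sick regions attract doctors
    deaths = 20 + 3.0 * burden + random.gauss(0, 2)    # sickness, not doctors, kills
    regions.append((doctors, deaths))

def pearson(pairs):
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs)
    vy = sum((y - my) ** 2 for _, y in pairs)
    return cov / (vx * vy) ** 0.5

print(round(pearson(regions), 2))  # ~0.95: strong correlation, zero causation
```

Killing the doctors changes nothing about the deaths in this model – only removing the confounded variable from the story would.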


1.6 Mr. Understand Data

Gerd Gigerenzer’s book Gut Feelings was the science book of the year in 2008. Are complex and rational decisions not as good and practicable as we may think? Are cognitive and analytical procedures not as successful as often claimed? The director at the Max Planck Institute for Human Development emphasizes the importance of intuition and rules of thumb – and thus of “gut feelings” – over rationality. Based on a multitude of studies he was able to show that “half-knowledge” and “take the best” are often superior to complex analytical strategies. In one study he asked German and US-American students the question: “Which city has the larger population: San Diego or San Antonio?” The surprising result: the German students gave the right answer (San Diego) more often. They had often never even heard of the other city – in contrast to their American counterparts. Gigerenzer concludes that somewhat (!) uninformed decisions can be more likely to lead to success.

At the Harding Center for Risk Literacy, he and his team explored this in the field of medicine in particular – and were able to show how low the statistical know-how of medical professionals actually is. All those who have never heard of alpha and beta errors, cannot distinguish sensitivity from specificity, and can hardly tell the difference between absolute and relative risk reductions will find a quiz and the “Unstatistik des Monats” (the statistical absurdity of the month) on the Center’s website as a further introduction to the topic.
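The absolute/relative distinction mentioned above is easy to get wrong, so here is a small worked example (the incidence numbers are invented purely for illustration):

```python
# Invented illustration: a drug trial with 1,000 patients per arm.
control_events = 20 / 1000    # 2.0 % of untreated patients have an event
treatment_events = 10 / 1000  # 1.0 % of treated patients do

arr = control_events - treatment_events  # absolute risk reduction
rrr = arr / control_events               # relative risk reduction
nnt = 1 / arr                            # number needed to treat

print(f"ARR = {arr:.1%}")  # 1.0 percentage point
print(f"RRR = {rrr:.0%}")  # 50 % - the headline number advertising prefers
print(f"NNT = {nnt:.0f}")  # 100 patients treated to prevent one event
```

A “50 % risk reduction” and a “1-percentage-point risk reduction” describe exactly the same trial – which is why the distinction matters for interpreting medical Big Data.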

1.7 Datability

The main theme of CeBIT 2014 was Big Data. At the same time a further buzzword was created: “datability.” This is to be understood as “using large amounts of data at high speed in a responsible and sustainable way.” In the future, too, there will be new, previously unknown neologisms intended to capture the use, uselessness, value and risk of this technology.

Only the future will decide whether a comparison with radiation is appropriate. At the beginning of the 20th century people like Henri Becquerel and Marie Curie discovered “radioactive radiation.” Since then, a broad discussion has taken place as to how the related technology can and should be used: its use in nuclear medicine and in radiology (CT, MRI) is widely accepted, and the benefits are pretty much undisputed. A more disputed issue is nuclear energy, where you will find equally vehement supporters and opponents. On this continuum of acceptance, the construction and use of nuclear weapons sit at the other end – here a critical distance, if not absolute rejection, is predominant.


So with regard to today’s Big Data, are we standing at the brink of a key or transformation technology? What is required is a more intense general and political discussion about the assessment and the use or prohibition of these technologies. Let us return to the comparison of nuclear technology with medicine: particularly in medical facilities, there are databases that we have not yet analyzed and which are left to lie fallow, as it were. Where and how data in medicine could and should be brought together – by politics, by companies, and even by individuals – is what this book is all about. There will be new applications that will surely come to be universally accepted. Add to that the medical Big Data applications that are already happily used by some people today (for example applications relating to the “quantified self”), but which are rejected by others with a shake of the head or a fit of rage.

For this evaluation we also have to include the extreme: today, certain software is already subject to the same regulations as arms exports. Which basically means: applications can be used as weapons. The potential for “dual use” (i.e., use for peaceful as well as for military purposes), in medicine too, is at the core of Big Data.

1.8 Big Data in the Bundestag

The Scientific Service of the German Bundestag⁴ dealt with the topic of Big Data in November 2013. Several members of parliament and the public wanted a definition of the term that is currently being used so often in political debates: Big Data, it explained, does not refer to a single new technology. Rather, Big Data describes a bundle of newly developed methods and technologies that enable the capture, storage and analysis of a large and arbitrarily expandable volume of differently structured data. For the IT industry, as well as for users in business, science and public administration, Big Data has therefore become the big topic of innovation in information technology.

4 Deutscher Bundestag: Nr. 37/13 (November 6, 2013), 2013.

1.9 The Three Vs of Big Data

Data is today essentially defined by three characteristics, called the “three Vs”:
1. Volume: For one, it is the volume of data that is produced by the progressive digitalization of virtually all aspects of modern life in unimaginably large quantities, and which doubles roughly every two years. Estimates for the year 2013 run to over 2 sextillion bytes of data being stored around the world – which, if stored and stacked on iPads, would result in a wall 21,000 km long. Humanity sends approximately 220,000,000 e-mails around the world per minute.
2. Velocity: While in the past data was accumulated at specific intervals that enabled you to process it little by little, today one is continuously exposed to the flow of data due to networking and electronic communication. In order to be able to use it, the incoming information has to be taken up and analyzed at ever faster rates or even in “real time.”
3. Variety: Data today originates in particularly diverse and complexly structured sources such as social networks, photos, videos, MP3 files, blogs, search engines, tweets, e-mails, Internet telephony, music streaming or the sensors of “intelligent devices.” All kinds of subjective statements in written or spoken contributions that express moods or opinions are particularly interesting for areas such as advertising, marketing or even election campaigns. Making the latter machine readable requires programs that can recognize judgmental statements or even emotions about products, brands, etc., which is technically very challenging.

According to estimates by the McKinsey Global Institute, the use of Big Data could create annual efficiency and quality gains worth around EUR 222 billion for the US-American health service alone, and approximately EUR 250 billion for the entire public sector in Europe. The exceptional thing about Big Data analyses is particularly apparent in the new quality of results that comes from combining data that has so far not been related to any other data.

1.10 Can Medicine Learn from Other Industries?

The question is whether anything can be learned from other industries, and if so, what. According to Mayer-Schönberger, as many as two-thirds of the shares on US markets are today traded by computers and their algorithms – in milliseconds! So can we expect future scenarios in which we would do better to ask Dr. Computer than our general practitioner?

To a certain extent, global digitization is far ahead of medicine: in the year 2000, only 25 % of the globally available information was said to be digital. In the year 2013,⁵ however, only 2 % was said to be not digital. “If this information were stored in books, it would cover the entire area of the United States with a stack that was 52 books high”⁶ – striking, often unquestioned or unquestionable comparisons are also a part of Big Data. Want more? The great library of Alexandria was built in Egypt during the reign of Ptolemy II and was supposed to represent the entire world’s knowledge. Today we live in a world in which 320 times the amount of data contained in the ancient library is allocated to each and every human being… Eight million books are said to have been printed between the years 1453 and 1503 – more data produced in those 50 years than in the previous 1,200 years, since the founding of Constantinople. Today we produce that amount of data in three days alone…

5 Mayer-Schönberger/Cukier: Big Data: Die Revolution, die unser Leben verändern wird. 2013.
6 Mayer-Schönberger/Cukier: Big Data: Die Revolution, die unser Leben verändern wird. 2013.

The fact that quantity can also produce a new form of quality is described by Mayer-Schönberger using the example of a horse: an oil painting, a photo of a horse, and the rapid succession of photos at more than 24 frames per second – this changes not only the mere number of images, but the essence itself. Today this example can easily be transferred to medicine: from anatomical drawings, to ultrasound images, to ultrasound films – and to 3D ultrasound animation…

Amazon’s worldwide success is also due to the way that people who have purchased a certain product are automatically shown other products that were bought by people who purchased the same product. In the early years of the corporation, an internal study found that this automated assignment of recommendations is superior even to an editorial team of booksellers who make their own recommendations based on buyer profiles – and, of course, it is also faster, cheaper and more scalable. Amazon may not actually know why someone who reads Goethe will also listen to Beethoven, but more than a third of Amazon’s turnover is today said to be generated through recommendations based on algorithms.

So do we have to analyze and interpret data differently in medicine? Is it conceivable that during the course of a disease we do not actually understand why something happens, but know that it is likely to happen, and that we can therefore respond to it better? Should we, as proposed by the authors of Big Data, “possibly give up some accuracy in order to detect a general trend”?
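The “people who bought X also bought Y” mechanism described above can be illustrated with a toy co-occurrence recommender – a deliberately simplified sketch with invented purchase histories, not Amazon’s actual system:

```python
from collections import Counter
from itertools import permutations

# Toy purchase histories (invented for illustration).
baskets = [
    {"Faust", "Beethoven: Symphony 9", "West-Eastern Divan"},
    {"Faust", "Beethoven: Symphony 9"},
    {"Faust", "Cookbook"},
    {"Cookbook", "Garden Tools"},
]

# Count how often each ordered pair of items shares a basket.
co_bought = Counter()
for basket in baskets:
    for a, b in permutations(basket, 2):
        co_bought[(a, b)] += 1

def recommend(item, k=2):
    """Items most often bought together with `item`."""
    scored = [(other, n) for (a, other), n in co_bought.items() if a == item]
    return sorted(scored, key=lambda t: -t[1])[:k]

print(recommend("Faust"))
# [('Beethoven: Symphony 9', 2), ...] - correlation alone drives the
# recommendation; no explanation of *why* Goethe readers like Beethoven.
```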

2 Big Data Framework in Medicine

The Big Data Framework summarizes the key elements for applications in medicine. The framework consists of technical and non-industry-related elements, and of a medical, a legal and an organizational dimension.


Fig. 1.3: Big Data Framework in Medicine (Source: Author’s illustration). From top to bottom, the framework stacks: Analysis, Visualization, Decision Support; Data Management (Integration and Sharing); Data Collection; Infrastructure; the data sources – Medical Data (laboratory results, image data, genomics…), Public Health Data (health authorities, municipalities, WHO…), Insurance Data (health insurance companies, insurances, public institutions…), Research Data (bio banks, clinical trials, PubMed…), Individual Data Generated by Patients (wellness, nutrition…) and Non-Classical Healthcare Players (social networks, telco, retail…); and, underpinning everything, Governance (data protection, IT security, informational self-determination).

The following provides a brief bottom-to-top explanation of the areas and illustrates them with examples:
– The area of governance refers in particular to basic organizational and legal conditions – these may relate to the individual company or institution, but also to basic national or international frameworks.
– Data sources in the medical context are manifold: for one thing, there is the medical data that is primarily found in hospitals. This can be complemented by or correlated with data from public institutions – including, for example, municipalities, public health departments, ministries or international organizations such as the WHO (World Health Organization). Depending on the health system, data about the insured persons (for example when claiming benefits) can be integrated from the health insurance companies, other insurances or even private entities – such as HMOs (Health Maintenance Organizations). Research data – with its often very specific characteristics, collected under special ethical and data protection requirements – is another category. Individual data generated by patients (pain journals, data entered through sensors…) is generally still not a part of the health system today, but will significantly increase in importance in the near future. Still practically not involved today are the so-called “non-traditional players.” These include data from social networks (e.g., Facebook), data from mobile operators or even retailers (“consultation will take place in the supermarket”)⁷ – and even the German postal service is said to be thinking about using mail carriers for the home delivery of medication…

7 Online: https://www.trendmonitor.biz, [accessed on: July 30, 2014].


– Infrastructure primarily describes the physical conditions for the transport, connection and storage of data.
– Data collection and archiving is particularly significant in medicine – first in terms of data security (as recommended by the Federal Office for Information Security, BSI), but also in terms of the legal requirements for long-term archiving.⁸
– The term “data management” includes not only the integration of data (in accordance with existing standards such as DICOM, IHE or HL7), but also the understanding of role assignment and the control of data access. The actual merging of data in order to present it to various user groups and to analyze it is therefore only the “surface”: data can “merely” be displayed – or specific constellations of data can trigger direct consequences.

8 GMDS: Dokumentenmanagement, digitale Archivierung und elektronische Signaturen im Gesundheitswesen. 2012.
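Role assignment and access control, as mentioned under “data management” above, can be pictured with a minimal sketch. The roles, record fields and permissions here are invented for illustration; real hospital systems implement this via their clinical information systems and directory services:

```python
# Minimal role-based access control over one patient record (illustrative only).
RECORD = {
    "diagnosis": "pneumonia",   # clinical data
    "lab_values": [7.2, 11.5],  # clinical data
    "insurance_id": "X-123",    # administrative data
}

# Which record fields each role may read.
ROLE_FIELDS = {
    "treating_physician": {"diagnosis", "lab_values", "insurance_id"},
    "medical_controller": {"diagnosis", "insurance_id"},
    "researcher": {"lab_values"},  # pseudonymized subset only
}

def read_record(role: str, field: str):
    """Return a field only if the role is entitled to it."""
    if field not in ROLE_FIELDS.get(role, set()):
        raise PermissionError(f"{role} may not read {field}")
    return RECORD[field]

print(read_record("treating_physician", "diagnosis"))  # allowed
try:
    read_record("researcher", "diagnosis")              # blocked
except PermissionError as e:
    print(e)
```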

This framework is not intended to replace existing models such as the OSI model. OSI (Open Systems Interconnection Model) is a reference model for network protocols as a layered architecture with seven so-called layers. The “Big Data Framework in Medicine” is a framework that identifies the key dimensions of Big Data in medicine that go far beyond a purely technical point of view.

2.1 Fields of Action in Medicine

The fields of action for Big Data applications are already multifaceted today – and this book will also describe future scenarios that were almost unimaginable not long ago.

Fig. 1.4: Fields of Action of Big Data in Medicine (Source: Author’s illustration). The diagram lists the participants on the left – Government/Public Health, Health Insurance, Care Provider (hospitals, doctors), Pharma, Care (individual doctors…), Research, Non-Traditional Players (retail, telco…), Patients – and assigns them fields of action: Health Education and Information, Financial Resource Optimization, Public Health Monitoring, Prevention, Communication, Disease Management, Quicker Implementation of Evidence, Optimized Clinical Studies, Adherence, Consumer Behavior, New Business Areas, Medical Performance Optimization, Decision Support (Clinical Decision Support), Information Exchange (Patient to Patient, p2p), Prevention of Misuse, Product Development, Prognosis of Diseases, and Control of Regulations. Governance (data protection, IT security, informational self-determination) underlies all of them.


The main scenarios are described here and outlined using examples – of course other overlaps of dimensions are also possible. The different participants are shown on the left side of the diagram. The boxes and scenarios indicate the focus of areas – without excluding other cross-references.
– Health Education and Information refers to all areas that compile or upgrade data for different participants, sometimes for the first time (for example the redesign of disease management programs as well as quantified-self approaches).
– The analysis of the inputs (resources) enables new forms of resource control and thus of output (medical quality, effectiveness). This enables organizational approaches, for example pay per performance, or the design of new forms of care.
– An example of Public Health Monitoring is faster and more targeted health measures based on health trends as well as individual health-promoting activities.
– The rapid feedback of subjective and objective data can also contribute toward product development – such as new services in the field of disease care as well as medical devices in the AAL (ambient assisted living) environment.
– The prognosis of diseases with an “individual health analytics” approach is an example of how a better understanding of medical conditions can be achieved by integrating data from the living environment (empowerment).
– Data integration and analysis enable faster and more targeted forms of prevention as well as the adaptation of campaigns to current challenges or a response to acute threats, such as epidemics.
– Disease management programs, for example, can be offered in a more individualized way in future.
– Recruitment for clinical trials as well as data simulation can enable new forms of study faster, and connect bed and bench (clinic and research laboratory) better.
– New services and products can be designed for the extended healthcare market (nutrition, fitness, wellness…).
– Medical performance optimization can enable better implementation of current guidelines (“coach the doctor”) or transparent outcome measurement.
– Exchange of information from patient to patient: the first and most famous example of this is http://www.patientslikeme.com/ – an internet platform on which patients can find other patients with similar symptoms and experiences.⁹
– Communication describes all the processes and scenarios that help to overcome the silos of the institutions and enable new forms of planning and implementation of health services.
– Adherence means scenarios in which a person’s behavior, i.e., the intake of medication, a diet regime and/or a change in lifestyle, corresponds better to the recommendations agreed with the therapist.

9 The founder of Patientslikeme recently made a plea to all individuals to donate their personal health data. The partners of the US company include academic institutions as well as pharmaceutical companies – which would certainly also be interested in the patient data.


– New fields of business will arise for providers who so far do not belong to the group of traditional players – e.g., a postal service that takes on care services, supermarkets that are part of a diet program, or telecommunication companies that provide telemedicine services.
– New forms of digital decision support include, for example, not only new kinds of visualizations, but also the improved integration of genetic, clinical and research data and the provision of activity algorithms.
– Big Data scenarios also enable better protection against medical malpractice.
– Control scenarios will be able to incorporate new indicators that address individual or geographical components more effectively.
– Faster implementation of evidence can be an important focus of Big Data applications.
– Exchange of information between doctor and patient can include telemedical services or real-time data analysis, even via sensors. Here there is more than one possibility: scenarios that have a stronger procedural focus (e.g., diabetes monitoring) or applications that accumulate and analyze existing data.

2.2 It’s Time for Big Data

The following diagram shows the potential and the scenarios of Big Data from a time perspective. New models can be developed along three coordinates:

Fig. 1.5: Dimensions of Big Data (Source: Author’s illustration) – Temporal: yesterday, today, tomorrow…; Horizontal: outpatient – inpatient; Vertical: clinic – research.

1. Horizontal: This could include the better integration of data along the treatment chain. This means not only the combination of outpatient, inpatient and rehabilitative treatment, but perhaps also information from everyday life that could be included in a sensible way.
2. Vertical: This entails the deeper integration of databases, such as the better connection of administrative, clinical and research-related data. Another example is the “open data initiative” of the British Medical Journal. The BMJ will in future only publish articles whose studies have been registered in advance and which handle their results and data transparently and disclose them accordingly. And we are not talking about “peanuts”: a large part of clinical studies have never been registered or published – according to the BMJ approximately 50 % of all studies, with a significant number of unknown cases. More than 99 % of the data collected for research purposes (with considerable effort and commitment, and possibly risks to patients) is no longer available after publication. If we had access to all that data, how much more would we know?
3. Temporal: Big Data analyses not only encompass a backward glance, but also enable the real-time monitoring of business processes. The next step leads from “what happens” to “why something happens,” where further analyses may provide new insights. A particularly exciting area is where future scenarios can be simulated or maybe even “predicted”: to spot a (potentially) complicated course of therapy earlier – or even to take appropriate measures – might become a high point of Big Data. A look into the future is particularly fascinating, and it is where longings, visions and desires come together: what will happen, what risk profiles are there? And how does this permit earlier preventive intervention, a change or expansion in clinical diagnostics, and the adaptation of therapies? After all, health does not take place in hospitals, but in real life. So how can we integrate that area better?

Fig. 1.6: The aspiration: Better understanding of what happened, what will happen (Source: McKinsey & Company: McKinsey Big Data Value Demonstration Team). The figure arranges analytics stages by technological complexity and business value/impact, from low to high: Reporting (What happened?), Monitoring (What is happening now?), Data mining (Why¹ did it happen?), Evaluation (Why² did it happen?), Prediction/Simulation (What will happen?) – where ¹ stands for machine-based and ² for hypothesis-based analysis. Examples placed along this scale: 1 Risk stratification/patient identification for integrated care programs; 2 Risk-adjusted benchmark/simulation of hospital productivity; 3 Identification of patients with negative drug-drug interactions; 4 Identification of patients with potential diseases (“patient finder”); 5 Evaluation of clinical pathways; 6 Evaluation of drug efficacy based on real-world data; 7 Performance evaluation of integrated care programs and contracts; 8 Identification of inappropriate medication; 9 Systematic reporting of misuse of drugs; 10 Systematic identification of obsolete drug usage; 11 Personal health records.


In a whitepaper called “The Big Data Revolution in Healthcare” (2013), McKinsey & Company summarized the scenarios that will be applicable in medicine. So where are we today? The reporting in hospitals is today mainly backward-looking: what happened in the past? In the best case it might be: what is happening right now? In the area of data mining and evaluation in terms of real-time performance evaluation, initial projects already exist, and HMOs are working on detailed risk stratifications of patient populations.

2.3 The Example of Google Flu Trends

A particularly prominent example is the discussion about “Google Flu Trends” – a web service operated by Google which aims to correlate the entry of certain search terms with the regional occurrence of diseases – for example the flu. Without additional instruments, the use of the search engine alone is meant to reveal disease trends. Google employees themselves published an article titled “Detecting influenza epidemics using search engine query data” in the scientific journal Nature: 50 million search terms were analyzed per week between 2003 and 2008.¹⁰ The prediction was based on 42 parameters identified from the data. The authors were pretty euphoric and stated that they could predict influenza-like symptoms accurately within a day:

“Because the relative frequency of certain queries is highly correlated with the percentage of physician visits in which a patient presents with influenza-like symptoms, we can accurately estimate the current level of weekly influenza activity in each region of the United States, with a reporting lag of about one day.”
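The underlying method is essentially a regression of doctor-visit rates onto query frequencies. The following is a minimal sketch of that idea with synthetic numbers – not Google’s actual model, features or data:

```python
import numpy as np

# Synthetic weekly data: share of searches for flu-related terms (x)
# and the rate of influenza-like-illness doctor visits (y).
rng = np.random.default_rng(0)
query_share = rng.uniform(0.001, 0.02, size=104)           # two "years" of weeks
ili_rate = 1.5 * query_share + rng.normal(0, 0.002, 104)   # assumed linear link

# Fit the linear model on "historical" weeks ...
slope, intercept = np.polyfit(query_share[:90], ili_rate[:90], deg=1)

# ... then nowcast the latest weeks from search data alone, which is
# available within a day instead of weeks after the physician visits.
nowcast = slope * query_share[90:] + intercept
print(np.round(nowcast[:3], 4))
```

The pitfall discussed below is baked into this sketch: the model only works as long as the assumed link between searching and actually being ill stays stable.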

Google was celebrated: at last important evidence had been found that there was a benefit to be had from Big Data in medicine. But little by little doubt and skepticism crept in, which resulted in a detailed (“final final”) summary of the discussion in the academic journal Science.¹¹ Under the title “The Parable of Google Flu: Traps in Big Data Analysis,” the case was re-analyzed and summarized: Google had predicted more than twice as many cases of infection as were actually confirmed later by the CDC (Centers for Disease Control and Prevention), the national health authority of the United States!

10 Brammer/Brilliant/Ginsberg et al.: Detecting influenza epidemics using search engine query data. In: Nature 457, February 19, 2009. Online: http://www.nature.com/nature/journal/v457/n7232/full/nature07634.html, [accessed on: August 6, 2014].
11 Lazer/Kennedy/King/Vespignani: The Parable of Google Flu: Traps in Big Data Analysis. In: Science 343/2014, pp. 1203–1205.


The authors in Science acknowledged that there may be many examples showing how the analysis of search engines and social networks could predict certain things.¹² However, in contrast to such propositions, we are in fact far from being able to replace established methods. The main points of criticism are:
– The quantity of data cannot replace the quality of data.
– Search engines and social networks are not designed for medical data collection. People “googling” a particular disease will not necessarily have that disease themselves (they may be searching out of pure interest or for someone else).
– The construction mechanisms of studies, whether epidemiological or clinical – such as measures of validity or reliability – cannot be replaced by pure volume.
– The search algorithms were not constant and were adapted by Google itself (“blue team issues”).
– So-called “red team attacks” may be unlikely, but cannot be ruled out in such a construction: this refers to “data attacks” attempting to manipulate public opinion by entering certain search terms artificially and with manipulatory intent (whether by human hand or computer).
– Lack of transparency and the impossibility of independent repetition (replication).
– The combination of large amounts of data with a small number of search terms inevitably leads to a problematic marriage of the two, with statistical consequences.
Nevertheless, the authors stated that the data collected by Google was definitely of value, though they insisted that it must be combined with other data sources, such as those made available by health authorities.

12 “The problems we identify are not limited to GFT. Research on whether search or social media can predict x has become commonplace and is often put in sharp contrast with traditional methods and hypotheses. Although these studies have shown the value of these data, we are far from a place where they can supplant more traditional methods or theories.”

2.4 Big Data Adoption Model in Medicine

Different approaches exist on how to measure the maturity of an organization when it comes to Big Data. From these one can then develop something of a “roadmap” for future planning. The key parameters are:
– Data collection: How is data currently collected, and how far has digitization progressed?
– Data sharing: Is there a possibility of data exchange within or outside of an organization? Are standards available and maintained for that purpose?
– Data analytics: At what stage of development is the current data analysis (“data warehouse”), and what percentage of employees can use specific data visualizations? What specific algorithms and modeling of data are there that go beyond traditional “reporting”? Are there any IT-based tools to develop forecasts and models?

A concrete example is the Big Data maturity model guide,¹³ which includes a multifaceted plurality of dimensions and analyzes the maturity of the organization according to its stage of development. In 2013 Verhej¹⁴ examined the process of Big Data adoption on the basis of specific case examples. He differentiates five core areas: strategy, knowledge development, piloting, implementation and fine-tuning. In particular, he emphasizes the “business case” – because not everything that is possible is sensible or provides added value. In the healthcare industry the “roadmap” issue is currently becoming particularly important, since only with an overview can the incredible heterogeneity of systems be overcome and, at the same time, the benefits be made transparent.

13 Halper/Krishnan: TDWI Big Data Maturity Model Guide. 2014.
14 Verhej: The Process of Big Data Solution Adoption. 2013.

3 Big Data in the Context of Health – More Than Old Wine in New Bottles?

In a hospital context – and this area exemplifies the entire healthcare industry – data is already generated, collected and evaluated today. So what is new, and what explains the success of Big Data? Three trends come together and converge in Big Data:
1. Technologically, there are new ways of processing huge amounts of data in real time, for example through in-memory database technology (processing large amounts of data in main memory, e.g., SAP HANA) and NoSQL (“Not only SQL”) databases, which are optimized for dealing with unstructured data and provide query interfaces via SQL – for example the prominent open-source solution Hadoop, a highly scalable, highly parallel data storage and data processing technology. This was simply unimaginable a few years ago.
2. In addition, every hospital has something you could call “silo experience.” Data is distributed, buried and even hidden in so many different, separate systems that there is a need and even a deep desire to merge everything. One could also call it “a treasure that lies hidden in fragments.” So it is also about bringing together existing data in “real time” and possibly developing new business models from that. What would Amazon or Facebook do with the data generated in a university hospital? This idea may be fascinating for some – but for others it’s a nightmare…


3. These new technologies lead to new applications. An example might be knowledge-based systems, which can really “coach” a treatment process and merge large amounts of information for that purpose. Until now, IT in hospitals, particularly in the clinical field, has been perceived primarily as a data servant and not as a coach who can provide support whenever necessary. The word Big Data brings these streams together “crisply.”
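To make trend 1 above less abstract: the canonical computation that systems such as Hadoop parallelize across many machines is the map–shuffle–reduce pattern. Here it is as a single-process Python sketch (the diagnosis codes and log lines are invented; real Hadoop distributes each step across a cluster):

```python
from collections import defaultdict

# Toy input: one line per case, as it might sit in a hospital log.
cases = ["J18.9 pneumonia", "I21.0 infarction", "J18.9 pneumonia",
         "E11.9 diabetes", "J18.9 pneumonia"]

# Map: emit (key, 1) pairs - this step can run on thousands of nodes in parallel.
mapped = [(line.split()[0], 1) for line in cases]

# Shuffle: group all emitted values by key.
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: aggregate each group independently - again parallelizable.
counts = {key: sum(values) for key, values in groups.items()}
print(counts)  # {'J18.9': 3, 'I21.0': 1, 'E11.9': 1}
```

Because map and reduce touch each record independently, the same program scales from five log lines to billions – which is exactly the property that made Hadoop-style processing feasible where classic single-server analysis was not.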

3.1 The Difference to Classic Clinical Analysis (Old Wine in New Bottles?)

Big Data is centrally connected with the integration of different views: medical controlling also has to be seen from a quality perspective – not only a monetary one. Conversely, quality assurance and evidence-based medicine also have to be considered in economic terms. The traditional separation between management (finance), medicine (hospital) and potentially also research has to be abolished. The call to do so may be “old wine.” But today we have not just new bottles – we have a complete, ultra-modern filling station. And what’s more, it also enables us to serve the correct brew for a variety of different tastes. So – not just old wine for everyone, but even cocktails for those who want them…

3.2 Is Big Data in the Hands of Medical Controlling Degenerating to a DRG Optimization Technology?

If software is used in a first phase to optimize revenue and improve the management of a hospital, that is certainly a good start. But it is only a first step. Modern hospitals, such as the Charité, understand that they need a strategic platform for integrated data analysis. In this case medical controlling is surely the “low-hanging fruit” with which to devise the process-cost-based hospital of the future. The vision of “outcome-based payment” will only be possible if data at the micro level (the patient), meso level (the department) and macro level (the hospital) is merged in order to provide any kind of control at all…

But the real problem in today’s hospitals is much more banal: even in university hospitals the majority of documentation still takes place partly on paper – and what is not digitized cannot be computed, even with Big Data. DRG optimization has not only an inward but also an outward dimension: being able to simulate price models based on new assumptions but with “old” cases when negotiating budgets with health insurance companies can quickly yield millions of euros for the hospitals…
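Such a budget simulation is conceptually simple: re-price last year’s case mix under the proposed parameters. A minimal sketch follows, using the basic DRG relation (revenue per case = relative case weight × base rate); all numbers are invented, and the real German DRG calculation involves many further components such as surcharges and outlier rules:

```python
# Re-price "old" cases under a proposed base rate (all numbers invented).
old_cases = [1.2, 0.8, 3.5, 1.0, 2.2]  # relative weights of last year's cases

def total_revenue(case_weights, base_rate_eur):
    # revenue per case = relative weight x base rate
    return sum(w * base_rate_eur for w in case_weights)

current = total_revenue(old_cases, 3_200.0)   # current base rate
proposed = total_revenue(old_cases, 3_350.0)  # rate under negotiation

print(f"current:  {current:,.0f} EUR")
print(f"proposed: {proposed:,.0f} EUR")
print(f"delta:    {proposed - current:,.0f} EUR for {len(old_cases)} cases")
```

Run over the tens of thousands of cases a large hospital treats per year, such deltas quickly reach the millions mentioned above.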


3.3 Are Hospitals Prepared for Big Data?

Generally: Yes and no! IT is often not given the central strategic importance that it should have. This is something medical information scientists have been lamenting for years without managing to do much about it, as they often have difficulty explaining the strategic added value in the words of the executive management. How many hospitals have a Chief Information Officer (CIO) who is a member of the executive board? Very few. How many Chief Medical Information Officers (CMIOs) – a person who strategically and explicitly "takes care" of medical IT needs – are there in German-speaking hospitals? Very few… Big Data is not about which server to buy, but about how I can handle my central nervous system in the medical "knowledge industry," i. e., how I can measure and act. This message and this understanding have not yet arrived in most hospitals. CXXOs – this is not a title on a business card, but a top-management understanding that such decisions should not be left to the IT department alone. Nevertheless, many think Big Data is a central issue for the future. Some hospitals are already beginning to develop strategies in this area, but we are just getting started. Furthermore, experts around the world agree: The problem is not the technology, but whether the people/staff possess enough statistical know-how to deal with it. Whoever cannot distinguish absolute risk from relative risk and thinks that correlation equates to causation should stay away from Big Data.
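To illustrate that last distinction, here is a minimal sketch with purely hypothetical numbers: a marker that "doubles the risk" (relative risk 2.0) can still leave the absolute risk almost negligible.

```python
# Hypothetical screening scenario - the figures are illustrative only.
baseline_risk = 0.002        # absolute 10-year risk without the marker: 0.2 %
relative_risk = 2.0          # headline claim: "the marker doubles the risk"

absolute_risk = baseline_risk * relative_risk
absolute_increase = absolute_risk - baseline_risk

print(f"Relative risk:     {relative_risk:.1f}x")
print(f"Absolute risk:     {absolute_risk:.2%}")              # 0.40 %
print(f"Absolute increase: {absolute_increase:.2%} points")   # 0.20 points
```

The same relative risk sounds dramatic in a headline, yet it corresponds to an absolute increase of only 0.2 percentage points – exactly the distinction the text demands from Big Data practitioners.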

3.4 Are Internal Hospital Big Data Analyses Data Protection Cliffs, and if so, Where Exactly?

As with all processes – whether with or without IT – privacy and personal rights must be protected. However, if privacy (or sometimes the pretext of privacy) prevents important insights with which to improve patient treatment, then it has to be scrutinized – because otherwise privacy protection will mutate into perpetrator protection and become a common excuse to prevent much-needed transparency. On the other hand, Big Data sounds similar to "Big Brother" – a particularly sensitive issue, given the discussion about the NSA and the misuse of data by intelligence services. We are only at the beginning of a political (note well!), not of a technical discussion about Big Data – and thus about one of the most important raw materials in the globalized economy. Again, this invokes comparison with the discussion about the benefits of nuclear energy in the 1940s and 50s: Today nuclear weapons are scorned, nuclear power plants are a controversial issue, and nuclear medicine departments are welcome – but we are generally talking about the same technology. Where, when and to whom can I give my data and thus "my innermost self," and which organizations and companies may use what? This is where we need clear rules and laws and no unresolved big gray areas. Today in Germany, we have not only a Federal Data Protection Act, but also data protection laws at the level of the individual federal states.


Regulation of data protection at a European level, such as the European Parliament's draft for a data protection regulation (EU-DSGVO), will be necessary. This should make a contribution toward unifying data protection legislation in Europe.¹⁵ The protection of privacy and the personal data of citizens is a general social concern in Europe. This applies to all areas of life – and, in this context, the subject of Big Data is particularly sensitive. The fundamental right to the guarantee of the confidentiality and integrity of information technology systems (colloquially referred to as the basic IT right, fundamental computer rights or the fundamental right to digital privacy) is a fundamental right applicable in the Federal Republic of Germany, which primarily serves the protection of personal data stored and processed in information systems. This right is not specifically mentioned in the Basic Law; it was specified in 2008 by the Federal Constitutional Court as a special form of the general right of personality, i. e., derived from existing fundamental legal provisions.¹⁶ Similar considerations apply to the discussion regarding informational self-determination and the "right not to know." This is the point at which we reach a complex ethical discussion, which we have to face.

4 Personalized Medicine and Big Data

To provide each patient with an individual therapy that acts painlessly, quickly and easily – for some that is a vision, for many others it might sound delusional. Personalized medicine, particularly in the context of Big Data, is often understood to mean the connection of genetic code and possible healing options. Hardly any other area of medicine has seen such an increase ("explosion") of data in the last five years, with correspondingly faster and cheaper analysis of the data. The hype about Big Data and genetics can therefore also be explained by these new developments. The German Cancer Research Center announced in May 2014: Extensive genetic analyses of cancer cells have shown: Every tumor is different and every cancer patient must be treated individually. The National Center for Tumor Diseases (NCT) Heidelberg wishes to meet these demands: From 2015 onwards, patients at the NCT will be offered a genetic analysis of their cancer cells according to which an individual therapy can be recommended.¹⁷

15 Press release and statement by TMF, July 2014: “Die Daten der Bürger schützen, biomedizinische Forschung ermöglichen: Medizinische Forscher kommentieren den Entwurf einer europäischen Datenschutz-Grundverordnung” 16 Wikipedia: Grundrecht auf Gewährleistung der Vertraulichkeit und Integrität informationstechnischer Systeme. Online: http://de.wikipedia.org/wiki/IT-Grundrecht, [accessed on: August 1, 2014]. 17 Press release by DKFZ: http://www.dkfz.de/de/presse/pressemitteilungen/2014/dkfz-pm-14-24Individualisierte-Krebsmedizin-fuer-jeden-Patienten.php


Fig. 1.7: The "Data explosion" – schematic growth over the years 2004 to 2012 (Source: Author's illustration).

So are we on the brink of a breakthrough of "personalized medicine" in all areas?

4.1 The (Genetic) Human in Numbers

What does the human being (or rather the parameterization of human biology) look like in numbers? The human genome (genomic DNA) has a length of 2 × 3.2 Gb with approximately 19,000 to 26,000 protein-coding genes, 1,500 genes for micro-RNAs, some 8,900 pseudogenes and approximately 290 disease genes for monogenic (Mendelian) genetic diseases.

The numbers for a human transcriptome (messenger RNA):
– Number of translated proteins in a cell: approx. 5,000 to 10,000
– Number of mRNA molecules per cell: approx. 15,000
– Average length of an mRNA: 1,000–1,500 nucleotides
– Average half-life of an mRNA: 10 hours
– Length of a processed microRNA: 18–24 nucleotides

The human proteome (proteins and peptides):
– Number of protein copies of an mRNA molecule: approx. 300–600
– Average length of a protein: 375 amino acids
– Longest protein (titin): 33,423 amino acids
– Average half-life of a protein: 6.9 hours
– Number of copies of a protein per cell: 18,000 (median)

Number of human tissues (cell types):
– Different types of tissues in the body: approx. 230
– Organ systems in humans: 22 (categories of the Human Phenotype Ontology, HPO)
– Number of cells in a human body: 3.7 ± 0.8 × 10¹³

Human diseases:
– Monogenic (Mendelian) genetic diseases: approx. 9,000
– Monogenic (Mendelian) genetic diseases with known disease genes: approx. 3,800, caused by mutations in one of the 2,850 disease genes

Bear in mind, this is merely the primary genetic view of a human. Furthermore, the International Classification of Diseases (ICD) as well as the Diagnostic and Statistical Manual of Mental Disorders (DSM) also include tens of thousands of groups and sub-groups of diseases. In order to understand the complexity of genetics, a visit to the ENCODE project site¹⁸ is strongly recommended to get an impression of the diversity in this area. ENCODE (which stands for ENCyclopedia Of DNA Elements) is a research project that was initiated in September 2003 by the US National Human Genome Research Institute (NHGRI). The project aims to identify and characterize all functional elements of the human genome as well as the transcriptome. Visitors are encouraged to play with the Human Genome Browser, which enables you to zoom into different gene sequences online.¹⁹ Some people may quickly shy away from this topic, since the complexity is confusing and unnerving.

At this point I would like to add a personal note: When I look at these pages, I feel a sense of humility in the face of creation. I get the impression that in 2014 we can probably see only the tip of the iceberg… There are scientists who claim that a genetic terra incognita no longer exists – but is that so? If I shred the score of a symphony by Beethoven and analyze the number of notes and clefs, can I then understand the essence of the music? The causality from genome to peptide, to active ingredient, to action and activity of a human being is not linear and not unidirectional – "Hope is the dawn on a stormy night."²⁰

In terms of "Big Data and personalized medicine," there are many experts today – and probably even more who think they are experts.

18 http://genome.ucsc.edu/ENCODE/ 19 The following link provides a depiction of a gene sequence: http://genome.ucsc.edu/cgi-bin/ hgTracks?db=hg19&position=chr21 %3A33031589-33041578&hgsid=383855821_U1XsDAYLsaHrj1AOzE66eZH16J4V, [accessed on: August 5, 2014]. 20 Goethe: Triumph der Empfindsamkeit, IV.


Therefore, I would like to let a "real" expert have his say – an expert who has worked on the ENCODE project for years. Ewan Birney is a computational biologist and Deputy Director of the European Bioinformatics Institute in Hinxton, UK, which is part of the European Molecular Biology Laboratory (EMBL). Birney coordinated the data analyses of the ENCODE project. In a ZEIT interview under the title "The Genome is a Jungle Full of Strange Creatures"²¹ in September 2012, he provided some deep insights into his work.

21 Online: http://www.zeit.de/2012/37/Encode-Projekt-Birney, [accessed on: 5.8.2014].

DIE ZEIT: The Human Genome Project deciphered our genome and explored the human genes ten years ago. Now you and 440 colleagues have worked on it for another five years in the Encode Project. Why, exactly?

Ewan Birney: It's actually a part of our job to bring the search for the genes to an end, even if many people think that that already happened long ago. But the genes account for only a tiny part of the genetic information. Encode's major goal was to find out what all the rest of a genome is actually good for – all the non-coding DNA, which derogatorily was called junk.

ZEIT: You seem to have discovered quite a bit …

Birney: We will post at least 40 publications in three professional journals online in one fell swoop. Thirty of these are linked by a matrix so that readers can track and cross-reference every aspect. Nothing like that has ever been done before.

ZEIT: And what is it you found in the genome?

Birney: It is full of surprises. There is far more going on there than we ever expected. The genetic material is full of activity.

ZEIT: So we have to bury the idea that our genome consists mostly of garbage?

Birney: That's correct. Junk DNA was never a particularly appropriate metaphor, if you ask me. I find "the dark matter of the genome" much better.

ZEIT: And how much dark matter will still remain in the genetic makeup after Encode?

Birney: It is difficult to say: We understand some parts, but don't understand others. There are too many functional levels in a genome. But let me put it this way: We have now assigned 80 per cent of all hereditary factors to a certain biological activity. Of these, 1.2 per cent code for all the proteins in a body, but another 20 per cent are for controlling these genes.

ZEIT: Alright, so do we now understand how the genetic makeup functions?

Birney: Unfortunately not. I wish we did.

ZEIT: But we're looking at a kind of circuit diagram for our genetic makeup?

Birney: That is a very good analogy. Only one cannot really say that our genome looks clean and tidy. The unexplored wilderness we encountered – that was a real surprise for me. The genetic material is a jungle full of strange creatures. It's hard to believe how closely packed it is with information! We are now in the situation of an electrician who has to check the electrical system in an old house and notices: all the walls, ceilings and floors are covered with light switches. We have to find out how all of these switches are connected with the light, heating and equipment in the rooms.

ZEIT: And what do these genetic switches do in our body?

Birney: Take the cells of hair roots, for example. They activate genes that are responsible for the color pigments in the hair. The liver cells, on the other hand, form alcohol dehydrogenase, the enzyme that breaks down alcohol…

ZEIT: … We hope!

Birney: Yes. We always knew that the differences between the cells of different organs and tissues are determined by the position of the genetic switches. But what we did not know: The genome is full of them. We have discovered four million genetic switches with which genes are controlled. They are called transcription factor binding sites – contact points between the DNA and control proteins.

ZEIT: Conclusion: Up to now, we were aware of more than 20,000 genes for proteins. Now you have added millions of switches that control a cell's biology in a variety of combinations. And all of that can also be different from person to person. Is there any hope left that we will actually understand it all one day?

Birney: We will probably need the entire 21st century for it. But the good news is: You do not have to understand everything in order to make progress in medicine. For example, many of the genetic switches are located in areas of the genome that are associated with the most widespread diseases. Many of these diseases – diabetes, bowel inflammation and the like – will be caused by errors in the switches. We can presume such switch effects for 400 diseases.

4.2 Personalized Medicine?

Personalized medicine, also called individualized medicine, aims to treat each and every patient while taking into account individual conditions that go beyond functional disease diagnosis. The term "personalized medicine" as outlined is often used particularly in connection with pharmacotherapy/biomarkers and/or "genetic therapy." However, this use of the term "personalized/individualized medicine" in a reduced, biological interpretation is somewhat controversial. The Federal Center for Health Education (BZgA) emphasizes in its main definitions that the term "personalized medicine" is misleading in its contextual meaning in so far as "the personal side of a person, i. e., his capability of reflection and self-determination is primarily not meant at all, but that fundamental biological structures and processes are emphasized." The chairman of the central ethics commission of the German Medical Association, Urban Wiesing, was critical: "Personal characteristics manifest themselves not on a molecular, but on a personal level." Heiner Raspe from the Center for Population Medicine and Health Care Research (ZBV) at the University of Luebeck argues that the term "personalized medicine" is too one-sided in the sense of pharmacogenomically based therapy; he specifies that in addition to biomarkers there are also "psycho markers" and "socio markers," which also deserve to be considered in medical therapy, as selected examples have shown. The Bundestag committee on education, research and technology impact assessment proposes the term "stratified medicine," which is increasingly used in international literature.²²

However, the terms "personalized" and "individualized" suggest a therapy that is precisely tailored to the individual being treated. This is normally neither the case in modern-day medicine nor what is meant. Basically, all diagnostic and therapeutic measures today should orient themselves not only on the best available evidence, but also on the individual characteristics of a patient and, as far as appropriate, on the patient's wishes, in the spirit of patient autonomy. Techniker Krankenkasse²³ (technicians' health insurance company) suggests: In general, stratification is carried out with the help of biomarkers. That is why the expression "biomarker-based medicine" would be possible as a synonym for stratified medicine. This is what is usually meant in today's medicine in the context of "personalized medicine." If it is to be clarified that stratifying medicine is oriented on inherited genetic characteristics, the term "genome-based medicine" can also be used.

Fig. 1.8: The path to "personalized medicine" (Source: TK: Innovationsreport 2014):
– Conventional medicine according to the "one drug fits all" principle
– Stratifying medicine: gender medicine; biomarker-based/genome-based medicine
– (True) individual medicine: therapeutic one-of-a-kinds; autologous cell therapies; active personalized immune therapy (tumor vaccines); individual medical products (rapid prototyping)

Unique therapies constitute true individualization, i. e., custom-made therapeutic interventions for individuals. This includes, for example, autologous cell therapies, which are carried out with the help of the body's own cells. In the field of medical devices this is also understood to mean individual prostheses or implants made according to the rapid prototyping principle. In addition, APVACs (Active Personalized Vaccines) must also be mentioned in the context of individualized medicine, even if they have not yet arrived in everyday clinical practice. This refers to a vaccination against cancer. With APVACs the composition of a vaccine (especially tumor vaccines) is determined by previous biomarker tests. The peptides necessary for the formation of the vaccine are then derived from a peptide warehouse, for example. However, such active personalized medicine is still in the development stage and is not currently used in general clinical practice. Moreover, one can argue about whether the terms "personalized" and "individualized" go too far, because it is not the person or the personality or individuality of a person that is addressed; in fact, only the biomarkers of the respective people are meant. Biomarkers are parameters with which a property can be objectively measured in order to use it as an indicator for normal biological or pathological processes or pharmacological responses to a therapeutic intervention. Biomarkers can be differentiated into prognostic, predictive and pharmacokinetic/pharmacodynamic biomarkers. According to TK, there are currently 27 substances available on the German market for which a biomarker test is to be carried out according to the respective scientific information before using the drug. In most cases, the use of the drug is indicated only if the diagnostic test showed a positive result before treatment and the biomarker, i. e., a certain molecular change in the tumor cells, is present.

22 Quoted according to: Wikipedia: Personalisierte Medizin. Online: http://de.wikipedia.org/wiki/Personalisierte_Medizin, [accessed on: August 6, 2014].
23 TK: Innovationsreport 2014. Online: http://www.tk.de/centaurus/servlet/contentblob/641170/Datei/121104/Innovationsreport_2014_Langfassung.pdf, [accessed on: August 6, 2014].

4.3 Statement by the German Ethics Council

In April 2013, the German Ethics Council released a statement titled "The Future of Genetic Diagnostics – From Research to Clinical Application." For this purpose, information was provided on the latest technical methods of genetic diagnosis and its use in medical practice during a public hearing. "In recent years, the methods of genetic analysis have developed rapidly. The new applications are designed to improve the explanation of the causes of disease as well as risk forecasts, and to contribute to new therapeutic approaches. However, it remains to be seen how quickly and to what extent they will find their way into clinical practice." The Ethics Council stressed that "the question of the quality of life should not be reduced to medical or genetic findings." … "With multifactorial diseases, it is not appropriate to constantly address genetic variances as (potentially harmful) mutations. Often these are actually polymorphisms that are widespread in the population. Their possible influence on disease and health only takes effect in the complex context of other genetic, epigenetic and environmental factors."

After a period of high-flying expectations, the assessment of a direct clinical application of the results of genome-wide association studies has, for now, given way to disillusionment. The fundamental conceptual difficulty is that for multifactorially influenced characteristics, a large number of gene loci and an even greater number of interactions between these gene loci come into question. This inevitably leads to a tendency toward statistical overfitting of the correlations, where random correlations between DNA sequence and phenotype are interpreted as the putative cause. Likewise, underfitting can also occur, where actually relevant genes or interactions between multiple genes are not or not properly recorded and therefore escape identification.
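To make the overfitting tendency concrete, here is a minimal simulation sketch under assumed, illustrative conditions: one million tested SNPs, none of which is truly associated with the phenotype. At a nominal 5 % significance level, tens of thousands of spurious "hits" are expected by chance alone – which is why genome-wide association studies work with drastically corrected thresholds.

```python
import numpy as np

rng = np.random.default_rng(42)
n_snps = 1_000_000   # SNPs tested genome-wide (illustrative)
alpha = 0.05         # nominal significance level

# Under the null hypothesis (no SNP influences the phenotype),
# p-values are uniformly distributed on [0, 1].
p_values = rng.uniform(size=n_snps)

print("Nominally 'significant' SNPs:", int((p_values < alpha).sum()))  # ~50,000

# Bonferroni correction divides alpha by the number of tests,
# yielding the familiar genome-wide threshold of about 5e-8.
bonferroni = alpha / n_snps
print(f"Corrected threshold: {bonferroni:.0e}")
print("SNPs surviving correction:", int((p_values < bonferroni).sum()))  # ~0
```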

4.4 23 and me?

"23andMe," a Big Data start-up from the United States, provides what is certainly the best-known genetic test kit. As a direct-to-consumer test (DTC test), the company offered to "analyze the individual genome" via the Internet with the help of a few milliliters of saliva. For USD 99.00, in addition to being tested for the probabilities of a variety of diseases, you could have your genetic closeness to certain animals (such as cow or mouse) analyzed. The company claims to have analyzed the genes of 450,000 customers to date. The saliva samples were said to have been tested for about 200 genetic diseases and 99 other predispositions. In Big Data circles of the time, it was "hip" to ask "have you already, or have you not yet…?" The marketing machine promised much in 2009, including the "genetic fountain of youth."²⁴ In 2013, the United States Food and Drug Administration (FDA) banned the company from selling the test, although the FDA stressed that there had been a dialogue for many years and it had tried to support the provision of legally compliant offers. However, even after these many interactions with 23andMe, we still do not have any assurance that the firm has analytically or clinically validated the PGS (Personal Genome Service) for its intended uses, which have expanded from the uses that the firm identified in its submissions.²⁵

24 The following website provides further information as well as depictions: ScienceBlogs: 23andMe offers free genome scans to 4,500 senior athletes, seeking genetic fountain of youth. Online: http://scienceblogs.com/geneticfuture/2009/08/12/23andme-doing-free-genetic-tes/, [accessed on: August 6, 2014].
25 FDA: Warning Letter. Online: http://www.fda.gov/iceci/enforcementactions/warningletters/2013/ucm376296.htm, [accessed on: 6.8.2014].


The promises of the genetic testing could not be validated. 23andMe has backtracked and currently offers a test on its website that "only" claims to analyze "genetic genealogy." It will be exciting to see whether new attempts will be made to offer disease predictions. The German Ethics Council made the following statement on this issue: "To what extent the enormous economic expectations of the DTC test providers and the fears by politicians that the demand for these tests will increase dramatically are realistic remains to be seen. The initial expectations of an expanding DTC market have so far not materialized."

4.5 Big Data, "Personalized Medicine" and IT

A short time ago I was invited to a presentation by an IT company that wanted to introduce the blessings of information technology in the field of genetic analysis under the title Big Data. After the event I said to the speakers: If what you say is true, you'll receive not one but five Nobel Prizes in Medicine tomorrow! Critical medical expertise is urgently needed here in order to take a little air out of the hype from time to time. For example, the "Network for Evidence-based Medicine"²⁶ is an important organization in this area that helps to classify medical "knowledge" and the corresponding capabilities. We face the challenge of simultaneously generating more and more knowledge at an ever faster pace, and integrating the existing – and consequently also growing – knowledge better and more quickly into everyday life and into medical treatment. This is basically an "execution problem from bed to bench and from bench to bed." A central potential of Big Data lies in the development of innovative models, and the integration of expert knowledge and new publications into our daily activities. A largely under-developed potential can also be seen in the area of public health: If we know that up to 80 % of health can depend on psychosocial factors – why do we focus so much on genomes? Are bike lanes (organization) and the purchase of a bicycle (personal action) the "better" option to change our health sustainably? Can Big Data help to identify new correlations in this area – both on an individual and on an epidemiological level? What factors are critical for the avoidance, development and course of diseases? What makes and keeps you healthy? Big (!) questions that go well with Big Data. All the authors who have made a contribution to this book address these issues in one way or another. My special and sincere thanks go to them all.

26 http://www.ebm-netzwerk.de/


Literature

Bahnsen, U.: Der Schaltplan des Menschen. In: Die Zeit. Online: http://www.zeit.de/2012/37/Encode-Projekt-Birney, [accessed on: August 5, 2014].
Brammer, L./Brilliant, L./Ginsberg, J. et al.: Detecting Influenza Epidemics Using Search Engine Query Data. In: Nature 457, February 19, 2009. Online: http://www.nature.com/nature/journal/v457/n7232/full/nature07634.html, [accessed on: August 6, 2014].
Brecht, B.: Leben des Galilei: Schauspiel. Berlin 1998.
Deutscher Bundestag: Nr. 37/13 (November 6, 2013), 2013.
FDA: Warning Letter. Online: http://www.fda.gov/iceci/enforcementactions/warningletters/2013/ucm376296.htm, [accessed on: August 6, 2014].
GMDS: Dokumentenmanagement, digitale Archivierung und elektronische Signaturen im Gesundheitswesen. Dietzenbach 2012.
Goethe: Triumph der Empfindsamkeit, IV.
Halper/Krishnan: TDWI Big Data Maturity Model Guide. 2014.
Lazer, D./Kennedy, R./King, G./Vespignani, A.: The Parable of Google Flu: Traps in Big Data Analysis. In: Science 343/2014, pp. 1203–1205.
Mayer-Schönberger, V./Cukier, K.: Big Data: Die Revolution, die unser Leben verändern wird. Munich 2013.
ScienceBlogs: 23andMe offers free genome scans to 4,500 senior athletes, seeking genetic fountain of youth. Online: http://scienceblogs.com/geneticfuture/2009/08/12/23andme-doing-free-genetic-tes/, [accessed on: August 6, 2014].
TK: Innovationsreport 2014. Online: http://www.tk.de/centaurus/servlet/contentblob/641170/Datei/121104/Innovationsreport_2014_Langfassung.pdf, [accessed on: August 6, 2014].
Verhej: The Process of Big Data Solution Adoption. 2013.
Wikipedia: Grundrecht auf Gewährleistung der Vertraulichkeit und Integrität informationstechnischer Systeme. Online: http://de.wikipedia.org/wiki/IT-Grundrecht, [accessed on: August 1, 2014].
Wikipedia: Personalisierte Medizin. Online: http://de.wikipedia.org/wiki/Personalisierte_Medizin, [accessed on: August 6, 2014].

Josef Schepers and Martin Peuker

2 Information Management for Systems Medicine – on the Next Digital Threshold

The first article is deliberately written from a hospital point of view. The authors describe the "digital threshold" at which university hospitals and cutting-edge medicine find themselves today, from the perspective of Europe's largest university hospital, the Charité – Universitätsmedizin Berlin: The combination of administrative, clinical and research data is just as necessary as the connection of "bench to bed" and "bed to bench" – i. e., the greater integration of medical experiments and laboratory work into clinical practice and vice versa by means of Big Data scenarios. For the Charité, the focus lies not only on the data itself, some of which is already processed with in-memory technology. The customized presentation of the data via mobile devices, which then enables the use of this data, is another component of the Charité's "Big Data strategy."

Content
1 New Data Volumes
2 The Digitalization of the Knowledge Cycle
3 Consolidation of the Digital Medical Care and Research Systems
4 Integration and Differentiation
Literature and Presentations

Whenever Big Data in healthcare is discussed, the focus rightly shifts very quickly to omics analyses¹ and personalized medicine. Due to the relevance of character strings that are several gigabits in size, describe pathological molecular structures and processes, and influence treatment decisions, information technology is pretty much indispensable at the forefront of medicine. Thus Thomas Lengauer from the Max Planck Institute for Computer Science stated at the World Health Summit 2012 at the Charité: "IT is at the core of individualized medicine – both in health care and in research." Regardless of that, many tasks in the areas of diagnosis, therapy and management in hospitals are being continuously optimized with the help of more and more digital systems with ever-increasing amounts of data. These two developments are now coming together, particularly in the digitalization of the knowledge cycle in university medicine. In addition, Heyo Kroemer, the President of the Medizinischer Fakultätentag, pointed out the coming changes in patient clientele and staff structures at the annual TMF conference in March 2014: "The challenges of demographic change for the health system can only be managed with the involvement of university medicine. This essentially requires the medical faculties to be well equipped with modern research infrastructure." He stated that information technology would play a key role.²

1 Omics data: genome, epigenome, transcriptome, proteome and metabolome data.

The IT departments of university hospitals and maximum care hospitals are currently facing several mammoth tasks at the same time: the consolidation of existing electronic patient documentation, authorized data provision from the health care context for research, the integration of complementary documentation taking into account semantic interoperability, the development of episode knowledge from clinical texts, transparency of resource availability for clinical decision makers, support for clinicians in dealing with the flood of information, and – last but not least – ensuring data protection and data security. Together, these issues lead to the three "Vs" of Big Data: the amount of data (volume), the wide range of requirements (variety) and the need for accelerated support (velocity). In medicine, a fourth V is also necessary: the reliability of instructions and warnings (validity).

1 New Data Volumes

In her keynote speech "What will move us in (to) the future?" at the annual GMDS conference 2012 in Braunschweig, Elisabeth Steinhagen-Thiessen reported that molecular diagnostics and genome sequencing are already a part of modern everyday research.³ Therapy tailored to individual patients, which promises an increase in efficacy, the elimination of undesirable side effects and the avoidance of ineffective drugs, requires the integration of large amounts of information for each case. She also warned against a careless use of the possibilities because of the changing ethical issues and aspects, and explicitly named the "many false-positive indications of disease."

At the World Health Summit in October 2013, the start of the so-called National Cohort and the participation of Charité institutes were a particular Big Data topic. Starting from the year 2014, the genetic makeup and health of 200,000 German participants are to be examined over decades in a study.

2 Kroemer: Erfolgsfaktoren für die translationale Medizin aus Sicht der Medizinischen Fakultäten. Keynote lecture at the TMF Annual Meeting 2014, April 3, 2014 in Jena.
3 Steinhagen-Thiessen/Charité 2012: Was bewegt uns in der/die Zukunft – aus der Sicht der Medizin. Keynote lecture at the GMDS Annual Meeting, Braunschweig, 2012.

New insights into the genesis and progression of widespread diseases such as diabetes, cardiovascular disease, cancer, dementia and infectious diseases are to be gathered. The flood of data provided by this long-term study would have been technically and methodologically unmanageable a few years ago.

At the IDRT/i2b2 meeting in Erlangen in March 2013, Patrice Degoulet reported on "13 Years with an Integrated Clinical Information System at the Pompidou University Hospital."⁴ He also reflected upon the two main clinical-epidemiological analysis approaches in the interplay of clinical patient data and genome sequences. In case-control studies, deviating single nucleotide polymorphisms (SNPs) are looked for among patients with a phenotype characteristic – in this case a particular disease or a particular drug intolerance. In the evaluation plot, the significance values of the deviating relative frequencies of polymorphisms (or their logarithm to the base 10) are shown for each chromosome. Alternatively, or additionally, the occurrence of diseases in treatment and control groups with different genotypes is investigated in cohort studies. The evaluation shows which diagnoses (or symptoms) occur frequently in SNP groups.

Fig. 2.1: Basic principle of phenotype case-control studies and genotype cohort studies – genome-comprising association studies search for genome peculiarities for a phenotype criterion (e.g., scleroderma) by comparing cases with a control group in order to identify single nucleotide polymorphisms (SNPs) that deviate significantly often; genotype-specific association studies search for effects of SNP allelic variants by comparing allelic groups (e.g., ATTGCAA vs. ATTACAA) in order to identify diagnoses and symptoms that deviate significantly often. (Source: Authors' illustration, modified according to Degoulet, Patrice 2013, foil 35.)

4 Degoulet/Patrice: 13 Years with an Integrated Clinical Information System at the Pompidou University Hospital. Lecture at the i2b2 meeting in Erlangen, March 2013.

Other research groups – for example that of Hans Lehrach at the Max Planck Institute for Molecular Genetics in Berlin-Dahlem – formulate models for the inhibitory effect of drug molecules on destructive protein multiplication and present new approaches for the interruption of pathological processes. Without information technology with large, fast processing capabilities, such research and the application of its results in practice would not be possible either.

The cooperation of the Charité and the Max Delbrück Center for Molecular Medicine (MDC) at the Berlin Institute of Health will focus on the translation of the results of basic research into health care and on interdisciplinary systems medicine, with both strands mutually dependent, as Professor Rietschel explained in his speech at the founding of the institute in June 2013.⁵ This is where systems medicine turns away from isolated disease patterns: fundamental molecular causes of disease are to be identified in order to develop new diagnostic procedures, therapies and preventive measures. Extensive technology platforms will be developed for BIG/BIH research in the next few years. Not only are the genotypes differentiated through SNPs; further epigenome, transcriptome, proteome and metabolome data will also be related to clinically collected patient data, first in research and then in treatment.

The genome of each and every human being consists of three billion base pairs. If these were fully sequenced for all the patients of just one university hospital – such as the Charité – the data volume would quickly amount to terabytes or even petabytes. However, the amount of data is already "Big" even if only partial sequences are considered. Transcriptome, proteome and other molecular descriptions generally have the same orders of magnitude. Add to that drug information, biosignal patterns, images and everyday medical documentation. In the not-too-distant future, growth rates amounting to terabytes will no longer be a rarity in the IT systems managed by the Charité's IT department. The new large biomedical data volumes are rightly being appreciated more and more. But it would be irresponsible to consider this to be the sole or most important challenge with regard to the use of information technology.

5 Ernst Rietschel, Executive Chairman of the BIG/BIH: Speech at the grand opening of the Berliner Instituts für Gesundheitsforschung (BIG) – Berlin Institute of Health (BIH) on June 18, 2013. Online: http://www.berlin-sciences.com/en/aktuelles/berlin-sciences-aktuell/meldungen/feierliche-eroeffnung-des-berliner-instituts-fuer-gesundheitsforschung-big-berlin-institute-of-health-bih/, [accessed on: April 29, 2014].
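A back-of-envelope sketch makes the order of magnitude plausible; all figures are illustrative assumptions, not Charité numbers.

```python
# Illustrative assumptions only:
bases_per_genome = 3.2e9      # ~3.2 billion base pairs (haploid)
bytes_per_base = 1            # naive 1 byte per base, stored as plain text
coverage = 30                 # typical whole-genome sequencing depth
patients_per_year = 150_000   # hypothetical large university hospital

assembled_gb = bases_per_genome * bytes_per_base / 1e9
raw_reads_gb = assembled_gb * coverage             # raw reads before assembly
yearly_pb = raw_reads_gb * patients_per_year / 1e6

print(f"One assembled genome:     ~{assembled_gb:.1f} GB")
print(f"Raw reads at {coverage}x depth: ~{raw_reads_gb:.0f} GB per patient")
print(f"All patients in one year: ~{yearly_pb:.1f} PB")
```

With these assumptions a single cohort year already lands in the petabyte range, consistent with the estimate in the text.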

2 The Digitalization of the Knowledge Cycle

The challenges for the future deployment of information technology at the Charité can be illustrated by the necessary, integrated digitalization of the TRACT knowledge cycle: Treatment – Research – Analysis & Adaptation – Clinical Trials – Translation – Treatment. Information on symptoms, diagnosis and therapy effects from the treatment context (T) forms a central basis for medical research (R): from bedside to bench. From the observation of relationships between personal, diagnostic, therapeutic or other features (including genotype and phenotype) in individual cases or in small groups, researchers can derive hypotheses that are subjected to in-depth statistical analyses (A) and lead to the selective adaptation of procedures (e.g., the development of new drugs). In so far as the hypotheses for the adapted procedures are substantiated in the analyses, they have to be confirmed by clinical trials (C). Often randomized, prospective, double-blind intervention studies are necessary so that the theses can develop into reliable, new medical models. The assessment of the relevance of variations in genome sequences and morphological structures is currently a key issue. The translation of new model knowledge into clinical treatment (T) – from bench to bedside – completes the knowledge cycle. The additive or substitutive application of the new knowledge leads to new data and new questions that are then included in the next round of research.

Fig. 2.2: TRACT knowledge cycle – from patient care (T) via research data, images, signals and biomaterial (R; "bedside to bench"), statistical analyses and adaptation (A), clinical studies (C) and translation (T) back to care; hypotheses, model knowledge and theories for care link the stages, which revolve around an information warehouse (data & ontologies). (Source: Authors' illustration, modified according to S. Murphy⁶, Partners Center, i2b2 Boston.)

The imaginary digital knowledge cycle ideally revolves around a real information warehouse that serves both research and medical care. It is essential that the new knowledge be represented not only in explanatory texts, but also in digital ontology.

6 Presented by Sax, Ganslandt and Löbe 2014 at the TMF Annual Meeting in Jena, April 3, 2014.


This is where an inevitable need for semantic standards can be seen, so that the individually created knowledge components can be made available and adopted in a semantically interoperable fashion (for example, in the form of knowledge cartridges) across multiple systems.

3 Consolidation of the Digital Medical Care and Research Systems

The digital knowledge cycle can deploy its full effect only if a new, specific collection of patient data is not necessary in every phase. This initially concerns the research step and the provision of data from bedside to bench. Much data is currently still read from files, transferred manually or collected separately, as it is either not available in a digital format or is "hidden" in texts that are available electronically. The desired system landscape in this constellation could have the following structure (see Fig. 2.3).

Fig. 2.3: Integration of IT systems and knowledge at the Charité – clinical data, pathology, tumor documentation, the register of deaths and telemonitoring/patient data from medical equipment, together with omics data (genome, transcriptome, proteome, metabolism) and the biobank, feed diagnostic, therapeutic and translational research; departmental systems can be added, and public databases (e.g., lists of active substances, genome atlas, PubMed, HapMap) complement the internal data. (Source: Authors' illustration.)

A consolidation of digital medical records is required both for the optimization of patient care itself and for the authorized provision of data for research. For many users a paperless hospital is the method of choice, and they see the need to radically improve the coexistence of paper files and electronic documentation. The authors agree with this stance, but expect a prolonged transition period, due in part to resource constraints but mostly to the current lack of software ergonomics. For the time being, the digitalization of medication and the rollout of an electronic temperature chart, with the collection, presentation and transmission of vital parameters, are the leading projects. For a long time the ergonomics at the bedside presented an almost impossible challenge to the use of electronic temperature charts. This has now improved thanks to the use of software applications and tablet PCs. The rollout is now being hampered mostly by the scarcity of resources and by requirements that have to be implemented, such as data protection.

Fig. 2.4: Reduction of factual knowledge through documentation and task-oriented classification – from the reality of the patient via human perception and texts to codes, statistics and groups; more than one perspective is reasonable. (Source: Straub/Lehmann: A Semantic Clinical Data Repository – How the Work on DRGs Can Serve Clinical Medicine, Too. In: Swiss Medical Informatics 71/2011, pp. 34–36.)

We should also determine, from the point of view of systems ergonomics, whether the increased digitalization of patient records can be better achieved by extending the documentation in entry screens and structured tables, by computational-linguistic extraction of episode knowledge from the findings, healthcare reports and medical reports that are already digitized, or by a combination of the two. As explained by Hans-Rudolf Straub, the loss of information when mapping patient realities into texts is lower than with structured features, particularly when something was not documented with a specific, localized question in mind.⁷ The Berliner Forschungsplattform Gesundheit (Berlin research platform for health) project⁸ was able to show that, with the appropriate effort, computational-linguistic methods are now in a position to make episode knowledge from clinical texts available for analysis and search processes with high sensitivity and precision.

7 See Fig. 2.4.
8 See Schepers/Geibel/Tolxdorff et al.: Evaluation der computer-linguistischen Texterschließung neuro-radiologischer Befunde im Berliner BFG-Projekt. 2013 and Geibel/Trautwein/Erdur/Schepers et al.: Ontology-Based Semantic Annotation of Documents in the Context of Patient Identification for Clinical Trials. 2013.

Conversely, there is also a need for consolidation of electronic patient records with regard to the transfer of new knowledge from research to patient care: from bench to bedside. For example, the provision of explicit warnings for possible genotype-specific drug intolerances requires the digital mapping of the genome sequence or of DNA segments with relevant single nucleotide polymorphisms (SNPs) as well as the digital documentation of drug prescriptions. Although in non-medical Big Data applications the data regularly features an inherent similarity (customer behavior, GPS data, weather physics), in systems medicine a variety of different data types come together, each of which can make a relevant contribution to diagnostic differentiation and therapy optimization. The ideal electronic medical record therefore includes structured demographic data, coded symptoms, diagnoses, procedures and event dates in tables, plain-text symptom and procedure descriptions, omics data in their own notational forms, molecular models of metabolisms and drugs, image data, signal data, course profiles (e.g., creatinine increase) as well as "intelligent" processing of combined constellations (e.g., time-delayed reactions of MB fractions of creatine kinase, troponin in serum and other cardiac biomarkers after a cardiac event). In the coming years, these challenges of data heterogeneity will also be addressed by the promotional program "Integrative Datensemantik" (integrative data semantics) of the Federal Ministry of Education and Research: […] provision of basics for the development of targeted delivery systems for systems medicine, in which structured as well as unstructured data are integrated and used interoperably on a semantic basis. This has the aim of beneficially implementing new forms of data integration and data interoperability for patients. These could be, for example, the basic principles for systems that combine individual patient data with generic disease models to provide forecasts on the course of a disease or to explore alternative courses of action.

The call also addresses the issue of "developing methods and tools for content analysis of unstructured data objects with the aim of structuring the content and thus making it accessible for further computational processing (search, comparison, aggregation, etc.)". In addition, "procedures of picture and/or pattern recognition and interpretation (visual analytics) for visual data" are also to be considered.⁹ Incidentally, even short of the new individualized procedures, it has been shown that the support of decisions through digital references (alerts) can bring about an improvement in medical care. As a striking example, let us refer to a Harvard study on a somewhat banal topic: In the study "Electronic Alerts to Prevent Venous Thromboembolism among Hospitalized Patients" by Nils Kucher et al.,¹⁰ a reduction in the incidence of vascular complications as a result of digital reminders of medical and physical thrombosis prophylaxis (support stockings!) was demonstrated.

9 Waltemath/Hahn/Schomburg: “i:DSem – Integrative Datensemantik in der Systemmedizin”, strategy paper 2013. 10 Kucher/Koo/Quiroz et al.: Electronic alerts to prevent venous thromboembolism among hospitalized patients. In: N Engl J Med. 352(10)/2005, pp. 969–977.
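To illustrate the mechanism behind such alerts, here is a deliberately simplified rule-based sketch. The gene–drug pairs (e.g., CYP2C19 and clopidogrel) are well-known pharmacogenomic examples, but the rule structure and statuses are illustrative assumptions – neither clinical guidance nor the design of the cited study.

```python
# Simplified rule base: (gene, documented status) -> drugs to warn about.
INTOLERANCE_RULES = {
    ("CYP2C19", "poor metabolizer"): {"clopidogrel"},
    ("TPMT", "deficient"): {"azathioprine", "mercaptopurine"},
}

def drug_alerts(genotype_findings: dict, prescriptions: list) -> list:
    """Cross-check documented genotype findings against current prescriptions."""
    alerts = []
    for (gene, status), risky_drugs in INTOLERANCE_RULES.items():
        if genotype_findings.get(gene) == status:
            for drug in prescriptions:
                if drug in risky_drugs:
                    alerts.append(f"Warning: {drug} prescribed with {gene} status '{status}'")
    return alerts

print(drug_alerts({"CYP2C19": "poor metabolizer"}, ["clopidogrel", "aspirin"]))
# -> ["Warning: clopidogrel prescribed with CYP2C19 status 'poor metabolizer'"]
```

The hard part in practice is not the rule engine but exactly what the chapter describes: having the genotype finding and the prescription available digitally, in interoperable form, at the moment of ordering.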


On the other hand, electronic support for decision making should not degenerate into information overload and provide too much imprecise information with low predictive value. Elisabeth Steinhagen-Thiessen's ethical warning against too much false-positive evidence for diseases was cited above. The appropriate handling of the trade-off between avoiding false-positive indications on the one hand, and the false-negative omission of alerts in the case of suspected allergies, intolerance and inefficacy on the other, must be seen as a relevant ergonomic challenge for future IT systems.

4 Integration and Differentiation

Notwithstanding the urgent need to integrate information from medical care and research into the knowledge cycle, two relevant differences between medical care data systems and research data systems must be borne in mind; as modifying factors, they suggest a qualified separation of the two worlds. First, in the health care context the consent of patients to the use of their data is part of the health care contract. In a research context, however, patient consent often has to be obtained separately and documented in a special authorization register (policy registry), unless there is a specific legal basis for its use or the data can be anonymized without loss of information. Secondly, the demands made on the data constellation in the Online Transaction Processing (OLTP) of medical care differ from those made on the same constellation in the Online Analytical Processing (OLAP) of research.

The process of checking whether the new information technology requirements can continue to be managed with a classical product architecture of major systems and subsystems has not yet been concluded. In this architecture, data, logic and user interfaces for specific functions are all combined in one product. The patient and case numbers of the main system are taken over by the various components of the overall system. The subsystems, which are interconnected via a communication server, often store multiply redundant patient data, for example data or laboratory findings from complementary systems. Authorization systems are often managed separately. The heterogeneity of data management and non-homogeneous nomenclature complicate integrated use such as integrated visualization, which is strictly necessary in a systems-based medical approach, for instance in tumor conferences. In such a product architecture, the efficient processing of data diversity with common ontologies or even merely organizational hierarchies may actually face almost insuperable hurdles. The possibility of the gradual construction of components for a service and platform architecture with separate health care and research domains, shaped both by defined limits and by defined connections, is currently being tested at the Charité (see Fig. 2.5).

Fig. 2.5: Approaches to system integration at the Charité – a research domain ("omics" facility, stem cell facility, screening of low-molecular compounds, clinical research groups, biobanking, tomography facilities, internal research database, access to research information) and a clinical domain (applications in the area of hospital information, electronic patient files, other departmental systems, tomography, lab, 4 PB of clinical data), each behind its own data security layer, connected via an integration engine (transformation, routing, storage). (Source: Authors' illustration.)

Within the domains, the aim is to separate the concentrated data management in a few clinical data repositories from interchangeable data logic and interchangeable user interfaces. These three basic components must be supplemented with a cross-reference database (data registry), a centralized authorization system (policy repository) and centrally maintained patient identification (master patient index). Requirements are to be supported more effectively in future by means of interoperable classifications (e.g., the Systematized Nomenclature of Human and Veterinary Medicine [SNOMED]), standardized documentation requirements (e.g., HL7 Clinical Document Architecture, IHE) and a coordinated interpretation system (shared ontologies). Data intelligence or contextual knowledge will have to be deposited in defined logic modules. Each logic module should be available to various user interfaces and, in turn, have access to various repositories.

Understandably, critical reviews of currently existing hospital information systems lament the primacy of economic functions and the weak support of clinical staff in managing their medical care and research tasks. However, it will not be possible to escape the economic requirements in future either. It will still be necessary to compile the required information for billing, and to provide the people working in medical care with feedback about the processes, quality, efficiency and profitability of their work. Given Heyo Kroemer's warning about the contribution of a key information technology infrastructure to improvements in efficiency and the mastery of demographic change, this need no longer stand in contrast to the completion of core tasks in the future. In fact, all optimization efforts must be based on the continuous improvement of both the processes of care for patients and the workflows for staff. The rollout speed of systems-based medicine is likely to increase significantly if the expectations relating to cost reduction and the lessening of side effects can be demonstrated in empirical tests.
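As a closing illustration of the domain separation described above, here is a minimal sketch of how a master patient index ID might be turned into a stable research pseudonym after a consent check; the key handling, field names and consent registry are illustrative assumptions, not the Charité's actual implementation.

```python
import hashlib
import hmac

# Hypothetical secret; in practice held only by a trusted third party
# within the data security layer.
PSEUDONYM_KEY = b"replace-with-protected-key-material"

def pseudonym(master_patient_id: str) -> str:
    # Keyed hashing (HMAC) keeps the mapping stable for record linkage
    # while preventing re-identification without the key.
    digest = hmac.new(PSEUDONYM_KEY, master_patient_id.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]

def export_for_research(record: dict, consent_registry: set):
    # Policy check: export only with documented consent (policy registry).
    if record["patient_id"] not in consent_registry:
        return None
    out = {k: v for k, v in record.items() if k != "patient_id"}
    out["pseudonym"] = pseudonym(record["patient_id"])
    return out

print(export_for_research({"patient_id": "MPI-000042", "diagnosis": "E11.9"},
                          consent_registry={"MPI-000042"}))
```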


Literature and Presentations

Degoulet, P.: 13 Years with an Integrated Clinical Information System at the Pompidou University Hospital. Lecture presented at the i2b2 meeting in Erlangen, March 2013, foil 35.
Geibel, P./Erdur, H./Zimmermann, L. et al.: Patient Recruitment for Clinical Trials with Ontology-Based Information Extraction from Documents. In: Proceedings KEOD 2013. Scitepress.
Geibel, P./Trautwein, M./Erdur, H. et al.: Ontology-Based Semantic Annotation of Documents in the Context of Patient Identification for Clinical Trials. In: Proceedings ODBASE 2013. Heidelberg/New York 2013.
Kroemer, H.: Erfolgsfaktoren für die translationale Medizin aus Sicht der Medizinischen Fakultäten. Keynote lecture at the TMF Annual Conference 2014 in Jena.
Kucher, N./Koo, S./Quiroz, R./Cooper, J. M. et al.: Electronic Alerts to Prevent Venous Thromboembolism among Hospitalized Patients. In: N Engl J Med 352(10)/2005, pp. 969–977. Online: http://www.nejm.org/doi/full/10.1056/NEJMoa041533, [accessed on: April 29, 2014].
Lengauer, T.: Podiumsdiskussion Individualisierte Medizin. World Health Summit. Charité – Universitätsmedizin Berlin on October 22, 2012.
Sax, U./Ganslandt, T./Löbe, M.: Standardisierte Datenaufbereitung und Zusammenführung für die Forschung, TMF-Projekt V091-MI IDRT Integrated Data Repository Toolkit. Lecture at the TMF Annual Meeting 2014 in Jena, foil 2.
Schepers, J./Geibel, P./Tolxdorff, T.: Evaluation der computer-linguistischen Texterschließung neuroradiologischer Befunde im Berliner BFG-Projekt. Lecture at the KIS-RIS-PACS and DICOM meeting in Mainz – Schloß Waldthausen, June 21, 2013.
Straub, H. R./Lehmann, M.: A Semantic Clinical Data Repository – How the Work on DRGs Can Serve Clinical Medicine, Too. In: Swiss Medical Informatics 71/2011, pp. 34–36. Online: http://meditext.ch/texte/Semantic-CDR-2011.pdf, [accessed on: April 29, 2014].
Waltemath, D./Hahn, U./Schomburg, D. et al.: i:DSem – Integrative Datensemantik in der Systemmedizin, strategy document.

Albrecht von Müller

3 Some Philosophical Thoughts on Big Data

Big Data as a chance, a challenge or a danger, and then again as a chance in medicine – perhaps that is the shortest way to summarize this book? Big Data, the use of new technologies with their "dual use,"¹ the influence on processes in and in conjunction with organizations as well as on organizations themselves, the interrelation with the legal framework and the possible influence on society – all of that is more or less included in the following articles. Von Müller, as a philosopher and "knowledge manager," outlines the potential influence on thinking (and acting) itself as follows: "But as long as we keep looking only at the tracks, we will never get to see the hiker."

Abstract: The following is only meant to provide a sketchy outline of four philosophically interesting perspectives on Big Data as food for thought, without making any claims as to their merit. It basically concerns a chance, a challenge, a danger and yet again a chance.

Content
1 A Chance  45
2 A Challenge  47
3 A Danger  48
4 And Yet Another Chance  49

1 A Chance

Never before in the history of humankind has so much instrumental power faced so much generally avoidable and thus unnecessary suffering. At the same time, the survival of humanity has never before been exposed to such self-endangerment. My assumption is that this is not primarily a consequence of greed or an excess of demands, recklessness, incompetence or even ill will. All of these phenomena may in fact exist, but they merely play the role of a fire accelerant, as it were, and are not actually the cause of the precarious situation.

To me, the root cause seems to be the structural bias of our perception of time and reality that is deeply anchored in the historical development of our thinking. By increasingly reducing our perception of reality to its de facto legacy, and our perception of time to its linear sequential structure of withdrawal, we overlook the best part of being in our world, the concertus mundi, the to some degree explainable but also continually new and wonderful happening part of the world, the constant taking-place of reality. The occurrence of reality happens in the present time-space and in the mode of event reality, i. e., as a constellational occurrence in which everything that is present actually becomes what it effectively is thereafter. Facts are only the traces of that act of happening, like those a hiker leaves in the snow. But as long as we keep looking only at the tracks, we will never get to see the hiker.

And that is precisely the problem of modern science. Trapped in a mindset that could be described as facticity imprisonment [in English in the original], it misses reality's actual act of happening that is taking place in the time-space of the present [Gegenwart], and as the actual present moment [Gegen-Wart]. Only because we are now able to quantify and pin down what is happening in reality, modern science makes do with something like reality's exhaust fumes. However, as we are blanking out the hiker, i. e., the actual constellational occurrence of reality, we have not really been getting anywhere for roughly a hundred years when it comes to the really fundamental questions. Neither have we succeeded in solving the so-called measurement problem of quantum physics, or in defining its relationship with classical and relativistic physics. Nor have we found access to the phylogenetic and ontogenetic self-constitution of life or even the phenomena of consciousness and the mind. Seen from the epistemological perspective outlined here, the reason for that is the categorical or conceptual narrowing of modern science to the factual aspect of reality. Both in quantum physics and in the self-constitution of life and spirit, the original happening of reality comes to light – and anyone who cannot deal with the concept of this stand-alone and even basic mode of reality has no chance of coming even close to understanding the phenomena mentioned.

But what does all of this – if such a radical philosophical critique of much of modern science is even true – have to do with Big Data? I think quite a lot. I believe quantum physics is the biggest advance in knowledge of the 20th century. We may not actually have fully understood or digested this progress yet, but it has opened fundamentally new perspectives in the understanding of the reality of events, i. e., the door of our imprisonment in facticity has been opened at least a little. This was made possible due to the fact that for the first time in the history of science we have come so close to the actual "texture" of real happening in quantum physics, meaning that the usual explanation trick no longer works. The trick was to maintain that every incomprehensible phenomenon was based on a further, linear-causal mechanism, which had unfortunately not been discovered yet. But this trick no longer works in quantum physics (at least not since the formulation of Bell's inequalities). We have, according to the thesis put forward here, come so close to the actual occurrence of reality for the first time, that a pre-causal, i. e., constellational self-occurrence can be seen as a basic mode.

The same could happen through Big Data, particularly in the areas of health and social interaction. As long as we only had the classic, very grainy representation of reality at our disposal, we were able to spin yarns as to all the causally reconstructable processes that supposedly took place in the intermediate spaces and intermediate times. Big Data is nothing other than a much higher resolution, more filigree documentation of actual events. It could very well be that – as with the improvements of reality observation in quantum physics – we are now also approaching the actual reality of events here too, meaning that we can also recognize their pre-causal, constellational dynamics. If that were the case, it could be a great opportunity to leave familiar patterns of thought behind and to discover fundamentally new aspects or to see already known phenomena in a whole new light. Whether this opportunity can be used depends very much on how far we are able to interpret the new floods of data sensibly, i. e., to make sense of it. Which leads to my second point, the challenge.

1 The term refers to the basic usability of an economic good for civil as well as military purposes. Dual-use products are generally subject to export control.

2 A Challenge

Understanding [Verstand] and reason [Vernunft] are not the same thing. Understanding can be defined as the ability to come to increasingly precise analytical distinctions. Reason, however, can (according to the physicist and philosopher Carl Friedrich von Weizsäcker) be understood as the ability to experience [wahrnehmen] a whole as a whole. To put it bluntly: Understanding alone makes no sense. Sense is constituted only in and from the true experience [Wahr-nehmung, taking as true] of the whole.

As a small example of the difference between understanding and reason let us take the well-known sequential game and social-psychological experiment known as the dollar auction. A dollar is auctioned to a group of people. The only special rule is that the second-highest bid also has to be paid. Starting at 10 cents the bids quickly rise to 80 or 90 cents. But now the person who has made the second-highest bid must pay that amount without receiving the dollar. So for that person it is completely rational [rational] to continue, even up to the point where profit and loss are equal. But in that situation there is again a second-highest bid, e.g., 90 cents, whose bidder again receives nothing. Therefore it seems rational for that bidder to increase the bid to USD 1.10, as that would reduce his loss by 80 cents. And in accordance with precisely that logic, the second-highest bidder now continues to bid – and so on. It is often the case that the dollar is actually auctioned at a price of USD 2 to USD 4 – which of course is not a particularly good deal.

So what happened? From a local perspective, each participant acted completely rationally and tried to minimize his or her loss. Only when you take a step back in your mind and look at the interaction of the two behavior patterns – which are rational locally, i. e., at each step – does the fatal downward spiral of the whole become apparent. Reason never stands in contrast to rationality [Rationalität], but it does – through the perception of the whole – go far beyond it. In principle, our little epistemological laboratory model pretty much describes the momentum of financial markets, conflict escalation, or the "logic" of an arms race. Overall it seems as though our modern civilization is afflicted by a dramatic hypertrophy of local, ratiomorph optimizations with a simultaneous atrophy of reason.

In this regard Big Data seems to be powerful, but very ambivalent. It could lead to a kind of turbo version of the optimization trap, and a number of signs certainly do support that notion. However, it could also go in the opposite direction and help strengthen the perception of the whole through a more accurate illustration of patterns and relationships. I do not think that Big Data in itself will bring us to our senses [zur Vernunft]. But if we remind ourselves of the distinction between understanding and reason, we could certainly use it to strengthen reason. However, this requires that we not only look for simple causal or statistical patterns in the new evidence [Evidenzen], but also that we be open to the phenomena, much more difficult to experience, of the constellational self-unfolding of a whole.
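The escalating logic of the dollar auction can be made tangible in a few lines of simulation. The sketch below is purely illustrative – the increment, the budgets and the budget-based stopping rule are invented – but it shows how two bidders, each minimizing their own loss at every step, drive the price far past the value of the prize.

```python
# Illustrative simulation of the dollar auction described above. Each bidder
# keeps bidding while winning at the next bid loses less than dropping out
# (remember: the second-highest bid must be paid too). Increment, budgets
# and the stopping rule are invented for this sketch; pure local rationality
# alone would escalate without bound.
def dollar_auction(prize=100, increment=10, budgets=(250, 300)):
    """All amounts in cents. Returns the final price of the 'dollar'."""
    bids = [0, 0]                  # standing bids of bidder 0 and bidder 1
    turn = 0
    while True:
        next_bid = bids[1 - turn] + increment
        quit_loss = bids[turn]        # drop out: pay own bid, get nothing
        win_loss = next_bid - prize   # outbid and win: pay bid, get prize
        if next_bid <= budgets[turn] and win_loss < quit_loss:
            bids[turn] = next_bid
            turn = 1 - turn           # now the other bidder is trailing
        else:
            break
    return max(bids)

print(dollar_auction())  # 260 with these defaults: far above the 100-cent prize
```

At every single step the bid is the locally loss-minimizing move; only a view of the whole interaction reveals the downward spiral – which is exactly the difference between rationality and reason described above.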

3 A Danger

The great danger of Big Data is the continued selling off of the vast heritage of enlightenment: a return to self-imposed immaturity. The idea of human dignity and its implementation in the form of individual freedom and social self-determination or democracy are at risk if you place the unqualified real or supposed need for protection (e.g., from terrorist attacks) above all other values. The confidentiality of documentation and the exchange of ideas, issues and concerns is a fundamental right of free human beings. Of course there is a certain risk that protected communications can also be misused for the preparation of criminal acts. This then requires an appropriate and carefully thought-out consideration of interests. Currently, however, it seems that at least in some countries pretty much everything technically possible is monitored and/or evaluated, without the responsible weighing up of interests and without the protection of the dignity even of those people who are not citizens of the respective state. But this represents a return to (information) despotism.

Big Data will become a danger if it is not handled with the highest degree of responsibility. Once again, lawgivers are lagging far behind current technological developments. In future, deprivation of informational freedom should in principle be punishable just as physical deprivation of liberty is. The same applies to the breach of property rights to information. The severity of impairment in the information realm often outweighs that in the material or physical realm. Big Data should be seen as an opportunity to thoroughly think these issues through, and then to adapt the legal framework accordingly. This could prevent the great risk of a fall-back to feudal (but now IT-based) structures and present a major step forward in the direction of a pro-human use of technology that unfolds human life opportunities.

4 And Yet Another Chance

A new field of research has emerged in recent years, which we in the Parmenides Foundation have given the name "cognostics." The task of cognostics is (a) to better understand complex thinking and its implementation in the human brain, and (b) based on that, to develop new methods and tools with which to support our thinking when dealing with ever-increasing complexity. This is where the above-mentioned complementarity of ratiomorphic and constellational thought processes plays a central role. An effortless interaction of both modalities of thinking has proven to be the key to intellectual breakthroughs – which are never purely ratiomorph, but also almost never come about without a good analytical penetration of the situations in question.

In accordance with the Kahneman paradigm of "Thinking, Fast and Slow," it is assumed here that non-ratiomorphic, intuitive processes play a central role. However, going beyond Kahneman, an attempt is currently being made to understand much more accurately the way in which the brain combines and recombines mental content in this second thinking modality. This is where the "Logic of Constellations" comes into play. In our view it is a completely independent modus operandi, the way the brain works, where the double phenomenon of "mutual semantic unfolding – according to emergent principles" plays the central role. The first part of this double phenomenon, as already indicated, deals with the fact that the nature of a constellation is such that its constituents develop their full meaning only reciprocally. The second part deals with the fact that this semantic development is neither accidental nor predetermined. It occurs according to specific, not completely determinative but guiding principles of consistency – which, and this is the point, only emerge "on the go," i. e., in the course of the process itself.

At first all of this may sound very strange. However, it can be demonstrated with a pretty good example. During jazz improvisation, the way in which the music will develop in detail is not at all defined at the beginning. And yet this is by no means a purely random process. A musician begins a theme and develops it a bit further. Another one then takes up the theme and develops it further still. The developed theme then either goes back to the first person or it gets picked up by a third person and is again developed further – individually yet harmoniously. At some point all of the musicians might together pick up the theme thus developed. Constellatory unfolding – according to emergent principles – is the way in which our brain works when it is not linking and calculating in a ratiomorph fashion, but when it perceives or produces meaning or significance. The central role is played by the criterion of "emergent coherence" – and that is exactly why all cognitive science theories that attempt to grasp human thinking with the metaphor of computers are so fundamentally wrong, and fatally mislead our understanding of the world and ourselves.

Finally, in contrast to Kahneman, the thesis is put forward that this ability of constellational processing can also be used on a level of very abstract conceptual constructs by some, typically very innovative, people. There was no ratiomorph railing which Einstein could have mechanically dragged himself along for the transition from Newtonian to relativistic physics. Intellectual breakthroughs always require a successful interaction of ratiomorph and constellational mental operations. Based on these basic theoretical considerations, it is therefore natural to support the human brain in both analytical as well as constellational operations, and especially in an effortless interaction between the two modalities of thought. Here, the difficulty is that, for the reciprocal semantic development of the components in their respective constellations, the constellational part always requires a synoptic cognitive representation of the overall context. But that is exactly what becomes more and more difficult with rapidly increasing complexity and therefore requires special support.

For the initial applications of the new cognostics procedures we at the Parmenides Foundation selected the areas of "medical diagnosis," "advanced education," "responsible strategizing" and "re-juvenating democracy." The support of difficult medical diagnoses has so far progressed the furthest. No good doctor sees his patients only as a list of symptoms. What an individual symptom really means arises only from the constellation of symptoms. So, if we really wish to support outstanding medical thinking, the traditional Boolean procedures based on the principle of the constancy of significance will be of little use. Hence the failure of the numerous attempts since the 1960s to set up medical expert systems within the framework of classical logic. A breakthrough will come only once people are able to reflect and support the phenomenon of constellational development of significance, which is at the heart of really good medical thinking.


In all four areas Big Data provides great opportunities for supporting the newly developed cognostics procedures with high-resolution empirical evidence, and thus for making them even more fruitful.

Thomas Brunner

4 Big Data from a Health Insurance Company's Point of View

With more than 25 million members, AOK is the largest health insurance company in Germany. But do those who pay the bills actually know what they're paying for? Should they know it, or not? The exchange of data between hospitals and health insurance companies is carefully regulated from a legal point of view. What potential is there today and in the future for using the insurance contributions in the best possible way? Or, put a different way: How can the use of money be improved in patient-centered treatment in the future? Brunner quotes the American president Barack Obama as saying that "buying health insurance is never going to be like buying a song on iTunes." The question that faces us in the near future: Is the time-honored separation of sectors going to apply to the Big Data charts of the future?

Abstract: In this article, the initial situation in statutory health insurance and the effects of Big Data are first outlined. After that, the meaning of datability for health insurance companies is discussed. It is then shown, with the example of AOK, how a Big Data strategy can be implemented in statutory health insurance practice. Finally, concrete and conceivable application scenarios are described.

Content
1 Big Data as a Competition Factor in Statutory Health Insurance  54
2 Datability in Statutory Health Insurance  54
3 Big Data Strategy at AOK  56
4 Application Scenarios in Statutory Health Insurance  58
4.1 Prevention  59
4.2 Fighting Misconduct  59
4.3 Service  60
4.4 Marketing/Sales  60
4.5 Process Automation  60
5 Conclusion and Outlook  61
Literature  61


1 Big Data as a Competition Factor in Statutory Health Insurance

Big Data is radically changing our thinking and acting: We can do things that we were never able to do before. […] We are moving away from the paradigm of search to the correlation of data in advance, in order to know what is going to happen.¹

The only constant in statutory health insurance is change, which has been manifested in numerous reform efforts.² Fewer and fewer health insurance companies are managing to adapt flexibly to new situations. Thus the number of health insurance companies more than halved, from 324 to 132, between 2003 and 2014.³ In addition to a few closures, the decrease resulted primarily from mergers. The demographic development will lead to a significant increase in the number of persons in need of care and the associated nursing costs. The expenditures for treatment care and home care rose by 65 % between 2007 and 2012.⁴ Add to that the rising costs of medical and technological progress. The planned introduction of the new income-related additional contribution will further intensify price competition between the health insurance companies.⁵

Big Data as a social phenomenon affects health insurance companies in a variety of ways. So-called digital natives (as coined by Marc Prensky) expect health insurers to be available online, in the way they are accustomed to from other areas of daily life. "Bring your own device" and company software that is just as easy to use and almost as quick as an Internet search engine can make the difference when it comes to competing for the clever minds of Generation Y.

2 Datability in Statutory Health Insurance

The new era of Big Data will create the largest surveillance machine ever.⁶

Germany has fallen into data security hysteria. But actually data does so much good.⁷

1 See: Schirrmacher: Wir wollen nicht. 2013.
2 From 12/1998 to 08/2013 alone, there were 32 statutory revisions. Cf. Institute of Work and Qualification at the University of Duisburg-Essen: Chronologie gesetzlicher Neuregelungen. 2013.
3 See: GKV-Spitzenverband: Kennzahlen der gesetzlichen Krankenversicherung. 2014, pg. 24.
4 From EUR 2.34bn to EUR 3.88bn. Cf. GKV-Spitzenverband: Kennzahlen der gesetzlichen Krankenversicherung. 2014, pg. 7.
5 The GKV-Finanzstruktur- und Qualitätsweiterentwicklungsgesetz was enacted on June 5, 2014 by the Bundestag.
6 See: Schirrmacher: Wir wollen nicht. 2013.
7 See: Bernau: Sammelt mehr Daten! 2014.


The two quotes show how differently the opportunities and risks of Big Data can be evaluated. Data protection is a valuable asset, especially for health insurance companies that handle particularly sensitive data. In addition to personal data such as name, address, etc., this relates particularly to information on an insured person's health. This includes medical diagnoses as well as information on hospital stays and data from pharmacies, etc. Only when there is no special regulation for data protection in the Social Security Code do the German Federal Data Protection Act and state data protection laws apply. The data is subject to strict appropriation, and the health insurance companies are obliged to ensure data protection through appropriate technical and organizational measures.

Private individuals voluntarily disclose intimate information in social networks. Big Data technologies make it possible to gather and evaluate such private information as business data. Clouds and data lines through foreign territory raise legal questions. App services and functions access personal information, for example the current location, and thus pose risks such as the possibility of tracking someone. The fact that data protection requires special attention in the age of Big Data, and that companies are increasingly admonished to handle large amounts of data in a responsible and sustainable way, is summed up by the term "datability." The neologism comes from the word "data" and the suffix "-bility," which is reminiscent of ability, sustainability and responsibility. The importance of the subject is emphasized by the fact that the main theme of CeBIT 2014 was datability; a motto that, according to Chancellor Angela Merkel, even "encourages you to dream and carry on."⁸

First of all, the insured themselves have the (co-)responsibility for the protection of personal data ("self-privacy"). This also means that they should avoid "digital exhibitionism" in social networks. However, in order to make a competent decision on the use of their data, the insured have to know which parts of their data are actually stored. The health insurance companies have to handle the data of the insured transparently and should provide opportunities to gain information on the stored data and its use (such as Google Dashboard). Health insurance companies must define clear rules and regulations in cooperation with government policy makers regarding the extent to which and under what circumstances – e.g., the duty to obtain the consent of the insured person – insurance companies are allowed to advise policy holders of statistically determined health risks and possible therapies. During the development of new software solutions, data protection should be considered from the beginning. Apps for the insured should preferably be provided with privacy-friendly default settings from the outset. The best possible use should be made of technology to protect insurance data against unauthorized access from the outside.

8 See: Merkel: Speech by Federal Chancellor Merkel at the opening of the CeBIT 2014.
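One common building block of such technical measures can be sketched in a few lines: keyed pseudonymization of insured IDs before data is used for analyses. This is a hypothetical illustration using Python's standard library, not a description of any insurer's actual implementation; real key management would be considerably more involved.

```python
# Hypothetical sketch: keyed pseudonymization of insured IDs as one possible
# technical measure. Key handling is deliberately simplified; not an actual
# health-insurance implementation.
import hashlib
import hmac

SECRET_KEY = b"replace-with-key-from-a-secured-key-store"  # assumption

def pseudonymize(insured_id: str) -> str:
    """Deterministic pseudonym: the same ID always yields the same token,
    so records remain linkable for analysis, but the mapping cannot be
    reversed without the secret key."""
    return hmac.new(SECRET_KEY, insured_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()[:16]

record = {"insured_id": "A123456789", "diagnosis": "E11", "cost": 412.50}
safe_record = {**record, "insured_id": pseudonymize(record["insured_id"])}
print(safe_record)  # usable for statistics, no longer directly identifying
```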


Due to recent data scandals, datability has found its way onto the political agenda. The first knee-jerk reactions have given the impression that data itself is the Devil's work. Policy makers should be called upon to create a uniform legal framework for the use of Big Data. The Federal Government is committed to an international solution in the context of a privacy regulation in Europe. For all the justified concerns about "transparent patients" and the abuse of data, the benefits of Big Data for the insured are very obvious. The more data can be analyzed, the more accurate the statements about the development of given diseases will be, and the more appropriate the respective prevention measures.⁹ In contrast, the harm that is feared remains largely abstract and uncertain.¹⁰

3 Big Data Strategy at AOK

Data is the oil of the 21st century. One should not use it wastefully, but make use of the opportunities it provides.¹¹

AOK today uses the statutory health insurance industry solution, oscare®, based on the SAP insurance solution and extended with add-on developments for statutory health insurance. The data from CRM, ERP and other systems is brought together in the BW system for analyses, reports and statistics. Because separate data sets exist for operational and analytical (dispositive) purposes, complex batch processes are necessary overnight for updating, synchronization and processing. For performance reasons, various intermediate stages or aggregations are included in BW. The same data has to be iteratively processed in complex ways, and stored redundantly for different purposes – e.g., for Morbi-RSA, official statistics, and controlling. This means that AOK has to deal with data amounts in a high double-digit terabyte range for planning alone.

AOK has decided to adopt the Big Data solution, SAP HANA.¹² For one thing, HANA provides a solution for current (performance) challenges and fits perfectly with SAP-based oscare®. For another, in addition to the core functionality of a conventional database system and modeling tools, HANA provides an in-memory computing engine and other components.

9 Dave Eggers brings this finding to an acute point in his novel "The Circle", which takes a critical look at the consequences of Big Data, by using the succinct slogan "To heal we must know. To know we must share [Data]." Eggers: The Circle. 2013, pg. 150.
10 As a representation of these fears, Dave Eggers devised the "Complete Health Data Program (CHAD)". Its basic idea "[…] that with complete information, we can give better care" seems logical, but is perverted into a complete surveillance and control machine, for example if people want to determine the nutritional habits of others. Eggers: The Circle. 2013, pg. 357/155.
11 See: Bernau: Sammelt mehr Daten. 2014.
12 HANA stands for High Performance Analytical Appliance.


This will enable developers of company software to create entirely new applications, and will allow users and administrators of company applications to present and store their data in a completely new way.¹³

HANA therefore fits very well into the AOK strategy. For example, with IT, innovations for disease-specific prevention and care programs can be supported faster than before. HANA is being introduced gradually. In a first stage the old BW database systems are being replaced by SAP HANA. Neither the existing BW logic nor the ETL processes are being changed in the process (non-disruptive introduction). The huge performance gains through SAP HANA became evident even during the evaluation and subsequent piloting. On average, query response times are reduced by at least 90 %. The run times for load and processing in the standard procedure of BW day processing are reduced by 50 %, and in special runs, e.g., for official statistics, by more than 60 %. The measured data compression is approximately 5:1.¹⁴ Targeted HANA optimizations provide further potential for improvements in performance (see Fig. 4.1).

[Fig. 4.1: Examples of performance optimizations with SAP HANA (data in sec., logarithmic scaling). The chart compares run times without HANA, after a 1:1 HANA migration, and after HANA optimization for the examples wrong conduct, diabetic foot, travel costs and HKP. Source: AOK]

The first AOK subsidiary has been running with oscare® on HANA since September 2013. The AOK-wide rollout began in 2014. In subsequent stages the databases of CRM and ERP systems will be switched to HANA in the same way. In the final stage, from 2016 onwards, SAP HANA is intended to be the “single source of truth” for all operational oscare® applications and at the same time serve the real-time business warehouse.

13 Plattner/Zeier: In-Memory Data Management. 2012, pg. 7.
14 Incl. Unicode conversion, ratio of Oracle database to HANA DB. Source: AOK.


[Fig. 4.2: HANA roadmap by AOK (DB xy = conventional databases; c/s = client server applications). The diagram shows the initial oscare® situation on conventional databases, Stage I (starting in 2013): BW on HANA, Stage II (starting in 2015): Business Suite on HANA, and the final stage (starting in 2016): oscare® on HANA – the so-called OLTP/OLAP convergence. Source: AOK]

It is of the utmost importance in this context that the need for separate operational and analytical systems be eliminated. In-memory technology enables the performance of analyses based on operational data, which simplifies both the software and hardware environments of a company and ultimately leads to lower overall costs.¹⁵

Previous data redundancies become unnecessary and the IT landscape is further unified, which helps to reduce the growing IT complexity and provides advantages in system administration.
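The idea behind this OLTP/OLAP convergence can be illustrated in general terms: analytical queries run directly on the operational tables, with no nightly extract into a separate reporting copy. The sketch below uses SQLite purely as a stand-in – the systems described here actually run on SAP HANA – and every table and column name is invented.

```python
# Illustrative sketch of OLTP/OLAP convergence: the analytical aggregation
# runs directly on the operational table instead of on a batch-loaded copy.
# SQLite stands in for an in-memory column store; the schema is invented.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE claims (
    insured_id TEXT, service_date TEXT, category TEXT, amount REAL)""")
con.executemany("INSERT INTO claims VALUES (?, ?, ?, ?)", [
    ("p1", "2014-01-10", "hospital", 4200.0),   # operational inserts ...
    ("p1", "2014-02-02", "pharmacy", 89.5),
    ("p2", "2014-01-15", "pharmacy", 120.0),
])

# ... and analytics on the very same table, no ETL into a second system
for row in con.execute("""SELECT category, SUM(amount), COUNT(*)
                          FROM claims GROUP BY category"""):
    print(row)
```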

4 Application Scenarios in Statutory Health Insurance

Buying health insurance is never going to be like buying a song on iTunes.¹⁶

That HANA means more than mere speed advantages can be demonstrated through examples, some of which are in a concept phase or being implemented and some of which exist only as ideas. Since one of the distinctive characteristics of Big Data is that at the starting point the evaluation purposes are not yet or not entirely known, we can expect the future to bring many innovations.

15 Plattner/Zeier: In-Memory Data Management. 2012, pg. 7.
16 Barack Obama on the occasion of the failed introduction of a planned health care reform.


4.1 Prevention

The earlier and more accurately a prediction can be made as to the probability of an insured person getting a specific disease – based on specific indicators (diagnoses, age, sex, medical history, etc.) – the higher the chances of maximizing life expectancy. Due to the complexity of the issues, the need for statistical processes/data mining, and the knowledge that very large quantities of data increase the validity of the relevant models, the area of prevention is ideally suited as a field of application for HANA.

In addition or as an alternative to the usual statistical methods for predicting disease progression, HANA would also enable similarity comparisons in which the database is searched for matches to a current case and the person responsible is shown the most similar cases as a set of hits directly in the operational case management system (e.g., through side panel technology). In this way the person can manage the current case directly in terms of specific prevention and treatment programs. The insured person has the advantage that diseases can be avoided altogether, or at least the severity of the progression of the disease can be influenced positively. In addition to the benefits in terms of image and the associated stronger competitive position, the health insurance company benefits from the fact that an illness avoided, or quickly detected and given the best possible treatment, results in much lower costs than a severe illness with high and usually long-running medical costs.
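The similarity comparison described above can be pictured as a nearest-neighbor search over insured-person attributes. The following sketch uses invented features, invented cases and a plain Euclidean distance; a real model would require careful feature engineering, scaling and clinical validation.

```python
# Minimal sketch of a similarity comparison: find the historical cases most
# similar to a current case. Features, values and outcomes are invented;
# units differ, so a real system would scale features first.
import math

def distance(a: dict, b: dict) -> float:
    keys = ("age", "bmi", "hba1c", "prior_admissions")
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))

history = [
    {"id": "c1", "age": 61, "bmi": 31.0, "hba1c": 7.9, "prior_admissions": 2,
     "outcome": "diabetic foot"},
    {"id": "c2", "age": 45, "bmi": 24.5, "hba1c": 5.4, "prior_admissions": 0,
     "outcome": "none"},
    {"id": "c3", "age": 64, "bmi": 29.8, "hba1c": 8.3, "prior_admissions": 3,
     "outcome": "diabetic foot"},
]

current = {"age": 63, "bmi": 30.5, "hba1c": 8.0, "prior_admissions": 2}
hits = sorted(history, key=lambda case: distance(case, current))[:2]
print([h["id"] for h in hits])  # most similar past cases as a set of hits
```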

4.2 Fighting Misconduct

Health insurance companies are subject to large financial losses caused by corruption in the public health sector. That is why "centers to combat misconduct in health care" have been set up in health insurance.¹⁷ Again, this provides economically interesting approaches for HANA, as the detection of fraud patterns requires applying statistical methods to larger data sets. In addition to studying the interaction of different parties (pharmacists, physicians, insured persons), information from social networks can also be included in the analysis. Due to the faster data throughput with HANA, iterations are possible at shorter intervals than with data mining in traditional databases. Lessons learned can be integrated quickly into the process, and additional and/or more accurate results will become available. SAP is also going to provide more HANA-specific content for the area of fraud detection.

17 See § 197a/SGB V.
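As a toy illustration of the kind of statistical screening meant here, the sketch below flags billing outliers among providers. All figures, the 1.5-sigma threshold and the single indicator are invented; real misconduct detection combines many indicators and, as noted above, analysis of the interaction between the different parties.

```python
# Toy sketch of statistical fraud screening: flag providers whose yearly
# billing deviates strongly from the peer average. Figures and the
# 1.5-sigma threshold are invented; a flag is a review candidate, not proof.
from statistics import mean, stdev

billed = {"prov_a": 101_200, "prov_b": 98_400, "prov_c": 99_900,
          "prov_d": 102_300, "prov_e": 188_700}   # yearly billing in EUR

mu, sigma = mean(billed.values()), stdev(billed.values())
suspicious = {p: round((v - mu) / sigma, 2)
              for p, v in billed.items() if abs(v - mu) > 1.5 * sigma}
print(suspicious)  # e.g. {'prov_e': 1.79} -> candidate for manual review
```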


4.3 Service

Service quality can also be improved with HANA. In order to respond to the changing habits and expectations of the insured, health insurance providers are expanding their online offerings, with the aim of ideally being available 24/7. With an "electronic patient receipt"¹⁸ the insured persons can inform themselves online as to how much the insurance company spent on the last hospital stay or consultation. In the traditional IT environment, achieving roughly comparable response times requires storing data redundantly. In addition to the existing range of information on medical insurance cover, the online offering could include filling out online forms, registration in specific care programs, support through chat features, updating address data, etc. For office clerks, the use of HANA means that they can respond to even complex requests for customized solutions and services during a call, and deal flexibly with questions.

4.4 Marketing/Sales

For image reasons it is a matter of course that health insurance companies present themselves in social networks. Conversely, the data from social networks and chat forums can be evaluated, for example to assess the current image of a health insurance company. Specific image campaigns can be used to react to negative trends. In order to get digital natives interested in the health insurance company, specific apps can also be implemented. Apps also play an important role for the sales representatives of an insurance company: scheduling and route planning can be supported, specific offers for the insured can be accessed online on location, and new data can be collected.
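A deliberately reduced sketch of such an image analysis: counting positive and negative mentions in posts. The word lists and posts are invented; a production system would use trained (German-language) sentiment models rather than keyword counts.

```python
# Very reduced sketch of social-media image monitoring: tally positive vs.
# negative mentions of the insurer. Posts and word lists are invented.
POSITIVE = {"friendly", "fast", "helpful", "recommend"}
NEGATIVE = {"slow", "rude", "rejected", "complaint"}

posts = [
    "very helpful hotline and fast reply",
    "my application was rejected again, filing a complaint",
    "would recommend the new app",
]

score = 0
for post in posts:
    words = set(post.lower().split())
    score += len(words & POSITIVE) - len(words & NEGATIVE)
print("net image score:", score)  # a negative trend could trigger a campaign
```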

4.5 Process Automation

It is becoming increasingly difficult for health insurance companies to recruit qualified staff. Time-consuming and repetitive tasks that require a lot of manual effort and tie up employee resources – the manual collection of documents, for example – should therefore be automated as fully as possible. This should also increase the quality of service. Efficiency improvements make economic sense in cases where legislation specifies processing deadlines, for example for applications to determine the need for long-term care.¹⁹ Documents from a variety of data sources go through various channels in a health insurance company and, despite their variety (paper documents, faxes, e-mails, short messages, photos, chat entries, etc.), it must be possible to scan or accept and check them, to extract user data from them, to perform auto-completions where applicable, and to prepare the data for its various uses.

18 See § 305/SGB V.
19 See § 18 Abs. 3b/SGB XI.
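What such an automated intake step might look like can be shown in deliberately simplified form: routing heterogeneous documents and extracting an insured ID by pattern. The channel names, the ID format and the regular expression are all invented for this sketch.

```python
# Simplified sketch of automated document intake: classify incoming items
# and extract an insured ID. The ID format "A + 9 digits" and all sample
# documents are invented for illustration.
import re

ID_PATTERN = re.compile(r"\bA\d{9}\b")  # assumed insured-ID format

inbox = [
    {"channel": "fax",   "text": "Application long-term care, ID A123456789"},
    {"channel": "email", "text": "Question about my contribution statement"},
    {"channel": "app",   "text": "Address change for A987654321"},
]

for doc in inbox:
    match = ID_PATTERN.search(doc["text"])
    if match:
        print(doc["channel"], "->", match.group(), "-> automated processing")
    else:
        print(doc["channel"], "-> manual review by a clerk")
```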

5 Conclusion and Outlook

Big Data carries risks, such as the abuse of data, but it also holds great potential for use. AOK has decided to tap these new "oil wells" and has drawn up a Big Data master plan. BW on HANA is already being rolled out as the first stage on the roadmap, and from 2016 onwards, the IT landscape will be completely converted to HANA. In a statutory health insurance landscape that is increasingly defined by competition and cost pressure, AOK expects HANA to provide a strategic competitive advantage. The benefits of the performance and efficiency gains of the first stage can already be seen today. In the future it will be possible to optimize disease-specific prevention and treatment programs, to improve service and transparency for the insured, to identify fraud patterns better and faster, to automate and design processes more efficiently, and to provide innovations more quickly ("time to market"). This will create a win-win situation for both the insured and AOK.

Literature
Bernau, P.: Sammelt mehr Daten! In: FAZ. No. 52 from March 3, 2014.
Eggers, D.: The Circle. New York/Toronto et al., 2013.
GKV-Spitzenverband: Kennzahlen der gesetzlichen Krankenversicherung. 2014. Online: http://www.gkv-spitzenverband.de/media/grafiken/gkv_kennzahlen/kennzahlen_gkv_2014_q2/GKV_Kennzahlen_Booklet_Q2-2014_300dpi_2014-09-16.pdf, [accessed on: March 18, 2014].
Institute of Work and Qualification at the University of Duisburg-Essen: Chronologie gesetzlicher Neuregelungen Krankenversicherung und Gesundheitswesen 1998–2013. Online: http://www.sozialpolitik-aktuell.de/2013-691.html, [accessed on: March 19, 2014].
Merkel, A.: Speech by Federal Chancellor Merkel at the Opening of the CeBIT. Online: http://www.bundeskanzlerin.de/Content/DE/Rede/2014/03/2014-03-09-merkel-cebit.html, [accessed on: March 19, 2014].
Plattner, H./Zeier, A.: In-Memory Data Management. Ein Wendepunkt für Unternehmensanwendungen. Berlin/Heidelberg, 2012.
Schirrmacher, F.: Wir wollen nicht. Online: http://www.faz.net/aktuell/feuilleton/debatten/ueberwachung/im-zeitalter-von-big-data-wir-wollen-nicht-12545592.html, [accessed on: August 26, 2013].
Sozialgesetzbuch/SGB. CW Haarfeld, 2014.

Harald Kamps

5 Big Data and the Family Doctor

"Human beings are more than the sum of their data…" says the author – with the experience of many years in general medical practice. Christina von Braun is professor of cultural anthropology at the Humboldt University in Berlin. In 2000 she wrote a paper called "gene and bytes as figures of the corpus christi mysticum." In this highly impressive lecture, she describes "secularization" as the overcoming of religious thought – or as the act of transporting religious messages into the real world. "The soul is indeed tied to the individual, but refers to the transcendent parts in its being." The gene, however, is tangible and intangible at the same time, and genetic research has only limited interest in the afterlife. However, what is rather appropriate is the comparison to a host, the "corpus christi mysticum," which describes both the body of Christ, the word incarnate, as well as the community of believers. The gene, which could also be called "the" metaphor of modernity, has adopted both functions. The question arises as to whether Big Data has the potential to take over the discourse phenomena of the gene – after all, it's hardware, software and a certain immateriality all at once… At the same time this is a level of discourse that would probably go far beyond the limits of this little book: Big Data as a secular form of faith…?

Even if Harald Kamps's article is one of the shorter ones, its criticism of the concept of Big Data is an elemental part of understanding Big Data and its boundaries – and not only from the perspective of data protection or abuse, but rather based on the fundamental criticism of the belief in "measurability" and its finiteness. Nevertheless, data transparency can still be greatly improved, especially in outpatient care: If you are looking for a doctor today who provides high-quality medicine, you will often have no other choice than to rely on personal recommendations – as in the Middle Ages, when good craftsmen were recommended by word of mouth. The data in the systems – including those of general practitioners – is not evaluated in such a way as to let me, as a patient, know whether I am receiving state-of-the-art treatment. But it would certainly be helpful to see – both for the doctor and for the patient – whether current treatment guidelines are being adhered to and what good or bad reasons may be given for not following them. This could be an area that provides potential for the physicians' administrations. Unfortunately I was not able to obtain an article on the subject from the KBV (national association of statutory health insurance physicians). Patient review portals are trying to fill this gap. But in addition to the subjective experience of treatment, the integration of hard data is also imperative for such an evaluation.

Content
1 Good and Bad Data  64
2 From Fax to E-mail  64
3 Digital Expert Systems  65
4 Human Beings are More Than the Sum of Their Data  66
5 Medicine of the Person is Made More Difficult  68
Literature  68


1 Good and Bad Data

Today's family practice comes packed with Big Data. Access is granted only with a smart card, patients are given a number in the EDP system and every doctor collects a great deal of data during every contact. This data collection can be of good and bad quality. The good and valid data will lead to certain entitlements: for a specific drug in the pharmacy or a few days' paid sick leave. The prescription data and the data from the certificate of incapacity for work are also thoroughly analyzed and evaluated.

The less good, because less valid, data are the code numbers for diagnoses that are provided. The diagnoses established in general practices provide little information about why patients actually go to the doctor. The codes fit into the world of hospitals with well-defined diseases – but they do not fit the complex world of sick people who visit their family doctor. There are treatment programs (disease management programs) for some groups of chronically ill people – the treatment course is meticulously documented every three months and sent to a data processing center for evaluation. The validity of this data is controversial. What is even more controversial is the significance of the data sent by the doctor to the clearing center every quarter. The billing codes are, to a great extent, general estimations, and only provide a very rough evaluation of the intensity of the labor provided by the doctor. Nevertheless: the transmission of the data is mostly done digitally, via a secure channel to the respective physicians' association. And this data is well paid for.

The good data from the medical practice is also well paid for. What is also very popular is prescription data, which provides information in near real-time regarding the prescribing behavior of physicians in a small region.¹, ² In the past, this data had to be painstakingly asked for at doctors' offices and pharmacies. Today market researchers, for example those at IMS Health, have direct contracts with more than 2500 physicians. In Germany the Compugroup, owner of the largest physician information systems, also passes on the data to a Frankfurt-based American company. The data is actually passed on anonymously, but comes from such a small area that information can be derived from it that can be used for targeted advertising to individual doctors.
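The re-identification risk described here can be made concrete with a simple group-size check, as in the k-anonymity idea: if only one physician in a region has a given specialty, "anonymous" prescription records still point straight at that person. The data and the threshold in the sketch below are invented.

```python
# Illustrative sketch of the small-area problem with "anonymous" data: a
# (region, specialty) group of size one is de facto identifying. Records
# and the threshold k = 3 are invented; this mirrors the k-anonymity idea.
from collections import Counter

records = [   # "anonymized" prescription records: no names, coarse traits
    {"region": "79xxx", "specialty": "GP"},
    {"region": "79xxx", "specialty": "GP"},
    {"region": "79xxx", "specialty": "GP"},
    {"region": "79xxx", "specialty": "cardiology"},  # only one in the area
]

K = 3
groups = Counter((r["region"], r["specialty"]) for r in records)
for group, size in groups.items():
    if size < K:
        print("re-identifiable group:", group, "size", size)
```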

1 Gläserne Patient. In: DIE ZEIT 35/2013.
2 Behandelt und verkauft. In: DIE ZEIT 45/2013.

2 From Fax to E-mail

The most modern communication equipment in most family doctors' practices is the fax. Wanted and unwanted data is printed out or introduced directly into the doctor-information system on a daily basis. This includes letters from cooperating specialists and hospitals, as well as requests from patient care services or from patients themselves. The less desirable data includes advertising about new books, medical consumables and invitations to unnecessary events. Even if the internal communication system of a medical practice is digitized, outside contact is generally analog: by fax and by telephone.

Only very few have actually opened their systems to incoming digital communication. The technology exists for safe access to the calendar of a medical practice in order to request an appointment via app, to order a prescription via the database of permanently prescribed medication, or to submit a question via the digital medical consultation mailbox. Patients very much appreciate such pleasant contact with a medical practice outside of office hours. But doctors are generally concerned. These concerns are also fueled by the fact that the providers of such services are at the same time the mediators of prescription data.

The market for medical practice information systems is confusing, but dominated by a few large suppliers that also provide communication from medical practice to medical practice. This enables data to be imported from a patient file directly to a colleague's file – in a closed network. This too is a service that is used by only a small minority. More often patients send an e-mail to their doctors and receive a digital response. Many medical practices already have a website and offer digital contact. Security concerns are only rarely expressed in this area – in fact, patients' acceptance of modern communication technology is rather high. Most people have a positive perception of the possibilities of an electronic health card – but the concerns of doctors are pronounced. Some hospitals also give their GPs the opportunity to dial into the hospital information system via a separate portal in order to receive access to test results, laboratory data and discharge letters.

3 Digital Expert Systems

Medical research leads to an enormous increase in the knowledge that is accessible to each physician. This knowledge must be processed in a way that makes it readily accessible and thus applicable in daily practice. The know-how concerning therapies with medication is advertised particularly heavily in medical practices – also through the direct use of promotional staff from the industry, and through training events that are free for the participating physicians thanks to funding from sponsors. This kind of knowledge transfer is often criticized,³ and replaced by methods used in evidence-based medicine: The members of the German College of General Practitioners and Family Physicians (DEGAM) develop a variety of guidelines, including those on symptoms, diseases or types of treatment.⁴ However, in everyday life it is still difficult to quickly apply these in individual cases.

Digital knowledge systems have spread only very slowly, as it seems complicated to receive quick answers to current issues during patient contact.⁵ Furthermore, one is often afraid of the annual costs, or the relevant programs are available only in English.⁶ Access to English-language primary literature is made difficult for practitioners due to very high license fees. This is an area where Norwegian doctors and citizens are in a much better situation: The state has entered into a collective agreement that gives all residents free access to the main medical journals in the world.⁷

3 See: http://www.mezis.de, [accessed on: July 16, 2014].

4 Human Beings are More Than the Sum of Their Data

All data is based on a theory. In my worst case scenario that would be the assumption that the diseases of all patients can be explained if we only gather all the necessary information from the organs concerned. While 100 years ago the fitting metaphor for the human body was the "industrial palace" with its impressive machinery, today it is that of a body as a data machine, with the brain as a powerful hard drive – the human being as a terabyte data packet. Although medicine based on measurable data is still in its infancy, due to the increasing possibilities of imaging and measurement it does exert an incredible fascination on doctors as well as patients. Many years ago those who wanted to measure their vital signs had only blood pressure devices to rely on. Today they have a bracelet that counts their pulse and steps, a smartphone that measures the quality of their sleep and a family doctor who can insert data transmitted by e-mail directly into their medical records. The EU is spending billions of euros to explore a computer model that simulates all the synapses of the human brain. Analyses of the entire genome are becoming increasingly affordable,⁸ even if one of the providers in the United States was prohibited from providing health-related data directly to the consumer. Even though the high expectations of genomic medicine quickly gave way to disillusionment, diagnosis and therapy will be increasingly based on the analysis of individual genes in the coming years.

So what's so bad about that? People will lose the ability to sense something directly and instead rely on measurable data. Instead of developing a feeling for immediate well-being or discomfort, physical impulses will be associated with an organ (chest pain, stomach ulcer), even if the respective organ is completely healthy. And everything that cannot be clearly assigned to an organ in this process will be attributed to a mental condition – and misinterpreted as a "mental health problem." In this case the living body will be denied the competence to convey the feelings in its personal situation adequately, because these feelings cannot be transferred as data, but only as integrated, analog impressions. Gut feelings will become void through the apparent superiority of a data-driven analysis.

Holistic medicine has become the buzzword of so-called alternative medicine, although scientific medicine has long had a concept for immediate physical feeling. In the German-speaking world Hermann Schmitz is the founder of body philosophy and "new phenomenology."⁹ He recognized the dilemma faced by the Western concept of man: the personal and subjective part of direct feeling ("Do I feel healthy?" "Is that dangerous?" "Am I full?") is separated and reduced in such a way that it becomes measurable and comparable: Only the person whose laboratory values are within normal limits is actually healthy. Danger becomes a calculable risk and medicine the process of risk reduction, and you eat until weight and blood sugar are just right. The subjective experiences and feelings that are then left over are misconceived as "mental."¹⁰ Unmediated physical experience is not the sum of the (digitalized) sensory impressions, but an analog, holistic performance of the body, taking into account its entire "physical history." This history is always woven into the personal situations and atmospheres of one's life. And that is what should be brought to life when talking to your family doctor – no data machine can help with that.

More interpersonal communication about immediate feelings and more digital information about relevant body processes do not necessarily exclude each other. German family doctors would do well to make better use of the possibilities of digital development: for example digital communication with patients, with other medical specialists, and with pharmacies and hospitals. Doctors' skepticism when it comes to sharing their own data and making it accessible is currently being strengthened by the fact that it is very difficult to adequately protect digital communication against unauthorized usage. Only, the more barriers and digital firewalls are built, the more impractical such solutions will be for everyday life.

4 See: http://www.degam.de/leitlinien-51.html, [accessed on: July 16, 2014].
5 See: http://www.ebm-guidelines.de, [accessed on: July 16, 2014].
6 See: http://bestpractice.bmj.com, [accessed on: July 16, 2014].
7 See: Mehr Lesefreiheit für Oberärzte. In: FAZ from July 1, 2013.
8 See: Kamps: Gendiagnostik. In: Deutsches Ärzteblatt. 110/2013, pp. 1088–1090.

9 See: Schmitz: Der Leib (Grundthemen der Philosophie). 2011.
10 See: Soentgen: Die verdeckte Wirklichkeit. 1998.


5 Medicine of the Person is Made More Difficult

German general medicine is fighting for its reputation. Family doctors are outnumbered by medical specialists working in outpatient care, and many people appreciate the direct and easy access to orthopedic specialists or cardiologists. These organ-related medical disciplines will greatly benefit from the rapid development of digital medicine. Better, even three-dimensional imaging will help these specialists to provide appropriate diagnoses. More extensive data analyses of blood values will make it easier for diabetologists to provide flexible insulin therapy, preferably prescribed with digitally controlled insulin pumps. Human genetic information will significantly improve the accuracy of drug treatment after cardiac infarction and carcinosis. Big Data will thus contribute to better, more personalized medicine.

The fact that this development is also touted as "personalized medicine" shows the dilemma of general medicine: as persons, with our capacity for arbitrary self-ascription, we are placed in the background and are externally determined and classified through a medical system that persuades us of its value through accumulated knowledge. But it actually makes us – as individuals – ever more helpless in our personal situations. However, the strength of general practice is the integration of medical, objective knowledge (facilitated by Big Data) with the subjective facts that come to light in a conversation about the personal situation. My fear is that the more colorfully and accurately Big Data presents our body to us, the harder it will be for the body, in all its contradictoriness and vibrancy, to claim its own validity.¹¹

Literature
Kamps, H.: Gendiagnostik. Ein Selbstversuch für 99 Dollar. In: Deutsches Ärzteblatt 110/2013, pp. 1088–1090.
Rappe, G.: Leib und Subjekt. Bochum/Freiburg 2012.
Schmitz, H.: Der Leib (Grundthemen der Philosophie). Berlin/Boston 2011.
Soentgen, J.: Die verdeckte Wirklichkeit. Einführung in die Neue Phänomenologie von Hermann Schmitz. Bonn 1998.

11 Rappe: Leib und Subjekt. Bochum/Freiburg 2012.

Alexander Pimperl, Birger Dittmann, Alexander Fischer, Timo Schulte, Pascal Wendel, Martin Wetzel and Helmut Hildebrandt

6 How Value is Created from Data: Experiences from the Integrated Health Care System, "Gesundes Kinzigtal" (Healthy Kinzigtal)

One of the most fascinating experiments of the German health system is currently taking place in one of the most beautiful valleys in the Black Forest: Since 2005, a strong network of family doctors, specialists and hospital doctors, psychotherapists, physiotherapists and nursing facilities, based on a contract for integrated care, has been planning and coordinating the treatment of members of AOK Baden-Württemberg and the Sozialversicherung für Landwirtschaft, Forsten und Gartenbau (SVLFG – social security for agriculture, forestry and horticulture, formerly LKK). The questions that have arisen in the context of Big Data are as follows: What if data and resources can be controlled in an "integrated" fashion? What advantages will arise for patients, overall control and Big Data? Here the authors provide good insight into the potential (that has been put into effect). Note: In some parts this article may be adorned with specific technical details – but it is very well suited as an introduction to the topic of business intelligence.

Abstract: The networking of the various players in integrated health care leads to extensive data. Using this to create added value is associated with a number of challenges. Based on a best practice model – Gesundes Kinzigtal – it can be shown that data from various sources can be linked and processed in a data warehouse, and made available to management via a business intelligence front end: from project preparation through ongoing project management to evaluation after project completion. The resulting advantages for patients, doctors, management companies and health insurance companies are described and associated with the main project.

Content
1 Introduction  70
2 Gesundes Kinzigtal – Background, Goals  72
3 Business Intelligence Infrastructure of Gesundes Kinzigtal  74
3.1 General Overview  74
3.2 Source Systems  75
3.3 Basic Database  76
3.4 Analytical Database  77
3.4.1 Integration of Insured Selections  78
3.4.2 Risk Adjustment: Matched Pair Groupings  79
3.4.3 Illustration of Relative Time References  79
3.4.4 Scenario and Standard Calculation  79
3.5 Benefits/Areas of Application  80
4 Lessons Learned & Outlook  82
Literature  85

1 Introduction

In the last decade, integrated health care – i. e., the networking of various service providers in health care – often appeared on the political agenda in Germany. The integration and cooperation projects received a strong development boost through the statutory health insurance modernization act (GMG 2004) in particular, which also provided economic incentives for integrated health care through so-called start-up financing.¹, ² The expiration of start-up financing in 2008 did lead to a certain stagnation in the years that followed,³ but one can assume that the innovation fund for the promotion of cross-sectoral health care, to the amount of EUR 300 million per year as agreed in the coalition treaty, will bring integrated forms of care back onto the growth track.⁴

1 With the statutory health insurance modernization act (GKV-Modernisierungsgesetz) 2004, policy makers created an economic stimulus with the one percent start-up financing. Up to one percent of the respective overall compensation for the Association of Statutory Health Insurance Physicians and the hospital compensation could be deducted from the health insurance companies for integrated health care. The health insurance companies were obliged to conclude correspondingly high amounts in the form of integrated health care contracts.
2 Mühlbacher/Ackerschrott: Die Integrierte Versorgung. In: Wagner/Lenz (Ed.): Erfolgreiche Wege in die integrierte Versorgung. Eine betriebswirtschaftliche Analyse. 2007, pg. 18; BQS Institut für Qualität & Patientensicherheit: BQS-register140d. Online: http://www.bqs-register140d.de/, [accessed on: January 31, 2009].
3 Sachverständigenrat zur Begutachtung der Entwicklung im Gesundheitswesen: Sondergutachten 2012 des Sachverständigenrates zur Begutachtung der Entwicklung im Gesundheitswesen. Wettbewerb an der Schnittstelle zwischen ambulanter und stationärer Gesundheitsversorgung. 2012, pp. 343 ff. Online: http://dip21.bundestag.de/dip21/btd/17/103/1710323.pdf, [accessed on: April 31, 2014].
4 Bundesverband Managed Care e. V.: Der Innovationsfonds kommt! Online: http://www.bmcev.de/bundesverband-managed-care-ev/news/detailansicht/archiv/2013/dezember/article/565/der-innovationsfonds-kommt/, [accessed on: December 19, 2013].


The legislative bodies are making this effort in order to address the problems currently attributed to traditional health care in statutory health insurance, such as a lack of exchange of information, coordination problems between different levels of care, numerous interfaces, a lack of networking between service areas, economic disincentives for a non-indexed extension of services, deficits in terms of common goals and values, as well as missing assignments of functions and positions in the system of health care processes.⁵ In addition, Amelung et al.,⁶ for example, believe that future challenges will include the following:⁷
– The treatment of a high and increasing number of chronically ill people.
– The supply of a large (and projected increasing) number of patients with mental disorders (esp. depression) and obesity.
– An increasingly strong degree of specialization in medicine that goes along with rapid medical advances in technology.
This will also lead to increased coordination and networking between the various health care providers and beyond (schools, companies, associations, etc.).

A powerful information and communication technology infrastructure is seen as one of the keys to the success of this sort of integrated health care. It is both a prerequisite and a result of integrated health care. It can be considered the glue that holds together the players in the network. Information and communication technology has to ensure that the right information is shared and understood, and it thus forms the basis for stable cooperation and change.⁸ To some extent legislature has also addressed this insight: joint documentation in integrated health care is defined by law (according to § 140b para. 3). But the mere creation of an information and communication technology infrastructure will not suffice, as is shown for example in a series of case studies on integrated health care systems in the US. The data that can be generated via the information and communication technology infrastructure must also be prepared so that it can initiate a continuous learning and improvement process.⁹ The term business intelligence or BI is often used as an overarching expression: “Business intelligence is the process of transforming data into information and, through discovery, into knowledge.”¹⁰

5 Mühlbacher: Integrierte Versorgung. Management und Organisation. Eine wirtschaftswissenschaftliche Analyse von Unternehmensnetzwerken der Gesundheitsversorgung. 2002, pg. 55.
6 Amelung/Sydow/Windeler: Vernetzung im Gesundheitswesen im Spannungsfeld von Wettbewerb und Kooperation. In: Amelung/Sydow/Windeler (Ed.): Vernetzung im Gesundheitswesen. Wettbewerb und Kooperation. 2008, pg. 12.
7 See listing in: Beek/Beek: Einführung in die Gesundheitsökonomik. 2009, pp. 179 ff.
8 Janus/Amelung: Integrated health care delivery based on transaction cost economics. Experiences from California and cross-national implications. In: Savage/Chilingerian/Powell (Ed.): International health care management. 2005, pp. 28 f.
9 Vijayaraghavan: Disruptive Innovation in Integrated Care Delivery Systems. 2011, pp. 1 f. Online: http://www.christenseninstitute.org/wp-content/uploads/2013/04/Disruptive-innovation-in-integrated-care-delivery-systems.pdf, [accessed on: April 31, 2014].
10 Behme: Business Intelligence als Baustein des Geschäftserfolgs. In: Mucksch/Behme (Ed.): Das Data-Warehouse-Konzept. Architektur – Datenmodelle – Anwendungen. 1996, pg. 37.


But the actual implementation of a BI system presents particular challenges, especially in the context of integrated health care, since a variety of different data sources, often with minimally structured data (e.g., clinical findings), from independent partners such as doctors, hospitals, nursing homes, etc. have to be combined in a sensible way. Even health insurance companies, which generally have access to cross-sectoral data, have to overcome an organizational break caused by their sector-based structure in order to achieve the cross-sectoral consolidation and analysis needed for managing integrated health care approaches. In addition, successful integrated health care can usually be measured only over long periods, which is why data needs to be retained and analyzed over time. This need to combine data over a longer period with a variety of data from different sectors of health care can quickly lead to fairly large, complex data constructs. The road to Big Data is therefore mapped out in the context of integrated health care.

In reality, however, most integrated health care approaches are still far from Big Data solutions. A survey of networks in medical practices by the University of Erlangen-Nuremberg,¹¹ for example, came to the conclusion that the use of and networking via information and communication technologies is inadequate in a majority of doctor networks. Only very few networks have a common IT architecture and IT strategy, let alone a business intelligence system. One of these networks is Gesundes Kinzigtal. In 2013 the business intelligence solution that it uses, developed by OptiMedis AG, was awarded the BARC Best Practice Award for Business Intelligence and Data Management in the SME category by an expert jury and 300 visitors to the BARC Business Intelligence Congress.¹² In this chapter, the implementation and application of this business intelligence solution will be discussed in more detail, and challenges and lessons learned outlined.

11 Purucker/Schicker/Böhm/Bodendorf: Praxisnetz-Studie 2009. Management – Prozesse – Informationstechnologie. Status quo, Trends und Herausforderungen. 2009, pg. 44.
12 Business Application Research Center (BARC): Merck und OptiMedis gewinnen den BARC Best Practice Award 2013. Online: http://www.barc.de/content/news/merck-und-optimedis-gewinnenden-barc-best-practice-award-2013, [accessed on: November 25, 2013].

2 Gesundes Kinzigtal – Background, Goals

The Gesundes Kinzigtal regional health network was initiated in 2006 through a contract for integrated health care in accordance with §§ 140a-f between the AOK



Baden-Württemberg (AOK BW), Sozialversicherung für Landwirtschaft, Forsten und Gartenbau (SVLFG) Baden-Württemberg (formerly LKK) and Gesundes Kinzigtal GmbH. The integrated health care contract was concluded for a term of nine years. The contract specifies that Gesundes Kinzigtal GmbH – a joint foundation of the Medizinisches Qualitätsnetz Ärzteinitiative Kinzigtal e. V. (MQNK) and the management and investment company OptiMedis AG – has the economic and medical responsibility for all medical indications and performance areas (with the exception of dentistry) for the approximately 31,000 insured in the two health insurance companies who live in the Kinzigtal ZIP code region. That means that, contrary to most integrated health care contracts, Gesundes Kinzigtal assumes responsibility for all insured persons, regardless of whether they are included in the integrated health care contract or whether they are treated by a service partner of Gesundes Kinzigtal or by another health care provider. For the insured, there are no restrictions regarding the choice of doctor or hospital.¹³

Remuneration of Gesundes Kinzigtal by the collaborating health insurance companies is performance-based. Only if the medical care, while costing less, is at least as good as or better than for insured persons of comparable age, gender and health status will Gesundes Kinzigtal receive appropriate remuneration.¹⁴

In addition to improving the health status of the population and the cost efficiency of the health care, improving health care acceptance by the insured (patient experience) is the third major goal of the integrated health care contract. In this respect, Gesundes Kinzigtal orientates itself on the Triple Aim developed by the IHI Institute: improving the patient experience of care (including quality and satisfaction), improving the health of populations, and reducing the per capita cost of health care.¹⁵

13 Hermann/Hildebrandt/Richter-Reichhelm/Schwartz/Witzenrath: Das Modell ‚Gesundes Kinzigtal‘. Managementgesellschaft organisiert Integrierte Versorgung einer definierten Population auf Basis eines Einsparcontractings. In: Gesundheits- und Sozialpolitik 5, 6/2006, pp. 12 ff.
14 Hermann/Hildebrandt/Richter-Reichhelm/Schwartz/Witzenrath: Das Modell ‚Gesundes Kinzigtal‘. Managementgesellschaft organisiert Integrierte Versorgung einer definierten Population auf Basis eines Einsparcontractings. In: Gesundheits- und Sozialpolitik 5, 6/2006, pp. 15 ff.
15 Berwick/Nolan/Whittington: The Triple Aim: Care, Health, and Cost. In: Health Affairs (Project Hope) 27, Nr. 3. 2008, pp. 759 ff.; Hildebrandt, H./Schulte, T./Stunder, B.: Triple Aim in Kinzigtal, Germany. Improving population health, integrating health care and reducing costs of care – lessons for the UK? In: Journal of Integrated Care 20, 4/2012, pp. 206 ff.


In order to achieve these objectives, Gesundes Kinzigtal coordinates health care processes across different sectors, implements its own disease management and prevention programs (but also makes intensive use of the established disease management programs), concludes contracts locally with health care providers for additional services and compensation, integrates sports and social clubs as well as social and self-help services, and carries out health services research studies and regular controlling.¹⁶ A prerequisite for the two latter tasks is efficient networking via an information and communication infrastructure as well as an adequate BI system based on that. Such a BI system was implemented by OptiMedis AG, the management partner and co-owner of Gesundes Kinzigtal GmbH, and is being continuously developed. In the following sections the BI system, the data warehouse contained in it, and the analytical capacity are discussed in detail.

3 Business Intelligence Infrastructure of Gesundes Kinzigtal

3.1 General Overview

Fig. 6.1 provides a simplified, schematic overview of the BI system developed by OptiMedis AG for Gesundes Kinzigtal. It outlines how data from different sources – such as the accounting data of the statutory health insurance, data from practice management systems or electronic documentation on the disease management programs of medical service partners, data from the local management company as well as primary collected data (e.g., patient satisfaction) – is integrated into the data warehouse (MS SQL Server) through various ETL processes,¹⁷ initially into a basic database and from there via further ETL processes into the analytical database. The data is then processed in On-Line Analytical Processing (OLAP) cubes,¹⁸ after which it can be used for various analyses and reports in the BI front-end Deltamaster. In the following, the individual stages shown in Fig. 6.1 are discussed in detail in the depicted order, beginning with the base layer and ending with the evaluation layer.

16 Hildebrandt/Hermann/Knittel/Richter-Reichhelm/Siegel/Witzenrath: Gesundes Kinzigtal Integrated Care. Improving Population Health by a Shared Health Gain Approach and a Shared Savings Contract. Online: http://www.ijic.org/index.php/ijic/article/view/539/1050, [accessed on: February 22, 2013]; Pimperl/Schulte/Daxer/Roth/Hildebrandt: Balanced Scorecard-Ansatz. Case Study Gesundes Kinzigtal. In: Monitor Versorgungsforschung 6, 1/2013, pp. 26 f.
17 ETL processes refer to technical routines through which data from various sources is extracted, transformed according to target parameters and then loaded into the target database.
18 OLAP is often used as a synonym for multidimensional data analyses. For further details please refer to Azevedo/Brosius/Dehnert et al.: Business Intelligence und Reporting mit Microsoft SQL Server 2008. OLAP, data mining, analysis services, reporting services und integration services mit SQL Server 2008. 2009, pp. 44 ff., as well as section 3.4.




Fig. 6.1: Schematic overview of the BI system of OptiMedis (Source: Authors’ illustration). [Figure: source systems – health insurance data, data from service partners (e.g., from PVS, eDMP), data from the local management company, external (comparative) data – feed via extraction, transformation and loading through a staging area (standardization, data quality, etc.) into the basic database (core data warehouse) in the basic layer; further ETL processes and analytical staging load the analytical database and its OLAP cubes in the analysis layer; the evaluation layer on top provides standardized reports (e.g., care cockpits, potential analysis), ad-hoc care analyses and planning/scenario calculation, accompanied by metadata management.]

3.2 Source Systems

Multiple data sources are fed into the data warehouse at OptiMedis AG. The central data of the data warehouse is based on the routine data from the two collaborating health insurance companies, AOK and SVLFG Baden-Württemberg. Insurance-related billing data for all sectors is provided for the approximately 31,000 insured persons included in the integrated health care contract. That includes hospital bills – including primary and secondary diagnoses, operations, reference numbers for fees – diagnoses by physicians, pharmaceutical prescriptions, medical prescriptions and much more.¹⁹ Data from AOK is available since 2003 and from SVLFG since 2004. The data is continually updated. Data delivery is generally on a monthly basis in the form of flat files and is pseudonymized.²⁰ The pseudonymization is carried out for all insured persons and also for those service providers who have no contractual relationship with Gesundes Kinzigtal GmbH and for whom no data protection release was issued.

19 For comprehensive information on the statutory health insurance routine data please refer to: GKV-Spitzenverband: GKV-Datenaustausch. Elektronischer Datenaustausch in der gesetzlichen Krankenversicherung. Online: http://www.gkv-datenaustausch.de/, [accessed on: February 22, 2013].
20 From a data protection point of view, this is actually quasi-anonymous data, as OptiMedis AG and Gesundes Kinzigtal have no possibility of resolving the pseudonyms. These are generated by the health insurance company and not passed on.


Each insured person and each service provider is assigned a unique pseudonym, so the pseudonymized data of insured persons and service providers can be linked across all data tables.

This statutory health insurance routine data is enriched with additional pseudonymized information on the insured from the performance partners – i. e., the service providers cooperating with Gesundes Kinzigtal. That includes secondary data, which is directly extracted from the practice management systems (PVS) or from the electronic documentation for the disease management programs (eDMP) of medical service partners. The advantage of this data is that, on the one hand, it is available faster, while contract medical data from the insurance companies is often only available with a delay of up to nine months. On the other hand, medical parameters (e.g., laboratory or cytology results) and other parameters such as blood pressure, BMI (height, weight), etc. are included in the PVS and eDMP data, which are missing in the routine data from statutory health insurance companies.

Documentation belonging to the local management company, Gesundes Kinzigtal, is also integrated. This includes cost calculation data, for example for special integrated health care services (risk score surveys, special program registrations and/or treatment flat rates for disease management programs, etc.) that service partners can settle with Gesundes Kinzigtal, e.g., quality circle and project group participation. Moreover, work is currently ongoing regarding the acceptance of structured data (e.g., treatment pathway documentation)²¹ from CGM-NET,²² the standardized IT solution for doctor networks developed together with CompuGroup Medical Deutschland AG and used in Gesundes Kinzigtal. This detailed data also enables the detailed analysis and control of special services of Gesundes Kinzigtal that do not appear in the regular statutory health insurance data.

Finally, primary collected data (for example from patient satisfaction surveys),²³ as well as external catalog files (e.g., ICD, OPS, ATC) and external reference data, such as the allocation amounts from the Morbi-RSA or results from quality reports, is also included in the data warehouse.
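The chapter does not describe the insurers’ pseudonymization procedure technically (footnote 20 only notes that the pseudonyms are generated by the health insurance company and cannot be resolved downstream). As a purely illustrative sketch, a keyed hash is one common way of producing pseudonyms that are linkable across tables yet not reversible without the key; all names, the sample records and the key handling below are assumptions, not the actual AOK/SVLFG procedure.

```python
# Minimal sketch of keyed pseudonymization and cross-table linking.
# SECRET_KEY, derive_pseudonym and the sample records are illustrative
# assumptions only; the insurers' real procedure is not described here.
import hmac
import hashlib

SECRET_KEY = b"held-by-the-health-insurer-only"  # never shared downstream

def derive_pseudonym(insured_id: str) -> str:
    """Deterministic keyed hash: the same ID yields the same pseudonym in
    every table, but without the key the mapping cannot be reversed."""
    return hmac.new(SECRET_KEY, insured_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()[:16]

# The insurer applies this to every table before delivery ...
hospital_cases = [{"insured_id": "A123", "icd": "I50.1"}]
drug_claims = [{"insured_id": "A123", "atc": "C07AB02"}]
for table in (hospital_cases, drug_claims):
    for row in table:
        row["pseudonym"] = derive_pseudonym(row.pop("insured_id"))

# ... so the data warehouse can still join the tables per person:
by_person = {}
for row in hospital_cases + drug_claims:
    by_person.setdefault(row["pseudonym"], []).append(row)
print(by_person)
```

Because the key never leaves the insurer, the warehouse can aggregate per person and per provider without ever seeing real identifiers – exactly the quasi-anonymity described in footnote 20.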

3.3 Basic Database

In a first step, the data from the various source systems is unified for the basic database via ETL processes (e.g., unified identification numbers of the insured and service partners across all data sources), subjected to quality control, including potential cleanup, and normalized.

21 The program by the name of Gesundes Gewicht (“healthy body weight”) documents the following: weight, abdominal girth, blood pressure, blood sugar, standard value blood sugar, etc.
22 For more information refer to: OptiMedis AG: Die fortschrittliche IT-Lösung für Ärztenetze. Online: http://www.optimedis.de/integrierte-versorgung/leistungsanbieter/it-vernetzung, [accessed on: March 31, 2014].
23 Stoessel/Siegel/Zerpies/Körner: Integrierte Versorgung Gesundes Kinzigtal – Erste Ergebnisse einer Mitgliederbefragung. In: Das Gesundheitswesen 75, 8/9/2013, pp. 604 f.
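A minimal sketch of the kind of ID unification and quality control such an ETL step performs; the field names, the ID mapping and the validation rules are invented for illustration and are not OptiMedis’s actual rules.

```python
# Minimal ETL sketch: unify identifiers, normalize formats, and apply
# basic quality control. Field names and rules are illustrative only.
import re
from datetime import datetime

ID_MAP = {"AOK-0042": "P-0001"}  # source-specific ID -> unified warehouse ID

def clean_record(raw: dict):
    """Return a normalized record, or None if it fails quality control."""
    insured = ID_MAP.get(raw.get("insured_id"))
    if insured is None:
        return None                      # unknown insured -> reject for review
    icd = raw.get("icd", "").strip().upper()
    if not re.fullmatch(r"[A-Z]\d{2}(\.\d{1,2})?", icd):
        return None                      # malformed ICD-10 code
    # normalize the German date format to ISO 8601
    date = datetime.strptime(raw["date"], "%d.%m.%Y").date().isoformat()
    return {"insured": insured, "icd": icd, "date": date}

rows = [
    {"insured_id": "AOK-0042", "icd": "e11.9 ", "date": "07.03.2012"},
    {"insured_id": "???", "icd": "E11.9", "date": "07.03.2012"},
]
cleaned = [r for r in (clean_record(x) for x in rows) if r is not None]
print(cleaned)  # [{'insured': 'P-0001', 'icd': 'E11.9', 'date': '2012-03-07'}]
```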


3.4 Analytical Database

During this stage, the data in the basic database is converted to a structure suited for multidimensional OLAP cubes. The term OLAP cube defines the logical representation of a multidimensional data set.

Fig. 6.2: Schematic representation of a simple three-dimensional OLAP cube (Source: Authors’ illustration). [Figure: a cube spanning the dimensions indication (e.g., KHK, depression), time (e.g., 2003, 2005, 2012) and sector (e.g., hospital, pharmaceuticals), with analysis measures such as costs, Morbi-RSA allocations and contribution margin (Deckungsbeitrag); views slice the cube by sector, by time period or by indication. An accompanying example table lists, for patients with KHK, average per-patient costs by service area (pharmaceuticals, physician, sick pay, hospital, rehabilitation/cure, other services), total costs (unadjusted and Morbi-RSA-adjusted), Morbi-RSA allocations and contribution margin for 2005 and 2012, including growth rates from 2011 to 2012.]


Here the OLAP cube (structure and content) is uniformly defined via measures (indicators such as costs, margins, number of prescriptions) and dimensions (e.g., indication, time, sector, insured selection, etc.). The figures can then be analyzed via one or more axes (dimensions) of the relevant cube.²⁴ Fig. 6.2 illustrates this concept using the example of a simple three-dimensional cube. Apart from these typical technical transformation steps, analytical processing also takes place at this stage. This is particularly intended to enable standardized, simple and fast evaluation in the BI front-end for the analysts. Selected critical analytical processing steps are explained in more detail below.
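To make the cube logic concrete before turning to the individual processing steps, the following sketch shows a cube-style aggregation in pandas rather than in the MS SQL Server/OLAP stack actually used in Kinzigtal; the column names and figures are invented and do not reproduce the real data.

```python
# Minimal sketch of an OLAP-style "cube" query: measures aggregated over
# the dimensions indication, year and sector. Invented data only.
import pandas as pd

claims = pd.DataFrame({
    "indication": ["KHK", "KHK", "KHK", "Depression"],
    "year":       [2005, 2012, 2012, 2012],
    "sector":     ["hospital", "hospital", "drugs", "drugs"],
    "costs":      [2192.85, 2893.79, 968.90, 410.00],
    "allocation": [1800.00, 2500.00, 900.00, 380.00],
})

# A pivot table is a two-dimensional slice of the cube:
cube = claims.pivot_table(index="indication", columns="year",
                          values="costs", aggfunc="sum")
print(cube)

# Slicing on one dimension (sector = hospital), like the views in Fig. 6.2:
hospital = claims[claims["sector"] == "hospital"]
print(hospital.groupby("year")[["costs", "allocation"]].sum())
```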

3.4.1 Integration of Insured Selections

For multifaceted analyses it is necessary to first select a special population of insured persons – if, for example, only the cost and performance data of this subgroup is to be examined, such as hospitalizations and medication of patients with a type II diabetes illness or the rehospitalization rate of patients with heart failure. There are a variety of selection criteria with which to identify these patients: one could, for example, identify type II diabetes patients through their outpatient and/or inpatient diagnoses (ICD E11), their prescriptions (e.g., a specific antidiabetic therapy) and/or their enrollment in corresponding DMPs. If necessary, validation may also be indicated for outpatient diagnoses: i. e., the diagnosis must be coded as reliable and/or must be documented in at least two different quarters of a year. Which selection method makes most sense is determined by the application context. Appropriate studies on the selected topic may also be able to provide help with selection.²⁵ Consultation with the doctors involved in the analysis is usually also very helpful, as they will often be able to identify important differences between practice reality and documentation practice. The different selection models are usually first tested in the BI front-end Deltamaster by the analysts, and the validated selections of insured persons that are needed for regular standard reports are then added to the ETL processes for the analytical database.

24 For detailed information on the setup of a database for OLAP please refer to Azevedo/Brosius/Dehnert/Neumann/Scheerer: Business Intelligence und Reporting mit Microsoft SQL Server 2008. OLAP, data mining, analysis services, reporting services und integration services mit SQL Server 2008. 2009, pp. 49 ff.
25 See Hauner/Köster/Ferber: Prävalenz des Diabetes mellitus in Deutschland 1998–2001. Sekundärdatenanalyse einer Versichertenstichprobe der AOK Hessen/KV Hessen. In: DMW – Deutsche Medizinische Wochenschrift 128, Nr. 50. 2003, pp. 2632–2638; Windt/Glaeske/Hoffmann: Lässt sich Versorgungsqualität bei Asthma mit GKV-Routinedaten abbilden? In: Monitor Versorgungsforschung 1, 2/2008, pp. 29 ff.
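A minimal sketch of the two-quarter validation rule for type II diabetes described above; the table layout is an invented stand-in for the routine data, not the network’s actual selection logic.

```python
# Sketch of an insured selection: type II diabetes (ICD E11) validated by
# outpatient diagnoses in at least two different quarters. Invented data.
import pandas as pd

diagnoses = pd.DataFrame({
    "pseudonym": ["p1", "p1", "p2", "p3", "p3"],
    "icd":       ["E11.9", "E11.3", "E11.9", "E11.9", "I10"],
    "quarter":   ["2012Q1", "2012Q3", "2012Q2", "2012Q2", "2012Q4"],
    "setting":   ["outpatient"] * 5,
})

e11 = diagnoses[diagnoses["icd"].str.startswith("E11")
                & (diagnoses["setting"] == "outpatient")]
quarters_per_person = e11.groupby("pseudonym")["quarter"].nunique()
selected = quarters_per_person[quarters_per_person >= 2].index.tolist()
print(selected)  # ['p1'] -- p2 and p3 have E11 in only one quarter
```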


3.4.2 Risk Adjustment: Matched Pair Groupings

If causal effects are to be estimated on the basis of routine data from statutory health insurance companies (for example if health outcomes from disease management programs or the performance of different service providers are to be measured), then it is vital to ensure that a similar distribution of risks prevails in the compared patient populations. To achieve this, various methods of risk adjustment can be applied. One possibility is the matched pair method, which is also used for the analytical processing of data at OptiMedis AG. With this procedure each insured person from an intervention group – e.g., heart insufficiency patients in the intervention program “Starkes Herz” (strong heart) by Gesundes Kinzigtal – is paired with another insured person (control group) with similar risks (age, sex, morbidity, etc.). Depending on the problem, both exact matching and propensity matching methods are used.²⁶ The outcomes of the intervention group can then be evaluated in comparison with the risk-adjusted control group.
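The text names exact matching and propensity score matching as the methods used. The following is a simplified sketch of greedy 1:1 nearest-neighbor propensity matching on synthetic data, using scikit-learn’s logistic regression; it illustrates the technique only and is not the OptiMedis implementation (for the methodology, see the literature in footnote 26).

```python
# Sketch of 1:1 nearest-neighbor propensity score matching on synthetic
# risk factors (age, sex, morbidity). Illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([rng.integers(40, 90, n),   # age
                     rng.integers(0, 2, n),     # sex
                     rng.integers(0, 8, n)])    # morbidity score
treated = rng.integers(0, 2, n).astype(bool)    # program enrollment

# 1) Model the probability of enrollment given the risk factors.
ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]

# 2) For each enrolled person, greedily pick the unmatched control whose
#    propensity score is closest (matching without replacement).
available = set(np.flatnonzero(~treated))
pairs = []
for i in np.flatnonzero(treated):
    if not available:
        break
    best = min(available, key=lambda j: abs(ps[i] - ps[j]))
    pairs.append((int(i), int(best)))
    available.remove(best)

print(f"{len(pairs)} matched pairs formed")
```

Outcomes of the intervention group can then be compared against the matched controls, as the text describes.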

3.4.3 Illustration of Relative Time References

Another essential processing step is the implementation of relative time references, i. e., the question of what happened in a quarter after or before the enrollment in a disease management program. Was the insured person hospitalized? What medication did he or she receive? The time intervals are calculated in the analytical database for all utilization data relative to the defined starting point (e.g., point of enrollment).
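A minimal sketch of how such relative quarter indices can be computed; the tables are invented for illustration and the computation in the actual analytical database may differ.

```python
# Sketch of relative time references: index each utilization event by its
# quarter offset from the person's enrollment date. Invented data only.
import pandas as pd

enrollment = pd.DataFrame({
    "pseudonym": ["p1", "p2"],
    "enrolled":  pd.to_datetime(["2011-05-10", "2012-01-20"]),
})
events = pd.DataFrame({
    "pseudonym": ["p1", "p1", "p2"],
    "event":     ["hospital stay", "drug dispensing", "hospital stay"],
    "date":      pd.to_datetime(["2011-02-01", "2011-11-03", "2012-08-15"]),
})

df = events.merge(enrollment, on="pseudonym")
# Quarter offset: 0 = quarter of enrollment, -1 = the quarter before, etc.
df["rel_quarter"] = (
    (df["date"].dt.year - df["enrolled"].dt.year) * 4
    + (df["date"].dt.quarter - df["enrolled"].dt.quarter)
)
print(df[["pseudonym", "event", "rel_quarter"]])
```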

3.4.4 Scenario and Standard Calculation

Beyond that, regularly recurring calculations and models are also precalculated in the analytical database: e.g., RSA/Morbi-RSA (scenario) calculations, medical expense attribution models, modeling of the deceased and/or new additions, etc.

26 Stuart: Matching methods for causal inference. A review and a look forward. In: Statistical Science. A review journal of the Institute of Mathematical Statistics 25, Nr. 1. 2010; for detailed information on the issue of risk adjustment and the matched-pair procedure see Riens/Broge/Kaufmann-Kolle et al.: Bildung einer Kontrollgruppe mithilfe von Matched-Pairs auf Basis von GKV-Routinedaten zur prospektiven Evaluation von Einschreibemodellen. In: Gesundheitswesen 72, Nr. 6. 2010; Angrist/Pischke: Mostly Harmless Econometrics. An Empiricist’s Companion. 2008.


3.5 Benefits/Areas of Application

The BI system is used throughout the management cycle: from project preparation through ongoing project management and evaluation to the final measurement of success at project completion. A simple economic analysis for project preparation is shown in Fig. 6.2. For insured persons with coronary heart disease the costs, allocations from the Morbi-RSA and the resulting margin are shown over a period of five years, broken down according to service sector, as a template for an appropriate evaluation by the project manager of Gesundes Kinzigtal and the doctors of the project group that is typically formed. Using the cube shown in Fig. 6.2, such an analysis could also quickly be carried out for other diseases such as depression or hypertension, since only the selection on the indication axis would have to be changed. These analyses can generate initial findings to determine the burden of disease from an economic perspective in the population considered. In addition, the BI system is also used to broaden the economic perspective with a medical/epidemiological one. The aim is to identify needs and opportunities for intervention and to prioritize accordingly. The prioritization takes place in close collaboration between the project managers (who are normally trained in health economics) and the doctors on site.

In the phase of ongoing project management and evaluation, as well as during the completion of the project, the focus is on comparative analysis and representation (benchmarking) of the following:
– Structures (e.g., age and morbidity structure of patients/doctors).
– Processes (e.g., review of medical care quality – among others guideline orientation, implementation of contracts, economical prescribing: me-too/generic quotas).
– Results of health care (e.g., mortality, margins, patient satisfaction).

A balanced scorecard approach was chosen as the central, strategic performance management concept for benchmarking. This was and is continuously being rolled out across the entire network, from the network management level to the individual network partners (GPs and specialists, hospitals, pharmacies, etc.). Based on Donabedian,²⁷ performance indicators from the structure through the process to the outcome level are prepared – the scorecard contains indicators for quality as well as for economics and patient satisfaction (Triple Aim) – and used for a continuous improvement process. An example of a BSC report (supply cockpit for a family doctor’s practice) is shown in Fig. 6.3.

27 Donabedian: Evaluating the Quality of Medical Care. In: Milbank Quarterly 83, 4/2005, pp. 692 ff., 721; see also a similar approach in Gröbner: Controlling mit Kennzahlen in vernetzten Versorgungsstrukturen des Gesundheitswesens. Dissertation. 2007, pp. 266 ff.

Fig. 6.3: Information sheet of the supply cockpit for a family doctor’s practice incl. two detailed reports (example for exports from the Deltamaster business intelligence suite) (Source: Authors’ illustration). [Figure: the cockpit benchmarks quality indicators and key figures for the 4th quarter of 2012 (AOK/LKK) on three levels – results (e.g., Morbi-RSA allocations, total costs and contribution margin per patient, risk-adjusted mortality and hospital cases, patient satisfaction), process (e.g., diagnosis quality, incapacity-for-work figures, generics quota, guideline-compliant prescribing for heart insufficiency, PRISCUS prescribing rates) and structure (e.g., patient structure, Charlson morbidity score, enrollment quotas, quality circle participation) – comparing the individual practice with partner practices (LP) and non-partner practices (NLP). The two detail reports show a practice-level benchmark and the top substances for the key figure “patients >=65 with potentially inappropriate prescription according to the Priscus list”.]

These quarterly reports are sent to all cooperating GPs. In the information sheet each practice sees its results in comparison with those of other practices in the network (LP) and also in comparison with practices in the region that do not have a contract with Gesundes Kinzigtal (NLP). For each key figure the development over time is shown via miniature diagrams (sparklines), and a notation concept is applied: the color red means that this key figure should be kept on the low side (the final assessment, however, is always up to the doctor), for example for the key figure “patients >=65 with potentially inappropriate prescription according to the Priscus list.” For this key figure Fig. 6.3 contains two predefined detail areas, which the doctor can browse if necessary: on the one hand a detailed benchmark report and on the other a list of prescriptions that are potentially inappropriate according to the Priscus list.

In addition to being dispatched, the supply cockpits are also integrated in a variety of ways into the management routine of Gesundes Kinzigtal. They can, for example, be used in meetings of the project groups (e.g., the drug commission), during practice visits by staff and at annual meetings of network management with network partners, and selected excerpts can be processed in internal newsletters.²⁸

In addition, the BI system is also used for various health services research projects. Among other things, a project with the National Association of Statutory Health Insurance Physicians for research, development and pilot implementation of a set of quality indicators in the outpatient sector (AQUIK) was implemented.²⁹ Work is currently ongoing on behalf of the Central Institute for Statutory Health Care to validate and further develop quality indicators based on statutory health insurance routine data through a link with medical treatment data from practice administration systems.

28 For detailed information see: Pimperl/Schulte: Balanced Score Card. Managementunterstützung für die Integrierte Versorgung? 9. DGIV Bundeskongress, Berlin, 2012; Pimperl/Schulte/Daxer/Roth/Hildebrandt: Balanced Scorecard-Ansatz. Case Study Gesundes Kinzigtal. In: Monitor Versorgungsforschung 6, Nr. 1. 2013, pp. 26–30.
29 OptiMedis AG: KBV testet Qualitätsindikatoren in “Gesundes Kinzigtal.” 2010, pp. 1 f. Online: http://www.optimedis.de/images/.docs/pressemitteilungen/optimedis_pm_20100525_kbv-test.pdf, [accessed on: March 31, 2014].
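As an illustration of the benchmarking logic in these cockpits – one practice’s key figure against the partner (LP) and non-partner (NLP) averages, with a “red” notation when lower values would be better – here is a small sketch with invented figures; the real reports are produced in Deltamaster, not computed this way.

```python
# Sketch of a cockpit-style benchmark for one key figure. Invented data.
import pandas as pd

practices = pd.DataFrame({
    "practice":   ["P9", "P5", "P10", "N1", "N2"],
    "is_partner": [True, True, True, False, False],
    # share of patients >=65 with potentially inappropriate prescription
    "priscus_rate": [0.106, 0.085, 0.095, 0.132, 0.118],
})

lp_mean = practices.loc[practices["is_partner"], "priscus_rate"].mean()
nlp_mean = practices.loc[~practices["is_partner"], "priscus_rate"].mean()

own = practices.set_index("practice").loc["P9", "priscus_rate"]
flag = "RED" if own > lp_mean else "ok"   # lower is better for this figure
print(f"P9: {own:.1%} | LP mean: {lp_mean:.1%} | "
      f"NLP mean: {nlp_mean:.1%} -> {flag}")
```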

4 Lessons Learned & Outlook

Looking back at the more than nine-year history of the development and continuous improvement of the BI system, one can see a number of lessons learned that may be of relevance to other integrated health care systems.

One key to success is the commitment of the independent network partners (doctors, health insurances, etc.). This commitment makes it possible to use the


relevant previously described data sources and to establish a data-driven management process. Gesundes Kinzigtal sees collaboration in a spirit of trust, in an organizational and contractual framework (heterarchic network), as a prerequisite for the commitment of the network partners. For one thing, in addition to the intrinsic motivation of providing good medicine, the network partners have a strong interest in learning from the data and initiating a continuous process of improvement, due to the contractual arrangement as a partner of Gesundes Kinzigtal GmbH coupled with the performance-based compensation model through the health insurance company. For another, the content of the BI system was developed together with the network partners from the beginning, thus establishing a foundation of trust and identification with the BI project itself.

In addition, a simple, unified and well-structured design for the reports (complexity reduction with equal information density, e.g., via graphical tables, sparklines and notations)³⁰ was essential for acceptance. Today, detailed feedback reports are even actively requested by some service providers. In most cases, however, it is still necessary to combine electronic delivery with face-to-face meetings in the context of office visits by employees of Gesundes Kinzigtal. Interestingly, the regular evaluations have already shown a number of improvements in the development of many key figures compared with non-participating providers, for example in the reduction of non-indicated multiple medication for the elderly, in the reduction of hospital case numbers, and in the increase of vaccination rates and guideline-compliant use of medication, for example after a heart attack. Further possible improvements are also addressed and discussed by the management of Gesundes Kinzigtal in the monthly meetings of the Medical Advisory Board, in project groups and quality circles as well as at training events and general assemblies. Patients therefore benefit from the increased attention and motivation of the health care providers as well as their increased readiness to share and compare information.

The BI system also has to be extended to include (health-related) scientific logic (such as propensity score matching) in order to generate valid analyses. This must be implemented in such a way that standardized and automated analyses and reporting are possible, since the production of evaluations cannot otherwise be conducted economically. At the same time, the BI system must also retain the flexibility for ad hoc analyses. This can be ensured by means of sophisticated models already at the database level, in combination with a high-performance OLAP system.

Ensuring data quality, completeness and rapid availability is a major challenge. In order to ensure the best possible results, a large number of cleanup and extrapolation models must be integrated into the data warehouse, while at the same time new developments are carried out on the source systems (e.g., the networking software

30 Bissantz: Bella berät: 75 Regeln für bessere Visualisierung. 2010; Gerths/Hichert: Professionelle Geschäftsdiagramme nach den SUCCESS-Regeln gestalten. 2011.


CGM-NET developed together with CGM). And there is also great potential for the future. For example, so far only data from the medical service partners has been tapped. But data from other partners, such as hospitals, pharmacies, physiotherapists etc., could also increase the speed of availability as well as provide additional value for understanding and improving the health care processes through information that was previously unavailable in the accounting data. A medical training center managed by Gesundes Kinzigtal itself is currently also planned. A variety of data on the training progress and health of the insured persons being cared for might be taken from the exercise equipment through mobile apps and other devices, data which could in turn be provided to all those involved in the treatment process.

Overall, the fast, ideally complete and quality-assured provision of data is only a first step. The biggest added value is expected to arise from the enrichment and combination of the data with other information already included in the BI system, from which a decision support tool can then be generated at the right time in the workflow for the agents in attendance. Of course, with such an approach the protection of the insured persons’ sensitive data must receive the highest priority. Here it is important to weigh the potential benefits against the risks. So far, much in this area has not yet been clearly defined from a legal point of view, which creates a number of challenges for integrated health care projects. Truly meaningful approaches are often blocked by data security issues, since unclear legal rules are interpreted rather restrictively by data protection authorities as a precaution. Approaches such as predictive modeling, through which patients at risk can be identified and invited into appropriate disease management programs before the potential occurrence of severe disease events (e.g., hospitalization), are currently theoretically possible in Kinzigtal, but require additional effort and are therefore not currently used. The reason for that is that the data protection officers of the health insurance companies and the state of Baden-Württemberg have asked Gesundes Kinzigtal to voluntarily take on only pseudonymous data from the insurance companies and store it in the data warehouse – even given signed written exemptions for the data by the persons enrolled in integrated health care. A direct warning to the doctors regarding possible risk situations, advantageous to patients, is thus made all the more difficult. In addition, data privacy is also a matter for the German federal states, which complicates the replication of concepts, once approved, in other federal states. On the other hand, the issue of data protection in health care is rightly a particularly sensitive one and could – in case of abuse – dangerously discredit the whole topic of integrated care. On the part of Gesundes Kinzigtal, the agents – who, in addition to the management, include a specially contracted external and legally qualified data protection officer – are therefore in very close consultation with the various competent authorities in Baden-Württemberg as well as on the federal level, which has led to, among other things, legislative improvements via federal initiatives.


Even if this is not the place to analyze in detail the cost-benefit ratio of the investments in the data warehouse and the evaluation routines since 2006,³¹ it can nevertheless be noted that the very big effort made was certainly worthwhile. Without this investment the positive seven-digit results of performance contracting by Gesundes Kinzigtal, which have been growing for years, could not have materialized at this level. Above all, without its own data basis, Gesundes Kinzigtal would never have acquired the ability to conduct its own results calculations and to point out sources of error in the calculations of health insurance companies. A company that is dependent on the goodwill and the blind acceptance of the evaluations of a contractor is actually not an independent company and not an equal partner.

It should be noted that a variety of electronic networking and data-based management possibilities already exist under the current framework conditions. It must be assumed that their full exploitation – but especially their further development into a comprehensive business intelligence solution, including other service sectors and especially the data from mobile health solutions – will still require quite some time. So the goal remains: the management of health care networks must be developed further from pure empiricism to data-supported evidence.

31 See also Reime et al.: From Agreement to Realization: Six years of Investment in Integrated eCare in Kinzigtal. In: Meyer, I./Müller, S./Kubitschke, L. (Eds.) in print: Achieving Effective Integrated E-Care Beyond the Silos. IGI Global, Hershey, PA 2014.

Literature
Amelung, V. E./Sydow, J./Windeler, A.: Vernetzung im Gesundheitswesen im Spannungsfeld von Wettbewerb und Kooperation. In: Amelung, V. E./Sydow, J./Windeler, A. (Eds.): Vernetzung im Gesundheitswesen. Wettbewerb und Kooperation. Stuttgart 2008, pp. 9–24.
Angrist, J. D./Pischke, J. S.: Mostly Harmless Econometrics. An Empiricist’s Companion. Princeton 2008.
Azevedo, P./Brosius, G./Dehnert, S. et al.: Business Intelligence und Reporting mit Microsoft SQL Server 2008. OLAP, data mining, analysis services, reporting services und integration services mit SQL Server 2008. Unterschleißheim 2009.
Business Application Research Center (BARC): Merck und OptiMedis gewinnen den BARC Best Practice Award 2013. Online: http://www.barc.de/content/news/merck-und-optimedis-gewinnenden-barc-best-practice-award-2013, [accessed on: November 25, 2013].
Beek, G./Beek, K.: Einführung in die Gesundheitsökonomik. Munich 2009.
Behme, W.: Business Intelligence als Baustein des Geschäftserfolgs. In: Mucksch, H./Behme, W. (Eds.): Das Data-Warehouse-Konzept. Architektur – Datenmodelle – Anwendungen. Wiesbaden 1996, pp. 27–46.
Berwick, D. M./Nolan, T. W./Whittington, J.: The Triple Aim: Care, Health, and Cost. In: Health Affairs (Project Hope) 27(3)/2008, pp. 759–769.
Bissantz, N.: Bella berät: 75 Regeln für bessere Visualisierung. Nuremberg 2010.
Bundesverband Managed Care e. V.: Der Innovationsfonds kommt! Online: http://www.bmcev.de/bundesverband-managed-care-ev/news/detailansicht/archiv/2013/dezember/article/565/der-innovationsfonds-kommt/, [accessed on: December 19, 2013].
BQS Institut für Qualität & Patientensicherheit GmbH: BQS-register140d. Online: http://www.bqs-register140d.de/, [accessed on: January 31, 2009].
Donabedian, A.: Evaluating the Quality of Medical Care. In: Milbank Quarterly 83(4)/2005, pp. 691–729.
Gerths, H./Hichert, R.: Professionelle Geschäftsdiagramme nach den SUCCESS-Regeln gestalten. Freiburg im Breisgau 2011.
GKV-Spitzenverband: GKV-Datenaustausch. Elektronischer Datenaustausch in der gesetzlichen Krankenversicherung. Online: http://www.gkv-datenaustausch.de/, [accessed on: February 22, 2013].
Gröbner, M.: Controlling mit Kennzahlen in vernetzten Versorgungsstrukturen des Gesundheitswesens. Dissertation. Munich 2007. Online: http://ub.unibw-muenchen.de/dissertationen/ediss/groebner-martin/inhalt.pdf, [accessed on: January 31, 2009].
Hauner, H./Köster, I./Ferber, L.: Prävalenz des Diabetes mellitus in Deutschland 1998–2001. Sekundärdatenanalyse einer Versichertenstichprobe der AOK Hessen/KV Hessen. In: DMW – Deutsche Medizinische Wochenschrift 128(50)/2003, pp. 2632–2638. doi: 10.1055/s-2003-812396.
Hermann, C./Hildebrandt, H./Richter-Reichhelm, M. et al.: Das Modell ‚Gesundes Kinzigtal‘. Managementgesellschaft organisiert Integrierte Versorgung einer definierten Population auf Basis eines Einsparcontractings. In: Gesundheits- und Sozialpolitik 5, Nr. 6. Baden-Baden 2006, pp. 11–29.
Hildebrandt, H./Hermann, C./Knittel, R. et al.: Gesundes Kinzigtal Integrated Care. Improving Population Health by a Shared Health Gain Approach and a Shared Savings Contract. Online: http://www.ijic.org/index.php/ijic/article/view/539/1050, [accessed on: February 22, 2013].
Hildebrandt, H./Schulte, T./Stunder, B.: Triple Aim in Kinzigtal, Germany. Improving Population Health, Integrating Health Care and Reducing Costs of Care – Lessons for the UK? In: Journal of Integrated Care 20/2012, pp. 205–222.
Janus, K./Amelung, V. E.: Integrated Health Care Delivery Based on Transaction Cost Economics. Experiences from California and Cross-National Implications. In: Savage, T. G./Chilingerian, A. J./Powell, M. F. (Eds.): International Health Care Management. Vol. 5. Advances in Health Care Management. Greenwich 2005.
Mühlbacher, A.: Integrierte Versorgung. Management und Organisation. Eine wirtschaftswissenschaftliche Analyse von Unternehmensnetzwerken der Gesundheitsversorgung. Bern 2002.
Mühlbacher, A./Ackerschrott, S.: Die Integrierte Versorgung. In: Wagner, K./Lenz, I. (Eds.): Erfolgreiche Wege in die integrierte Versorgung. Eine betriebswirtschaftliche Analyse. 1st edition. Stuttgart 2007, pp. 17–41.
OptiMedis AG: Die fortschrittliche IT-Lösung für Ärztenetze. Online: http://www.optimedis.de/integrierte-versorgung/leistungsanbieter/it-vernetzung, [accessed on: March 31, 2014].
OptiMedis AG: KBV testet Qualitätsindikatoren in “Gesundes Kinzigtal.” 2010. Online: http://www.optimedis.de/images/.docs/pressemitteilungen/optimedis_pm_20100525_kbv-test.pdf, [accessed on: March 31, 2014].
Pimperl, A./Schulte, T.: Balanced Score Card. Managementunterstützung für die Integrierte Versorgung? 9. DGIV Bundeskongress, Berlin, 2012. Online: http://www.dgiv.org/Veranstaltungen/10/9_DGIV_Bundeskongress/artikel,146,3,1.html, [accessed on: April 31, 2014].
Pimperl, A./Schulte, T./Daxer, C. et al.: Balanced Scorecard-Ansatz. Case Study Gesundes Kinzigtal. In: Monitor Versorgungsforschung 6(1)/2013, pp. 26–30.
Purucker, J./Schicker, G./Böhm, M./Bodendorf, F.: Praxisnetz-Studie 2009. Management – Prozesse – Informationstechnologie. Status quo, Trends und Herausforderungen. Nuremberg 2009.


Reime, B./Kardel, U./Melle, C. et al.: From Agreement to Realization: Six Years of Investment in Integrated eCare in Kinzigtal. In: Meyer, I./Müller, S./Kubitschke, L. (Eds.) in print: Achieving Effective Integrated E-Care Beyond the Silos. IGI Global, Hershey, PA 2014.
Riens, B./Broge, B./Kaufmann-Kolle, P. et al.: Bildung einer Kontrollgruppe mithilfe von Matched-Pairs auf Basis von GKV-Routinedaten zur prospektiven Evaluation von Einschreibemodellen. In: Gesundheitswesen 72(6)/2010, pp. 363–370.
Sachverständigenrat zur Begutachtung der Entwicklung im Gesundheitswesen: Sondergutachten 2012 des Sachverständigenrates zur Begutachtung der Entwicklung im Gesundheitswesen. Wettbewerb an der Schnittstelle zwischen ambulanter und stationärer Gesundheitsversorgung. 2012. Online: http://www.svr-gesundheit.de/fileadmin/user_upload/Gutachten/2012/GA2012_Langfassung.pdf, [accessed on: March 22, 2014].
Stoessel, U./Siegel, A./Zerpies, E./Körner, M.: Integrierte Versorgung Gesundes Kinzigtal – Erste Ergebnisse einer Mitgliederbefragung. In: Das Gesundheitswesen 75/08/09/2013, pp. 604–605.
Stuart, E. A.: Matching Methods for Causal Inference. A Review and a Look Forward. In: Statistical Science. A Review Journal of the Institute of Mathematical Statistics 25(1)/2010, pp. 1–21.
Vijayaraghavan, V.: Disruptive Innovation in Integrated Care Delivery Systems. 2011. Online: http://www.christenseninstitute.org/wp-content/uploads/2013/04/Disruptive-innovation-in-integrated-care-delivery-systems.pdf, [accessed on: March 31, 2014].
Windt, R./Glaeske, G./Hoffmann, F.: Lässt sich Versorgungsqualität bei Asthma mit GKV-Routinedaten abbilden? In: Monitor Versorgungsforschung 1(2)/2008, pp. 29–34.

Rainer Röhrig and Markus A. Weigand

7 Ethics

The practical handling of data, information, half knowledge, probabilities and “truths” is going to carry on increasing in the “scientific world” of medicine, also referred to as the knowledge business. The authors present the dimensions of ethics as practical philosophy in this article, using very concrete examples. In the field of tension between data protection, informational self-determination, individual responsibility and collective expectations we need both: first, an emancipated personal way of dealing with Big Data – including transparency of rights and obligations – and, second, a socially normative framework, including a law that does not leave patients and citizens standing alone between computers. What freedom of movement will patients have within diagnosed probabilities in the future? Perhaps we will also require a new form of forgetting?¹ Digital information with an expiration date, or at least with the right and the opportunity to delete it digitally? Four ethical principles can be seen as possible guides,² especially with respect to Big Data in medicine:³
1. Nonmaleficence: This refers not only to the absence of harm to the people or groups being studied, but also to the protection of (sensitive) data, as well as the privacy and anonymity of the data.
2. Beneficence: The well-being and self-determination of the people or groups being studied must be guaranteed – which applies in particular to the recognition of and respect for their values and decisions, for example regarding the question of informed consent and the protection of their right to informational self-determination.
3. Justice: The principle of treating all people equally includes the question of which people are excluded from Big Data analyses (for example due to non-use of social networks) and what consequences this has.
4. Fidelity: The principle of fidelity refers not only to informed consent or privacy issues, but also to the disclosure of research as well as taking seriously any fears and concerns on the part of users.
The issues of the justification of, responsibility for, and consequences of Big Data applications and research approaches will therefore gain far greater importance in medicine.

Abstract: The development of information technology and its introduction to medicine has made a substantial contribution toward improving medical decisions and processes. Similarly to the way there is no effect without possible side effects in drug therapy, there are also no benefits without risks when it comes to the use of information technology. These must be weighed against each other, while respecting the interests and rights of people as individuals. This article tries to present ethical challenges and the resulting requirements on the basis of case studies, in order to provide food for thought for us all to make our own assessments.

1 Meyer-Schönberger: Delete: Die Tugend des Vergessens in digitalen Zeiten. Berlin 2010.
2 Fenner: Einführung in die Angewandte Ethik. Tübingen 2010.
3 Heise: Big Data – small problems? Ethische Perspektiven auf Forschung unter Zuhilfenahme onlinebasierter Kommunikationsspuren. Online: http://www.univie.ac.at/digitalmethods/programm/bigdata-small/, [accessed on: June 1, 2014].



Content 1 Introduction  90 2 Informational Self-determination  91 2.1 Individual Benefits  92 2.2 Group Benefits  93 2.3 Indirect Impairment of Informational Self-determination  95 3 Right to Know and Not to Know  96 4 Reliability of Information  97 5 Genetic Diagnostics Act  98 Literature  98

That which has once been thought, cannot be taken back again. (Friedrich Dürrenmatt)

1 Introduction

Ethics is understood as the study of the justifiability of morality, i. e., the discussion and development of principles and criteria of action with the goal of providing a good life for all people. In applied ethics, this leads to values, recommendations on how to act, and norms. The advent of new methods and technologies, along with the opportunities that they provide, also leads to new ethical questions. As in medicine, where there is no effect without a side effect, new methods and technologies have to be assessed according to the type of use, but also according to the perspective taken, meaning that there can be no simple classification into good and bad. One of the tasks of ethics is to show the different perspectives based on common-sense examples and thereby to stimulate discussions, which form a social consensus on dealing with the new methods and technologies. Two important principles are as follows:


– Article 1 of the UN Charter of Human Rights: “All human beings are born free and equal in dignity and rights.”⁴
– The Categorical Imperative: “Act only according to that maxim by which you can also will that it would become a universal law.”⁵

Ethical considerations always have the human being at their center. Two paths of information have to be considered for an ethical assessment of data processing: control over the disclosure and use of personal data (informational self-determination), and control of the information about ourselves that we receive (i. e., the right to know and not to know). In this chapter the dialectic of problems associated with information processing will be demonstrated by means of examples. This is to encourage readers to reflect on their own position, as comprehensive or general solutions cannot always be given. For ease of comprehension, the examples are not limited to Big Data methods or individualized medicine; the principles underlying them, however, can be applied to both.

2 Informational Self-determination

Informational self-determination refers to the fundamental right to make decisions about the disclosure and use of data about oneself. A person can only make a free choice if that choice is made in an informed way, i. e., if the person has access to information regarding the consequences of his or her decisions. Due to the imbalance of knowledge between doctor and patient, one refers to so-called informed consent, which requires that the patient be adequately informed before he or she can consent to treatment or participation in a research project. This principle of consent is also followed by data protection legislation, which regulates the admissibility of the collection, processing and use of personal data.⁶ Each person's right to self-determination can, however, clash with other fundamental rights, such as the well-being of the general public, or freedom in research and teaching. It is important here to find solutions that avoid conflicts between ethical principles, or in which the benefits outweigh the risks. These questions arise especially when it turns out that existing rules are insufficient for new technical solutions.

4 United Nations: Resolution of the General Assembly: "Universal Declaration of Human Rights." 5 Immanuel Kant: "Grundlegung zur Metaphysik der Sitten." 6 § 1, § 3 (6), § 4 – German Data Protection Act in the version from January 14, 2003 (BGBl. I pg. 66), which was last amended by Article 1 of the Act from August 14, 2009 (BGBl. I pg. 2814).


2.1 Individual Benefits

The following example illustrates opportunities and risks in data processing, where the benefits for an individual person must be weighed against the risks: Regular intake is essential for the safe use of many medicinal products. Failing to take the medication or taking it too late can lead not only to a loss of effect (the rejection effect after a transplantation is no longer suppressed), but also to a paradoxical effect (rebound effect, e.g., with beta-blockers) or a "chronification" of diseases (development of resistance in the case of inadequate antibiotic levels). Particularly in an aging society with an expected increase in dementia, regular drug intake is a major challenge. Medical practice has developed means to control the proper dispensing or delivery of drugs, but not to ensure they are taken. The development of so-called smart pills (a.k.a. ingestible event markers or ingestible sensor systems) – which carry a chip that, once activated and powered by contact with gastric fluid, sends a message from inside the body – has made reminders through smartphones or remote therapy monitoring through nursing services or doctors possible, thus averting potential injury to patients.⁷, ⁸ While the benefits are obvious, one also has to think about the risks and the potential for abuse. If the same technology were used for the regular taking of contraceptives (birth control pills), then whether or not one of two women – both with expiring contracts – displayed a pause in the monthly cycle of pill intake could influence the decision about which of them is given a two-year project. Even if the chosen example is not a classic Big Data example, it nevertheless illustrates several ethical problems and solutions:
– Whether the benefits outweigh the risks, and whether a use is desirable or undesirable, depends on the use in a particular situation and not (alone) on the technology.
– Methods and technologies should be designed so that they are as robust as possible against errors and abuse (inherent safety, privacy by design).
– However, inherent safety or privacy by design is not always possible, or can be achieved only at the cost of disproportionately great effort.
The decision on how to evaluate the risk/benefit ratio for the use of a method or technology is up to the person concerned. In order for a person to make such a decision, he or she must be aware of the (residual) risks and be able to evaluate and possibly

7 Eisenberger et al.: Medication adherence assessment: high accuracy of the new Ingestible Sensor System in kidney transplants. 2013. 8 Food and Drug Administration: Medical devices; general hospital and personal use monitoring devices; classification of the ingestible event marker. 2013.


accept them. Patronization through (data protection) legislation should be avoided in order to enable free decision making. Using a specific system can lead to normative pressure on other people, which restricts the freedom of their decisions or even puts social values (in this example, the family friendliness of the company) at risk. Therefore, solutions must be found that not only enable a free decision to accept the residual risks, but also protect those who do not accept them. This can often be achieved only through prohibitions of data use, in which case the detection of an offense, and thus the control and enforcement of such prohibitions, is often practically infeasible.

2.2 Group Benefits

The use of new technologies or data does not have to serve the person concerned directly. In particular, the analysis of large amounts of data can also be of use to a group of people or society. Here are several examples: The spread of infections (endemics, epidemics, pandemics) represents a danger for many citizens. Therefore, the occurrence of infectious diseases is monitored in order to investigate the penetration and efficacy of vaccines against diseases such as measles⁹ and influenza,¹⁰ or the source of infection and the ways by which they spread – for example in the case of enterohemorrhagic Escherichia coli (EHEC) or Shiga-toxin-producing E. coli (STEC)¹¹ – in order to take measures against their continued spread, thus protecting the rest of the population. In this case the group benefit is judged to be higher than the individual's right to informational self-determination. This is reflected in exemptions and reporting requirements in the Infection Protection Act.¹² However, evaluations of "normal" treatment data, so-called routine data, can also provide valuable information to improve health care, as follows. It has long been known that most bronchial infections have no bacterial cause and, accordingly, antibiotic therapy is not only ineffective but potentially harmful due to the possibility of the development of resistance by the affected person and other contact persons. An evaluation of the prescription behavior shows that between

9 Knol et al.: Euro Surveill. 2013 (9). 10 Uphoff et al.: PLoS ONE 2011; 6(7):e19932. (12). 11 Hauri et al.: Euro Surveill. 2011 (2). 12 § 6ff Infection Protection Act from July 20, 2000 (BGBl. I pg. 1045), which was last amended by Article 2 Section 36 and Article 4 Section 21 of the Act from August 7, 2013 (BGBl. I pg. 3154).


1996 and 2010, despite such knowledge, 60–80 % of all bronchitis patients were prescribed antibiotics.¹³ This information can be used to improve attempts to educate doctors and patients, and hence the quality of healthcare. The benefits of such studies are undisputed. However, the same data could be used by a pharmaceutical manufacturer to analyze medical practices to determine which ones might prove most receptive to targeted advertising for an antibiotic drug for the treatment of bacterial respiratory infections. This example therefore also helps to illustrate different ethical and legal aspects:
– The informational self-determination of patients should also apply to the purpose for which their personal data will be used. This can be achieved through informed consent by patients who wish to allow their treatment data to be used for a specific research purpose (specific consent) or for research in general (broad consent).
– Routine data is often also used retrospectively for scientific analyses. In such cases the consent of the persons concerned cannot be obtained at all, or only with extraordinary effort. In this case, the public interest and the freedom of research must be weighed against the right to informational self-determination.
– When using treatment data, great care must be taken not to reveal any patient's secrets.¹⁴ Therefore, as much personally identifiable data as possible should be removed before disclosure and evaluation (anonymization).
– The data collected here not only affects the patients, but also relates to the treating physicians. In this case it must be evaluated to what degree the public interest in the disclosure of quality indicators by doctors or hospitals for benchmarking purposes outweighs the doctors' right to informational self-determination.
– Many questions can be answered only by linking the data with other data sets.¹⁵ The expansion of data sets can be carried out both on the level of individual data sets and via aggregated data. The first case requires at least one individual key (pseudonym) that allows personal allocation. In the second case there is a risk that the expansion of the data sets will allow re-identification through the additional data. One must carefully consider whether the public interest outweighs the right to informational self-determination in this case, too.

13 Barnett/Linder: Antibiotic prescribing for adults with acute bronchitis in the United States, 1996–2010. In: JAMA 2014; 311(19), pp. 2020–2022. 14 Medical confidentiality – § 203 Penal Code in the version from November 13, 1998 (BGBl. I pg. 3322), which was last amended through Article 1 of the Act from April 23, 2014 (BGBl. I pg. 410). 15 Swart/Stallmann/Powietzka/March: Datenlinkage von Primär- und Sekundärdaten: Ein Zugewinn auch für die kleinräumige Versorgungsforschung in Deutschland? In: Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 2014; 57(2), pp. 180–187.


As is evident from these points, particularly in the case of secondary data analyses, it is necessary to weigh the public interest against the individual's right to informational self-determination. In order to ensure this, doctors have a legal obligation to consult with an ethics committee before beginning a research project with data or biomaterials that can be related to individuals.¹⁶ This compensatory confidence-building measure of approval by an ethics committee is required by most journals for publication. However, due to regulations of professional law, this only applies to doctors and psychologists.

2.3 Indirect Impairment of Informational Self-determination

The previous examples all considered the collection and processing of the data of the persons concerned. With the increase in stored data, an information network is increasingly developing that lays a pattern over society. This has implications for the informational self-determination of individuals:
– In social networks, members' address books are often read out. This creates a shadow database of acquaintances – people who are not members of the social network and have therefore not given their consent to the use of their data. This effectively curtails their right to informational self-determination.
– The more people with a particular characteristic (age, profession, etc.) can be found on the Internet, the more those for whom no information is available will stick out (a white spot in a dense network). Numerous situations can be imagined in which this could be stigmatizing. This can lead to normative pressure on the individual person.
While the construction of shadow databases is (theoretically) regulated by law,¹⁷ the normative pressure caused by an emerging information network is a general social development that cannot be solved by technical means. Therefore, the protection of individuals in society, especially from the state, must be ensured by common values and non-discrimination rules.¹⁸ Another curtailment of informational self-determination by third parties arises with the development and increasingly widespread availability of genetic testing. The genome of a human being provides not only information about the person; it also contains information about their next of kin. Therefore, when storing large

16 § 15 Medical Association’s professional code of conduct or the respective codes of conduct by the relevant federal medical chambers. 17 This is practically unenforceable due to the globalization of the information society. 18 As a thought experiment one simply has to imagine the course of German history if an information network the way it exists today had existed in 1933–1945 or 1989.


amounts of genetic data, shadow databases will also be generated about individuals who have decided against analyses or the storage of their genetic data. In this case the right to informational self-determination of a person who wishes to store personal genome data for future therapy options conflicts with the right to informational self-determination of the person who has decided against the storage of his or her genome data.

3 Right to Know and Not to Know

In addition to controlling what information is revealed about ourselves, we also have a right to decide what information about ourselves we wish to receive. There is a right to know and not to know. It is part of a person's freedom to make choices about such matters. Before an operation, one is informed of the risks associated with surgery and anesthesia. Patients have the right to decide whether they want to be informed about all relevant risks. Especially when it concerns unavoidable operations, many patients choose not to be informed in detail about the potential risks. In such a case the information is not relevant to their decision as to whether the operation should proceed or not, and would only increase the natural fear of the impending event. Especially when one is unable to change an unpleasant fact, many people find the information about it to be a burden that limits their quality of life, compared with not knowing about it. Other people, however, find that uncertainty is particularly unpleasant and limits their quality of life. It is therefore important to have the right to decide how much information we are to receive. This also applies to situations in which information can lead to a decision. Today, a number of diagnostic procedures are available during pregnancy (e.g., amniocentesis for trisomy 21 [Down's syndrome]). If the mother or parents are sure they want to have the child irrespective of the result of any tests, not only the test itself but also a possible positive (pathological) result would constitute an (unnecessary) emotional burden. What was previously accepted as fate by the mother or parents and their environment now becomes part of their own responsibility to decide. Consenting to undergo an examination therefore involves a broader decision. This shows that consideration of the possible consequences of acquiring information, along with appropriate consultation in a medical environment, should not follow an examination, but should take place before it. In addition to the obvious direct consequences of providing information, there are also indirect ones. If it is established that a person has a high probability of a poor prognosis (e.g., a mutation in the BRCA1 gene, which points to a high probability of breast cancer), this will have consequences for the person's life planning, affecting not only the individual, but also third parties:

– What would a boss decide if that person was up for promotion, tied to an expensive training program?
– Can the affected person still take out a high-paying life insurance or disability insurance policy?

In these cases, the informational self-determination of the person concerned is contrary to the legitimate interests of third parties from whom the person expects a benefit. This raises the question of whether the person concerned must in this case disclose the information or whether they may conceal it and thus offer false testimony. Since the affected person is gaining a pecuniary advantage from suppressing the facts, this actually corresponds to fraud.¹⁹

4 Reliability of Information

The examples above assumed a secured, monocausal cause-and-effect relationship, and thus almost certain prognoses. In complex biological systems, however, this is rare, due to the many external and internal factors. Therefore, probabilities have to be used in most cases. As a consequence, for (medical) decisions one always has to be aware of the uncertainty on which the decision is based. The difficulties and limitations are discussed in the chapters that focus on mathematical methods, and in other literature.²⁰ That is why only two points will be addressed here. From an epistemological point of view, the analysis of existing data sets cannot be used for anything other than generating hypotheses. Particularly when there are many influencing factors and only a few cases, one runs the risk of finding apparent connections that do not in fact exist. Certainty can be achieved only by attempting to falsify the hypotheses in an experiment, or through a prospective study in which a future event is predicted. Therefore, findings that have been made with the help of Big Data on the basis of large existing data pools must be regarded critically until they are scientifically confirmed. In medical studies one differentiates between the quality of a test – its ability to distinguish conditions (sensitivity and specificity) – and the probability that an individual's condition will be correctly diagnosed (positive and negative predictive value). The quality of a test, i. e., the probability with which a sick person is categorized as sick and a healthy person as healthy, can usually be represented simply and comprehensibly. Both the quality of the test and the likelihood of being ill (prevalence) flow into the probability with which a person classified as ill really is ill. This is

19 § 263 German Penal Code in the version from November 13, 1998 (BGBl. I pg. 3322), which was last amended by Article 1 of the Act from April 23, 2014 (BGBl. I pg. 410). 20 A good and easy-to-understand overview of the pitfalls in epidemiology is provided by Beck-Bornholdt/Dubben: "Der Hund, der Eier legt". 2001.


more difficult to understand and often leads to wrong interpretation, and thus an incorrect basis for decisions, even among physicians.²¹ Prognostic systems that are well suited to assessing and comparing large groups (e.g., in quality assurance) can fail completely in the case of individual forecasts (treatment decisions). Here, the analysis of large amounts of data can help in the estimation of the prevalence in the affected patient population and thus improve the individual prognostic value of a test. On the other hand, the estimation itself must be tested for its quality, to ensure that its inherent uncertainty does not increase the random error in the prognosis. If a specific number or a prognosis is shared with a patient or doctor, decisions will be based on it, even if it is known to be uncertain (anchoring effect).²² Thus it is best to consider to what extent information can improve a decision or whether it is actually safer to suppress such information. The person who prepares the information takes on some of the responsibility for the decision to be made on the basis of the information.
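To make the interplay of test quality and prevalence concrete, here is a minimal worked sketch (the numbers are purely illustrative and not taken from any real study) of how sensitivity, specificity and prevalence combine, via Bayes' theorem, into the probability that a positive result is correct:

def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a person with a positive test result is really ill."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# A seemingly good test (90 % sensitivity, 90 % specificity) applied to a
# rare condition (1 % prevalence): only about 8 % of those who test
# positive are actually ill.
print(positive_predictive_value(0.90, 0.90, 0.01))  # ~0.083

The same test applied to a population with 30 % prevalence yields a positive predictive value of about 79 % – which is why a prognostic figure can never be interpreted independently of the population it refers to.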

5 Genetic Diagnostics Act

Many of these ethically relevant questions in the context of genetic analyses are legally specified in the German Genetic Diagnostics Act.²³ Among other things, it regulates informed consent and consultation by a qualified physician once examination results are at hand, and stipulates that no person may be disadvantaged due to his or her genetic characteristics. The dilemma of insurance fraud in the case of a known genetic predisposition is pragmatically solved in the Act through a proportionality clause: up to a maximum insured amount, a statement on existing findings may be refused.

Literature

Allgemeine Erklärung der Menschenrechte: A/RES/217 A (III), 1948. Online: www.un.org/depts/german/menschenrechte/aemr.pdf, [accessed on: May 25, 2014].
Barnett, M. L./Linder, J. A.: Antibiotic Prescribing for Adults with Acute Bronchitis in the United States, 1996–2010. In: JAMA 311(19)/2014, pp. 2020–2022.

21 Gigerenzer et al.: Helping doctors and patients to make sense of health statistics. In: Psychological Science in the Public Interest 2007; (8), pp. 53–96. 22 Madhavan: Cognitive anchoring on self-generated decisions reduces operator reliance on automated diagnostic aids. In: Hum Factors 2005; 47(2), pp. 332–341. 23 Genetic Diagnostics Act from July 31, 2009 (BGBl. I pp. 2529, 3672), which was amended by Article 2 Section 31 and Article 4 Section 18 of the Act from August 7, 2013 (BGBl. I pg. 3154).


Beck-Bornholdt, H./Dubben, H.: Der Hund, der Eier legt. Erkennen von Fehlinformation durch Querdenken. Reinbek 2001.
Eisenberger, U./Wüthrich, R. P./Bock, A. et al.: Medication Adherence Assessment: High Accuracy of the New Ingestible Sensor System in Kidney Transplants. In: Transplantation 96(3)/2013, pp. 245–250.
Food and Drug Administration, HHS: Medical Devices; General Hospital and Personal Use Monitoring Devices; Classification of the Ingestible Event Marker. Final Order. In: Federal Register 78(95)/2013, pp. 28733–28735.
Gigerenzer, G./Gaissmaier, W./Kurz-Milcke, E. et al.: Helping Doctors and Patients to Make Sense of Health Statistics. In: Psychological Science in the Public Interest 8/2007, pp. 53–96.
Hauri, A. M./Gotsch, U./Strotmann, I. et al.: Secondary Transmissions during the Outbreak of Shiga Toxin-Producing Escherichia coli O104 in Hesse, Germany, 2011. In: Eurosurveillance 16(31)/2011.
Kant, I.: Grundlegung zur Metaphysik der Sitten. Akademie-Ausgabe Kant Werke IV, 1968, 421.
Knol, M./Urbanus, A./Swart, E.: Large Ongoing Measles Outbreak in a Religious Community in the Netherlands since May 2013. In: Eurosurveillance 18(36)/2013.
Madhavan, P./Wiegmann, D. A.: Cognitive Anchoring on Self-Generated Decisions Reduces Operator Reliance on Automated Diagnostic Aids. In: Human Factors 47(2)/2005, pp. 332–341.
Swart, E./Stallmann, C./Powietzka, J./March, S.: Datenlinkage von Primär- und Sekundärdaten: Ein Zugewinn auch für die kleinräumige Versorgungsforschung in Deutschland? In: Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz 57(2)/2014, pp. 180–187.
Uphoff, H./an der Heiden, M./Schweiger, B. et al.: Effectiveness of the AS03-Adjuvanted Vaccine against Pandemic Influenza Virus A/(H1N1) 2009 – A Comparison of Two Methods; Germany, 2009/10. In: PLoS ONE 6(7)/2011, p. e19932.

Karola Pötter-Kirchner, Renate Höchstetter and Thilo Grüning

8 The New Data-Supported Quality Assurance of the Federal Joint Committee: Opportunities and Challenges

The G-BA (Gemeinsamer Bundesausschuss), Germany's Federal Joint Committee, oversees the performance of the country's statutory health insurance, whose total health spending amounted to EUR 170 billion in 2012. By comparison, the Helios hospital group's turnover amounted to approximately EUR 3.4 billion in 2013, Techniker Krankenkasse had a turnover of approximately EUR 16 billion, SAP had a worldwide turnover of EUR 17 billion, and the state revenues of Austria amounted to EUR 156 billion in 2013. That makes one thing very clear: the G-BA is by no means "any old authority"; rather, it is the central decision-making body of the joint self-governing healthcare institutions, with enormous economic weight. The "establishment of a new institute for quality assurance and transparency" is provided for in the 2013 coalition agreement – but unfortunately there is not really much news to report on this issue, even though some may already have called it a Big Data institute. Critics already expect that the implementation will be at least as unwieldy as the title of the proposed undertaking… Above all, this article provides a good look at the data-related tasks of the G-BA and ventures a glimpse into the future of "Big" Data.

Abstract: The German Federal Joint Committee has a number of statutory functions relating to the quality assurance (QA) of medical care. Sector-based, data-driven QA in what are currently 30 inpatient service areas as well as in dialysis, carried out by contract doctors, has been established for years. Legislative changes since 2007 have introduced a cross-sectoral QA and created the conditions in which to merge multiple records of patients from different treatment locations and treatment times using patient-related pseudonyms. The opportunities and challenges presented by this new type of QA are described.

Content
1 Legal Mandate of the Federal Joint Committee
2 Quality Assurance Through the Federal Joint Committee
2.1 Tasks in Quality Assurance
2.2 Data-Driven, External, Inpatient Quality Assurance
2.3 New Approaches in Data-Driven Quality Assurance: A Cross-Sectoral Approach
2.4 Pseudonymization of Patient-Identifying Data Through a Trust Authority
3 Opportunities and Challenges of Comprehensive Data Usage in the New Quality Assurance
3.1 New Data Sources: Enhanced Options for Quality Assurance
3.1.1 Social Data in Health Insurance Companies
3.1.2 Introduction of Patient Surveys
3.2 Use of an Electronic Health Card and Telematics Infrastructure for the Triggering of QA
4 Outlook
4.1 Expectations from Politics and the Public
4.2 Opportunities and Challenges of the New Data-Driven Quality Assurance
Literature

1 Legal Mandate of the Federal Joint Committee

The Federal Joint Committee (Gemeinsamer Bundesausschuss, G-BA) is the highest decision-making body of the joint self-government of health insurance companies and service providers in healthcare (physicians, dentists, psychotherapists, hospitals) in Germany. The legal basis for the G-BA and its work is the German Fifth Social Code (SGB V). The G-BA essentially fulfils its duties by determining guidelines that are legally binding for all health insurance companies and players in public health insurance. The G-BA thus fulfils essential tasks in the area of quality assurance (QA), including the promotion of quality in contractual medical, dental and inpatient care, and uses various data stores, some of which are comprehensive, for these purposes.

2 Quality Assurance Through the Federal Joint Committee

2.1 Tasks in Quality Assurance

Since 2000 the tasks in the area of QA assigned to the self-government by legislators have steadily increased. In accordance with § 135a SGB V, all service providers in inpatient and outpatient care have been obligated to implement QA and the relevant internal quality management since 2003. In accordance with § 137 SGB V, the G-BA establishes minimum requirements for the quality of the structure, process and outcome of medical services. It determines


minimum amounts for services where the quality of the treatment outcome is particularly dependent on the number of services provided, and decides content, scope and format of the data to be published in an annual quality report by hospitals on their structures and services, as well as their quality-related activities and results. The G-BA defines requirements for internal facility quality management and sets regulations for quality inspection and assessment of the contract medical care in accordance with § 136 SGB V. In addition, the data-driven, external (comparative) QA represents an integral part of the G-BA’s regulations regarding QA. In particular in inpatient care, but also in contractual medical dialysis, data-driven QA procedures have been established for years.

2.2 Data-Driven, External, Inpatient Quality Assurance

Due to the German government's directive on quality assurance procedures in hospitals,¹ since 2004 all German hospitals have been required to document their medical and nursing data in 30 selected performance areas with more than 460 quality indicators, and to make them available for a national quality comparison. The transmitted data, which currently comprises a total of about four million data sets from more than 1,650 hospitals, is subjected to a multistage data validation procedure and evaluated at both the federal and the state level. Each individual hospital receives an annual report on its results. These results are shown in comparison with the anonymized results of the other hospitals within the relevant federal state (benchmarking). Statistical discrepancies, determined on the basis of reference ranges for a hospital, are then subjected to a root-cause analysis and examined by external experts as part of a review process (the so-called structured dialogue). In this way, statistical discrepancies that represent an actual qualitative abnormality can be identified, and measures can be initiated to improve quality, both through the external review process and via internal quality management. The anonymized results of all hospitals in these QA procedures are published in an annual quality report.² In addition, the hospitals are also required to publish most of these results in their non-anonymized, hospital-related quality report.³ Specific hospital comparisons are thus made possible.
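The benchmarking logic just described can be sketched roughly as follows (the indicator, the reference range and the hospital figures are invented for illustration; the real procedures define more than 460 indicators, each with its own reference range):

reference_range = (0.0, 0.05)  # complication rates above 5 % count as discrepant

hospitals = {
    "Hospital A": {"cases": 412, "complications": 9},
    "Hospital B": {"cases": 127, "complications": 11},
}

for name, figures in hospitals.items():
    rate = figures["complications"] / figures["cases"]
    low, high = reference_range
    if not low <= rate <= high:
        # A statistical discrepancy alone is not a quality verdict: it
        # triggers the structured dialogue, in which external experts
        # examine whether an actual qualitative abnormality exists.
        print(f"{name}: rate {rate:.1%} outside the reference range")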

1 Richtlinie über Maßnahmen der Qualitätssicherung im Krankenhaus. Online: https://www.g-ba.de/downloads/62-492-790/QSKH-RL_2013-06-20.pdf, [accessed on: April 23, 2014]. 2 AQUA-Institut: Quality report. Online: https://www.sqg.de/themen/qualitaetsreport/index.html, [accessed on: April 23, 2014]. 3 Reference database on the machine-usable quality reports of hospitals. Online: http://www.g-ba-qualitaetsberichte.de/, [accessed on: June 24, 2014].


2.3 New Approaches in Data-Driven Quality Assurance: A Cross-Sectoral Approach

In 2007, the GKV-Wettbewerbsstärkungsgesetz (GKV-WSG), an act to strengthen competition in German statutory health insurance, revoked the sectoral approach in QA and provided regulations for the introduction of cross-sectoral QA in § 137 SGB V. This happened against the background of the political objective of increasing inter-sectoral healthcare and thereby improving efficiency.⁴ However, cross-sectoral QA also makes sense from a medical point of view: a sectoral consideration and assessment of service quality is not sufficient to assess the quality of outcomes, given shorter hospital stays or treatments by various service providers at different times. Given the rising number of chronic illnesses with long disease processes and cross-sectoral treatment episodes, as well as the increasing number of services provided both as inpatient and as outpatient care, a cross-sectoral reorientation of QA must be seen as vital for the future. At the same time, a stronger scientific foundation for QA was pursued by entrusting a professionally independent institution with the organization and execution of statutory QA within the remit of the G-BA. According to § 137a SGB V, this institution has the task of developing the necessary indicators for measuring and representing healthcare quality, if possible across sectors, and of taking part in the execution of QA across all facilities (data acquisition, acceptance, processing, analysis and evaluation). The G-BA began with the development of the structures and regulations necessary for cross-sectoral QA: after carrying out a Europe-wide tendering procedure, it tasked an institution (according to § 137a SGB V) with developing cross-sectoral quality indicator sets in various medical fields (colorectal cancer and nosocomial infections, among others)⁵ and created the normative basis in the form of guidelines for quality assurance across facilities and sectors (Richtlinie zur einrichtungs- und sektorenübergreifenden Qualitätssicherung, Qesü-RL).⁶ In their first section, these guidelines set out the basic regulations and define the organizational structure, the process, and in particular the data flow procedure (Fig. 8.1). The theme-specific provisions relating to the various QA procedures are defined in the second section.

4 Bundestag print 16/3100. Online: http://dipbt.bundestag.de/dip21/btd/16/031/1603100.pdf, [accessed on: April 23, 2014]. 5 Further commissioned QA procedures: see G-BA business report 2012, p. 103. Online: https://www.g-ba.de/downloads/17-98-3514/2013-08-23_G-BA-Geschaeftsbericht-2012-final_bf.pdf, [accessed on: April 23, 2014]. 6 Guidelines for cross-facility and cross-sectoral quality assurance. Online: http://www.g-ba.de/informationen/richtlinien/72/, [accessed on: April 23, 2014].

[Fig. 8.1: Data flow Qesü-RL – quality data, administrative data and patient-identifying data travel from hospitals and medical practices via the LQS/LKG, KV/KZV and a trust authority, which replaces patient- and service-provider-identifying data with the pseudonyms PSN and LEP, to the federal evaluation facility, in transport-encrypted containers. Source: http://www.g-ba.de/informationen/richtlinien/72/]

[Fig. 8.2: Data unification across time and sectors by means of patient pseudonyms – QS data I–IV from operation, inpatient treatment and outpatient follow-up treatment each carry the insurance number, which the medical advisory board pseudonymizes; the data sets are then merged by means of the standardized pseudonym into QS statements/an evaluation of quality. Source: Authors' illustration.]


The introduction of the Qesü-RL cleared the path for a new form of QA, which enables more than just point-in-time (cross-sectional) data acquisition and evaluation of medical performance. For the first time, the merging and longitudinal analysis of a patient's different data sets, collected as part of a QA procedure in different places and at different times, is possible using a single patient-related pseudonym (Fig. 8.2).
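In code, this merging step amounts to grouping records by the shared pseudonym; the following minimal sketch uses invented field names and records purely for illustration:

from collections import defaultdict

# QA data sets delivered from different places and at different times;
# the patient pseudonym (PSN) is the only identifier they share.
qa_records = [
    {"psn": "a7f3", "source": "hospital", "step": "operation"},
    {"psn": "a7f3", "source": "practice", "step": "outpatient follow-up"},
    {"psn": "b91c", "source": "hospital", "step": "inpatient treatment"},
]

courses = defaultdict(list)
for record in qa_records:
    courses[record["psn"]].append(record)

# Each entry now represents one patient's cross-sectoral treatment
# course, without any directly identifying data being visible to the
# evaluating facility.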

2.4 Pseudonymization of Patient-Identifying Data Through a Trust Authority

A prerequisite for the implementation of this approach spanning time and sectors in QA was a change in the data protection regulations. § 299 SGB V, introduced with the GKV-WSG, laid the legal foundations for the ability to collect, process and use personal data for purposes of QA without the need for the consent of the person concerned. This is tied to data protection prerequisites: in its regulations the G-BA must set out the purpose of, as well as the necessity for, this (from a data protection point of view) highly sensitive data for the specific purpose of QA, and provide for the pseudonymization of any data by which a patient can be identified. The lawful and proper pseudonymization of data that can be linked to a person is carried out by an independent trust authority (independent in terms of location, personnel and organization), which was commissioned by the G-BA after a Europe-wide tendering procedure. It has the task of accepting and pseudonymizing all data sets with person-identifying data within the framework of the intended data flow (see Fig. 8.1). The basis for the creation of a pseudonym is the lifetime health insurance number, which is used today for practically all persons with statutory health insurance cover. The methodological approach to pseudonymization was defined in accordance with the provisions of § 299 SGB V and taking into account the recommendations of the German Federal Office for Information Security. In the process, the pseudonymization algorithm (the cryptographic hash function RIPEMD-160) was combined with a procedure for determining the pseudonymization key to be used for each specific QA procedure. This key is generated by the independent trust authority using a true random number generator (non-deterministic random bit generator) and is known only to the trust authority. Moreover, the evaluation of the data sets obtained at different places and times and merged by means of the pseudonym may only be carried out by an independent authority, and only for the purpose of QA. This task is carried out by the institution under § 137a SGB V, the federal evaluation facility according to the Qesü-RL.
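How such a keyed pseudonym derivation might look can be sketched as follows (a minimal illustration, not the trust authority's actual implementation; the insurance number and key are hypothetical, and the availability of the "ripemd160" algorithm depends on the local OpenSSL build):

import hmac
import secrets

# Procedure-specific secret key: generated once by the trust authority
# from a non-deterministic random bit generator and known only to it.
procedure_key = secrets.token_bytes(32)

def pseudonymize(insurance_number, key):
    """Derive a patient pseudonym from the lifetime insurance number."""
    return hmac.new(key, insurance_number.encode("utf-8"), "ripemd160").hexdigest()

# The same insurance number always yields the same pseudonym, so data
# sets from different providers and times can be merged without the
# evaluator ever seeing the number itself.
assert pseudonymize("A123456789", procedure_key) == pseudonymize("A123456789", procedure_key)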


3 Opportunities and Challenges of Comprehensive Data Usage in the New Quality Assurance

After the establishment of the necessary structures and regulations and the development of the first cross-sectoral QA procedures (indicator sets) by the institution in accordance with § 137a SGB V, the G-BA began trial operations in preparation for a comprehensive implementation of cross-sectoral QA procedures. These confirmed that the proposed data flow, including the pseudonymization process, the merger of data sets and the interaction of the various players, can largely be implemented. However, certain problems also became evident: the biggest challenge is the triggering of the QA documentation. This refers to the automatically generated prompt that informs the service provider, for a given treatment, of the requirement to document QA data. Triggering must occur with sufficient sensitivity and specificity, both at the initial triggering of a QA case and at any subsequent triggering during the remainder of treatment. In inpatient care this is possible thanks to the use of ICD/OPS codes together with the – in this case binding – coding guidelines, and has proven itself over many years in external inpatient QA. In contrast, triggering with medical practitioners has so far proven difficult, as binding coding guidelines are lacking and codes are used inconsistently. However, addressing the triggering issue is a vital prerequisite for a functioning cross-sectoral QA. The following section describes possible solutions.
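A trigger check of this kind can be sketched as follows (the QA procedure and its ICD/OPS trigger codes are hypothetical; real trigger sets are defined in the theme-specific regulations of the Qesü-RL):

# Hypothetical trigger definition for one QA procedure.
qa_triggers = {
    "hip_replacement_qa": {"icd": {"M16.0", "M16.1"}, "ops": {"5-820"}},
}

def triggered_procedures(icd_codes, ops_codes):
    """Return the QA procedures whose documentation duty a case triggers."""
    hits = []
    for procedure, trigger in qa_triggers.items():
        if set(icd_codes) & trigger["icd"] or set(ops_codes) & trigger["ops"]:
            hits.append(procedure)
    return hits

# A documentation prompt is generated as soon as a matching code appears:
print(triggered_procedures(["M16.0"], []))  # ['hip_replacement_qa']

The sensitivity/specificity problem named above is visible even in this toy version: the check is only as good as the codes fed into it, which is why inconsistent coding in outpatient practices undermines the trigger.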

3.1 New Data Sources: Enhanced Options for Quality Assurance

3.1.1 Social Data in Health Insurance Companies

New possibilities and insights for QA as a whole are expected from the use of data stores that health insurance companies maintain primarily for billing purposes. These very comprehensive data collections can be used both to circumvent the triggering problems described above and to capture QA-relevant data that cannot be obtained via the previous documentation by the service providers (e.g., "death after discharge"). In addition, the use of other billing data is possible (for example from rehabilitation and care facilities, or pharmacies), which thus far has not been available for the QA of the G-BA. The use of this data source has not yet been developed, and was made possible only through an amendment of § 299 SGB V (insertion of paragraph 1a) by the Statutory Health Insurance Healthcare Structure Act in 2012. This empowered and obliged health insurance companies to process or use social data for purposes of


QA, in so far as this data and its recipients, as well as its necessity for QA, are determined in guidelines by the G-BA.

[Fig. 8.3: Data flow model of social data (Qesü-RL) – health insurance companies deliver patient-identifying data (PID), quality assurance data (QD) and administrative data (AD) to a data collection facility (DAS KK), which pseudonymizes the health insurance and service provider identifiers (KKP, LEP); the trust authority (VST) then replaces the patient-identifying data with the patient pseudonym (PSN) before the data reaches the federal evaluation facility (BAS), with transport encryption or a safe transmission path on each leg. Source: http://www.g-ba.de/informationen/richtlinien/72]

The G-BA has integrated this usage in the Qesü-RL including the corresponding data flow model (Fig. 8.3). The necessary structures are currently being established. In this context there is particular need for a data collection point for the social data at the health insurance companies that takes on the data supplied by the currently 132 health insurance companies for a QA process and performs the required data set tests, technical transformations and pseudonymizations. The use of social data in the QA procedures of the G-BA will presumably take place in 2016, once the first topic-specific regulations, including the specifications for the respective QA procedures, are integrated into the appropriate guidelines. As this is always a retrospective and secondary use of data, a reliable assessment of the quality and usability of these data stores for purposes of QA can be made only after initial testing with real data. In addition to the expected benefits, the following disadvantages of social data for use in QA must be taken into account:


Secondary data generally has a lower validity than primary data (in this case, data collected specifically for the purposes of QA) – that is, it measures what it is intended to measure less accurately – and may not contain all the information required for a comprehensive risk adjustment. In addition, the often stated advantage of social data held by health insurance companies – that it is a less onerous alternative for service providers than data gathered specially for QA purposes – is valid only at first glance, because social data is itself largely based on billing data collected by the service providers, which appropriate software can already use automatically for QA purposes. Additional costs for the service provider arise only from the collection of additional data, in the event that QA requires further information that cannot be covered by social data.

3.1.2 Introduction of Patient Surveys

The inclusion of the patient perspective is an established process in the inpatient and contract medical sector in the framework of quality promotion measures and internal quality management. Patient surveys are also used extensively by the cost bearers in the area of rehabilitation.⁷ Internationally, too, patient surveys are increasingly recognized as significant. In Great Britain, for example, they have proved themselves as an early warning system for quality deficits in healthcare.⁸ The G-BA plans to integrate patient surveys as another pillar of data acquisition in QA in the future. For this purpose it has commissioned the institution according to § 137a SGB V to develop disease-specific survey instruments for the first cross-sectoral QA procedures. In addition to the development of the first validated survey instruments and the clarification of legal (data protection) basics, the definition of the data flow and the creation of the necessary structures are also required.

3.2 Use of an Electronic Health Card and Telematics Infrastructure for the Triggering of QA

In addition to establishing new data sources, the G-BA is also considering plans to use an electronic health card for data-driven, cross-sectoral QA, especially to overcome the triggering problem described above. The application of QA markers on the card or the use of the

7 Deutsche Rentenversicherung. Reha-Bericht 2012. Online: http://www.deutsche-rentenversicherung.de/cae/servlet/contentblob/235592/publicationFile/30905/rehabericht_2012.pdf, [accessed on: April 23, 2014]. 8 Report of the Mid Staffordshire NHS Foundation Trust Public Inquiry. Online: http://www.midstaffspublicinquiry.com/report, [accessed on: April 23, 2014].


telematics infrastructure is currently being discussed. The goal is to inform service providers of an existing requirement to document a patient's course of treatment. In addition to the opportunities arising from such new technical possibilities in healthcare, there are still substantial obstacles: current legislation does not yet allow the required marking without the consent of patients, and any use of the telematics infrastructure independent of a marking on the card still requires clarification of the legal options. Furthermore, the proportionality of cost and benefit must be considered: with both procedures, the card would have to be swiped for every inpatient and outpatient visit to a doctor. The use of the telematics infrastructure would additionally require a continuous online connection to a central background service and, for every visit to a doctor, a query as to whether QA documentation requirements apply in the given case.

4 Outlook

4.1 Expectations from Politics and the Public

The high level of importance that legislation attributes to QA – particularly data-based QA – has led to a gradual broadening of the competencies of the G-BA in this field over the past years, and is again highlighted by the coalition agreement signed on November 27, 2013, and the current German government proposal for the Statutory Health Insurance Financial Structure and Quality Development Act from March 26, 2014.⁹ Thus the establishment of a new Institute for Quality Assurance and Transparency is planned, which, on behalf of the G-BA, would work on QA and on the representation of the quality of healthcare, in the legal form of a foundation governed by private law. There are also plans for the continued development of the data stores published in the quality reports of hospitals, for the expansion of the possibility of comparing individual hospitals based on their quality data, and for possibly linking parts of remuneration to quality results. Finally, the use of social data held by health insurance companies and of patient surveys, as data sources that complement QA, is explicitly considered a task of the G-BA and its institute. This ensures that the existing extensive data stores can be used for purposes of QA. It also takes into account the expectations of the general public for increased transparency when it comes to data on the quality of the service providers'

9 Coalition agreement from November 27, 2013. Online: http://www.bundesregierung.de/Content/DE/_Anlagen/2013/2013-12-17-koalitionsvertrag.pdf, [accessed on: April 23, 2014].


results and on continuing possibilities of data-based decision making. This is generally possible only with exhaustive surveys, and thus contradicts the expectation, simultaneously expressed by policy makers, of less complex and costly data collection, e.g., through random sampling – because such forms of data collection are typically inadequate to the task of providing the required transparency and a nationwide comparison of service providers. In addition, they have the disadvantage that no reliable statements about the quality of service provision can be made in the event of infrequent occurrences such as complications.

4.2 Opportunities and Challenges of the New Data-Driven Quality Assurance

The possible uses of personal data and comprehensive data sources presented here have opened up new possibilities for QA. Each data source has its own specific advantages, meaning that QA procedures can be ideally configured only if all data sources (Fig. 8.4) are used. The greatest challenges concern data security and the reasonableness of the effort required to ensure it. While the QA of the G-BA has so far worked with stores of anonymized patient data, the use of merged, non-anonymized data now raises new challenges with regard to the need to protect this highly sensitive patient data. The G-BA uses numerous measures to ensure data protection according to legal requirements. However, an eye must be kept on the evaluation of merged patient data in Great Britain¹⁰ and the United States,¹¹ where it is implemented to a greater extent than in Germany and has led to controversy. Similar debates can be expected in Germany once commercial companies, for example, begin to make more use of these pools of data in the context of secondary QA data use. The focus here is on the increasing risk of deanonymization, which can be caused by the merging of patient data from different sources, as well as by an accumulation of patient information over time from individual data sets. In the context of the use of these data stores, a reasonable ratio between effort (cost) and benefit must always be kept in mind. The requirement for the assurance of quality and transparency in medical services must be weighed against the need for reasonableness in the costs involved.

10 Hoeksma: The NHS's care.data scheme: what are the risks to privacy? In: Brit Med J 2014; 348:g1547. 11 Jain/Rosenblatt/Duke: Is Big Data the new frontier for academic-industry collaboration? In: JAMA, published online April 3, 2014. doi:10.1001/jama.2014.1845


[Fig. 8.4: Various data sources for use in QA – data capture by service providers, social data at health insurance companies and patient surveys feed a specific QA procedure, whose analysis and evaluation yield the results of quality assurance. Source: Authors' illustration.]

We can safely expect that in the future, technical developments in the collection, transmission, processing and analysis of the large amounts of data required for QA will drastically reduce the time between the performance of a service and the evaluation of its quality. Feedback to the service provider and publication of the results could thus be enabled almost simultaneously with the provision of the service, effecting continuous quality improvement. In addition, the treatment results, together with a comparative assessment, could be made available to each patient in order to improve patient autonomy. The G-BA's mandate is, after all, the constant improvement of quality assurance: the fundamental purpose of QA is to make steady contributions to the quality of medical care, and thus to the health of patients.

Literature

AQUA-Institut: Quality Report. Online: https://www.sqg.de/themen/qualitaetsreport/index.html, [accessed on: April 23, 2014].
Bundestag print 16/3100. Online: http://dipbt.bundestag.de/dip21/btd/16/031/1603100.pdf, [accessed on: April 23, 2014].
Deutsche Rentenversicherung: Reha-Bericht 2012. Online: http://www.deutsche-rentenversicherung.de/cae/servlet/contentblob/235592/publicationFile/30905/rehabericht_2012.pdf, [accessed on: April 23, 2014].
Hoeksma, J.: The NHS's care.data Scheme: What Are the Risks to Privacy? In: British Medical Journal 348/2014, p. g1547.


Coalition Agreement from November 27, 2013. Online: http://www.bundesregierung.de/Content/DE/_Anlagen/2013/2013-12-17-koalitionsvertrag.pdf, [accessed on: April 23, 2014].
Reference Database on the Machine-Usable Quality Reports of Hospitals. Online: http://www.g-ba-qualitaetsberichte.de/, [accessed on: June 24, 2014].
Report of the Mid Staffordshire NHS Foundation Trust Public Inquiry. Online: http://www.midstaffspublicinquiry.com/report, [accessed on: April 23, 2014].
Guidelines on Measures for Quality Assurance in Hospital. Online: https://www.g-ba.de/downloads/62-492-790/QSKH-RL_2013-06-20.pdf, [accessed on: April 23, 2014].
Guidelines for Cross-Facility and Cross-Sectoral Quality Assurance. Online: http://www.g-ba.de/informationen/richtlinien/72/, [accessed on: April 23, 2014].
Jain, S. H./Rosenblatt, M./Duke, J.: Is Big Data the New Frontier for Academic-Industry Collaboration? In: JAMA, published online April 3, 2014. doi:10.1001/jama.2014.1845.
Further Commissioned QA Procedures: See G-BA Business Report 2012, p. 103. Online: https://www.g-ba.de/downloads/17-98-3514/2013-08-23_G-BA-Geschaeftsbericht-2012-final_bf.pdf, [accessed on: April 23, 2014].

Werner Eberhardt

9 Big Data in Healthcare: Fields of Application and Benefits of SAP Technologies

In the introductory chapter, I described Galileo Galilei, who arranged the lenses of a telescope differently and thus dramatically increased its performance potential. Perhaps the small group of people who sat together at the Hasso Plattner Institute a few years ago had also read Galileo. What if we simply call the existing design into question? Perhaps a new way of thinking, a redesign and optimization of existing technologies, and the associated change of conception was also the basis for the new "invention" of in-memory technology. Out of 3 make 2 – can that be done? What if the basic construction of the IT triad of, first, application, second, database and, third, main memory were possible in a different way? Could we make do with two components instead of three, and combine the main memory and the database? What we might dismiss as technical tinkering and playing around is in fact a milestone in information technology – because this simplification laid the cornerstone of in-memory computing. SAP is only at the beginning of understanding the potential of data analysis in medicine. But project experience already shows us today that merging existing data pools can create completely new potential, if they can be processed in real time. Indeed, this first step does not even concern the collection of new data, but simply using existing data at all – or using it better.

Content
1 Challenges that Require New Technologies
2 SAP's Vision for Healthcare
3 Information Technology Pillars
4 SAP Technology in Use
5 Summary and Outlook
Literature

1 Challenges that Require New Technologies

The demographic development in industrial nations and our desire for the best possible medical care require new concepts in the health service. They seem to be within reach against the backdrop of dramatic technological change, in which Internet-based services accompany our entire lives, large amounts of data can be processed in fractions of a second, and the results can be made available on mobile devices at low cost. The use and rapid processing of large amounts of data in healthcare can enable us to identify as yet unrecognized causes of disease as well as unhealthy


and healthy patterns from which new diagnostic, preventive and therapeutic procedures can be derived. Thus, an increased risk of type 2 diabetes was recently found among inhabitants of the American continent in a comparison of 125 genome variations from 629 individuals, carried out by a renowned research institute using SAP technology. These challenges and technological opportunities form the basis for the currently increased commitment of many IT companies in the healthcare sector. From an IT point of view, the challenges lie primarily in the following:
– The ever-growing volume of data.
– The heterogeneity of the data sources.
– The sensitivity of the data.
How can data systems be planned and implemented in the healthcare sector if the data volume doubles every 18 months? The added value arises mostly from the connection of previously separate data systems, for example if diagnostic results as well as treatment processes can be considered for specific patient groups. But how can data from completely disparate systems, users, times and standards, across national borders, be made comparable and usable, and how can we ensure that personal data will not be misused at any time? These three questions show that a whole series of innovative approaches is needed, of which the technological ones are an essential but not sufficient part. Professor Karl Max Einhäupl, CEO of the Charité in Berlin, sums up the requirements for personalized medicine when he says that a combination of distributed medical data and the analysis of millions of patient data records in seconds with the help of in-memory technology will be required.¹

2 SAP’s Vision for Healthcare

One of the objectives of SAP is to improve people’s lives. Even today, 74 % of worldwide sales and 16 trillion consumer purchases touch an SAP system. In connection with the possible applications of technologies in healthcare, this part of the SAP strategy takes on a completely new, more intense meaning. SAP will make its contribution toward promoting efficiency and effectiveness in healthcare and supporting the progress toward more precise and personalized medicine with IT innovations.

1 Einhäupl: In Praise of “In-Memory Data Management – Academia”. In: Plattner/Zeier: In-Memory Data Management (pg. V). 2012.


3 Information Technology Pillars

From SAP AG’s point of view, medicine will be transformed in the coming years and decades due to the convergence of sensor data, data from imaging procedures, genome and proteome analysis and social networks, as well as the increasing possibilities of rapid data processing and bandwidth in mobile networks. In this context, the commitment of SAP is based, among other things, on the following technological pillars:
– In-memory – highly parallel, rapid calculations with large amounts of data in the main memory of a computer, with access times accelerated a hundred-thousand-fold compared to the use of data on hard disk drives.
– Cloud – data processing as a service that is provided over the Internet, so that part of the IT landscape no longer has to be operated at the user’s location.
– Mobile – the secure connection of mobile devices.
– Semantics – the precise processing of unstructured data with the help of word analysis in documents, for example doctors’ letters.
– Analytics & Predictive – the use of efficient analytics tools, which, among other things, also enable forecasts about the expected course of a disease.

In these areas, SAP has taken on a leading position and is particularly well positioned to deliver critical contributions to the processing and use of large amounts of data in healthcare. For example, in March 2014 SAP announced that it had received an entry in the Guinness Book of Records: together with the partners BMMsoft, HP, Intel, NetApp and Red Hat, a 12.1-petabyte (12,100-terabyte) computer system was set up using SAP HANA and an SAP Sybase IQ database.² If you were to burn all the data onto DVDs, the result would be a stack roughly 3 km high. In a single secure data center not far from Heidelberg, several thousand servers are available to process customer data. Twenty-nine million users already use cloud-based SAP software today. A variety of technologies are available for the secure connection of mobile devices, the number of which now actually exceeds the entire human population on earth. In addition to data security and easier programming of applications, the SAP Sybase SQL Anywhere database, for example, enables the use of mobile devices even in remote areas without a wireless connection.

2 Bort: Software Company SAP Just Set A Guinness World Record For A Mind-Boggling Big Computer System. Online: http://www.businessinsider.com/sap-just-set-a-guinness-world-record-2014-3, [accessed on: March 22, 2014].
3 Harvey, S.: Health in rural India will never be the same. Online: http://www.forbes.com/sites/sap/2014/04/10/health-in-rural-india-will-never-be-the-same/, [accessed on: April 28, 2014].


The ability to rapidly process unstructured data semantically is part of SAP HANA; it is required for processing the various reports and notes of medical personnel. Finally, powerful analytical tools such as SAP Lumira, SAP Predictive Analysis or SAP InfiniteInsight have to be used to filter the important information out of the flood of data in order to identify trends. In future, data scientists in medical research institutions and the pharmaceutical industry will play a crucial role in using the latest technologies to translate the resulting flood of data into medical findings. As shown in Fig. 9.1 and described in the following section on the basis of selected examples, the range of applications for these technologies is very broad.

Fig. 9.1: Scope of application of SAP technologies and in particular SAP HANA. Source: SAP. (The figure groups application areas around SAP HANA: decision support for the clinician via a medical knowledge cockpit; precision medicine for the researcher via genome data for precision medicine, proteome-based diagnostics and medical insights; prescription analysis; and optimized operation for the administration via patient management (IS-H) analysis.)

In-memory technology in particular must be emphasized again because of its special significance. On average, the power of computer CPUs (central processing units) doubles roughly every 20 months. Given that the performance of individual cores in CPUs has largely stagnated since the beginning of the new millennium, this was and is only possible because chip manufacturers introduced so-called multi-core technology. For software programmers this meant that they had to deliberately include parallel and distributed processing in their programming approaches in order to make use of the increased performance. With the development of multi-core technology, the connection between the CPU, the main memory and other data input and data output devices quickly became a bottleneck, as highlighted in Tab. 9.1.⁴

Tab. 9.1: Typical access times for data input and data output devices

Activity                                        Time required in nanoseconds
Access to main memory                           100 ns
Transfer of 2,000 bytes via 1 Gb/s network      20,000 ns
Solid state disk read access                    150,000 ns
Hard drive seek                                 10,000,000 ns

Chip manufacturers reacted by integrating a direct connection between the processor and the main memory, which had also become significantly cheaper. Close collaboration with Intel and a number of other innovations – such as switching data storage from rows to columns, and various compression techniques – enabled the introduction of the ground-breaking in-memory technology SAP HANA.
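To make the row-to-column idea concrete, here is a minimal sketch (plain Python, not SAP HANA code; all table contents are invented) of why analytical queries benefit from a columnar layout and why low-cardinality columns compress well with a dictionary:

```python
# Row store: each record is stored as one unit.
row_store = [
    {"case_id": 1, "diagnosis": "diabetes", "cost": 1200.0},
    {"case_id": 2, "diagnosis": "copd",     "cost": 800.0},
    {"case_id": 3, "diagnosis": "diabetes", "cost": 950.0},
]

# Column store: one contiguous array per attribute.
col_store = {
    "case_id":   [1, 2, 3],
    "diagnosis": ["diabetes", "copd", "diabetes"],
    "cost":      [1200.0, 800.0, 950.0],
}

# An aggregation over one attribute scans a single array here ...
total_cost = sum(col_store["cost"])

# ... whereas the row store has to touch every complete record.
total_cost_rows = sum(rec["cost"] for rec in row_store)

# Dictionary compression: store each distinct value once and keep
# small integer codes per row - effective for repetitive columns.
dictionary = sorted(set(col_store["diagnosis"]))   # ['copd', 'diabetes']
codes = [dictionary.index(v) for v in col_store["diagnosis"]]  # [1, 0, 1]

print(total_cost, total_cost_rows, dictionary, codes)
```

On disk-based row stores the difference is even larger, because scanning one column drags whole rows through the narrow I/O channel described above.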

4 SAP Technology in Use

In order to provide an overview of the application possibilities of SAP technology, we will deal with the application examples shown in Fig. 9.1, moving clockwise through the different sectors, beginning with the researcher and precision medicine. Genome analysis has seen a significant upturn in recent years as a result of technical progress in sequence analysis and the corresponding reduction in cost since 2007, which will accelerate even further in the coming years. However, the information-technology processing of genome data, which, after all, consists of 3.2 billion letters for a complete sequence, is still a challenge. The result of sequence analysis for a complete exome consists of one terabyte of data in about 20,000 puzzle pieces, which a computer has to put together before determining the anomalies by comparing the result with reference data. Putting together the sequence can be sped up by a factor of 300 with SAP HANA, as was the case at the company MolecularHealth (www.molecularhealth.com), and be completed in hours. For a cancer patient waiting for an individual treatment recommendation, this constitutes significant progress, which can improve the prospects for successful treatment.

4 Plattner: A Course in In-Memory Data Management: The Inner Mechanics of In-Memory Databases. 2013.
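The comparison step at the end of that pipeline can be pictured with a toy sketch (deliberately simplified, and not MolecularHealth’s actual pipeline; the sequences are invented). Real systems gain their speed-up by running such comparisons highly parallel over millions of positions:

```python
# Toy variant detection: report positions where the assembled patient
# sequence deviates from the reference sequence.
reference = "ACGTACGTAC"
patient   = "ACGTACCTAC"

variants = [
    (pos, ref_base, pat_base)
    for pos, (ref_base, pat_base) in enumerate(zip(reference, patient))
    if ref_base != pat_base
]
print(variants)  # [(6, 'G', 'C')]: a single substitution at position 6
```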


In its systems-biological approach, the company Alacris (www.alacris.de) has gone so far as to develop a virtual patient model consisting of a variety of differential equations. It was possible to speed up the solution of these equations by a factor of 5,000.

Healthcare providers would like to know how likely it is that people with certain genetic modifications will contract a given disease. So far such analyses have been associated with very high costs and long waiting times. Researchers from a Californian university used SAP HANA to combine the data of their “Variants Informing Medicine” (Varimed) database and the 1000 Genomes project (www.1000.genomes.org), and to compare 125 mutations that are connected with type 2 diabetes for 629 individuals. They got the result in less than a minute. Until then, such studies had been restricted to the analysis of 20 mutations for performance reasons. In the process, a higher risk of type 2 diabetes was shown for Americans when compared to other ethnic groups.

While DNA represents our construction plan, proteins are the actual building blocks that control all important processes in our body. A better understanding of our proteome, the entirety of all proteins in a living creature, has been the subject of intense research for decades, and provides important stimuli for medicine. A facility, www.proteomicsdb.org, was created in cooperation with the Technical University of Munich, which is helping to speed up the definition of the human proteome with SAP HANA. The findings were recently published in a scientific paper in Nature.⁵ Research is currently dealing with the detection of diseases through changes in protein expression. The aim of these efforts is to provide a basis for protein-based diagnostics. In a project funded by the German Federal Ministry of Education and Research (BMBF), a system for proteome-based cancer diagnosis on the basis of blood samples (SAP, 2013) was developed in collaboration with the Biocomputing Group of Freie Universität Berlin. Such a sample provides 150 million data points, which can be analyzed and compared in order to detect changes.

The most immediate impact on the treatment of patients is expected from academic research centers that are closely connected to the operation of a clinic. One example is the National Center for Tumor Diseases (NCT) in Heidelberg, set up as a joint facility of the German Cancer Research Center (DKFZ) and the University Hospital Heidelberg. It brings together patient care, cancer research and cancer prevention under one roof. For such a body it is of vital importance to be able to analyze clinical data from different sources in a research and treatment context. For this purpose, structured data from the patient management system, from the laboratory and the cancer registry, as well as unstructured data from doctors’ letters, is merged in SAP HANA and analyzed with “Medical Research Insights.”

5 Wilhelm et al.: Mass-spectrometry-based draft of the human proteome, Nature 509, 582–587, May 29, 2014.


For the first time, this enables the connection of the 3.6 million merged data points in real time, and thus any kind of analysis, without being limited by the otherwise necessary preparation of specific queries. For the first time, information from doctors’ letters is available in this context and enables, for example, the characterization of patient groups including the biomarker information provided in the doctors’ letters. In this process, a special emphasis is placed on data protection. Appropriate anonymization and pseudonymization processes as well as role-based access rights management ensure that sensitive patient information is protected.

Finally, an example from prescription analysis. The company Medimed has a cloud-based system running on SAP HANA that includes 1.5 billion prescription records from 10 million patients and 10,000 doctors. For example, an otherwise hour-long analysis showed within a second that neurologists prescribe different drugs for migraine than general practitioners do. Such findings are important in many ways, and could be used for appropriate corrective measures as soon as the information is available.
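The kind of aggregation behind such a prescription analysis can be sketched in a few lines (the field names and data are invented; at the scale described above the aggregation would of course run inside the database, not in client-side Python):

```python
import pandas as pd

# Invented miniature of a prescription table.
prescriptions = pd.DataFrame({
    "specialty": ["neurology", "neurology", "general", "general", "general"],
    "drug":      ["triptan_a", "triptan_a", "analgesic_b", "triptan_a", "analgesic_b"],
})

# Share of each drug within each prescriber specialty.
counts = prescriptions.groupby(["specialty", "drug"]).size()
shares = counts.groupby(level="specialty").transform(lambda s: s / s.sum())
print(shares)
```

Diverging shares for the same indication, as in the neurologist-versus-GP example, then stand out directly in the result.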

5 Summary and Outlook

The overview provided here shows that we are standing at the beginning of an extremely promising development for each and every one of us, if we succeed in broadly implementing the technological possibilities in practice. However, in order to do so, a number of hurdles still have to be overcome. A new way of thinking must take hold in the healthcare sector, overcoming the silo mentality and working together across multiple sectors. Citizens and patients need to weigh the risks and benefits objectively from case to case, while insisting that the very important issue of data protection is upheld. And policy makers have to lay the foundation for innovation in the healthcare sector through a reliable and far-sighted framework. Due to current demographic trends, hardly any desirable alternatives exist. SAP is committed to making a contribution to the improvement of people’s lives.

Literature

Bort, J.: Software Company SAP Just Set a Guinness World Record for a Mind-Boggling Big Computer System. Online: http://www.businessinsider.com/sap-just-set-a-guinness-world-record-2014-3, [accessed on: March 22, 2014].
Einhäupl, K. M.: In Praise of “In-Memory Data Management – Academia.” In: Plattner, H. & Zeier, A.: In-Memory Data Management (p. V). Heidelberg et al. 2012.
Harvey, S.: Health in Rural India Will Never Be the Same. Online: http://www.forbes.com/sites/sap/2014/04/10/health-in-rural-india-will-never-be-the-same/, [accessed on: April 28, 2014].
Plattner, H.: A Course in In-Memory Data Management: The Inner Mechanics of In-Memory Databases. Springer, Heidelberg 2013.
SAP: Proteome-based Cancer Diagnostics. SAP Innovation Center. Online: http://www.sap-innovationcenter.com/2013/09/24/proteome-based-cancer-diagnostics-sap-hana/#, [accessed on: March 23, 2014].
Wilhelm, M. et al.: Mass-Spectrometry-Based Draft of the Human Proteome. Nature 509/May 29, 2014, pp. 582–587.

Axel Wehmeier and Timo Baumann

10 Big Data – More Risks than Benefits for Healthcare?

The authors from Telekom Healthcare and Security Solutions provide a status quo of the current situation, with a focus on the service providers (hospitals). This area is said to be “years behind other industries.” Conclusion: in this group of customers, Big Data is currently not yet the hype that some conference stars have made it out to be. Instead, the “focus in business intelligence is on specific economically relevant KPI assessments of the recent past.” In order to unlock the potential of Big Data in healthcare too, the authors recommend a very practical four-point plan: work out benefits for users, transparency, privacy and interoperability. These are described here for future scenarios. Hence this is an article by real-world practitioners – with refreshingly little (or no?) Big Data hype.

Content
1 Limited Penetration in Healthcare
2 Assessment of the Status Quo
3 Looking Ahead
4 Taking the Next Steps Actively and not Passively!

1 Limited Penetration in Healthcare

Big Data is the new digital hype. Everyone is talking about it, much as they talked about eBusiness at the beginning of the new millennium. Marketing professionals are raving about the new possibilities for control, automotive experts are tinkering with new process chains, and weather researchers are optimizing their forecasts. For them it is a matter of fact: the combination of collecting more data and being able to evaluate it better creates new, high-value opportunities. We speak of Big Data, in contrast to classical “business intelligence,” if the analyzed data is characterized by a very high data volume (for instance the approx. 12 terabytes of tweets per day around the world), a high velocity (e.g., the analysis of 5 million commercial transactions per second in real time) and a large variety (like the unstructured data of pictures, videos and documents on the Web, which accounts for up to 80 % of all data traffic).

Not all users automatically share this assessment. Many want to counteract profiling and disable the location services in their smartphones. Ultimately, it is up to the individuals themselves to decide whether they wish to make their personal data available, although the cost of not doing so would be a significant restriction of communication possibilities. Data in exchange for service – that is the implicit deal between users and service providers. Awareness of this issue on the part of users is continuing to grow, and initial suggestions for “data in exchange for service plus money” are being discussed.

In the healthcare industry too, the potential use of Big Data is obvious. Hardly anyone would deny that a better database often leads to better diagnoses, that digital, personalized efficacy testing enables better therapy management, or that health services research would gain a very different empirical validity if more data were available. However, in practical healthcare we can observe a far lower penetration of Big Data applications than in other areas. So what are the reasons for this reluctance? Is it due to the fact that the IT of healthcare service providers is years behind the IT of leading industries such as finance and the automobile sector? Or is it due to the fact that re-financing does not follow immediately? Does it have something to do with data protection? Or have we simply not found the right applications yet?

This article is an assessment of the status quo with a main focus on clinical care providers. Where does the industry stand? What motivates the individual players? In the second part, we will provide a recommendation as to how the path can be smoothed for the acceptance of Big Data in healthcare in the medium term. User and data security play a crucial role in the process.

2 Assessment of the Status Quo

Big Data is now part of the general IT industry portfolio and is supported by various technology tools in an IT company like Telekom. Examples of the multiplicity of services that have emerged in recent years are Cloudera Hadoop, an ecosystem for the distributed processing of large data volumes on low-priced standard hardware and software, and Splunk, for the intelligent visualization of contexts determined by data analytics, such as Real Time Security Analytics: analysis through data access in real time. This application provides informative dashboards as a starting point for investigating security-related incidents in more detail, such as compliance correlations of various log data from multiple sites, and regional plausibility checks of applications from the central office. This can be provided either in real time or as a historical analysis, visualized using Geo-IP and maps.

Another approach that is also of interest for the healthcare industry is so-called intelligent news discovery. Users can analyze up to 2 billion Web pages and videos per day without having to possess their own IT capacities. Alongside reduced indexation, a semantic analysis of otherwise unused information from audio and video material is carried out. Based on thematic search requests, contextually relevant sections are found in the indexed data and made immediately available.
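The “regional plausibility check” mentioned above can be illustrated with a small hedged sketch (the log entries, names and time window are invented; real products correlate far richer signals): flag an account whose accesses come from different regions within an implausibly short interval.

```python
from datetime import datetime, timedelta

# Invented access log: (user, region, timestamp)
log = [
    ("dr_meier", "DE-Berlin", datetime(2014, 5, 1, 9, 0)),
    ("dr_meier", "US-Boston", datetime(2014, 5, 1, 9, 20)),
    ("dr_kim",   "DE-Munich", datetime(2014, 5, 1, 9, 5)),
]

WINDOW = timedelta(hours=1)  # assumed plausibility threshold

def suspicious(entries):
    """Yield users seen in two different regions within WINDOW."""
    ordered = sorted(entries, key=lambda e: (e[0], e[2]))
    for (u1, r1, t1), (u2, r2, t2) in zip(ordered, ordered[1:]):
        if u1 == u2 and r1 != r2 and t2 - t1 < WINDOW:
            yield u1, r1, r2, t2 - t1

for hit in suspicious(log):
    print("implausible access pattern:", hit)
```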


But remember: the tools described are largely used outside of the healthcare industry. However, it is not hard to outline reasonable scenarios for the healthcare industry. Thus, Security Analytics is already used in the US, against the background of HIPAA and HiTech, as part of a verifiable risk management system (see US HIPAA). News discovery would also be useful in the healthcare industry: medical knowledge, measured by the amount of published study results, doubles every two years, and new studies with new insights appear every single day. So why do we not use modern data analytics technology to channel the results directly to the relevant doctors?

For German service providers these issues are not yet of relevance. Instead, the focus in business intelligence is on specific, economically relevant KPI assessments of the recent past. The analyses include DRG numbers, case mix (the calculation is sketched below), and costs for personnel and material/medication in the past month, the last quarter or the fiscal year. The drivers are economic aspects (utilization of beds or operating rooms), cost control (materials, labor) or quality indicators such as complication rates, re-hospitalization or death rates per medical intervention.

When it comes to the Big Data classification criteria of volume, velocity and variety, hospitals generally present a mixed bag of results:
– There is definitely a high volume, at least as regards the number of data sets. A typical HIS (hospital information system) contains hundreds of thousands of outpatient and inpatient episodes/treatment cases per year. This can easily result in millions of service requests and documentation entries.
– And there is also no lack of variety: the data sets are contained in many different source systems from which they have to be exported. Abstraction levels, such as a central patient record with medical information (an IHE affinity domain), are still not standard in the industry today, but would certainly help. So the criterion of variety is clearly fulfilled.
– In contrast to velocity: the evaluations are carried out retrospectively, and not in real time. This is about more than the technology; it results mainly from the requirements of managing directors, financial directors and medical controllers. But the technology is often based on batch runs, which import the data from the individual applications into the data warehouse at night and then process it. Ad hoc analyses are nevertheless common; whatever is contained in the data warehouse can be evaluated quite flexibly.

According to the authors, there is today no hospital in Germany, Austria or Switzerland that has not established such a tool. The standard involves encapsulated solutions: databases with interfaces to HIS, LIS, RIS, pathology systems, etc., with pre-made evaluations (reports). However, the implementation is very complex. Considerable customizing is required to adapt to the individual facility’s account system, house structure, etc. This has nothing to do with plug-and-play, nor with the integration of unstructured data.
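The case mix KPI mentioned above reduces to a simple calculation: the case mix is the sum of the DRG cost weights of all treated cases, and the case mix index (CMI) is that sum divided by the number of cases. A minimal sketch (the cost weights are invented):

```python
# Invented DRG cost weights for one quarter's inpatient cases.
cost_weights = [0.8, 1.2, 3.5, 0.9, 1.1]

case_mix = sum(cost_weights)          # total case mix
cmi = case_mix / len(cost_weights)    # case mix index
print(f"case mix {case_mix:.2f}, CMI {cmi:.2f}")
```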


A first example of real Big Data is provided by the American radiology service provider vRad, which uses Big Data to optimize the logistics in radiology. All radiological studies are classified according to a specifically developed categorization, which helps to answer questions like: which study belongs to which doctor, and when does that apply? Furthermore, benchmarking can be carried out across different service providers. With a total of 23 million studies, the criterion of volume is fulfilled, especially since the average size of a study can be expected to be around 40–50 MB.¹ The second characteristic, variety, is also fulfilled: vRad is a service provider for 2,200 health organizations in 50 US states, and the data is not elaborately prepared and structured. The criterion of velocity is at least indirectly fulfilled, as the findings can be applied directly to incoming studies.

The cost-bearers, too, are sitting on huge amounts of data. But here as well one sees a gridlocked, essentially outdated technological base, with which billing data of limited scope is evaluated ex post, mainly with regard to financial optimization, plus isolated evaluations concerning healthcare practice. Evaluations in real time are not possible in these structures, nor is the inclusion of the entire variety of possible data sources. So here, too, the control potential of modern data analysis remains unused.

So far patients receive next to no findings related to them personally from Big Data. The Quantified Self movement has taken the path towards data-based self-monitoring. It relies on state-of-the-art sensors, which are typically not certified as medical products and are introduced into the market according to the rules of the digital economy. Sensor technology in the form of wearables from different, mostly American, providers has today found a relevant group of customers in Germany; sales figures in excess of 100,000 devices are foreseen for 2014. The idea is to capture as much data as possible – from exercise, sleep, stress and nutrition to blood sugar levels, weight and pulse – with state-of-the-art sensors and mobile data transmission. The focus is currently still on self-monitoring. However, with increasing user numbers and data, it is only a small step to Big Data applications – whether for individual use or for evaluation by a third party.

A medical Big Data application that is increasingly relevant from the patient’s point of view is genome sequencing. Companies that are still largely, but not exclusively, privately funded, such as sistemas genomicos in Spain or bio.logis in Germany, offer patients analyses of their personal genome. The volume of data involved in an analysis is enormous: a complete genome has roughly 1 TB of raw data. And this is only one, albeit highly complex, source (the genome). Evaluation in real time is not (yet) of any relevance. But perspectives are changing.

1 Source: own research for D, A, CH.


What is the probability of a patient contracting a certain disease? Which therapeutic methods are proving to be the most promising options in the light of similar cases? With the increasing use of Big Data, a variety of new application scenarios will gradually appear.

Last but not least, Big Data can also help to make a clinic’s own IT safe, as data protection advocates have been continuously reminding us. Analytical services like Cyber Sec. Adv. or Cyber Defence SOC from Deutsche Telekom have long since proven themselves in other industries as a means of sighting and handling sources of danger. Other countries have already gone further than Germany: in the United States, this is already used in the context of HIPAA/HiTech legislation. Service providers have to operate a certified risk management system in the area of data protection and data security, and the monitoring of logs is considered an effective means to that end (HIPAA compliance). In Germany, this is not an issue (yet), despite recognized data security flaws. At Telekom we can currently see that other industries are beginning to show interest, which still cannot be said for the healthcare industry.

3 Looking Ahead

These few examples already show the big potential of Big Data for future healthcare, and in particular for healthcare research: Big Data is both the key to supply-relevant analyses in real time and in a real healthcare context, and the largest lever for a dramatic improvement in prognostics. Doing without Big Data in healthcare is simply no longer thinkable – it’s already happening. From our point of view, the crucial question now is how we can find a path to the future that provides quality assurance and is reasonable from a data protection and data security perspective. The basis for this must be a culture of voluntary and informed consent. We see four steps:
1. Show benefits for users: Evaluations are not an end in themselves. What benefits can patients and clinicians gain from the application of Big Data? On this basis, anyone can carry out a personal risk assessment and ideally decide whether they wish to participate and provide data or not. Take the NHS “care.data” program, for example: do I feel that the possible medical progress and knowledge gained regarding drugs, operation technologies, etc. is worth making my health data available to a national data library for medical research?
2. Create transparency as to what will be done with the data: What data is stored, and for how long? Where and how is it stored? How is access by unauthorized third parties, for example employers, prevented? Who has access to the data and what evaluations are carried out? It is also recommended that new applications be discussed beforehand in a body that covers data security and data protection. Companies such as Telekom, for instance, have an interdisciplinary data protection advisory council with external representatives from politics, trade unions, academia, jurisprudence, and the Chaos Computer Club.
3. Clear legal framework: Particularly in Germany, no modern regulation currently exists that takes into account today’s technological state of affairs. The field of data security is worthy of an upgrade, as the possibilities and the necessity of, for example, remote maintenance of software by a provider are only partially provided for (one exception being the Berlin State Data Protection Act). In § 203 StGB, criminal law takes a rigorous stance when it comes to the transfer of data: it simply forbids it. However, the fact that in the not-too-distant future hospitals will not be able to operate their own IT without external, non-medical service providers is something that neither the penal code nor any of the special legislative acts have taken into account. We do not necessarily have to follow the pragmatism seen in the United States, where HIPAA and HiTech clearly govern what data may be passed on in healthcare. But we will not be able to avoid reforming the current regulatory framework, in order to finally provide legal certainty in this area and to pave the way for participation in the potential offered by a digitally supported healthcare industry.
4. Interoperability: Prerequisites for the secure exchange of data in the form of standards are actually available in the healthcare sector too: ICD, OPS, DRG, LOINC et al. are already firmly rooted in everyday life, and help to turn data into information – a basis for findings and insights, or data mining, as it were. The main challenges that remain are the very large IT environments; hospitals typically have between 30 and 100 core applications. What is also critical is the absence of semantic abstraction levels, such as IHE Affinity Domains, both within the individual service provider organizations and across sectors. This is where the ELGA in Austria has set a good example.

4 Taking the Next Steps Actively and not Passively!

In sensitive areas such as healthcare, the issue of Big Data is currently overshadowed by the NSA & Co. Essentially, people see the risks, although the opportunities are obvious. The Guardian calls them “opportunities of biblical proportions.”² Medical data mining with the data of 50 million patients in the United Kingdom will certainly deliver findings for the benefit of patients.

2 The Guardian (UK), February 22, 2014.


Around the world, a strong dynamic has developed in medicine to specifically explore and apply data analytics for healthcare. Not to do so in Germany would mean no participation, or totally insufficient participation, in major developments in medicine, such as personalized medicine or integrated care concepts. From our point of view, this is not an option, especially since, in the long term, as with the Internet economy, taking on American standards would then be inevitable. Based on a culture of informed consent and transparency, we have outlined a path that differs from US pragmatism without lapsing into denial. Service providers, suppliers and patients, too, will have to start thinking differently and open up to stronger reciprocal communication and new forms of task distribution. Incidentally, only in this way will it be possible to include political and ethical issues in this development and to find a responsible path to a future healthcare that is better supported by modern digital technologies. To do without Big Data in that process is not an option, since the opportunities clearly outweigh what are, from our point of view, controllable risks.

Marcus Zimmermann-Rittereiser and Hartmut Schaper

11 Big Data – An Efficiency Boost in the Healthcare Sector

In this article, Siemens experts demonstrate the potential that can be found today, and in future, especially at the interface between medical devices (such as CTs or MRIs) and information technology. The authors are convinced that “the success and the use of Big Data applications in the healthcare environment depend on more than technology.” The article is especially worth reading as it demonstrates the practical application of Big Data today, outlines its use in the near future, and also describes the complex framework for a “patient-related context.”

Content
1 Introduction
2 How Big Data Can Provide a Boost to Efficiency in the Health Care Sector
3 Use of Big Data at Siemens Healthcare
4 Recommendations for the Future
Literature

1 Introduction

Since the emergence of Big Data as a buzzword in 2012, we have probably all come up with our own definition of Big Data and of where “big” begins. However, given the growing digitalization of our lives, there can be no doubt that data represents value and offers a potential for generating new knowledge, which can be extracted from the data if it is put in the right context for the respective task. Interestingly, Big Data has already been produced and processed for many years – banking and climate research are good examples – but the hype about Big Data, in healthcare too, was stimulated by the introduction of certain technologies – e.g., in-memory databases (IMDB) and so-called scalable distributed computing frameworks – with which Big Data processing and retrieval in real time actually becomes possible. Recent acquisitions by Google and Facebook in the consumer sector are further signs of the future potential and value of data, and of existing as well as planned new business models.

So it is not surprising that in all areas where there is both a great demand for the production, distribution, analysis, structuring and visualization of huge amounts of data – even in real time – and also fast-growing expert knowledge and expertise, Big Data per se and the processing of Big Data is regarded as a key component. In fact, some industry players even see Big Data and the associated potential as THE key to success or failure in the relevant industry. Big Data is also a key issue in the healthcare sector – even if its importance, scope and extent are not yet completely clear to the different players.

2 How Big Data Can Provide a Boost to Efficiency in the Health Care Sector

If we consider some of the distinctive features of the healthcare sector more closely, it becomes clear why we at Siemens Healthcare are convinced that Big Data has enormous potential, not only to increase efficiency but in particular to boost the effectiveness of the diagnosis and treatment of patients. The early detection of diseases can thus also be managed much more effectively – for example by knowing that there is epidemiological and genetic evidence, and thus a particular risk, for the occurrence of a chronic disease in a particular person.

According to a report by the World Economic Forum and the Harvard School of Public Health, published in September 2011,¹ 63 % of all global deaths are associated with the four major chronic diseases, also known as non-communicable diseases (NCDs). These diseases are: heart/circulatory disease, cancer, chronic respiratory diseases – mainly COPD (chronic obstructive pulmonary disease) – and diabetes. Mental illnesses will also have to be considered more and more in the future. Although the annual percentage increase in case numbers, for example in Europe, is slowing, the impact and burden in terms of costs and productivity losses continue to rise. The continuing increase in life expectancy of about two years per decade, the structure of the aging population – with a huge increase in the proportion of over-65s by 2050 – and additional environmental issues, such as increasing air pollution, also contribute toward further increasing the number of cases. However, not only the number of cases per disease is increasing, but also the number of patients affected by several chronic diseases simultaneously. As early as 1999, almost 50 % of the US population over 65 had at least three chronic diseases, and 21 % had at least five. In terms of the prevalence of these diseases there are no major differences around the globe, but in terms of healthcare spending two numbers have to be singled out:

1 Harvard School of Public Health/World Economic Forum: The Global Economic Burden of Noncommunicable Diseases. In: World Health Statistics 2012, WHO/Eurostat, Statistics in Focus, 07/2012.


– In the US, 5 % of the total population is responsible for almost 50 % of total national health care expenditure.² And these 5 % are almost exclusively the chronically ill and the elderly.
– In Europe, only 3 % of total healthcare expenditure is spent on prevention programs.³

In consequence, that means: if we could improve the treatment of chronically ill patients – i.e., not per episode, but with a stronger individual focus, according to the phenotypes of the patients, and with a greater emphasis on prevention in order to avoid costly hospital stays – then we could greatly improve the efficiency and effectiveness of the treatment of each individual patient. Big Data is one of the key elements along the way to making all of that become reality.

But unfortunately, the data that makes Big Data is today stored in hundreds of separate repositories, registries, databases and IT-specific subsystems, which are in the hands of hundreds of different organizations, including hospitals, health insurers and providers of all kinds of health care services, medical practices, pre- and aftercare facilities, secondary, tertiary and social welfare and rehabilitation facilities, health authorities, government and research institutions, the pharmaceutical industry, and epidemiological as well as local, regional, national and international biobanks. The majority of the data was – and still is – completely unstructured: it consists of pure text entries or is only available on paper documents, which become accessible only if they are converted into electronic documents and analyzed with ontological and semantic processes. In addition, the exchange of such data is very complicated, as the many different data formats are hardly compatible. But even where that is the case and access is granted, the statistical significance of the derived correlations is questionable, since the sample size is usually too small to derive the necessary evidence from it. And even if the technical requirements and conditions are met, the – nowadays equally important – issue of the legal conditions for data security, privacy and the protection of privacy has to be solved, and implemented with much better, i.e., safer, technical means than are used today.

Also, innovations in the field of Big Data will only be implemented in practice fast enough if a viable business model exists, i.e., if there is a reasonable price-performance ratio and it is obvious who receives the added value and who pays for it. That is why all Western healthcare markets, not only the USA, have to get away from the existing business model of “fee for service” – in which payment is for performance – and move to “fee for outcome” – where payment is for results.

2 HEALTHCARE IT AND DISTRIBUTION – The Future of HCIT Population Health Management. Leerick Swann, Healthcare Equity Research.
3 Health at a Glance 2012: Europe 2012. OECD Publishing.


Only those who deliver demonstrable results are rewarded, whereas those who do not provide a result corresponding to the treatment offered are “punished.” In order to support this realignment, all available data sources must be used, and structures and conditions developed with which Big Data in the healthcare sector can become a reality. Given the conditions described above, we must assume that Big Data will begin on a small scale, in specific focused applications, and be extended to potentially complex indication areas.

But we should not forget one thing: even if the data from Big Data is made available to the user in a structured, harmonized, modelled and clear way, it still requires a doctor to use it as a tool for clinical decisions – a doctor who can interpret all the data presented to him or her and can associate it with the corresponding disease in a “patient-related context.” This link must be made against the background of the total specific phenotype of the patient, which is associated with a substantial number of possible existing and new data sources, all of which must be understood: data from imaging systems, laboratory data, electronic medical records (EMR), data stored in particular IT subsystems (e.g., pathology, endoscopy and dermatology systems), so-called omics data, data from genome sequencing, epidemiological data and much more. And most likely it will not only be doctors who apply this knowledge. Doctors will have an interdisciplinary team of experts working with them – such as epidemiologists, mathematicians, computer scientists and bioinformatics specialists, to name a few – who will help them interpret the large amount of data.

This is where the importance of the concept of a “patient-related context” should be noted. The healthcare IT market has already seen various attempts, in particular from companies that did not come from the healthcare sector, to bring together all kinds of information with a certain amount of pooling and processing of the primary information – also through targeted user interaction – so as to then present a “result” in real time. However, one hardly needs to go into the finer details of artificial intelligence to recognize that it is an illusion to think that simply more data means more understanding. It is not about collecting more data for Big Data’s sake, but about relevant data in a specific patient context. That is why we are convinced that the success and the use of Big Data applications in the healthcare environment depend not only on technology. Instead, the critical factor is the need to develop – through the use of Big Data – the potential with which the existing distributed data sources and formats can be linked together, to then put them in a disease-specific context, and thus either to provide proof of the success of a new, promising clinical approach, or to verify diseases at a molecular level and be able to treat these patient populations much more effectively in future. Due to the growing contribution of genetic and epidemiological data to understanding, forecasting and clinical decision making, the use and availability of Big Data technology in these areas is also critical to success, so that it can be seen as the foundation of a new approach to prevention and to medical care.


3 Use of Big Data at Siemens Healthcare

While the Big Data applications identified above are developing only gradually, the use of Big Data and the associated technology has already been on the agenda for years at Siemens Healthcare. A practical example is the area of customer support. Here we are connected with more than 50,000 of our systems (mostly imaging systems, such as CT scanners, MRIs, positron emission CTs, angiography and cardiac angiography systems, etc.) via our service platform and can monitor, among other things, system status, so as to detect system failures as soon as possible. Only a decade ago, system status information on components, assemblies or, for example, the specific load of an x-ray tube was converted from hexadecimal into understandable text messages, which then provided the facility technicians with a reference for the possible causes of failure. Nowadays this effort is no longer made. Instead, huge amounts of raw data are directly transmitted from the system via the service platform, then structured and processed, to be used not only for efficient service management but also for numerous other new services for our customers. These include the following:

Prognostic maintenance
By analyzing Big Data from the globally installed base, we are able to inform our customers about necessary and thus planned maintenance at an early stage. In that way, unnecessary patient preparations – e.g., the administration of contrast agents before the examination – and/or the logistical challenges of rescheduling patients due to system maintenance can be avoided. We can actually predict the failure of certain components very accurately through the use of specially developed algorithms and simulation models. For example, we are able to predict the failure of the x-ray tube of a CT scanner to within an accuracy of one week. If we combine this data with the service activities on the modality or with external influences such as room temperature, humidity or other environmental data, we are able to detect certain dependencies that are important both for the customer and for us in order to further reduce system downtimes. (A toy sketch of this idea follows at the end of this section.)

Equipment utilization
Given the expected future budget cuts, the cost efficiency and utilization rate of devices is becoming increasingly important. This is especially true for clinical disciplines such as radiology or cardiology that are highly dependent on imaging. By analyzing device data (e.g., the number of patients per time period, examination start, examination end, timestamps for non-use of equipment during transport or exchange of a patient, number of scans/sequences performed, type of sequences used, dose employed, selected image characteristics, administered contrast agents, etc.), we can provide the equipment user with a detailed usage report and also suggest measures to optimize utilization. Knowing the utilization of one’s own equipment can provide information on where and how improvements are possible. And if the performance data of a device is compared with that of thousands of other anonymized users, possible best practices can be derived, and one can see how far away one is from the best – and what effect making up the difference would have. All this is accomplished by bringing together and analyzing millions of data sets, which are produced every minute by our systems installed around the world.

But in future we should aim to tap an even greater potential that goes far beyond the machine-related knowledge described above. Just think of the immense potential we could tap if, for example, we could link the data from imaging, or the imaging system, with information from an RIS (Radiology Information System) or CVIS (Cardiovascular Information System) and the associated billing systems, or even with data and information stored in a Clinical Information System (CIS) or Hospital Information System (HIS). Or imagine you had to undergo a specific x-ray examination. You would want this examination to be carried out with the smallest dose necessary to enable an appropriately safe diagnosis. But how would you know whether the smallest necessary dose was applied to you, and whether a lower dose might not actually have been enough to ensure a safe diagnosis? A real-time comparison with thousands of other cases, in which the same dose was given for this clinical issue and diagnosis, would provide a first indication. And would not everyone who uses x-rays on patients be thankful for a reference before the imaging process confirming that the applied dose is actually the benchmark for the respective image?

Another example would be the use of existing findings and image data, and their combination with a clinical problem, for example regarding a specific diagnosis in liver tests. By combining the problem with the corresponding existing findings and image data, and using semantic and ontological technologies, you would be able to compare your own test results with those in a reference library, which could then also provide additional clinical information. These are only two case examples that illustrate the enormous value that results from the combination and use of data, going far beyond the mere use of machine-related information.
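As announced above under “Prognostic maintenance,” here is a deliberately simple sketch of the underlying idea (Siemens’ actual telemetry fields and models are not public and are certainly more sophisticated; all numbers are invented): fit a trend to a degradation signal and estimate when it will cross a service threshold.

```python
import numpy as np

days = np.arange(10)                              # observation days
arcs = np.array([1, 1, 2, 2, 3, 4, 4, 5, 6, 7])   # invented tube arc events per day

# Fit a linear degradation trend to the signal.
slope, intercept = np.polyfit(days, arcs, 1)

FAILURE_LEVEL = 12   # assumed threshold at which service is scheduled
days_to_threshold = (FAILURE_LEVEL - intercept) / slope
print(f"predicted threshold crossing around day {days_to_threshold:.0f}")
```

Combining such a per-component forecast with context data (service history, room temperature, humidity) is what turns a single trend line into the dependency analysis described above.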

4 Recommendations for the Future

In order to exploit the full potential of Big Data in the healthcare environment and to make healthcare more outcome-oriented, we believe the following important challenges have to be addressed and overcome:

1. Creation of national centers of excellence for Big Data to explore applications, advantages and risks of Big Data
With respect to Big Data, today’s existing IT technologies are not suitable in terms of volume, nor with regard to structuring or analysis, and certainly not in terms of heterogeneity and the ability to implement machine learning. In addition, dealing with Big Data – as already indicated – requires an interdisciplinary approach involving the various competencies in the field of healthcare. We have to invest in competence centers for Big Data research, in which the basic structures are defined and created, and where new technologies – in particular for IT security – are evaluated. The initiative to promote centers of excellence, recently launched by the BMBF, is an important step in this direction.
2. Investment in new technologies for IT security
Given the loss of confidence in the areas of data security and privacy – the plans by Britain’s NHS provided a negative example in the area of healthcare earlier in the year – it is imperative that we also invest in research to improve IT security. The idea is not necessarily to always achieve 100 % IT security – that would be an unrealistic goal. However, it is necessary to come as close as technically feasible to the best possible safety through continuous development.
3. Continuation of efforts to promote recognized open standards of interoperability, and further investment in the introduction of electronic patient records
Without openness (for everyone) and standardization (by everyone), data sharing and interoperability are simply not possible. At the same time we must significantly accelerate the deployment and implementation of clinical information systems. Surveys in Europe have shown that there is still a great deal to do in that respect.⁴
4. An existing business model
Even if it is technically possible to solve the above-mentioned challenges of Big Data, we still lack a viable and sustainable business model in the essential healthcare markets – apart from the USA – that supports the use of Big Data for the transformation in the direction of outcome orientation. Not only is the model missing, but also an idea of how such a model could be put into practice and how payment could then be based on the value or the outcome.

As long as such a business model is not established, investments in Big Data and its technologies will not lead to the desired results.

4 European Commission, JRC Scientific and Policy Reports, European Hospital Survey: Benchmarking Deployment of eHealth Services 2012/2013.


Literature

European Commission, JRC Scientific and Policy Reports: European Hospital Survey: Benchmarking Deployment of eHealth Services 2012/2013.
Harvard School of Public Health/World Economic Forum: The Global Economic Burden of Noncommunicable Diseases. In: World Health Statistics 2012, WHO/Eurostat, Statistics in Focus, 07/2012.
HEALTHCARE IT AND DISTRIBUTION – The Future of HCIT Population Health Management. Leerick Swann, Healthcare Equity Research.
Health at a Glance 2012: Europe 2012. OECD Publishing.

Thilo Weichert

12 Medical Big Data and Data Protection

Big Data in medicine is unthinkable without data protection. The author, the data protection commissioner of the federal state of Schleswig-Holstein, points out that “Big Data in the area of healthcare is possible in a way that is constitutional and conforms with civil rights.” “But this presupposes that the protagonists lose the notion that Big Data is only a matter of technical feasibility.” For when handling health data, the usual protection goals apply: confidentiality, integrity, availability, non-linkability, transparency and the ability to intervene. However, the author also sees a need for action: “In this respect, up to now national assemblies have – unjustifiably – seen no need for action. The legislative shortcomings in the area of health data privacy are already striking. And the situation will only be further aggravated with the use of Big Data technology in this area. It’s nice to see that the German Bundestag has actually arrived in the digital age by recently establishing a committee for digital policy. But this is merely the beginning.”

Data protection has to be a central aspect of Big Data applications in medicine. The balance between data protection and “informational self-determination” will have to be discussed further in the future: Who decides what happens with my personal data? Do I have the freedom or the obligation to decide, or will (should?) others decide for me? Do we need other – European – rights? Or better control and executive authority? Just as the issue of Big Data should not be left to technicians, jurists alone will not be able to find solutions either.

Abstract: Big Data using health information is subject to high constitutional demands. The data sources as well as the objectives must be clearly defined beforehand. Possible unintended effects must be addressed in a risk assessment and be controlled through safeguards. Constitutional protection and at the same time high gains in medical knowledge can be attained through anonymization and (possibly complex) pseudonymization, as well as through legally secured technical, organizational and procedural requirements.

Content
1 Expectations
2 Legal Biological Principles
3 Data Sources
4 Health-Specific Use of Data
5 Risks
6 Data Protection Mechanisms
7 Data Economy in Particular
8 Transparency
9 Conclusion

1 Expectations

The promises sound almost too good to be true: Big Data will revolutionize our health system. We are told that research, prevention and treatment could be massively improved while costs could be massively reduced at the same time; data could be provided virtually anywhere and in real time, thus providing the social and individual prerequisites for an optimization of health protection. More skeptical voices point out, though, that the objectives pursued through Big Data can be reached only if there is public acceptance of the intended data analyses. People have specific expectations of confidentiality where their health data is concerned, and they believe that when Big Data is used in the context of health data, these expectations are ignored or not sufficiently taken into account. And in fact it is for good reason that legal restrictions limit the evaluation of personal data for the purpose of gaining medical insights.

2 Legal Biological Principles

Medical data has been subject to medical confidentiality since the so-called Hippocratic oath, which has been in force for more than 2,000 years. This is based on the consideration that a person seeking help will fully trust a potential helper only if doing so has no adverse consequences for that person. This comprehensive entrustment is necessary for the person who is helping, in order to provide adequate – i.e., individual, professional, situational and sufficient – help.

This basic principle of the Hippocratic oath has today found its way into the German legal system in a number of different ways: § 203 Paragraph 1 of the German penal code (StGB) makes the unwarranted disclosure of professional secrets by auxiliary persons such as doctors, pharmacists or governmental social workers a criminal offence. In the professional codes of the medical chambers and chambers of pharmacists, patient secrecy is recognized as a rule of conduct. In § 35 of the German Social Security Code (SGB) I, the same is defined as “social secrecy” for social services providers. The protection of health data is based not only on the basic idea of a particular professional protection of confidentiality, but also on the consideration that such information is of the highest sensitivity from a legal point of view and can put an affected person in a situation of highest vulnerability in any given social context.

That is why general data protection law particularly protects such data and makes the admissibility of processing dependent, in principle, on consent or on closer, detailed consideration (§§ 3 Paragraph 9, 28 Paragraph 6–9 German Data Protection Act – BDSG, § 7 Paragraph 12 SGB X). This protection is also expressed in a variety of other laws, for example in the specific standards for statutory health insurance in the SGB V, in hospital laws, healthcare service laws, cancer registry laws, in the Genetic Diagnostics Act (GenDG), the Infection Protection Act (InfSchG), the Pharmaceuticals Act (AMG) and in many other area-specific standards of medical law.

The protection of informational self-determination in terms of health data is only one aspect of the protection of a more comprehensively understood medical self-determination. It states that each of us has a right to freely decide about and for ourselves, including the disposition of body and soul. A central aspect of medical self-determination lies in patients' freedom of choice – such as the choice of doctor (see § 76 SGB V). It includes our right to decide whether or not we want to call upon medical help, and if we do, in what form. This individual power of disposition does not absolve the state and society from the duty to provide healthcare for the people. The State's fiduciary duty may consist of having to provide certain health services, even of a purely informational nature. This can lead to specific services having to be provided in a specific way, and to services not being open to privatization, or only under strictly defined circumstances.

A special aspect must be considered when legally assessing the processing of a great amount of medical data: its frequently given individual uniqueness. This is especially true for the results of genome analyses (DNA analyses). Genetic data may be of significance when it comes to immutable psychological or physical dispositions, such as being prone to suffer from a certain illness. Even though such statements, according to the current state of knowledge, can only be made with a certain probability, the information does take on the character of something fateful and at the same time highly personal. The DNA and individual sections of a human's DNA take on the function of unique identifiers. The same often applies to conventionally established medical findings. The consequence is that in the medical field anonymization of data is often difficult to achieve or not possible at all. In spite of its highly personal nature, this is information that, according to scientific findings, also applies to biological relatives or may even be of significance for an entire ethnic group.

The national legal standards are today based on a common European constitutional understanding, which has been included in the national constitutions, such as the German Basic Law (GG), and in the European Charter of Fundamental Rights (EUGRCh) of 2009. Here, life and physical integrity are constitutionally protected, including the right to prior information and the right to free and informed consent. Any eugenic selection is prohibited (Art. 1 para. 2 sentence 1 GG, Art. 3 EUGRCh). With respect to genetic characteristics, there is a prohibition of discrimination (Art. 3 para. 3 GG, Art. 21 para. 1 EUGRCh). Art. 35 EUGRCh states a "right to healthcare and medical treatment" and declares "health protection" to be a task of the state. The informational side of this protection is ensured by the right to informational self-determination, derived from the general personal rights (Art. 2 para. 1 in conjunction with Art. 1 para. 1 GG), which is now explicitly standardized as a fundamental right to data protection in Art. 8 EUGRCh. Of course, such safeguarding based on the protection of the individual must be seen in the context of other constitutional guarantees, such as the freedom of research (Art. 5 para. 3 sentence 1 GG, Art. 13 EUGRCh) or the professional freedom that health professionals are entitled to (Art. 12 GG, Art. 15 EUGRCh). So our legal system by no means assigns absolute importance to medical or informational self-determination. Rather, these are embedded in a constitutionally characterized canon of rights and duties, within which individual as well as societal objectives have to be brought into harmony with one another. In this sense it should be noted that the European legal system – unlike, for example, the US system – has a comprehensive understanding of fundamental rights. It binds not only state bodies, but – via third-party effect – also private entities, particularly dominant ones, which are themselves bearers of fundamental rights (freedom of research, economic activity). Furthermore, the modern European understanding of fundamental rights also includes the state's thus derived obligation to protect. This is especially true in the healthcare sector. Such protective duties can intensify in the informational health sector to the point that the State can itself be required to render Big Data-based information services, if they are not provided elsewhere.

3 Data Sources

The possible data sources for Big Data are highly diverse in terms of their quality, their scale, their sensitivity and their significance. The first to mention are the private and public healthcare providers: doctors, pharmacies, hospitals, psychologists, medical and nursing services. Many of these service providers will be electronically networked in Germany in the foreseeable future through the telematics infrastructure developed for the electronic health card (eGK) (§ 291a SGB V). As healthcare services are in general not provided free of charge, the largely standardized billing procedures used by the associations of statutory health insurance physicians (§§ 77 ff. SGB V), the statutory health insurance providers (§ 284 SGB V, § 94 SGB XI), the private health insurance providers and the organizations of family doctors (see § 73b SGB V) provide further rich data sources.

As a data resource, it is not only the direct billing of health services by private and public health insurance companies that is at issue. Health services also have indirect economic and legal consequences, for example in the adjacent areas of (occupational) accident insurance, or care, pension and life insurance. The apparatus of health accounting has created further institutions that handle vast amounts of healthcare data for the public health insurance sector with the objective of economic control (§§ 106 ff. SGB V) and quality assurance (§§ 135 ff., 299 SGB V). In addition to the already mentioned associations of statutory health insurance physicians, this includes the medical services (§§ 275 ff. SGB V) as well as a variety of control and quality assurance institutions. Special data processing programs exist for specific diseases and certain forms of collaboration, such as Disease Management Programs or programs of integrated care.

Information technology (IT) service providers have their own function within the area of diagnostics and therapeutics, for example as providers of medical information systems (MIS). The data processing for special treatment issues is often outsourced to specialized service providers. The same applies to the extensive accounting sector, from service providers for health insurance companies up to the wide variety of IT service providers in the area of treatment and care, such as pharmacy data centers (§ 300 SGB V).

Health research draws on a variety of data sources. As a secondary use, it feeds in part on the above primary sources, but also in part on its own sources of data acquisition. Institutionalized interlinking of treatment and research is present in the area of disease registries, for example in the cancer registries regulated by law, or in treatment and research registers for other diseases.

A completely new area of health-related data processing is being opened up by the wellness and spa industry as well as the lifestyle industry, which heavily uses the Internet. With the help of "wearable computing," usually using mobile devices, people capture health data about themselves in everyday life, such as breathing, heart activity and circulation, for example during (sports) activities or even at work, with the objective of self-motivation or self-optimization (buzzword: "quantified self"). Often such services are provided via apps and are part of wider platforms, e.g., social networks. Self-help networks, organized conventionally as well as via the Internet, are also extremely interesting for the healthcare industry.

In addition to the acquisition of actual health information, other databases for which a link to health is assumed also play an important role for Big Data applications. This can be socio-demographic or statistical data on financial, family and work circumstances, on infrastructure, on voting behavior, on consumer attitudes or even on climatic conditions.

4 Health-Specific Use of Data

The purposes of health-specific Big Data analysis can be as different as the data sources themselves (see 3 above). The primary objectives pursued have to do with medical treatment, including the associated communication and division of labor. They range from optimizing the use of personnel, hospital beds and other treatment resources, and preventing side effects caused by medication, through to personalized medicine that adapts the treatment to the individual patient's disposition. A special case – which is also becoming increasingly common in treatment – is gene sequence analysis and the matching of results to phenotypes/diseases. This technique, which requires large computing capacities, is increasingly used for diagnoses. Genetic analyses have long been one of the main fields of activity in medical research; today, they have even found their way into the lifestyle business. Further areas of application include care, support and prevention. To these can be added the services of Ambient Assisted Living (AAL) for improving the care and life situation of the disabled, the elderly or otherwise vulnerable persons.

Representative data is required for purposes of medical quality assurance, which can then be compared with individual treatment events in order to identify structural or individual treatment deficits and opportunities for optimization. Another extensive area of application pursues the objective of preventing the explosion of costs and reducing health expenditure that appears to be unnecessary or disproportionate. This can be pursued through optimization of the use of personnel and other resources, or by efficiency checks in line with the quality assurance just mentioned. The behavior of those involved can be influenced through bonus and malus systems or co-payment by patients. A typical application for the use of large health databases is the management, negotiation and recalculation of compensation contracts (§§ 82 ff. SGB V).

A great potential of Big Data lies in medical and biotechnological research. Research is at the same time both a framework condition and an independent objective of the analysis of large health databases. Finally, a strong expansion of health-related data analyses is taking place in the wellness/spa and leisure industry. This includes offers for sports and entertainment activities or for self-optimization.

5 Risks

The individual and social risks of using Big Data relate primarily to classical data protection. Non-consensual, covert and/or improper use of data from the most personal healthcare sector can generally affect personal development, often in existential areas of life such as employment, family or sexuality. A compromise of confidentiality can undermine trust in the assistance or care provided, which in turn may deter the affected person from seeking medical help. The impairment of the integrity and availability of data in the field of medical treatment can lead to tangible damage to health.

A special feature of medical data processing is that the imposition of knowledge can cause physical and mental harm to the affected person, for example if information about untreatable genetic dispositions is disclosed. On the other hand, withholding health-related data can also have negative health consequences, if, for example, personalizable findings from medical research about the treatability of a medical condition are not passed on to the affected person or the person treating him. Behavior can be influenced in a targeted way through incentives or through malus systems of which the person concerned may not even be aware. Thus the freedom of choice in health matters might be restricted. The economically, ethnically or otherwise motivated selection of risks can ultimately lead to manipulation of and medical discrimination against those affected, as well as (sometimes) poor medical care.

In the end, economic considerations are almost always behind Big Data analyses in the health sector. These could be a market advantage for the data processor, the exclusion of or discrimination against competitors, or targeted advertising designed to manipulate behavior. This can often result in the commercial exploitation of the precarious health situation of individuals in order to maximize corporate profits.

6 Data Protection Mechanisms

The existing regulations for handling health data pursue the general data protection objectives of confidentiality, integrity, availability, non-linkability, transparency and the ability to intervene. The objectives of confidentiality and non-linkability serve to achieve preferably non-discriminatory and unhindered use of medical help. Third parties should not obtain any knowledge of a potentially embarrassing predicament. Such a predicament may have physical or mental, but also social, familial or economic reasons. A strict purpose limitation for the collected sensitive data is more important here than in most other areas. While Big Data conceptually aims at the full concatenation of existing data resources, the objectives of confidentiality and non-linkability stand in stark contrast to that. A concatenation (transmission, other internal use) and a change of purpose for health data is permitted under German law only if one of the following three conditions is met:
1. The person affected has given his or her explicit informed consent.
2. Clear sector-specific legislation on standards is applicable.
3. The data was anonymized in such a way that a re-identification of the people affected is practically impossible.

When it comes to the determination of both the consent of the person affected (see 1 above) and the specific legal regulation (see 2 above), a definition of three aspects is mandatory (a minimal sketch of how these aspects can be made machine-checkable follows below):
a) the type of data processed,
b) the purpose of the processing,
c) the designation of the bodies involved.

Consent as a basis for Big Data applications in the healthcare sector runs into strict limits, for the very objective of such applications is to break the bonds of responsibility, purpose limitation and limited datasets in order to gain additional insights. In addition, for a large number of data retrievals, the effectiveness of the constitutive voluntariness must be questioned critically if an economic (employer), health (doctor, hospital), social (attendant) or other dependency exists. Health data is often found in particularly hierarchical dependencies. Without a doubt, concessions can be made regarding the concreteness of requirements a to c specified above, if these are compensated by additional technical, organizational as well as material and procedural safeguards. These could be, for example, specific purpose-limitation regulations (e.g., research secrecy), the right to refuse to give evidence, exemption from confiscation, pseudonymization obligations, approval reservations, certification requirements, etc. Without such binding compensations, which are then enforceable by the persons concerned, Big Data consolidation and analysis would regularly be unconstitutional. Ultimately, therefore, the implementation of such analyses, even with the inclusion of certain elements of consent, is subject to statutory reservation. Legislation has drawn consequences from these circumstances, for example in the area of cancer registries, which were comprehensively regulated by parliaments. If additional purposes are pursued, appropriate amendments will be necessary. Accordingly, the epidemiological cancer registries are currently being complemented by functions regarding treatment monitoring and quality assurance.
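
As a minimal, purely illustrative sketch – the class and field names are assumptions by the editor, not taken from any statute – the three mandatory aspects a) to c) can be enforced technically as a purpose-bound consent record that refuses any processing step not explicitly covered:

    # Illustrative sketch only: a purpose-bound consent record.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Consent:
        data_types: frozenset  # a), e.g., {"diagnoses", "medication"}
        purposes: frozenset    # b), e.g., {"quality assurance"}
        bodies: frozenset      # c), e.g., {"University Hospital X"}

    def processing_allowed(consent, data_type, purpose, body):
        # purpose limitation: anything not explicitly covered is refused
        return (data_type in consent.data_types
                and purpose in consent.purposes
                and body in consent.bodies)

    c = Consent(frozenset({"diagnoses"}),
                frozenset({"quality assurance"}),
                frozenset({"University Hospital X"}))
    print(processing_allowed(c, "diagnoses", "research",
                             "University Hospital X"))  # False

Such a check is, of course, no substitute for the legal and procedural safeguards described above; it merely makes the purpose binding machine-enforceable.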

7 Data Economy in Particular

The principle of data economy applies in data protection law, i.e., data avoidance, earliest possible anonymization/pseudonymization or data deletion, and preferably comprehensive non-linkability. For Big Data, this principle is poison to knowledge gains. This contradiction cannot, in fact, be resolved smoothly and completely without loss: the same constitutionally grounded limitations on findings that exist in the field of criminal investigation also apply to knowledge pursued with Big Data. The means of choice here are anonymization and pseudonymization.

Anonymization requires that any re-identification, even with a relatively large amount of time, cost and labor, is no longer possible (§ 3 para. 6 BDSG). This can be achieved only if the identifying specifications (so-called master data) are completely disposed of. Master data can refer to different individuals or institutions: not just a patient or subject, but also a health worker (doctor, pharmacist, psychologist, hospital) or service provider. If a health worker has characteristic data about a patient, a data query will normally suffice to allocate the seemingly anonymous data set to an affected patient. Therefore the elimination of the patient identifiers is normally not enough; instead, the removal of the identifiers of all helpers, service providers or other third parties that possess allocation knowledge is also required. The objective of anonymization is to avoid the risk of any individual allocation in the case of comprehensive Big Data analysis. This can only be achieved through aggregation of datasets, which, however, will inevitably reduce the granularity of the data. Aggregation is possible in two ways: by combining individual data sets into group data sets, or by concealing or consolidating features within individual data sets. In order to achieve effective anonymization, a combination of both approaches is usually necessary. The need for very extensive aggregation is justified by the additional knowledge made accessible through globally networked digital data processing, with which individuals may be re-associated with their anonymized data sets. This is not limited to the public Internet, but includes all other databases that can be accessed via the Internet and be "reasonably" envisioned as data sources of a data holder. The revelations about the NSA provided by Edward Snowden showed us the almost infinite potential associated with this process. But remember: The assumption of re-identifiability does not depend on where the data for possible allocation is available, by whom and for what purpose. It also makes no real difference that access to this data is protected by law and therefore inadmissible. The only thing that is relevant is the actual effort of re-identification – and, given the increasing power of analysis tools and the data sets potentially made available through Big Data, this effort is getting smaller and smaller. In this context it is necessary to clarify a misunderstanding that is unfortunately rather widespread in the health sector: Pseudonymization of data sets that can be allocated to individual persons is not enough to rule out re-identification, especially if the pseudonyms are used for the long-term assignment of individual data sets. This conclusion is not altered by the fact that hashing methods, or asymmetric encryption methods with deletion of the second part of the key pair, are used for pseudonymization. Re-identifiability already arises from the very fact that known master data can be run through the same pseudonymization process and the resulting pseudonyms compared with the supposedly anonymous data, which directly allocates this data to the master data. In the case of persistent encryption and hashing procedures, the body holding the data cannot rely on keeping them secret either, since the illegal procurement of the relevant keys should not pose too much of a problem.
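
To make the hashing misunderstanding concrete, here is a minimal sketch – the identifier format is invented for illustration – of how an unkeyed hash "pseudonym" is undone by anyone who can obtain candidate master data and simply re-run the procedure:

    # Illustrative only: why unkeyed hashing of identifiers is not
    # anonymization. The identifier format "KVNR-..." is invented.
    import hashlib

    def pseudonymize(identifier):
        # an unkeyed hash, as often (wrongly) relied on in practice
        return hashlib.sha256(identifier.encode("utf-8")).hexdigest()

    # a supposedly anonymous record released for analysis
    published = {pseudonymize("KVNR-1234567890"): {"diagnosis": "E11"}}

    # an attacker with access to master data re-runs the procedure
    for candidate in ["KVNR-0000000001", "KVNR-1234567890"]:
        if pseudonymize(candidate) in published:
            print("re-identified:", candidate)

Because identifier spaces such as insurance numbers are finite and enumerable, this dictionary-style comparison succeeds without any key having to be stolen.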

The more detailed and comprehensive a data set, the easier it can be allocated back to individuals using additional knowledge. This rules out any long-term allocation of health data from different data sources to one “anonymized” person, as the person-relatable additional knowledge regarding individual characteristics held by third parties (physicians, pharmacists, health insurance companies, service providers, social networks) is available. Pseudonymization is defined as replacing the name and other identifying features with a symbol in order to rule out the determination of the person concerned, or substantially complicate such a determination (§ 3 para. 6a BDSG). In fact, with pseudonymization procedures, a comprehensive Big Data evaluation of health data that conforms to data security regulations can be achieved. However, such procedures necessitate specific prerequisites and regularly require the use of multiple pseudonymizations. To be specific: Not only the patient or subject information has to be pseudonymized, but also the information on the data sources (doctors, hospitals, pharmacists, service providers). Furthermore, separate pseudonyms for the procedure stages of collection, (long-term) storage and data publication (including evaluation and use) have to be used. If we are dealing with details concerning genotypes and phenotypes in biobanks it also makes sense to operate the two databases separately under different pseudonyms. The evaluations must be carried out in a protected and controlled space, where the pseudonym allocation and the output of the analysis results are in particular need of specific monitoring that should rule out any re-identification. This is where complex technical procedures can be used, which permit role- and purpose-specific pseudonym matches, but prevent others securely. In any case, the constitutional requirement for the above presented Big Data analysis using pseudonyms is that a legal basis is created for it with procedural as well as technical and organizational security. A number of such approaches exist in SGB V (e.g., § 209 or §§ 303a ff., not however § 300 para. 2 sentence 2), as well as in cancer registry legislation.
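
As a minimal sketch of such stage-specific pseudonyms – the keyed HMAC construction, the stage names and the key handling are illustrative assumptions by the editor, not requirements taken from the SGB V – separately held secrets can ensure that pseudonyms from different procedure stages cannot be matched against one another:

    # Illustrative sketch only: one pseudonym per procedure stage.
    import hashlib
    import hmac

    STAGE_KEYS = {  # in practice, each key is held by a different body
        "collection":  b"secret-key-collection",
        "storage":     b"secret-key-storage",
        "publication": b"secret-key-publication",
    }

    def stage_pseudonym(identifier, stage):
        # without the stage key, pseudonyms can be neither recomputed
        # nor linked across stages (unlike the unkeyed hash above)
        return hmac.new(STAGE_KEYS[stage], identifier.encode("utf-8"),
                        hashlib.sha256).hexdigest()

    patient = "KVNR-1234567890"
    print(stage_pseudonym(patient, "collection"))
    print(stage_pseudonym(patient, "publication"))  # a different value

Role- and purpose-specific matching, as described above, then amounts to a trustee applying the respective key under controlled conditions.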

8 Transparency

In addition to the requirements mentioned above for Big Data in the healthcare sector, a further specific requirement has to be explicitly mentioned: Without transparency, Big Data with sensitive data cannot be implemented from a constitutional point of view. This transparency must relate to all key aspects of data processing: legal bases, the rules and regulations derived from them, organization, procedures, technical documentation, data management, data security management and data protection management.

There are various addressees of transparency. Particularly relevant from a democratic and social point of view is the general public, including the (scientific) professional public, parliaments and the media. The required transparency can be achieved via the Internet, where database registries, trial registries, policies, procedures, annual reports, or even references and forms can be published. A specific need for transparency exists with regard to those affected, i.e., the patients or subjects, who have an independent legal right to information (see § 34 BDSG), which may also extend to the logical structure of the data processing (see § 6a para. 3 BDSG). Given the potential complexity and sensitivity of the information requirements for the persons concerned, qualified advisory structures may be needed, similar to the ones already established in the field of human genetics in Germany. Account must also be taken of something equally acknowledged in the healthcare sector: the people concerned are entitled to be spared information (the right not to know). An additional level of transparency exists in terms of qualified control and approval procedures, i.e., for ethics committees, company data protection officers, data protection supervisory authorities, as well as other authorities that are responsible for the approval of procedures or for the certification or auditing of processes or products.

9 Conclusion

Big Data in the area of healthcare is possible legally and in accordance with fundamental rights. This, however, presupposes that the protagonists let go of the conception that Big Data is only a matter of technical feasibility. We have actually entered the era of the information society, in which information has become a central productive factor. In order to exploit the potential of information processing in this respect, without sliding into Silicon Valley capitalism (as a linguistic variation of Manchester capitalism), a democratic and liberal configuration of information processing is required. This task exists particularly with regard to sensitive information, which undoubtedly includes information concerning health. In this respect, legislatures have so far – unjustifiably – not seen any need for action. The legislative shortcomings in the area of health data privacy are already striking. And the situation will only be further aggravated with the use of Big Data technology in this area. It is good to see that the German Bundestag has arrived in the digital age by recently establishing a committee for digital policy. But this can only be a beginning. Given the non-simultaneity of technical development and political awareness, technical and expert players are required. This means that the existing instruments of regulated self-regulation should be used. These are already at hand to a large extent in the SGB V, and privacy law has § 38a BDSG to offer. The medical associations will soon have to develop their own regulatory competencies. And even below that level, the exploration and development of standard operating procedures, best practices, codes of conduct and voluntary audits will provide many possibilities. These possibilities must be used to set off public debate and thus to move policy makers into problem-resolution mode. Unless we tackle this problem in its social context and find technical as well as legal solutions, Big Data in healthcare will neither be accepted nor brought into accordance with the constitution.

Sebastian Krolop and Henri Souchon

13 Big Data in Healthcare from a Business Consulting (Accenture) Point of View

In this article, the two authors from the management consultancy Accenture raise two key questions about dealing with Big Data: "Why does Big Data have to be embedded in a strategy? Do we need Big Data to define the strategy, or is Big Data part of the strategy?" What are the main drivers of this development? The authors are less interested in individual fields of application; instead they try to put the issue into a strategic context, with its far-reaching organizational consequences. To lift the fog, or to use it, has always been a strategic advantage – whether in warfare or in preparing strategic decisions. Along this metaphor, the authors describe their approach in terms of the certainty or uncertainty of analyses. To underpin these considerations, the dimensions of Big Data are presented transparently, based on the fictitious example of a health insurance company.

Abstract: From the perspective of a consulting firm, this chapter describes how companies can gain a significant competitive advantage through the intelligent application of Big Data and secure their long-term strategic positioning. The authors address the historical background of corporate strategy and show how data-driven decision-making has developed. The focus of this article is on placing Big Data within the discussions about digitalization, the degree of digitalization, smart services and digital operating models. The criteria that are important for the transformation of Big Data into Smart Data are addressed by means of a sample strategy for a fictitious insurance company, and the authors show how the strategy can be supported by the use of intelligent data and what potential comes with it. In addition, future trends and challenges are outlined.

Content

1 Data as a Basis for Decisions
1.1 Strategy and Data
1.2 Big Data as a Part of Digitalization
1.3 Consulting as a Decision Support
2 The Path to an Informed Decision
2.1 The Transformation of Big Data to Smart Data
2.2 New Technologies in the Area of Big Data
3 Summary and Outlook
Literature

1 Data as a Basis for Decisions

So far, Big Data has been shown from a variety of angles. But how can Big Data actually be used for the strategic direction of a company? Why does Big Data have to be anchored in a strategy? Do we need Big Data to define the strategy, or is Big Data part of the strategy? The most prominent example of these questions is Google. Google's turnover in 2013 amounted to approximately USD 50 billion,¹ based on the analysis and evaluation of mass data. Roughly 85 % of it originated from advertising (mainly AdWords, a prime example of a digital business model). So how can Big Data be implemented and used to sustainably support the corporate strategy? How can you generate intelligent knowledge from a "data cemetery" so as to gain a decisive advantage over the competition? Why should companies address the issue of Big Data when they are already finding it difficult to deal with Small Data? In order to narrow down the answers to these questions, it is worth looking into the past: competition for information, knowledge and thus strategic advantages is as old as humanity itself. Our ancestors already had to merge and assess huge amounts of data – usually without a computer.

1.1 Strategy and Data

A strategy is put into practice through decisions that serve the idea, the objective or the ideology. Here we can distinguish between two types of decisions: empirically substantiated, analytical, data-based decisions and so-called "gut decisions." Empirically substantiated decisions rely on analysis, evaluation, collective knowledge and experience, while gut decisions are mainly based on spontaneous, subjective assessment, which is only partially supported by empirical data. Key characteristics of these two types of decisions include the time required to make a decision and the amount of information and data taken into consideration that then leads to the decision. In order to make an informed decision, an analysis of the existing information is necessary. This is where, based on the available data, added value is generated, which leads to a decision. This is certainly a simplistic differentiation of decision types, but it is nevertheless sufficient for the further discussion, because it illustrates the challenges we face if we want to execute a strategy. While gut decisions are tolerable in private life, we expect decision makers in our key businesses to make decisions that are well thought out, sound, knowledge-based, transparent and generally smarter than the decisions made by the competition.

But what happens if the data at hand is available in different formats (e.g., drawn maps, documents, music, video, etc.) and in huge quantities? The confusing amount of data can be compared with Clausewitz's fog of war. Data alone provides only unstructured and, in the case of missing correlations, incomplete information. This fog stands for uncertainty, but it cannot be completely avoided when implementing a strategy. It is therefore necessary to lift the fog through the extensive collection of data and its efficient and reliable interpretation. While the origins of this fog lie in military history – in ages past its mastery provided Sun Tzu with a distinct advantage – the analysis of data is today more important than ever.

1 Statista: Höhe der Werbeumsätze von Google von 2001 bis 2013 (in Milliarden US-Dollar). Online: http://de.statista.com/statistik/daten/studie/75188/umfrage/werbeumsatz-von-google-seit-2001/, [accessed on: May 16, 2014].

1.2 Big Data as a Part of Digitalization

The availability of data is already enormous and is growing rapidly due to the increasing level of digitalization (e.g., Internet, emails, sensor technology). Among other things, the degree of digitalization refers to the processing and storage of analog information to make it available electronically. In 2007, approximately 94 % of global information capacity was already available digitally, compared with 3 % in 1993.² The management consulting company Accenture believes that companies that have achieved a high degree of digitalization and have already started the transformation to a digitalization champion "will start the race for market share from the pole position."³ The study also highlights the potential of digitalization in healthcare, especially in the field of sensor technology. Above all, according to the study, health monitoring, mobile health services and telemedicine will shape the healthcare sector. The amount of data will continue to increase rapidly through the enormous variety of data sources. The result is and remains Big Data. The development of mobile devices (smartphones and complementary devices such as the Samsung Gear smartwatch, Nike FuelBand, Jawbone Up, etc.), devices for monitoring vital signs, digitized analog data (e.g., books) and applications (e.g., electronic patient records and electronic health cards) in particular highlights the multitude of additional data sources. Digitalization will reach its next peak when these media and devices are networked and can communicate with each other. For companies this means that media breaks (the manual transfer of data between two or more systems) in internal processes can also be removed. Figure 13.1 illustrates the data sources that are perceived as the major drivers of data growth:

2 Hilbert/López: The World's Technological Capacity to Store, Communicate, and Compute Information. In: Science Vol. 332, no. 6025, April 1, 2011, pp. 60–65.
3 Accenture Study TOP-500: Neue Geschäfte, neue Wettbewerber: Die Top-500 vor der digitalen Herausforderung. Accenture 2014. Online: http://www.accenture.com/Microsites/wachstum/Documents/de_de/Static/static-index.html, [accessed on: May 16, 2014].

Fig. 13.1: Drivers for the growth of data on the Internet – bar chart of survey results (n = 100, multiple answers possible). The leading drivers include mobile Internet use via smartphones and tablets (59 %) and cloud computing (53 %); the further drivers surveyed were collaboration (file sharing, web conferencing), IP-based communication (VoIP, chat, video), machine-to-machine communication and sensors, digitalization of business models (eCommerce), social media, video streaming and media distribution, and online gaming and entertainment. Source: Velten/Janata: Datenexplosion in der Unternehmens-IT. 2012, pg. 7.

This development is today summed up in the term digitalization; more and more companies are developing individualized, digital operating models. Beyond that, the term Industry 4.0 describes digitalized production processes and, today, generally also the topic of smart services – the provision of services that are available on demand. The authors of the study Smart Service Welt⁴ emphasize that physical and digital services will gradually converge into digital business models, and recommend, among other things, the introduction of national centers of excellence for smart service platforms.⁵ The basis for these smart services is Smart Data, i.e., data from different data sources that is linked together to provide new insights. In addition, it must be noted that Big Data is generated in all industries and also across industries. Examples are financial transactions, customer data, usage data, etc. In the health sector this particularly concerns patient data, diagnostic and therapy information, invoicing, and billing between service providers and insurance companies.

4 & 5 Arbeitskreis Smart Service Welt: Smart Service Welt – Umsetzungsempfehlungen für das Zukunftsprojekt Internetbasierte Dienste für die Wirtschaft. 2014. Online: http://www.acatech.de/de/ aktuelles-presse/presseinformationen-news/news-detail/artikel/weckruf-an-die-industrie-guteprodukte-reichen-langfristig-nicht-aus-smart-services-sind-die-neue.html, [accessed on: May 16, 2014].

1.3 Consulting as a Decision Support

For roughly 100 years now, company leaders have been able to call on the assistance of business consultants to help them clear the fog and evaluate their strategic positioning. While the focus of the first consulting firm, Arthur D. Little (founded in 1886 by the MIT professor of the same name), was still on technological research, Booz & Company (founded in 1914), McKinsey & Company (founded in 1926) and the Boston Consulting Group (founded in 1963) specialized in management consultancy with a main focus on strategy. Even though nothing has changed when it comes to the consultants' fundamental task of "fog-fighting," the advent of information technology has changed their daily work dramatically in recent years. Only 15 years ago, well-paid armies of management consultants spent weeks or even months seeking, analyzing and evaluating data. Much of the information existed only on paper. Once data increasingly became available in computer systems, elaborate queries had to be programmed. These queries could run for nights or even several days – always with the associated risk that the system might crash before the end of the query without providing useful results, and the query would then have to be started over again. Today, access to information is possible much faster and more accurately. Companies are increasingly pursuing the trend away from one-off, point-based data inspection towards continuous analysis of data. Nowadays the task of consultants is to check data for completeness, to make it available and usable, and to support decision makers in implementing their strategy. The main focus is no longer one-time, project-based number crunching, but rather the clients' ability to continuously recognize the wealth of their data and to make use of it. Consultants support companies during the transformation to digital companies, with the associated digital operating models. They enable their clients to better understand the future needs of their respective markets and to deduce the appropriate responses and action plans for their companies. The increasing segmentation of markets into individual demands requires new strategies that can be implemented successfully only by using corresponding digital operating models. Here consultants can generate added value for their customers through their expertise and their knowledge across different industries.

2 The Path to an Informed Decision

The success of a well-founded decision depends largely on the ability to make the best use of data through clever processing and structuring and to extract all the information from the data. In that way relationships, correlations, trends and forecasts – which were previously not known to the company – can be identified and anticipated through data analysis.

2.1 The Transformation of Big Data to Smart Data

Current literature usually distinguishes between four essential criteria in the handling of data. Against the background of rising healthcare expenditures, which exceeded the EUR 300 billion mark for the first time in 2012,⁶ and an ever-aging population with an increasing number of chronically ill patients, let us roughly outline the possibilities of Big Data based on the fictitious strategy of "quality leadership among the chronically ill" at a fictitious health insurance company. The strategy will be implemented through a Smart Data-driven health program.

– Data volume: The optimization of programs for prevention and better care of the chronically ill requires data that is provided en masse by various data sources over years. These include patient data, diagnostic and therapeutic information, data from telemonitoring, data from electronic medical records, etc. This data helps, for example, to identify the drivers of successful and unsuccessful types of treatment, or high-risk groups. The amount of digital data doubles approximately every two years, and it is believed that it will increase to about 40 zettabytes by 2020, roughly 50 times as much data as was available three years ago.⁷ The basic conditions are thus set, and the challenge is clear: it is urgent that companies begin now to prepare for the flood of data and to take action to make it usable. The first movers in the area of "making data usable" already have a competitive advantage now. The AOK, one of Germany's largest health insurance companies, has been working on the implementation of new technologies (Business Warehouse with SAP HANA) since 2012, in order to improve the healthcare of more than six million cases of illness and to gain a competitive advantage. This is the first step into a new era.⁸

– Speed: The criterion of speed usually refers to the availability of data. The development of broadband communication enables the exchange of data in real time. However, even more important than the real-time availability of data is the time required to evaluate this data. Conventional Business Warehouse systems can still require many hours to provide meaningful analyses. A re-parameterization of the evaluation, for example the integration of an additional dimension, can also be very time consuming. The use of new technologies, on the other hand, enables the evaluation of data in real time, which can mean a direct competitive advantage and/or cost advantage. The technical progress that has taken place is breathtaking: Moore's Law states that the complexity of integrated circuits at minimal component cost doubles approximately every 12 to 24 months.⁹ In other words, this means that today a commercially available smartphone has more computing power than all of NASA together in 1969, when it sent two astronauts to the moon.¹⁰ Another impressive example of Moore's Law: The ASCI Red was a supercomputer, installed in 1997, that was at the top of the TOP500 list of the fastest computer systems between June 1997 and June 2000. Its development cost approximately USD 55 million, it required about 150 square meters of floor space, and its hourly electricity consumption was approximately equivalent to that of 800 homes – its main task was the simulation of nuclear tests. In 2005 Sony introduced its PlayStation 3 for roughly USD 500 – with performance parameters identical to those of the ASCI Red.¹¹ For our example this means that the performance costs associated with the program can be evaluated in real time and, for example, that fraud cases can be directly identified as soon as an invoice is entered in the respective core system (a toy version of such a rule check is sketched after this list). Here the check takes place on the basis of complex sets of rules using the complete volume of data, in order to identify, for example, invoices for services that have not been rendered, were not necessary or have been invoiced twice, etc. In that way comparable cases can be analyzed directly too, and empirical values formed for the future. Unjustified costs are thus avoided before they occur. This is an enormous advantage, as it is assumed that statutory health insurance companies lose between EUR 5 and 18 billion a year through invoicing fraud, false accounting and corruption.¹² Furthermore, the scope and cost of services can be optimized and adjusted in real time without adversely affecting the quality of care. Another important innovation is that evaluations are easier due to the changed data platforms, architectures and technologies. It is no longer necessary to request analyses from IT far in advance, as these can be requested ad hoc by any user. Decision makers thus have the opportunity to directly access transactional data, the one true and current dataset in the core system (single point of truth), and to generate analyses themselves.

– Structure: Data has always been available in different shapes and structures. So far, however, there have been few opportunities to link these formats, let alone to use them for pattern recognition. Recent years, however, have made clear the immense use and advantages that, for example, a combination of structured data (e.g., ERP systems, database tables, KIS) and unstructured data (e.g., social networks, diagnostic images, etc.) can provide. We have probably all noticed how, after we have bought something on Amazon, we are offered similar articles not only on Amazon or by email, but also on Facebook. This linkage of different data formats (which also includes music or videos) will be crucial to a successful strategy. This is also illustrated by the data volume: in 2012 experts estimated that only about 15 % of all available data was structured, whereas 85 % was available as a complex and unstructured mass.¹³ For our example, this could mean that potential candidates for the health program are directly referred to it on Facebook, or that comments in social networks or online forums are evaluated so that an enhanced patient profile can be created. Of course, due to the prevailing data protection regulations in Germany, this is still a future scenario, but in other countries such technologies are already in practical use. The Israeli website operator Treato has for several years been publishing information on drugs extracted from discussions in social networks and forums, and provides users, among other things, with an overview of side effects of drugs that the manufacturers have not yet published. For this purpose, unstructured data with a volume of up to 200 million datasets a day is taken from various channels, linked and made available using the Hadoop technology from Cloudera.¹⁴ Another example is the already mentioned linkage of sensor data in telemonitoring, e.g., for the early detection of infections in premature babies. In order to analyze the symptoms, thousands of elements of sensor data have to be continuously checked. The success of such a project was already proven by the University of Ontario in cooperation with IBM Canada in 2010.¹⁵

– Reliability: Given the mass of data, the different structures and the speed with which the data is transferred and analyzed, reliability and integrity remain the main criteria. Incomplete, outdated or even wrong data leads to incorrect decisions and jeopardizes the implementation of a company's strategy. For the evaluation of clinical pictures, the analysis of outbreaks and epidemics, as well as the control of health programs for prevention and care, the reliability of the data plays a greater role than in other areas. In our example, it would be devastating if incorrect or incomplete data were used for the composition of the program. This would lead directly to increased treatment costs, as, for example, the wrong therapy approaches would be implemented or suboptimal treatment paths defined.

6 Destatis: Gesundheitsausgaben 2012 übersteigen 300 Milliarden Euro. Pressemitteilung Nr. 126 vom 7.4.2014. Online: https://www.destatis.de/DE/PresseService/Presse/Pressemitteilungen/2014/04/PD14_126_23611.html, [accessed on: May 16, 2014].
7 Jüngling: Datenvolumen verdoppelt sich alle zwei Jahre. In: Die Welt. Online: http://www.welt.de/wirtschaft/webwelt/article118099520/Datenvolumen-verdoppelt-sich-alle-zwei-Jahre.html, [accessed on: May 16, 2014].
8 SAP: Im Dialog – AOK: Neue Analysemöglichkeiten mit SAP HANA. 2012. Online: http://news.sap-im-dialog.com/aok-neue-analysemoeglichkeiten-mit-sap-hana/, [accessed on: May 16, 2014].
9 Moore: Cramming more components onto integrated circuits. In: Electronics 38, Nr. 8, 1965, pp. 114–117.
10 Schmiedchen: Die Zukunft der Menschheit wird fantastisch. 2013. In: Die Welt. Online: http://www.welt.de/wissenschaft/article112447946/Die-Zukunft-der-Menschheit-wird-fantastisch.html, [accessed on: May 16, 2014].
11 Brynjolfsson/McAfee: The Second Machine Age. 2014, pg. 49.
12 Zillmann: Trendpapier – Big Data bei Krankenversicherungen. 2013, pg. 16.
13 Manig/Giere: Quo Vadis Big Data – Herausforderungen – Erfahrungen – Lösungsansätze. TNS Infratest GmbH. München 2012. Online: http://www.t-systems.de/loesungen/ergebnisse-der-studie-quo-vadis-big-data/975734_2/blobBinary/Ergebnisse-der-Studie-Quo-vadis-Big-Data.pdf, [accessed on: May 16, 2014].
14 Cloudera: How Treato Analyzes Health-related Social Media Big Data with Hadoop and HBase. 2012. Online: http://blog.cloudera.com/blog/2012/05/treato-analyzes-health-related-big-data-with-hadoop/, [accessed on: May 16, 2014].
15 Schroeck et al.: Analytics: Big Data in der Praxis. 2012. Online: http://www-935.ibm.com/services/de/gbs/thoughtleadership/GBE03519-DEDE-00.pdf, [accessed on: May 16, 2014].
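
The real-time invoice check described under the Speed criterion can be thought of as a set of rules applied to each incoming record. The following toy sketch – the field names, service code and threshold are invented for illustration, not taken from any real billing system – flags a duplicate billing the moment the second, identical invoice arrives:

    # Illustrative sketch only: rule-based checking of incoming invoices.
    seen = set()  # previously processed invoices (in memory for the sketch)

    def check_invoice(invoice):
        findings = []
        # rule 1: the same service billed twice for one patient and date
        key = (invoice["patient_id"], invoice["service_code"], invoice["date"])
        if key in seen:
            findings.append("possible duplicate billing")
        seen.add(key)
        # rule 2: implausible quantity for a single treatment day
        if invoice["quantity"] > 3:
            findings.append("implausible quantity")
        return findings

    invoice = {"patient_id": "P1", "service_code": "03230",
               "date": "2014-05-16", "quantity": 1}
    print(check_invoice(invoice))        # [] – first occurrence passes
    print(check_invoice(dict(invoice)))  # ['possible duplicate billing']

In-memory platforms of the kind mentioned above apply such rule sets to the complete volume of data rather than to samples, which is what makes the check viable at the moment of data entry.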

The main criteria for handling Big Data can therefore be tied directly to a strategy. Against the background of these developments and changes, the question quickly arises as to how one can master this situation and arrive at the desired interpretation of the data – and with it the question of appropriate decision-making tools that can be used to make new data relationships visible and usable, as well as to structure them reliably in order to ultimately make them meaningful; put another way, to turn Big Data into Smart Data. This question is so pressing that it has also been recognized by political decision makers, who made it an issue as part of the Digital Agenda in the coalition agreement of 2013. Specific support programs by the Federal Government include BIG DATA (BMBF) and SMART DATA (BMWi). Big Data is therefore one of the key issues that are to be addressed through investment in new technologies.

2.2 New Technologies in the Area of Big Data

One of the developments that we increasingly see with our customers is the extension or replacement of conventional technologies by new technologies and methodologies (e.g., changing from row-oriented to column-oriented databases, introduction of in-memory technologies, use of new technologies for better data compression), in order to orchestrate data availability and analysis; a toy illustration of the column-oriented idea follows below. The transition to such technologies is carried out either as part of a comprehensive transformation, or as a so-called side-car solution, where, for example, only the business warehouse is switched to a new technology. Another noticeable trend is data storage in the cloud, in order to facilitate access to the data. Here there are still concerns regarding data privacy and data security, which have been amplified by the scandals surrounding the NSA and Co. Big Data has, after all, also become an integral part of corporate strategy for intelligence services.
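
The following toy comparison – plain Python lists standing in for real storage engines – shows the idea behind the column orientation mentioned above: an aggregate over one attribute touches a single contiguous column instead of stepping through every complete row. Real column stores and in-memory databases add compression and vectorized execution on top of this layout:

    # Illustrative sketch only: row- vs. column-oriented data layout.
    rows = [  # row-oriented: one record per invoice
        {"patient": "P1", "year": 2013, "amount": 120.0},
        {"patient": "P2", "year": 2013, "amount": 80.0},
        {"patient": "P1", "year": 2014, "amount": 95.5},
    ]

    # column-oriented: one contiguous list per attribute
    columns = {
        "patient": ["P1", "P2", "P1"],
        "year":    [2013, 2013, 2014],
        "amount":  [120.0, 80.0, 95.5],
    }

    # the aggregate reads only the "amount" column, not whole records
    print(sum(columns["amount"]))  # 295.5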

A further development, which will become increasingly important in the future, is a special form of forecast model: predictive analytics. This is the ultimate discipline of data analysis. The existing database is not only used to analyze the current situation, but also to make predictions about developments in the near or distant future. A good example of this is the Big Data project by Google to forecast the outbreak of flu epidemics. Using a specially developed algorithm that browses the search requests for flu-related terms, patterns are detected that may indicate a corresponding outbreak of flu epidemics.¹⁶ The following illustration shows that, in addition to reporting, data mining and data visualization, forecasting models are seen as among the most important Big Data analysis tools.

Fig. 13.2: Big Data analysis tools – share of those surveyed using each tool (n per statement = 508–870 out of 1,144; 2012): queries and reporting 91 %, data mining 77 %, data visualization 71 %, prediction models 67 %, optimization 65 %, simulation 56 %, text in natural language 52 %, geodata analysis 43 %, analysis of data streams 35 %, video analysis 26 %, speech analysis 25 %. Source: Schroeck et al.: Analytics: Big Data in der Praxis. Online: http://www-935.ibm.com/services/de/gbs/thoughtleadership/GBE03519-DEDE-00.pdf, [accessed on: May 16, 2014].
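
As a minimal sketch of the forecasting idea behind such prediction models – the weekly figures below are invented toy data, not values from the cited Google study – a simple regression links the observable share of flu-related search queries to later official case counts:

    # Illustrative sketch only: forecasting from search-query frequencies.
    from statistics import linear_regression  # Python 3.10+

    # weekly share of flu-related queries (%) vs. reported cases
    query_share = [0.8, 1.1, 1.9, 2.4, 3.0]
    reported_cases = [120, 160, 270, 330, 410]

    slope, intercept = linear_regression(query_share, reported_cases)

    # queries are observable days before official case reports arrive
    next_week_share = 3.6
    print("expected cases:", round(slope * next_week_share + intercept))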

3 Summary and Outlook

In summary, it can be said that whoever is in control of Big Data has a strategic advantage. Previously unknown connections between large amounts of data can be harnessed. The fog will get thinner to the degree that data can be reliably interpreted. The linking of data will be crucial: the data is merged in a so-called single point of truth, which then serves as the starting point for all analyses. Data analysis will also increasingly be possible for management itself. The consulting firm Accenture summarizes the success factors in managing Big Data on the basis of the "four Ts": What is required is an understanding of the different types of data within and outside the company, the latest technology for its analysis, carried out with the latest innovative techniques, and the talent to enable a higher return on investment through Big Data.¹⁷

The statements and examples make it clear that digitalization is progressing in huge steps, unfolding new facets such as Smart Services and Industry 4.0, and that data availability is continuing to rise. It is no longer enough to have a functioning IT department: new technologies are now at the heart of the company and are essential for the successful execution of corporate strategy. Without embedding Big Data in corporate strategy, either as a means to an end or as the strategy itself, key competitive advantages will be lost.

16 Ginsberg et al.: Detecting influenza epidemics using search engine query data. In: Nature Vol. 457, 2009, pp. 1012–1014.

For the development of the full potential of data, some challenges and problems still have to be dealt with. The following overview shows the areas that are perceived as the major problems for Big Data:

Fig. 13.3: Major problems in the area of Big Data – share of those surveyed naming each problem (n = 82 SMEs and large companies from various sectors; 2012): data protection and security 49 %, budget/priorities 45 %, technical challenges of data management 38 %, expertise 36 %, low level of awareness of Big Data applications and technologies 35 %. Source: Fraunhofer IAIS: Big Data – Vorsprung durch Wissen. Online: http://www.iais.fraunhofer.de/fileadmin/user_upload/Abteilungen/KD/uploads_BDA/FraunhoferIAIS_Big-Data_2012-12-10.pdf, [accessed on: May 16, 2014].

Data privacy and data security, in particular, limit the possibilities of Big Data. Furthermore, many companies still fail to provide an adequate budget – or any budget at all – for the development of the necessary infrastructure and new technologies, or they simply lack the know-how to design and implement a Big Data solution. That is where we, as consultants, come into play and try to assist our clients. Because, in the end, it simply has to be said: whoever controls the data can acquire a clear competitive advantage.

17 Accenture Infographic: “A Formula for Big Data Success.” Accenture 2014. Online: http://www. accenture.com/us-en/Pages/insight-formula-big-data-infographic.aspx, [accessed on: May 16, 2014].


Literature Accenture Infographic: “A Formula for Big Data Success.” Accenture 2014. Online: http://www.accenture.com/us-en/Pages/insight-formula-big-data-infographic.aspx, [accessed on: May 16, 2014]. Accenture Study TOP-500: Neue Geschäfte, neue Wettbewerber: Die Top-500 vor der digitalen Herausforderung. Accenture 2014. Online: http://www.accenture.com/Microsites/wachstum/Documents/de_de/Static/static-index.html, [accessed on: May 16, 2014]. Arbeitskreis Smart Service Welt: Smart Service Welt – Umsetzungsempfehlungen für das Zukunftsprojekt Internetbasierte Dienste für die Wirtschaft. 2014. Online: http://www.acatech.de/de/aktuelles-presse/presseinformationen-news/news-detail/artikel/weckruf-an-die-industrie-gute-produkte-reichen-langfristig-nicht-aus-smart-services-sind-die-neue.html, [accessed on: May 16, 2014]. Brynjolfsson, E./McAfee, A.: The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies. First Edition. New York 2014. Cloudera: How Treato Analyzes Health-related Social Media Big Data with Hadoop and HBase. 2012. Online: http://blog.cloudera.com/blog/2012/05/treato-analyzes-health-related-big-data-with-hadoop/, [accessed on: May 16, 2014]. Destatis: Gesundheitsausgaben 2012 übersteigen 300 Milliarden Euro. Press release Nr. 126 from April 7, 2014. Online: https://www.destatis.de/DE/PresseService/Presse/Pressemitteilungen/2014/04/PD14_126_23611.html, [accessed on: May 16, 2014]. Fraunhofer IAIS: Big Data – Vorsprung durch Wissen. Fraunhofer Institut für Intelligente Analyse- und Informationssysteme IAIS. Innovationspotenzialanalyse. 2012. Online: http://www.iais.fraunhofer.de/fileadmin/user_upload/Abteilungen/KD/uploads_BDA/FraunhoferIAIS_Big-Data_2012-12-10.pdf, [accessed on: May 16, 2014]. Ginsberg, J. et al.: Detecting Influenza Epidemics Using Search Engine Query Data. Google Inc. Centers for Disease Control and Prevention in Nature 457/2009. Hilbert, M./López, P.: The World’s Technological Capacity to Store, Communicate, and Compute Information. In: Science 332(6025)/April 1, 2011, pp. 60–65. Jüngling, T.: Datenvolumen verdoppelt sich alle zwei Jahre. In: Die Welt. Online: http://www.welt.de/wirtschaft/webwelt/article118099520/Datenvolumen-verdoppelt-sich-alle-zwei-Jahre.html, [accessed on: May 16, 2014]. Kaku, M.: Die Physik der Zukunft – Unser Leben in 100 Jahren. Reinbek 2013. Manig, M./Giere, J.: Quo Vadis Big Data – Herausforderungen – Erfahrungen – Lösungsansätze. TNS Infratest GmbH. Munich 2012. Online: http://www.t-systems.de/loesungen/ergebnisse-der-studie-quo-vadis-big-data/975734_2/blobBinary/Ergebnisse-der-Studie-Quo-vadis-Big-Data.pdf, [accessed on: May 16, 2014]. Moore, G. E.: Cramming More Components Onto Integrated Circuits. In: Electronics 38(8)/1965, pp. 114–117. SAP: Im Dialog – AOK: Neue Analysemöglichkeiten mit SAP HANA. 2012. Online: http://news.sap-imdialog.com/aok-neue-analysemoeglichkeiten-mit-sap-hana/, [accessed on: May 16, 2014]. Schmiedchen, F.: Die Zukunft der Menschheit wird fantastisch. In: Die Welt. Online: http://www.welt.de/wissenschaft/article112447946/Die-Zukunft-der-Menschheit-wird-fantastisch.html, [accessed on: May 16, 2014]. Schroeck, M. et al.: Analytics: Big Data in der Praxis. Wie innovative Unternehmen ihre Datenbestände effektiv nutzen. IBM Institute for Business Value in cooperation with Saïd Business School, University of Oxford. 2012. Online: http://www-935.ibm.com/services/de/gbs/thoughtleadership/GBE03519-DEDE-00.pdf, [accessed on: May 16, 2014].


Statista: Höhe der Werbeumsätze von Google von 2001 bis 2013 (in Milliarden US-Dollar). Online: http://de.statista.com/statistik/daten/studie/75188/umfrage/werbeumsatz-von-google-seit2001/, [accessed on: May 16, 2014]. Velten, Dr. C./Janata, S.: Datenexplosion in der Unternehmens-IT. Wie Big Data das Business und die IT verändert. Eine Studie der Experton Group AG im Auftrag der BT (Germany) GmbH & Co. oHG. Ismaning 2012. Zillmann, M.: Trendpapier – Big Data bei Krankenversicherungen. Bewältigung der Datenmengen in einem veränderten Gesundheitswesen. Publikation der Lünendonk GmbH – Gesellschaft für Information und Kommunikation, in Zusammenarbeit mit SAS. Kaufbeuren 2013.

Peer Laslo

14 Influence of Big Pharma on Medicine, Logistics and Data Technology in a State of Transition This article focuses on the logistical processes related to the production, distribution and allocation of drugs. Big Data in medicine means more than the optimized analysis of clinical data: the optimization of logistics, in particular, holds potential that could be realized rather quickly, since experience from other industries can be transferred. “The OECD assumes that between 1 and 30 % of medicine is counterfeit…” Laslo writes. That demonstrates the enormous potential – both from a medical point of view (administering ineffective or harmful medication) and from an economic point of view (according to destatis, the volume of the global pharmaceutical market amounts to approximately EUR 980 billion). By way of an example, the article shows how the logistics chain in a hospital can be optimized when it is equipped with modern information technology. Additionally, the current state of discussion and implementation of national, European and global initiatives on the issue of “pharma tracking” is described.

Abstract: Where does all of the data in healthcare come from? Primarily it is “only” billing data, diagnoses, findings, temperature curves and medication instructions. The data becomes big because it is generated and used repeatedly by different participants: each participant typically generates data within his or her own process without building on a previous diagnosis, which may follow a different standard – both technically and semantically. In addition, many imaging procedures generate huge data streams, and, as the most important multiplier, healthcare serves a great many patients who generate data that persists beyond the duration of their lives. Deleting the data is not a good idea, because long-term data holds treasures waiting to be discovered. And new research and analytical processes – automated, finer-grained and usually more accurate – are constantly being added, creating ever new data streams. The use of automated data processing technology in medicine is still in its infancy, even if individual applications of modern diagnostics, with their imaging and automated analysis machines, give the impression of a high level of automation. For it is still people who interpret the results and make a diagnosis, often without the systematic support of an expert system that could list likely causes of a disease and appropriate treatment recommendations while ignoring statistically irrelevant individual experience. In development and design, by contrast, engineers are already supported today by systems that assist them during development and estimate the possible effects of or on a construction. This article is not a criticism of data


technology in medicine, but is designed to illustrate the potential that a consistent use of the available data brings with it, and to show the data sources that are later referred to here as Big Data.

Content 1 Data Processing Technology in Medicine  166 1.1 Integration of Logistics  167 2 Data Sources and Formats  167 2.1 New Technologies for Evaluation  168 2.2 Logistics as a Data Source  168 2.2.1 Procedures for Marking Prescription Drugs  169 2.2.1.1 Technological Prerequisites for Compliance Reporting  170 2.2.1.2 Simplified View of an Authentication Process  170 2.2.1.3 The Sequence of Authentication of Medication  171 2.2.1.4 Participants in Authentication  172 2.2.2 Method for Tracking Drugs for Individual Patients – “Unit Dose”  173 2.2.2.1 Technological Prerequisites for Unit Dose  173 2.2.2.2 Unit Dose – the Process  173 2.2.3 Marking of Medical Products and Status Tracking of Products  174 2.3 What is the Meaning of Big Data in the Logistics of Medication?  175 3 What is the Meaning of Big Data in Medicine?  176

1 Data Processing Technology in Medicine Information technology is today primarily used in the administration of patients and the communication of diagnostic data between attending physicians, laboratories, pharmacists, nurses, etc. Often, this communication relies on limited, only partially standardized IT. This frequently creates technology islands on which the individual treatment facilities find themselves: connected to each other, but not necessarily speaking the same IT language. A lack of standards, a lack of data security and corporate greed thus lead to data silos full of extremely valuable patient data that cannot be used easily. To some extent that describes the basic problem at hand, and it is also the cause of the slow development of expert systems and the limited cross-linked use of data for research, diagnosis and therapy. At this point we need to answer the question of how patient data can be handled globally, so that data from the existing sources can be evaluated while being anonymized sufficiently to protect patients, both in academic and commercial


research, but especially in the immediate treatment of patients. Scenarios for the use of this valuable data are available, ranging from reviews of treatment with an expected increase in treatment quality to supported diagnoses based on a high number of previous diagnoses, therapies and courses of disease. This is where promising new technologies, such as the use of SAP HANA in the Oncolyzer at the Charité, are being implemented, which can deal with the challenges of a lack of standards and large amounts of data. Even the anonymization of patient data is technically feasible, which would facilitate the handling of the data.

1.1 Integration of Logistics A field that currently seems to be getting little attention, but is already of direct use to individual patients, is the closer integration of logistics in the administration of drugs, and the tracking and integration of medical technology with patients. Here are some examples of technological approaches. Clearly marked implants would be an advantage: a recall would reach the right patients, the correct instruments would be available at a possible follow-up operation (e.g., to replace an artificial joint), and the surgeon would not have to interrupt the operation because the right instruments for the relevant implant were not at hand. As another example, the tracking of sterile goods would not necessarily protect patients from being operated on with instruments previously used on a patient suffering from BSE, but it would make it possible to withdraw instruments that are known to have been used on a patient diagnosed with BSE, in order to reduce the risk of passing on proteinaceous infectious particles (prions) via the sterile goods. These examples describe processes and areas where appropriate integration could improve treatment, with the option of creating globally available data for further analysis.

2 Data Sources and Formats In order to assess the general usability of data, one has to look at the data sources. There are primarily two sources of data: – First, the logistics systems, in which natively digital data regularly arises. These are sources that can only be expected to be available in a structured and consistently harmonized form once the current legal process – the outcome of which is unforeseeable – is complete. This would be a first important precondition for any further use beyond local use, which is mostly limited to a primary purpose within a small system.




– Second, the medically far more important field of diagnostic and medical data, which is dominated by very short process sections in which results are documented and exchanged. The results are usually weakly structured, often read manually several times by different participants, and each time reinterpreted, reassessed and reclassified. This leads to very large data sets that, in the case of natively digital data, could be raised to a globally readable digital standard by automated migration, and that – in the case of manually created data – could be transformed into a consolidated, globally usable form through evaluation with new machines and algorithms.

2.1 New Technologies for Evaluation This is precisely where new technology designs come into play, which – given the corresponding computational effort and intelligent algorithms – make it possible to structure and evaluate raw unstructured information and possibly combine it with other data in a way that provides a meaningful picture automatically and without delay. The important distinction is between natively digital data, which can be kept in uniform structures and classes from the outset, and a technology that can cope with initially unstructured data and generates reliable, valuable information from it. Since we cannot realistically expect a universal system of data control – because of the multiplicity of participants with their sometimes diametrically opposed vested interests – the possibility of using technology to bridge the lack of structure becomes particularly important if one wants to raise the value of the data collected. The objectives of such an evaluation are subject to diverse and dynamic processes. For example, data-based decision aids are conceivable, such as the pilot application at the Hasso-Plattner-Institute, the Oncolyzer. This is a kind of modern database in oncology that can assist in questions of therapy by drawing on global datasets, thus enabling a view of potential scenarios. Similar scenarios are also conceivable in many other fields.

2.2 Logistics as a Data Source Logistics processes accompany and support almost any medical treatment. However, from a medical standpoint they are seen rather as auxiliary processes that somehow have to be taken care of. In these processes, some very important data is generated, the evaluation of which can be a particularly worthwhile endeavor, as this context information can provide valuable conclusions. In the following, a description of


exemplary logistics processes is provided, together with the resulting options and links for using the natively digital data from systems that owe their existence primarily to economic intentions.

2.2.1 Procedures for Marking Prescription Drugs In future, prescription drugs will be subject to mandatory marking on the dispensing packaging (single item) in order to make the production and distribution of counterfeit drugs more difficult. The drastic dimensions of this issue will not be discussed here in detail. However, a rough indication must be given: as early as 2008, the OECD put the size of the problem (counterfeiting) at between 1 % and 30 %, depending on region and country.¹ Corresponding regulations are either already in force (China, Turkey) or on the way to becoming binding legislation (EU, USA, Brazil, India, etc.). The marking usually consists of a printed 2D code (two-dimensional matrix code) containing the Global Trade Item Number (GTIN) plus a unique serial number for the respective GTIN. The manufacturers store the data that accumulates during the packaging of blisters, vials or bottles filled with drugs (GTIN + serial number) in a company database, for example in an OER (Object Event Repository), and report this data in the context of so-called compliance reporting to the database of the regulatory authority, where authenticity verification (authentication) is made available to the participants involved. These procedures are still new and have so far only been tested sporadically in pilot schemes. There is a new trend toward instructing the dispensing pharmacies to scan the label and switch the status of the package to “dispensed.” There is also a huge number of further possible variations for including the participants, ranging from pharmaceutical wholesalers, who receive, modify and relay all the data in the packaging hierarchy as a kind of e-pedigree, up to import controls at customs, where officials can search for counterfeit drugs with the help of this technology. Consumers could also be involved, both by receiving data concerning authenticity, characteristics, use, side effects, etc., and by returning data regarding usage. The manufacturers confronted with the labeling obligation are already planning to derive consumption from the scanning process at the point of sale (POS) and to use this information directly for production planning – which in this case is seen as evidence of the effectiveness of data sharing.

1 OECD: The Economic Impact of Counterfeiting and Piracy. Online: https://oami.europa.eu/tunnelweb/secure/webdav/guest/document_library/observatory/resources/research-and-studies/EconImpacts-OECD_en.pdf, [accessed on August 12, 2014]
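To make the marking and compliance-reporting step described above tangible, here is a minimal sketch, assuming a simple JSON exchange format; the field names and the report structure are illustrative assumptions, not taken from any actual regulation or authority interface:

import json
import secrets
from dataclasses import dataclass, asdict

@dataclass
class PackRecord:
    gtin: str      # Global Trade Item Number of the product
    serial: str    # unique serial number within this GTIN
    batch: str
    expiry: str    # ISO date, e.g. "2027-06-30"

def serialize_packs(gtin: str, batch: str, expiry: str, count: int) -> list[PackRecord]:
    """Assign a random, unique serial number to each dispensing package."""
    return [PackRecord(gtin, secrets.token_hex(8), batch, expiry)
            for _ in range(count)]

def compliance_report(packs: list[PackRecord]) -> str:
    """Bundle the serialized packs into one report for the authority's database."""
    return json.dumps({"report_type": "commissioning",
                       "packs": [asdict(p) for p in packs]}, indent=2)

packs = serialize_packs(gtin="04012345678901", batch="B123", expiry="2027-06-30", count=3)
print(compliance_report(packs))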


2.2.1.1 Technological Prerequisites for Compliance Reporting The necessary conditions have not yet been fulfilled by most participants and still have to be created. On the one hand, this involves enabling pharmaceutical manufacturers to add identifiers to medication packaging. They would do this during the packaging process, by printing on the dispensing package with fast 2D-matrix-capable printers or by applying a label. The manufacturers then have to register the marked individual packages at the next packaging level – sometimes referred to as a hierarchy or level of aggregation – and again provide them with a label or printed text at that level. In addition, each marking step has to read the marking applied previously, in order to verify the quality and content of the labeling as well as to know the contents at the next packaging level. To this capability, which is implemented in engineering, must be added the corresponding data technology provided by IT: the integration of production management with manufacturing integration, i. e., the systems for data storage at the production level. This data is then consolidated toward the enterprise level, where it is written to the enterprise repository (OER) via middleware.

Fig. 14.1: System view of serial number assignment with SAP – from the production machine level via the Auto-ID infrastructure (with plug-in for integration) at the factory or regional level up to the enterprise level. Source: SAP.
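The parent-child aggregation across packaging levels can be sketched as a small data structure. This is purely illustrative – the level names and methods are assumptions, not a vendor or regulatory specification:

from dataclasses import dataclass, field

@dataclass
class PackagingUnit:
    identifier: str                   # code printed on this level
    level: str                        # e.g. "pack", "case", "pallet"
    children: list["PackagingUnit"] = field(default_factory=list)

    def aggregate(self, child: "PackagingUnit") -> None:
        """Record that `child` was packed into this unit (scan-and-pack step)."""
        self.children.append(child)

    def contents(self) -> list[str]:
        """Flatten the hierarchy: all single-pack identifiers under this unit."""
        if not self.children:
            return [self.identifier]
        return [s for c in self.children for s in c.contents()]

case = PackagingUnit("CASE-0001", "case")
for i in range(3):
    case.aggregate(PackagingUnit(f"PACK-{i:04d}", "pack"))
pallet = PackagingUnit("PAL-0001", "pallet")
pallet.aggregate(case)
print(pallet.contents())   # -> ['PACK-0000', 'PACK-0001', 'PACK-0002']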

2.2.1.2 Simplified View of an Authentication Process In order to introduce the procedure of authentication, a simplified description of the authentication process shall be provided. An object is created at the time of manufacture. For the pharmaceutical industry this means the packaging, as this is


where the relevant serial numbers are assigned. This information (GTIN + serial number) is compiled, together with other attributes, as an object in the SAP OER, the Object Event Repository. The product then follows the value chain and can be authenticated – and its status changed – by every authorized participant along this chain. A change in status is made by adding an event, which can be triggered by a simple scan with a barcode scanner, a radio frequency identification (RFID) reader, an automated gate or similar. Once the object is used up, it remains in the OER with the status “used” until it is archived.

Fig. 14.2: SAP OER – the object is followed along the value chain from production via transportation, storage, wholesale and pharmacy to the consumer and customer support (CRM). Source: SAP.
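A minimal sketch of such an object event repository, assuming an append-only event list per object, may help; the event types, statuses and key format are invented for illustration:

from datetime import datetime, timezone

class ObjectEventRepository:
    def __init__(self):
        self._events: dict[str, list[dict]] = {}   # key: "GTIN|serial"

    def add_event(self, gtin: str, serial: str, event_type: str, actor: str) -> None:
        """Append an event (commissioned, shipped, dispensed, used, ...)."""
        key = f"{gtin}|{serial}"
        self._events.setdefault(key, []).append({
            "type": event_type,
            "actor": actor,
            "time": datetime.now(timezone.utc).isoformat(),
        })

    def status(self, gtin: str, serial: str) -> str:
        """The current status is simply the type of the most recent event."""
        events = self._events.get(f"{gtin}|{serial}")
        return events[-1]["type"] if events else "unknown"

oer = ObjectEventRepository()
oer.add_event("04012345678901", "a1b2c3", "commissioned", actor="manufacturer")
oer.add_event("04012345678901", "a1b2c3", "dispensed", actor="pharmacy")
print(oer.status("04012345678901", "a1b2c3"))   # -> dispensed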

2.2.1.3 The Sequence of Authentication of Medication There are many possible scenarios for the process of authenticating the individual dispensing packages, which differ mainly with respect to who provides the database for authentication. The example below shows how securPharm e. V.² envisages authentication and – a fact that is very important when it comes to the ownership of data – how the data is to be stored in separate databases. Other procedures access a single central database in which the manufacturer information is stored, and change the status of the data object representing the unit package there.

2 securPharm is an initiative to protect German pharmaceutical sales from the infiltration of counterfeit pharmaceuticals, founded by the industry associations.


Fig. 14.3: The end-to-end control system by securPharm for protection against counterfeit drugs. The pharmaceutical company stores the serial number of every package (carrying a Data Matrix code) in the manufacturer database system; the wholesaler can optionally verify individual packages on delivery; the pharmacy or clinic verifies every pack against the pharmacy database system before administration to the patient with a prescription (1. inquiry, 2. answer, 3. derecognition); a pack that fails verification is not administered and requires investigation. Source: securPharm.
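The inquiry/answer/derecognition sequence can be sketched as follows; the function names, return values and toy in-memory database are assumptions for illustration, not the real securPharm interfaces:

manufacturer_db = {("04012345678901", "a1b2c3"): "active"}   # toy data

def inquiry(gtin: str, serial: str) -> bool:
    """Steps 1/2: ask the manufacturer database whether the pack is active."""
    return manufacturer_db.get((gtin, serial)) == "active"

def derecognize(gtin: str, serial: str) -> None:
    """Step 3: mark the pack as dispensed so it cannot be verified twice."""
    manufacturer_db[(gtin, serial)] = "dispensed"

def dispense(gtin: str, serial: str) -> str:
    if not inquiry(gtin, serial):
        return "do not administer - requires investigation"
    derecognize(gtin, serial)
    return "ok to administer"

print(dispense("04012345678901", "a1b2c3"))   # -> ok to administer
print(dispense("04012345678901", "a1b2c3"))   # second scan fails -> investigate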

2.2.1.4 Participants in Authentication The possible participants involved in the authentication of dispensing packages are mostly associations and government organizations. The intention to undertake this task and operate an object event repository – which in this case serves as something of a data hub – can have various motives. It may lie in the statutory mission of the regulator, in the possibility of managing valuable data, or in the ability to secure the influence of association members. In Europe, associations such as the European Federation of Pharmaceutical Industries and Associations (EFPIA), the European Directorate for the Quality of Medicines & HealthCare (EDQM), the securPharm e. V. association, the Bundesverband der Pharmazeutischen Industrie e. V. (BPI, the federal association of the pharmaceutical industry), the Bundesverband Deutscher Apothekenverbände (ABDA, the federal association of German pharmacies) and others come into question here. In the United States, the data will probably be stored by the FDA, in China by the CFDA. The list could be continued, but currently still involves a great deal of uncertainty.


2.2.2 Method for Tracking Drugs for Individual Patients – “Unit Dose” How does the unit dose process work? This process involves the tracking of individual doses of medicine in order to avoid adverse drug effects (ADEs) in hospitals and in medical care. By tracking the dose of a drug without interruption – from the individual system-based prescription by the doctor, via a defined aut idem/aut simile process in the supplying pharmacy, to the immediate, electronically controlled manual administration at the bedside by a member of the nursing staff or the setting of an infusion pump – it is possible to reduce these ADEs dramatically and, in addition, to produce very precise data for use in both a medical and an administrative sense. The health and economic benefits of this technology are clear: it makes it possible to effectively reduce the risk of ADEs, as has been impressively shown in several studies. A wider use of unit dose in therapy and care would, in addition to the primary objectives of greatly increasing quality and safety in the administration of drugs, also provide a strong, verified database on the effectiveness of drugs, the evaluation of which would probably be just as interesting for clinicians and manufacturers as for the officials of all associations.

2.2.2.1 Technological Prerequisites for Unit Dose If the labeling of medication at the single-item level were extended to the single dose within the dispensing package (unit dose model) using a parent-child model, this would provide the key for use in a unit dose process, as labeling unit doses in hospitals or care institutions would hardly seem feasible at the right quality, and especially at a reasonable cost. Using the labeling introduced for forgery-protection scenarios as the basis for a unit dose process is therefore simply an extension of the supply chain to increase efficiency – something logistics has always aspired to. Technological prerequisites for use in treatment or care facilities are a patient data management system (PDMS), appropriate reading technology for 2D matrix codes or RFID, and a pharmacy that handles the goods according to regulations, processes aut idem/aut simile, and uses the respective logistics to supply the drugs to the patient’s location for the caregivers to administer.

2.2.2.2 Unit Dose – the Process The process is initiated by the prescription. The prescriber can choose from possible medications stored in the system and passes this information on to the pharmacy, which then acts aut idem/aut simile according to established procedure. The finished prescription is put together in the pharmacy, packed in a transportation container, and sealed or blistered if necessary. A logistics step ensures that the container arrives at the patient’s location. According to specifications and after being prompted to do so, a member of the nursing staff takes the medication, authenticates him- or herself and the patient, and then scans the medication in order to get


an OK from process control to administer or not to administer the dose – for example, if it is the wrong drug, the wrong time, or the best-before date has expired. The PDMS then stores all data on the patient and thus holds an evaluable patient history, which provides information such as who gave how much of what to whom, and when. This data is now very easy to compare with other data in the PDMS, such as vital-sign monitoring data, in order to evaluate a patient-related development. However, this data, which has never before been available with such precision, can be used even further for evaluation purposes. Irrespective of the individual patient, it is now possible to establish links between countless parameters such as time, season, medication, classified patient attributes, pharmacokinetics, weather, type of care facility and respective specializations, to name only a few.

Fig. 14.4: Processes for unit dose in hospitals – the core process runs from prescription via consignment, shipping and exit at the pharmacy to receipt at the ward, storage at the patient, monitored administration and entry in the PDMS; it is supported by subprocesses for box/container handling, unit dose creation (sticker generation, application and storage), patient registration (RFID sticker assignment, patient armband, discharge) and employee registration (badge allocation and wearing). Source: SAP.
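A bedside check of this kind can be sketched in a few lines. The rule set below (right patient, expiry, time window) is a simplified assumption; a real PDMS check involves far more parameters:

from datetime import date, datetime

def administration_ok(dose: dict, patient_id: str, now: datetime) -> tuple[bool, str]:
    """Return (OK?, reason) before a nurse administers a scanned unit dose."""
    if dose["patient_id"] != patient_id:
        return False, "wrong patient"
    if date.fromisoformat(dose["expiry"]) < now.date():
        return False, "best-before date expired"
    if not (dose["window_start"] <= now.hour < dose["window_end"]):
        return False, "wrong time"
    return True, "administer"

dose = {"patient_id": "P-42", "drug": "metoprolol 50 mg",
        "expiry": "2026-01-31", "window_start": 7, "window_end": 9}
print(administration_ok(dose, "P-42", datetime(2025, 5, 1, 8, 0)))
# -> (True, 'administer')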

2.2.3 Marking of Medical Products and Status Tracking of Products The labeling of Class III medical devices concerns products such as pacemakers, stents, artificial joints, electrodes, etc., which are currently marked at batch level. Some manufacturers also voluntarily mark complex products with serial numbers. This segment is currently undergoing a certain amount of change: appropriate legislation on more detailed labeling at batch level will soon come into force or is currently being decided. However, a systematic allocation to patients takes place only in exceptional cases and is not generally planned. A full allocation of products


to patients would enable conclusions and improve the quality of data for therapy and research.

2.3 What is the Meaning of Big Data in the Logistics of Medication? The burden on databases for the pharmaceutical manufacturers, which will be required under the load of the processes for regulation, can already be seen now. This fact is currently already boosting demand for innovative, high-performance, inmemory technologies. The participants in pharmaceutical manufacturing and distribution are working together with manufacturers in the information technology industry to find IT solutions for the near future, to ensure the ability to deliver under the coming regulations. All major manufacturers in the pharmaceutical industry are gathering the necessary experience in large projects in order to ensure a timely implementation. The projects are too big and far too complex to be implemented by IT companies only shortly before the new regulations come into effect. The questions regarding the expected global databases for the serial numbers of parts all around the world, as is currently envisaged by regulatory authorities, have not yet been fully answered from a technological point of view, and cannot be solved through the use of a little more silicon – i. e., technology – alone. The available data has differing retention periods in the system, which means that billions of data records cannot simply be automatically archived after 12 months. Some products do in fact have a very long shelf life, and hence will have to remain available in the database for an accordingly long time for authentication purposes. This means that the individual information of each dispensing package for consumption will have to be available in the databases for several years. This is where certain questions will arise, such as what happens with a palette that goes back after 12 months, and is supposed to reveal its entire content with a scan, if this particular palette hierarchy has already been archived and is no longer available to the logistician at goods delivery in real time. If the logistician cannot accept the palette, the entire return process will come to a standstill. This is where the existing approaches must be further tested and optimized to ensure a timely response to authentication requests through an authorized requestor. Without reliably predictable response behavior, these processes for the protection of drugs against counterfeiting cannot be implemented, since response times will have an elementary effect on usage patterns. Overall, the tracking of drugs will have the most significantly positive effect on the supply of non-counterfeit drugs. The price of this technology is partially outweighed by greatly improved transparency and efficiency in logistics and improved determination of consumption in real time. Counterfeit drugs are not only dangerous for patients, but also significantly reduce the turnovers of both the trademark owners and the manufacturers of generic products.


In this sense, Big Data has become an unavoidable issue in the context of tracking medication. The question that remains is that of the right technology and experience, so that all the involved parties can derive maximum benefit from the necessity of having to deal with huge amounts of data.

3 What is the Meaning of Big Data in Medicine? In the big picture, Big Data in medicine is a logical next step toward achieving the sometimes diametrically opposed objectives in the treatment and care of patients. We want to use the comprehensive details and experiences of billions of individual diagnoses and therapies for individual therapy and research, and to provide the prepared information to attending physicians while preserving the highest quality and safety standards at all times. We also want to deliver correct, non-counterfeit medication with 100 % certainty to the right patient at the right time. And all of this should enable a constant decrease in healthcare spending. Such a gain in efficiency cannot be realized without the use of state-of-the-art automation through information technology – which leads almost automatically to the conclusion that only an active discussion of the issue can lead Big Data to play a leading role in the areas of quality, security and cost optimization in the treatment of patients.

Michael Engelhorn

15 Semantics and Big Data Semantic Methods for Data Processing and Searching Large Amounts of Data “The ball breaks through the table because it is made of styrofoam. What is made of styrofoam – the ball or the table? This is one of the questions that computers could be asked in the Turing test – a competition that aims to find out whether computers have intelligence similar to that of humans. Computers usually fail at these kinds of questions, because they generally understand neither such semantics nor irony, sarcasm or humor.”¹ Therefore, a detailed article on semantics is also required in a book about Big Data: semantics is the “theory or science of the meaning of signs.” Signs can be words, phrases or symbols. What methods and challenges arise for data evaluation in medicine? Much of the data in the German healthcare system is still not digital but only available on paper (50–80 %, depending on estimates) and, furthermore, only in unstructured form. This article provides a good overview and summarizes the major methods and framework conditions with many practical examples.

Abstract: Big Data is dependent on an efficient and targeted search. For this purpose, the source texts must be developed, processed and stored accordingly. This is carried out by semantic methods that enrich the content with meta-information and prepare it for a targeted search. Using the definitions of “data,” “information” and “knowledge,” methods of structuring information are presented and described by way of examples. One chapter is devoted to search and navigation, with an emphasis on their semantic support. A new kind of navigation is also presented: navigation horizons.

Content 1 Introduction  178 2 Data – Information – Knowledge  179 2.1 Explicit and Implicit Knowledge  180 2.2 Incomplete and Fuzzy Knowledge  181 3 Where Medical Knowledge is Found – Knowledge Acquisition  181 3.1 Acquisition of Knowledge  182

1 Quote: Martensen: Über die positiven Seiten des Abhörskandals. In: Die Zeit. Edition December 12, 2013.


4 Methods of Structuring Information  182 4.1 Statistical Methods  184 4.2 Thesauri  186 4.3 Taxonomies  186 4.4 Nomenclatures  187 4.5 Terminologies  188 4.6 Ontologies (Knowledge Spaces)  188 4.7 Semantic Networks  190 4.8 Concept Maps  190 5 Search and Navigation  191 5.1 Semantic Search  193 5.2 Navigation Horizons  193 6 Outlook  194 Literature  195

1 Introduction When talking about Big Data, this refers not only to the creation and storage of large amounts of data, but also – and especially – to the search for information in this data. This comes very close to the proverbial search for a “needle in a haystack.” A search in Google, for example, may – depending on the precision of the search term – provide no results at all or millions of hits (I call this the “Google effect”: one makes a specific query and receives a vast flood of answers). But neither extreme helps the person making the query, and the order of the results follows an internal Google sorting system, which is not always obvious. This is because such searches are not semantically supported. The search for the term “bank” could mean either the edge of a body of water or a financial institution, and the search for “T34.1” could refer to an ICD-10 code or a military vehicle. A unique allocation or classification (i. e., whether the waterside or the financial institution is meant) can only be achieved through the context of the stored text – with the help of “semantic enrichment” through “meta-information.” This meta-information can then guide the person making the query to the right area for the search, making it more effective and specific. This brings us to the fundamental importance of semantic methods for Big Data. Semantic meta-information provides an abstraction of the contents and helps with the search. Nevertheless, the results are often unwieldy, since they are not presented in the way a human user would expect them to be, as the person making the query often has no knowledge of the structure and content of the information provided. Often the search begins with a vague idea and is then – depending on


results and search progress – clarified, narrowed down or even changed to include new terms and new amounts of information. The search is assisted by navigation instruments, which also use semantic methods, provide better and clearer representations, narrow down both the amount of information and the search area, and gradually lead the person making the query to the target. I have coined the term “navigation horizons” for this purpose.² Another problem is the adequate storage of the information that supports the semantic search. However, this is not the subject of these considerations.
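How context resolves an ambiguous term such as “bank” can be illustrated with a toy sketch; the cue-word lists are invented, and real systems rely on ontologies and NLP pipelines rather than simple word overlap:

CUES = {
    "financial institution": {"money", "account", "loan", "interest"},
    "edge of a body of water": {"river", "water", "shore", "fishing"},
}

def disambiguate(term: str, context: str) -> str:
    """Pick the sense whose cue words overlap most with the context."""
    words = set(context.lower().split())
    best = max(CUES, key=lambda sense: len(CUES[sense] & words))
    return f"{term} -> {best}"

print(disambiguate("bank", "he opened an account and asked about the loan"))
# -> bank -> financial institution
print(disambiguate("bank", "they sat by the river fishing near the bank"))
# -> bank -> edge of a body of water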

2 Data – Information – Knowledge There is no universal understanding of the concept of “knowledge.” The terms “data,” “information” and “knowledge” are not always clearly distinguished. In computer science, separating these terms has proven useful and allows us to make a clear distinction between the processing levels. “Pure data” is very difficult to interpret and assign: the numbers 63 and 47, for example, could stand for age or weight or height above sea level (as well as many other interpretations). Likewise, the number 2010 need not necessarily indicate a year, and the name “Miller” can be both a surname and a profession. The real significance – and thus the information – is often revealed by the context. “63 kg,” for example, is clearly an indication of weight. The transition to “knowledge” then takes place in the context of collective experience, which enables us to deduce whether a person is fat or thin based on the information “63 kg.” The following is a brief definition of these terms using various sources. Definition of data Data is signs, signals and facts. Data concerns the basic building blocks of information science and characterizes the symbolic representation of facts. It consists of a virtually unlimited amount of available facts, statistics, pictures, etc., but is still largely unstructured and context-independent. It can be stored for a long time on data carriers, in documents, etc., without losing its value. Definition of information Information is data with semantics (meaning). Information develops when data is assigned a meaning with a focus on context by enriching it with “meaningful

2 Sturm: Auswahl, Implementierung und Bewertung von Verfahren zur Visualisierung von Navigationshorizonten. Bachelor Thesis. Medical Informatics, Faculty of Informatics, University of Applied Sciences Mannheim, August 19, 2011.


content.” It is subjectively perceptible and usable and thus able to expand, restructure and change knowledge. In contrast to data, information is valid only in a specific context. Definition of knowledge Knowledge is information in a given context and includes much more than mere information. Information only becomes knowledge if the application-oriented or situation-based meaning of the information is recognized, and if the relevant information is filtered out and organized in a meaningful way. A knowledge-creating process always begins with expectations based on past experience. Knowledge arises from the context-specific linkage of information with subjective assumptions, theories, intuitions and conclusions from education, experience and experiments, and their subsequent use in an action-related context. The context-forming factors are, for example, culture (shared values, behavioral norms, ways of thinking and acting) and time (knowledge is seen differently at different times). If the conduct thus derived is followed by an event that confirms one’s expectations, one gains the relevant experience. If this cycle is repeated several times, knowledge is established. The more frequent the confirmation, the more mature the knowledge, until it can be called “reliable knowledge.”³

2.1 Explicit and Implicit Knowledge Explicit knowledge consists of theoretical expertise that is available in the respective specialist domain. It is explicitly described in the sources and can be directly extracted from it. Implicit knowledge consists of practical knowledge, expertise and general or everyday knowledge. This knowledge is not described explicitly in the source and has to be accessed via other indirect methods. In addition to the explicit knowledge in the sources, i. e., all the knowledge that can be clearly referenced and stored, there is also implicit, i. e., hidden knowledge. This can be found, for example, in the minds of employees, in oral operating procedures, processes and work routines. In addition, implicit knowledge is also hidden in circumstances and in contexts, such as, for example, the information that Rosenthal, a district of Berlin, is also part of the Federal Republic of Germany, which in turn is part of Europe. Explicit knowledge is, to a large extent, processed by machine. Implicit knowledge requires new methods of knowledge processing that involve the appropriate methods of modeling (ontologies). Implicit knowledge from specific

3 km:conceps: Zeichen, Daten, Information, Wissen. Online: http://www.km-concepts.de/service/ glossar/zeichendateninformationenwissen.html, [accessed on: April 13, 2014].


procedures as well as knowledge in the minds of employees is very difficult to access and is not the subject of these considerations.

2.2 Incomplete and Fuzzy Knowledge In our everyday life we often encounter incomplete or fuzzy knowledge, such as “it’s a hot day” or “Munich is far away,” which we fit into our “view of the world,” usually without further question. The terms “hot” or “far” are subjective terms that are based on opinions and can be included in our view of the world only through the given context (e.g., “Munich”). In everyday communication this type of information transfer can be very useful, as the amount of information is significantly reduced. In the documentation of medical situations, however, it may be a source of misunderstanding and thus of error. Thus the statement “600–800 mg, three times daily” is inaccurate, and “400 mg, four times daily or 200 mg once” is inconsistent and contradictory. Statements with uncertain knowledge are very common and include information with explicit and implicit exceptions and assumptions. Causes of incomplete and fuzzy knowledge are:
– Ignorance
– Inaccuracy, e.g., measurement inaccuracies
– Vagueness of terms
– Expressions of opinion
Automatic processing of incomplete and fuzzy knowledge requires particular methods such as “fuzzy logic” or “Bayesian networks.” Problems in medical practice are often difficult to solve formally, as they:
– Cannot be precisely delineated
– Are poorly (diffusely) structured
– Are of a complex nature
– Can be interpreted differently

3 Where Medical Knowledge is Found – Knowledge Acquisition Knowledge is generally distributed far and wide. Particularly in the health sector, the entire knowledge is distributed among different partners (hospitals, doctors’ offices, nursing services, health departments, associations, cost bearers, the pharmaceutical industry, medical device manufacturers, pharmacies, etc.). The Internet provides other new sources of knowledge across borders, for example, via portals.


Generally, you can find medical – which in this context includes administrative medical – knowledge in: – Various IT systems in hospitals and other health care facilities – The IT systems of private physicians, group practices and medical care centers – Research institutions – Miscellaneous paper records in health care facilities – Audiovisual archives – Work procedures, instructions, processes and SOPs This widely scattered and highly fragmented knowledge is partially available in a – more or less – well-structured form, but mostly (about 80 % of it) in a completely unstructured form. The challenge for knowledge acquisition now lies in the task of unlocking these sources of knowledge and making them accessible.

3.1 Acquisition of Knowledge When it comes to knowledge acquisition, what is most critical is the level of information that is available at the source. If the data at the source is structured, at least parts of the source information can be taken over as meta-information, for example the structural information from databases. If the data is available in an unstructured form, complex process steps are required to enrich the data with meta-information, as described in the following chapter. Only after this step can the actual “creation of knowledge” be carried out. The acquisition of knowledge from only slightly structured or unstructured data follows a process chain with different stages – starting with the extraction of data, through the identification of the context, to the enrichment with metadata with the aid of dictionaries, nomenclatures and thesauri. The following example describes this process (a sketch of the chain follows below):
– Data extraction (63, 3.14, kg, Miller, 2010, ...)
– Information generation (63 kg → weight)
– Knowledge generation (Walter Miller, current weight 63 kg, overweight)
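A toy sketch of this three-stage chain is given below; the token patterns, the unit lookup and the “overweight” rule are invented solely to make the stages concrete:

import re

def extract(raw: str) -> list[str]:
    """Stage 1: pull out raw tokens (numbers, units, names)."""
    return re.findall(r"[A-Za-z]+|\d+(?:\.\d+)?", raw)

def interpret(tokens: list[str]) -> dict:
    """Stage 2: attach meaning using context (a number followed by 'kg')."""
    info = {}
    for i, tok in enumerate(tokens):
        if tok == "kg" and i > 0:
            info["weight_kg"] = float(tokens[i - 1])
        elif tok.istitle():
            info["name"] = tok
    return info

def to_knowledge(info: dict) -> str:
    """Stage 3: combine information with background rules."""
    verdict = "overweight" if info["weight_kg"] > 60 else "normal weight"  # toy rule
    return f"{info['name']}, current weight {info['weight_kg']:g} kg, {verdict}"

print(to_knowledge(interpret(extract("Miller 63 kg 2010"))))
# -> Miller, current weight 63 kg, overweight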

4 Methods of Structuring Information The past 20 years have seen the development of architectures for the administration of unstructured information and framework architectures for developing applications for knowledge extraction from unstructured information, in particular natural language processing (NLP): the UIMA (Unstructured Information Management Architecture).


Natural Language Processing (NLP) refers to the processing of natural language texts with computers in order to make the meaning of the texts accessible. Emphasis is placed on the idea that the language concerned is natural language, i. e., human language, and not a programming language, because of the following difference between the two language types: potential interpretations of natural language can be ambiguous, while programming languages are designed to be unique. With NLP systems, individual words as well as whole sentences can be analyzed. Difficulties particularly arise when it comes to synonyms, negations, abbreviations (acronyms) and typing errors, which must be resolved (supported by thesauri). Name recognition, for example, plays an important role in biomedicine, as, among other things, the gene and protein names that are being searched for are often ambiguous. Proteins often have many different names that represent not only different variations of an expression, but also bear no resemblance to each other: caspase-3 and apoptosis-related cysteine protease are synonyms for the same protein. In addition, there are many short names and abbreviations that can often be assigned to several proteins and are also used as abbreviations for non-biomedical terms in other contexts. Even ordinary words of the English language, such as “car,” “can” or “for,” are used as names for biomedical terms.⁴ The concept of UIMA provides for a process chain in which data is first imported, then goes through various analysis and processing steps, and is finally delivered to several so-called consumers, which process the results, for example into a database. In each step of the analysis the data is provided with specific annotations: a defined range of the data set – for example a part of the text – is given an explanatory note and is thus semantically enriched. An example of such a process chain is a simple application to calculate the average number of words per sentence in a text (see the sketch below). For this purpose, a stage is first required that imports the text, for example from a file. In a second stage, the text is then subjected to a process in which all the positions of blank spaces in the text are determined and all words are marked. The third stage then carries out sentence recognition, in which markers are placed from punctuation mark to punctuation mark. The final stage now only has to divide the number of marked words by the number of marked sentences and output the result (statistical methods). A possible extension of the process could include counting the number of verbs per sentence. To do so, so-called part-of-speech tagging would be included after stage three, which provides each word with an annotation such as “verb,” “noun,” etc., thus enabling the user to count the number of terms that are defined as

4 Strötgen: UTEMPL – Aufbau und Evaluierung einer UIMA basierten Textmining Pipeline für biomedizinische Literatur. Master’s thesis to obtain the academic degree of Magister Artium (M.A.) Prepared at Department of Computer Linguistics at the Faculty of Modern Languages at the Ruprecht-Karls University Heidelberg. March 2009.


“verb” rather than the number of counted words. This process now gives rise to the problem of inflection, which is found in every language: the conjugation of verbs and the declension of nouns, pronouns and adjectives. A method known as stemming (stem form reduction, normal form reduction) enables us to attribute the various morphological variants of a word to their common word stem. So far we have examined only lexical and grammatical objects. In addition to the knowledge about “words” and “speech,” we still require their meaning, their context and their assignment to areas of knowledge: we have not yet recognized whether, when we say “bank,” we mean a financial institution or the river’s edge. Additional linguistic methods are necessary for further analysis and enrichment with meta-information. The use of nomenclatures, thesauri and taxonomies enables us to narrow down the areas that are actually related to the source text. For example, the term “Rosenthal” may, among many other things, refer to a German entertainer (Hans Rosenthal), a renowned scientist (Prof. Dr. Walter Rosenthal, director of the MDC), a manufactory (Rosenthal porcelain) or a geographic location (a district of Berlin). Such ambiguities can be resolved in particular through the context of the term in question: the term most probably refers to a person if terms such as “Mr.,” “Mrs.,” “Dr.,” “Professor” or “Director” are found in its context. But this is not always sufficient, as the following example shows: if Rosenthal is found in connection with Berlin in the source text, it is highly likely that a geographical context is meant – as long as the Berlin in question lies in Germany and is not one of the places by the name of Berlin located in the USA. These relationships, which go beyond mere nomenclatures, thesauri and taxonomies, can be modeled in ontologies, semantic networks and concept maps. Nevertheless, it is not possible to allocate all cases with utmost certainty. Well-chosen methods – such as multiple ontologies that cover wide areas – can, however, minimize the number of cases that cannot be handled automatically. The following description of these methods is based on several sources and is not always absolutely exact in its classification.
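The word/sentence counting chain described above can be sketched as follows – a toy pipeline in the spirit of UIMA’s import, annotation and consumer stages, not the actual Java framework:

import re

def annotate_words(text: str) -> list[str]:
    """Stage 2: mark the words (here: simple word-character tokens)."""
    return re.findall(r"\w+", text)

def annotate_sentences(text: str) -> list[str]:
    """Stage 3: sentence recognition from punctuation mark to punctuation mark."""
    return [s for s in re.split(r"[.!?]", text) if s.strip()]

def consumer_avg_words(text: str) -> float:
    """Final stage: average number of words per sentence."""
    return len(annotate_words(text)) / len(annotate_sentences(text))

sample = "Big Data needs semantics. Computers fail at irony. Humans often do not."
print(consumer_avg_words(sample))   # -> 4.0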

4.1 Statistical Methods With the help of descriptive statistical methods, quantifiable properties of texts can be gathered in order to characterize, compare and classify them. The process begins with the definition and counting of quantifiable units of texts. Such purely descriptive methods result in frequency tables (histograms) and statistical parameters, such as mean values and indices, and can describe factors or features in the formation of texts that suggest historical, geographical, social or psychological conditions of origin and help to discover laws that govern the construction of texts. This makes it


possible to determine the linguistic source, for example, since the frequency with which a word occurs is a characteristic feature of a language (English, German, French, etc.). For example, the top ten of a typical word distribution of the English language, in decreasing frequency, is “the,” “of,” “and,” “to,” “in,” “I,” “that,” “was,” “he,” “his.” Text statistics counts text elements and calculates statistical values of texts, measures the syntactic and lexical homogeneity of individual texts or groups of texts, and describes probabilistic characteristics of linguistic norms and deviations or features of linguistic varieties (e.g., specialized languages versus colloquial languages). It measures and compares the lexical richness of texts (e.g., by determining the number of different words in relation to the total number of words: the “type-token ratio”; see the sketch below), looks for general characteristics, differences and regularities in classes of all kinds of texts (e.g., oral vs. written, message vs. comment, epic vs. drama, medieval vs. modern, dialect vs. high-level language), and allows for the prediction of unobserved data based on observed data. This is especially true for Zipf’s law (due to the basic principle of least effort, the product of frequency rank and frequency of use of words in texts is roughly constant).⁵ However, all statistical analyses of text face a number of difficulties. Trivially, the result of the work depends heavily on the definition of the text units being analyzed. The same distributions or laws do not necessarily apply for phonemes and letters, for syllables and morphemes, for lemmas and word forms, for syntagmas and phrases, for sentences and units of speech. Even with the simplest definition of “word” as a series of letters between blank spaces, a variety of different counts can occur. It is not easy to define the examined units of text accurately, and it is often even more difficult to find a precise definition that is also suitable for the question at hand. That is why similar analyses cannot generally be compared with one another, and the individual case cannot always be generalized to a larger area or a more general statement. Almost all statistical procedures and most models have been developed within the context of natural- and socio-scientific issues. They do not readily fit the issue of language and provide scope for error in linguistic transmission. Compared with other branches of science concerned with statistics, the area of text statistics is still in its infancy. Many studies are based on what is statistically easy to achieve. It is not always easy to find the appropriate procedures and the suitable mathematical model for a genuine scientific issue regarding text.⁶

5 Schmitz: Statistische Methoden in der Textlinguistik. In: Antos/Gerd/Brinker et al. (Ed.): Text- und Gesprächslinguistik. Linguistics of Text and Conversation. Ein internationales Handbuch zeitgenössischer Forschung. 1. Halbband: Textlinguistik. Berlin/New York 2000, pp. 196 ff. 6 Schmitz: Statistische Methoden in der Textlinguistik. In: Antos/Gerd/Brinker et al. (Ed.): Text- und Gesprächslinguistik. Linguistics of Text and Conversation. Ein internationales Handbuch zeitgenössischer Forschung. 1. Halbband: Textlinguistik. Berlin/New York 2000, pp. 196 ff.
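A short sketch of two of the measures just mentioned – the type-token ratio as a proxy for lexical richness, and the rank-frequency product behind Zipf’s law – on an arbitrary toy text:

from collections import Counter

def type_token_ratio(words: list[str]) -> float:
    """Distinct words ('types') divided by total words ('tokens')."""
    return len(set(words)) / len(words)

text = "the cat sat on the mat and the dog sat on the cat".split()
print(round(type_token_ratio(text), 2))   # -> 0.54 (7 types / 13 tokens)

freq = Counter(text).most_common()
for rank, (word, count) in enumerate(freq[:3], start=1):
    # Zipf's law predicts rank * frequency to be roughly constant
    print(word, rank * count)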


4.2 Thesauri A thesaurus is a model that attempts to precisely describe and represent a topic. It consists of a systematically arranged collection of terms that are thematically related. A thesaurus is a controlled vocabulary – also called an attribute value range – for each attribute that is to be described. This primarily concerns synonyms (similar terms) as well as hypernyms and hyponyms; antonyms (words with the opposite meaning), however, are often not listed. A thesaurus can therefore also generally be used for disambiguation. In contrast to a simple hierarchical table or database, a thesaurus may have a poly-hierarchical structure, i. e., a hyponym can have several hypernyms. In documentation science, thesauri have proven to be suitable tools for subject cataloging and for retrieving documents. Here, relations between terms are used for indexing (the allocation of key words) and searching: the relations make it possible to find the relevant terminology for the respective terms during indexing and searching. Unlike a linguistic thesaurus, a thesaurus for documentation contains a controlled vocabulary, i. e., a non-ambiguous nomenclature (description) for each term. Different spellings (color/colour), synonyms or quasi-synonyms that are treated the same way, abbreviations, translations, etc. are interrelated through equivalence relations; terms are also linked through associative and hierarchical relations. In searches, thesauri can be helpful through the automatic expansion of queries to include synonyms and hyponyms, as in the sketch below. Thesauri can also be used for the detection of simple spelling mistakes, which is the current standard for text editing programs.
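A toy sketch of thesaurus-based query expansion; the vocabulary entry is invented, whereas real systems would draw on resources such as MeSH:

THESAURUS = {
    "myocardial infarction": {
        "synonyms": {"heart attack", "MI"},
        "hyponyms": {"STEMI", "NSTEMI"},
    },
}

def expand(query: str) -> set[str]:
    """Expand a query term with its synonyms and hyponyms."""
    entry = THESAURUS.get(query, {})
    return {query} | entry.get("synonyms", set()) | entry.get("hyponyms", set())

docs = ["Patient admitted with STEMI.", "History of heart attack.", "Sprained ankle."]
terms = expand("myocardial infarction")
hits = [d for d in docs if any(t.lower() in d.lower() for t in terms)]
print(hits)   # matches the STEMI and heart-attack documents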

4.3 Taxonomies

A taxonomy (or classification scheme) is a uniform method or model with which objects are classified according to specific criteria, i. e., into categories or classes. Scientific disciplines use the term taxonomy for a typically hierarchical classification (classes, subclasses, etc.). Taxonomies are of considerable importance for the development of science, as they facilitate the handling of individual cases and enable summary statements that can even lead to the explanation of interrelationships. They also force you to be clear about the differences between the categories and thus lead to a better understanding of the area of study. Traditional methods were defined by morphological features, such as physique. But taxonomies are also helpful as a structuring element in other areas, such as microscopy, chemistry, biochemistry and genetics or geography. An example of a geographical taxonomy is shown in Fig. 15.1, albeit with an extended hierarchy, which is therefore not strict.


[Figure content: the OWL ontology "Geography" with the classes continent, country, state, city, precinct, district, community and zip code, the object properties "belongs to" and "borders with," and annotation properties such as comment and label.]

Source: Author's illustration.

Fig. 15.1: Example of geographic ontology

4.4 Nomenclatures

A nomenclature (a system or catalog of names) is a collection of names and terms from a specific topic or area of application that is mandatory for certain areas. These words and terms are usually formed according to fixed rules. The main purpose of a nomenclature is the standardization of names and terms. The entirety of the words and terms used in a certain specialist area forms the basis for the terminology of that specialist area. Medical nomenclatures are primarily used for the exact description of body structures (anatomy) or for treatment-oriented medical documentation of diagnoses. In the latter case, they try to provide a specific name for each circumscribable medical condition. Nomenclatures are supplemented by so-called classifications (e.g., ICD-10), although the distinction is not always easy. While nomenclatures are concerned with the description of individual cases, classifications are used in medicine to form groups of similar cases. Well-known nomenclatures in medicine are:
– SNOMED (Systematized Nomenclature of Medicine)
– MeSH Thesaurus (Medical Subject Headings)
– LOINC (Logical Observation Identifiers Names and Codes)


4.5 Terminologies

Terminologies store information about (technical) terms, i. e., the technical expressions and their use within a specific context. The word "terminology" is ambiguous: it refers generally to the study of the terms of a specific subject area, but also to the set of all terms in a specialist area. It is a part of the technical language that also involves other characteristics, such as phraseology or grammar. Terminologies can, for example, be written up in a dictionary, a glossary or a thesaurus. This provides a controlled vocabulary, which is an important basis for documentation and efficient translation. A binding, defined terminology of a specialist area is referred to as a nomenclature (e.g., the nomenclature in biology or the nomenclature in chemistry). The technical language is influenced by many special terms, proper names, abbreviations and synonyms. Medical terminologies organize and unify these terms, thus providing structured medical knowledge. Medical terminologies are just as complex and diverse as medicine itself, with its needs for documentation and communication. They range from simple micro vocabularies and code lists to extensive nomenclatures and hierarchical classifications or huge networked ontologies (structured, formalized and internally networked representations of knowledge). As is the case with medical knowledge, these terminologies are also subject to constant change. Terminologies are also an essential prerequisite for eHealth. Processed in an automation-assisted way, they are used for transmitting clinical information between IT systems by assigning the terms unique codes ("code lists"). However, for the receiver of the information these codes have to be "translated back" into terms that can be read or understood by humans, so that they can be evaluated accurately and reused accordingly (semantic interoperability). Semantic interoperability is possible only if all participants know the same codes with the same meaning. In order to ensure both usability and comprehensibility for the users of the respective IT systems and the included information, both participants must agree to use the same terminology. In an increasingly networked healthcare system, this would require a number of mutual agreements, which could lead not only to errors, but also to large expenditures. Finally, such agreements would also have to determine who is responsible for the development of the terminology and how this is to be carried out.
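The round trip from term to code and back, on which semantic interoperability depends, can be sketched as follows; the code list here is invented for the illustration, whereas real systems would rely on agreed terminologies such as LOINC or SNOMED:

# Illustrative code list shared by sender and receiver (invented codes).
CODE_LIST = {
    "BP-SYS": "systolic blood pressure",
    "BP-DIA": "diastolic blood pressure",
    "HR": "heart rate",
}
TERM_TO_CODE = {term: code for code, term in CODE_LIST.items()}

def encode(term):
    # Sender side: map a human-readable term to the agreed code.
    return TERM_TO_CODE[term]

def decode(code):
    # Receiver side: "translate back" the code into a readable term.
    # This only works if both sides use the same code list.
    return CODE_LIST.get(code, "unknown code: " + code)

message = [(encode("heart rate"), 72), (encode("systolic blood pressure"), 120)]
for code, value in message:
    print(decode(code), "=", value)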

4.6 Ontologies (Knowledge Spaces)

The term ontology originates from Greek philosophy (from the Greek words for "being" and "teaching") and deals with the representation of being, the environment and the things in themselves, with what is and what is not. In contrast, the natural and social sciences deal
with how something is. Ontology is closer to philosophy without, however, giving up a claim to accuracy. Ontologies contain classes, attributes and relationships between classes that can be instantiated, as well as a defined vocabulary as a systematic basis of entities. A precise, formal ontology makes use of mathematical logic as a means of expression and can therefore also be processed by machines. An ontology can also be understood as a "meta model" with which objects and relationships can be modeled. Many ontologies contain classifications and taxonomies, with which, however, they are not synonymous. Thanks to ontologies, a uniform understanding of concepts and relationships is generated. Ontologies are always modeled according to their intended use. Thus, an ontology for representing geographical situations could, for example, contain the classes "continent," "country," "state," "city," "precinct" and "zip code," which form a hierarchical order (taxonomy). Further possible attributes include "inhabitants," "area," "latitude" and "longitude." Possible relationships could be "belongs to" and "borders with." However, not all facts can be modeled in a strictly hierarchical structure. In our example, the strict hierarchy needs to be expanded, as in addition to the class "city" there is also a "district" class, which has the subordinate classes "community" and "zip code" (Fig. 15.2).
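A toy version of this geography ontology may make the distinction between classes, instances and relations more tangible; the facts below are simplified assumptions based on the example in the text:

# Taxonomy backbone: each class with its superordinate class. The extension
# around "district" makes the hierarchy poly-hierarchical, i. e., not strict.
SUBCLASS_OF = {
    "country": "continent", "state": "country", "city": "state",
    "precinct": "city", "district": "precinct",
    "community": "district", "zip code": "district",
}

# Instances with their classes and the "belongs to" relation between them.
INSTANCE_OF = {
    "Deutschland": "country", "Berlin": "city",
    "Friedrichshain-Kreuzberg": "precinct", "Kreuzberg": "district",
    "10961": "zip code",
}
BELONGS_TO = {
    "Berlin": "Deutschland", "Friedrichshain-Kreuzberg": "Berlin",
    "Kreuzberg": "Friedrichshain-Kreuzberg", "10961": "Kreuzberg",
}

def ancestors(entity):
    # Follow the "belongs to" relation upward through the hierarchy.
    chain = []
    while entity in BELONGS_TO:
        entity = BELONGS_TO[entity]
        chain.append(entity)
    return chain

print(INSTANCE_OF["10961"], "10961 belongs to:", ancestors("10961"))
# zip code 10961 belongs to: ['Kreuzberg', 'Friedrichshain-Kreuzberg',
#                             'Berlin', 'Deutschland']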

[Figure content: a navigation horizon over the geographic ontology. Deutschland (1451) is linked via "belongs to" to Berlin (25); the precincts Mitte (1), Friedrichshain-Kreuzberg (3), Lichtenberg (2), Pankow (2) and Treptow-Köpenick (4) belong to Berlin and are partly linked via "borders with"; the district Kreuzberg (3) belongs to Friedrichshain-Kreuzberg, and the zip codes 10961 (1) and 10969 (2) belong to Kreuzberg. The numbers in parentheses are the aggregated counts of rehabilitation facilities.]

Source: Author's illustration.

Fig. 15.2: Navigation horizon with geographic ontology and “rehabilitation facilities” statistic


By means of strictly logical description, inference engines (reasoners) can be set up that, for example, make implicitly existing neighborhood relationships between geographical units explicitly visible. An inference engine can recognize that two precincts have a common border if at least one district of a given precinct is adjacent to a district of the other precinct. These methods avoid the manual input of all possible relations into the ontology. By using inference engines that make implicit knowledge available through logical conclusions, it is also possible to discover contradictions and make them visible.
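The stated rule can be written down almost literally as a small inference step; the district and border facts below are a hypothetical fragment of the Berlin example:

# Explicitly asserted facts: which district belongs to which precinct,
# and which districts border each other.
DISTRICT_OF = {
    "Kreuzberg": "Friedrichshain-Kreuzberg",
    "Friedrichshain": "Friedrichshain-Kreuzberg",
    "Mitte (district)": "Mitte",
    "Rummelsburg": "Lichtenberg",
}
DISTRICT_BORDERS = {
    ("Kreuzberg", "Mitte (district)"),
    ("Friedrichshain", "Rummelsburg"),
}

def infer_precinct_borders():
    # Inference rule: two precincts share a border if at least one district
    # of the one is adjacent to a district of the other.
    inferred = set()
    for a, b in DISTRICT_BORDERS:
        pa, pb = DISTRICT_OF[a], DISTRICT_OF[b]
        if pa != pb:
            inferred.add(tuple(sorted((pa, pb))))
    return inferred

print(infer_precinct_borders())
# {('Friedrichshain-Kreuzberg', 'Lichtenberg'),
#  ('Friedrichshain-Kreuzberg', 'Mitte')}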

4.7 Semantic Networks

A semantic network is a formal model of terms and their relationships. In the field of information technology it is used for knowledge representation in the area of artificial intelligence; occasionally it is also referred to as a knowledge network. Usually a semantic network is represented by a generalized graph. The nodes of the graph represent the terms, and the relationships between the terms are shown through the edges of the graph. Which relationships are allowed is defined very differently in different models. Often one finds the relationships "is a" and "has" between terms (e.g., Tamiflu is a flu drug, the patient has pain). Semantic networks were proposed in the early 1960s as a form of representing semantic knowledge. Thesauri, taxonomies and wordnets are forms of semantic networks with a limited set of relations. A (usually binary) relationship between two nodes could be one of the following:
– A hierarchical relationship with inheritance relations, instance relations and partitive relations.
– A synonym: this relation models the equality in meaning of expressions.
– An antonym: this relation models the contradiction in meaning of expressions.
– A causality: this relation connects verbal concepts (events, states) – one event causes another.
– A characteristic: the monadic characteristic relation connects predicates with the objects that are within the range of values of these predicates.
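Such a network can be sketched as a set of labeled edges; the terms below merely spell out the "is a" and "has" examples from the text:

# A small semantic network: (term, relation, term) triples as graph edges.
EDGES = [
    ("Tamiflu", "is a", "flu drug"),
    ("flu drug", "is a", "drug"),
    ("patient", "has", "pain"),
]

def related(term, relation):
    # All terms reachable from `term` over one edge with the given relation.
    return [o for s, r, o in EDGES if s == term and r == relation]

def is_a_closure(term):
    # Follow "is a" edges transitively (the inheritance relation).
    result, todo = [], [term]
    while todo:
        for parent in related(todo.pop(), "is a"):
            if parent not in result:
                result.append(parent)
                todo.append(parent)
    return result

print(is_a_closure("Tamiflu"))   # ['flu drug', 'drug']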

4.8 Concept Maps

A concept map is the visualization of terms (concepts) and their connections. It is a means for the graphic representation of knowledge and thus a means of arranging and reflecting on one's thoughts. A concept map has to be distinguished from a semantic network, which is formal in nature. However, concept maps are suitable for the graphic design of semantic networks.


The visualization process can be described as a cognitive process in four steps. In these four steps, the (mental) concept of an object or circumstance is converted into a (real) graph to make it visible. The steps are: reduction, structuring, visualization and elaboration. Reduction is the first step in creating a concept map. It involves taking the available knowledge and reducing it to a few terms, i. e., the essence. Steps two and three (structuring and visualization) proceed in parallel by recording the result of the first step. Structuring practically means the spatial arrangement of two terms in relation to one another, i. e., below or beside one another, far away from or close to one another. This spatial arrangement results from the semantic content of the terms associated with one another. Once the spatial arrangement of all essential terms is complete, this network of terms is further refined in the fourth step, the elaboration. This could mean either that the relations are described more precisely (static or dynamic), or that the whole concept is expanded with other peripheral terms. The structuring of knowledge and the deeper exploration of individual concepts and relations improve the distinction between the individual fields of knowledge, making knowledge gaps easier to recognize. Concept maps seem particularly suited to representing elaborated contextual knowledge in a specific area of knowledge (knowledge domain). In addition, they can be used as a learning tool for active knowledge construction and as an instrument of knowledge diagnosis (qualitative and quantitative).

5 Search and Navigation

The large amounts of data and documents that come with Big Data make it almost impossible to find what you are looking for simply and quickly. Today's search engines usually try to find the requested information via full-text search. In general – depending on how you frame the query – you then get either several million matches or none at all. Neither result is satisfactory. When you are given a large number of results, a second, additional search has to be performed through individual evaluation of the results, which is no longer practical in the daily routine when confronted with a hit list of more than a hundred entries. Thus, often only the first 10 to 30 results are evaluated, with the risk of overlooking the essential information. With only very few matches, what is being looked for might not actually be included. Another limitation in the quality of hits often arises through abbreviations, variations in spelling, multilingualism and the ambiguity of terms. Last but not least, spelling errors when typing in the query can also lead to undesirable results. This is where the fundamental problem of this type of search becomes clear: there is no prior information about the number of matches to the query to make it
possible to make a decision ahead of time on the course of the search (the search strategy), i. e., whether to broaden the search if there are too few hits or to restrict it if there are too many. However, the number of potential matches is not always a sufficient criterion by which to adapt the search strategy. A "fuzzy" search or a search in one or more areas of knowledge can provide an additional opportunity here to tune the search strategy toward an optimal result. The type of representation of hits also has a significant influence on the search strategy. If the hits are presented in list form, as is usual with search engines, users have to adjust their search strategies to the ranking system implicit in the search (a situation which, at least at Google, set off an intense debate). What is ignored in this case is the "distance" of the individual results from the intended query. Navigation in the sense of "finding one's way around" in databases is understood as the process "search" → "decide" → possibly "adapt" → "search again." Key aspects of this process are:
– Navigation means movement
– Navigation implies decision making
– Navigation is a process
– Navigation needs a context
In terms of movement, there are different forms of navigation, which are substantially characterized by the way the information is stored. If the information is stored in the form of lists, the navigation is accordingly linear, i. e., the search only moves "forward" or "backward." If the information is stored in other structures, such as "trees," a decision has to be made not only between moving forward and backward, but also regarding which "branch" to take. If the information is stored in "networks," a directional decision has to be made at every node. This may lead to cyclic search paths, i. e., a node may be "visited" several times during the search. Every navigation in databases begins with an "entry point" that contains the initial query. In the case of large amounts of data, a user can quickly get confused. If one wants to support users in navigation so that they are always sure of their current status, the following questions have to be answered:
– Where am I?
– What can I do here?
– How did I get here?
– Where can I go?
In order to answer the question "Where am I?", the navigation process can be supported through an appropriate representation of the current position in the search space. The representation of linearly structured databases is in fact relatively easy.
For trees – and especially in networks – the ability to display the results on what is usually a two-dimensional screen is clearly limited.
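The orientation questions above can be answered mechanically while navigating if the path taken so far is recorded; the following sketch over an invented information network shows single navigation steps, with a check against revisiting nodes that guards against the cyclic paths mentioned earlier:

# A hypothetical information network; at every node a direction must be chosen.
NETWORK = {
    "entry": ["diabetes", "hypertension"],
    "diabetes": ["insulin", "diet", "hypertension"],
    "hypertension": ["diet", "entry"],     # a cycle back to the entry point
    "insulin": [],
    "diet": [],
}

def navigate(position, path=()):
    # One navigation step: where am I, how did I get here, where can I go?
    path = path + (position,)
    options = [n for n in NETWORK[position] if n not in path]  # avoid cycles
    print("Where am I?        ", position)
    print("How did I get here?", " -> ".join(path))
    print("Where can I go?    ", options or "(dead end: go back)")
    return path

path = navigate("entry")
path = navigate("diabetes", path)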

5.1 Semantic Search

A semantic search is a search method that targets the meaning of a search query (on the Internet or in a digital text archive). By using background knowledge, a semantic search engine takes into account the meaning of the texts and search queries with regard to content; it is thus not merely a search for words in text. This allows a query to be formulated more accurately and associated with the texts that have relevant content. When searching in complex, unclear or unknown stocks of information, ontologies can provide support depending on the usage scenario – for example by asking which domain is actually meant (in the case of "bank," whether a financial institution or a riverside). When it comes to formulating a query, knowledge of the ontological structure allows queries to be disambiguated, generalized or specialized according to the results, adapted to the context or user, and corrected accordingly. Furthermore, background information can be used for similarity-based (fuzzy) searches. Finally, for the presentation of results, correlations in the knowledge structure can be exploited via user- or context-specific presentation or sorting rules formulated with the help of ontologies.
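How background knowledge resolves an ambiguous query such as "bank" can be sketched with a simple overlap heuristic; the two domain vocabularies are invented for the illustration and stand in for a real ontology:

# Two knowledge domains for the ambiguous term "bank".
SENSES = {
    "bank": {
        "financial institution": {"account", "loan", "credit", "interest"},
        "riverside": {"river", "shore", "water", "fishing"},
    }
}

def disambiguate(term, context_words):
    # Pick the sense whose domain vocabulary overlaps most with the
    # other words of the query (the context).
    senses = SENSES.get(term, {})
    scores = {sense: len(vocab & set(context_words))
              for sense, vocab in senses.items()}
    return max(scores, key=scores.get) if scores else None

print(disambiguate("bank", ["open", "account", "interest"]))  # financial institution
print(disambiguate("bank", ["river", "fishing", "trip"]))     # riverside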

5.2 Navigation Horizons

Navigation horizons are the idea of an intuitively usable type of interactive information visualization for a large number of information objects, for example the hits of a search query. Such information is normally displayed in a list. By using navigation horizons, however, one can present users with a graphical, far more intuitive form of interactive information visualization. Especially those search queries where users are not exactly sure of, or are not able to formulate, what they are looking for should be better supported by navigation horizons. A navigation horizon is presented like a real horizon. The observer perceives objects in the vicinity with a high level of detail. More distant objects are presented with less and less detail, and some objects are not seen at all – they disappear over the horizon. As with navigation on the high seas, users are given "navigation points" – similar to beacons at sea – to help them navigate towards their destination. These are formed by "meta-information." Users basically "see" on the horizon the possible target directions toward which they can navigate. If a user navigates in the direction of a navigation point, new information objects will appear that were previously hidden behind the horizon, and information
objects that so far were in the immediate vicinity of the user will disappear. As a result, the user navigates a "sea of information objects." Ontologies are an excellent basis for this navigation process; they basically represent the "sea" in which the user is moving. The procedure seems particularly interesting for several interconnected ontologies, where several navigation horizons are spanned that change together during navigation. In navigation horizons, this property of the horizon serves the aim of maintaining clarity in the visualization. Understandably, all information can never be shown at the same time. As with a real horizon, part of the information presented is abstracted or corresponding information is aggregated. Here, the abstraction and aggregation level of information objects that are in focus is lower than that of objects in the context. Thus a way of visualizing large data structures is achieved that was proposed earlier.⁷ In order to ensure clarity, information objects in the wider context of the focus object are hidden behind the horizon. However, so that these hidden objects are nevertheless indicated, the user is given pointers, for example through a certain color choice, in the corresponding direction. If the user now navigates in a different direction, i. e., selects an item of information in the context of the focus object, the navigation horizon will change by putting the newly selected information object into focus and displaying it according to its context. The user can thus approach the actually desired information step by step. If, for example, from the knowledge space of "geography" the objects "zip code," "district," "precinct," "city," "country" and "continent" are presented with the additional information of the "number of rehabilitation facilities," the network of interrelationships describing the geographical context will always present the current count for the geographic areas in focus – i. e., aggregated values when a superordinate object is considered – as well as the objects in the immediate vicinity (Fig. 15.2).
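The fisheye views cited above assign every object a degree of interest, DOI(x | focus) = API(x) - distance(x, focus), where API is the a priori importance of an object; everything whose degree of interest falls below a threshold disappears behind the horizon. A minimal sketch with assumed values:

# Degree of interest after Furnas: DOI(x | focus) = API(x) - distance(x, focus).
# The a priori importance (API) and the distances are assumed values.
API = {"Deutschland": 3, "Berlin": 2, "Kreuzberg": 1, "10961": 0, "10969": 0}
DISTANCE_FROM_FOCUS = {"Kreuzberg": 0, "Berlin": 1, "10961": 1,
                       "10969": 1, "Deutschland": 2}

def visible(threshold=0):
    # Objects at or above the threshold stay in view; the rest fall
    # behind the horizon and are only hinted at or aggregated.
    doi = {x: API[x] - d for x, d in DISTANCE_FROM_FOCUS.items()}
    return sorted((x for x, v in doi.items() if v >= threshold),
                  key=lambda x: -doi[x])

print(visible())   # with the focus on "Kreuzberg": the zip codes are hidden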

6 Outlook

Semantic methods for the preparation of content and the support of searches are increasingly used on the Internet (the semantic web). Search engines such as Google profit from this and provide "intelligent searches." Semantic methods are already used in medicine to support the coding of diagnoses and procedures. When it comes to the content analysis of

7 Furnas: Generalized Fisheye Views. In: Human Factors in Computing Systems CHI ‘86 Conference Proceedings, 16–23. 1986.


medical documents, however, this is merely the beginning – not least because up to 80 % of the documents held in hospitals are currently not yet available in electronic form.

Literature

Furnas, G. W.: Generalized Fisheye Views. In: Human Factors in Computing Systems CHI '86 Conference Proceedings, pp. 16–23. 1986.
km:concepts: Zeichen, Daten, Information, Wissen. Online: http://www.km-concepts.de/service/glossar/zeichendateninformationenwissen.html, [accessed on: April 13, 2014].
Schmitz, U.: Statistische Methoden in der Textlinguistik. In: Antos, Gerd/Brinker, Klaus et al. (Eds.): Text- und Gesprächslinguistik. Linguistics of Text and Conversation. Ein internationales Handbuch zeitgenössischer Forschung. 1. Halbband: Textlinguistik. Berlin/New York 2000, pp. 196 ff.
Strötgen, J.: UTEMPL – Aufbau und Evaluierung einer UIMA-basierten Textmining-Pipeline für biomedizinische Literatur. Master's thesis to obtain the academic degree of Magister Artium (M.A.), prepared at the Department of Computational Linguistics at the Faculty of Modern Languages at the Ruprecht-Karls University Heidelberg. March 2009.
Sturm, P.: Auswahl, Implementierung und Bewertung von Verfahren zur Visualisierung von Navigationshorizonten. Bachelor thesis. Medical Informatics, Faculty of Informatics, University of Applied Sciences Mannheim, August 19, 2011.

Florian Schumacher

16 Quantified Self, Wearable Technologies and Personal Data

I particularly wish to thank Florian Schumacher for this article – as it is possibly the bravest and boldest. He sees "the quantified self movement as a harbinger of a new health system." Critics may rightly argue that this enthusiasm could do with a little more modesty and evidence. But some figures certainly do suggest that the author is describing something that is already far more than a mere trend. According to an EMNID survey, "more than 62 % of German adults are interested in the possibility of using self-tracking services to improve their lives," he writes. Computers are influencing ever more areas of life, smartphones facilitate access, and who knows today how wearables (the term sums up all forms of electronic devices that are carried on or in the body) might change health behavior in the future? Or will the tracking tools of the future no longer be worn, but actually implanted? Quantifying one's self – will digital health monitoring in the future change from "nice to have" to "need to have"? Emancipation, self-management, self-control, self-monitoring, narcissism, external monitoring – all these issues are included in the area of "quantified self" and make it so exciting. And perhaps this is actually not at all such a new thing? A kind of compulsion toward striking a daily balance was awakened even in the 37-year-old Goethe in 1796.¹ For more than 35 years he noted down the day-to-day progress of his work, but also with whom he ate, had tea or sat down for a chat, or which routes he took for his strolls. So is rationalizing one's self – in the sense that life success can be learned² – more often an issue for men at a midlife-crisis age? In June 2014 Apple announced its Health app – will the personal health tracker soon become a standard feature? And the Berlin-based startup 6Wunderkinder claims to have stored 280,000,000 to-dos in its digital list. This optimizing practice, the permanent striving for the top, is what the philosopher Peter Sloterdijk calls "vertical tension." Have the great ideologies become obsolete, and are free people therefore left with only the one meta idea: make the most of your life? Or is this an "ideology of merit" or "internalized capitalism"? These too are big questions. This can be tried out within a few seconds by downloading an app – a self-experiment that is worth it.

Abstract: In recent years more and more self-tracking technologies have been developed that allow individuals to simply analyze their health and their personal behavior. This has now developed into a trend, and more and more people today use smartphone apps and wrist bands to record their sleep, their physical activity or their blood pressure. In future, the health data recorded by private individuals will play an increasingly important role in a health-conscious lifestyle and will also provide great potential for medical applications. This chapter describes the technological trends in self-measurement devices, and shows the potential for a digital health system.

1 Friedrichs: Das tollere Ich. In: Die Zeit 2013.
2 Großmann: Sich selbst rationalisieren: Lebenserfolg ist erlernbar. 28th(!) Edition. Lohmar 1994.


Content

1 Introduction: The Quantified Self Movement as a Harbinger of a New Health System
2 Technologies for the Quantified Self
2.1 The Role of the Smartphone in Digital Health
2.1.1 Price Development of Microelectronic Components
2.1.2 Computers Becoming a Part of All Areas of Life
2.1.3 Health Apps are on the Rise
2.1.4 Technology Trends for Smartphones and their Potential in the Context of Health
2.1.5 Smartphones as a Hub for Connected Devices
2.2 Wearables and their Potential for Digital Health
2.2.1 Types of Wearables for Sport, Health and Lifestyle
2.2.2 Technology Trends in the Wearables Market
2.2.3 Wearables in a Clinical Setting
2.3 The Internet of Things and the Digitization of Health
2.3.1 Networked Devices for the Registration of Vitality Data
2.3.2 The Health of the People Becomes a Big Data Problem
2.3.3 New Services on the Basis of Personal Data
2.3.4 The Interface Between Personal and Medical Data
2.3.5 Changes for Healthcare Professionals through Digital Health
3 Target Groups and Growth in Digital Health Applications
4 Personal Data and its Potential for Big Data
Literature

1 Introduction: The Quantified Self Movement as a Harbinger of a New Health System

The digitization of all areas of life is leading to a new allocation of roles between doctor and patient, thus changing the traditional notions of healthcare. Instead of being passively dependent on the help of a doctor, more and more people are today becoming active when it comes to matters concerning their own well-being. The Internet in particular, with services such as healthcare platforms or Wikipedia, has decidedly improved the availability of medical information. Seventy-two percent of the American population make use of these opportunities to learn more about their symptoms so as to have a better understanding of their own medical situation when talking to their doctor (PEW).³ The next generation of responsible patients will not only find

3 Pew Research Internet Project: Health Fact Sheet. Online: http://www.pewinternet.org/fact-sheets/health-fact-sheet/, [accessed on: August 7, 2014].


the relevant information on the Internet but also measure their own body functions in order to better understand their physical condition and the impact of their behavior. As a result, they are given the opportunity to make informed decisions and to take control of their own lives. As early as 2007, Gary Wolf and Kevin Kelly of the US American Wired Magazine recognized this trend and founded quantifiedself.com, a website that reported on measuring devices with which consumers can analyze their health and behavior. They understood the expression "quantified self" to mean the ability to measure and observe a wide variety of human attributes, making certain correlations perceivable ("self-knowledge through numbers") and the targeted implementation of change possible. In 2008 Kelly initiated a meeting in the San Francisco Bay Area at which users shared their personal experience concerning the measurement of data such as physical activity, weight, sleep, blood pressure, mood or even the analysis of their genome. The format quickly became well known and successful, and has since been taken up in more than one hundred cities around the world. The quantifiedself.com site, which was initially intended as a source of information on technology trends, soon developed into the central point of a worldwide network of users and providers who exchange information on the potential of personal data. Most of the so-called self-trackers use their data purposely to improve their athletic performance, but also to alleviate the symptoms of chronic diseases. The technical aids of the quantified self movement have now entered the mainstream, and ever better technologies and the rapidly growing trend toward digital self-measurement now form the basis of a new healthcare system. The phenomenon of self-measurement is not actually new and is in fact already practiced by the majority of the population. According to PEW Research, 69 % of the American population measure one or more health values themselves or let family members do it (PEW).⁴ Almost half (49 %) of those who take their own measurements keep track of their values in their head. Thirty-four percent of self-trackers record their values on paper, and 21 % of those who measure their values use technological solutions that monitor the readings.

2 Technologies for the Quantified Self

2.1 The Role of the Smartphone in Digital Health

2.1.1 Price Development of Microelectronic Components

Smartphones have paved the way toward self-tracking like no other technology. Since the introduction of the iPhone in 2007, the devices have been on the rise

4 Pew Research Internet Project: Health Fact Sheet. Online: http://www.pewinternet.org/fact-sheets/health-fact-sheet/, [accessed on: August 7, 2014]; Pew Research Internet Project: Tracking for Health. Online: http://www.pewinternet.org/2013/01/28/tracking-for-health/, [accessed on: August 7, 2014].


around the world, and in 2014 more than one billion units will be sold for the first time. The enormous growth of this new computer category has thereby laid the foundation for future change in the health system in several ways. Due to the economies of scale resulting from the high quantities produced, prices for microelectronic systems have fallen dramatically in recent years, enabling the cost-effective use of acceleration sensors, microprocessors and batteries in new categories of equipment.

2.1.2 Computers Becoming a Part of All Areas of Life

One of the big advantages of smartphones is the possibility of adapting them to one's own needs by using apps – small software applications. The app stores that are pre-installed on the devices have managed to eliminate the difficulty of installation and maintenance often associated with software on classical computing platforms, thus creating a simple and secure distribution channel for applications. In addition to the traditional telephone features like contacts, calendar and emails, the leading smartphone platforms have developed ecosystems with more than a million available applications. As a result, many users have become used to finding an application for almost any task. The constant availability of emails, photos, notes and other data has created an increasing habituation to, or even reliance on, digital media, which is why more and more personal information is used in digital form. Experienced smartphone users save not only their business-relevant information on their devices, but also more and more private data such as diaries, cookbooks and health data.

2.1.3 Health Apps are on the Rise

Apps from the fields of health, fitness and medicine are among the most popular smartphone applications. The offering of so-called mHealth apps for Apple's iPhone, Android phones and other mobile operating systems currently comprises 97,000 apps⁵ with forecast annual sales of USD 4 billion in 2014. Thirty-one percent of these applications are aimed at chronic patients, 28 % address fitness-oriented users. Many of these apps provide content for a better understanding of health issues, but an ever-increasing number of applications also provides the ability to record data about oneself, from a person's weight and amount of exercise to the amount of water the user drinks daily. These applications aim to make users more aware of their own health and behavior and to assist them

5 Research2Guidance: mHealth App Developer Economics. Online: http://research2guidance.com/r2g/mHealth-App-Developer-Economics-2014.pdf, [accessed on: August 7, 2014].

with tips and motivational functions to lead a healthier lifestyle. But the potential of such apps to improve the health of their users is far from exhausted. Most applications cover individual areas of application and manage their data separately from the rest of the health information that a user records with other apps. As a result, complexity that might deter users is avoided; at the same time, important context may be missing because data from different areas of health and life is not linked, even though such links could reveal relationships and achieve better results. Furthermore, the understanding and use of motivational mechanisms that promote behavioral change is not yet sufficiently developed and still holds a lot of potential. Future generations of mHealth applications could link data obtained from various sources and address the personality of individuals. In this way, individual relationships could be identified and made use of, and each user could be supported with motivational aids corresponding to his or her character.

2.1.4 Technology Trends for Smartphones and their Potential in the Context of Health

Modern smartphones have a variety of integrated sensors that simplify the operation of the equipment and make many services possible. For example, the integrated GPS of the devices enables the localization of a person, which is used for navigation but also for recording walking and bike trails. Acceleration sensors and gyroscopes enable the automatic adjustment of horizontal and vertical alignment but also the recognition of the user's type of movement, since the specific acceleration patterns for walking, running or cycling can be ascertained from the sensor data. Smartphones and the technologies integrated in them are increasingly becoming a platform that opens up extensive possibilities, particularly in the areas of health, fitness and medicine. Thus, for example, the camera built into the units can be used to measure the heart rate. If one shines the camera flash through one's finger, the blood flow in the tissue, and thus the heart rate, can be determined by measuring the absorption of light with the camera sensor. The camera of smartphones can also be used for further telemedicine applications. For example, in cases of abnormalities of the skin, doctors can usually make a diagnosis based on a picture of the affected skin area that is sent in. Conventional authentication solutions, such as the fingerprint scanners that are already common today for controlling access to smartphones, will be complemented by the measurement of additional biometric values such as heart rate variability. The sensors used in such cases could then also be used for medical purposes. The sensor and behavioral data, such as location, movement, type and number of available wireless networks in the area as well as usage and communication behavior, can also be analyzed with regard to a variety of indicators. Initial research
results show the potential of data analysis for stress and depression recognition but also for the early detection of diseases such as Alzheimer’s, which is expressed in specific changes in the tone of voice and movement behavior.
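The principle behind the camera-based pulse measurement described above comes down to counting peaks in a brightness signal; in this sketch the camera input is replaced by a synthetic 72-bpm curve, so the numbers are illustrative only:

import math

# Photoplethysmography in miniature: each heartbeat modulates the light
# absorbed by the fingertip. Here a synthetic brightness signal at 1.2 Hz
# (= 72 beats per minute) stands in for real camera frames.
FPS = 30                    # camera frames per second
SECONDS = 10
signal = [math.sin(2 * math.pi * 1.2 * t / FPS) for t in range(FPS * SECONDS)]

# Count local maxima above zero as heartbeats.
beats = sum(1 for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] > signal[i + 1] and signal[i] > 0)
print("estimated heart rate:", beats * 60 / SECONDS, "bpm")   # ~72 bpm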

Source: AliveCor.

Fig. 16.1: iPhone case Alive Cor with ECG sensors

2.1.5 Smartphones as a Hub for Connected Devices

It can be assumed that the future will see an increase in smartphone-based technologies in the health sector. The integration of additional sensors and the analysis of the data thus recorded will allow more and more body values to be captured with these mobile computers. The operating systems themselves will also assume an increasingly important role in the connection with external health sensors and in health data management. In addition to the ability to capture health data with sensors and apps, the smartphone will also play an increasingly important role as a hub for networked devices. External devices can be connected to the smartphone via the Bluetooth connection protocol, which is the current standard for most devices. In addition to hands-free equipment and headphones, this possibility is also increasingly used by devices such as fitness wristbands, which monitor the movement activity and sleep of their owner and display the results on the screen of the smartphone. A wide range of Bluetooth instruments has now emerged in healthcare, ranging from activity trackers and body scales to blood pressure cuffs. Due to the possibility of storing data on the phone and then transferring it via the Internet, these mobile computers are increasingly taking on the role of a hub for the capture and management of health information.

2.2 Wearables and their Potential for Digital Health

Wearable technologies – "wearables" for short – are a fast growing technology trend that will enable the next stage of the digitization of the healthcare system. All forms of electronic devices that are worn on or in the body are grouped under the term. Initial forms of wearables were tried out in the 1980s and consisted mostly of portable computers in connection with head-mounted displays ("heads-up" displays) that enabled simple computer applications. The pedometer by the American manufacturer Fitbit, presented in 2008, was one of the first successful mass-market products. Today, wearable technologies include numerous product categories, from fitness straps and networked watches to data glasses. The market
for wearables in the areas of sports and health is growing especially rapidly. According to a study by ABI Research, sales will rise from 20 million units sold in 2011 to 170 million devices in 2017.⁶ The devices in this fast-growing segment focus mainly on simple, individual applications. In the medium term, according to ABI, a significant market will also develop for new computer platforms, such as smartwatches, which, equipped with a variety of sensors, will integrate the functions of "single purpose devices" and partially replace them. Furthermore, data glasses such as Google Glass will also play a vital role in the healthcare system of the future. Doctors, in particular, could use the new computer platforms to access patient data at any time and monitor the condition of their patients in real time.

Source: Fitbit.

Fig. 16.2: Activity tracker Fitbit Flex

2.2.1 Types of Wearables for Sport, Health and Lifestyle

The current generation of wearables focuses on fitness- and lifestyle-oriented customers in particular. Products from manufacturers such as Fitbit, Jawbone or Nike appeal to a modern audience that wants to integrate more activity into its daily life or lose weight. Future generations of wearables will measure not only the number of steps or a person's sleep but also address the needs of specific target groups. Endurance athletes can combine their wristbands with a heart rate belt to measure the heart rate during training; manufacturers like Samsung have integrated an optical sensor for measuring the pulse into their wristbands and smartwatches without additional electrodes. Sensors carried on or in a running shoe enable the analysis of the running style by measuring pressure distribution and heel-to-toe rolling action

6 ABI Research: Wearable Sports and Fitness Devices Will Hit 90 Million Shipments in 2017. Online: https://www.abiresearch.com/press/wearable-sports-and-fitness-devices-will-hit-90-mi, [accessed on: August 7, 2014].
and thus facilitate the correction of biomechanical stress. Devices designed for weight lifters detect the movement of the arm using gyroscopic sensors and analyze fitness exercises such as squats or push-ups and the number of repetitions (e.g., Atlas). Likewise, devices for action sports enthusiasts like snowboarders, skateboarders and mountain bikers are being developed that capture the number of jumps and the kind of tricks performed. In addition to wearables for lifestyle and sports, more and more equipment is also being developed for health-oriented applications. Insulin pumps worn on the body provide more freedom in everyday life for diabetics, as the required insulin is injected into the body through a needle, making manual injections unnecessary. Wearables could also play an increasingly important role in the areas of prevention and emergency aid in the future. Shirts with integrated textile electrodes and chest belts with sensors could take ECG measurements in everyday life to monitor the heartbeat of high-risk patients. The continuous measurement can recognize changes in the heart rhythm and predict an impending heart attack, so that countermeasures can be initiated in advance. Older people can benefit from fall detectors that recognize if a person has fallen down and will then automatically trigger an alarm. However, the required approval of medical and safety-critical equipment poses a great challenge for many companies, which is why such devices will not become as widespread as lifestyle products until the market has matured accordingly.

2.2.2 Technology Trends in the Wearables Market

In the current, early development phase of the wearables market, sport and health products constitute the highest-volume segment. Despite their clear focus, these devices will be used in increasingly versatile ways in the coming years. Connected to the smartphone, they can display incoming calls, text messages and emails, accept calls at the press of a key, start and pause music, and more. Design and usability are also becoming increasingly important. That is why some manufacturers are trying to distinguish themselves from the plastic designs of the competition by using high-quality materials such as aluminum, or by providing accessories such as designer wristbands or chains into which the sensors can be integrated. Alternative concepts are also being used when it comes to the energy supply of wearables. While most of the equipment is powered by a battery and needs to be recharged after a few days, more and more devices can now be used for several months on coin cell batteries, after which the battery has to be replaced. Other concepts aim to reduce the time and effort of charging wearables. In addition to technologies involving so-called energy harvesting, where power is gained from movement, body heat or light,
wireless charging through the induction of an electromagnetic field is proving to be a particularly promising approach to the power supply. Future generations of wearables will, in addition to acceleration sensors, also be provided with optical and electrical sensors with which data such as pulse, blood pressure, hydration and respiratory rate can be determined. One of the most ambitious projects in this area, Samsung's SIMBAND, uses light of red, green and blue wavelengths that penetrates different layers of the skin to detect the concentration of blood gases such as oxygen or carbon dioxide based on absorption. Furthermore, future wearables could use microneedles to analyze the interstitial fluid of the outermost skin layer, thus enabling deeper insights into the biochemistry of the body. This variety of readings also makes wearables relevant for medical applications and can simultaneously be used to develop smarter products in the lifestyle sector. Thus, for example, the analysis of metrics such as movement, skin resistance and pulse enables a more accurate calculation of calories consumed. The time of falling asleep and waking up, as well as the course of the various sleep stages during the night, can also be determined from the data. In the medium term, wearable computing platforms such as smartwatches and smartglasses will gain in importance. Equipped with the same sensors as the simple sports and health wearables, smartwatches can also measure body values and behavior and make the data usable via apps. This will create a new ecosystem of software applications that give a variety of suppliers access to the data of accelerometers, gyroscopes, pulse sensors, etc. In addition, devices such as smartwatches will enable additional functions for the communication with and management of networked devices. This will not only enable the control of a person's smartphone: smartwatches and smartglasses will also act as command centers for networked home electronics such as lighting and heating, which can be turned on or off with simple voice commands. The integration of sensors in textiles is also becoming increasingly widespread. Trousers and shirts with integrated textile electrodes measure pulse, respiration and muscle tension directly at the skin surface, giving patients and athletes even more accurate readings. Sports clothes with integrated textile electrodes and position sensors can detect not only an athlete's performance, but also his or her posture. This will enable the analysis of a person's movement during cycling, weight training or yoga, which can then be optimized using digital coaching services. The prices of smart textiles are currently still well above those of wearables such as wristbands and watches, which makes them more likely to be taken up by patients and professional athletes than by the general public. In addition to the various wearable formats that are worn on the body, smaller and smaller implants are being developed with which to continuously monitor bodily functions. Inserted under the skin, they enable the continuous monitoring of heart rate or blood pressure. The data from the devices is transmitted wirelessly to a receiver and then passed on to the server of the manufacturer, from where doctors are given access to the values. According to a Bitkom study, 31 % of German over-65s
are interested in such implants to improve their symptoms and as a preventive measure against health risks.⁷

Source: Medtronic.

Fig. 16.3: Implant for permanent ECG measurement

The simultaneous use of different wearables such as sensors in sports shoes, Bluetooth earphones and smartwatches will lead to the creation of a network on the body, a so-called Body Area Network. The different devices will exchange data with each other so that the variety of sensor data captured at different parts of the body can be evaluated with smartwatches or smartglasses and be displayed on their screens. The individual devices will often take on a variety of different functions and complement each other within a Body Area Network. Bluetooth earphones could, for example, play music from a smartphone, while at the same time also measuring the pulse in the inner ear, after which the readings are acoustically or visually displayed on a different wearable. In order for a Body Area Network to function smoothly, the different devices must be configured flexibly so that the input and output of information is possible in different setups.

2.2.3 Wearables in a Clinical Setting

Wearables could also play an important role for healthcare professionals in daily clinical practice. Apps for networked eyewear and watches could facilitate access to patient reports or enable real-time monitoring of the values measured by sensor patches on patients' bodies. The cameras built into smartglasses will make the live transmission of images from the perspective of a surgeon possible, thus enabling new applications in teaching as well as the integration of assisting

7 Bitkom: Viele ältere Menschen interessieren sich für Chip-Implantate. Online: http://www.bitkom.org/de/presse/8477_78677.aspx, [accessed on: August 7, 2014].

experts from afar. The hospital of the future could therefore rely on complete digitization and use huge amounts of data to optimize the use of resources as well as to ensure the success of treatment. The condition of the patients would be permanently measured with networked devices and monitored with the help of algorithms. In an emergency, the medical staff would be instantly informed and provided with all the necessary information via smartwatches, smartglasses or other wearables.

2.3 The Internet of Things and the Digitization of Health

2.3.1 Networked Devices for the Registration of Vitality Data

The collection of health values is not only carried out via smartphones and wearables; in fact, more and more networked health sensors are also being installed in private homes. Body scales with a Wi-Fi connection automatically save a person's weight and body fat, while sensors detect movement in bed, pulse and respiratory rate, and use these to calculate the duration and quality of sleep. Chronic patients can monitor their health with networked blood pressure cuffs and blood glucose meters and store their readings in digital databases. Some pioneers have gone even a step further with the development of health scanners for home use. The devices determine various values such as body temperature, blood oxygen, pulse and blood pressure and transmit these to a server for analysis. As a result, doctors also have access to the data collected at home and can use it for health evaluation. The aim of the developers of such health scanners is the fully automatic diagnosis of diseases by means of algorithms. The Medical Tricorder award by the X-Prize Foundation is named after a system based on a similar principle from the science fiction series Star Trek. The roughly 40 participating companies plan to develop health scanners by the summer of 2015 that, based on algorithms, can diagnose 15 different diseases, from ear infections and Type 2 diabetes to sleep apnea.

2.3.2 The Health of the People Becomes a Big Data Problem

In the medium term, private individuals with smartphones, wearables and networked sensors will produce a variety of information about their health, which is stored on the servers of different manufacturers. In order to establish relationships between the various parameters, the data has to be combined in a meta-analysis. Many of the major manufacturers provide interfaces (APIs) for such purposes, with which the data collected by the users can be accessed by third parties. Most of these APIs do not allow access to the raw data itself, but make only processed information with a
lower level of detail available. Rather than focusing on the exchange of data, many manufacturers are developing platforms with different devices that cover the variety of vitality data and thus constitute their own ecosystem. Nevertheless, a rapidly growing trend toward the use of data across different providers through APIs and strategic alliances can be seen. Even manufacturers such as Apple and Samsung, or the software giant Google with its Android mobile operating system, have recognized the possibilities of personal data for smartphones. For example, Apple's HealthKit is to become a central collection point for the variety of health data from smartphone users, which will simplify the use of the recorded data. The data generated with Apple phones and wearables as well as with the apps and devices of cooperating manufacturers can be viewed with Apple's Health app and be used for various applications. For example, users have the opportunity of sharing their blood pressure values with their doctor and of providing a nutrition app with exercise intensity in order for it to accurately calculate the user's calorie consumption. This form of aggregation and management of data is the starting point for the next generation of applications that will enable better personalization, ranging up to entirely new, data-based services built on large datasets. This is a development that is especially in the interest of the users: only the analysis of one's own Big Data pool in an individually relevant context will enable optimum results for the management of one's health and personal development.
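What such a meta-analysis across providers could look like in its simplest form is sketched below; the two data sets stand in for hypothetical exports (step counts from an activity tracker, sleep duration from a sleep sensor) that are merged on the date and checked for a relationship:

import pandas as pd

steps = pd.DataFrame({
    "date": ["2014-08-01", "2014-08-02", "2014-08-03", "2014-08-04"],
    "steps": [4200, 11300, 8700, 12950],          # from the activity tracker
})
sleep = pd.DataFrame({
    "date": ["2014-08-01", "2014-08-02", "2014-08-03", "2014-08-04"],
    "sleep_hours": [6.1, 7.4, 6.9, 7.8],          # from the sleep sensor
})

# Merging on the date creates the cross-provider data pool in which
# relationships between the parameters can be looked for.
combined = steps.merge(sleep, on="date")
print(combined)
print("correlation steps/sleep:",
      round(combined["steps"].corr(combined["sleep_hours"]), 2))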

2.3.3 New Services on the Basis of Personal Data

The health and behavioral data collected with wearables and other networked sensors are the basis of a variety of new applications and services. Sleep data captured via wristbands can already be used today to wake a person with an artificial sunrise in sync with the individual's light sleep phase. In future, the networked lighting of living quarters could also take into account indicators of exhaustion and stress and adjust the room atmosphere to the needs of the person in question. The providers of video games or music streaming services are also interested in a person's body values in order to adjust the excitement profile of their games or to select the proposed music based on the user's current physical state. Quantified self data will thus become the starting point for the personalization of various products and services that go far beyond the health sector.

2.3.4 The Interface Between Personal and Medical Data

Especially when it comes to medically relevant data, such as blood pressure or blood sugar, users often have the possibility of sharing it with their doctor. Ideally this happens via an export function in the app with which the data was gathered, so
that it can be sent to the doctor as a chart or PDF file via email. Other solutions envisage direct access by doctors or other healthcare professionals, in the case of telemedicine solutions even including special medical applications that help to efficiently monitor the values of different patients and to intervene quickly in an emergency. Already today, 70 % of American doctors report that one or more of their patients share their health data with them, and 75 % of doctors believe that this will lead to better outcomes for the patients (Manhattan Research).⁸ The exchange of data, which still happens largely in paper form, will in future be based more and more on digital services and, by using algorithms, help doctors in their compilation of diagnoses or choice of appropriate treatment measures.

2.3.5 Changes for Healthcare Professionals through Digital Health

Numerous changes will arise for healthcare professionals through the availability of large amounts of information about patients' vitality and behavior. Instead of vague responses from patients, who often do not remember the course of their symptoms themselves, far more reliable data will be available in future, making more precise diagnoses and more effective treatment recommendations possible. By accessing the data, doctors and nurses will be able to monitor disease progression in detail and react quickly to critical situations. Their work will be supported by assistance systems that monitor and check the data of their patients. Doctors will thus be able to make decisions based on a steadily improving level of information. At the same time, digital self-tracking technologies will make chronic diseases manageable, with medical staff assisting patients via telemedicine to implement the recommended therapeutic measures in their daily life.

3 Target Groups and Growth in Digital Health Applications

The use of quantified self services covers all age groups. While younger people primarily want to record their athletic performance and motivate themselves, the elderly are more focused on health issues and diseases. Patients with high blood pressure or diabetes register their medical values and at the same time use the lifestyle tools of the younger generation to become more active and to treat their symptoms. These solutions today cover almost every perceivable health problem but also

8 Manhattan Research: New Study Reveals That Physicians Embrace Patient Self-Tracking. Online: http://manhattanresearch.com/News-and-Events/Press-Releases/physicians-embrace-patient-selftracking#sthash.XvBAb9Pf.dpuf, [accessed on: August 7, 2014].

210  Florian Schumacher

other areas of personal development. According to an EMNID survey, more than 62 % of German adults are interested in the possibility of using self-tracking services to improve their lives.⁹ Acceptance problems regarding the new technologies of the kind that were common a few years ago are slowly disappearing into the background. In particular, the market entry of established companies has created confidence in the new instruments for self-monitoring. The growing ability of future solutions should increasingly lead to a positive result of the risk-benefit assessment and contribute to the proliferation of quantified self services. Sales of mHealth apps for smartphones should therefore increase from USD 4 billion in 2014 to USD 26 billion in 2017.¹⁰ Rapid growth is also predicted for the sale of wearables. ABI Research forecasts an annual increase of 41 % and a market volume of 170 million units sold in 2017.¹¹

4 Personal Data and its Potential for Big Data

The data recorded with quantified self applications holds great potential not only for the users themselves, but also for other areas such as science and research. That is why many providers of self-tracking services secure the right to the anonymized – and sometimes also the person-identifiable – use of the user data in their terms of use. Personal data could therefore have an increasing influence on economic and political decisions in the future. The route data of cyclists who record their cycling routes enables the requirements analysis and planning of new bike paths. Discussions and feedback regarding treatment successes and side effects in health services and medical apps provide new opportunities for the pharmaceutical industry. And the rapidly growing number of sensors provides more and more data that can be transmitted in real time by means of smartphones and other computer technologies. Aggregated into larger, more meaningful data sets and enriched with contextual information, personal data will then enable the examination of human life far from any laboratory environment and the gathering of insights into people’s everyday lives. Quantified self tools are thus the starting point of a new society that better addresses the different needs of its members and increases quality of life through a preventive healthcare system. Individuals with their personal data will then form a new symbiosis with society, making a vital contribution toward the progress of science and industry while at the same time benefiting from it.

9 Schumacher: Europäer interessieren sich für Self-Tracking. Online: http://igrowdigital.com/de/2013/10/studie-zum-interesse-an-self-tracking-in-europa/, [accessed on: August 7, 2014].
10 Research2Guidance: mHealth App Developer Economics. Online: http://research2guidance.com/r2g/mHealth-App-Developer-Economics-2014.pdf, [accessed on: August 7, 2014].
11 Rank: The wearable computing market: a global analysis. Online: http://go.gigaom.com/rs/gigaom/images/wearable-computing-the-next-big-thing-in-tech.pdf, [accessed on: August 7, 2014].


Literature

ABI Research: Wearable Sports and Fitness Devices Will Hit 90 Million Shipments in 2017. Online: https://www.abiresearch.com/press/wearable-sports-and-fitness-devices-will-hit-90-mi, [accessed on: August 7, 2014].
Bitkom: Viele ältere Menschen interessieren sich für Chip-Implantate. Online: http://www.bitkom.org/de/presse/8477_78677.aspx, [accessed on: August 7, 2014].
Manhattan Research: New Study Reveals that Physicians Embrace Patient Self-Tracking. Online: http://manhattanresearch.com/News-and-Events/Press-Releases/physicians-embrace-patient-self-tracking#sthash.XvBAb9Pf.dpuf, [accessed on: August 7, 2014].
Pew Research Internet Project: Health Fact Sheet. Online: http://www.pewinternet.org/fact-sheets/health-fact-sheet/, [accessed on: August 7, 2014].
Pew Research Internet Project: Tracking for Health. Online: http://www.pewinternet.org/2013/01/28/tracking-for-health/, [accessed on: August 7, 2014].
Rank, J.: The Wearable Computing Market: A Global Analysis. Online: http://go.gigaom.com/rs/gigaom/images/wearable-computing-the-next-big-thing-in-tech.pdf, [accessed on: August 7, 2014].
Research2Guidance: mHealth App Developer Economics. Online: http://research2guidance.com/r2g/mHealth-App-Developer-Economics-2014.pdf, [accessed on: August 7, 2014].
Schumacher, F.: Europäer interessieren sich für Self-Tracking. Online: http://igrowdigital.com/de/2013/10/studie-zum-interesse-an-self-tracking-in-europa/, [accessed on: August 7, 2014].

Axel Mühlbacher and Anika Kaczynski

17 “For the Benefit of the Patient” ... What Does the Patient Say to That?¹

There is probably no organization, no institution, no enterprise and no federation in the healthcare business that does not claim to be doing things “for the benefit of patients.” However, this patient is usually only an object, and we think we know what is good or bad for the patient. The patient as a digital subject is the great unknown in our healthcare systems. The authors focus on the issue of preferences, which are “the result of the relative subjective evaluation of alternatives by considering the costs, risks and benefits.” They describe methods through which preferences can be measured, and distinguish the various relevant levels on which patient preferences should have an even greater influence in the future: the macro level comprises the political decision-making level, the meso level the organizational level, and the micro level the level of care and research. Will Big Data mean the emancipation of patients, now that individual and collective preferences can be ascertained far better and can thus take effect?

Abstract

There is hardly a company or regulatory authority today that does not purport to act in the interest of patients. However, when it comes to specific decisions on approval or pricing, only very limited approaches to direct participation by citizens, patients and the insured exist. According to Article IV of the Declaration of Alma-Ata of 1978, the people have the right and even the duty to participate in the planning and implementation of their healthcare. The demand for public participation in decision-making in healthcare is not new, especially when decision makers in politics and self-government are faced with the difficult task of setting priorities. The fulfillment or violation of people’s subjectively perceived values is a measure of acceptance among the affected populations.

Content
1 Delegation of Decision Rights: Who Decides and How?
2 Levels of Decision Making: When Does Gauging Preferences Make Sense?
3 Benefit Assessment and Preferences: Is There a Connection?
4 The Decision Problem: Identification, Weighting and Aggregation of Endpoints
5 Gauging Preference: What Methods are there?
6 Discussion and Future Fields of Application
Literature

1 In part, this article is a reprint of the publication: Mühlbacher/Kaczynski: Multikriterielle Entscheidungsprobleme: Warum die Präferenzmessung bei der Nutzenbewertung notwendig ist. In: Forum der Medizin_Dokumentation und Medizin_Informatik (mdi) 16(1)/2014, pp. 4–9. The article was revised and updated.

1 Delegation of Decision Rights: Who Decides and How?

Who makes the decisions? The basic premise is that every individual has the right to decide. Often, however, consumers cannot decide alone on the consumption of health technologies. The reason for this lack of decision-making ability lies in the unequal distribution and quality of the available information. A rational decision is difficult, if not impossible, for an individual patient to make. This is why the authority to make decisions is delegated to experts: physicians, cost bearers and self-governing bodies make decisions on behalf of the persons concerned. The delegation of decision-making power is subject to the condition that the decision maximizes the welfare of the population, the insured or the persons concerned. In order to meet this requirement, it is vital that decision makers be informed of the benefits to the patient of a given service. In this sense, the decision makers need to ensure that the decisions match patient needs, expectations and priorities. Even given delegation of decision-making authority, patients and insured persons should bring their preferences and their knowledge and experience to bear in the decision-making process.²

What information has to be taken into account? The decision makers are thus obliged to consider patient preferences. A self-governing body can meet that obligation only if a sufficient evidence base is available and is then systematically taken into account in the decision-making process. Healthcare decision makers therefore always have to decide the way a consumer would have decided had he or she been fully informed. Without the participation of those affected, i. e., the consideration of preferences in the benefit assessment, effective planning and implementation of healthcare systems is impossible.

2 Gagnon et al.: Introducing patients’ and the public’s perspectives to health technology assessment. In: Int J Technol Assess Healthcare 27(1)/2011, pp. 31–42.


2 Levels of Decision Making: When Does Gauging Preferences Make Sense?

For which decisions should preferences be taken into account? The decision makers of self-governing bodies assess healthcare technologies and policies to ensure that adequate healthcare can be guaranteed for those insured by statutory health insurance. Policy makers decide upon reform measures in order to define national and local healthcare strategies for the people. Cost bearers determine healthcare benefits so that their insured can be taken care of accordingly. Doctors decide on treatment measures to restore or improve the health of their patients. Patients evaluate treatment options together with their doctors to improve their healthcare. In general, three levels of public, patient and policyholder participation can be distinguished in healthcare:³
1. The macro level comprises the political decision-making level. This level brings in the concerns of the public, the insured and the patients through individual or group-based representation in various planning and decision-making bodies in the context of political decisions, such as the Federal Joint Committee (Gemeinsamer Bundesausschuss [G-BA]).
2. The meso level is the organizational level. It comprises participation in processes and decisions in the associations, corporations and institutions of healthcare and includes the participation of the public, the insured and patients in the bodies of the service providers or the cost bearers, for example in guideline development.
3. The micro level describes the level of care and research. It involves the personal interaction between doctor, patient and family members. This concerns patient orientation in that it considers the priorities and expectations of patients.

3 Benefit Assessment and Preferences: Is There a Connection?

How can the benefits be assessed? When evaluating healthcare technologies, the patient benefit that can be achieved is a key element. In general, the benefit is understood as the degree of satisfaction of needs, i. e., the ability of a good or service to meet the needs of a patient, insured person or citizen. In a neoclassically oriented health economics, benefits are not determined as the satisfaction of health-related needs, but registered via consumer decisions when choosing between alternative health technologies. If an alternative is preferred, it is assumed that the chosen alternative provides a higher benefit.

How are preferences ascertained? Preferences are the result of the relative subjective evaluation of alternatives in consideration of the costs, risks and benefits. Experiments enable the analysis of the decision behavior of the insured, patients or citizens in hypothetical decision-making situations. With the help of experiments, the decisions made can be documented and analyzed. Preferences of patients or of insured persons are reflected in the choice of observable actions and can take the form of either a strict preference (A is strictly preferred over B) or a weak preference (A is at least as good as B). In the borderline case a subject is indifferent between A and B (A and B are rated as equal). If a decision maker is not able to precisely quantify the benefits of an alternative because of uncertainty, these benefits are called expected benefits.

3 Gauvin et al.: “It all depends”: Conceptualizing public involvement in the context of health technology assessment agencies. In: Social science & medicine 70(10)/2010, pp. 1518–1526.

4 The Decision Problem: Identification, Weighting and Aggregation of Endpoints

The problem of multiple patient-relevant endpoints: Almost all activities in healthcare are based on decisions about alternatives. This in turn requires a prior assessment of the alternatives. Not infrequently, the assessments are based on multiple decision criteria. Depending on the indication, a variety of criteria can come into play in the evaluation, all of which affect the patient-relevant benefit. Depending on who in particular is making the decision, the evaluation of a type of healthcare can lead to different decisions, even given the same body of evidence. These differences may be the result of a different weighting of the decision criteria, or of the relative importance of individual criteria for the different stakeholder groups.

Multi-criteria decision making: When assessing the benefit of a drug, the benefits can be operationalized through several endpoints that are relevant for patients. Multiple clinical decision parameters result in complex decision problems that are analyzed using methods of multi-criteria decision analysis. Procedures of multi-criteria decision making can thus help when weighing clinical effects (e.g., healing) against possible side effects. Furthermore, multi-criteria decision analysis (MCDA) can be used to integrate patient preferences in decision making. The different methods and techniques of multi-criteria decision analysis were developed in the context of operations research, marketing and decision analysis.⁴ As such, this approach represents a practical way of taking into account patient preferences, which would be difficult (if not impossible) to consider in the context of traditional panel or group decisions.

4 Institut für Qualität und Wirtschaftlichkeit im Gesundheitswesen (IQWiG): Analytic Hierarchy Process (AHP) – Pilotprojekt zur Erhebung von Patientenpräferenzen in der Indikation Depression; IQWiG-Berichte – Nr. 163. With the collaboration of: Danner, M., Gerber-Grote, A., Volz, F., Wiegard, B.; Cologne, 2013. Online: https://www.iqwig.de/download/Arbeitspapier_Analytic-Hierarchy-Process_Pilotprojekt.pdf, [accessed on: July 17, 2014].

Assessment of risks and benefits: As part of clinical decision making, the question arises as to how the interests of patients can be integrated into the risk-benefit assessment. It should be noted that the perspectives and risk attitudes of policy makers, practitioners and patients may differ.⁵ For decision-making bodies, methods of multi-criteria decision analysis and the measurement of preferences could provide a systematic and consistent approach to quantifying the relative benefits patients assign to the advantages and risks of medical interventions. These relative weightings can be used to determine the benefit-risk trade-offs that are acceptable to patients. This type of information could help the regulatory authorities to consider patients’ perspectives in the future approval of treatment options.

Aggregation of clinical effects: The last step in the evaluation of health technologies is the ranking of the available interventions. In order to make a complete evaluation of the clinical benefit of an intervention, all patient-relevant endpoints must be considered, i.e., aggregated. This presupposes that the endpoints, which are often already recorded in clinical trials, are first identified. In order to aggregate the multiple endpoints into an interpretable outcome, these endpoints must be weighted. An algorithm therefore has to be determined which allows the aggregation of the measured clinical effects, taking their individual weighting into consideration. The methods of multi-criteria decision analysis can deal with this problem. However, it is unclear which algorithm should be used for the derivation of the overall benefit. Given recent efforts to improve the allocative efficiency of healthcare systems, it is clear that knowledge of the preferences of those involved has quickly become an important aspect of good decision making in healthcare.
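To make the aggregation step concrete, here is a minimal additive-weighting sketch with invented endpoints, weights and normalized effect scores. It shows only one of the possible MCDA algorithms; as noted above, which algorithm should be used to derive the overall benefit is precisely the open question:

```python
# Minimal additive MCDA sketch: aggregate multiple patient-relevant
# endpoints into one overall score per therapy. All numbers invented.

# preference weights for the endpoints (must sum to 1)
weights = {"efficacy": 0.5, "side_effects": 0.3, "convenience": 0.2}

# normalized effect scores per therapy on a 0-1 scale
# (higher = better; "side_effects" already coded so higher = fewer)
therapies = {
    "Therapy A": {"efficacy": 0.80, "side_effects": 0.40, "convenience": 0.70},
    "Therapy B": {"efficacy": 0.60, "side_effects": 0.90, "convenience": 0.50},
}

def overall_benefit(scores):
    # weighted sum of the endpoint scores
    return sum(weights[e] * scores[e] for e in weights)

# rank the interventions by aggregated benefit
for name, scores in sorted(therapies.items(),
                           key=lambda t: -overall_benefit(t[1])):
    print(f"{name}: {overall_benefit(scores):.2f}")
```

With these invented weights, the therapy with fewer side effects narrowly overtakes the more effective one, which is exactly the kind of trade-off the weighting is meant to make explicit.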

5 Gauging Preference: What Methods are there?

Discrete Choice Experiments (DCE): The Discrete Choice method is a choice-based variant of conjoint analysis, made possible through the work of Lancaster (1966) and McFadden (1974) (random utility theory).⁶ Instead of ranking or rating a variety of different therapeutic characteristics (as in conjoint analysis), discrete choice analysis compares (hypothetical) therapy options and requires the decision maker to choose among the various alternatives. The process of DCEs and their evaluation is multilevel.⁷ In order to explain and predict patient decisions for a medical service, a therapy or a particular product on the basis of the product or service features, all the therapy characteristics that are relevant for the decision of the respective target group must be determined.⁸ Depending on the number of determined properties (attributes) and their specific characteristics (levels), the number of possible (therapy) combinations grows exponentially. Therefore, a reduced selection is generally used, which nevertheless still enables a reliable assessment. For decision-making bodies, DCEs could provide a systematic and consistent approach to quantifying the relative benefits patients assign to the advantages and risks of medical interventions. Discrete Choice models are a proven method for prioritizing and/or weighting patient-relevant endpoints. The method is also suitable for operationalizing the multidimensional total benefit or ascertaining the willingness to pay as an additional decision criterion. Against the background of its microeconomic foundation and given its pragmatic advantages, this method is an interesting alternative to existing survey instruments.

Best-Worst Scaling (BWS): This newer survey approach is a special form of the Discrete Choice experiment that was developed in the late 1980s. As a multinomial extension of the pairwise comparison method, BWS is also based on random utility theory. BWS assumes that subjects consider every alternative in a choice scenario and then make a double decision by identifying the best as well as the worst feature or option. The subjects thus choose the properties, characteristics or alternatives with the greatest distance from each other, determining the maximum difference between the offers, which is why BWS is also often called max-diff modeling. Although this method has so far been applied only sporadically in Germany for gauging preferences in healthcare, early results demonstrate that BWS has great potential for widespread application in practice.⁹ BWS is already well established in management science and marketing. In health economics and health services research this is not yet the case, although an increasingly positive tendency can be seen.

5 Mühlbacher/Juhnke: Patient Preferences Versus Physicians’ Judgement: Does it Make a Difference in Healthcare Decision Making? In: Applied health economics and health policy 11(3)/2013, pp. 163–180.
6 Lancaster: A new approach to consumer theory. In: The journal of political economy 74(2)/1966, pp. 132–157; McFadden: Conditional logit analysis of qualitative choice behavior. In: Zarembka, 1974, pp. 105–142.
7 Mühlbacher/Bethge/Tockhorn: Präferenzmessung im Gesundheitswesen: Grundlagen von Discrete-Choice-Experimenten. In: Gesundheitsökonomie & Qualitätsmanagement 18(4)/2013, pp. 159–172.
8 Mühlbacher/Nübling: Analysis of physicians’ perspectives versus patients’ preferences: direct assessment and discrete choice experiments in the therapy of multiple myeloma. In: The European Journal of Health Economics 12(3)/2011, pp. 193–203.
9 Lancsar/Louviere: Estimating individual level discrete choice models and welfare measures using best-worst choice experiments and sequential best-worst MNL. University of Technology, Centre for the Study of Choice (Censoc), 2008, pp. 1–24; Louviere/Flynn: Using Best-Worst Scaling Choice Experiments to Measure Public Perceptions and Preferences for Healthcare Reform in Australia. In: The Patient: Patient-Centered Outcomes Research 3(4)/2010, pp. 275–283.
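The random-utility core shared by DCEs and BWS can be illustrated in a few lines. In the sketch below, the part-worth coefficients and the therapy alternatives are invented; in a real experiment the coefficients would be estimated from the observed choices (e.g., with a conditional logit model fitted by maximum likelihood):

```python
# Conditional-logit sketch for a DCE: utility is a weighted sum of
# attribute levels; choice probabilities follow the logit formula.
# Coefficients and alternatives are invented for illustration.
import math

# part-worth coefficients (would normally be estimated from choice data)
beta = {"efficacy_pct": 0.06, "risk_pct": -0.10, "cost_eur": -0.01}

alternatives = {
    "Therapy A": {"efficacy_pct": 70, "risk_pct": 10, "cost_eur": 50},
    "Therapy B": {"efficacy_pct": 55, "risk_pct": 3, "cost_eur": 20},
}

def utility(attributes):
    return sum(beta[k] * v for k, v in attributes.items())

# multinomial/conditional logit choice probabilities
exp_u = {name: math.exp(utility(x)) for name, x in alternatives.items()}
total = sum(exp_u.values())
for name, eu in exp_u.items():
    print(f"P(choose {name}) = {eu / total:.2f}")
```

The ratio of an attribute coefficient to the cost coefficient then yields a willingness-to-pay estimate, one way of operationalizing the willingness to pay mentioned above as an additional decision criterion.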

Analytic Hierarchy Process (AHP): The AHP was developed by the mathematician Thomas Saaty in the 1970s and, in the context of multi-criteria decision making, is an instrument for weighting decision criteria and ranking several available alternatives. AHP was introduced into the healthcare sector by Dolan in 1989.¹⁰ The AHP is a method of prescriptive decision theory, which provides the decision maker with approaches and procedures for arriving at a meaningful and comprehensible (rational) decision. The decision maker solves the decision problem by preferring the utility-maximizing alternative based on defined outcomes and individual or group-specific priorities. The hierarchy resembles a tree and breaks a decision problem down into main criteria, sub-criteria and alternatives. At each level of the hierarchy, pairwise comparisons are carried out on a scale of 1 to 9. This method of decision support is, to this day, used mainly in the US and Asia. It was originally developed to support the decisions of smaller groups of decision makers, and the AHP is accordingly widely used for strategic management decisions.
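The weighting step of the AHP can be illustrated with a small example. The pairwise comparison matrix below is invented; the geometric mean of each row is a common approximation of the principal eigenvector from which Saaty’s method derives the criterion weights:

```python
# AHP sketch: derive criterion weights from a pairwise comparison
# matrix on Saaty's 1-9 scale. Matrix values are invented; the
# geometric mean of each row approximates the principal eigenvector.
import math

# criteria: efficacy, side effects, cost
A = [
    [1,   3,   5],    # efficacy vs. (efficacy, side effects, cost)
    [1/3, 1,   3],    # side effects vs. ...
    [1/5, 1/3, 1],    # cost vs. ...
]

n = len(A)
row_gm = [math.prod(row) ** (1 / n) for row in A]  # row geometric means
total = sum(row_gm)
weights = [g / total for g in row_gm]              # normalize to sum 1

for criterion, w in zip(["efficacy", "side effects", "cost"], weights):
    print(f"{criterion}: {w:.3f}")
```

With these invented judgments, efficacy receives roughly 64 % of the weight, side effects 26 % and cost 10 %; a full AHP would additionally check the consistency of the matrix before using such weights.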

6 Discussion and Future Fields of Application

International trends: Those with responsibility for decision making are increasingly interested in the needs and preferences of those affected. The preference-gauging methods described are thus also becoming increasingly important in a political context. Around the world, large healthcare companies and decision-making bodies such as the Food and Drug Administration (FDA) in the United States and the European Medicines Agency (EMA) in London are demanding proof of patient preferences as an additional resource for the risk-benefit assessment in the approval and introduction of new drugs. And when it comes to recommendations or decisions in the area of drug reimbursement, most assessment bodies in various countries now also demand evidence of the additional benefit for the patient, or the presentation of the patient perspective and its consideration in the corresponding decisions. In Germany the collection and consideration of patient benefits is required by law, and the patient-relevant endpoints need to be considered when making decisions on new drugs. Pilot studies on DCE and AHP have already been completed.¹¹ Based on the preference data, the benefit of specific characteristics and of complete alternatives can be quantified, the willingness to pay analyzed, and ultimately the acceptance of innovative healthcare goods and services predicted.¹²

10 Dolan/Isselhardt/Cappuccio: The Analytic Hierarchy Process in Medical Decision Making: A Tutorial. In: Medical Decision Making 9(1)/1989, pp. 40–50.
11 Institut für Qualität und Wirtschaftlichkeit im Gesundheitswesen (IQWiG): Analytic Hierarchy Process (AHP) – Pilotprojekt zur Erhebung von Patientenpräferenzen in der Indikation Depression; IQWiG-Berichte – Nr. 163. With the collaboration of: Danner, M., Gerber-Grote, A., Volz, F., Wiegard, B.; Cologne, 2013. Online: https://www.iqwig.de/download/Arbeitspapier_Analytic-Hierarchy-Process_Pilotprojekt.pdf, [accessed on: July 17, 2014]; Institut für Qualität und Wirtschaftlichkeit im Gesundheitswesen (IQWiG): Wahlbasierte Conjoint-Analyse – Pilotprojekt zur Identifikation, Gewichtung und Priorisierung multipler Attribute in der Indikation Hepatitis C; IQWiG-Berichte – Nr. 227. With the collaboration of: Mühlbacher, A., Bridges, J., Bethge, S., Nübling, M., Gerber-Grote, A., Dintsios, C.M., Scheibler, F., Schwalm, A., Wiegard, B.; Cologne, 2014 (in press).

Benefit and technology assessment for pricing: With the help of multi-criteria decision analysis it will be possible to document the patient-relevant decision criteria for an intervention and to integrate them into decision-making processes in a transparent manner. In future, these methods will be able to make a significant contribution to the conduct of health economic studies. Known preferences enable conclusions on patient benefits. A promising area of application is therefore the evaluation of innovations: appropriate studies have to be carried out if a new product or healthcare service is to be included in the catalog of benefits.

Weighing up the risks and benefits: Registration studies are meant to show that healthcare products are useful and safe, which is usually based on clinical endpoints. However, a predetermined clinical parameter (e.g., an improvement of blood values) or a clinical endpoint (e.g., a reduction in blood pressure) can display only certain aspects of a therapy. It is often important to analyze the trade-offs between efficacy and expected side effects and to base any decision on that. Preference data can reflect the relative importance of patient-relevant endpoints and thus represent an important aspect of patient benefit.

Acceptance and motivation: Healthcare services are expensive and are not always accepted by patients the way they should be from a therapeutic point of view. Policy makers and medical service providers must therefore be able to determine, with the help of appropriate information, whether a program is useful and meaningful with respect to a patient’s priorities. So-called compliance (the patient’s willingness to participate in the therapy) and so-called adherence (which, in addition to the willingness to take part, includes the patient’s own initiative to improve his or her own health) could be improved by taking the patient’s preferences into account. Increased compliance and adherence may then lead to better treatment outcomes.¹³

Prognosis of use: Methods of measuring preferences enable the prediction of selection behavior and thus of usage behavior. The properties that make a medically desirable selection or actual behavior of patients more likely can be determined. In summary it can be said that healthcare decisions are always based on an interaction between consumers (patients) and providers (doctors, hospitals, health insurance).¹⁴ These decisions require information that reflects what patients really want, because only in that way will patients be able to actively participate in their treatment and receive the best possible care.

12 Severin et al.: Eliciting preferences for priority setting in genetic testing: a pilot study comparing best-worst scaling and discrete-choice experiments. In: European Journal of Human Genetics 21(11)/2013, pp. 1202–1208.
13 Behner et al.: Effekte einer gesteigerten Therapietreue: Bessere Gesundheit und höhere Arbeitsproduktivität durch nachhaltige Änderung des Patientenverhaltens. In: Burger, S. (Ed.), Alter und Multimorbidität, 2013, pp. 65–86.
14 Hall: Using stated preference discrete choice modeling to evaluate healthcare programs. In: Journal of Business Research 57(9)/2004, pp. 1026–1032.

Literature

Behner, P. et al.: Effekte einer gesteigerten Therapietreue: Bessere Gesundheit und höhere Arbeitsproduktivität durch nachhaltige Änderung des Patientenverhaltens. In: Burger, S. (ed.): Alter und Multimorbidität, Heidelberg 2013, pp. 65–86.
Defechereux, T. et al.: Healthcare Priority Setting in Norway a Multicriteria Decision Analysis. In: BMC Health Serv Res 12(1)/2012, p. 39.
Dolan, J. G./Isselhardt, B. J./Cappuccio, J. D.: The Analytic Hierarchy Process in Medical Decision Making a Tutorial. In: Medical Decision Making 9(1)/1989, pp. 40–50.
Gagnon, M.-P. et al.: Introducing Patients’ and the Public’s Perspectives to Health Technology Assessment: A Systematic Review of International Experiences. In: International Journal of Technology Assessment in Healthcare 27(1)/2011, pp. 31–42.
Gauvin, F.-P. et al.: “It All Depends”: Conceptualizing Public Involvement in the Context of Health Technology Assessment Agencies. In: Social Science & Medicine 70(10)/2010, pp. 1518–1526.
Hailey, D./Nordwall, M.: Survey on the Involvement of Consumers in Health Technology Assessment Programs. In: International Journal of Technology Assessment in Healthcare 22(4)/2006, pp. 497–499.
Hall, J. et al.: Using Stated Preference Discrete Choice Modeling to Evaluate Healthcare Programs. In: Journal of Business Research 57(9)/2004, pp. 1026–1032.
Institut für Qualität und Wirtschaftlichkeit im Gesundheitswesen (IQWiG): Analytic Hierarchy Process (AHP) – Pilotprojekt zur Erhebung von Patientenpräferenzen in der Indikation Depression; IQWiG-Berichte – Nr. 163. With the collaboration of: Danner, M., Gerber-Grote, A., Volz, F., Wiegard, B.; Cologne, 2013. Online: https://www.iqwig.de/download/Arbeitspapier_Analytic-Hierarchy-Process_Pilotprojekt.pdf, [accessed on: July 17, 2014].
Institut für Qualität und Wirtschaftlichkeit im Gesundheitswesen (IQWiG): Wahlbasierte Conjoint-Analyse – Pilotprojekt zur Identifikation, Gewichtung und Priorisierung multipler Attribute in der Indikation Hepatitis C; IQWiG-Berichte – Nr. 227. With the collaboration of: Mühlbacher, A., Bridges, J., Bethge, S., Nübling, M., Gerber-Grote, A., Dintsios, C.M., Scheibler, F., Schwalm, A., Wiegard, B.; Cologne 2014 (in press).
Johnson, F. R. et al.: Constructing Experimental Designs for Discrete-Choice Experiments: Report of the ISPOR Conjoint Analysis Experimental Design Good Research Practices Task Force. In: Value Health 16/2013, pp. 3–13.
Kreis, J./Schmidt, H.: Public Engagement in Health Technology Assessment and Coverage Decisions: A Study of Experiences in France, Germany, and the United Kingdom. In: Journal of Health Politics, Policy and Law 38(1)/2013, pp. 89–122.
Lancaster, K. J.: A New Approach to Consumer Theory. In: The Journal of Political Economy 74(2)/1966, pp. 132–157.
Lancsar, E./Louviere, J.: Estimating Individual Level Discrete Choice Models and Welfare Measures Using Best-Worst Choice Experiments and Sequential Best-Worst MNL. University of Technology, Centre for the Study of Choice (Censoc), 2008, pp. 1–24.
Louviere, J. J./Flynn, T. N.: Using Best-Worst Scaling Choice Experiments to Measure Public Perceptions and Preferences for Healthcare Reform in Australia. In: The Patient: Patient-Centered Outcomes Research 3(4)/2010, pp. 275–283.
McFadden, D.: Conditional Logit Analysis of Qualitative Choice Behavior. In: Zarembka 1974, pp. 105–142.
Mühlbacher, A. C./Bethge, S./Tockhorn, A.: Präferenzmessung im Gesundheitswesen: Grundlagen von Discrete-Choice-Experimenten. In: Gesundheitsökonomie & Qualitätsmanagement 18(4)/2013, pp. 159–172.
Mühlbacher, A. C./Juhnke, C.: Patient Preferences versus Physicians’ Judgement: Does it Make a Difference in Healthcare Decision Making? In: Applied Health Economics and Health Policy 11(3)/2013, pp. 163–180.
Mühlbacher, A./Nübling, M.: Analysis of Physicians’ Perspectives versus Patients’ Preferences: Direct Assessment and Discrete Choice Experiments in the Therapy of Multiple Myeloma. In: The European Journal of Health Economics 12(3)/2011, pp. 193–203.
Ryan, M./Gerard, K./Amaya-Amaya, M.: Using Discrete Choice Experiments to Value Health and Healthcare. The Economics of Non-Market Goods and Resources. New York 2008.
Severin, F. et al.: Eliciting Preferences for Priority Setting in Genetic Testing: A Pilot Study Comparing Best-Worst Scaling and Discrete-Choice Experiments. In: European Journal of Human Genetics 21(11)/2013, pp. 1202–1208.

Peter Langkafel

18 Visualization – What Does Big Data Actually Look Like?

Content
1 “… And Then We’ll Visualize the Data…”
2 Visualization and the Consequences Based on the Example of PSA
Literature

1 “… And Then We’ll Visualize the Data…”

How quickly and often have we thought, said and done that? Just take an IT tool and the dashboard’s complete. Just go to Excel, click “create graph” – done!? This is where the well-known saying applies: a picture is worth a thousand words. So might it actually have less to do with what is being shown and more with how? What impact does the form of presentation have – could it be bigger than we generally think? Spiegelhalter published an article titled “Visualizing uncertainty about the future” in the prestigious journal Science.¹ Probabilities and risks are difficult to communicate – especially if the target group is very heterogeneous (age, education, etc.), as is often the case in medicine. One of the most influential visualizations in history is the “rose-like” representation by Florence Nightingale from 1855. This representation had a significant impact on how soldiers were treated during a battle and what value was placed on sanitary conditions and – especially in combination with the fascinating work of this extraordinary woman – on the world’s health systems. Visualizations were and are often used in the area of selective interventions, such as screening tests. Following Bayes’ theorem, the probabilities are then best represented as natural frequencies (rather than relative frequencies).² Not all visualizations are alike – and, depending on the perspective, they can make a decision seem easier or more complicated.

1 Spiegelhalter/Pearson/Short: Visualizing Uncertainty About the Future. In: Science 333/2011, pp. 1393–1400.
2 Conditional probability defines the probability of Event A occurring under the condition that Event B has already occurred. Bayes’ theorem (named after Reverend Thomas Bayes (1702–1761) and first mentioned in his essay “An Essay towards solving a Problem in the Doctrine of Chances,” 1763) enables calculation with conditional probabilities and in particular the inversion of conclusions.
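A deliberately simplified screening example (all numbers invented, not taken from a real test) shows how the same result of Bayes’ theorem can be expressed once as conditional probabilities and once as natural frequencies:

```python
# Illustrative screening example (numbers invented): Bayes' theorem
# expressed as conditional probabilities and as natural frequencies,
# both yielding the same positive predictive value (PPV).
prevalence = 0.01       # 1 % of screened men have the disease
sensitivity = 0.90      # P(test positive | disease)
false_pos_rate = 0.09   # P(test positive | no disease)

# Bayes' theorem with relative frequencies (probabilities)
ppv = (sensitivity * prevalence) / (
    sensitivity * prevalence + false_pos_rate * (1 - prevalence)
)

# the same computation as natural frequencies: imagine 1000 screened men
n = 1000
sick_pos = prevalence * n * sensitivity               # 9 true positives
healthy_pos = (1 - prevalence) * n * false_pos_rate   # ~89 false positives
print(f"PPV via Bayes: {ppv:.1%}")                    # about 9 %
print(f"Natural frequencies: {sick_pos:.0f} of "
      f"{sick_pos + healthy_pos:.0f} positive tests are true positives")
```

Expressed as natural frequencies ("of roughly 98 men with a positive test, only 9 actually have the disease"), the result is far easier to grasp than the equivalent statement that the positive predictive value is about 9 %.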

Fig. 18.1: Florence Nightingale’s chart on causes of death during the Crimean War (Source: Wikipedia). Legend: gray: deaths from infectious diseases; light gray: deaths from wounds; black: other.

The Cambridge professor Spiegelhalter describes key points and best practices that are relevant for the visualization of risks and probabilities.³,⁴ Among other things, he recommends the following:
– Using different formats of visualization, since no single representation will be equally good for all viewers.
– Providing graphics with words and numbers.
– Creating graphics to enable a comparison of the individual with the whole – particularly also to show absolute and relative risk.
– Using vivid examples, which, however, do not represent a bias in themselves.
– Representing the quality level and “evidence base” of a decision (“how verified is the information?”).
– Avoiding certain types of graphics if possible – for example, three-dimensional representations of columns or tables – as they carry a higher potential for manipulation and misunderstanding.

3 Spiegelhalter/Pearson/Short: Visualizing Uncertainty About the Future. In: Science 333/2011, pp. 1393–1400.
4 See also: http://understandinguncertainty.org/view/animations – risk assessment paired with a little humor.


His main recommendation is to put the needs of the user in the foreground and to experiment with and subsequently test the final design. (“Most important, assess the needs of the audience, experiment, and test and iterate toward a final design” [in English in the original]).

2 Visualization and the Consequences Based on the Example of PSA

That these visualizations are more than mere gimmicks is evident in the example of prostate screening with PSA (prostate-specific antigen). In an article in the Süddeutsche Zeitung of March 12, 2010, Richard Ablin, the discoverer of PSA, called the PSA test “hardly more effective than a coin toss” and described it as a profit-driven disaster for the healthcare system. A study by the Harding Center for Risk Literacy shows how little the citizens of Europe generally know about the benefits and harms of screening.⁵ In the study carried out by Gigerenzer, a representative group of 10,228 people from nine European countries was questioned.⁶ In summary: this study showed that most citizens of nine European countries are unaware of the benefit of mammography and the PSA test, including the women and men between 50 and 69 years for whom these tests are often recommended. However, in order to make informed and rational decisions, adequate knowledge of the benefits is essential. For citizens of all countries involved in this study, neither frequent consultation of health brochures nor of their primary care physicians was associated with a better understanding of the benefits. On the contrary, the general trend was a slightly positive correlation between the overestimation of the benefits of these procedures and the frequency of consulting primary care physicians as well as of reading health brochures. This sobering finding led to the idea of “fact boxes,” developed by Lisa Schwartz and Steven Woloshin. In several studies they showed that fact boxes enable the general population to successfully inform itself about the benefits and risks of medical treatments.⁷

The example of prostate screening highlights the importance of visualization particularly well. However, numbers and data can be displayed in very different ways.

5 Harding Center for Risk Literacy: Nutzen und Risiken der Prostatakrebs-Früherkennung. Online: http://wp1146788.server-he.de/index.php/de/was-sie-wissen-sollten/facts-boxes/psa, [accessed on: August 12, 2014].
6 Gigerenzer et al.: Wie informiert ist die Bevölkerung über den Nutzen der Krebsfrüherkennung? Europaweite Studie erfasst Kenntnisstand. In: Onkologie heute 5/2009.
7 Gaissmaier/Arkes: Psychological research and the prostate-cancer screening controversy. In: Psychological Science Online First 23(6)/2012, pp. 547–553.

Fig. 18.2: Prostate screening (Source: Harding Center for Risk Literacy).

For all those who wish to try out the effects of these so-called icon arrays, an online generator that represents statistics accordingly is available at http://www.iconarray.com/reset. “Numbers Can Be Worth a Thousand Pictures: Individual Differences in Understanding Graphical and Numerical Representations of Health-Related Information” is the title of a 2012 study by Gaissmaier.⁸ The key terms here are “iconicity” (how strongly the visualization of a relationship is associated with the expected understanding) and “numeracy” (the ability to understand numerical information). In addition, “verbatim knowledge” was also examined, i. e., primary language comprehension per se. In summary, this study concluded that different patient groups interpret specific correlations differently, making “one size fits all” particularly difficult. (“The findings of this study have important practical implications, because they clearly demonstrate that not everyone can be successfully informed using the same mode of representation.”)

8 Gaissmaier et al.: Numbers Can Be Worth a Thousand Pictures: Individual Differences in Understanding Graphical and Numerical Representations of Health-Related Information. In: Health Psychology 31(3)/2012.
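The basic effect of an icon array is easy to reproduce. The following sketch (purely illustrative; it is not the implementation behind iconarray.com) prints a 10 × 10 array in which each symbol stands for one person out of 100:

```python
# Minimal text icon array: 100 persons, each '#' an affected person,
# each '.' an unaffected one. Purely illustrative numbers.
def icon_array(affected, total=100, per_row=10):
    icons = "#" * affected + "." * (total - affected)
    for i in range(0, total, per_row):
        print(" ".join(icons[i:i + per_row]))

icon_array(affected=9)  # e.g. 9 in 100 affected
```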

In Germany, the Bertelsmann Foundation provides service boxes that offer decision-making aids for a number of diseases (e.g., depression, knee surgery, cesarean section)⁹ and uses its own visual language to inform patients.¹⁰

Fig. 18.3: Fact box (Source: Bertelsmann Stiftung © Faktencheck Gesundheit/Bertelsmann Stiftung).

However, the “Faktencheck Gesundheit – Entfernung der Gaumenmandeln bei Kindern und Jugendlichen” (fact check on the removal of tonsils in children and adolescents) comprises 124 A4 pages,¹¹ and we may ask whether a decision really becomes much easier when issues such as the detailed regional analysis of operating frequency in Germany are shown graphically. It is clear that medical issues with a high level of evidence (according to evidence-based medicine, these are findings supported by meta-analyses or high-quality studies) can be represented in “fact boxes” much more easily.¹²

9 See: https://depression.faktencheck-gesundheit.de/tipps-fuer-betroffene/nutzen-und-risiken-der-therapien/, [accessed on: August 12, 2014].
10 Other countries sometimes have national organizations that deal with the issue of risk communication (for example the NICE National Institute of Clinical Excellence or NHS 24 in Scotland).
11 Bertelsmann Stiftung: Faktencheck Gesundheit – Entfernung der Gaumenmandeln bei Kindern und Jugendlichen. Online: https://mandeloperation.faktencheck-gesundheit.de/fileadmin/daten_fcm/Dokumente/FCM_Report_Web.pdf, [accessed on: August 12, 2014].

– Ia: Evidence obtained from a systematic review of randomized, controlled studies (possibly with meta-analysis)
– Ib: Evidence obtained from at least one high-quality, randomized controlled study
– IIa: Evidence obtained from at least one well-designed controlled study without randomization
– IIb: Evidence obtained from at least one well-designed quasi-experimental study
– III: Evidence obtained from well-designed non-experimental descriptive studies
– IV: Evidence obtained from reports/opinions by expert circles, consensus conferences and/or clinical experiences from recognized authorities

Source: Author’s illustration. Fig. 18.4: Evidence levels

However, when it comes to more complex medical decisions, an individual patient in his or her environment cannot be grasped with the concept of evidence level Ia: “Evidence obtained from a systematic review of randomized controlled trials (possibly with meta-analyses).”¹³ Especially for patients with chronic diseases and multimorbidity, individual considerations are necessary. Evidence-based medicine is always required, but it does in fact include a certain scope of discretion that may not always be representable in simple checkboxes. Critics may also object that these simplifications have a tendency to trivialize. However, these fact boxes are specifically designed as a support in the decision-making process, and are thus intended to support a decision shared between doctor and patient (informed consent). This relatively young branch of research – the visualization of medical data, especially for patients – cannot be presented fully and comprehensively here. However, it is clear that the representation and visualization of Big Data in medicine contribute significantly to understanding, and require a focus of their own.

12 & 13 Cf. the presentation of evidence levels at: http://www.ebm-netzwerk.de/was-ist-ebm/ images/evidenzklassen.jpg/view, [accessed on: August 12, 2014].


Literature

Bertelsmann Stiftung: Faktencheck Gesundheit – Entfernung der Gaumenmandeln bei Kindern und Jugendlichen. Online: https://mandeloperation.faktencheck-gesundheit.de/fileadmin/daten_fcm/Dokumente/FCM_Report_Web.pdf, [accessed on: August 12, 2014].
Gaissmaier, W./Arkes, H. R.: Psychological Research and the Prostate-Cancer Screening Controversy. In: Psychological Science Online First 23(6)/2012, pp. 547–553.
Gaissmaier, W. et al.: Numbers Can Be Worth a Thousand Pictures: Individual Differences in Understanding Graphical and Numerical Representations of Health-Related Information. In: Health Psychology 31(3)/2012.
Gigerenzer, G. et al.: Wie informiert ist die Bevölkerung über den Nutzen der Krebsfrüherkennung? Europaweite Studie erfasst Kenntnisstand. In: Onkologie heute 5/2009.
Spiegelhalter, D./Pearson, M./Short, I.: Visualizing Uncertainty About the Future. In: Science 333/2011, pp. 1393–1400.

Peter Langkafel

19 The Digital Patient?

Big Data in medicine – how does it feel, what does it look like in reality, what is relevant? There is a lot of information out there – often incomplete, not digital, sometimes not easy to interpret and often contradictory. And it usually contains a lot of data… An exemplary range of patient-relevant basic data is presented here – without claiming to be exhaustive! The digital patient – the (still) unknown creature?

88 % of German doctors believe that patients should not have full access to their digital patient records.¹ 70 % of German patients, on the other hand, believe they should have full access to their data, and in order to gain such access, 43 % of patients in Germany would change their doctor (1000 patients in Germany were interviewed). 48 % of hospitals in the EU exchange medical information electronically with external physicians, 70 % with external service providers.² 24 % of physicians surveyed postponed or called off the treatment of patients for cost reasons in 2013.³ 90 % of doctors are in favor of asking for a second opinion, while 3 % are not. Around 10 % of the doctors surveyed agreed with the proposition that doctors should tell their patients if they have made a mistake in treatment, while roughly 76 % did not. In 2007, 50 % of general practitioners in Europe used electronic health services; in 2013 it was 60 %.⁴ The reasons doctors gave for not using electronic services were:
– No compensation (79 %)
– Lack of IT skills (72 %)
– Lack of interoperability of the systems (73 %)
– Lack of legal framework (71 %)

Deloitte Consulting predicts 100 million e-visits for the year 2014.⁵ 75 % of the population expects to use digital services in healthcare in the future.⁶ Almost 30 % of the insured consult four or more different outpatient physicians per year.⁷ An estimated 40 million people in German-speaking countries use the Internet to research disease issues, to exchange information and to collaborate with people with similar interests.⁸ Those who use the Internet for medical purposes in Germany are on average 59 years old; most respondents were between 44 and 74 years of age.⁹ The most searched-for issues are disorders of the musculoskeletal system, with a 30 % share, followed by cardiovascular diseases (23 %) and diabetes and obesity (17 %). As expected, rare diseases as well as mental and emotional disorders are overrepresented; 3166 Internet users completed an online questionnaire for this purpose. The health portals of magazines such as Focus or Apotheken Umschau are consulted approximately as often (43 %) as the websites of health insurance companies (39 %), hospitals (38 %) and health portals not linked to any magazine (38 %). 37 % of searches concern diagnoses, 26 % overall health issues, and 14 % symptoms. Only 9 % of the Internet searches are directly therapy-related. Search engines are the starting point for health-related searches in 58 % of cases. But portals are also accessed directly: at least 27 % of the participants in the survey said that they go to portals they know rather than relying on non-targeted search engines. 79 % say that they understand their disease better thanks to Internet research; 63 % feel supported by online information in the practical everyday course of their disease. More than half say they are better able to deal with their disease mentally and emotionally thanks to the Internet. On the other side of the spectrum, 19 % say they do not feel supported by the online information, and 14 % were more confused afterwards than they were before.¹⁰

In 2013, the growth rate of investment in IT in healthcare was 3.3 %.¹¹ In the industry report, the total turnover of the health IT industry for 2012 is estimated at EUR 1.6 billion. The total remuneration of all 334,000 independent physicians in 2010 was EUR 31.4 billion. The inflation rate in Germany was 1.5 % in 2013.¹² Between 2005 and 2013, 1446 defect reports were recorded relating to risks in medical devices caused by instruction and information errors (user guides, labeling).¹³ According to the study there were 86 self-caused “software problems,” such as incorrect assignment of patients/patient data, incorrect treatment specifications (for example, radiotherapy parameters), and other evaluation or documentation errors. In an Internet-based self-management program for asthma patients with a randomized controlled distribution of 200 adults, van der Meer and Assendelft were able to show a significant improvement for the participants in the online program (better quality of life, fewer symptoms).¹⁴

The estimated number of patient safety events in German hospitals according to type of incident in 2011:
– Adverse events: 1,350,000
– Preventable adverse events: 540,000
– Treatment errors: 188,000
– Fatal cases: 18,800

Roughly 2 billion people are transported by airplane every year; about 500 people die in severe accidents in civil aviation around the world per year.¹⁵ Every year 3.2 million people die due to a lack of exercise (“physical inactivity”),¹⁶ and 150 million people per year face massive economic difficulties because they have to pay for their medical treatment themselves.¹⁷ The WHO estimates a minimum cost of USD 44.00 per person per year to ensure “basic, life-saving services.”

1 Accenture: Patient survey 2013. Online: http://www.accenture.com/de-de/company/newsroomgermany/Pages/patients-survey-germany-2013.aspx, [accessed on: August 26, 2014].
2 Grätzel von Grätz: Patient 2.0. In: E-Health-Com 4/2014, pp. 14–18.
3 Online: http://de.statista.com/; search term: Gesundheit, [accessed on: August 22, 2014].
4 SMART study: Benchmarking Development of ehealth among general practitioners, 2013.
5 Deloitte: TMT Predictions 2014. Online: http://www.deloitte.com/view/en_CA/ca/industries/tmt/tmtpredictions-2014/index.htm?lgtog=lgtog, [accessed on: August 26, 2014].
6 McKinsey & Company: Healthcare’s Digital Future 2014. Online: http://www.mckinsey.de/sites/mck_files/files/140711_digital_healthcare.pdf, [accessed on: August 26, 2014].
7 Schmidt, J.: Die Suche nach den unbekannten Daten. In: Monitor Versorgungsforschung 2/2014.
8 Schachinger: Der digitale Patient. 2014.
9 Friedrichsen/Schachinger: EPatienten Studie 2014 in Deutschland: Was machen 40 Millionen deutsche GesundheitsSurfer und Patienten im Internet? Key messages can be found at: http://epatient-research.com/, [accessed on: August 27, 2014].
10 Grätzel von Grätz: Patient 2.0. In: E-Health-Com 4/2014, pp. 14–18.
11 BVITG: Branchenbericht. IT-Lösungen im Gesundheitswesen 2014. Online: http://www.bvitg.de/marktuntersuchungen.html, [accessed on: August 27, 2014].
12 Online: http://de.statista.com/; search term: Gesundheit, [accessed on: August 22, 2014].
13 Bundesinstitut für Arzneimittel und Medizinprodukte, 2014.

14 van der Meer/Assendelft et al.: Cost-Effectiveness of Internet-Based Self-Management Compared with Usual Care in Asthma. Online: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0027108, [accessed on: August 27, 2014].
15 Wikipedia: Flugzeugabsturz. Online: http://de.wikipedia.org/wiki/Flugzeugabsturz, [accessed on: August 26, 2014].
16 & 17 World Health Summit: Yearbook 2013. Online: http://www.worldhealthsummit.org/fileadmin/downloads/2013/WHS_2013/Publications/WHS_Yearbook2013_web.pdf, [accessed on: August 26, 2014].

Literature

Accenture: Patient survey 2013. Online: http://www.accenture.com/de-de/company/newsroomgermany/Pages/patients-survey-germany-2013.aspx, [accessed on: August 26, 2014].
BVITG: Branchenbericht. IT-Lösungen im Gesundheitswesen 2014. Online: http://www.bvitg.de/marktuntersuchungen.html, [accessed on: August 27, 2014].
Deloitte: TMT Predictions 2014. Online: http://www.deloitte.com/view/en_CA/ca/industries/tmt/tmtpredictions-2014/index.htm?lgtog=lgtog, [accessed on: August 26, 2014].
Friedrichsen/Schachinger: EPatienten Studie 2014 in Deutschland: Was machen 40 Millionen deutsche GesundheitsSurfer und Patienten im Internet? Key messages can be found at: http://epatient-research.com/, [accessed on: August 27, 2014].
Grätzel von Grätz, P.: Patient 2.0. In: E-Health-Com 4/2014, pp. 14–18.
McKinsey & Company: Healthcare’s Digital Future 2014. Online: http://www.mckinsey.de/sites/mck_files/files/140711_digital_healthcare.pdf, [accessed on: August 26, 2014].
Schachinger: Der digitale Patient. 2014.
Schmidt, J.: Die Suche nach den unbekannten Daten. In: Monitor Versorgungsforschung 2/2014.
SMART study: Benchmarking Development of ehealth among General Practitioners, 2013.
Statista. Online: http://de.statista.com/; search term: Gesundheit, [accessed on: August 22, 2014].
van der Meer/Assendelft et al.: Cost-Effectiveness of Internet-Based Self-Management Compared with Usual Care in Asthma. Online: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0027108, [accessed on: August 27, 2014].
Wikipedia: Flugzeugabsturz. Online: http://de.wikipedia.org/wiki/Flugzeugabsturz, [accessed on: August 26, 2014].

Publisher and Index of Authors

Published by

Dr. med. Peter Langkafel, MBA
Born in 1968, studied human medicine and has worked in, among other things, obstetrics and general medicine. In addition to “digital projects” in research, clinical practice and teaching, he also studied medical informatics, which he completed with a Master of Business Administration (MBA). He was the founder and CEO of a healthcare IT startup. At SAP AG, he is General Manager of Public Sector and Healthcare for the Middle and Eastern Europe (MEE) region, and in that role advises national and international clients from the healthcare industry on their strategic (IT) orientation. Previously, he was responsible for the strategic development of Charité Berlin. Peter Langkafel is chairman of the Professional Association of Medical Informatics Scientists (Berufsverband Medizinischer Informatiker e. V. [BVMI e. V.]) for the Berlin/Brandenburg region and a lecturer at the School of Economics and Law in Berlin.

With contributions by

Dipl.-Inform. Med. Timo Tobias Baumann
Born in 1974, MBA, Vice President of the Portfolio Management Clinic at Deutsche Telekom Healthcare & Security Solutions GmbH (DTHS).

Thomas Brunner
Born in 1967, diploma in economics, has worked for AOK since the end of the 1990s on various projects, such as the implementation of SAP BW in the framework of SAM/oscare®. Currently working as Product Manager for Business Intelligence at AOK Systems GmbH.

Birger Dittmann
Born in 1985, IT business engineer (B.Sc.), Business Intelligence & Data Warehouse Developer at OptiMedis AG, Hamburg.


Dr. Werner Eberhardt, MBA
Born in 1963, globally responsible for business development and strategic clients and partners in healthcare as well as in the pharmaceutical and biotechnology industries at SAP AG. PhD in proteomics from the Max Planck Institute for Biochemistry in Martinsried.

Dipl.-Inform. Michael Engelhorn
Born in 1949, medical computer scientist, CIO and CTO of ExperMed GmbH Berlin, www.expermed.de. Co-founder of CARS, TELEMED Berlin and the KIS-RIS-PACS conventions in Mainz. Member of the BVMI.

Alexander Fischer
Born in 1984, health economist (M.Sc. with a focus on political science and networked supply structures in healthcare). Health Data Analyst at OptiMedis AG, Hamburg.

Dr. med. Thilo Grüning
Anesthetist and intensive care specialist, Master of Science in Health Services Management. Since 2010 Head of Quality Assurance and Intersectoral Care Concepts at the Federal Joint Committee, Berlin.

Helmut Hildebrandt
Born in 1954, pharmacist and health scientist. Member of the management board of OptiMedis AG, Managing Director of Gesundes Kinzigtal GmbH, Co-Chairman of the Health Commission of the Heinrich Böll Foundation, board member of the German Managed Care Association (BMC).

Renate Höchstetter
Physician in Dermatology and Venereology, Master of Public Health, Master of Business Administration. Since 2010 Deputy Head of Quality Assurance and Intersectoral Care Concepts of the Federal Joint Committee, Berlin.


Anika Kaczynski, M.Sc.
Master of Science in Public Health and Administration in 2013. Since 2013 research associate at the Institute for Health Economics and Medicine Management (IGM) at the University of Applied Sciences, Neubrandenburg.

Harald Kamps
Born in 1951, studied medicine in Bonn. Worked in Norway from 1982 to 2002 as a general practitioner, project manager and university lecturer. Has worked in Berlin as a general practitioner since 2002 and has headed a primary care center since 2005 (www.praxis-kamps.de).

Dr. med. Sebastian Krolop, M.Sc.
Born in 1971, Director of Accenture Strategy Health for Germany, Austria and Switzerland. Author of the European Hospital Rating Report, various hospital rating reports and the rehabilitation and nursing home rating reports, which are published every year in cooperation with the RWI. Qualified physician (PhD) with a master’s degree in healthcare management. Member of the Worldwide Board of Directors of HIMSS and of the International Health Economics Association. Lecturer at Hochschule Fresenius.

Peer Laslo
Born in 1963, Business Economist and Management Consultant at SAP Germany AG & Co. KG, Business Development Manager in the process industry with a focus on the production and product tracking of pharmaceutical products.

Dr. rer. oec. Axel Mühlbacher
Professor of Health Economics & Medicine Management at the University of Applied Sciences Neubrandenburg, Institute of Health Economics and Medicine Management. Since 2012 Senior Research Fellow at the Center for Health Policy & Inequalities Research of the Duke Global Health Institute, Duke University, USA. In 2010–2011 Harkness Fellow in Healthcare Policy and Practice at the Duke Clinical Research Institute and the Fuqua School of Business at Duke University, USA. Between 2009 and 2013 head of the “Conjoint Analysis” pilot study for the Institute for Quality and Efficiency in Health Care (IQWiG).


Prof. Dr. Albrecht von Müller
Head of the Parmenides Center for the Study of Thinking; teaches philosophy at the Ludwig-Maximilians-University of Munich (LMU). His main fields of interest are the concept of time and the phenomenon of thinking. After his doctoral thesis on “Time and Logic” at LMU in 1982, he worked for several years at the Max Planck Society and taught philosophy at the University of Munich. In the 1980s he worked on international security and arms control. Von Müller developed the EIDOS methodology for visual thinking to better support complex thought and complex decision-making processes. He is an external member of two multidisciplinary research institutions of the University of Munich: the Human Science Center and the Munich Center for Neuroscience. He is also a member of the Board of Trustees of the Max Planck Institutes of Neurobiology and Biochemistry.

Martin Peuker
Born in 1973, Deputy CIO of the Charité University Medicine Berlin, consultant to various institutions of the European healthcare industry. The industrial engineering graduate began his career at Mummert and Partner and Siemens AG and is an expert in the areas of hospital reporting and in-memory databases such as SAP HANA.

Dr. Alexander Pimperl
Born in 1983, Health Economist, Head of Controlling & Health Data Analytics at OptiMedis AG, Hamburg.

Karola Pötter-Kirchner
Master of Public Health with a degree in social pedagogy. Since 2010 lecturer in the Department of Quality Assurance and Intersectoral Care Concepts of the Federal Joint Committee, Berlin.

Dr. Rainer Röhrig
Born in 1970, physician and medical informatics scientist. Head of the Department of Medical Computer Science in Anesthesia and Intensive Care and member of the Ethics Committee of the Faculty of Medicine, Justus Liebig University. Chairman of the Technology and Methods Platform for Networked Medical Research (TMF e. V.).


Hartmut Schaper
Born in 1962, mathematician and computer scientist (M.Sc.). Over 25 years of experience in software and IT in various positions (Principal Consultant, Head of Software Development, Chief Technology Officer) at companies such as the Boston Consulting Group, IXOS AG and SAP AG. Today Senior Vice President and Head of Health Services International, Siemens AG, Healthcare Sector, Erlangen.

Dr. Josef Schepers
Born in 1959, physician and health economist. Has worked on a variety of projects in healthcare informatics and health system management. Cooperative member of the Institute for Healthcare Systems Management (HCMB) Berlin eG. Employed at TMF e. V. (Technology and Methods Platform for Networked Medical Research).

Timo Schulte
Born in 1983, degree in business management and MBA in health management. Health Data Analyst at OptiMedis AG, Hamburg.

Florian Schumacher
Born in 1980, consultant, founder of Quantified Self Germany and trend scout for Wearable Technologies AG. An engineer and trained design thinker, he works on digital sports, health and wellness products and on their economic and social innovation potential. He advises companies on the design, development and implementation of Quantified Self software and hardware. As a host of Quantified Self events, keynote speaker and author, the self-tracking pioneer promotes discussion on the collection and use of personal data; he covers the latest trends and developments in wearables and quantified self on his blog igrowdigital.com.

Henri Souchon, M.Sc.
Born in 1984, consultant at Accenture GmbH in the areas of healthcare and public administration. Focus on hospital benchmarking and co-author of the Accenture European Hospital Rating Report. Master of Science (M.Sc.) in Management from the University of Edinburgh and Bachelor of Science (B.Sc.) in Economics from the University of Münster.


Dr. Axel Wehmeier
Born in 1966, CEO of Deutsche Telekom Healthcare & Security Solutions GmbH (DTHS).

Dr. Thilo Weichert
Born in 1955, lawyer and political scientist. State Commissioner for Privacy Protection of Schleswig-Holstein and thus Head of the Independent Center for Privacy Protection, Kiel, www.datenschutzzentrum.de.

Prof. Dr. Markus Alexander Weigand
Born in 1967, Director of the Clinic for Anesthesiology, Operative Intensive Care Medicine and Pain Therapy at the Justus Liebig University/University Clinic Giessen and Marburg GmbH, located in Gießen.

Pascal Wendel
Born in 1972, qualified IT specialist for application development. Business Intelligence & Data Warehouse Developer, Data Analyst, System and Database Administrator at OptiMedis AG, Hamburg.

Martin Wetzel
Born in 1958, registered specialist in general medicine. Chairman of the Medizinisches Qualitätsnetz Ärzteinitiative Kinzigtal e. V.

Marcus Zimmermann-Rittereiser
Born in 1958, graduate engineer in technical healthcare with a Master of Business and Marketing (MBM) from the Department of Economics of the Free University of Berlin. Over 26 years of experience in international healthcare and healthcare IT at Siemens AG. Today Head of Strategy, Health Services International, Siemens AG, Healthcare Sector, Erlangen.

Glossary

23 Things You’ve Always Wanted to Know About Statistics¹

1. Natural frequency
The simplest method of evaluating the occurrence of events or characteristics is simply to count how often they occur. In contrast to probabilities and relative frequencies, natural frequencies are quasi raw data; that is, they are not normalized with respect to the base rate of the event or characteristic. For example: A doctor examines 100 people, 10 of whom have a particular disease and 90 of whom do not. Of the 10 with the disease, 8 show particular symptoms, whereas 4 of the 90 without the disease also show the symptoms. These 100 cases can now be divided into four groups: disease and symptoms: 8; disease and no symptoms: 2; no disease and symptoms: 4; no disease and no symptoms: 86. These four numbers are the natural frequencies.

2. Average
A measure of the central tendency of a number of measurements and observations. The average usually refers to the arithmetic mean, but sometimes also to the median. For example: The annual incomes of five agents are EUR 80,000, EUR 90,000, EUR 100,000, EUR 130,000 and EUR 600,000. The arithmetic mean of these amounts – i. e., their sum divided by their number – is EUR 200,000. If one arranges the individual values in increasing order (as here), the median is the value that has an equal number of values on either side of it, in this case EUR 100,000. If the distribution is asymmetrical, which is often the case with income, the arithmetic mean and the median will differ from each other. It is therefore possible that most people have an income below the average because a few have very high incomes. Here’s an amusing fact: Did you know that almost all people have more legs than the average?

3. Risk
If the uncertainty associated with an event or characteristic can be evaluated on the basis of empirical observations or causal knowledge (design), this is called risk. Risks can be expressed as frequencies and probabilities. Unlike its use in everyday language, the term risk does not necessarily have to be associated with adverse effects or consequences; it can equally refer to a positive, neutral or negative event or attribute.

1 This is partially an excerpt from the definitions of the Harding Center for Risk Literacy. Online: https://www.harding-center.mpg.de/de/gesundheitsinformationen/wichtige-begriffe, [accessed on: August 26, 2014], as well as from Wikipedia entries.
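The counting logic of natural frequencies and the difference between mean and median can be made concrete in a few lines of code. The following Python sketch is an added illustration, not part of the original definitions; it simply re-computes the numbers from the examples in items 1 and 2:

from statistics import mean, median

# Natural frequencies from the screening example in item 1:
# 100 people, sorted into the four disease/symptom groups.
natural_frequencies = {
    ("disease", "symptoms"): 8,
    ("disease", "no symptoms"): 2,
    ("no disease", "symptoms"): 4,
    ("no disease", "no symptoms"): 86,
}
assert sum(natural_frequencies.values()) == 100  # raw counts, not normalized

# Item 2: arithmetic mean vs. median of the five annual incomes (EUR).
incomes = [80_000, 90_000, 100_000, 130_000, 600_000]
print(mean(incomes))    # 200000 -- pulled upwards by the single large income
print(median(incomes))  # 100000 -- unaffected by the outlier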


4. Conditional probability
The probability that an event A occurs when event B has occurred is usually written as p(A|B). An example is the probability that a screening mammogram will be positive if breast cancer is present; it is around 0.9, i. e., 90 %. In contrast, p(A) is not a conditional probability. Conditional probabilities are often misunderstood in two different ways. The first is that the probability of A under the condition of B is confused with the probability of A and B. The other is that the probability of A under the condition of B is confused with the probability of B under the condition of A. These errors can be avoided by replacing conditional probabilities with natural frequencies.

5. Absolute risk reduction
A measure of the effectiveness of a treatment (or behavior). It refers to the proportion of people who are healed or saved by the respective treatment. If, for example, a therapy reduces the number of deaths caused by the disease in question from 6 to 4 out of 1,000 patients, then the absolute risk reduction is 2 out of 1,000, or 0.2 %.

6. Relative frequency
One of the three major interpretations of probability (in addition to degree of conviction and design). The probability of an event is defined as its relative frequency in a reference class. Historically, frequencies found their way into probability theory through mortality statistics, which in turn formed the basis of life insurance calculations. Relative frequencies are limited to repeated events that are observable in large numbers.

7. Relative risk reduction
A measure of the effectiveness of a treatment. It refers to the proportion of patients saved by the respective treatment relative to those who would otherwise have died. For example: A therapy lowers the proportion of those dying from a disease from 6 to 4 out of 1,000. This means the relative risk reduction amounts to 2 out of 6, or 33.3 %. The relative risk reduction is often reported because its numerical value is greater than that of the absolute risk reduction (in the same example, 2 out of 1,000 or 0.2 %). When only relative values are given, it remains unclear how big the risk actually is, which often leads to misinterpretations or misunderstandings. If, for example, a therapy reduces the number of deaths from 6 to 4 in 10,000 (instead of 1,000), then the relative risk reduction, at 33.3 %, is the same, although the absolute risk reduction is now only 0.02 %.

8. Reliability
A quality criterion that indicates the certainty with which a repetition of the test would deliver the same results under comparable conditions – for example, in the case of repeated measurements. High reliability is necessary for, but does not guarantee, high validity.
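The arithmetic behind absolute and relative risk reduction (items 5 and 7) is simple enough to spell out directly. This minimal Python sketch is an added illustration (the function name is invented for this example); it reproduces both scenarios from item 7:

def risk_reductions(deaths_control, deaths_treated, n):
    """Return (absolute, relative) risk reduction for a cohort of size n."""
    risk_control = deaths_control / n
    risk_treated = deaths_treated / n
    arr = risk_control - risk_treated  # absolute risk reduction
    rrr = arr / risk_control           # relative risk reduction
    return arr, rrr

# Deaths fall from 6 to 4 out of 1,000: ARR = 0.2 %, RRR = 33.3 %.
print(risk_reductions(6, 4, 1_000))
# Deaths fall from 6 to 4 out of 10,000: ARR = 0.02 %, yet the RRR is still 33.3 %.
print(risk_reductions(6, 4, 10_000))

The same relative risk reduction can therefore describe two very different absolute benefits, which is exactly why item 7 warns against reporting relative values alone.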


9. Sensitivity
The sensitivity of a test for a disease is the proportion of individuals who tested positive out of all tested persons who have the disease in question. The sensitivity is thus equal to the conditional probability p(positive | sick) of a positive test result given that the disease is present. Sensitivity and false-negative rate add up to 100 %. The sensitivity is also called the “hit rate.”

10. Specificity
(Literally: “peculiarity,” “special feature.”) The specificity of a test for a disease is the proportion of individuals who tested negative out of all tested persons who do not have the disease in question. The specificity is thus equal to the conditional probability p(negative | not sick) of a negative test result given that the disease is not present. Specificity and false-positive rate add up to 100 %.

11. Independence
Two events or attributes are independent of one another if knowledge of one of them says nothing about whether the other will occur or exist. Two events A and B are formally independent of each other if the probability p(A and B) that both will occur is equal to the product p(A) · p(B) of the probabilities of the two events. Independence plays a role, for example, when the match between the DNA profile of a suspect and a trail of blood at a crime scene has to be assessed. Assuming that only one in 1,000,000 people shows such a match, the probability that the DNA profile of a randomly selected person matches that of the trail of blood is one in 1,000,000. But if the suspect has an identical twin, then – not taking into account possible evaluation errors – the probability that the twin will show a match is close to one instead of 1:1,000,000. And if the suspect has brothers, the probability of a match for them will also be significantly higher than for the general population. This means that the probability of a DNA match is not independent of whether people are related to each other.

12. Validity
A criterion that indicates how well a test measures what it purports to measure. High reliability is necessary, but not sufficient, for high validity.

13. Number needed to treat (NNT)
The number of patients who have to be treated or screened in order to save one human life. The NNT is therefore a measure of the effectiveness of a therapy. For example: If a two-year mammography screening saves the life of one in 1,000 participating women, then the NNT is equal to 1,000. Put another way: the remaining 999 women do not benefit in terms of mortality reduction. The NNT can also be used to measure the risk of a treatment. If, for example, thromboembolism occurs in one in 7,000 women who take birth control pills, the NNT for birth control pills and thromboembolism is equal to 7,000. In other words: 6,999 women will not display this side effect.
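Sensitivity, specificity and the NNT can all be read off a 2×2 table of natural frequencies. The sketch below is an added illustration that reuses the counts from item 1 and the risk figures from item 5; the identity NNT = 1/absolute risk reduction is a standard one, though the text above does not state it explicitly:

# 2x2 table from item 1 (natural frequencies, i.e., raw counts):
#                 sick   not sick
# test positive     8        4
# test negative     2       86
tp, fp, fn, tn = 8, 4, 2, 86

sensitivity = tp / (tp + fn)  # p(positive | sick)      = 0.80
specificity = tn / (tn + fp)  # p(negative | not sick) ~= 0.956
print(f"sensitivity {sensitivity:.1%}, false-negative rate {1 - sensitivity:.1%}")
print(f"specificity {specificity:.1%}, false-positive rate {1 - specificity:.1%}")

# NNT as the reciprocal of the absolute risk reduction (item 5):
arr = 6 / 1_000 - 4 / 1_000
print(round(1 / arr))  # 500 patients must be treated to save one life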


14. Number needed to harm (NNH)
The NNH puts the NNT into relation: the number of treatments necessary to reach the desired therapy goal in one patient is compared with the number of treatments necessary to cause one patient harm.

15. False-negative rate
The proportion of negative test results among people with the disease (or feature) in question. It is usually expressed as a conditional probability and stated in percent. For mammography screening, for example, it is between 5 and 20 %, depending on the age of the tested women. That means that in 5 to 20 % of the examined women with breast cancer, the test result was negative, i. e., the cancer was overlooked. The false-negative rate and the sensitivity of a test add up to 100 %.

16. False-positive rate
The proportion of positive test results among people without the disease (or feature) in question. It is usually expressed as a conditional probability and stated in percent. For mammography screening, for example, it is between 5 and 10 %, depending on the age of the tested women. This means that 5 to 10 % of the tested women without breast cancer had a positive result, i. e., a carcinoma was suspected although none actually exists. The false-positive rate and the specificity of a test add up to 100 %. The false-positive and false-negative rates of a test depend on each other: if you decrease one, you generally increase the other.
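The closing remark of item 16 – lowering one error rate generally raises the other – can be demonstrated with a toy simulation. The score distributions and threshold values below are invented purely for illustration:

import random

random.seed(1)  # reproducible toy data

# Hypothetical test scores: sick people tend to score higher than healthy ones.
sick = [random.gauss(2.0, 1.0) for _ in range(10_000)]
healthy = [random.gauss(0.0, 1.0) for _ in range(10_000)]

for threshold in (0.5, 1.0, 1.5):
    fn_rate = sum(s < threshold for s in sick) / len(sick)         # missed cases
    fp_rate = sum(h >= threshold for h in healthy) / len(healthy)  # false alarms
    print(f"threshold {threshold}: false-negative {fn_rate:.1%}, "
          f"false-positive {fp_rate:.1%}")

# A higher threshold lowers the false-positive rate but raises the
# false-negative rate, and vice versa.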


17. Placebo effect
A placebo is a “fake drug” that contains no active medicinal substance and can thus have no pharmacological effect caused by any such substance. In a broader sense, other fake interventions, for example fake operations, are also called placebos. Placebo effects are positive changes in the subjective condition and in objectively measurable physical functions that are attributed to the symbolic importance of a treatment. They can occur with any kind of treatment, not only with fake treatments. Placebos are used in placebo-controlled clinical trials in order to determine the therapeutic efficacy of the actual treatments, called verums, as accurately as possible.

18. Nocebo effect
The nocebo effect (from the Latin nocere, to harm: “I shall harm”) is – analogously to the placebo effect (from the Latin placebo: “I will please”) – an apparent negative effect of a drug. It refers to an effect on the well-being or health of a patient caused by a substance or measure, or a rumored substance or measure, that changes the patient’s environment. In contrast to the positive placebo effect, the nocebo effect produces a negative reaction. The nocebo effect was discovered when the administration of drugs without any active agents – so-called placebos – caused negative pathogenic effects in patients.

19. Nominal scale
Distinct categories with no predetermined order (e.g., gender, location).

20. Ordinal scale
The values can be sorted, but distances between them cannot be specified (e.g., rankings, school grades).

21. Interval scale
The distance between two values can be measured; the zero point is set arbitrarily (e.g., dates, temperature in °C).

22. Ratio scale
There is a natural zero point, so both the difference and the ratio of two values can be specified (e.g., age, income). Such data provides the most information.

23. Innumeracy
The inability to deal with numbers correctly. In the context of statistics, this is the inability to correctly represent and assess uncertainties. It manifests itself as uncertainty about risks, confusing communication of risks and nebulous thinking. Like dyslexia, innumeracy is curable, as it is by no means a mere “internal” mental weakness, but is at least partially produced or provoked externally by an inadequate presentation of the respective figures. An external remedy is therefore possible.
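Whether ratios of values carry any meaning is what separates the interval scale (item 21) from the ratio scale (item 22). A small added example:

# 20 °C is not "twice as warm" as 10 °C: on an interval scale the zero
# point (0 °C) is arbitrary, so ratios carry no meaning.
c1, c2 = 10.0, 20.0
print(c2 / c1)  # 2.0 -- a number without physical meaning

# The same temperatures on the Kelvin scale, which has a natural zero:
k1, k2 = c1 + 273.15, c2 + 273.15
print(k2 / k1)  # ~1.035 -- the "twice as warm" claim evaporates

# Age has a natural zero as well, so its ratios are meaningful:
print(40 / 20)  # a 40-year-old really is twice as old as a 20-year-old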

Testimonials

Due to the increasing digitization, storage and use of healthcare data, Big Data will have a dramatic effect on the healthcare industry.
PD Dr. Günter Steyer, Honorary Chairman of the German Society for Health Telematics (Deutsche Gesellschaft für Gesundheitstelematik [DGG]) and Honorary Member of the Professional Association of Medical Informatics Scientists (Berufsverband Medizinischer Informatiker [BVMI]).

Being able to provide all the necessary information at any time during the treatment process and to interlink it accordingly will be critical for future medical care.
PD Dr. med. Lutz Fritsche, MBA, Chief Medical Officer at Paul Gerhardt Diakonie

The growth of data within the information-driven healthcare industry is enormous. The diffusion of modern IT technology is progressing continuously: hospitals are digitizing more and more paper-based processes, and patients and citizens are increasingly using healthcare apps. As a result, data volumes continue to rise dramatically, but the data is only available separately. Big Data provides the opportunity to interlink this data, to generate efficiency potential for both the primary and the secondary healthcare market, and to create innovations that are sustainably affordable.
Dr. Michael Reiher, Professor of Healthcare Management

Optimal intra- and inter-organizational collaboration must be effectively controlled in order to function. Effective controllability requires comprehensive and standardized information in “real time” – “Big Data” is one of the most important levers here, if not the most important one!
Dr. Pierre-Michael Meier, CEO of März AG and Deputy Spokesperson of the IuiG-Initiativ-Rat ENTSCHEIDERFABRIK

Providing well-structured, large amounts of data in high quality for a large variety of application scenarios in medicine is a key issue of the future.
Helmut Greger, CIO of the Charité, Humboldt University of Berlin

Big Data will open up new opportunities for start-ups in medicine.
Dr. Joachim Rautter, Managing Director of Peppermint VenturePartners GmbH

The networking and sharing of information provides great potential for the entire healthcare industry, and particularly for patients.
Professor Arno Elmer, CEO of gematik (Gesellschaft für Telematikanwendungen der Gesundheitskarte mbH)


We are merely at the beginning of the upheavals that Big Data will cause in the healthcare industry. We are currently still occupied with the question of how best to manage the provision of healthcare services. But soon the possibilities regarding the diagnosis and treatment of our patients will play a much more important role. This will show how we, as a society, are able (or willing) to handle such possibilities. Andrew McAfee, Director of the Center for Digital Business at MIT, for example, predicts that even if computers are not yet the best at making diagnoses, they soon will be.
Dr. med. Florian Schlehofer, MBA, Cluster Manager Health Economy/Life Sciences, ZAB ZukunftsAgentur Brandenburg GmbH