Smart Machines
SMART MACHINES
IBM's Watson and the Era of Cognitive Computing
JOHN E. KELLY III
STEVE HAMM
Columbia University Press
Publishers Since 1893
New York   Chichester, West Sussex
cup.columbia.edu
Copyright © 2013 IBM Corporation
All rights reserved
Library of Congress Cataloging-in-Publication Data
Kelly, John E., III (John Edward), 1954–
IBM's Watson and the era of cognitive computing / John E. Kelly III and Steve Hamm.
pages cm
Includes bibliographical references.
ISBN 978-0-231-16856-4 (cloth : alk. paper) — ISBN 978-0-231-53727-8 (e-book)
1. Expert systems (Computer science) 2. Artificial intelligence. 3. Watson (Computer) I. Hamm, Steve. II. Title.
QA76.76.E95K434 2014
006.3'3—dc23
2013026154
Columbia University Press books are printed on permanent and durable acid-free paper. This book is printed on paper with recycled content.
Printed in the United States of America
c 10 9 8 7 6 5 4 3 2 1
Cover illustration: Courtesy of International Business Machines Corporation, © International Business Machines Corporation
Cover design: Lisa Hamm
References to websites (URLs) were accurate at the time of writing. Neither the author nor Columbia University Press is responsible for URLs that may have expired or changed since the manuscript was prepared.
Contents
Preface by John E. Kelly III

1 A New Era of Computing
2 Building Learning Systems
3 Handling Big Data
4 Augmenting Our Senses
5 Designing Data-centric Computers
6 Inventing a New Physics of Computing
7 Imagining the Cognitive City

Coda: An Alliance of Human and Machine
Notes
Preface
John E. Kelly III
When I was a kid, my dad worked at General Electric's R&D lab in Niskayuna, N.Y. I would visit and watch him work with vacuum tubes, which looked like lightbulbs and directed electrical current in all sorts of devices, from radios and TVs to radar and computers. At the time, I didn't fully understand how vacuum tubes worked, but those visits inspired me to study science and, ultimately, to get two degrees in physics and my Ph.D. in materials engineering. I later came to realize that I had witnessed one of the important transitions in the history of technology. While my dad was showing me vacuum tubes, other engineers were experimenting with the vacuum tube's successor, the transistor, which ultimately ushered in modern electronics and personal computing. Now we are at the dawn of a much bigger shift in the evolution of technology—a new era affecting nearly every aspect of the field. The changes that are coming over the next two decades will transform the way we live and work just as the computing revolution has transformed the
human landscape over the past half century. At IBM, we call this the era of cognitive computing. I have had the pleasure of participating in the march of progress in computer science since I joined IBM in 1980 as an engineer. I've led organizations that contributed major breakthroughs in science and technology, including the company's microelectronics division and, presently, IBM Research, one of the most extensive corporate scientific-research organizations in the world, with 3,000 scientists and engineers at 12 labs in 10 countries. Today, society is faced with a host of opportunities and challenges that require a new generation of technologies and a rewriting of the rules of computing. The availability of huge amounts of data should help people to better understand complex situations. In reality, though, more data often lead to more confusion. We make too many decisions with irrelevant or incorrect information, or with data that represent only part of the picture. We need a new generation of tools—cognitive technologies—that help us to penetrate complexity and comprehend the world around us so that we can make better decisions and live more successfully and sustainably. Yet some of the techniques of computer science and engineering are reaching their limits. The technology industry must change the way it designs and uses computers and software if it is to continue to make progress in how people work and live. The vision of the future of information technology presented in this brief book is the result of a large group effort. Several years ago, I asked a handful of IBM scientists to look decades into the future and sketch out how
computing will change. Their work sparked discussions and debates among a wide range of IBMers, including the company's top executives, which, ultimately, evolved into a shared belief about the future of computing. We want to expose our thoughts to others and engage in a conversation, so I asked Steve Hamm, a former journalist who is now a writer at IBM, to help lay out our vision in this book. Steve has been writing and editing stories about the tech industry for nearly twenty-five years and has witnessed the beginnings of several waves of innovation. For instance, in 1994 he visited the founding engineers of Mosaic Communications (later to be renamed Netscape) shortly after they set up shop in a tiny office on Castro Street in Mountain View, Calif. That was the dawn of the Internet era. Since then he has written hundreds of stories about innovation. In addition, he has authored books about the rise of the Indian tech industry, mobile computing, and the future of work. With that kind of deep experience, Steve's a natural partner for me in this project. He interviewed dozens of IBM's scientists and engineers plus numerous outside experts to flesh out the vision and shape the narrative. The creation of this new era of computing is a monumental endeavor, and, while IBM possesses a vision of the future and a broad portfolio of expertise, no company can take on this sort of challenge alone. So we look to our clients, university researchers, government policy makers, industry partners, and entrepreneurs—indeed, the entire tech industry—to take this journey with us. Already, other companies, such as Google and Microsoft, are carving out
roles in the era of cognitive computing. The European Union, with its Human Brain Project, is coordinating cognitive scientists worldwide in an effort to map the inner workings of the brain and mimic them in computing. Among the leading universities in the field are MIT, Carnegie Mellon, and Stanford. Many other organizations are also in the game. With this book, which is aimed at a broad audience rather than just the technical community, we hope to greatly expand the search for answers by stimulating new thinking within industry, government, and academia. And, just as importantly, we hope to inspire university and high school students to pursue careers in science, technology, engineering, and mathematics. Together, we can drive the exploration and invention that will shape society, the economy, and business for the next fifty years.
Smart Machines
1 A New Era of Computing

IBM's Watson computer created a sensation when it bested two past grand champions on the TV quiz show Jeopardy! Tens of millions of people suddenly understood how "smart" a computer could be. This was no mere parlor trick; the scientists who designed Watson built upon decades of research in the fields of artificial intelligence and natural-language processing and produced a series of breakthroughs. Their ingenuity made it possible for a system to excel at a game that requires both encyclopedic knowledge and lightning-quick recall. In preparation for the match, the machine ingested millions of pages of information. On the TV show, first broadcast in February 2011, the system was able to search that vast storehouse in response to questions, size up its confidence level, and, when sufficiently confident, beat the humans to the buzzer. After more than five years of intense research and development, a core team of about twenty scientists had made a very public breakthrough. They demonstrated that a computing system—using traditional strengths
and overcoming assumed limitations—could beat expert humans in a complex question-and-answer competition using natural language. Now IBM scientists and software engineers are busy improving the Watson technology so it can take on much bigger and more useful tasks. The Jeopardy! challenge was relatively limited in scope. It was bound by the rules of the game and the fact that all the information Watson required could be expressed in words on a page. In the future, Watson will take on more open-ended problems. It will ultimately be able to interpret images, numbers, voices, and sensory information. It will participate in dialogue with human beings aimed at navigating vast quantities of information to solve extremely complicated yet common problems. The goal is to transform the way humans get things done, from health care and education to financial services and government. One of the next challenges for Watson is to help doctors diagnose diseases and assess the best treatments for individual patients. IBM is working with physicians at Cleveland Clinic and Memorial Sloan-Kettering Cancer Center in New York to train Watson for this new role. The idea is not to prove that Watson could do the work of a doctor but to make Watson a useful aid to a physician. The Jeopardy! challenge pitted man against machine; with Watson and medicine, man and machine are taking on a challenge together—and going beyond what either could do on its own. It's impossible for even the most accomplished doctors to keep up with the explosion of new knowledge in their fields. Watson can keep up to date,
though, and provide doctors with the information they need. Diseases can be freakishly complicated, and they express themselves differently in each individual. Within the human genome, there are billions of combinations of variables that can figure in the course of a disease. So it’s no wonder that an estimated 15 to 20 percent of medical diagnoses are inaccurate or incomplete.1 Doctors know a lot about diseases and the practice of medicine. What they need help with is using evidence-based medicine to better evaluate and treat individuals. Dr. Larry Norton, a world-renowned oncologist at Memorial Sloan-Kettering Cancer Center who is helping to train Watson, believes the computer will be able to synthesize encyclopedic medical and patient information to help physicians more quickly and easily identify treatment options for complex health conditions. “This is more than a machine,” Larry says. “Computer science is going to evolve rapidly and medicine will evolve with it. This is coevolution. We’ll help each other.”2
The Coming Era of Cognitive Computing

Watson's potential to help with health care is just one of the possibilities opening up for next-generation technologies. Scientists at IBM and elsewhere are pushing the boundaries of science and technology fields ranging from nanotechnology to artificial intelligence with the goal of creating machines that do much more than calculate and organize and find patterns in data—they sense,
learn, reason and interact naturally with people in powerful new ways. Watson's exploits on TV were one of the first steps into a new phase in the evolution of information technology—the era of cognitive computing. During this era, humans and machines will become more interconnected. Thomas Malone, director of the MIT Center for Collective Intelligence, says a big question for researchers as the era of cognitive computing unfolds is: How can people and computers be connected so that collectively they act more intelligently than any person, group, or computer has ever done before?3 This avenue of thought stretches back to the computing pioneer J. C. R. Licklider, who led the U.S. government project that evolved into the Internet. In 1960 he authored a paper, "Man-Computer Symbiosis," where he predicted that "in not too many years, human brains and computing machines will be coupled together very tightly and the resulting partnership will think as no human brain has ever thought and process data in a way not approached by the information-handling machines we know today."4 That time is fast approaching. The new era of computing is not just an opportunity for society; it's also a necessity. Only with the help of smart machines will we be able to deal adequately with the exploding complexity of today's world and successfully address interlocking problems like disease and poverty and stress on natural systems. Computers today are brilliant idiots. They have tremendous capacities for storing information and performing numerical calculations—far superior to those of any human. Yet when it comes to
another class of skills, the capacities for understanding, learning, adapting, and interacting, computers are woefully inferior to humans; there are many situations where computers can't do a lot to help us. Up until now, that hasn't mattered much. Over the past sixty-plus years, computers have transformed the world by automating defined tasks and processes that can be codified in software programs in a series of procedural "if A, then B" statements—expressing logic or mathematical equations. Faced with more complex tasks or changes in tasks, software programmers add to or modify the steps in the operations they want the machine to perform. This model of computing—in which every step and scenario is determined in advance by a person—can't keep up with the world's evolving social and business dynamics or deliver on its potential. The emergence of social networking, sensor networks, and huge storehouses of business, scientific, and government records creates an abundance of information tech-industry insiders call "big data." Think of it as a parallel universe to the world of people, places, things, and their interrelationships. This digital universe is growing at about 60 percent each year.5 The volume of data creates the potential for people to understand the environment around us with a depth and clarity that was simply not possible before. Governments and businesses struggle to come to grips with complex situations, such as the inner workings of a city or the behavior of global financial markets. In the cognitive era, using the new tools of decision science, we will be able to apply new kinds of computing power to huge amounts of data
and achieve deeper insight into how things really work. Armed with those insights, we can develop strategies and design systems for achieving the best outcomes—taking into account the effects of the variable and the unknowable. Think of big data as a natural resource waiting to be mined. And in order to tap this vast resource, we need computers that "think" and interact more like we do. The human brain evolved over millions of years to become a remarkable instrument of cognition. We are capable of sorting through multitudes of sensory impressions in the blink of an eye. For instance, faced with the chaotic scene of a busy intersection, we're able to instantly identify people, vehicles, buildings, streets, and sidewalks and see how they relate to one another. We can recognize and greet a friend we haven't seen for ten years even while sensing and prioritizing the need to avoid stepping in front of a moving bus. Today's computers can't do that. With the exception of robots, tomorrow's computers won't need to navigate in the world the way humans do. But to help us think better they will need the underlying humanlike characteristics—learning, adapting, interacting, and some form of understanding—that make human navigation possible. New cognitive systems will extract insights from data sources that are almost totally opaque today, such as population-wide health-care records, or from new sources of information, such as sensors monitoring pollution in delicate marine environments. Such systems will still sometimes be programmed by people using "if A, then B" logic, but programmers won't have to anticipate every procedure and every rule. Instead,
computers will be equipped with interpretive capabilities that will let them learn from the data and adapt over time as they gain new knowledge or as the demands on them change. The goal isn't to replicate human brains, though. This isn't about replacing human thinking with machine thinking. Rather, in the era of cognitive systems, humans and machines will collaborate to produce better results, each bringing their own superior skills to the partnership. The machines will be more rational and analytic—and, of course, possess encyclopedic memories and tremendous computational abilities. People will provide expertise, judgment, intuition, empathy, a moral compass, and human creativity. To understand what's different about this new era, it helps to compare it to the two previous eras in the evolution of information technology. The tabulating era began in the nineteenth century and continued into the 1940s. Mechanical tabulating machines automated the process of recording numbers and making calculations. They were essentially elaborate mechanical abacuses. People used them to organize data and make calculations that were helpful in everything from conducting a national population census to tracking the performance of a company's sales force. The programmable computing era—today's technologies—emerged in the 1940s. Programmable machines are still based on a design laid out by the Hungarian American mathematician John von Neumann. Electronic devices governed by software programs perform calculations, execute logical sequences of steps, and
store information using millions of zeros and ones. Scientists built the first such computers for use in decrypting encoded messages in wartime. Successive generations of computing technology have enabled everything from space exploration to global manufacturing-supply chains to the Internet. Tomorrow's cognitive systems will be fundamentally different from the machines that preceded them. While traditional computers must be programmed by humans to perform specific tasks, cognitive systems will learn from their interactions with data and humans and be able to, in a sense, program themselves to perform new tasks. Traditional computers are designed to calculate rapidly; cognitive systems will be designed to draw inferences from data and pursue the objectives they were given. Traditional computers have only rudimentary sensing capabilities, such as license-plate-reading systems on toll roads. Cognitive systems will augment our hearing, sight, taste, smell, and touch. In the programmable-computing era, people have to adapt to the way computers work. In the cognitive era, computers will adapt to people. They'll interact with us in ways that are natural to us. Von Neumann's architecture has persisted for such a long time because it provides a powerful means of performing many computing tasks. His scheme called for the processing of data via calculations and the application of logic in a central processing unit. Today, the CPU is a microprocessor, a stamp-sized sliver of silicon and metal that's the brains of everything from smartphones and laptops to the largest mainframe computers. Other major
components of the von Neumann design are the memory, where data are stored in the computer while waiting to be processed, and the technologies that bring data into the system or push it out. These components are connected to the central processing unit via a "bus"—essentially a highway for data. Most of the software programs written for today's computers are based on this architecture. But the design has a flaw that makes it inefficient: the von Neumann bottleneck. Each element of the process requires multiple steps where data and instructions are moved back and forth between memory and the CPU. That requires a tremendous amount of data movement and processing. It also means that discrete processing tasks have to be completed linearly, one at a time. While we have introduced some parallelism, it's not enough. For decades, computer scientists have been able to rapidly increase the capabilities of CPUs by making them smaller and faster. But we're reaching the limits of our ability to make those gains at a time when we need even more computing power to deal with complexity and big data. And that's putting unbearable demands on today's computing technologies—mainly because today's computers require so much energy to perform their work. What's needed is a new architecture for computing, one that takes more inspiration from the human brain. Data processing should be distributed throughout the computing system rather than concentrated in a CPU. The processing and the memory should be closely integrated to reduce the shuttling of data and instructions back and forth. And discrete processing tasks should be executed
simultaneously rather than serially. A cognitive computer employing these systems will respond to inquiries more quickly than today's computers; less data movement will be required and less energy will be used. Today's von Neumann–style computing won't go away when cognitive systems come online. New chip and computing technologies will extend its life far into the future. In many cases, the cognitive architecture and the von Neumann architecture will be employed side by side in hybrid systems. Traditional computing will become ever more capable while cognitive technologies will do things that were not possible before. Already, cloud, social networking, mobile, and new ways to interact with computing from tablets to glasses are fueling the desire for cognitive systems that will, for example, both harvest insights from social networks and enhance our experiences within them. Should we fear the cognitive machines? MIT professors Erik Brynjolfsson and Andrew McAfee warn in their book, Race Against the Machine, that one of the side effects of this generation of advances in computing is that they are coming at the expense of existing jobs. We believe, though, that the most important effect of these technologies will be in assisting people to do what they are unable to do today, vastly expanding the problems we can solve and creating new spheres of innovation for every industry. And like previous eras of computing, this will take a tremendous amount of innovation over decades. "These new capabilities will affect everything. It will be like the discovery of DNA," predicts Ralph Gomory, a pioneer of
applied mathematics who was director of IBM Research in the 1970s and 1980s and later head of the Alfred P. Sloan Foundation.6
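To make the von Neumann design described above concrete, here is a minimal sketch of a toy stored-program machine in Python. The instruction set, addresses, and values are invented for illustration, not drawn from any real processor; the point is simply that every instruction and every operand must cross the same path between memory and the processor, one step at a time, which is the bottleneck discussed above.

```python
# A toy von Neumann machine: instructions and data share one memory and
# travel over the same "bus". Every step below is a bus transfer.

memory = {
    0: ("LOAD", 10),   # fetch the operand at address 10 into the accumulator
    1: ("ADD", 11),    # add the operand at address 11
    2: ("STORE", 12),  # write the result back to address 12
    3: ("HALT", None),
    10: 40, 11: 2, 12: 0,
}

def run(memory):
    pc, acc, bus_transfers = 0, 0, 0
    while True:
        op, addr = memory[pc]              # instruction fetch: one bus transfer
        bus_transfers += 1
        if op == "LOAD":
            acc = memory[addr]; bus_transfers += 1
        elif op == "ADD":
            acc += memory[addr]; bus_transfers += 1
        elif op == "STORE":
            memory[addr] = acc; bus_transfers += 1
        elif op == "HALT":
            return acc, bus_transfers
        pc += 1                            # strictly one instruction at a time

result, transfers = run(memory)
print(result, transfers)   # 42, computed with 7 trips over the shared bus
```

Counting those trips makes it easier to see why integrating processing with memory, and running tasks in parallel, promises to save both time and energy.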
How Cognitive Systems Will Help Us Be Smarter

As smart as human beings are, there are many things that we can't do or simply can't process in time to affect the outcome of a situation. Cognitive systems in many cases help us overcome our limitations.

Complexity
We have difficulty rapidly processing large amounts of information. We also have problems understanding the interactions among elements of large systems, such as the interplay of chemical compounds in the human body or the dynamics of financial markets. With cognitive computing, we will be able to harvest insights from huge quantities of data to handle complex situations, make more accurate predictions about the future, and better anticipate the unintended consequences of actions. City mayors, for instance, already can begin to make sense of the interrelationships among urban subsystems—everything from electrical grids to weather to subways to demographic trends to issues reported or expressed by citizens. One example is monitoring social media during a major storm to spot patterns of words and images that
indicate critical problems in particular neighborhoods. Much of this information will come from sensors—video cameras, instruments that detect motion, and devices that spot anomalies. Mobile phones will also be used as anonymized sensors that help city planners understand the movements of people and accurately predict the effects and financial impact of various actions.

Expertise
With the help of cognitive systems, we will be able to see the big picture and make better decisions. This is especially important when experience in an area is limited or we’re trying to address problems that cut across professional or practical domains. For instance, police are beginning to gather crime statistics and combine them with information about demographics, events, building blueprints, and weather to produce better analysis and safer cities. Armed with abundant data, police chiefs can set strategies and deploy resources more effectively—even predicting where and when crimes are likely to happen. Patrol officers benefit by gaining a wealth of information about unfamiliar locations; situational intelligence will be extremely useful when they knock on someone’s door. The ability to achieve such comprehensive understanding of situations at every level will be an essential tool for managing a city and will become one of the most important factors in the economic growth and competitiveness of cities.
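As a rough illustration of the kind of data combination described above, the sketch below pulls together a few data sources keyed by neighborhood and turns them into a simple patrol-priority ranking. The neighborhood names, the numbers, and the weights are all invented for the example, not drawn from any real police system.

```python
# Toy data sources, each keyed by neighborhood.
crime_reports = {"Riverside": 14, "Hilltop": 3}          # incidents last month
events = {"Riverside": ["street fair"], "Hilltop": []}   # scheduled gatherings
forecast = {"Riverside": "heat wave", "Hilltop": "rain"}

def patrol_priority(neighborhood):
    score = crime_reports.get(neighborhood, 0)
    score += 5 * len(events.get(neighborhood, []))        # crowds raise the risk
    if forecast.get(neighborhood) == "heat wave":         # assumed correlation, for the toy only
        score += 2
    return score

ranked = sorted(crime_reports, key=patrol_priority, reverse=True)
print(ranked)   # ['Riverside', 'Hilltop'] -> more patrol time for Riverside
```

A real system would learn such weights from historical data rather than hard-code them, but the shape of the task, joining many sources around one question, is the same.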
Objectivity
We all possess biases based on personal experience, professional background, and intuition about what works and what doesn't, as well as the influence of group dynamics. Cognitive systems can make it possible for us to be more objective in our decision making. Corporations will become socially networked businesses made up of people and systems in collaboration. Sophisticated analytic engines will understand how an organization works, its competitive environment, capabilities, resources, and ecosystem of partners. Professionals will experience data in new ways and will be prompted to consider a number of hypotheses based on inferences and scenarios constructed by a system using relevant contextual data. The system will make hard-to-spot connections and help guide individuals in achieving business goals.

Imagination
Professional background, training, and experience make it difficult to envision dramatically different settings and ranges of choice. Cognitive systems will help us discover and explore new and contrarian ideas. Today, it takes ten to fifteen years and up to $1 billion to bring a single new drug to market. The research-and-development teams of pharmaceutical companies are beginning to use cognitive systems to explore existing biological information in new ways and to discover new
treatments utilizing compounds that have already been proven safe in other applications. Furthermore, cognitive systems can spot hidden patterns in the text of thousands of published articles to help identify new opportunities for drug treatments. Using computer modeling and simulation, researchers can run experiments to validate these findings "in silico" much faster than can be done in a life sciences "wet" lab. With the aid of cognitive machines, researchers and engineers will be able to explore millions of combinations in ways that could shave years and hundreds of millions of dollars from the development process.

Senses
We can only take in and make sense of so much raw, physical information. With cognitive systems, computer sensors combined with analytics software will vastly extend our ability to gather and process such information. Imagine a world where individuals carry their own personal cognitive system in a handheld device. These personal cognitive assistants would carry on conversations with us and acquire knowledge about us, in part, from observing what we see, say, touch, and type so they can better anticipate our wishes. In addition, the assistant might be able to use sophisticated sensing to monitor threats to a person's well-being. If there's carbon monoxide in a room, for example, the device might be able to alert its user. Over time, humans have evolved to be more successful as a species. We continually adapt to overcome our
limitations. This partnership with computers is simply the latest step in a long process of adaptation. Just as the personal computer, the Internet, mobile communications, and social networking have given rise to tens of thousands of software applications, Web services, and smartphone apps, the cognitive era will produce a similar explosion of creativity. Think of the coming technologies as cognitive apps. At the enterprise level, corporations might use these apps for handling mergers and acquisitions, crisis management, competitive analysis, and product design. The technologies already exist to assist a team within a company that's in charge of sizing up acquisition candidates using a cognitive M&A app. The team can augment its knowledge of potential targets, many of which are private companies, by gaining a deeper understanding of data, including transactions and partnerships in the public domain, intellectual property filings, and insights from Web sharing and other social channels. The cognitive software maps the links and the nature of the interactions. The M&A team will also track the performance of previously acquired companies. Those insights, constantly updated and integrated in a learning system, will help the team identify risks and synergies and become smarter with each acquisition. Moreover, such apps delivered from the cloud will help individuals tackle highly complex though common decisions, for example, selecting a college, making investments, choosing among insurance options, and purchasing a car or home. Say you're starting a family and you and your spouse want to move out of your tiny
apartment into a house in a neighborhood or town with good public schools. The cognitive house-hunting app will start by understanding your needs and sizing up places to live, school systems, commuting factors, and costs of living. The app will even tap into social networks to discover hidden truths, such as the fact that the property a house was built on was once part of landfill. You will be able to conduct an ongoing conversation with the app as you learn more about what’s available and the app learns more about you and the things you care about. Then, when you begin shopping for houses and mortgages, the app will guide you through the thicket of information and considerations that will inform your decisions. And, finally, when the two of you are standing in the foyer of your dream house wondering if you really can afford it, and you’re under pressure to make a bid immediately, you can consult your trusted app for advice on one of the most critical decisions you will ever make.
Technology Breakthroughs: Opportunities and Necessities

Much of the progress in science and technology comes in small increments. Scientists and engineers build on top of the innovations that came before. Consider the tablet computer. The first such devices appeared on the scene back in the 1980s. They had large, touch-sensitive screens but weighed nearly five pounds and were an inch and a half thick. They were more like bricks than books, and about all
you could do with them was scrawl brief memos and fill out forms. After thirty years of gradual improvements, we have slim, light, powerful tablets that combine the features of a telephone, a personal computer, a television, and more. There's nothing wrong with incremental innovation. It's absolutely necessary, and, sometimes, its results are both delightful and transformational. A prime example is the iPhone. With its superior navigation and abundance of easy-to-use applications, this breakthrough product spawned a burst of smartphone innovation, which combined with the social-networking phenomenon to produce a revolutionary shift in global human behavior. Yet, technologically, the iPhone was built on top of many smartphone advances that preceded it. New waves of progress, however, require major, disruptive innovations—things like the transistor, the microchip, and the first programmable computers. These are the advances that fundamentally change our world. Today, many of the core technologies that provide the basic functions for traditional computers are mature; they have been in use for decades. In some cases, each wave of improvements is less profound than the wave that preceded it. We're reaching the point of diminishing returns. Yet the demands on computing technology are growing exponentially. Soon incremental innovation will no longer be sufficient. People who demand the most from computers are already running into the limits of today's circuitry. Michel McCoy, director of the Simulation and Computing Program at the U.S. Lawrence Livermore National Laboratory,
is among those calling for a nationwide initiative involving national laboratories and businesses to come up with radical new approaches to microprocessor and computer-system design and software programming. "In a sense, everything we've done up until this point has been easy," he says. "Now we have reached a physics-dominated threshold in the design of microprocessors and computing systems which, unless we do something about it, is essentially going to stagnate progress."7 We need more radical innovations. In the years ahead, a number of fundamental advances in science and technology will be required to make progress. In order to appreciate the scope and breadth of these innovations, we can conceptualize a cognitive system as one of those colorful Russian wooden dolls where each of the smaller dolls nests inside a slightly larger one. The top layer is the way we interact with computers and get them to do what we want. The big innovation at this outer layer is "learning systems," which we will explore more deeply in chapter 2. The goal is to create machines that do not require as much programming by humans. Instead they'll be "taught" by people, who will set objectives for them. As they learn, the machines will work out the details of how to meet those goals. The next layer represents how we organize and interpret data, which we'll discuss in chapter 3. Today's databases do an excellent job of organizing information in columns and rows. Tomorrow's are being designed to manage huge volumes of different kinds of data, place information in context, and crunch data in real time.
The next layer represents the architecture or design of systems—how we fit together all the physical components that make up a computer. The challenge here, which we address in chapter 5, is creating data-centric computers. The designers of computing systems have long treated logic and memory as separate elements. Now, they will meld the components together, first, on circuit boards and, later, on single microchips. Also, they'll move the processing to the data, rather than vice versa. Finally, in the innermost layer is nanotechnology employed in building core components of a cognitive system, where we manipulate matter at the molecular and atomic scale. In chapter 6, we'll explore what it will take to invent a new physics of computing. To overcome the limits of today's microchip technology, scientists must shift to new nanomaterials and new approaches to switching from one digital state to another. Possibilities include harnessing quantum mechanics or chips driven by "synapses and neurons" for data processing.
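The phrase "move the processing to the data" can be illustrated with a small sketch. In the toy example below, the partition names and records are invented; the contrast is between hauling every record to one central processor and shipping a small piece of code to wherever the data live, returning only compact partial results.

```python
# Toy "shards" of data spread across three locations.
partitions = {
    "rack-1": [3, 5, 8],
    "rack-2": [2, 7],
    "rack-3": [10, 1, 4],
}

def centralized_sum(partitions):
    # Classic approach: haul every record across the bus or network to one CPU.
    all_records = [x for shard in partitions.values() for x in shard]  # heavy data movement
    return sum(all_records)

def data_centric_sum(partitions):
    # Data-centric approach: compute "near" the data in each shard and move
    # back only the small partial results, then combine them.
    partial = {name: sum(shard) for name, shard in partitions.items()}
    return sum(partial.values())

assert centralized_sum(partitions) == data_centric_sum(partitions) == 40
```

The answer is the same either way; what changes is how much data has to travel, which is exactly the cost that data-centric designs aim to cut.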
A New Culture of Innovation

We're still in the early stages of the emergence of this new era of computing. Progress will require a willingness to make big bets, take a long-term view, and engage in open collaboration. We'll explore the elements of the culture of innovation in each of the subsequent chapters in what we call the journeys of discovery. An absolutely critical aspect of the culture of innovation will be the ambition and
capabilities of the inventors themselves. For rapid progress to be made in the new era of computing, young people must be inspired to become scientists, and they must be educated by teachers using superior tools and techniques. They have to be rewarded and given opportunities to challenge everything we think we know about how the world works. It requires dedication and investment by all of society's institutions, including families, local communities, governments, universities, and businesses. When we ask scientists at IBM Research what motivates them, the answer is often that they want to change the world—not in minuscule increments but in great leaps forward. Mark Ritter, a senior manager in IBM Research's Physical Sciences Department, leads an effort, inspired by the human brain, to rethink the entire architecture of computing for the era of cognitive systems. As a child, Mark, whose father was a plumber, had an intense curiosity about how things work on a fundamental level. It was his good fortune that his grandparents, who lived near his family in Grinnell, Iowa, had two neighbors who were physics professors at Grinnell College. One of the physicists, whom Mark pestered with science questions while the neighbor repaired his VW in the driveway, lent Mark a book on particle physics when he was about twelve years old. As a teenager, Mark bicycled over to the campus to attend physics lectures. He built a simple gas laser in the basement of his home. It was the beginning of decades of inquiry into how things work and how they can work better. A few years ago, after more than twenty years in IBM Research, Mark and his colleagues recognized that
the computing model designed for mid-twentieth-century demands was running out of gas. "This is the most exciting time in my career," Mark says. "The old ways of doing things aren't going to solve efficiently the big, real-world problems we face." For his part, Dr. Larry Norton of Memorial Sloan-Kettering Cancer Center is driven to transform the way medicine is practiced. Ever since he can remember, he was motivated by the desire to do something with his life that would improve the world. Born in 1947, he grew up at a time when people saw science as a powerful means of solving humanity's problems. He recalls a thought-crystallizing experience when he was an undergraduate at the University of Rochester. He lived in a dorm where students often gathered in the mornings for freewheeling discussions of politics, values, and ethics. He was already contemplating a career in medicine, and the topic that day was, if you were a doctor and had done everything medical science could offer to save a patient but she died anyway, how would you feel? The students were split. "I realized I would feel terrible about it," he says. "Offering everything available isn't enough. I should have done better. And since, because of limitations in the world's knowledge, I couldn't do better, I should be involved in moving things forward." During his forty-year career, Larry has been an innovator in cancer treatment. Among his contributions is the central role he played in developing the Norton-Simon hypothesis, a once-revolutionary but now widely used approach to chemotherapy. In Larry's years as a clinician,
he has saved the lives of many patients, but, of course, some of his patients have died. Those deaths haunt him. He believes that he owes it to those people and to their children to improve the treatment of cancer and, ultimately, to help eradicate the disease. He sees his work with Watson as another way to contribute. Through a machine, he can share his knowledge and expertise with other physicians. He can help save cancer victims from a distance—people he has never met. You will meet a host of innovators in this book. There’s Dharmendra Modha, the IBM Research manager who is leading a team of IBM and university scientists in a quest to mimic the human brain. And there’s Murali Ramanathan, a professor of pharmaceutical sciences and neurology at State University of New York, Buffalo, who is studying the role of genetic and environmental factors in multiple sclerosis. These innovators are remarkable people. We depend on them to produce the surprising advances that knock the world off kilter and, ultimately, have the potential to make it a better place. We will need many of them to make the transition to the era of cognitive systems. In the end, this era is not about machines but about the people who design and use them.
2 Building Learning Systems
In November 2009, when IBM's Watson was under development for its showdown on Jeopardy!, the machine made one laughable mistake after another in test matches. In one particularly funny instance, it was prompted to identify what the "Al" in the company name Alcoa stands for. It fired back, "What is Al Capone?" Everybody in the room cracked up. The machine had confused the first two letters of aluminum with the name of a famous gangster.1 No harm was done. One of the main purposes of these training sessions was for Watson to make mistakes so that the scientists on the technology development team could tweak the algorithms used by the computer and prevent the same mistakes from happening again. It was a painstaking, manual process. But there was no way that the scientists could anticipate every dilemma that Watson might be confronted with and write a rule to deal with it, so, as they developed Watson, they realized they would need to invent a system that would enable Watson to learn on its own.
Their invention represented a major advance in the science of machine learning, a branch of artificial intelligence that focuses on building systems that learn from data. The field was first defined in 1959 by IBM scientist Arthur Samuel and has found plenty of uses over the years—including common applications such as optical character recognition and e-mail spam filters. Such systems are trained to recognize repeated patterns in words or shapes and to react in a certain way when they encounter them again. Watson takes machine learning to a new level. In creating the technology for Watson, called DeepQA, which includes the learning capability, the developers provided the machine with a large corpus of unstructured information and the algorithms to extract knowledge from it. They trained the machine to identify categories of information, such as famous people, locations, and relationships, and also to parse language. Then they created a set of statistical methods for learning how words are used in different contexts. This combination of techniques allows Watson to learn from the data rather than just follow detailed instructions. In a sense, Watson learns the way people learn, by experiencing a lot of things and drawing inferences and lessons from those encounters. Watson's learning abilities represent a major step forward in the new era of computing. Today, tech companies, corporations, and governments employ armies of software programmers to create specific applications to accomplish specific goals. Those programmers write millions of lines of detailed instructions telling computers what to do,
following the linear and deterministic approach charted out by von Neumann at the dawn of the programmable computing era. But this approach simply isn't sufficient anymore. We need to design and build a new class of computers that use Watson-style learning systems in order to help us deal with the complexities of the modern world. These systems will be able to learn from both structured and unstructured data, discover important correlations, create hypotheses for those correlations, and suggest actions that produce better outcomes. In addition, they'll have feedback loops where they measure the results of their work and modify the way they do things based on that feedback. Once people design and program them and train them, they'll do the rest—essentially reprogramming themselves as they interact and learn. We're in the early stages of the evolution of learning systems, but it's possible to anticipate how things will develop—if not the exact timing and details of the technology. The first stage is static learning. The version of the Watson system that played on Jeopardy! is in this category, according to IBM Research executive Dario Gil, who headed a team that mapped out the progression. At this point, it's still necessary to assemble teams of technology and domain experts to provide a learning system with the rules and information it needs to perform a certain well-defined task—then train it to use the information. One example is the customer-service call center. Today, when consumers call companies for help with technology problems, representatives read from scripts and search rigid
knowledge-management systems for answers. The process is too often frustrating for the consumer and expensive for the company. Now, companies are training Watson to magnify the capabilities of their representatives. The systems will ingest vast amounts of information that's useful in solving a specific set of problems, say, glitches involving smartphones. The cognitive system will learn from interactions, steadily improving the advice the representatives give to consumers. In addition, there will be a self-service version of Watson that helps consumers with their problems and concerns directly. In a second stage, dynamic learning, machines will continually mine information from a variety of sources (e.g., textual, numerical, visual, sensory). Also, scientists will create a variety of modes that the machines can use to communicate, including spoken conversations. Dynamic learning systems will expand their hypothesis generation across multiple domains that frequently intersect with one another, such as economics, business, and social sciences. How "smart" will learning systems get? It's possible that they'll be able to achieve a deep comprehension of natural language, images, and other sensory input. Already, programs exist that make it possible for machines to engage in scientific discovery, and, at the far end of the arc of time and imagination, they'll likely be able to take on complex problems that they identify themselves.2 They'll read a host of scientific, social science, and technical publications; "watch" TV shows and read news reports; and perhaps even monitor environmental shifts like climate change. Years from now, when research scientists stroll
into their offices they will greet systems that have access to vast knowledge resources and can use them to come up with new hypotheses.
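A rough sketch of the statistical flavor of this kind of learning appears below. It is not IBM's DeepQA code; the candidate answers, evidence features, weights, and threshold are all invented. It illustrates the idea described above: many independent pieces of evidence are combined into a single confidence score, and the system answers only when that confidence is high enough.

```python
import math

def confidence(features, weights, bias=-1.0):
    """Combine several evidence scores into one probability-like confidence."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))   # logistic squashing into (0, 1)

# Hypothetical weights; a real system learns these from training questions.
weights = {"text_match": 2.0, "type_match": 1.5, "popularity": 0.5}

# Two toy candidates for the Alcoa clue mentioned at the start of this chapter.
candidates = {
    "aluminum":  {"text_match": 0.9, "type_match": 1.0, "popularity": 0.4},
    "Al Capone": {"text_match": 0.6, "type_match": 0.0, "popularity": 0.9},
}

scored = {answer: confidence(f, weights) for answer, f in candidates.items()}
best, best_conf = max(scored.items(), key=lambda kv: kv[1])

THRESHOLD = 0.7   # only "buzz in" when sufficiently confident
if best_conf >= THRESHOLD:
    print(f"Answer: {best} (confidence {best_conf:.2f})")
else:
    print("Not confident enough to answer")
```

In this toy run the evidence favors "aluminum" over "Al Capone"; training adjusts the weights so that, over many questions, the confidence score becomes a trustworthy guide to when the system should speak up.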
Journey of Discovery: The Path to Watson

The Watson project got its start in a surprising way. In the fall of 2004, IBM's head of computing systems software, Charles Lickel, traveled from his home in Tucson to spend the day with a small team he managed at an IBM facility in Poughkeepsie, New York. At the end of the workday, the team gathered at the nearby Sapore Steakhouse for dinner. They were bemused when, at seven p.m. sharp, many of the diners abruptly got up from their tables, rushed into the bar, and clustered excitedly around the TVs. One of Charles's guys explained that they were watching long-time champion Ken Jennings defend his title on Jeopardy!3 Charles hadn't followed Jeopardy! for years, but the scene made an impression on him. A few months later, research director Paul Horn asked his lieutenants to think up a high-profile project that the lab could take on that would demonstrate IBM's scientific and technological prowess. The company calls these its "grand challenges." The previous grand challenge had been a huge success: IBM's Deep Blue computer had beaten the world's top chess grand master in a highly publicized match in the mid-1990s. But a lot of time had passed since that victory.
During one of the brainstorming sessions aimed at picking the company's next grand challenge, Charles suggested building a computer that could compete on Jeopardy! IBM has long used man-versus-machine games to motivate scientists, focus research, and engage the public. In the early 1960s, IBM researcher Arthur Samuel, the AI pioneer, created one of the first computer programs capable of learning when he wrote a checkers-playing program designed to run on the 701, IBM's first commercial computer. Samuel challenged one of the top U.S. checkers champions to a match—and won. IBM researcher Gerry Tesauro in the late 1980s developed a program called TD-Gammon, which used a technique called temporal difference learning to teach itself how to play backgammon. It was competitive in matches with some of the world's top backgammon players.4 Later, IBM turned the contests into true spectator sports. The 1997 match between Deep Blue and the reigning world chess champion, Garry Kasparov, was streamed live on a website, and millions of chess enthusiasts watched every move. The Deep Blue program didn't require much learning since chess is such a highly structured game, but, according to Murray Campbell, one of the researchers on the Deep Blue team, the match demonstrated that a combination of clever engineering and sophisticated algorithms could rival the performance of a top human expert in a specific domain of human achievement. "It made everybody understand in a clear way that problems previously considered too hard for a computer could now be tackled," Murray says.5
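Temporal difference learning, the technique behind TD-Gammon, can be shown in a few lines. The sketch below uses an invented three-state "game" rather than backgammon; it simply nudges each position's estimated value toward the reward plus the estimated value of the position that follows, which is the core of the method.

```python
ALPHA, GAMMA = 0.1, 0.9                        # learning rate and discount factor
V = {"early": 0.0, "late": 0.0, "win": 0.0}    # estimated value of each position

# One toy "game": early position -> late position -> win (reward 1 at the end).
episode = [("early", "late", 0.0), ("late", "win", 1.0)]

for state, next_state, reward in episode:
    td_error = reward + GAMMA * V[next_state] - V[state]
    V[state] += ALPHA * td_error               # nudge the estimate toward what actually happened

print(V)   # after many self-played games these estimates approach the true winning chances
```

Playing thousands of games against itself and repeating this update is, in essence, how such a program teaches itself which positions are worth reaching.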
Of course, while IBM has played an important role in the evolution of artificial intelligence and learning systems, many other scientists and organizations have made major contributions. John McCarthy, a professor at Stanford University, coined the term "artificial intelligence" in 1955, and Marvin Minsky, a professor at MIT, has produced a long series of advances in the field since the 1950s. He's now focusing on giving machines the ability to perform humanlike commonsense reasoning.6 Today, Andrew Ng of Stanford is leading a team of scientists in an attempt to create algorithms that can learn based on principles that the brain might also employ. The field of artificial intelligence has advanced in starts and stops. Periods of soaring optimism have been followed by so-called AI winters, when seemingly promising avenues of research failed to produce the anticipated results. Put simply, this is hard stuff. So it's no surprise that when Charles Lickel proposed Jeopardy! as the next grand challenge, his suggestion was met initially with reactions ranging from skepticism to outright derision. But he quickly won Paul Horn's support. Paul thought the project could be very exciting—both to computer scientists and the public at large.7 In mid-2006 Charles gave the go-ahead to researcher David Ferrucci, who was an enthusiastic evangelist for the project, to explore whether building a machine that could win on Jeopardy! was even plausible. Progress was extremely slow at first. For several years, Dave had been leading a group of researchers who had produced good results with question-answering technologies. However, their performance gains had plateaued. So
Dave and a small team cast around for the right technology and strategy with which to approach the Jeopardy! challenge. By mid-2007, though, Dave was convinced the job could be done, and Charles and Paul gave the project their blessing.8 However, there was a danger that the project would never actually be realized, as Paul retired from IBM soon after and became senior vice provost for research at New York University. His replacement as head of IBM Research was one of the authors of this book, John Kelly. Dave was terrified that John would kill the project, and his fears were not assuaged when, at their first meeting, John expressed deep reservations. At the time, the system, then called Blue J, was only able to answer 30 percent of previous Jeopardy! questions accurately. John told them: "Guys, we can't put the IBM brand on TV with 30 percent accuracy." Some of the team members were discouraged. But Dave told them that he was convinced they could achieve their goals. He said he was so committed to the project that he was willing to risk failure and public humiliation. And he asked each of the twelve scientists who were then on the team to fully commit themselves, as well. "Are you ready for this?" he asked.9 They were. Over the following weeks and months they gradually improved the technology to the point where, even though the gap was still huge, John was convinced they could build a machine that could compete at the highest levels on Jeopardy! He saw that the effort could boost IBM's reputation as an innovator and that the science would be transformative, opening up massive new
opportunities for computing. He said: “I’ll give you whatever you need in the way of resources, but, if we’re going to put the IBM brand on national TV, we must win. We must win.” That launched the intense effort that led ultimately to Watson’s victory. On January 14, 2011, the Jeopardy! contest was conducted on stage in the IBM Research auditorium. Top IBM executives, research scientists, and guests packed the room. Emotions ran high. John told the audience that he did not know if Watson would win, but he believed that the contest represented an important moment in the history of computing. He said, “It’s not a matter of if a computer will one day win at Jeopardy!, but when.” John was extremely nervous as the games began. He prayed that the system wouldn’t crash and that Watson wouldn’t make any embarrassing mistakes. During the final game, when it became clear to him that Watson would emerge victorious, he glanced across the room and caught Dave Ferrucci’s eye. Both smiled. They knew it was done. They had bet big and won.
The Next Steps for Watson

Watson was not designed just for playing Jeopardy! From the start, the goal was to create a technology platform that could be adapted to a wide variety of uses—a practical tool with the potential to transform business and society. Rather than trying to create a monolithic set of rules for analyzing data, the development team used many simpler
algorithms that could be added to or mixed and matched depending on the task before Watson. They created an analytics program for weighing and evaluating evidence and conclusions, no matter what domain. And they made it possible for experts in particular domains to contribute knowledge to the program. The result: "We created an architecture of discovery," says Eric Brown, who heads IBM Research's Watson team.10 Today, IBM is adapting Watson to a number of industries. A team of software developers, with considerable help from experts in various domains, is crafting Watson solutions for health care and other industries. They're also creating applications for cross-industry uses, such as customer-service management. WellPoint, one of the nation's largest health-benefits providers, worked with IBM to produce a capability aimed at improving the efficiency and effectiveness of health-care payer operations—focused initially on the process of treatment preapproval management. To expand Watson's capabilities, nurse clinicians trained Watson to understand the U.S. medical treatment code system and WellPoint's medical policies, clinical guidelines, and process for reviewing members' requests for treatment authorization. As of early 2013, Watson had absorbed more than 25,000 test cases and 1,500 of WellPoint's real-life cases and gained the ability to interpret the meaning of queries and analyze them. Then WellPoint and IBM co-developed two integrated applications, one for use by payers, such as WellPoint's health plan subsidiaries, and the other for use by medical-care providers. In one continuous process, physicians can
recommend treatments for specific patients and submit those recommendations to health-care payers, who can quickly approve the treatments. The system is now being tested by WellPoint's health plans and several hundred health-care providers.11 A major cancer research center in the southwest is working with IBM to use Watson technologies as a medical-research application to help its physicians choose the best treatments for patients who have unusual conditions. There's a fairly large population of cancer patients for whom reliable treatments have not yet been found. Here, researchers for the cancer center are collaborating with IBM engineers to feed into Watson vast quantities of disease, treatment, and outcome data and analyze it to find hidden patterns and correlations. Do people with a combination of genetic dispositions and health problems react better to a certain treatment? Would a drug that's used now for one cancer be useful for another? Or would a combination of drugs help out? The hope is that the system will give physicians a huge head start in identifying effective treatments, testing them in clinical trials, publishing the results, and quickly getting solutions into the hands of physicians all over the world. ANZ, a financial-services firm headquartered in Australia and with operations in thirty-three countries, is working exclusively with IBM to explore the possibilities of using the Watson technology in the financial-services industry. The goal is for ANZ to provide its wealth managers with a cognitive assistant to help them better advise the bank's 2 million wealth-management clients. Here's
a scenario the bank has worked up: Imagine that you’re talking to your advisor about retirement planning. Amid all of the uncertainty in today’s markets, you’re not sure how to coordinate your investments, retirement savings, and insurance coverage in a way that will result in a secure, well-funded retirement. The advisor, aided by Watson, will be able to quickly develop a custom-tailored plan to fit your needs.12 While this work is going on, the IBM Research team is developing new capabilities that will be useful no matter what the specific application is. The goal is to make Watson better at learning and at interacting with humans. On Jeopardy!, Watson could respond to a clue with the answer it was most confident in. In the next-generation Watson technology, people will be able to engage Watson in ongoing dialogues aimed at finding the most useful insights. After receiving an initial query, Watson will be able to ask for additional information to help it understand more precisely what the human being wants to know. The learning capability includes a series of steps: hypothesis generation from evidence, hypothesis ranking, question generation, and answer acquisition. In addition to getting answers from humans it’s interacting with, Watson will be able to tap a directory of experts for advice or even learn through crowd-sourcing. As it acquires answers, it will build a collection of learned axioms that strengthen its command of given domains. Other improvements to Watson have come. People are now able to view the logic and evidence upon which
Watson presents options. Watson is now able to digest not just textual information but also structured statistical data, such as electronic medical records. A different group at IBM is working on natural-language-processing technology that will allow people to engage in spoken conversations with Watson. At the highest level, many of the changes are aimed at moving Watson from answering specific questions to dealing with complex and incomplete problem scenarios—the way humans experience things. In fact, as people in particular professions and industries experiment with Watson, they find that the basic question-and-answer capabilities, while useful, are not the most valuable aspects of the systems. Rather, they see Watson as a discovery tool that doesn’t just make them better at what they already do; through human-machine interactions, it helps them think differently and work differently. David Nahamoo, who heads up IBM Research’s speech group, says it this way: “Traditional computing systems are systems of record. The computers of the future will be systems of engagement. They’re about dynamic knowledge, not static knowledge.”13 Language is the key that makes it possible for machines to help us better understand our world and to engage with us. But don’t think of language only in the most obvious sense—English, Chinese, Spanish, and the like. Every domain of knowledge and expertise has its own specialized language, such as chemical symbols, medical images, or the specific meaning embedded in arcane legal terms.
And that poses particular challenges for computers. As we ask computers to understand and reason more like we do, we must teach them not just the structure and vocabulary of spoken and written languages but the words and concepts that are particular to professional and business domains. When they have that kind of capability, they can begin to offer us not just answers to questions but fresh insights that might not have occurred to us on our own. Think, for example, about a challenge confronting a corporate marketing department. Time and again, as the marketing team tries out new techniques in a world that’s quickly being transformed by digital media, social networking, and mobile communications, it finds that its campaigns are not effective. But what if a cognitive assistant could help? The system could examine past campaigns in detail, looking for common patterns that might help the team spot what it’s doing wrong. At the same time, the system can crawl the world of information for case studies of successful digital-marketing campaigns and spot the patterns that point to success. In order to do this kind of work, the system will need to comprehend the language and incorporate knowledge from a variety of specialized domains, from classic marketing to search-engine optimization and from behavioral psychology to demographics and statistics. Everything that has a meaning within a profession or domain must have the same meaning to the system if it is to collaborate with people at a deep level. One of the most promising domains for this kind of human-machine interaction is the health-care field. That’s
where the IBM partnerships with Memorial Sloan-Kettering Cancer Center and Cleveland Clinic come in. Eric Brown’s team at IBM Research is working with Cleveland Clinic on two Watson projects. First, they’re embedding Watson into the curriculum of the clinic’s Lerner College of Medicine. Physicians, faculty, and medical students there are helping to adapt Watson for use as a collaborative learning tool to assist with medical information exploration and training in clinical reasoning. In addition, Cleveland Clinic is using Watson to help unlock hidden insights in the unstructured data in anonymized electronic medical records, with the goal of helping physicians make more informed decisions about patient care. Leaders at Lerner are among those who believe that medicine must be taught differently—through problem-based learning and simulation laboratories where small groups of students solve sample cases. The medical school’s students use a manual visualization technique to work their way through chains of evidence, inferences, and beliefs about what’s wrong with a patient and what should be done to help. They draw circles and arrows with pencil and paper to represent pieces of evidence and potential conclusions. The IBM Research team adapted the technique to create a user interface called WatsonPaths, an interactive program for teaching Watson how physicians solve problems. Now, working with WatsonPaths, faculty and students will be able to review the evidence the computer presents and the inferences it draws. They will type in additional information and insights. As the faculty members and students interact with the machine, the arrows that represent the 3 37 3
strongest connections will become thicker and the circles that represent important evidence and strong conclusions will become more pronounced. In the background, Watson will learn how to interpret the way physicians talk and write about their craft. The students will train the machine and, at the same time, the machine will help train the students. Just as developing a version of Watson for Jeopardy! helped focus the early research efforts on the technology, the team has set a new goal that’s even more challenging: becoming capable of passing the online section of the U.S. Medical Licensing Exam, the professional exam that physicians with an MD are required to pass before they’re permitted to practice medicine. It includes sections on anatomy, pharmacology, pathology, behavior science, genetics, and other medical disciplines. The goal is for Watson to be capable of passing the exam as a measure of progress, though the system will not actually take it. Professors and students at Lerner are helping Watson achieve this capability. “Problem-based learning techniques for teaching medical students align with the way doctors think. In addition, medical students and physicians need help in harnessing the volumes of evidence now available,” says Dr. James Young, executive dean of the Cleveland Clinic Lerner College of Medicine of Case Western Reserve University. “We are helping to train Watson so that eventually Watson will be a partner in the delivery of healthcare for our patients.” Meanwhile, Memorial Sloan-Kettering Cancer Center, WellPoint, and IBM are developing a Watson-powered product called Interactive Care Insights for Oncology to help oncologists access the latest information on treatments. 3 38 3
MSKCC is helping provision Watson with the body of knowledge it will need to serve as a valuable resource to help physicians make evidence-based decisions. The corpus includes everything from case histories and vast libraries of medical literature to clinical guidelines, best practices from top physicians, and reports from drug trials. Watson has by now ingested 600,000 pages of medical evidence, 2 million pages of text from 42 medical journals and clinical trials, and several thousand case histories. The physicians are using all that data to train Watson to make treatment recommendations that are aligned with best practices. Once oncologists from MSKCC and community-based oncology centers begin testing the product, additional learning will take place. Think of it as on-the-job training for Watson. The more physicians interact with the machine, the smarter it will get. WellPoint has selected several community-based oncology centers to test the Interactive Care Insights for Oncology technology, including WESTMED Medical Group in Purchase, N.Y. By evaluating the system’s performance against their existing patient records, they can see how well it works and provide feedback to make it more usable. Once the application is operational, oncologists at WESTMED and other beta sites will be able to use it to tap into the expertise and knowledge base of MSKCC and other sources, resources they would not have had such ready access to otherwise. A key element of preparing Watson for this new role as a physician’s helper is training it to interact with people the way that’s most natural for the humans. Earlier attempts
at creating support systems for physicians fell short partly because they forced the doctors to adapt to the ways of the computer. Watson’s user interface is being designed to work on different types of computers, including desktops, laptops, and tablets that doctors can easily carry around with them. Its procedures are based on the way physicians like to work. Getting things done should be far simpler than is typical in today’s human-machine interactions.
Scenario: Physician’s Assistant

Dr. Mark Kris, the chief of thoracic oncology at Memorial Sloan-Kettering Cancer Center, has developed a detailed scenario describing how he thinks Watson could be used by oncologists:

Preparation: The oncologist is preparing to treat a Japanese-American woman who has lung cancer. Ahead of their first meeting, he reviews her personal information and medical records, including diagnosis and test results—presented to him by the Watson interface on his tablet computer. He has already spoken to the referring physician. Armed with all of this information, he begins developing a hypothesis. But the oncologist wants a second opinion—from Watson. So he types in a request for Watson to use all of the information at its disposal to evaluate the treatment options for this patient. Then he taps the “submit” button. In a matter of seconds, Watson pulls together the
patient information, medical records, relevant medical literature, and treatment results based on MSKCC’s experience. Watson points out that women from Japan who are nonsmokers are fairly likely to have a mutation in a gene, EGFR. If so, a drug, Erlotinib, shrinks the cancer 95 percent of the time. Watson suggests a test to see if the patient has the gene mutation. Watson suggests other tests as well, and, after reviewing the options, the oncologist can tap a few buttons to submit a request to the patient’s insurance carrier for preauthorization of the tests.

First Consultation: The physician meets with the patient in his office. He talks to her about her condition and asks her about her treatment preferences. She tells him that she would prefer to avoid a treatment that would result in loss of hair. It would be too disturbing to her children. After the meeting, the doctor types his consultation notes into Watson and orders the new tests.

Review of Test Results: A few days have passed since the first visit, and the oncologist is reviewing the test results in Watson. It turns out that the patient has one of the EGFR mutations, but it’s a mutation for which Erlotinib will not work.

Second Consultation: During the second consultation with his patient, the physician explains his deliberations. He also asks the patient if she has noticed any changes in her condition since they last met. She says yes. Two days earlier, she began noticing blood in her phlegm. The physician asks Watson how that new piece of information affects its suggested treatment
options. The program comes back with a very different set of options, ranked by confidence level. Watson has 90 percent confidence in the top option on the list, a cocktail of two often-used chemotherapy drugs. In addition, Watson points out that the treatment is less likely than others to cause loss of hair. After discussing the treatment with his patient and securing her agreement, the doctor taps a button on the Watson interface to request preauthorization of the treatment from the insurance company. All of the evidence that he and Watson have gathered is forwarded to the insurance company to support the request.
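To make the re-ranking step in the scenario concrete, here is a minimal sketch in Python of how a decision-support tool might re-score treatment options as new evidence arrives. The option names, evidence labels, and weights are invented for illustration; this is not Watson's actual scoring model.

```python
# Minimal sketch of confidence-based re-ranking of treatment options.
# All options, evidence items, and weights are hypothetical.

def rank_options(options, evidence):
    """Score each option by adding the weights of observed evidence to its
    base confidence, then sort from most to least confident."""
    ranked = []
    for name, priors in options.items():
        score = priors["base_confidence"]
        for item in evidence:
            score += priors["evidence_weights"].get(item, 0.0)
        ranked.append((max(0.0, min(1.0, score)), name))
    return sorted(ranked, reverse=True)

options = {
    "erlotinib": {"base_confidence": 0.6,
                  "evidence_weights": {"EGFR mutation": 0.3,
                                       "resistant EGFR variant": -0.5,
                                       "blood in phlegm": -0.1}},
    "chemo cocktail": {"base_confidence": 0.4,
                       "evidence_weights": {"resistant EGFR variant": 0.3,
                                            "blood in phlegm": 0.2}},
}

evidence = ["EGFR mutation"]
print(rank_options(options, evidence))      # ranking before the new findings

evidence += ["resistant EGFR variant", "blood in phlegm"]
print(rank_options(options, evidence))      # ranking shifts as new evidence arrives
```

The point of the sketch is simply that every new observation changes the scores, so the ordering the physician sees is always conditioned on the latest evidence.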
The system works for all the parties involved. The physician gets access to a tremendous amount of up-to-date information that he might not have gathered and digested on his own. The patient gets a carefully chosen treatment that’s likely to be the most beneficial and that may spare her unpleasant side effects. She also doesn’t have to wait long to begin treatment. The system works for the insurance company, too. It gets the evidence it needs to confidently give the go-ahead for a treatment regimen. And society gets better health care that’s more efficiently and affordably delivered. “We’re making evidence-based decisions. Watson proves it,” says Mark. “It’s going to be much better for all of us.”14
3 Handling Big Data
The global financial crisis of 2008 had many causes, but, boiled down to the essentials, it comes to this: Institutions and individuals didn’t fully appreciate the risks they were taking. Lenders issued high-risk home loans and securitized them without performing enough due diligence. Investment banks packaged and resold them in ways that obscured the underlying risks. Credit-rating agencies gave the securities their seal of approval. Investors around the globe snapped them up. Government regulators failed to spot flaws in the system.1 The crisis was a failure of information. Though institutions and individuals had access to immense quantities of data about both the financial system as a whole and their own parts in it, they were not capable of seeing what was going on well enough to make prudent decisions. Major changes are needed to prevent this kind of thing from happening again, including more effective regulatory monitoring and more responsible behavior by participants in the system. But there’s an additional element that could
help prevent such a calamity: the emergence of technology designed to help people understand the workings of the entire system more deeply so they can better assess the risks they face. Think of it as an early-warning system. The financial crisis is a dramatic example of where technologists have the ability to harness data to help solve a complex and critical problem. Remember the metaphor of the nested Russian dolls that we used earlier to illustrate the layers of cognitive computing? In chapter 2, we described the outermost layer—the transformation of computers into powerful learning systems. Chapter 3 explores the doll that nests inside that one—the new approach that these computers will need for managing and analyzing data. Today, we are witnessing the emergence of a new force in society and business: big data. Organizations and individuals are faced with a torrent of data, everything from structured information such as transaction records to a wide variety of unstructured information—still images, video, audio, and sensor data. The biggest new source of data is the so-called Internet of things, data produced by sensors and harvested via the Internet. The sensors involved range from the RFID tags that retailers use to track merchandise to video cameras that capture the flow of traffic. Every day, as a group, human beings generate about 3 exabytes of computer data—a prodigious output that is expected to produce a data universe of 40 zettabytes of digital stuff by 2020.2 A zettabyte is a decidedly big number: a 1 followed by 21 zeros. One zettabyte of storage would hold 250 billion two-hour HD movies. This flood of information should be 3 44 3
tremendously valuable. When you get down to it, big data is nothing less than the digital expression of life in the raw, in all of its richness and ambiguity. As we mentioned in chapter 1, it’s useful to think of all of these data as a new natural resource, potentially even more valuable than resources like oil, coal, or natural gas because data are an essential ingredient in everything we do. But a resource isn’t worth much unless you can take full advantage of it. Today, less than 1 percent of the digital data that have been collected are actually analyzed.3 This profusion of data is difficult to capture, make sense of, and move around. And, unfortunately, today’s computing systems aren’t up to the task of handling all of this raw information in an efficient and affordable way. That’s why we need to develop new systems for managing big data. We need a new generation of data storage, management, and analytics tools that will improve our ability to gather, meld, and make sense of huge amounts of data and, in some cases, to perform a complex ballet of tasks in real time. The new generation of tools must be designed to handle the four Vs of big data: volume, variety, velocity, and veracity.

• Volume: The sheer amount of data we’re gathering is a challenge in itself. While the cost of transporting and storing a unit of data declines steadily, the amount we collect is increasing at a much faster rate, so we have to find new ways of moving and storing it.

• Variety: Until fairly recently, most business and government information was stored in simple spreadsheets or
relational databases made up of columns and rows—easily accessed using standard queries. But a host of newer data types, including video, geospatial information, webpages, and speech, require new ways of managing data.

• Velocity: In some scenarios, dealing with the velocity of data will be essential. So rather than first gathering and storing data in databases, we must develop effective ways of analyzing data in motion, drawing actionable insights, and keeping only the bits that are essential.

• Veracity: Much of the new data we have access to now are not precise or certain—or clearly organized. A lot of it is just so much noise. The challenge is to make sure that we’re getting the right information, that it’s accurate, and that we’re able to draw the right conclusions from it.
How to Solve the Big Data Problem

A tremendous amount of progress has been made in the way we handle data since the dawn of the programmable computing era. The first commercial disk drive, introduced by IBM in 1956, was the size of a small car and stored just five megabytes of data. Today, a standard laptop computer comes with a two-and-a-half-inch disk drive capable of storing 1 terabyte of data, one million megabytes. Early database-management systems stored information in lists organized by categories and subcategories. The computer user had to manually navigate through a series of lists to find a particular nugget of information. In contrast, today’s relational databases store information in columns
and rows that can be searched with standard query languages. Early computers calculated probabilities of particular outcomes based on statistical analysis of data related to relatively narrow problems. Today’s advanced analytics software programs find patterns in large sets of data and extract meaning from them. In order to handle the four Vs of big data, though, major disruptive advances will be required in the storage, management and analysis of data.

Volume
One of the most significant breakthroughs we see coming for storage of huge amounts of data is the ability to use cognitive techniques to more efficiently manage the flow of data within computing systems. Today, data are stored using a variety of technologies, depending on how often access is needed and the cost of the storage medium. Magnetic tape is the cheapest form of storage, but data retrieval is slow. Disk drives are more expensive but faster to access. Data storage on memory chips is lightning quick but even more expensive. Typically, database administrators move data from one technology to another based on rigid rules and schedules. But that approach isn’t satisfactory in the era of big data. So scientists at the IBM Research lab in Zurich, Switzerland, are developing a sophisticated system for optimizing the storage of data for particular uses by considering how much it costs to store the data and how quickly the computer system will need to fetch specific
pieces of information. The system will learn by studying the application it’s being asked to manage. Then it will make adjustments as it monitors the application in action—putting bits of data in the right places at the right times.4 For instance, an application for monitoring security on a city street will store most of the video and audio it gathers on tape or disk drives. But when it picks up anomalies that might indicate security threats, it will store data about them on memory chips and revisit that information frequently as additional sensor data are gathered. Only the most relevant data are stored on the most expensive media, and only for the time they are needed.
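As a rough illustration of the kind of placement policy the Zurich researchers describe, the Python sketch below chooses a storage tier by weighing storage cost against expected access delay. The tier prices, latencies, and scoring rule are assumptions made up for this example, not IBM's actual system.

```python
# Sketch of cost- and access-aware data placement across storage tiers.
# Tier characteristics and the scoring rule are illustrative assumptions.

TIERS = {                      # cost per GB and typical access latency (seconds)
    "memory": {"cost": 10.0, "latency": 0.000001},
    "disk":   {"cost": 0.5,  "latency": 0.005},
    "tape":   {"cost": 0.02, "latency": 30.0},
}

def place(size_gb, accesses_per_day):
    """Choose the tier with the lowest combined daily cost: amortized storage
    cost plus a penalty for the time spent waiting on expected accesses."""
    def daily_cost(tier):
        t = TIERS[tier]
        wait_penalty = accesses_per_day * t["latency"]   # seconds of waiting, treated as cost
        return size_gb * t["cost"] / 30 + wait_penalty
    return min(TIERS, key=daily_cost)

# Routine surveillance footage is touched rarely; flagged anomalies constantly.
print(place(size_gb=500, accesses_per_day=1))      # cheap, slower tier wins
print(place(size_gb=2, accesses_per_day=5000))     # hot data lands in memory
```

A learning version of the same idea would adjust the costs and access estimates as it watches the application run, which is what the passage above describes.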
Variety

Over the past decade or so, computer scientists and mathematicians have become quite proficient at handling specific types of data by using specialized tools that do one thing very well. For example, a sales manager can employ data analytics software to reveal past sales trends and forecast future sales, using a specific tool for a specific data type. But that approach doesn’t work for complex operational challenges such as managing cities, global supply chains, or power grids, where many interdependencies exist and many different kinds of data have to be taken into consideration. What’s required is a holistic approach to data management and analysis. You have to be able to mash together different kinds of data and act on them with different
kinds of tools. An early example of this approach is a project that IBM researchers in New York and Brazil completed on behalf of the city of Rio de Janeiro. Rio has recurring flooding and landslide problems in many hilly neighborhoods. The researchers used data describing the physics of the atmosphere to create a mathematical model of how storms are likely to unfold in Rio. With it, they can predict up to forty hours ahead of time how much rain will fall in a particular location with 90 percent accuracy. They also used topographical data and historical flood information to create a model of how the water would flow in the streets once it hit the ground. After combining the two models, they were able to predict flooding and landslide conditions. They also consider factors such as traffic patterns on city streets. In this way, they’re making it possible for the city to anticipate flooding and landslides and put resources in place ahead of time to deal with them—including deciding where to deploy emergency crews and where to open shelters. By combining the different data types and techniques in this way, they created a system that gathers a wide variety of information, evaluates it systematically, and reaches conclusions the way humans do.
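The essential move in the Rio project, feeding the output of one model into another, can be sketched in a few lines. Both toy models below, and every number in them, are invented stand-ins meant only to show the pattern of combining different data types, not the researchers' actual models.

```python
# Sketch of combining two models: a rainfall forecast feeds a street-flooding model.
# Both models and all numbers are illustrative stand-ins.

def forecast_rain_mm(hours_ahead, humidity, pressure_drop):
    """Toy atmospheric model: more humidity and a falling barometer mean more rain."""
    return max(0.0, (humidity - 0.5) * 80 + pressure_drop * 10 - hours_ahead * 0.2)

def flood_risk(rain_mm, drainage_capacity_mm, slope):
    """Toy hydrology model: water that exceeds drainage pools faster on flat ground."""
    excess = max(0.0, rain_mm - drainage_capacity_mm)
    return min(1.0, excess / 50.0 * (1.5 - slope))

neighborhoods = {"Hillside": {"drainage": 35, "slope": 0.8},
                 "Downtown": {"drainage": 60, "slope": 0.1}}

rain = forecast_rain_mm(hours_ahead=40, humidity=0.9, pressure_drop=6)
for name, terrain in neighborhoods.items():
    risk = flood_risk(rain, terrain["drainage"], terrain["slope"])
    action = "pre-position crews" if risk > 0.5 else "monitor"
    print(f"{name}: rain {rain:.0f} mm, flood risk {risk:.2f} -> {action}")
```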
Velocity

Most of today’s computing tasks involve data that have been gathered and stored in databases. The data make a stationary target. But, increasingly, vitally important insights can be gained from analyzing information that’s on the move. This technology can be used for everything
from real-time stock trading to drilling for oil a half-mile below the surface of the ocean to monitoring patients’ health in hospitals. Consider the benefits of real-time monitoring of the vital signs of preemies in a neonatal ward. Babies don’t show signs of infection the way adults do. The danger is that they will become very sick before the problem is detected and die before an effective treatment can be administered. But by monitoring their vital signs and detecting subtle patterns that indicate an infection has set in, physicians can spot problems and begin treatments twelve to twenty-four hours earlier—potentially saving lives. This approach is called streams analytics. Rather than placing the data in a database first, the computer analyzes it as it comes in from a variety of sources, continually refining its understanding of the data as conditions change. This is the way humans process information. Streams analytics programs are extremely sophisticated. Each kind of data must be monitored on its own, translated so it is compatible with other data types, integrated with other data, and analyzed again in the context of the overall flow of information. The translation part is particularly challenging. Spoken words, for instance, have to be converted to machine-interpretable form. In the coming years, streams programs will become more cognitive and adaptive, explains IBM researcher Nagui Halim, head of the streams technology project. He foresees a time when managers will provide a basic description of what they want a system to do and the system will follow those specifications to design an
application made up of preprogrammed components and then launch the application to get the job done. As the applications gain real-world experience with the data, the software will continuously sharpen its understanding of what people are looking for and refine the answers it provides.5
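Here is a minimal sketch of the streams idea: score each reading against a rolling baseline as it arrives, keeping only a small recent window instead of a full database. The vital-sign numbers and the simple deviation rule are illustrative assumptions, not the actual neonatal-monitoring algorithms.

```python
# Sketch of streams analytics: score each reading against a rolling baseline
# as it arrives, keeping only a small window rather than a full database.
from collections import deque

class HeartRateStream:
    def __init__(self, window=20, threshold=3.0):
        self.window = deque(maxlen=window)   # only the recent past is kept
        self.threshold = threshold           # how unusual a reading must be to alert

    def observe(self, bpm):
        if len(self.window) >= 5:
            mean = sum(self.window) / len(self.window)
            spread = (sum((x - mean) ** 2 for x in self.window) / len(self.window)) ** 0.5
            if spread > 0 and abs(bpm - mean) / spread > self.threshold:
                print(f"alert: heart rate {bpm} deviates sharply from the recent baseline")
        self.window.append(bpm)

monitor = HeartRateStream()
for reading in [142, 140, 144, 141, 143, 142, 140, 185, 141]:
    monitor.observe(reading)     # the 185 reading triggers an alert
```

Nothing but the window is retained; the analysis happens while the data are in motion, which is the defining trait of the approach.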
Veracity

Organizations face huge challenges as they attempt to get their arms around the complex interactions between natural and human-made systems. The enemy is uncertainty. In the past, since computing systems didn’t handle uncertainty well, the tendency was to pretend that it didn’t exist. Today, it’s clear that that approach won’t work anymore. So rather than trying to eliminate uncertainty, people have to embrace it. One strategy is to build analytics software that makes it possible to identify an optimal way of getting things done based on mathematical probabilities while taking uncertainty into account. The technique is called stochastic optimization. It has been used for some time but is being further developed to handle the problems of big data. The optimization discipline has its roots in the early days of computing. One of the pioneers in the field was Ralph Gomory, who became the head of IBM Research. After he got his Ph.D. in mathematics from Princeton in 1954 and served in the U.S. Navy, the navy hired him as a consultant and asked him to use mathematical techniques to help design the optimal naval taskforce—how many
aircraft carriers, battleships, and so on. He invented a new method, called integer programming, to help solve the problem. Initially, he did this work with pencil and paper, then with a mechanical calculating machine. Finally, he was invited by the Rand Corporation, a think tank in California, to use their large-scale computer, one of only a few that existed at the time.6 Ralph created a program that could compute an optimal design given a specific set of choices. But today we’re faced with immensely complicated problems that contain so many possibilities that it would take a long time, even with a powerful computer, to explore all of them using Ralph’s deterministic techniques. In fact, solving some of the more complex problems would require more computing power than exists on earth. That’s why in the era of big data we need to use stochastic optimization to get our answers. The term stochastic is derived from a Greek word for “aim.” It refers to the fact that when an archer shoots a series of arrows at a target, most will miss the bull’s-eye and will create a cluster around it. Where each arrow falls depends on a combination of factors, including the skill and visual acuity of the archer, his emotions, the quality of the bow and arrow, weather conditions, and the distance from the target. Stochastic optimization takes into account such variables. To perform stochastic optimization, computer scientists use probability theory, the branch of mathematics concerned with the analysis of random phenomena, to analyze complex situations and figure out how to achieve
the best outcome while allowing for the effects of variability. The first step is building a mathematical model—translating a real-world situation into a formula. They use probability theory to identify the most likely effects of the variables and then perform a series of analyses or simulations, based on a subset of the possibilities, to determine the most likely outcomes. Previously, using less sophisticated techniques, it might take days of computation to arrive at a useful result; now it can be done in seconds. In addition, as models are tested against real-world outcomes, they learn and get better over time. To see how stochastic optimization works, consider an electricity grid. Today, many electrical-distribution systems are outfitted with smart meters that make it possible for consumers and operators of the system to know how much energy is being used in real time. Based on that information, consumers can make informed choices about their consumption levels, and operators can predict with reasonable certainty what demand will be at a specific time. But what happens if the weather changes suddenly or there’s a failure in the system and power is in short supply? Typically, the operator would go to the spot energy market and pay a premium for additional power. In a worst-case scenario, the utility might not be able to meet demand, resulting in service brownouts. Using stochastic optimization, mathematicians at IBM Research have built a computing system that can predict what incentives would be required for consumers to reduce demand in response to a sudden change in supply. Rather than having to decide what to do on a case-by-case
basis, consumers can preprogram their smart energy-management systems to respond instantly and to dial back energy use under certain circumstances. Over time, cognitive computer programs will be designed that learn from the ongoing interactions and continuously optimize the incentive systems.7 The consumer’s devices will be in touch with the operator’s devices, and, as a result of their automated negotiations, the operators will be able to avoid paying a premium for energy on the spot market and, in some cases, to prevent service outages. Stochastic optimization addresses some of the challenges posed by the uncertainty inherent in big data, but not all of them. Machines, on their own, don’t necessarily recognize discrete pieces of information that are related to one another but are scattered among multiple data storehouses. How do they sort through all the evidence and draw conclusions? They must learn to understand context. Today, the most advanced approach to achieving that aim is an emerging class of technology called entity analytics. The software collects diverse data about real-world entities (people, places, and things) and makes it possible to identify who or what they are and to figure out the relationships among them. One of the pioneers of the field of contextual analytics is Jeff Jonas, the chief scientist of IBM Entity Analytics. Jeff took an unusual path to get to IBM. A serial entrepreneur for two decades, he developed the entity analytics technology to help casinos quickly detect the presence of known cheaters and card counters. His big breakthrough came after a casino asked him to find a way to ensure it was not
doing business with people who had been barred by the gaming regulators. At the time, this casino had eighteen different kinds of watch lists and information available from other internal sources, ranging from job applications and hotel reservations to enrollment in the loyalty club. Jeff’s technology, known as NORA, for Non-Obvious Relationship Awareness, helped casinos make connections among related pieces of information, which, among other things, helped them spot the notorious MIT card-counting team that was featured in the best-selling book Bringing Down the House and the movie 21. IBM bought Jeff’s company, Systems Research & Development, and since then he has added new features, including the ability to analyze streams of information in real time. Jeff’s software essentially builds and maintains a giant jigsaw puzzle on the fly, making connections between newly acquired data and older information. With each additional piece of information, more context accumulates, and the software’s insights and understanding improve. The technology might conclude, for example, that the Jonathan W. Smith who is applying for a credit card in New York City is the Jon Smith who was convicted of credit card fraud in Jersey City five years earlier. The system draws a conclusion based on data matches, such as date of birth, previous addresses, phone numbers, and other bits of data. The system is quite nimble. New data can be ingested in real time, and new insights can be drawn from them immediately. It can even change its mind about a conclusion it has reached based on new evidence. And the system
can even spot connections that are meaningful and alert people to them before they have to ask. Says Jeff: “Because there are not enough humans to dream up and ask every question every day, the data must find the data and the things that are relevant must find you.”8 We’re still in the early days of being able to gather and understand context in effective ways. The IBM Research scientist Sam Adams foresees the emergence of computing systems that will gather huge amounts of information from a large number of sources, find patterns and correlations and linkages in it, and then apply specialized information-management software that Sam calls context engines. The process is like finding many needles in a field full of haystacks, combining those needles, then searching the smaller but still substantial collection for the needles made of gold—the truly valuable slivers of insight.9 All of this may sound frightfully inefficient, but it’s the way we humans make sense of the world around us. We have experiences, gather sense- and language-based information, see patterns, learn from successes and mistakes, draw conclusions, and act on them. Increasingly, machines will operate in a similar way. Computing systems will need different kinds of data-organizing structures to accomplish this. A technology called the graph database could become essential. This allows people to envision a web of relationships that lies hidden in collections of unstructured data. For instance, consider how the technology could be used in mining insights from social networks. Jim Smith, who lives in New York City, has friendships on social-networking sites with a number of people who, like 3 56 3
him, list Bruce Springsteen among their favorite performers. Several of the others live in Buffalo and went to the same university as Jim, the State University of New York at Buffalo. Some of the others have indicated on their social networks that they “like” the Chipotle restaurant chain. Springsteen is performing in Buffalo next month. Armed with this web of information, an Internet travel site might consider offering Jim an airline and entertainment package at a discount combining airfare to Buffalo, a night in a hotel, two tickets to the Springsteen concert, and a coupon for dinner at Chipotle. The travel site’s computer system will use the graph database in much the same way that a human being makes connections among his or her relationships and interests.
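A small sketch of how such a query might look over a graph of people, places, and interests. Plain Python dictionaries stand in for a graph database here, and all of the nodes and edges are the hypothetical ones from the example above.

```python
# Sketch of a graph traversal over the relationships described above.
# Nodes and edges are the hypothetical ones from the Jim Smith example.

graph = {
    "Jim": {"friend": ["Ana", "Raj", "Lee"], "likes": ["Springsteen"],
            "lives_in": ["New York"], "alumni_of": ["SUNY Buffalo"]},
    "Ana": {"friend": ["Jim"], "likes": ["Springsteen", "Chipotle"],
            "lives_in": ["Buffalo"], "alumni_of": ["SUNY Buffalo"]},
    "Raj": {"friend": ["Jim"], "likes": ["Springsteen"],
            "lives_in": ["Buffalo"], "alumni_of": ["SUNY Buffalo"]},
    "Lee": {"friend": ["Jim"], "likes": ["Chipotle"],
            "lives_in": ["Boston"], "alumni_of": []},
}

def neighbors(node, edge):
    """Follow one kind of edge out of a node."""
    return graph.get(node, {}).get(edge, [])

# "Friends of Jim who like Springsteen and live where the concert is."
concert_city = "Buffalo"
matches = [f for f in neighbors("Jim", "friend")
           if "Springsteen" in neighbors(f, "likes")
           and concert_city in neighbors(f, "lives_in")]

if matches:   # enough overlap to justify a Buffalo concert-and-travel offer
    print(f"Offer Jim a Buffalo package; friends already there: {matches}")
```

A graph database does the same kind of hop-by-hop traversal, but over billions of nodes and with the relationships stored as first-class data rather than hand-built dictionaries.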
The Emergence of Pervasive Analytics

We’re at an inflection point in the evolution of data analytics. The combination of massive amounts of information and the tools to deal with it creates a new opportunity that has huge implications for business, society, and individuals. Today, computers are everywhere, thanks, in large part, to the revolution in mobile communications that has brought us all manner of smartphones and digital tablets. Now we’re on the front edge of a second wave of everywhere-ness as data analytics also becomes pervasive. Thanks to mobile communications, we no longer need to be tethered to the monitor and keyboard to get new insights because analytics allows individuals
and organizations to have the information they need when they need it. The insights from analytics will soon no longer be reserved for an elite few. In the future, these insights will be available to just about anybody. Pervasive analytics will enrich our private lives, as well. People love the apps on their smartphones that allow them to ask a simple question, such as, “Where’s the nearest pizzeria?” and get the answer via a friendly robotic voice in a matter of seconds. But what if you could ask more difficult questions, such as, “Can I afford to pay for the car?” with the implicit assumption that the app already knows you need to help out with your kid’s college tuition next year, or, “Is it worth fixing my car?” knowing that the app is tuned in to the mileage, maintenance record, and book value of your car. Suddenly, you can get not just handy information but useful insights that help you conduct your life more successfully.
Scenario: The Cognitive Enterprise

In the coming era of cognitive systems, organizations will acquire powerful new capabilities for using big data to make better decisions. Dario Gil, an executive at IBM Research who develops technology solutions for the energy and natural resources industries, believes that for organizations to maximize gains from cognitive systems, they will need to transform the way they operate to become more dynamic, transparent, and collaborative.
And he believes the spaces in which they operate will influence whether they can achieve the transformation. That’s why he designed the Cognitive Enterprise Lab, a special workspace at the IBM Research lab in Yorktown Heights, N.Y. As Dario sees it, the way companies do business is expressed in the way they organize their work environments. And most companies organize things in the same ways—with private offices for executives, cubicles for everybody else, conference rooms for group meetings, the boardroom, and, in some cases, special rooms for teleconferencing. Over time, the ways that people behave and interact in these rooms become codified. Traditional power hierarchies often determine who sits where and who says what. The status quo is reinforced. Think of it this way: ergonomics is destiny. The Cognitive Enterprise Lab is designed to feel different from other work environments, signaling that the old rules of interaction have been suspended. The room is not even rectangular; instead, one of the four walls slants at an angle. The chairs and tables can easily be reconfigured depending on how the space is to be used. But the technology is what truly sets the space apart. One entire wall is covered with a large high-definition visualization screen upon which presentations and graphics can be displayed and manipulated using hand gestures. Other screens designed for aiding group interactions can be positioned around the room by moving them via tracks suspended from the ceiling. Cameras capture speakers for video conferencing no matter where
they sit or stand in the room—and also record facial expressions and body language. (The video might be analyzed later by the computer system, and insights drawn from it can be given back to the speakers so they see how people reacted to specific moments of a presentation.) Acoustic arrays help with speech recognition and processing. The idea is to create the richest possible environment for people’s interactions with data and one another in order to enhance the group’s collective intelligence— a term used by MIT’s Thomas Malone. Dario says, “The goal is to have the system partner with the humans inside the cognitive environments to transform the decision-making activities of teams that are dealing with very complex problems.”10 Imagine how a large oil company might use cognitive capabilities in such a space a few years from now. A critical decision-making juncture for the oil-exploration business is whether to bid to secure leases on new oil fields. Today, typically, the analysis that leads to such decisions is performed in relative isolation by a handful of scientists, financial analysts, and executives who rely on information from seismic studies and publications from scholarly and trade journals. The information is then combined with their own experience-driven hunches about where rich and accessible oil deposits might be located. Geologists, geophysicists, petroleum engineers, and financial analysts each have their own sources of data, expressed in the vocabularies of their domains. The decision maker in charge of the bid speaks to each group of specialists separately to gather information and advice. 3 60 3
But what if all of the parties to the decision were united in an environment such as the Cognitive Enterprise Lab? There, the group could carry on its conversations in partnership with a learning system that has access to millions of pages of scientific and industry documents, data about the fields being considered, and information about the company’s financial resources and about current and future market scenarios. The system could integrate the data, analytics, and simulations from different sources so the group can perform cross-domain analysis of the options. When considering how much oil will be available in a particular location and how much it will cost to extract, members can call up visualizations of the scientific data on the shared digital wall or personal screens, where the data are expressed in an individually tailored way. Because everyone is talking together and viewing and interacting with the data at the same time, it’s less likely that there will be misunderstandings or that anybody will make recommendations or decisions that are not fully supported by the evidence. This environment could be used in a wide variety of situations. A group of doctors could gather to evaluate a particularly complex case. Public health officials could manage a pandemic. Venture capitalists could size up a new category of investments. Consumer-electronics companies could plan the launch of new gizmos. Government bodies could develop new legislation. City managers could respond to natural disasters. In the Cognitive Enterprise Lab, it is more likely that the best ideas—backed by evidence—will win. 3 61 3
Dario’s lab illustrates a key point about the era of cognitive systems. No longer will the deep analysis of complex situations belong only to data specialists. Instead, problem solvers of many different talents will be able to interact with one another and with data in ways that were not possible before.
Journey of Discovery: The Ultimate Big Data Challenge

The Square Kilometre Array is one of the most ambitious scientific projects ever undertaken. Its organizers plan on setting up a massive radio telescope made up of more than half a million antennas spread out across vast swaths of Australia and southern Africa. When it’s completed in 2024 or so, astronomers will be able to analyze the data to better comprehend the history of the universe and the nature of matter. The SKA is a scientific sibling to the Large Hadron Collider in Switzerland. There, scientists are studying the tiniest elemental particles for answers to some of the fundamental riddles of existence. In contrast, “Our laboratory is the whole universe,” says Marco de Vos, managing director of ASTRON, the Netherlands Institute for Radio Astronomy, which, along with IBM, is proposing an information-technology system to manage the SKA’s data.11 The SKA is an iconic example of the need in the coming era for what we call radical collaboration. It’s such a complex project, requiring so many skills and so much investment, that many participants are required. Ten nations
are backing the project, and thousands of scientists and hundreds of companies are ultimately expected to play roles in designing, building, and operating it. IBM and ASTRON are melding their skills and resources in a true partnership, and they have invited other organizations to participate. Already, Square Kilometre Array South Africa, an agency of the South African government, has joined. In addition, ASTRON and IBM plan on tapping a global network of hundreds of subject experts from industry and academia to help design the system. The SKA is the ultimate big data challenge. The telescope will collect a veritable deluge of radio signals from outer space—amounting to fourteen exabytes of digital data per day. (The data collected by the SKA in a single day would take nearly two million years to play back on an iPod.)12 Because the telescope is to be made from so many individual antennas, the antennas are to be so widely scattered, and such a large volume of data is being gathered, a novel computing system must be developed to manage the process of gathering, storing, and analyzing data from end to end. ASTRON and IBM have combined forces on a five-year initiative called DOME to create such a system. Their work together began years before, when they collaborated on the information-technology system for a radio telescope called LOFAR. LOFAR is radically different from conventional radio telescopes. Unlike the traditional large steel-mesh dishes pointed at the heavens, the more than 10,000 LOFAR antennas, which act like a single virtual telescope, are low profile and low cost. The antennas are set up in
clusters on vast fields in the Netherlands and elsewhere in northern Europe. The antennas don’t move. The signals gathered by them are combined in a clever way to target objects of interest in the sky. ASTRON hopes LOFAR, which is now up and running, will be used as a model for the SKA telescope. It will be the test for the DOME project and one of the pilot projects testing the SKA antenna technology. During a recent visit to one of the LOFAR fields, ASTRON’s Albert-Jan Boonstra, a codirector of the DOME project, explained how the equipment works. One antenna type is made up of five-foot-tall posts that are held in place by four wires, which are also key elements of the antenna systems. A small disk at the top of the post contains electronics that amplify the radio waves and then transmit the data along underground cables to a nearby metal box the size of a small car, where more processing takes place. The other antennas are spear-shaped metal pieces arrayed in clusters on electronic circuit boards and covered with Styrofoam sheets and plastic tarps. Most of the data is sent by fiber-optic cables to an IBM Blue Gene supercomputer at nearby University of Groningen, where it is filtered and correlated.13 Albert-Jan grew up in a small village in the Netherlands and has been obsessed with space since watching the Apollo moon landing on TV as a ten-year-old. Inspired, he and a friend made their own rocket out of tin cans and wires, with gasoline for fuel—only to see it explode and burn on the launch pad. “I’m fascinated with the idea that
we’re tiny humans in an endless, nearly empty universe. I wanted to see how far we can look into space,” he says. But Albert-Jan knows he won’t be able to penetrate deep into space without some significant advances in computer science. In fact, the DOME project will require more than a half-dozen major breakthroughs because of the huge amount of data and the fact that the antennas are to be scattered over such a wide area. The signals from outer space are a combination of valuable data and meaningless noise and have to be processed to sort out the useful stuff. To control costs, designers of the computing system have to figure out how to minimize the amount of energy used for processing data. At the same time, since so much of the energy in computing is required to move data around, they have to discover ways to move the data as little as possible. Faced with these challenges, Albert-Jan and Marco engaged IBM Research–Zurich in a sort of show-and-tell for emergent technologies. In a series of meetings held at the lab, IBMers presented twenty proposals, of which ASTRON ultimately accepted seven, each intended to overcome a major information-technology hurdle posed by the SKA. While these technology projects are focused on the SKA, they could help solve problems across a wide array of computer science fields, from data management and analytics to computer system design and chip design. The SKA “is an extreme situation,” says Ton Engbersen of IBM Research–Zurich, the codirector of the DOME project. “When we solve it, we’ll have a good handle on solving the other big data challenges.”14 3 65 3
Several of the DOME subprojects feature cognitive computing technologies. The most audacious of them, called Algorithms & Machines, is focused on creating an ultrasophisticated software program that will help the team design the entire DOME system in a holistic manner—so the technology can handle the extreme data-processing demands of the SKA without breaking the international organization’s budget. It’s one of the most demanding system-optimization challenges ever—the great-grandchild of Ralph Gomory’s U.S. Navy fleet-optimization project in the 1950s. Algorithms & Machines is the brainchild of Ronald Luijten, a Dutchman who has worked at IBM Research–Zurich for twenty-eight years. He’s a student of computer history and was impressed when the supercomputing pioneer Seymour Cray recounted in a speech years ago that he designed the architecture for the Cray 1 supercomputer in the early 1970s by essentially locking five engineers in an office with him and a white board for two weeks. Such a feat would be impossible today. Ronald figures that it would require upward of 250 experts in a wide array of computer science domains to design the architecture of the DOME system. Hiring all those people and bringing them together isn’t practical. So instead, with Algorithms & Machines, he and his team are gathering all of the pertinent knowledge in a repository, setting the parameters for the entire computing system, and creating optimization algorithms. The system will learn what they want and prepare a recommendation on how to fulfill their needs. Ronald sees this effort as nothing less than an attempt to
map out the contours of the new era of computing. “This design-exploration tool will help us figure out how to design everything from the chip of the future to the datacenter of the future,” he says. Big data creates gigantic challenges, but it also offers the potential of tremendous advances in the ways human beings use information. In these huge data-intensive projects—the SKA, Watson—global communities of scientists and domain specialists are gathering large data sets and building sophisticated models for understanding natural and human phenomena. In a sense, with this data and with the tools to understand it, we’re creating a collective intelligence that can be shared and used by many for the betterment of humankind. That’s the ultimate promise of big data.
4 Augmenting Our Senses
Watson’s performance on Jeopardy! was remarkable. The machine searched through millions of pages of information for possible answers and came up with solutions in about three seconds. But Watson was only searching one kind of data—textual information. All Watson could do was read and reason. But what if it possessed machine equivalents of all of the human senses? What if it could see, hear, touch, taste, and smell? These new capabilities are crucial because of the explosion of new kinds of data. The rapid growth of low-cost sensors makes it possible to monitor everything from leaks in water systems, to the performance of electrical grids, to the location and movement of merchandise in stores and warehouses. The proportion of the world’s data that comes from such sensors is expected to increase from 11 percent in 2005 to 42 percent in 2020. At the same time, the amount of video, still images, and audio content on the Web is exploding. It’s estimated that seventy-two hours of video content are uploaded to YouTube every minute.1
Yet most images and audio on the Web are searchable only via metadata—words that are manually typed into forms by humans. So it’s vital to develop systems that can recognize images and sounds more like humans do. As the world becomes more complex and as its complexity becomes increasingly decipherable via analytics, we will need machines that can comprehend what’s going on in the physical world and provide an expansion of human senses.
Sensing the Future

It’s also going to be mind-blowing. At IBM Research, just for fun, we recently conducted an exercise where we asked specialists in each of the machine-sensing fields to imagine advances that would likely come in the next five years. They envisioned a handful of scenarios. Some of them are practical; others, intriguing:

Sight
Computers will help us sort through the 500 billion photos that are taken each year, finding relevant images not only through metadata but also by understanding what’s going on in the images. We’ll train the computers to know what to look for, and they will learn more on their own as they interact with the images and information pertaining to them. Imagine a computer program that combs through all of the family photos you and your relatives have taken
and assembles a family album, in chronological order, complete with descriptions of the who, what, and where of each photo.
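As a toy version of the family-album idea, the sketch below groups photos by year and turns classifier labels into crude captions. The timestamps and labels are made up; a real system would draw them from the photos' metadata and a trained image-recognition model.

```python
# Sketch: assemble a chronological album from (timestamp, labels) pairs.
# The labels stand in for what an image-recognition model would produce.
from collections import defaultdict

photos = [
    ("2011-07-04", {"beach", "grandma", "kids"}),
    ("2011-12-25", {"tree", "kids", "living room"}),
    ("2012-06-15", {"graduation", "kids"}),
]

album = defaultdict(list)
for date, labels in photos:
    year = date[:4]
    caption = ", ".join(sorted(labels))      # a crude "who, what, where" line
    album[year].append((date, caption))

for year in sorted(album):
    print(year)
    for date, caption in sorted(album[year]):
        print(f"  {date}: {caption}")
```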
Hearing

Computers will help us understand what we hear. We will record sounds that babies make when they’re hungry, sleepy, in need of a pat on the back, or in distress. Then the system will map the sounds to brain, heart, digestion, and lung activity. As a result, we’ll be able to translate their cries and whimpers into messages adults can readily respond to, extracting language from prelinguistic sounds.

Smell
Computers will be able to detect diseases via smell. Similar to how a breathalyzer can detect alcohol from a breath sample, sensors in hospital rooms or even smartphones will be able to identify molecules in the air and match them with known biomarkers associated with liver and kidney disease, diabetes, and tuberculosis. People won’t have to wait for their annual physical to learn that something might be amiss. They’ll get an early warning.

Touch
You will be able to reach out and touch through your smartphone. When you’re shopping online, haptic technologies, which use electro-mechanical means to simulate the
feel of things, will make it possible for you to distinguish among different materials, textures, and weaves. You’ll be able to feel a sweater, jacket, or upholstery sample right through the screen.

Taste
Using human beings as sensors, we’re exploring the use of data analytics and learning technologies to create recipes that both taste good and are good for you. The technology analyzes food in terms of how chemical compounds interact with one another coupled with knowledge about which combinations produce pleasant tastes. Then it matches those factors with the health characteristics of the compounds. The results could not only delight people but help them lose weight and live longer and happier lives. The technology represents an advance in the field of computational creativity. It’s the first time that chemistry has been combined with cultural artifacts (recipes) and psychological principles to create something that’s brand new.
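A toy sketch of the idea: score candidate ingredient pairings by shared flavor compounds and a crude health factor. The compound sets, health scores, and weights are invented for illustration and are not real food-chemistry data.

```python
# Sketch: rank ingredient pairings by shared flavor compounds and healthiness.
# Compound sets and health scores are illustrative, not real chemistry data.
from itertools import combinations

ingredients = {
    "strawberry":     {"compounds": {"furaneol", "linalool", "esters"}, "health": 0.9},
    "basil":          {"compounds": {"linalool", "eugenol"},            "health": 0.8},
    "bacon":          {"compounds": {"esters", "pyrazines"},            "health": 0.3},
    "dark chocolate": {"compounds": {"pyrazines", "esters"},            "health": 0.6},
}

def pairing_score(a, b):
    """More shared compounds suggests more flavor affinity; health nudges the score."""
    shared = ingredients[a]["compounds"] & ingredients[b]["compounds"]
    health = (ingredients[a]["health"] + ingredients[b]["health"]) / 2
    return len(shared) * 0.7 + health * 0.3, sorted(shared)

for a, b in combinations(ingredients, 2):
    score, shared = pairing_score(a, b)
    print(f"{a} + {b}: score {score:.2f}, shared compounds {shared}")
```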
* * *
Today, robots with rudimentary sensing abilities generate a lot of excitement in the popular media. The Mars rover Curiosity operates on the surface of the Red Planet semiautonomously and “sniffs” the air for methane gas, searching for signs of life. Experimental self-driving vehicles
are capable of using vision and other senses to navigate through cities without colliding with buses, running over pedestrians, or getting speeding tickets. Yet, as of now, robots remain firmly in the von Neumann computing paradigm. They must be programmed in advance by people to deal with nearly every situation they encounter. Already, learning systems are playing important roles in advances in speech recognition and image recognition. Some voice-recognition systems, for instance, get better at understanding an individual’s manner of speech the more they interact. But the scientific community is just at the beginning of making machines that learn like humans do, so we’re just scratching the surface of the potential of computer sensing. “The excitement with machine learning is figuring out how we build a computer to recognize the things in the world and name them,” says John Smith, a senior manager in intelligent information management at IBM Research. “A machine needs to understand something about the world to understand pictures and videos, and one way that it learns about the world is by examining pictures and videos.”2 All sensing and learning technologies have basic elements in common, starting with the sensor, a device or capability that can detect something about the physical world. Then there’s the ability to extract features from the sensory input, such as colors or edges in visual data or phonemes in voice recognition. The next step is modeling. The computing systems use a variety of analytical tools to group together features that constitute an entity, such as a person, a building, a song, or an odor. Then they apply 3 73 3
semantics to place the entities in categories, such as locations, events, and behaviors. Machine learning is employed at each step. The system is instructed by the software to recognize features, entities, and semantic categories. Ultimately, because we can’t teach machines to identify everything and every relationship among things, they have to be capable of inductive reasoning, in which they derive propositions from specific examples. That’s the point at which the cognitive systems begin to advance beyond the confines of von Neumann computing. They don’t just do what they are told; they figure out what to do on their own. In order to make this great leap and become true thinking machines, the cognitive systems of the future will integrate information from multiple sensing technologies. Today, as scientists labor to create machine technologies to augment our senses, there’s a strong tendency to view each sensory field in isolation, as specialists focus only on a single sensory capability. Experts in each sense don’t read journals devoted to the other senses, and they don’t attend one another’s conferences. Even within IBM, our specialists in different sensing technologies don’t interact much. Yet if machines are to help humans understand the world, they have to make sense of it and communicate about it in a way that’s familiar and comprehensible to humans. This integration of data from various sensing technologies is beginning to happen in multimedia and visual analytics, where vision and sound are correlated. But that’s just the start of what will be required in the next era of computing.
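As a rough illustration of the pipeline just described (sensor input, feature extraction, entity modeling, semantic categories), here is a schematic sketch in Python. The feature extractor and the two classifiers are trivial stand-ins, not the learning systems the chapter refers to.

# A minimal, schematic sketch of the sensing pipeline described above:
# raw sensor reading -> features -> entity -> semantic category.
# Every function body is a placeholder for a trained model.

from dataclasses import dataclass

@dataclass
class Entity:
    features: dict
    label: str          # e.g., "person", "building", "song"
    category: str       # e.g., "location", "event", "behavior"

def extract_features(raw_signal):
    # Placeholder: a real system would compute edges, colors, phonemes, etc.
    return {"energy": sum(abs(x) for x in raw_signal), "length": len(raw_signal)}

def model_entity(features):
    # Placeholder classifier: group features into a named entity.
    return "song" if features["length"] > 100 else "alert tone"

def apply_semantics(label):
    # Placeholder mapping from entity to a semantic category.
    return {"song": "event", "alert tone": "behavior"}.get(label, "unknown")

def sense(raw_signal):
    features = extract_features(raw_signal)
    label = model_entity(features)
    return Entity(features, label, apply_semantics(label))

print(sense([0.1, -0.2, 0.4] * 50))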
Imagine a security system in a major port city a dozen years from now. On the docks, video cameras record the movements of people and equipment and ships. Microphones record sounds. Sensing devices sniff for smoke or noxious gases. It would be prohibitively expensive to have people constantly monitoring all of these streams of information, but computers can integrate the information in real time and detect patterns and correlations that indicate, say, that an arsonist has boarded a ship late at night, broken a window, and set a fire. The authorities might be alerted so quickly that they arrive on the scene not only before the fire has spread but also before the arsonist has fled the scene. All of these new tasks will require a tremendous amount of data processing. Millions of bits of digital information are contained even in a single frame. A computer has to extract various features, classify them, and build a semantic understanding to make sense of that single image— and a full video contains thousands of these. That’s a lot of data to process. Yet we constantly size up the world around us as a matter of routine. The human brain is a marvel. A mere 20 watts of energy are required to power the 22 billion neurons in a brain that’s roughly the size of a grapefruit. To field a conventional computer with comparable cognitive capacity would require gigawatts of electricity and a machine the size of a football field. So, clearly, something has to change fundamentally in computing for sensing machines to help us make use of the millions of hours of video, billions of photographs, and countless sensory signals that surround us. 3 75 3
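A toy sketch of that kind of multimodal correlation might look like the following Python fragment. The event names, the ten-minute window, and the alert rule are invented for illustration only.

# A toy sketch of multimodal correlation: if observations from different
# sensor types cluster within a short window, raise an alert.
# The events, window, and rule are invented for illustration.

from datetime import datetime, timedelta

events = [  # (time, sensor, observation)
    (datetime(2025, 6, 1, 2, 14), "video", "person boards ship"),
    (datetime(2025, 6, 1, 2, 16), "audio", "glass breaking"),
    (datetime(2025, 6, 1, 2, 19), "gas",   "smoke detected"),
]

def correlated_alert(events, window=timedelta(minutes=10)):
    """Return an alert if video, audio, and gas observations co-occur."""
    times = {sensor: t for t, sensor, _ in events}
    if {"video", "audio", "gas"} <= set(times):
        if max(times.values()) - min(times.values()) <= window:
            return ("ALERT: correlated activity on dock: "
                    + ", ".join(obs for _, _, obs in events))
    return None

print(correlated_alert(events))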
In order to move forward, we need a new kind of chip, now on the drawing boards, that embodies the principles of cognitive computing in the realm of the senses. The goal is not to reproduce a human brain in electronic circuitry; rather, it is to take inspiration from the way the brain works and use it as a model for designing chips and systems. By combining the principles of neuroscience with techniques from nanotechnology, scientists can design chips and systems that not only analyze massive amounts of complex information from multiple sensory modalities at once, like brains do, but also can dynamically reprogram themselves as they interact with their environments. So how would a brain-based technology like this work? In the simplest terms, the brain operates like this: Specialized cells, the neurons, receive, integrate and send messages, which are electrochemical pulses of varying amplitudes. The messages are passed along axons, the paths between the neurons’ cell bodies, to the synapses, the junctions between neurons. Through this messaging, neurons collaborate to form networks that are capable of sophisticated information processing.3 This network of neurons is the organic circuitry of the brain, and the neurosynaptic chips now under development mimic that design. Conventional chips are limited by the fact that a single transistor can communicate with a maximum of four other transistors at the same time. In neurosynaptic computing, each tiny electronic element speaks to thousands of others. The chip circuitry is networked like the brain. Further, logic and memory are intermixed within a 3 76 3
single chip, breaking the von Neumann bottleneck that we talked about in chapter 1. That allows computing systems based on the cognitive chips to store enormous amounts of information and act on it efficiently. There’s a debate in the scientific community, including within IBM, about whether it will be better to use a digital or analog approach to designing cognitive chips. Digital chips process information in ones and zeros. They compute using mathematics. But some researchers, including IBM’s Mark Ritter, argue that the best approach for designing neurosynaptic chips will be to make them analog, meaning they will transmit information via electric pulses of varying amplitude, similar to the messages in the brain. In that scenario, the elements representing neurons will “remember” what happened to them by storing information about the strength and routing of the pulses they receive and transmit. At the elemental level, these mechanisms will make it possible for the machine to think. “This is how we’ll make machines that truly learn— that understand and digest data like the brain does,” Mark says.4
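For readers who want a concrete picture of “integrate pulses, then fire,” here is a textbook leaky integrate-and-fire neuron in Python. It is a generic simplification often used in neuromorphic tutorials, not the SyNAPSE or TrueNorth circuit design, and the weight, leak, and threshold values are arbitrary.

# A textbook leaky integrate-and-fire neuron, included only to make the
# "integrate pulses, fire, and adapt" idea concrete. This is a generic
# simplification, not IBM's neurosynaptic chip design.

def simulate_lif(input_spikes, weight=0.3, leak=0.9, threshold=1.0):
    """Return the time steps at which the neuron fires."""
    potential, fired_at = 0.0, []
    for t, spike in enumerate(input_spikes):
        potential = potential * leak + weight * spike  # leak, then integrate
        if potential >= threshold:                     # fire and reset
            fired_at.append(t)
            potential = 0.0
    return fired_at

print(simulate_lif([1, 1, 0, 1, 1, 1, 0, 0, 1, 1]))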
Journey of Discovery: A Chip Modeled on the Brain

On January 13, 2011, just days before the first broadcast of Watson on Jeopardy!, a box arrived at IBM Research–Almaden in San Jose, Calif. It had been mailed from the IBM chip fabrication plant in East Fishkill, N.Y. Inside the
box, cushioned by Styrofoam squiggles, were 200 experimental neurosynaptic computing chips modeled on the human brain. Dharmendra Modha, the manager of cognitive computing at the Almaden lab, and John Arthur, one of the research scientists on his team, were thrilled as they cut open the box and lifted out its precious cargo. To be sure, the chips weren’t very smart compared to a human brain. Each had approximately the intelligence of a worm, in that it had a number of artificial neurons comparable with a worm’s actual neurons. But they demonstrated that chips could be made that to an extent re-create the interaction of neurons and synapses in the brain through the combination of advanced mathematical algorithms and silicon circuitry. It was a significant step toward their ultimate goal: to create a cognitive machine that can sense and draw conclusions from the things that happen in the world—not through the abstraction of numbers but more directly, as humans do via our senses. “This is not a science project. It’s a quest,” says Dharmendra. “It involves many people, many years, and lots of unknowns. But if we can engineer a computer to exhibit the versatility and robustness of the mind, it would have huge ramifications for society.”5 Their cognitive chips were the first harvest of the SyNAPSE (Systems of Neuromorphic Adaptive Plastic Scalable Electronics) project, a collaboration of the Defense Advanced Research Projects Agency (DARPA), IBM, and faculty members from several universities, among them Columbia, Cornell, Stanford, and the University of 3 78 3
Wisconsin. It illustrates yet another key element of successful innovation in the cognitive era. Researchers have to keep an open mind. They can’t stubbornly pursue a path because they’re invested in it personally. They have to be open to making drastic changes. Dharmendra grew up in Mumbai, India, and his early education came in a vocational school. But after a tutor took interest in him and gave him his first lessons in trigonometry, he came to believe that he could go far in life if he put his mind to it. He eventually got an undergraduate degree from the prestigious IIT Bombay and a Ph.D. in electrical engineering from UC San Diego, where, among other things, he studied the mathematics of neural networks, one of the foundations of cognitive science. Dharmendra landed a position as a researcher at the Almaden lab, where he specialized in writing algorithms for storage devices. When another company tried to recruit him, Dharmendra instead went to lab director Mark Dean with a radical proposition: he wanted to build a brainlike computer.6 Dharmendra convinced Mark to permit him to organize a conference on cognitive computing in 2006 where leaders in the field—including Nobel laureate Gerald Edelman—presented their latest insights. Dharmendra came out of that event convinced that it was possible to, in a sense, reverse-engineer the brain. At about the same time, Henry Markram and his team at the Swiss Federal Institute of Technology conducted an experiment where they simulated a slice of the brain on a supercomputer—which led a few years later to the European Union’s Human Brain Project. The race was on 3 79 3
to create a virtual brain within a computer. Dharmendra’s big break came in 2008, when DARPA launched SyNAPSE. That program would provide funding for Dharmendra’s project beyond what IBM would invest. The goal was to design electronic circuitry that would be based on the workings of the brain. It was a difficult journey. Realizing that the brain couldn’t be easily mapped to today’s chips, which are designed and manufactured using an approach called complementary metal-oxide-semiconductor (CMOS), the team planned on inventing a new technology that would mimic the functions of neurons and synapses. They intended to use the analog approach, rather than digital, because biology is analog. But, after a time, it became clear to Dharmendra that to develop a new technology rivaling CMOS was an effort that would take too long and cost too much to meet the deadlines and budgets of the DARPA project. On December 16, 2009, he arrived home from work feeling like a defeated man. Tears welled up in his eyes as he told his wife, “SyNAPSE is toast. It’s over.” Fortunately, that was not the case. A few weeks later, Dharmendra was struck with a serious case of the flu that left him feverish and in bed. He couldn’t even muster the energy to answer e-mail. But being stuck at home gave him a lot of time to think over the SyNAPSE dilemma. He concluded that while it was futile in the short term to try to invent a new technology for a cognitive machine, that didn’t mean the project should be abandoned. Instead, the team needed to refocus on CMOS chip technology and on digital circuitry rather than analog circuitry. They 3 80 3
would create an entirely new non–von Neumann architecture in both silicon and software that would simulate the functions of neurons and synapses. Using that architecture, they would produce chips for sense-making tasks that would be vastly more efficient than today’s standard digital processors.7 Dharmendra called a meeting of all of the participants in the project. On January 21, 2010, about twenty-five people crowded into the library at the Almaden lab. Others were on the phone from IBM labs in New York and from the universities. He knew that it would be difficult to convince some team members that his new approach was the way to go. To do so, he would have to get people to abandon emotional attachments to fiercely held beliefs, primarily the thought that the project could only succeed through a new technology and a new kind of analog chip. So he decided to use a brainstorming exercise called “six thinking hats.” The technique requires participants in a contentious discussion to identify, using colors, the essential nature of the points they make in discussions, with six different colors standing for facts, emotions, and so on.8 After Dharmendra laid out his proposal an intense discussion ensued. Some of the scientists argued that they should continue with the long-term exploration even though it would not be possible to meet DARPA’s deadlines. But Dharmendra made a convincing argument that they needed to adopt a strategy that could succeed within the DARPA timeframe. He argued that, in the future, armed with more knowledge, they’d be able to continue the quest for the novel technology. Also, he pointed out that by using 3 81 3
CMOS and digital circuitry, they would be able to test their designs thoroughly by simulating them in software using supercomputers before going to all the work of designing and manufacturing a physical chip. “Learn through simulation. Burn on the chip,” he told them. People gradually came around to this way of thinking. Using the “six thinking hats” technique made it easier to separate gut feelings from rational arguments. The exercise “gave us structure and clarity, and made it easier to solve things,” says Bill Risk, an IBM Research physicist who is one of the project managers for SyNAPSE.9 Out of that meeting came agreement on the way forward. Soon afterward, on February 17, 2010, DARPA enthusiastically endorsed the new direction. Within a year, the team had designed the chips that arrived in the box from East Fishkill. Rather than processing data sequentially, as in classic von Neumann computing, the chips mimic the brain’s event-driven, distributed, and parallel processing. Using those chips, team members were able to write applications, including teaching the system to play Pong, one of the earliest computer games, which demonstrated capabilities such as navigation, machine vision, and pattern recognition.10 In mid-2013, IBM produced a second generation of chips, with the design name TrueNorth. They possess approximately the same number of neurons as a bumblebee. If you put one hundred of the chips together in a computing system, you’ve got something that occupies less than two liters of volume and consumes less energy than a lightbulb. Dharmendra calls it a “brain box.” The
coming generations of brain boxes will be able to combine information from different sensors and draw conclusions from it, much as humans combine what they see and hear to understand what’s going on around them.11 Dharmendra’s team has developed a number of scenarios to help people understand how their technologies might be employed, using anywhere from a single chip to dozens of them. They worked with industrial designers to create physical models of potential devices:

• Assistive Vision, eyeglasses designed to help people with impaired vision. The glasses would be outfitted with video cameras that can identify obstacles, dangers, and objects the wearers might be looking for. The glasses would plot a pathway and tell the wearer how to navigate via embedded speakers or ear buds.

• Tumbleweed, a robotic ball capable of rolling through a disaster scene to find injured people. The ball, perhaps the size of a volleyball, would have numerous video cameras embedded in it, and would concentrate on the images that are most useful. The cognitive system would evaluate the video streams and other sensor outputs to find injured individuals and send messages to the authorities reporting their location via a radio transmitter.

• Conversation Flower, a device that captures both images and sounds during business meetings. The device, with petal-like elements extending from a central core, would sit in the middle of a conference room table and monitor the proceedings. The flower might be designed to give visual cues to the participants in meetings, opening if
the conversation is dynamic and useful and closing if it is boring and repetitive.12 The cognitive chips and Watson are complementary technologies. For the sake of simplicity, you can think of them as the right brain and left brain of the era of cognitive systems. Watson, the left brain, focuses on language and analytical thinking. The cognitive chips address senses and pattern recognition. Over the coming years, IBM scientists hope to meld the Watson and TrueNorth capabilities together to create a holistic computing intelligence.
Scenario: Medical Imaging Assistant

Over the past century, tremendous progress has been made in technologies designed to help spot evidence of disease inside the human body. X-rays, ultrasound, computed tomography, nuclear medicine, and magnetic-resonance imaging make it easier to identify and treat everything from heart disease and breast cancer to complex gastrointestinal maladies. Yet innovation has lagged when it comes to helping physicians interpret the information captured in such images. They must routinely look through hundreds or even thousands of images to find the ones that are relevant and telling. And, in the process, they may miss signs of diseases they’re not explicitly looking for. Imagine a new cognitive technology, a medical-imaging assistant, that can automate the process of examining medical images and also spot evidence of disease that
physicians might miss. The technology doesn’t replace radiologists; it relieves stress and eyestrain and helps make them better at what they do. Today, a radiologist typically reports for work at a hospital or clinic and plows through two dozen cases during a shift. Often, all she knows about each patient is a tentative diagnosis and instructions on what to look for in the images. Radiologists are highly skilled people who are required to do too much manual inspecting. The medical-imaging assistant will help transform the way radiologists do their jobs. When a specialist begins considering a cardiology case, for instance, she will be presented with a digital-image storyboard created by the computer that shows the most significant twenty-five images in a series. Write-ups created by the computer will explain why each frame has been selected for close examination. These will alert the radiologist when evidence of disease is found, whether or not it is the particular disease being sought. The medical imaging assistant will have been trained by examining millions of images along with the diagnoses, treatments, and outcomes associated with each of them. It will have a huge memory packed with encyclopedic medical information, medical guidelines, and rules established by the hospital or medical group. Even more importantly, the system will make all of the health-care information about the particular patient available to the radiologist, including his family history and entire personal medical record. The assistant will offer up annotated summaries of the results of other tests. The radiologist will be able to carry on a spoken dialogue with 3 85 3
the system to delve deeper into the particulars of each case. In this way, the radiologist gets a full picture of the patient and his problems. The automation inherent in this system could make health care much more efficient. A coronary angiogram is typically made up of 25 short video sequences, contains about 5,000 images, and takes a radiologist about 30 minutes to examine. Using the medical-imaging assistant, the time spent looking at images could be reduced to perhaps five minutes. The impact on patient outcomes could be even more significant. A radiologist supplied with all of the contextual information and given the ability to explore the case through a conversation with the cognitive system is likely to be more effective at providing sage advice to treating physicians.
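A minimal sketch of the storyboard idea, plus the time arithmetic from the scenario, might look like this in Python. The random scoring function is a placeholder; a real assistant would rank frames with a model trained on millions of annotated studies.

# Toy sketch of the "storyboard" idea: rank ~5,000 angiogram frames with a
# placeholder scoring function and keep the 25 highest-scoring ones.

import random

random.seed(0)
frames = [{"id": i, "score": random.random()} for i in range(5000)]

storyboard = sorted(frames, key=lambda f: f["score"], reverse=True)[:25]
print([f["id"] for f in storyboard[:5]], "...")

# Rough time saved per study, using the figures cited in the scenario:
minutes_manual, minutes_assisted = 30, 5
print(f"review time cut by {minutes_manual - minutes_assisted} minutes "
      f"({(1 - minutes_assisted / minutes_manual):.0%})")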
5 Designing Data-centric Computers
All most people know about Watson’s hardware is what they saw on TV: the machine’s cool, purple façade, the avatar that stood between the human contestants on Jeopardy! and Watson’s quirky robotic voice. But there’s more to the machine than its TV persona. If you were to take a tour inside the original Watson computer at the IBM Research lab in Yorktown Heights, N.Y., you’d better understand what made its performance on Jeopardy! possible—and why computers must be redesigned for the new era of cognitive computing. The computer takes up a corner of a datacenter on the lab’s second floor. Two parallel rows of refrigerator-sized metal racks are loaded with ninety-two servers. A door at one end turns the space between the racks into a room within a room. At the time of the Jeopardy! contest, the microprocessors were running twice as fast as the next-fastest general-purpose computer on the planet.1 Air-conditioning fans and smaller fans within the servers create a roar that’s nearly deafening. The microprocessors
give off an immense amount of heat. And that’s not a good thing. All the heat and wind signals just how much energy Watson consumes—85,000 watts of electricity when it’s running at full throttle. That’s enough to keep the lights on in a small town. In comparison, the human brain consumes just 20 watts of energy. Unless we can make computers many orders of magnitude more energy efficient, we’re not going to be able to use them extensively as our intelligent assistants. Computing intelligence will be too costly to be practical. Scientists at IBM Research believe that to make computing sustainable in the era of big data, we will need a different kind of machine—the data-centric computer. Today’s computers are processor-centric. The microprocessor, which is the central processing unit in the von Neumann architecture, is where much of the action happens in computing. Working hand in hand with the operating system, the microprocessor sends out instructions to various components within the computer, requesting data from where it’s stored, including memory chips and disk drives. If the computer is part of a larger network, the processor fetches data from storage systems located out on the network. A new system design is needed to greatly reduce the amount of movement required. The new design puts the data at the center, rather than the microprocessor. This design will accomplish several beneficial things at once, according to Bijan Davari, an IBM fellow who operates at the cutting edge of system design. Machines will perform computations faster, make sense of large amounts of data, and be more energy efficient. 3 88 3
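Here is a schematic comparison in Python of the two philosophies: ship all the data to the processor, or filter next to the data and ship only what matters. The record size and the 10 percent selectivity are assumptions chosen purely to show why the second approach moves so much less data.

# Schematic comparison of processor-centric vs. data-centric processing.
# The byte counts and selectivity are invented purely for illustration.

RECORD_SIZE = 1_000          # bytes per record (assumed)
records = [{"id": i, "relevant": i % 10 == 0} for i in range(100_000)]

# Processor-centric: move everything to the CPU, then filter there.
bytes_moved_cpu_centric = len(records) * RECORD_SIZE

# Data-centric: filter next to storage, move only the ~10% that matters.
selected = [r for r in records if r["relevant"]]
bytes_moved_data_centric = len(selected) * RECORD_SIZE

print(bytes_moved_cpu_centric, "vs", bytes_moved_data_centric,
      f"({bytes_moved_cpu_centric // bytes_moved_data_centric}x less movement)")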
The race to make ever more capable computers began in earnest in the 1970s, when engineers began packing more and more transistors on a single chip. In the computer industry, the term “scaling” refers to ability to add more resources, for example, more electronic circuitry and storage capacity. The process of packing more transistors on a chip is called “scaling down.” In addition, over the years, as businesses and governments needed ever more data-processing power, computer companies built server computers that were designed specifically to handle large and complex tasks. They pursued two strategies. One was to combine more processing power and other resources within a single machine—called “scaling up.” The IBM mainframe is an example of this approach. Another is “scaling out,” which is stringing many servers together so they behave like a single large system. Examples of this are today’s supercomputers and Google’s vast server farms.2 IBM scientists believe an essential element of datacentric computers will be a new kind of scaling: “scaling in.” Engineers will integrate memory with logic and messaging in dense, three-dimensional packages. Today’s memory chips are single layers of silicon with large amounts of data-storing circuitry packed into them. They’re used to store data temporarily and serve them up to the microprocessor when requested. The chips take up space in a computer and require a lot of data movement. But what if you were to stack memory chips like so many pancakes? They’d take up much less space, and much less data movement would be required. That’s exactly what 3 89 3
IBM and other computer industry leaders are doing with a concept called the hybrid memory cube. This will be an amazing little device. In a single cube, you will be able to stack many layers of memory chips. The layers are cemented together and connected electronically by tiny holes that extend vertically through the stack and are filled with copper. A layer of logic circuitry on the bottom of the cube exposes the memory resources of the entire cube for use by the microprocessor. The design will require 90 percent less space than today’s memory chips and 70 percent less energy.3 In time, memory cubes will likely have their own processors attached to them. Logic and memory will be one, greatly reducing the von Neumann bottleneck. This technology will have a profound effect on the way computing is done. Giant supercomputers, used for everything from oil exploration to crash tests in the auto industry, will shrink in size and will use much less energy. The cubes will be used in next-generation server farms to greatly reduce the amount of space and electricity required. Further down the road, the technology will make it possible to pack an immense amount of computing resources into smartphones, tablets, and other portable devices.4 A second key element of a data-centric computer will be the way processing is done within the system. Today, a computer is equipped with a microprocessor, or CPU, that performs all or most of the computation. All of the data that the CPU acts upon must be transported from where they are stored to the microprocessor then back again. In some cases, data are transported hundreds or even thousands of miles. A data-centric computer would distribute 3 90 3
processor chips throughout the system to greatly reduce the movement of data. This concept is being tried out today on specialized computing systems that are used to analyze huge volumes of data. Murali Ramanathan, a neuroscientist at the State University of New York at Buffalo, is in charge of one such project. His research focuses on spelling out the role of genetic and environmental factors in multiple sclerosis. MS is a cruel disease that most often strikes young adults. The body’s own immune system attacks the brain and spinal cord, causing physical disabilities and cognitive problems. The cause isn’t known, and there is no cure. Murali’s goal is to find a cure for the disease—or discover a way to prevent it in the first place. What makes his analysis so challenging is the fact that there are so many potential combinations of genes that could contribute to MS and, when genetic information is combined with environmental factors such as diet and smoking, the number of possible causes increases exponentially. This is a data-intensive problem. Problems like these can’t be addressed effectively with conventional supercomputing, where massive parallelism is the key concept. A machine breaks the problem into small chunks and farms them out to thousands of microprocessors to address in parallel, then combines the results of those calculations to derive an answer. To deal with tasks like Murali’s, it makes sense to meld parallelism with the strategy of moving some of the processing closer to the data.5 Murali and his colleagues acquired a new machine specifically designed for data-intensive analysis. The 3 91 3
refrigerator-size machine uses specialized processor chips, called field-programmable gate arrays, to filter the data on storage disks before passing along only the relevant pieces of information to the main processors. The chips can be custom programmed after they’re in the machine to handle certain kinds of queries, and, as a result, they reject as much as 90 percent of the data. This implies that the system performs some of the analysis as the data are moving off the disks, rather than handling all of it on the main processor or, in some cases, shipping it off to specialized analytics computers. As a result, the data do not travel as far, and less energy is required to move bits around. The work can be done faster and more efficiently.6 To get an idea of how effective this kind of computing can be, consider the results that Murali and his colleagues are achieving. Shortly after they installed the new computer, they ran a problem that contained 100,000 genetic and environmental variables. That means there were nearly 5 billion two-predictor combinations and 100 trillion three-predictor combinations to be evaluated. Yet it took just eleven minutes to do the analysis—compared to twenty-seven hours on their conventional computer. Murali was sitting at his desk on the SUNY campus when he received an e-mail containing the summary results. “I was very, very excited. This can open the door for us,” he says. “We’re on the cusp of being able to solve problems that were previously unsolvable.” No matter where the data are crunched in a computing system, it will be increasingly essential to move data around more quickly and efficiently. This is the third key 3 92 3
dimension of designing data-centric computers. The most promising means of moving data faster is by harnessing photonics, the generation, transmission, and processing of light waves. Already, telephone companies and broadband communications providers use pulses of light traveling via fiber-optic cables to speed the transmission of data, voice communications, and multimedia content such as movies and TV shows over long distances. In addition, fiber-optic technologies move data within buildings and between racks of machines in computer datacenters. Next, photonics will be used to connect servers in a single rack and chips on a circuit board. Ultimately, the technology will be used within the chips themselves. In a decade-long initiative, a team of scientists at IBM Research has achieved breakthroughs in nanophotonics that could greatly improve the performance of all kinds of computers. Their newest technology, called CMOS integrated nanophotonics, combines traditional electrical components with optical components on a single silicon chip. In today’s advanced computers, optical elements are placed on separate chips, which impedes the performance of the computers and adds to costs. By combining optical and electrical elements on a single chip, the researchers hope to make it possible for manufacturers to produce large quantities of such devices through production processes that are already in use. Their work has now advanced beyond the pure science stage into product development, with a goal of reaching the market within the next few years. This melding of light and electricity will help computers move data around with a speed 3 93 3
and energy efficiency that is impossible using today’s technologies.
Scenario: Adding Intelligence to the Electric Grid

Today’s “smart grids” aren’t actually very bright. Most electric utility systems that claim the “smart” label use advanced metering systems that let customers monitor their electricity use and enable two-way communications between the meter and the utility provider. The systems make it possible for utility companies to set pricing that varies by the season and the time of day, providing discounts to customers who shift energy use to off-peak periods. Useful, yet there’s so much more that can be done to make utility grids smarter. One of the most important elements will be distributing intelligence throughout the grid by moving a lot of analysis to where the data are generated, that is, moving the data processing to the data. A utility grid is a complex system that connects sources of energy generation through a distribution network with users of energy, both commercial and residential. Grids are becoming more complex as more sources of energy are added to the mix, including wind and solar; as more consumers produce their own energy; and as new uses of electricity spread, such as the charging of electric vehicles. Sophisticated information-technology networks need to be built to monitor and manage the grids, better balancing supply and demand, coordinating transactions among
the diverse parties involved, and optimizing the grid so it’s both efficient and resilient. Potentially, electrical grids could be managed by large, integrated computing systems—cognitive systems—that stretch from the home appliances to wind farms off the coast. Today’s grids are equipped with many sensors that measure not just generation, transmission, and consumption but more detailed information such as problems with particular transformers or power lines. Data collected from equipment scattered throughout the system are transmitted to central computers to be recorded and analyzed. But, in the future, with more data pouring in from sophisticated sensors across the entire network, this approach will require more energy, bandwidth, and storage than is practical. Some of the analysis will have to move to the data. In a decade or so, intelligent devices capable of data analysis will be embedded in every piece of equipment and appliance connected to the grid. These devices will collect detailed information on energy consumption and help us manage our energy use. Intelligent software agents installed in the devices will control how appliances function, based initially on preset parameters. For example, in a family home, an air-conditioning system might be set up to reduce the temperature beginning one hour before the family members arrive home at the end of the day. But the system might learn from motion sensors that one resident is coming back earlier and working in the home office. It may begin to cool earlier—but just in the part of the house that person frequents. The intelligent 3 95 3
agent could also poll family members, perhaps via their smartphones, to see how satisfied they are with the heating and cooling setup, and, based on their reactions, make modifications. In another scenario, several families in a neighborhood might switch to electric vehicles. Standard practice would be to charge the vehicles’ batteries at night, but if multiple households charge their vehicles at the same time it will overtax the neighborhood transformer. To avoid power outages, intelligent devices in the transformer and homes will detect sudden increases in demand for vehicle charging and will coordinate with one another to schedule charging sessions for the various EVs at different times during the night.7 Putting intelligence in the network has the potential to transform the way energy is generated and used. This will make it easier for new sources of energy to be integrated into the system. It will provide strong incentives for conservation, hasten repairs after power outages, and make it possible for societies to expand electricity services without needing to make massive investments in new generation plants. When all of this comes about, smart grids will begin to truly deserve the name.
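As a toy illustration of that kind of coordination, the following Python sketch spreads charging sessions across overnight slots under an assumed transformer limit. The capacity and charger figures are invented; real grid agents would negotiate these dynamically.

# Toy scheduler for the EV-charging example: spread sessions across
# overnight hours so the (assumed) transformer capacity is never exceeded.

TRANSFORMER_CAPACITY_KW = 20
CHARGER_DRAW_KW = 7
SLOTS = ["22:00", "23:00", "00:00", "01:00", "02:00", "03:00"]

def schedule(vehicles):
    load = {slot: 0 for slot in SLOTS}
    plan = {}
    for ev in vehicles:
        # Greedy heuristic: pick the least-loaded slot that still has headroom.
        slot = min(SLOTS, key=lambda s: load[s])
        if load[slot] + CHARGER_DRAW_KW > TRANSFORMER_CAPACITY_KW:
            raise RuntimeError("no capacity left tonight")
        load[slot] += CHARGER_DRAW_KW
        plan[ev] = slot
    return plan, load

plan, load = schedule(["ev-1", "ev-2", "ev-3", "ev-4", "ev-5"])
print(plan)
print(load)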
Journey of Discovery: Rethinking How Computers Are Designed

How much does it cost to move a single bit of data from point of origin to point of computation? Apparently, until
recently, nobody bothered to try to answer that question— perhaps because it didn’t seem consequential. But now, in the era of big data, computer scientists see that when you move a massive amount of data over long distances—or even over short distances many times—the costs quickly add up. So that’s why David Turek, IBM’s vice president of exascale computing, asked the question of a roomful of IBM computing systems experts in 2011. “Fortunately, they were willing to listen to a stupid question,” he says. “In the data world, it’s becoming a big issue, and we have to come up with a way of quantifying it.”8 That simple question helped focus an initiative at IBM that’s aimed at fundamentally re-architecting computing systems. It’s the most radical rethinking of how computers should be designed since IBM’s introduction of its System 360 family of computers in the early 1960s. The project is an example of the new culture of innovation, demonstrating a willingness to abandon old ways of seeing things and start clean. The initiative is called Data Centric Deep Computing, or DC2. IBM fellow Jim Kahle is responsible for piloting DC2— and for answering Dave Turek’s question. He has played leading roles in some of IBM’s most significant advances in chip design over the past three decades. They include the Cell processor, a chip that was employed in the Sony PlayStation game console, the IBM Roadrunner supercomputer and advanced digital TV sets. Yet the Cell chip did not become as widely used throughout computing as Jim had hoped. With DC2, he dreams of creating a revolutionary design that, ultimately, permeates the entire 3 97 3
computing world, starting with high-performance computing and filtering out to everything else.9 Jim was born in Venezuela, where his father worked in the oil business, and the family later moved to Indonesia before Jim came back to the United States to study computer science at Rice University. Jim credits his childhood experience with diverse cultures with preparing him to run large and complex engineering projects where close collaboration is essential. (At its peak, the Cell development team was 450 people strong and included engineers not just from IBM but from Sony and Toshiba.) Jim reached out to people with a wide range of expertise to make sure new ideas came into the discussion. He’s doing the same thing now with DC2. In the early months of the DC2 project, Jim convened large gatherings of experts on silicon semiconductors, photonics, memory technologies, computer architecture, software, mathematics, and analytics. They began by describing to one another the new uses for computing that they see coming in the next decade and then explained how they’re preparing for them. Based on that input, Jim is now mapping a new computer systems architecture for the cognitive era and a process for achieving it. “We have to figure out how to reorganize ourselves, which is not easy culturally. We have to steer the boat in another direction,” Jim says. He and his crew haven’t yet finished the discovery phase, but, along the way, they have developed some firm ideas about the principles that will guide them to the end of the journey. The cornerstone of their thinking is the belief that there must be a shift from processor-centric computing 3 98 3
to data-centric computing. Jim remembers the day in late 2011 when the idea of data-centric computing first came to light. He and about a dozen scientists and engineers had gathered in conference room 36-002 of the IBM Research lab in Yorktown Heights—the DC2 war room. They were stumped. They had a lot of great technology ideas but no unifying principle. Then team member Ravi Nair, arriving late to the meeting, rushed into the room and showed his colleagues a simple drawing he had made. On the left of the page was a graphic depiction of traditional computing showing how data move across all the levels of the datastorage hierarchy, from disk and tape storage to various types of memory and finally to the processor. On the right, he showed how processing could take place at each level of the hierarchy and said energy use would decrease as a result. Jim suddenly saw the light. The team would turn computing inside out. The brains in a computer wouldn’t be in one place anymore. They’d be all over the place. The processing would move to the data. They had their big insight, but how would it play out in the detailed design of the computer? To get to that, Jim had to answer the more basic question Dave Turek had asked him a few weeks earlier: How much does it cost to move a bit of data? Jim went off and did his homework and came up with some useful comparisons. For instance, it costs anywhere from 800 to 6,000 picocents (one picocent is a billionth of a penny) to transfer a single bit through a network, depending on how far the bit travels. In contrast, it costs roughly from 6 to 27 picocents to move data within a single computer.10 So it’s 200 to 1,000 times more costly 3 99 3
to move data a long distance than it is to perform the processing next to the place the data are stored. The fact that the costs of data transfer are so variable in different situations means that Jim and his team had to carefully evaluate each kind of transfer with a cost-benefit analysis. That led them to some interesting conclusions. They decided that specialized processors attached to components of the computing system would be assigned to perform particular tasks. They’d do a slice of the processing—the part that is optimally done next to the data in memory, on disks, or on tape drives or at the point where data flow into the system. In other cases, depending on the task at hand, most of the data may be moved to the main processors. Now the specialists on Jim’s team are going off to engineer new components, and the software architects have to design the enabling software. All this has to work together. Designing a data-centric computer will take several years. Ultimately, Jim and his team hope to produce computing systems that recognize the tasks that are being assigned to them and automatically parse out the various processing tasks to different chips similar to the way a music composer assigns notes and percussive sounds to different instruments in an orchestra. It will take a tremendous effort to redesign computers to take on big data, but the potential payoff is great. It’s possible that a decade or so from now the power of today’s fastest supercomputer will fit in a box on your desk and tomorrow’s fastest computers will be performing tasks we can’t even imagine today. 3 100 3
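A quick back-of-the-envelope check in Python shows how the picocent figures above translate into the 200- to 1,000-fold gap the team cites, taking the long-haul network figure against the in-system range. The choice of endpoints is an assumption made for illustration.

# Back-of-the-envelope check of the picocent figures above: the long-haul
# network cost (~6,000 picocents per bit) against the in-system range of
# 6 to 27 picocents per bit.

network_cost = 6_000           # picocents per bit over a long-distance network
local_low, local_high = 6, 27  # picocents per bit inside a single computer

print(f"{network_cost / local_high:,.0f}x to {network_cost / local_low:,.0f}x "
      "more expensive to move the bit than to compute next to it")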
6 Inventing a New Physics of Computing
More than any other single technology, the microchip has driven the tremendous improvements in computing performance over the past sixty-plus years in everything from personal computing to consumer electronics to space exploration. Every two years, scientists develop techniques for putting twice as many transistors on a tiny chip of silicon, doubling the chip’s capacity for storing or processing information. The microchip is arguably the most essential technology of the information age. Unfortunately, this rocket ship’s ascent is slowing, and it’s in danger of falling back toward earth. With each passing year, the expected performance gains become harder to achieve. And now scientists foresee a time, approximately a decade from now, when progress will slow to the point that it will begin to affect not only the computer industry but the whole of society. The microchip combines two of the crucial scientific advances of the mid-twentieth century—the transistor and the integrated circuit. The transistor, invented in 1947
by John Bardeen, Walter Brattain, and William Shockley at AT&T’s Bell Labs, provides the switching capability that’s required for computation and data storage. The integrated circuit, invented separately by Jack Kilby at Texas Instruments and Robert Noyce at Fairchild Semiconductor, packs numerous transistors on a substrate of semiconducting material, enabling massive-scale production of a wide variety of tiny electronic devices. Over the years, silicon emerged as the semiconducting material of choice. In the 1960s, the microchip pioneer Gordon Moore set the tempo of an entire industry when he observed that the number of transistors on integrated circuits had been doubling roughly every year and predicted that the pace of progress would continue. Ultimately, he forecast that the density of transistors on ICs would double every two years—a statement that has been codified as Moore’s Law.1 Robert Dennard of IBM Research, who is best known for inventing the DRAM memory chip in 1967, also led the team that pioneered the concept of integrated circuit scaling. His rules provided a recipe for miniaturization that engineers have followed ever since. Today, after much innovation by a host of scientists, billions of transistors can be placed on a single chip. The work that Bob launched enabled Moore’s Law to hold for decades. But there’s a problem: physics. One of the miracles of the miniaturization of silicon-based semiconductors has been that transistors actually perform better the smaller you make them. But because of the laws of physics, we have reached a point where transistor performance will 3 102 3
increase only marginally or not at all with each new wave of miniaturization. Transistors have ceased to perform as perfect switches. Electrical charges “leak” from one part of the transistor to another when they’re extremely small and the transistor does not perform well, which leads to a significant increase in the power consumption of current chips and might lead to more severe malfunctions in future chips. The densely packed chips can heat to near the point of failure—not a pleasant experience for the computer user. Chip designers and manufacturers have resorted to a series of refinements and work-arounds to extend the life of Moore’s Law and improve the performance of chips. In the past few years, designers have changed some of the materials they use to build transistors and have limited the clock rate—the speed at which electrical pulses race through chip circuitry. In order to make up for the loss of speed, they have divided elements of the processors into separate cores, where data processing is performed in parallel. Now, to counter the effects of the limits of physics, additional layers of silicon and metal are being added to create three-dimensional chips. But as important as these advances are for enabling progress in microelectronics, they don’t address the core problem: atoms don’t “scale.” No one can make traditional silicon devices whose essential elements approach the size of a few atoms. Also, in devices that are just a dozen or so atoms across, unpredictable things start to happen. They cross the border between classical physics and quantum mechanics. “We’re very near the end of classical scaling. 3 103 3
It’s a real discontinuity with huge implications,” says Bob Dennard.2 What’s required is a fundamental rethinking of microelectronics. The goal is to produce computer chips that are much more densely packed than today’s and offer much more functionality, without being so power hungry. Toward that end, scientists are exploring a variety of approaches, ranging from developing new semiconductor materials that would replace the silicon transistor to designing new types of devices using the principles of quantum physics or new circuits and computing strategies that, for instance, take as their models the neural networks of the human brain. At IBM, we have a name for this endeavor: the search for a new physics of computing. This journey will take computing from today’s top microprocessors, with billions of transistors, to the nanosystems of the future, which will consume 100 to 1,000 times less power and be more efficient. Observers have predicted that someday we’ll have chips with trillions of transistors, but IBM’s top scientists believe that if we succeed with the most ambitious new approaches, it might actually require fewer switches to perform more work. When scientists achieve these breakthroughs, it will be as important a development as the invention of the transistor itself.3 This search for a new physics of computing is vital to the progress of electronics. We need to do more computing in smaller spaces using less electricity. But the effort is also necessary because of the new aspirations we have for
computing. To make the most of the flood of big data, to enable massive interconnectivity between systems and people, and to produce the next advances in learning systems, we need vastly more capable computers. If we are to take on complex problems that involve the interplay of multiple social, technical, natural, and economic systems, we need computers that “think” differently at the nano level. These efforts are all part of the quest to invent the switch that will replace the transistor. These are early days. Nobody knows for sure what the replacement will be. One approach that seems especially promising to researchers at IBM is a new semiconductor device category called piezo tronics. The idea is to manipulate sound waves in a tiny piece of material that will serve as a switching element— the piezoresistive element, perhaps twenty nanometers thick. Applying voltages to those tiny elements changes their thickness. The change in resistance constitutes a change in state—the switch from zero to one that is the basis for computation. These devices, theoretically, could be packed densely on a chip and would require much less energy than today’s transistors—about ten times less power.4 Taking a longer view, there’s great potential in applying the principles of quantum mechanics to create a new kind of computational device. Quantum computing has been a Holy Grail for researchers ever since Nobel Prize–winning physicist Richard Feynman challenged the scientific community in 1981 to build computers based on quantum mechanics. A classical von Neumann–type computer makes use of bits, where each bit represents either a one 3 105 3
or a zero. In contrast, a quantum bit, or qubit, can represent a one, a zero, or both at once. Therefore, two qubits can be in the states 00, 01, 10, and 11 at the same time. For each added qubit, the total number of potential states doubles. Hence, the use of qubits in certain kinds of computation could enable us to process exponentially larger quantities of data than is possible with the same number of conventional bits.5 For nearly two decades after Feynman issued his famous challenge, the field of quantum computing remained primarily theoretical. But in the 1990s and early 2000s, breakthroughs in algorithms and experimental techniques gave researchers hopes that what had seemed like one brilliant physicist’s pipe dream could actually become reality. And now new achievements are being reported every few months. “The work we’re doing shows it’s no longer just a brute-force physics experiment. It’s time to start creating computing systems based on this science,” says Matthias Steffen, the manager of an experimental quantum computing project at IBM Research.6 Quantum computing has huge implications for the field of data encryption because quantum computers can, theoretically, factor large numbers like those used for making sensitive data indecipherable to prying eyes. So, the first application for quantum computing could be code breaking—just as it was for the first programmable computers in the 1940s. It would take the fastest contemporary supercomputers billions of years to factor numbers hundreds of digits long. But, theoretically, a quantum computer could perform these tasks in hours or even minutes. 3 106 3
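A tiny Python illustration of why the state space grows so fast: n qubits span 2^n basis states, and a quantum register can hold a superposition over all of them at once, whereas a classical register holds just one of them at a time.

# n qubits span 2**n basis states (00, 01, 10, 11 for n = 2, and so on).

from itertools import product

def basis_states(n):
    return ["".join(bits) for bits in product("01", repeat=n)]

for n in (1, 2, 3, 10):
    states = basis_states(n)
    preview = ", ".join(states[:4]) + ("..." if len(states) > 4 else "")
    print(f"{n:>2} qubits -> {len(states):>5} basis states ({preview})")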
While quantum machines seem unlikely to replace general-purpose computers any time soon, these techniques could affect other domains of computing in ways not yet foreseeable. An especially intriguing target is helping people to understand the interactions of complex systems of systems that underlie everything from the human body to cities to the global financial industry. One thing seems certain: only through fundamental breakthroughs in physics will we be able to deal with so much complexity and uncertainty on a planetary scale.
Scenario: Designing Products from the Molecule Up

In 1935, scientists at the E. I. DuPont Co. invented a synthetic fabric, later named nylon, that became the first commercially successful synthetic polymer. Nylon was first used as a replacement for silk in women’s stockings and parachutes but later came to be employed much more broadly in everything from clothing to gears in machines. The invention of nylon was a long and painstaking process that stretched for five years as a team of DuPont scientists experimented with one combination of molecules after another until they discovered a combo that had all the characteristics they desired in terms of usefulness, manufacturability, and affordability.7 It was a triumph of ingenuity. But what if scientists and engineers could use computers to help them develop new materials—indeed, whole
products—from the molecule up? That way, they could much more quickly identify promising materials, mechanisms, and designs, potentially saving a tremendous amount of time and effort. If the computation, search, and learning capabilities of IBM’s Watson could be applied not to searching for answers that are already known, as Watson does, but to a search for answers that are not yet known, the result would be a discovery machine. It’s possible that discovery machines like this will be commonplace in the new era of computing. Already, pharmaceutical companies use supercomputers to identify promising molecules in the drug-discovery process. And the U.S. Department of Energy’s national laboratories use massively powerful racks of computers to simulate the behavior of fissionable materials in the design of nuclear weapons. But to be able to design products from the molecules up, computers will have to become much more capable and computation will need to become much more affordable than they are today. In fact, these tasks will require several orders-of-magnitude improvements in processing power and energy efficiency. The machines will have to be 1,000 times more powerful than today’s fastest supercomputers. If we were to try to achieve that level of performance using the chip designs and materials we have today, a computer would require about 100 megawatts of electricity to run, which would result in an annual electricity bill topping $100 million.8 So the improvements we need can only be achieved through fundamental changes at computing’s nanotechnology level. 3 108 3
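A rough check of that electricity bill, assuming an industrial rate of about 12 cents per kilowatt-hour (an assumed figure; the book does not state the rate behind its estimate):

# Rough check of the "$100 million a year" figure under an assumed
# electricity price of $0.12 per kilowatt-hour.

power_mw = 100
hours_per_year = 24 * 365
price_per_kwh = 0.12

annual_cost = power_mw * 1_000 * hours_per_year * price_per_kwh
print(f"${annual_cost:,.0f} per year")   # roughly $105 million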
But brute-force computation is just part of what will be needed to design products from the molecule up. Discovery machines will also require a combination of sophisticated simulation, machine learning, and analytics software to get the job done. The machines will be capable of ingesting all the scholarly articles, patent documents, and basic scientific knowledge pertaining to a specific area of inquiry. They'll learn to be expert chemists, materials scientists, and physicists. Based on what has been tried before and other knowledge they acquire, they'll simulate millions of potential combinations of molecules and make predictions about which combinations will achieve the desired results—then make recommendations to the real chemists, materials scientists, and physicists. Consider a scenario from the energy industry. The photovoltaic cells in solar-energy panels are made from electronic circuitry combined with crystalline silicon. These cells are too expensive to enable the kind of mass deployment that will be necessary if solar energy is to play a major role in moving the world away from fossil fuels. But what if you could develop solar cells out of much cheaper organic materials? A discovery machine could help by coming up with a way to capture energy from light by using an organic dye like chlorophyll, which is the molecule that allows plants to absorb energy from light during photosynthesis. Because every place on earth has different solar characteristics, the machine would have to come up with recipes for thousands of organic dyes that could capture the maximum amount of energy in each location. It's mass invention at the nano-scale.
Jed Pitera, an IBM researcher who specializes in using computer simulation to address fundamental questions about biology and chemistry, has an interesting take on the potential for discovery machines. Because they aren't humans, he says, they won't be limited by the prejudices that human engineers and scientists are burdened with when the process of discovery takes them into the realm of nanotechnology. "At the nano-scale, a lot of what people think of as normal or intuitive tends to be wrong or incomplete," Jed says. "If we can build a cognitive computing system that starts with a blank slate, it could develop a better understanding of how things work in that world."9 So far, we've been talking about chemistry and physics at the molecular level, but there's another dimension of computer-aided discovery: invention at the macro-scale. Eventually, using a combination of high-performance computation and data analytics, we will be able to simulate the operations of entire physical systems, such as a jet engine or an entire aircraft. Products like these are made up of countless materials and involve multiple simultaneous physical processes. But the capability to simulate the behavior of these systems while they're still on the drawing boards could cut years off the time it takes to develop a new aircraft, nuclear power plant, or submarine.10 When companies can design entire products from the molecules up, it will revolutionize innovation and production, remaking entire industries and roiling economies. But in order to be able to design products this way, scientists will first have to design new computing components from the atoms up.
Journey of Discovery: The New Physics of Computing
On March 2, 2011, Andreas Heinrich and his team of scientists at IBM Research–Almaden accomplished a major breakthrough in nanotechnology. They produced the smallest device that can be used to reliably store a single bit of magnetic information—albeit at low temperature. (A bit is either a one or a zero.) It comprised just twelve atoms.11 Today's disk drives use about one million atoms to store a bit. The team's breakthrough has the potential to make a big difference in the realm of data storage, making it possible, for instance, to store an entire movie and music collection on a charm-sized pendant. In addition, the techniques could be used by scientists who are exploring quantum computing. Andreas was thrilled by the discovery. Sitting in his team's tiny laboratory in a building perched on top of California's Santa Cruz Mountain range, he pressed the button on his computer mouse to reverse the magnetic states of the twelve atoms and stared at a small computer screen where the atoms showed up graphically as small bumps on a flat surface. When the structure received an electrical pulse from the tiny probe tip of the microscope, some atoms grew visibly taller while others shrank, based on their antiferromagnetic arrangement. Andreas switched the magnetic state of the atoms back and forth for nearly four hours. "It was basically, 'Wow, this works.' Do it again. 'Wow, this works,' " he recalls.12 Such are the joys of a scientist working at the edge of the known physical universe.
Andreas and his work illustrate another key element of what we call the new culture of innovation: the willingness to invest a tremendous amount of time and energy in a project, to go deep on the science and see what it teaches us. This approach to research produced the transistor and many other advances that came from corporate laboratories in the twentieth century. But corporate labs have withered as one company after another cut funding for long-term, high-risk projects. At IBM, we believe that it's essential that corporations, working with their partners in government and universities, restore their commitment to fundamental scientific research—to boldness and persistence. IBM has operated at the front edge of nanotechnology since the field's earliest days. The first breakthrough came at IBM Research–Zurich in 1978 after the lab hired the recent Ph.D. graduate Gerd Binnig and teamed him with the senior researcher Heinrich Rohrer. Within a few months they got the idea for the scanning tunneling microscope, which can detect individual atoms by scanning over a surface with a tiny electronic probe. The instrument takes advantage of a phenomenon called tunneling, which causes electrons to jump from the atoms on the surface to the apex of a sharp tip placed in very close proximity. By measuring the amount of tunneling current that occurred, the scientists could plot a picture of the surface and the atoms positioned on top of it.13 For the first time, scientists could "see" landscapes composed of individual atoms. This was the birth of nanotechnology. For their efforts, Gerd and Heinrich won the Nobel Prize for Physics in 1986.
Another of Gerd’s inventions, the atomic force microscope, allowed scientists to observe atoms within insulator (nonconducting) materials and also, in a sense, to see inside molecules. IBM researcher Gerhard Meyer, also at the Zurich lab, is the senior member of a team that uses the AFM to directly observe the chemical structure of a molecule by imaging the chemical bonds and the electrical charge distribution within the molecule. With this capability, scientists are able to study chemistry on a single-molecule level, holding great promise for atomicscale computing and energy storage. Moving atoms came in 1989. A team of scientists at IBM Research–Almaden headed by Don Eigler made improvements to the STM, and Don became the first person to manipulate an individual atom. A few weeks later, in a marathon twenty-two-hour session in the lab, he arranged thirty-five xenon atoms to spell out “IBM.” While the moving of a single atom was remarkable in itself, Don later commented that there was an even more important result: “The biggest change was it changed people’s perspective. We showed that there was a world out there made of small things, which we had never had access to, but we now have direct access to. It changes the way we think about things.”14 That world of small things now looms large on the human landscape. Advances like this don’t usually come quickly or predictably. Andreas had been experimenting with the STM for ten years when the team achieved the big magnetic-storage breakthrough. His most important attribute, he says, is persistence. “Nobody tells us what to 3 113 3
pursue. We can explore what we want. The atoms tell us what they want to do."15 Andreas's persistence originally won him his job at IBM. While studying for his Ph.D. in physics at Germany's Göttingen University, he was so determined to land a job on Don Eigler's team at the Almaden lab that he kept trying even after Don turned him down twice. Each time, he proposed a new experiment using the STM. He finally won Don over with a proposal to study the spin effects of atoms on surfaces. Spin is a quantum-mechanical property of particles such as electrons, a phenomenon that could prove useful in next-generation computing. Instead of switching the states of electronic components by applying electrical pulses, you might be able to do it by changing the spin, which would take much less energy. In order to do these experiments, the team had to build another STM with new capabilities—a project that took nearly two years. Over time, Andreas's research began to focus on the smallest number of atoms that could retain the stable magnetic characteristics of classical physics. The creation of the smallest device that could store a bit of magnetized data was a byproduct of the team's study, not the goal. That's what Andreas means when he says the atoms talk to him. The newest member of the STM team is Susanne Baumann, a twenty-five-year-old Swiss Ph.D. candidate. She has the same intensity and persistence as Andreas. Susanne recalls the day in second grade that she first
became obsessed with science. She was reading a book that contained a description of atoms and a photograph of Don Eigler’s tiny “IBM” logo. Susanne remembers running in shock to her mother and demanding to know: “Atoms are in the air and in me and in metal. How can this be?” Years later, when she was studying for her master’s degree, she began pestering Don and, later, Andreas, about giving her a job in the lab. (After Don retired from IBM, Andreas succeeded him as head of the STM team.) “I even offered to find my own money for the stay,” she says. Finally, Andreas relented—based on the quality of Susanne’s proposal: to experiment with ways to make it easier for the STM to differentiate the atoms on different parts of the surface. This research, she hopes, will help the group make progress toward quantum computing. That’s the team’s new quest. Now that they have discovered that as few as twelve atoms can combine and retain the effects of classical physics, they’re moving in the opposite direction and asking how many atoms can be combined while still achieving the behavior of quantum mechanics. And they want to know how to make the quantum effects more pronounced and how to control them. The goal is to control a single quantum bit on a surface within three years. Once they can do that, they believe they’ll be able to assemble devices from qubits that can potentially be used for computing. Andreas and his team are in the early stages of their exploration. It could take them ten years or more to produce a working prototype for quantum computing. But
since they know how to build structures from the atom up, they have some advantages over scientists pursuing other approaches. If they get this right, they may be able to develop practical techniques that could be used for manufacturing massive quantities of quantum-mechanical devices.16 Those ultra-tiny devices could become the equivalent of today's microchips for the next era of computing.
7 Imagining the Cognitive City
After a fire broke out in a building visible from Yankee Stadium during the 1977 World Series, broadcaster Howard Cosell captured the zeitgeist of that moment in New York City's history with the on-air comment, "Ladies and gentlemen, the Bronx is burning."1 New York had fallen into a decline from which many people worried it would never recover. The city was near bankruptcy. Corporations had fled to the suburbs. Crime was rampant. A couple of years earlier, a sanitation workers' strike had left fetid mounds of garbage bags piled on the streets. It was a city where nothing seemed to work anymore. Indeed, New York had become emblematic of the fate of cities across the globe. In developed economies, people fled to the suburbs to escape crime, crowding, crumbling infrastructure, pollution, and failing schools. Today, in major world economies, young people, professionals, and empty nesters are migrating back into cities in search of economic opportunities, cultural enrichment, and fun. Meanwhile, in the emerging world, cities are
magnets for people from the countryside who dream of brighter futures for themselves and their children. Experts estimate that one million people move into cities each week, and they predict that the global urban population will double to 6.4 billion by 2050, or about 70 percent of the total world population.2 Yet, at the same time, many of the problems that sent people fleeing from cities decades ago remain, stubbornly resisting traditional fixes. It's clear now that the future of cities is the future of the planet, so it's essential that solutions to those problems be found. Fortunately, cities are tremendous engines of innovation. The urban center is where the dreamers, planners, engineers, builders, social activists, and artists congregate. The more that diverse groups of people interact with one another, the more likely it is that new ideas will germinate and take root. This charged mixture, along with bold leadership and new technology, could lead to a global renaissance for cities, which could grow not just bigger but better. According to Geoffrey West of the Santa Fe Institute, cities are the sources of our problems but also can be the sources of our solutions. In order to achieve that goal, however, "we desperately need a serious scientific theory of cities," Geoffrey said at a TED talk in Edinburgh, Scotland, in 2011.3 What would a scientific theory of cities look like? IBMers who study cities describe it as a large set of structures, patterns, and processes that provide a formal, quantitative approach to understanding the complex systems of cities of all sizes. In this way, people who are involved in running, planning, and building cities will be able to comprehend how the human and built
systems interplay with one another. Today, different role players in cities see the urban environment through their own lenses—the planner, the architect, the builder, the mayor, the transit boss, the financier, the retailer, the restaurateur, and the labor leader. They're like medical specialists who have tremendously deep knowledge of one aspect of human biology but don't really understand how the biological systems of the body interact. A science of cities will provide that fundamental understanding of the urban organism, a common language for discussing it, and methods for diagnosing the problems of cities. In the new era of computing, we believe that technology will help people not only create a scientific theory of cities but act on it. City leaders and residents alike will be able to harness technologies to better understand how their cities work—and why they don't work better. And they'll be able to use that knowledge to make cities more convenient, safer, and more economically vibrant. These days, advanced data-analytics technologies are used in cities for everything from developing crime-fighting strategies to improving the efficiency of transit systems. In fact, technology has played a significant role in the turnaround of New York City. In the early 1990s, the police commissioner, William Bratton, initiated the CompStat program, a detailed tracking of all crimes in the city coupled with weekly police department meetings in local precincts aimed at using the data to come up with improved policing tactics. Since those early days, New York's policing strategies and crime-tracking technologies have become even more sophisticated, and the crime rate has been dropping
for twenty years. The latest advance in technology for policing is a network of 3,000 cameras and hundreds of radiation detectors deployed in Manhattan that look for packages that might contain bombs. The system uses artificial intelligence to review an immense amount of video imagery and radiation data in real time so a suspicious package can be spotted and checked out quickly by police.4 But smart policing is just one narrow slice of the vital role that cognitive systems could play in the lives of cities. City leaders and sociologists often talk about the importance of a holistic approach to solving a city's problems. Cognitive systems that learn, mine insights from mountains of data, and penetrate complexity enable this holistic approach toward the major issues cities face, from crime to traffic to pollution to education. For example, the manager of a city hospital who has identified juvenile obesity and diabetes as growing health problems in urban communities could use cognitive computing to model a solution. This would require gathering data on all the factors that affect local children, from their home activities to the availability of healthy food in local markets and restaurants, their transportation options, the proximity of safe places to play, and the culture of their families. The complexity of these data requires a cognitive system that can refine and correlate the various inputs and allow the hospital team to run simulations and develop a model. This model points toward programs the hospital can initiate or advocate to improve community health.5 Cognitive systems have the potential to knit together all the systems in a city so they can be understood and
managed better. Over time, waves of technological innovation have brought us the utilities that enable the modern city—water and sewer systems, electric grids, roads, bridges, subways, communications infrastructure. In addition, a multitude of city services emerged to provide for the health, education, safety, and welfare of residents. The computer era enabled us to monitor and control some of these systems as discrete entities. Now it's clear that none of these can be viewed in isolation. Indeed, cities must be seen as complexes of interrelated systems. We need a new utility: a cognitive fabric overlaying all the other systems that can make sense of them and manage them. In chapter 3, we briefly mentioned a major project undertaken by the city of Rio de Janeiro, Brazil. The city has set up a state-of-the-art operations center, Centro de Operacoes Rio, where the staff monitors dozens of data feeds for information concerning weather, traffic, and medical services and anticipates looming problems, putting defenses in place to diminish their impact. Rio's operations center embodies the principle that only by considering the human-made and natural systems of a city in a holistic way can municipal leaders hope to manage the complexities of a modern metropolis. The tropical climate that makes Rio so attractive also means that it suffers from severe storms, which endanger lives and property. The city leaders got a shocking wake-up call on April 5, 2010, when torrential rains caused floods and landslides that brought much of the city to a standstill and killed more than seventy residents. Rio's mayor, Eduardo
Paes, vowed that such a disaster would not happen again. He moved decisively to bolster defenses against weather-related disruptions with the new operations center. But the facility does much more than respond to emergencies. By coordinating the activities of more than thirty city and state departments plus private utility and transportation companies, the operations center is the first such facility in the world that is on the path to integrating all of the functions of a city in a single digital command-and-control system. Because data from all of the agencies and companies are integrated, operations managers get a full picture of what's happening in the city at any time. They can prepare for a large concert or sporting event in the same way they handle emergencies. The agencies don't just share data; representatives from the various departments actually sit side by side in the operations center and look at live videos of city streets and facilities or graphical representations of data feeds, making group decisions on the fly. The Netherlands is also moving boldly into the realm of intelligent operations. With 20 percent of its land below sea level, the country depends on a vast network of dykes and sluices to hold back and divert sea, river, and rain water. Yet global warming and the threat of rising sea levels and droughts mean the Dutch can never be complacent. So the government is launching an innovative collaboration aimed at harnessing big data to improve management of the water system while restraining costs. IBM scientists believe that sophisticated learning systems will be useful in situations like those faced by the
Dutch. Authorities there still monitor the structural integrity of the country’s dykes primarily through visual inspection every five years. But already they are experimenting with continuous monitoring using satellites and sensors embedded in the dykes. Cognitive systems will tap all of those data, learn how dykes react under normal conditions, and automatically spot anomalies that might signal breaches or the need for maintenance. The Digital Delta initiative brings together the Dutch Ministry of Water, the Delfland local water authority, the Deltares science institute, the University of Delft, IBM, and other organizations. IBM’s role is to research how to integrate and make available water-related data from more than one hundred sources to scientists and businesses so they can draw insights from them. The expectation is that when people have easy access to relevant data and tools for analyzing them, they will be able to reduce the time it takes to develop new science and technology solutions and reduce development costs. This project could serve as a model for countries, regions, and cities worldwide that face complex challenges from climate change, natural disasters, rapid urbanization, and other large-scale problems. The Dutch government didn’t try to solve this problem by itself. Instead, it formed a public-private partnership drawing on the expertise and resources of government, local water authorities, universities, and private industry. The project could teach leaders elsewhere around the globe valuable lessons about how to cope with rising waters.
An Operating System for the City
You can think of the software undergirding Rio's operations center and the Digital Delta project as the operating system for the city or country, just as Linux, Windows, iOS, and Android are operating systems for computers and smartphones. Operating systems not only make command-and-control connections among all of the functions of the machines but also create a platform that enables a tremendous amount of innovation. Makers of computer hardware such as printers and publishers of software applications such as word processors essentially plug their products into the computer via the operating system. And because so much basic functionality is packed into the operating system, other developers are free to focus on making useful or fun products that fit on top. The same will go for cities. They need operating systems that make it easier to share information and coordinate functions. They also need operating systems to serve as platforms for innovation, enabling city employees, businesses, and citizens to comprehend more deeply the workings of the city and to think up ideas that make cities better places in which to live. IBM and other companies are designing and building these urban operating systems. In the meantime, another type of urban technology is already flourishing: applications that make cities work better or help people manage their lives in urban settings. Think of them as city apps. Open-data initiatives and contests launched by cities are giving rise to an explosion of creativity—thousands of homemade, easy-to-use apps
that do everything from gathering public opinion on hot-button issues to helping people plan the best way to use public transit. Some of these city apps are quite sophisticated. Streetline, a company based in San Francisco, offers a parking-space locator service in a number of cities, including San Francisco and Los Angeles. Sensors are embedded in the pavement, and drivers use their smartphones to find open spots, cutting down on the wasted time and fuel that turn parking in cities into a nightmare for many drivers. But most city apps are not connected to one another via operating systems. They will be much more useful when they share data and insights and can be used in concert by city managers to take on previously unsolvable challenges.
Discovering the Invisible Potential of Cities
When it comes to cognitive systems in cities, we're just beginning to scratch the surface of what is possible. As we go forward, data will increasingly help us spot the invisible signals being sent out by our cities. For instance, Arizona State University scientists have created a program called Hestia that makes it possible for city leaders to measure how much greenhouse gas is being emitted from specific areas of an urban landscape—right down to the street and building level. That way they can make better-informed decisions about how to reduce carbon dioxide emissions and do their part in combating climate change.6
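Hestia's models are far more detailed, but the basic bookkeeping it performs, attributing energy use to individual buildings, converting it to carbon dioxide, and rolling the results up to the street or block level, can be sketched in a few lines of Python. Everything in the sketch below (the buildings, their energy figures, and the emission factors) is invented for illustration and does not come from the Hestia project.

    # Illustrative only: invented buildings, annual energy use (kWh), and
    # emission factors (kg of CO2 per kWh). Hestia itself combines traffic
    # counts, air-quality measurements, tax-assessor records, and detailed
    # simulations rather than a simple table like this.
    EMISSION_FACTORS = {"electricity": 0.45, "natural_gas": 0.20}

    buildings = [
        {"block": "block_12", "electricity": 1_200_000, "natural_gas": 300_000},
        {"block": "block_12", "electricity": 250_000, "natural_gas": 80_000},
        {"block": "block_14", "electricity": 400_000, "natural_gas": 220_000},
    ]

    def annual_co2_kg(building):
        # Convert one building's annual energy use into kilograms of CO2.
        return sum(building[fuel] * factor
                   for fuel, factor in EMISSION_FACTORS.items())

    totals = {}
    for b in buildings:
        totals[b["block"]] = totals.get(b["block"], 0) + annual_co2_kg(b)

    for block, kg in sorted(totals.items()):
        print(block, round(kg / 1000), "tonnes of CO2 per year")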
In Indianapolis, where the program is being tested, Hestia gathers data from a host of sources, including traffic counts, air pollution measurements, and tax-assessor parcel information. It performs traffic simulations and building-by-building energy-consumption modeling. All of these data then go into a massive computer model of the entire city. Hestia is able to capture the patterns of emissions and display them in three-dimensional maps. It's immediately clear where the big problems are. The ultimate goal is to map the carbon dioxide emissions in all the major cities in the United States, which accounts for about 25 percent of all global CO2 emissions. Science must become an integral part of the rebirth of cities. The old adage, "You can't improve what you can't measure," is as true today as it ever was. What's different is that we're now getting our hands on immense amounts of information we didn't have access to before, and we are getting new tools with new measuring capabilities we could not have imagined previously. The city is becoming a living laboratory for technological innovation. One of IBM's top city scientists is Guruduth Banavar, who led much of the work in Rio. As chief technology officer of IBM's Smarter Cities initiative, Guru spent several years traveling the world, visiting cities and interacting with their governments and business and civic leaders. Now head of industries technologies at IBM Research, Guru sees the problems and potential of cities in stark terms: It's a supply and demand equation. You've got to be able to quantify the available resources—people, capital, natural resources, land, goods, and services. And you have
to understand the current and future demands for those resources. So the way to think about improving the operations of cities is through resource management, matching supply with demand.7 This concept is well known to people in the electric utility business. They create demand-response models with data and analytics that make it possible to manage vast grids with multiple types of supplies and myriad consumers of electricity, as well as variables such as weather and earthquakes. Now, Guru says, we need to create much larger demand-response models for improving the livability and economic vitality of cities. He thinks of these models as cognitive engines for city management. The first step is to continuously gather vast quantities of data of all types about available resources. This would essentially be an evolving digital copy of the city in all its complexity. Analysis of the data then spots patterns: changes in human mobility, increased demand for water and electricity, and so on. Simulations help anticipate changes in supply and demand. "The system is learning as it goes," Guru says. "It's aware of the past, the current state, and the potential for the future, and it helps you manage for the present and plan for the future." Guru grew up in Bangalore, India, and spends a lot of time in Indian cities today. He knows firsthand about traffic congestion, crowding, poverty, inadequate infrastructure, and the rapid migration of country people into urban centers. For him, this quest is personal. He's hopeful that with better systems for managing cities—cognitive
systems—there will come vast improvements in mass transportation, freshwater supplies, waste-water treatment, dependable electricity, education, health care, and public safety. His vision of the cognitive city isn't about futuristic places where cars are suspended in the air and humanoid robots hand out martinis as we zip through our front doors on conveyor belts. It's about cities that improve the quality of life for billions of people and that advance the sustainability of the planet. Today, city planners and managers focus on information that we can relatively easily capture and control. Much of this information fits neatly in the tech industry's conventional databases and is reliable. In the future, our computing systems will be able to face much more demanding tasks. They'll ingest large chunks of the world in raw form and make sense of them in all of their complexity, sloppiness, and uncertainty. In order to do this, our computers must be capable of learning. They'll be thinking all the time, gathering and analyzing, going deeper and deeper, ready to respond to humans' questions and pose questions of their own. Up until now, we have used counting and calculating to find precise answers to our questions. In the future, how we frame the questions will define our successes. The new era of computing will make it possible for us to reshape our cities and our world based not on instinct and tabulated data but on new approaches to problem solving and questioning. With these new cognitive tools, we can grapple with the world as it really is: emergent, not deterministic. And that will enable us to respond much
more intelligently and adaptively to its complexity and unpredictability.8 In the future, cognitive systems will monitor a city's every twitch and tremor and predict what will happen next. They will guide us to decisions and actions that produce the best outcomes for individual citizens and society. But it won't be enough to invent a new generation of tools and new ways of managing the world. The people factor must not be ignored. IBM has learned through hundreds of interactions with cities in recent years that it's essential for leaders to engage with citizens if they hope to get big things done. When they don't, they're likely to meet resistance. When they do engage, they stand a better chance of not just getting support but of enlisting the creativity and energies of the people. The many online systems for reporting potholes that cities have set up are a good example of this concept in action. When given the opportunity to report holes in pavement, citizens typically respond enthusiastically. That can save the city public works department a lot of time and expense in inspecting for potholes, and analysis of the reports can help improve preventive maintenance. This kind of citizen engagement can have much more profound effects as well. When citizens are well informed, they understand better how a city operates and what it takes to provide the services and the physical infrastructure that they expect. Big data analytics can help bring them that depth of understanding. Perhaps then they'll be more supportive of long-term investments in service and infrastructure improvements.
In order to make our cities even more livable and effective, we’ll also have to remake ourselves. To fulfill our cities’ potential, we’ll have to become more collaborative and less narrowly self-interested. We’ll have to be more opinion seeking and less opinion offering, more interested in asking tomorrow’s questions than in giving yesterday’s answers.
Journey of Discovery: Cities in Motion
Stand on a busy city street corner in the middle of the day and you will witness a chaotic scene: thousands of people walking, getting on and off buses, descending to subways, riding in cars, and moving in and out of buildings. Where did all these people come from? And where are they going? Today, we can begin to comprehend the complexities of cities in motion and use those insights to help design the cities of the future. Through an IBM Research project called Insights in Motion, scientists have developed analytics software that provides anonymized information about the movement of massive numbers of people. This information can be used by city managers to plan new transit routes, improve the efficiency of current transit systems, and coordinate the various transportation modes in order to make moving around in cities more convenient and comfortable. This kind of rich data about the movement of people comes as a revelation to city planners and researchers alike. "It's like a blind person for the first time opening
their eyes and seeing," says Milind Naphade, leader of Insights in Motion.9 The Insights in Motion project is remarkable in two ways. It marks the first time that cities (including Dubuque, Iowa, and Istanbul, Turkey) have used massive amounts of data from cell-phone records to track the movement of citizens without capturing personal information, thus protecting privacy. It also illustrates the importance of developing technology in close collaboration with the people who will eventually use it. One of the key elements of innovation for the era of cognitive systems will be the willingness to use the real world, in all of its messiness and complexity, as a living laboratory for developing new technologies. Milind grew up in what he calls the "concrete jungle" of Pune, India. He loved the fact that even though it was a large city, he could get almost anywhere he wanted to go on foot. Later, he studied for a Ph.D. in electrical engineering at the University of Illinois at Urbana-Champaign. The focus of his dissertation was on pattern recognition in multimedia content, which, it turns out, has some similarities to mapping the movements of people in cities. After working at IBM Research for several years, he got interested in using data and advanced analytics tools to solve transportation problems. His boss suggested that he find a city that would be willing to let him work with real data. Milind chose Dubuque because IBM was opening a service-delivery center there and because the city's mayor, Roy D. Buol, had laid out a vision of turning Dubuque into one of the most environmentally sustainable cities in the United States.
One of the mayor’s goals there was to increase the use of the public buses. Ridership had dropped off severely over the years, which made it difficult to maintain service levels. How could this be reversed? Milind and his colleagues proposed developing a system for improving public transit in a small city. IBM Research has a program, First Of A Kind, where researchers work closely with a partner to develop technology solutions that have never been tried before. The projects are jointly funded by IBM and the partner. Milind proposed Dubuque as a FOAK. He was asked to also choose another city, a larger one in an emerging market, and, ultimately, he picked Istanbul. So they had two living laboratories that are as different as you can imagine. Dubuque has 23,000 households and only one source of public transit, buses. Istanbul is a sprawling city with 14 million residents, millions of tourists annually, and a wide range of transit options—subway, light rail, express buses, traditional buses, minibuses, and ferries. Until now, city officials have based their knowledge of transportation activities on a wide variety of information, everything from ticket-sales data, to surveys of transit passengers, to actual counts of people on a particular bus or subway car at a particular time in a particular place. The problem is that these types of information only provide fragments of the bigger picture. A survey of people on a particular bus route, for instance, only tells you about the people who are riding the bus—not about people who are moving in the same direction at the same time via other means who might ride the bus under other circumstances. 3 132 3
To change the game for city managers and planners, the Insights in Motion technology draws on transit data, geospatial information, census records, points-of-interest information, and data from cell phones. By tracking the movements of thousands of people from place to place and correlating them with time and the speed of travel, the system can identify the mode of transportation people are using and figure out where they are traveling to and from, whether it's home, work, school, or shopping. Using cognitive technologies, the system learns what their needs and expectations are. The team started in Dubuque. In addition to gathering mobile-phone location data from telecom carriers, they also recruited 1,000 volunteers who own smartphones. The high-end handhelds are equipped with GPS and accelerometers, which make it possible to track the volunteers with pinpoint accuracy as they travel around the city. The team developed an application for the smartphones that would enable the volunteers to use them to navigate the transit system. Their task was challenging. They had to build a system that could, in a sense, "understand" what a city is, what people are, what people do, and how they move from place to place. They needed a digital model of a city that they could experiment with. So they built a framework of knowledge about each individual's activities, including detecting meaningful locations, segmenting trips, calculating the duration of stays, identifying the purpose of each trip, plotting origins and destinations of each trip, tracking the time of day, and specifying the mode of transportation. They even estimated the carbon footprint
of each trip. Then they aggregated all of the information about individuals to generate citywide patterns. The team used the results to optimize the routes and schedules for the city's bus system. The objective was to minimize the sum of operator costs, user costs, and unsatisfied demand costs for the entire network. Costs aren't just expressed in dollars. For users, for instance, they include factors like waiting, walking, and driving a car. Dubuque is now trying the information out with two new bus routes. "Having the data is crucial. You don't have to operate on tea leaves," says Chandra Ravada, director of the transportation department for Iowa's East Central Intergovernmental Association. "You know where people are going. You can change things based on that. You're designing your system not on somebody's opinion but based on facts."10 The goal is to reduce operating expenses by 40 percent, meet 37 percent more demand, reduce average commuter time by 60 percent, and reduce per-traveler combustion emissions by 40 percent. Milind and his team took the lessons from Dubuque and applied them in Istanbul. The task there was much more complex. There were so many more people and modes of transportation. Also, in Istanbul, they didn't track volunteers with smartphones. They had to rely on less precise records from regular mobile phones. But the technologies they had used in Dubuque also worked in Istanbul. The city's transit authority, Istanbul Transportation, is using the Insights in Motion tool to help design feeder bus routes connecting to the city's new subway lines.
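The objective described above, minimizing the sum of operator costs, user costs, and unsatisfied-demand costs, can be illustrated with a toy scoring function. The cost weights and the two candidate plans in this Python sketch are invented placeholders, not figures from the Dubuque project.

    def plan_cost(plan, bus_hour_cost=120.0, rider_hour_value=15.0,
                  unserved_trip_penalty=8.0):
        # Weighted sum of the three cost categories named in the text.
        operator = plan["bus_hours"] * bus_hour_cost
        user = (plan["wait_hours"] + plan["walk_hours"]) * rider_hour_value
        unsatisfied = plan["unserved_trips"] * unserved_trip_penalty
        return operator + user + unsatisfied

    candidates = {
        "current_routes": {"bus_hours": 300, "wait_hours": 900,
                           "walk_hours": 400, "unserved_trips": 2500},
        "proposed_routes": {"bus_hours": 320, "wait_hours": 650,
                            "walk_hours": 300, "unserved_trips": 1400},
    }

    for name, plan in candidates.items():
        print(name, round(plan_cost(plan)))
    print("cheaper plan:", min(candidates, key=lambda n: plan_cost(candidates[n])))
    # A real optimizer would search over routes and schedules to drive this
    # total down; the point here is only the shape of the objective.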
Having these two cities to experiment in was incredibly helpful for Milind and his team. When researchers don’t have a real-world situation to interact with, they can’t easily identify the constraints that products and services created using their technologies will encounter when they’re launched. The volunteers in Dubuque provided valuable feedback on the shortcomings of the smartphone application, which the team used to improve it. In Istanbul, Milind came to appreciate the importance of personal privacy and securing data. Because of privacy concerns among the city officials there, the IBMers worked on the data only when they were in Istanbul. The experience made them very thoughtful about designing security and privacy features into all of the analysis systems they build. At the highest level, Milind believes that Insights in Motion has tremendous potential to alter the relationships of cities and their citizens. “One thing I think about is how we have become slaves of the infrastructure rather than having the infrastructure work for us,” he says. “Cities should help people live their lives, not get in the way.”
Coda: An Alliance of Human and Machine
Ever since Watson won at Jeopardy!, people have been asking the research scientists who designed the machine if they'd like to try to pass the so-called Turing test. That's an exercise suggested by computing pioneer Alan Turing in his 1950 paper "Computing Machinery and Intelligence," where he raised the question: "Can machines think?" He suggested that to test whether a machine can think, a human judge should have a written conversation via computer screen and keyboard with another human and a computer. If the judge couldn't tell the human from the machine based on their responses, the machine would have passed the test.1 With this test, Turing set a standard for measuring the capabilities of machines that has not yet been met. While the IBM researchers are intrigued by the Turing test, they have no plans to prepare Watson to take it. A Turing test would merely show how good Watson is at imitating human beings and our quirks and social conventions. Instead, they want to concentrate on further developing
the machine so it can become an expert, trusted advisor to humans, such as the oncologists at Memorial Sloan-Kettering Cancer Center. They do not foresee a situation where machines would play the same cognitive role in the universe as humans do. Instead, they can be helpers. Machine cognition and human cognition are complementary. We have different strengths and weaknesses. The opportunity here is not to replicate human cognition but to use computers to help us reason over human-created data—our communications, our documents, our images, and our designs. Dr. Larry Norton at Memorial Sloan-Kettering agrees. While he embraces the expanded role of cognitive computing in health care, he believes that even as medical decision making becomes ever more scientific, it will also remain a creative activity. "I envision situations where myself, the patient, the computer, my nurse, and my graduate fellow are all in the examination room interacting with one another," Larry says. "Arriving at a good medical decision is an act of artistic improvisation—just as it is with a great piece of music." In his free time, he plays vibraphone in jazz groups. With every major advance in computing that IBM has produced over its more than one-hundred-year history, questions have been raised about the effect on human beings. In a memo to IBM managers on April 25, 1960, CEO Thomas Watson Jr. addressed head-on the issue of "thinking machines." He wrote: "Computers will never rob man of his initiative or replace the need for his creative thinking. By freeing man from the more menial or repetitive
forms of thinking, computers will actually increase the opportunities for the full use of human reason. Only human beings can think imaginatively and creatively in the fullest sense of these words.” Watson wrote those lines just a few years before IBM launched System/360, a revolutionary family of mainframe computers that transformed the computer industry. System/360 included many firsts. The machines had new kinds of processors, memory devices, and storage. It also marked the first time that software was offered that could run on a variety of computers, not just one, which ultimately gave rise to the independent software industry. But IBM’s executives and engineers had no idea at the time that the technology breakthroughs in the System/360 would create a path that would ultimately lead to the personal computer, the Internet, and mobile communications. And so it is today, at the dawn of the era of cognitive systems. We know that big shifts are coming. Yet it’s impossible to fully imagine the impact they will have on computing, business, and society. But this we do know: by working in concert, humans and cognitive systems have the potential to dramatically improve and accelerate outcomes that matter to us and to make life on earth more sustainable. This alliance of human and machine offers the promise of progress on a massive scale. Creating these systems will require breakthroughs and reinvention at every level of technology, from the way we build chips to what we think of as an application. That’s a lot of work to do. So let’s get to it.
Notes
1. A New Era of Computing
1. Dr. Herb Chase, Columbia University School of Medicine, IBM IBV report, "The Future of Connected Healthcare Devices," March 2011.
2. Dr. Larry Norton, Memorial Sloan-Kettering Cancer Center, interview, June 12, 2012.
3. Thomas Malone, Massachusetts Institute of Technology, interview, May 3, 2013.
4. J. C. R. Licklider, "Man-Computer Symbiosis," IRE Transactions on Human Factors in Electronics 1 (March 1960): 4–11.
5. CenturyLink Business Inc., infographic, 2011, http://www.centurylink.com/business/artifacts/pdf/resources/big-data-defining-the-digital-deluge.pdf.
6. Ralph Gomory, former IBM Research director, interview, March 19, 2012.
7. Michel McCoy, Lawrence Livermore National Laboratory, interview, June 14, 2012.
2. Building Learning Systems
1. Stephen Baker, Final Jeopardy: The Story of Watson, the Computer That Will Transform Our World (New York: Mariner, 2012).
2. Dario Gil, program director, IBM Research, interviews, October 10, 2011, and May 31, 2012.
3. Charles Lickel, former IBM Research manager, interview, February 24, 2013.
4. John McCarthy and Edward Feigenbaum, "Arthur Samuel: Pioneer in Machine Learning," obituary, n.d., http://infolab.stanford.edu/pub/voy/museum/samuel.html; Gerald Tesauro, "Temporal Difference Learning and TD-Gammon," Communications of the ACM 38, no. 3 (March 1995): 58–68.
5. Murray Campbell, IBM Research scientist, interview, May 4, 2012.
6. See Marvin Minsky's home page, http://web.media.mit.edu/~minsky/.
7. Paul Horn, NYU assistant provost for research, interview, November 28, 2012.
8. David Ferrucci, former IBM Research manager, interview, January 27, 2010.
9. Eric Brown, IBM Research manager, interview, December 10, 2012.
10. Brown, interview.
11. IBM and WellPoint, "IBM Watson Hard at Work," press release, February 8, 2013.
12. Joyce Phillips, chief executive officer global wealth and private banking, ANZ Banking Group, "How Innovation Can Fuel Smarter, Faster Financial Advice," A Smarter Planet blog, May 21, 2013, http://asmarterplanet.com/blog/2013/05/advice.html.
13. David Nahamoo, IBM Research, interview, January 2, 2013.
14. Dr. Mark Kris, presentation at IBM Investors Day, May 9, 2012.
3. Handling Big Data
1. Permanent Subcommittee on Investigations, Wall Street and the Financial Crisis: Anatomy of a Financial Collapse, April 13, 2011, http://www.hsgac.senate.gov//imo/media/doc/Financial_Crisis/FinancialCrisisReport.pdf.
2. International Data Corporation, "The Digital Universe in 2020: Big Data, Bigger Digital Shadows, Biggest Growth in the Far East," sponsored by EMC, December 2012, http://www.emc.com/leadership/digital-universe/index.htm#2012; also see EMC, "New Digital Universe Study Reveals Big Data Gap," press release, December 11, 2012, http://www.emc.com/about/news/press/2012/20121211-01.htm.
3. International Data Corporation, "The Digital Universe in 2020"; EMC, "New Digital Universe Study."
4. Jens Jelitto, IBM Research, interview, December 3, 2012.
5. Nagui Halim, IBM Research, interviews, May 7, 2012, and April 15, 2010.
6. Ralph Gomory, former IBM Research director, interview, April 19, 2012.
7. Mark Squillante, IBM Research, interview, September 24, 2012.
8. Jeff Jonas, IBM software group, interview, March 9, 2012.
9. Sam Adams, IBM Research, interview, September 14, 2012.
10. Dario Gil, IBM Research, interview, December 14, 2012.
11. Marco de Vos, managing director, ASTRON, interview, October 16, 2012.
12. Square Kilometre Array, "Amazing Facts," http://www.skatelescope.org/media-outreach/fun-stuff/facts-figures/.
13. Albert-Jan Boonstra, manager, ASTRON, interview, October 16, 2012.
14. Ton Engbersen, IBM Research, interview, October 17, 2012.
4. Augmenting Our Senses
1. International Data Corporation, "The Digital Universe in 2020: Big Data, Bigger Digital Shadows, Biggest Growth in the Far East," sponsored by EMC, December 2012, http://www.emc.com/leadership/digital-universe/index.htm#2012; also see EMC, "New Digital Universe Study Reveals Big Data Gap," press release, December 11, 2012, http://www.emc.com/about/news/press/2012/20121211-01.htm; YouTube, "Statistics," http://www.youtube.com/t/press_statistics.
2. John Smith, IBM Research, interview, January 7, 2013.
3. Dharmendra Modha et al., "Cognitive Computing," Communications of the ACM 54, no. 8 (August 2011): 62–71.
4. Mark Ritter, IBM Research, interview, August 15, 2012.
5. Dharmendra Modha, IBM Research, interview, November 2008.
6. Dharmendra Modha, IBM Research, interview, May 4, 2012.
7. Modha, interview, May 4, 2012.
8. Modha, interview, May 4, 2012.
9. Dharmendra Modha, IBM Research, interview, March 6, 2013; Bill Risk, IBM Research, interview, June 15, 2012.
10. IBM, "IBM Unveils Cognitive Computing Chips," press release, August 18, 2012, http://www-03.ibm.com/press/us/en/pressrelease/35251.wss.
11. Modha, interview, May 4, 2012.
12. Bill Risk, IBM Research, interview, June 10, 2013.
5. Designing Data-centric Computers
1. Michael Frissora, IBM Research, interview, August 21, 2012.
2. Bijan Davari, IBM Research, interview, May 21, 2012.
3. Hybrid Memory Cube Consortium, "About Hybrid Memory Cube," 2013, http://www.hybridmemorycube.org/technology.html.
4. Douglas Joseph, IBM Research, interview, September 4, 2012.
5. Murali Ramanathan, SUNY Buffalo, interview, August 15, 2012.
6. Adam Ronthal, IBM, interview, May 2, 2012.
7. Ronald Ambrosio, IBM Research, interview, July 5, 2012.
8. David Turek, IBM, interview, June 13, 2012.
9. Jim Kahle, IBM Research, interview, September 12, 2012.
10. Radu Sion, "To Cloud or Not to: An Exploration of the Economics of Clouds and Cyber-security," http://www.zurich.ibm.com/~cca/csc2011/talks/sion-invited-csc2011.pdf.
6. Inventing a New Physics of Computing
1. Gordon Moore, "Cramming More Components Onto Integrated Circuits," Electronics 38, no. 8 (April 19, 1965).
2. Robert Dennard, IBM Research, interview, June 21, 2013.
3. Supratik Guha, IBM Research, interview, August 23, 2012.
4. Mark Ritter, IBM Research, interview, August 15, 2012.
5. Matthias Steffen, IBM Research, e-mail interview, September 5, 2012.
6. Matthias Steffen, IBM Research, interview, February 8, 2012.
7. L. Trossarelli, "The History of Nylon," Club Alpino Italiano, Centro Studi Materiali e Tecniche, 2010, http://www.caimateriali.org/index.php?id=32.
8. Michel McCoy, director, Simulation and Computing, Lawrence Livermore National Laboratory, interview, June 14, 2012.
9. Jed Pitera, IBM Research, interview, December 14, 2012.
10. McCoy, interview.
11. Andreas Heinrich, IBM Research, interview, August 2011.
12. Andreas Heinrich, IBM Research, interview, August 3, 2012.
13. Gerd Binnig, former IBM Research scientist, interview, November 17, 2010.
14. Don Eigler, former IBM Research scientist, interview, September 2009.
15. Andreas Heinrich, video interview, 2011.
16. Heinrich, interview, August 3, 2012.
7. Imagining the Cognitive City
1. Jonathan Mahler, Ladies and Gentlemen, the Bronx Is Burning: 1977, Baseball, Politics, and the Battle for the Soul of a City (New York: Farrar, Straus and Giroux, 2005).
2. United Nations Department of Economic and Social Affairs, "World Urbanization Prospects: The 2007 Revision," executive summary, February 2008, http://www.un.org/esa/population/publications/wup2007/2007WUP_ExecSum_web.pdf.
3. Geoffrey West, TED Talk, Edinburgh, Scotland, 2011, video, https://www.youtube.com/watch?v=XyCY6mjWOPc.
4. "Fighting Terrorism in New York City," 60 Minutes, CBS television, September 25, 2011.
5. Paul Maglio, IBM Research, interview, July 6, 2012.
6. Arizona State University, "Study Maps Greenhouse Gas Emissions to Building, Street Level for U.S. Cities," press release, October 9, 2012, https://asunews.asu.edu/20121009_Hestia.
7. Guruduth Banavar, IBM, interview, October 5, 2012.
8. Colin Harrison, IBM, "Smarter Cities—NextGen," PowerPoint presentation, presented at the Major Cities of Europe Conference, Vienna, Austria, June 5, 2012.
9. Milind Naphade, IBM Research, interview, September 27, 2012.
10. Chandra Ravada, Iowa East Central Intergovernmental Association, interview, January 7, 2013.
Coda: An Alliance of Human and Machine
1. Alan Turing, "Computing Machinery and Intelligence," Mind 59 (October 1950): 433–60; see http://www.loebner.net/Prizef/TuringArticle.html.