Big Data in Small Slices: Data Visualization for Communicators 2020034065, 9781138910911, 9781138910904, 9781315693118

English Pages [151] Year 2020

Table of contents :
Cover
Half Title
Endorsement
Title Page
Copyright Page
Table of contents
Introduction
Big Data
Book Structure
1 The Canary in the Coal Mine
Surfing Into Science
CO2 and pH
The Lives of Corals
Corals Are Animals
Broadcast Mating
Symbionts
Threats Beyond Losing Reef Habitat
Bleaching
Ocean Acidification
Why It Matters Economically
Economics—TEEB
Two Hundred Days on Heron
The Free Ocean CO2 Enrichment System (FOCE)
Predictions for 2100
FOCE Setup and Challenges
The Reef Gets a Physical
Cleaning, Wrangling, and Munging Datasets
Visual Exploration
Dimension Ranges
Carbon Dioxide (CO2)
pH Range
Aragonite Saturation (Ar)
Date and Time
Graphing for Patterns
CO2 Pattern
pH Pattern
Correlation
CO2 and pH Correlation
Aragonite and pH Correlation
Corals Fight Back
Your Reef on Acid
Data Visualization
Story Thoughts for Journalism Students
Notes
2 Sea of Butterflies
Some Damage Is Done
The Ocean Acidification Study
Shell Quality
Data Exploration
Shell Quality Data Exploration
Exploring Visually
Chart Structures
Sinking Speeds
Data Cleaning
Data Visualization
Story Thoughts for Journalism Students
Notes
3 Parasites and Armed Rebels
Mac Otten
Malaria and Its Vector Agent
The Mosquito as Vector
Rebel Eruption and Roaming Health-Care Facilities
The Project
Process
Trucks, Roads, and Rebels
The Data
Data Exploration
The Spreadsheets
Data Enhancement Using Code
Data Visualization
Medicines, Mosquito Nets, and Diagnostic Tests
Mosquito Nets
Project End
Story Thoughts for Journalism Students
4 Open City, Open Data
From Dublin to Silicon Valley
Interview
City Hall
Stereotypes Are Just Stereotypes
Public Data Acquisition
Open Data
Data Exploration and Cleaning
Data Visualization
Responsive Design
Trees, Trees, and More Trees
Cleaning Trees
Palo Alto Open Data Portal
Story Thoughts for Journalism Students
Notes
Index

Author: Dianne M. Finch-Claydon

This book offers an engaging and accessible introduction to data visualization for communicators, covering everything from data collection and analysis to the creation of effective data visuals. Straying from the typical “how to visualize data” genre often written for technical audiences, Big Data in Small Slices offers those new to data gathering and visualization the opportunity to better understand data itself. Using the concept of the “data backstory,” each chapter features discussions with experts, from marine scientists to pediatricians and city government officials, who produce datasets in their daily work. The reader is guided through the process of designing effective visualizations based on their data, delving into how datasets are produced and vetted, and how to assess their weaknesses and strengths, ultimately offering readers the knowledge needed to produce their own effective data visuals.

This book is an invaluable resource for anyone interested in data visualization and storytelling, from journalism and communications students to public relations professionals. A detailed accompanying website features additional material for readers, including links to all the original datasets used in the text, at www.bigdatainsmallslices.com.

Dianne M. Finch-Claydon conducts data analysis and visualization workshops around the world and consults to clients on visualization projects. Publications include a chapter on data in The Golden Age of Data edited by Don Grady (Routledge, 2020) as well as stories produced for public radio, Bloomberg, and other news outlets on science and finance.

“Dianne combines deep expertise in computer science with her years as a journalist to write a compelling book about using data to tell stories. Written in a journalistic style (i.e. interesting and fun), she walks readers through the opportunities and challenges of big data sets. The book and accompanying website encourage readers to dig into the data and get their hands dirty, and help them feel safe and confident to experiment.”
Karen Weintraub, Journalist at USA Today

“We live in an era when vast amounts of data are at journalists’ fingertips, providing new opportunities for reporting and investigating. But making sense of that flood of data and communicating it to a general audience requires training. Dianne Finch’s book provides a detailed, yet easy-to-use, guide for data visualization, and will be an excellent resource for both journalism teachers and students.”
Sharon Weinberger, author of The Imagineers of War: The Untold Story of DARPA, the Pentagon Agency That Changed the World

“This book will be an essential tool in training the future generation of communicators, including both journalists and scientists struggling to build their research programs in an impossibly vast sea of data.”
Tim Ford, Professor and Chair of Biomedical and Nutritional Sciences, UMass Lowell

“Life isn’t easily categorized, and Dianne Finch’s work on data visualization makes that abundantly clear. In the chapter on malaria, Dianne compellingly shows how charts and maps can guide a journalist to ask informed questions. She goes beyond parasites, insect vectors, and drugs used to treat the disease to illustrate the issues that seem unrelated to health at first glance but dramatically impact the delivery of help desperately needed. This guide to best practices in data visualization will help you tame the ominous-sounding Big Data, in small slices.”
Jeff Porter, Missouri School of Journalism

Big Data in Small Slices: Data Visualization for Communicators

Dianne M. Finch-Claydon

First published 2021 by Routledge 52 Vanderbilt Avenue, New York, NY 10017 and by Routledge 2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN Routledge is an imprint of the Taylor & Francis Group, an informa business © 2021 Taylor & Francis The right of Dianne M. Finch-Claydon to be identified as author of this work has been asserted by her in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988. All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers. Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe. Library of Congress Cataloging-in-Publication Data Names: Finch-Claydon, Dianne M., author. Title: Big data in small slices : data visualization for communicators / Dianne M. Finch-Claydon. Description: London ; New York : Routledge, 2021. | Includes bibliographical references and index. Identifiers: LCCN 2020034065 | ISBN 9781138910911 (hardback) | ISBN 9781138910904 (paperback) | ISBN 9781315693118 (ebook) Subjects: LCSH: Visual communication. | Information visualization. Classification: LCC P93.5 F56 2021 | DDC 001.4/226–dc23 LC record available at https://lccn.loc.gov/2020034065 ISBN: 9781138910911 (hbk) ISBN: 9781138910904 (pbk) ISBN: 9781315693118 (ebk) Typeset in Univers by Newgen Publishing UK Access the companion website: www.bigdatainsmallslices.com

Foreword

For those of us old enough to remember data collection by sensors connected to chart recorders, and the hours spent with a ruler calculating data points, today’s age of big data seems overwhelming. Dianne Finch-Claydon does an outstanding job of “slicing big data.” The first message that comes across clearly is that there’s nothing special about “big data”: it is still data, but there’s just so much more of it with real-time, continuous data collection through sensor arrays capturing multiple parameters. As the author points out, the sheer volume of data has resulted in new sophisticated methodologies and algorithms to help us analyze the information, just as the field of bio-informatics has arisen to analyze molecular data.

This information overload can be deadening to scientist and non-scientist alike, but in Chapter 1 Finch-Claydon has artfully woven a fascinating picture of the intricacies of the coral reef environment to help us find meaning in just one slice: data generated on Heron Island’s fragile reef system. Through relatively simple methods of data visualization and interpretation, Finch-Claydon helps us understand not only how this informs the scientists about the chemical and climatological processes that are destroying these ecological wonders (critical marine communities for both ecosystem and human health) but also how to articulate this “slice” of big data to better inform the public and policy makers of the urgency to mitigate the destruction we are causing through human activity.

Writing in a highly engaging manner, Dianne Finch-Claydon has produced a beautiful educational book that links stunning photographs of endangered reefs with the data that we can use to both understand the processes that are happening and begin to mitigate the shocking consequences of our own development and industrialization.
Linked to a number of web-based resources that explore data visualization and interpretation in greater depth, this book will be an essential tool in training the future generation of communicators, including both journalists and scientists struggling to build their research programs in an impossibly vast sea of data!

Tim Ford
Professor and Chair, Environmental Health Sciences
Director, Institute for Global Health
UMass Amherst
From Sept 1: Professor and Chair, Biomedical and Nutritional Sciences, UMass Lowell

Introduction

This book introduces data visualization to communicators—from journalists and writers to professors, teachers, scientists and public relations professionals; or anyone interested in communicating data to an audience in a clear and visually appealing way. Straying from the typical “how to visualize data” genre often written for technical audiences, this book offers those new to data collection and visualization the opportunity to better understand data itself, using a process I call the “data backstory”—in my view the backbone of any strong and effective data visualization. Datasets, after all, originate with goals in mind by the data creators. My hope is to emphasize the need for communicators to understand those goals. They should look for any bias, identify the party who funded the data collection, and look at the process behind that project. What is missing? What is the margin of error in a particular dataset? Are there correlative or causal relationships? Has the data been analyzed by an expert in the respective field? So we begin each chapter with a narrative of a professional and the passions behind his or her data-driven work—from data collection to analysis. We see data through the eyes of the experts who not only collect the data but also design the collection process. This is raw data at its origins. The four projects chosen for this book were of interest to me because of a common thread. That is, each has an impact on the human race or on the planet and its ecosystems—or both. I will admit that my deep concerns for the topics covered led to storytelling beyond the scope you generally find in a data tome. But since this book may be used in classrooms, I attempted to offer students and teachers information that could lead to some interesting projects—or stories for those who study journalism. The accompanying website provides datasets and Tableau samples for those who are hoping to jump in quickly. 
Still, you’ll need to read the chapter behind each dataset to understand it.

BIG DATA

As you likely know, the catchphrase “big data” is used widely but lacks a clear definition.

The U.S. National Institute of Standards and Technology defines it as data that “exceed(s) the capacity or capability of current or conventional methods and systems.” Large technology companies and industry researchers often refer to the “Three Vs” to define big data. For example, Gartner says: “Big data is high-volume, high-velocity, and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.” Some add another “V” for “veracity.” I don’t think that makes sense, because veracity should be the goal of any given dataset, even though it’s not guaranteed.

Just about every industry today creates “big data” in its own right, incorporating all three Vs. That includes the industries and fields covered in this book. In fact, talk to any marine scientist or global health expert and you’ll find that they don’t have enough time or resources, human or technological, to analyze all the data created in their own fields, or even in their own labs.

No doubt sensors are the primary impetus behind the rapid growth of big data. It is a topic for another book, but sensors now live in our cities, under our oceans, in the engines of jets, cars, and trains, on top of mountains, on satellites, and on our own wrists and phones. As described in Chapter 1, sensors create some of the “highest resolution” datasets, high resolution meaning that there is more data per unit, such as per second, just as a high-resolution image contains more pixels per unit. Just imagine the data collected on your mobile phone: your movements from here to there, your phone calls, texts, videos, photos, exercise by time and strength, your blood sugar if needed, your heart rate, and so on.
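The idea of “resolution” as data per unit of time can be made concrete with a small sketch. The function name `readings_per_hour` and the two sample logs are hypothetical and are not taken from the book’s datasets; the sketch only assumes that each reading carries a timestamp.

```python
from datetime import datetime


def readings_per_hour(timestamps):
    """Return the average number of readings per hour across the logged span."""
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    ordered = sorted(timestamps)
    span_hours = (ordered[-1] - ordered[0]).total_seconds() / 3600
    if span_hours == 0:
        raise ValueError("timestamps must span a nonzero interval")
    # Count intervals rather than points so evenly spaced data gives the exact rate.
    return (len(ordered) - 1) / span_hours


# Two hypothetical sensor logs covering the same hour (illustrative values only).
coarse_log = [datetime(2020, 1, 1, 0, 0), datetime(2020, 1, 1, 0, 30), datetime(2020, 1, 1, 1, 0)]
fine_log = [datetime(2020, 1, 1, 0, m) for m in range(60)] + [datetime(2020, 1, 1, 1, 0)]

coarse_rate = readings_per_hour(coarse_log)  # one reading every 30 minutes
fine_rate = readings_per_hour(fine_log)      # one reading every minute
```

In this sketch the second log has the higher resolution because it records more readings over the same hour, which is the same sense in which a high-resolution image holds more pixels per unit.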
All said, in 2011 IBM estimated that 2.5 exabytes (or 2.5 billion gigabytes) of data was produced every day, and the volume has obviously grown since then, though no one has exact numbers. Even more compelling is the fact that most of the world’s data was created over the past few decades. But most of that big data never sees the light of day. Less than 1 percent of data collected in the world is analyzed. There is just too much, though billions and billions of slices of the leviathan are certainly under analysis all the time. Thus, the title Big Data in Small Slices. Each chapter relates to one “slice” of the world’s big data, and those slices add to the larger field or industry covered, from marine science to global health and open government. But does it matter? All this data? I’ve included datasets that I believe will show that, yes, it matters, for better or worse.

BOOK STRUCTURE

In Chapters 1 and 2, we hear from marine scientists studying ocean acidification (OA) from different perspectives and locations. One shines light on the fragile future of coral reefs as oceans continue to absorb more and more CO2, causing
the seawater to become more and more acidic; the other brings to light a tinier-than-pea-sized sea animal that is already showing signs of damage from ocean acidification but provides essential nutrition to whales and many large fish, fish that we often find on our dinner plates, like salmon. If you wonder why there are two chapters and sets of data on ocean acidification, it’s because one was initially a “supplement” to the other, but the publisher and I felt that the “supplement” (the Bermudian project) deserved its own chapter.

With each chapter, one or more datasets are introduced that relate to the topic and narrative. Some datasets, including the malaria chapter data, were too large for the average student’s notebook computer and so have been reduced in size. Though I didn’t modify data, I reduced it in several cases, so it cannot be used publicly to represent any of the projects described. All datasets and some of the Tableau samples are available on the book’s website (www.bigdatainsmallslices.com) for training purposes only.

The structure of each chapter varies due to its content. For example, the ocean acidification chapter on the future of coral reefs presents datasets that require more exploration and analysis than the chapter on open government data. The malaria chapter introduces data-driven management and evaluation of a malaria eradication project, so the visualizations are meant for use by public health staff monitoring the successes and failures of an in-progress project, and show how data maps and charts drive next steps. That said, there are three main elements to each chapter: the narrative to introduce the data backstory; exploration of the data itself, with a bit of data cleaning and munging (data restructuring) where needed; and finally the data visualization designs.
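The text does not say how the larger datasets were reduced, so the following is only one plausible approach, offered as a sketch: keep the header and every nth data row without altering any values. The function name `keep_every_nth_row` and the sample rows are hypothetical.

```python
import csv
import io


def keep_every_nth_row(csv_text, n):
    """Return CSV text with the header plus data rows 0, n, 2n, ... (values untouched)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for index, row in enumerate(reader):
        # Rows are dropped, never edited, so the remaining values match the source.
        if index % n == 0:
            writer.writerow(row)
    return out.getvalue()


# Hypothetical five-row dataset (illustrative values only).
sample = "reading_id,value\n1,10\n2,11\n3,12\n4,13\n5,14\n"
reduced = keep_every_nth_row(sample, 2)
```

A reduction like this keeps the file small enough for a notebook computer, but as the text notes for the book’s own reduced files, the result is suitable for training and no longer represents the full project.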
In Chapter 1, I spend more time on the narrative because I find that when I speak to people about coral reefs, many don’t understand that they are living animals that provide sustenance to more than one-quarter of marine animals. Thus, I’ve added some information on those species that rely on reefs. Chapter 4, on open government data, is probably the easiest chapter in terms of jumping in quickly to learn to visualize. It spends less time on data analysis than the first three chapters, moving more quickly into the visualizations. That’s because the city of Palo Alto provided datasets that were clean and prepped for data visualizations, so very little wrangling (munging) was needed.

Although this book targets anyone who might want to visualize data to tell a story, most visualizations are appropriate for communications students. However, a data-driven story told via strong narrative and effective data visualizations can improve communications between colleagues or for audiences in any field. If one reads Chapter 4, for example, one could learn how to visualize budgets over 5 years for one’s manager or CEO. In Chapter 3, an intern may visualize data for a manager who is monitoring the spread of a disease, or the eradication of another disease, in the field of global health. That said, at the end of each chapter I offer a few story ideas for communications students in particular. In my experience of teaching journalism, I believe that students who studied data visualization had more job opportunities
after graduation, so learning about data from marine science, global health or government can’t hurt.

As this book targets readers who are new to data collection and to data visualization, there is no coding or programming involved, with one small exception in Chapter 3. We use Tableau Public, a free and powerful visualization tool used in many industries, to build the visualizations. While it does not require any coding skills, some features allow the use of code, such as the creation of new variables and some decision making using “IF” and “CASE” statements. I’ve taught Tableau to students at Elon and Kent Universities, and have introduced it to classes as a visiting lecturer. It takes time and effort to learn, but with practice my students have produced some compelling visualizations. See the list of other tools for beginners on the website resource list, including R, Google Charts, API and others. Throughout the book, a symbol is used to refer the reader to the website for resources on the topic referenced.

While tool choice is somewhat important, most tools on the market are able to produce charts and maps and to encode datapoints into color, shapes, angles, and geography. Some require programming knowledge, some do not. The principles outlined in this book are not dependent on any one tool or programming language. For practical purposes I think that everyone should learn as much as possible about HTML and CSS, mainly to understand how your visualization works inside a web page, but also to make any necessary changes. In Chapter 4 there is some discussion on how to “resize” visualizations to suit various devices, from smartphones to desktops. HTML plays a significant role in the process.

I’ve introduced several types of charts and maps. In addition, dashboards, made up of groups of charts, text, photos or other components, are also presented in some chapters.
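For readers curious about what creating a new variable with “IF” and “CASE” style decision making looks like, here is a rough analogue in Python. It is not Tableau syntax, and the field names, thresholds, and rows are hypothetical values chosen purely for illustration.

```python
def size_band(population):
    """Derive a label from a number, in the spirit of an IF / ELSEIF / ELSE calculation."""
    if population >= 1_000_000:
        return "large"
    elif population >= 100_000:
        return "medium"
    else:
        return "small"


# Lookup table playing the role of a CASE statement's branches (hypothetical codes).
REGION_NAMES = {"N": "North", "S": "South"}


def region_name(code):
    """Map a code to a label, in the spirit of a CASE statement with a default branch."""
    return REGION_NAMES.get(code, "Other")


# Hypothetical rows; each gains two new variables without changing the originals.
rows = [
    {"city": "A", "population": 2_500_000, "region": "N"},
    {"city": "B", "population": 150_000, "region": "S"},
    {"city": "C", "population": 8_000, "region": "X"},
]
enhanced = [
    dict(row, size=size_band(row["population"]), region_label=region_name(row["region"]))
    for row in rows
]
```

The derived fields can then drive color or grouping in a chart, which is the usual reason for creating such variables in a visualization tool.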
Most of these visuals are commonly used in news and public relations because they are clean and simple, meant to communicate to general audiences. In general, bar charts are by far the most commonly used. They are clear and simple, and the human brain quickly assesses the data based on the bar height. Maps present numerical data by location, encoding datapoints into bubbles often sized for the numeric values, such as populations by country or crimes by city. In those cases, the map and bubbles serve as high-level or macro overviews. One can “drill down” into each bubble to get to other charts or tables showing more granular or micro views, and other charts can be added to work interactively with those bubbles.

The book website provides downloads of the datasets and some of the visualizations shown in each chapter. Please use them to build the sample visualizations from scratch. There are some instructions, but you are expected to know Excel and to use the resource links for deeper learning of Tableau and data fundamentals. Remember, there is much to learn about data and visualization, and this book aims to be the first step by offering you a chance to get started.
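As a small illustration of bubbles “sized for the numeric values,” the sketch below scales each bubble so that its area, rather than its radius, is proportional to the value. Area scaling is a common charting convention and is an assumption here, not something the book prescribes; the function name `bubble_radius` and the default maximum radius are hypothetical.

```python
import math


def bubble_radius(value, max_value, max_radius=40.0):
    """Return a radius whose circle area is proportional to value.

    The largest value in the data (max_value) gets max_radius; everything
    else shrinks with the square root of its share of that maximum.
    """
    if max_value <= 0:
        raise ValueError("max_value must be positive")
    if value < 0:
        raise ValueError("value must not be negative")
    return max_radius * math.sqrt(value / max_value)


# A value one quarter the size of the maximum gets half the radius,
# which makes its circle one quarter the area.
largest = bubble_radius(100, 100)
quarter = bubble_radius(25, 100)
```

Scaling the radius directly would instead make a value twice as large look four times as big, which is why the square root appears in the sketch.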

1 The Canary in the Coal Mine

1.1 David Kline’s lab. Heron Island, Great Barrier Reef, Australia. Courtesy of: David Kline.

“I really don’t know why it is that all of us are so committed to the sea, except I think it is because in addition to the fact that the sea changes and the light changes, and ships change, it is because we all came from the sea. And it is an interesting biological fact that all of us have, in our veins the exact same percentage of salt in our blood that exists in the ocean, and, therefore, we have salt in our blood, in our sweat, in our tears. We are tied to the ocean. And when we go back to the sea, whether it is to sail or to watch it we are going back from whence we came.”

The above quote, from a 1962 speech made by President John F. Kennedy during the America’s Cup in Newport, Rhode Island, speaks to me because not only do I feel at home when I’m by the sea, I know that we, as humans, share its chemistry, and therefore its origins.

Unless you’re an avid diver, an adventurous nature lover, or a marine scientist, it’s not likely that you’d choose Heron Island for your next vacation. But David (Davey) Kline, a staff scientist at the Smithsonian Tropical Research Institute, became enamored with the island and the coral reefs off its coast when he was a young student. Two decades later he’d find himself back there, this time with a team of researchers. Over 200 days and three seasons the team collected data, seawater and coral samples, algae and other specimens to figure out how reef-building corals might react to future predicted acidification of the ocean (Figure 1.1).

“In my research I’m focusing on warming and ocean acidification and other local and global stressors and what that means for the future of coral reefs. I’m
trying to figure out how corals and reefs are affected in 50–100 years and find ways to minimize impacts on reefs,” said Kline.

Today, it’s expected that coral reefs will decline by 60 percent from coral bleaching and between 10 and 50 percent from ocean acidification (OA) by 2100. Add that to losses already incurred. “Coral reefs are really the canary in the coal mine for climate change and we have already lost 40 to 60 percent of reefs globally due to overfishing and climate change impacts,” said Kline.

Heron Island, a 72-acre coral cay, sits off the east coast of Australia about 300 miles north of Brisbane, joining roughly 900 other islands dotted throughout the Great Barrier Reef. It is small, but packed with animal life onshore and offshore. “You can walk around it in about 20 minutes. At any one time there are between 20 and 80 researchers on the station, and the only other thing on the island is the Heron Island Resort where there’s probably 20 to 50 staff. Those are the only humans on the island but there are hundreds of thousands of birds, so we’re greatly outnumbered by the wildlife,” Kline said.

Kline and his team spent many sleepless nights during those months on the island, eyes wide open in their bunk beds. “So trying to sleep on this island, there’s this crazy symphony of what sounds like children crying,” Kline said. “They have this mating ritual where they moan all night long and they kind of sound like crying children.”

Enter “muttons,”1 or wedge-tailed shearwaters (Figure 1.2). The seabirds earned their nickname when early visitors to Australian islands discovered that they tasted like mutton (meat from older sheep), for better or worse. If curious, search online for a video of the wedge-tailed shearwater to hear their calls. It’s worth your time.

“The birds are around all year. Some parts of the year they are louder than others but you almost always hear the birds at night.
Sometimes people who visit for the first time think they’re on the set of a horror movie. It’s really scary.”

1.2 Mutton bird: juvenile wedge-tailed shearwater. Photo by Benjamin Keene. Courtesy of: Wikipedia.org, CC 3.0.

And the raucous birds are accustomed to having the island to themselves. “They can fly thousands of miles, but they are terrible at landing and often slam into people and buildings,” Kline said. With up to 200,000 birds on Heron, including muttons, black noddy terns, herons, and others, it’s no wonder that the island once hosted guano miners. Heron is also home to greenback and leatherback turtles. In 1925, a turtle soup factory was built there—but fortunately for turtles, it didn’t last long after devastating the turtle populations. The humans left, and the turtle population rebounded. “The adults are there year-round, but at some times in the year the females come and lay their eggs on the beach. Then in the morning it looks like a volcano erupting and there’s all these little baby turtles that are like wind-up toys making their way down the beach,” Kline said. “You try to protect them from the birds trying to eat them, and there are sharks in the water also trying to eat them. And they magnetically cue into the sand where they were born, and that’s the only place they’ll come back to to lay their eggs when they are adults,” he added.

“It’s kind of a magical place in terms of the wildlife at Heron. You are in the world of the wildlife rather than the other way around.”

In 1943, Heron Island became a national park. Its research station, built in the early 1950s, is operated jointly by the University of Queensland and the Australian government.2 Later in the chapter, we’ll explore and visualize some of the high-resolution datasets from the 200-day “in situ” project designed and led by Davey Kline. Several new discoveries resulted from the in situ experiment, shedding light on how corals react to more acidic seawater in their natural setting.

While it’s already known that more acidic oceans can slow the growth of coral skeletons—or dissolve them completely in extreme conditions—this field of research is relatively new. Coral reefs are on the road to extinction from bleaching and other stressors, including acidification, and scientists around the globe are scrambling to find ways to save at least some species as humans continue to burn fossil fuels, causing the oceans to become increasingly acidic. That data—culled from the natural habitat of the coral reef itself—offers a glimmer of hope for at least one coral species on one reef, as discussed later in this chapter.

SURFING INTO SCIENCE

Raised in Los Angeles, Davey Kline frequented the California beaches, not realizing at the time that some of his beloved sea animals below those big blue waves were at risk.

1.3 Location of the Heron Island reef flat research site (black rectangle represents the research station; star represents the reef; arrow represents a channel cut). Courtesy of: David Kline.

Nor did he realize that he’d shape his career around saving them. “I spent every chance I could get in the ocean either surfing or snorkeling or sometimes fishing,” Kline said. “It was a short drive over a canyon to get to Malibu so most of my summers as a young child were spent in Malibu or Santa Barbara in the ocean.” Life was good—as long as he had the sea by his side. But at 18, he found himself landlocked. “I made the decision to go to a small liberal arts school in Minnesota called Carleton College. And one of the first things I realized, besides being shocked by how cold the winters were, was how much I missed the ocean,” Kline said. In 1996, he solved that problem by signing up for a joint art and marine science program at Carleton that took him back to the seas. Kline joined other marine biology majors and art students on a research adventure to New Zealand and Australia where they learned about Maori and Aboriginal cultures, spent time writing and drawing, and of course went snorkeling and diving. “We also got to go to Heron Island as an undergraduate and spend a couple of weeks there,” he said (see Figure 1.3). Kline’s memories of those college days on Heron remain vivid—the dives in particular. “We dropped down and it looked like a massive underwater city, all these corals with swarms of fish all around. Several turtles were coming nearby and checking us out. And there were big sharks. It was just one of the most spectacular things I had ever seen. These were massive corals that were probably 10 to 15 feet high, and 20 feet in circumference … and they would form arches, and there were big fish hiding underneath. These were massive colonies. Really spectacular reef formations,” he said. No surprise that he’s been researching corals and reef ecosystems ever since.

“Within a week of spending every day on a reef snorkeling and diving I realized what an incredible ecosystem coral reefs were and I was just fascinated by them,” he said. “And after doing some research I realized how many unanswered questions there were, and I decided then that that was the kind of area I wanted to pursue for my career.” Today, that list of “unanswered questions” he devised as a young student has greatly expanded. Seeking answers has led to new research involving cellular biology, data science, underwater autonomous imaging robots, sensors and satellites, and more. And one major question looms over all of Kline’s work, driving him to keep diving, digging, and analyzing data:

How will corals and reef ecosystems react to a higher CO2 future as oceans become increasingly warmer and more acidic?

It turns out that this question is more complicated than it seems, and the Heron project added new methods to the field of ocean acidification research and to the study of how future predicted levels of CO2 will impact the world’s coral reefs. To address the question, one needs context. First, one needs a general understanding of the chemical balance required by corals to live and thrive. And second, it’s important to know how corals live, breed, eat, and provide nutrients and shelter to many thousands of sea species.

Carbon Dioxide (CO2)

Recent data for 2015 show that humans were adding about 41 billion tons of carbon dioxide to the atmosphere every year (Global Carbon Project3). That number had stabilized for a few years, but started increasing again in 2018. Concentrations of the greenhouse gas have reached 414 parts per million (ppm), up from about 280 ppm before the industrial revolution. In other words, if we count 1 million gas molecules in the atmosphere, 414 of them would be CO2. Those CO2 molecules absorb heat and release it over time, warming the earth. That has always been the case, but the more CO2, the warmer the planet. Of course, the average annual temperature over centuries must not be confused with the variations in daily temperatures.

Keep in mind that we humans take in oxygen and breathe out CO2. Imagine if your lungs became compromised, and your body started accumulating excess CO2. What would happen? You may start breathing more rapidly attempting to inhale more oxygen and expel more CO2—or in severe cases where your CO2 level is very high, you could go into a seizure or even die.

The graph in Figure 1.4 shows the changes in atmospheric CO2 in parts per million (ppm)4 since 1958. Oh, and funny thing about those CO2 molecules—they remain in glaciers for many centuries, frozen in time for scientists to cull and measure.
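
The parts-per-million arithmetic described above can be checked in a few lines. What follows is a minimal Python sketch, not part of the Heron Island project; the function names are hypothetical, and the only inputs are the two concentrations quoted in this chapter (280 ppm and 414 ppm).

```python
# Illustrative sketch: function names are hypothetical; values come from the text.
PRE_INDUSTRIAL_PPM = 280  # approximate concentration before the industrial revolution
RECENT_PPM = 414          # concentration quoted in this chapter


def ppm_to_percent(ppm):
    """Convert parts per million to a percentage of all gas molecules."""
    return ppm / 1_000_000 * 100


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


share = ppm_to_percent(RECENT_PPM)
rise = percent_change(PRE_INDUSTRIAL_PPM, RECENT_PPM)
print(f"{RECENT_PPM} ppm is {share:.4f} percent of the atmosphere")
print(f"Increase since pre-industrial times: {rise:.1f} percent")
```

By this arithmetic, CO2 remains a tiny share of the atmosphere, yet its concentration has risen by nearly half.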
1.4 Keeling curve: yearly carbon dioxide concentration. Produced by Dianne M. Finch-Claydon. Data source: Scripps Institution of Oceanography, University of California San Diego.

1.5 CO2 in millions of tons by country. Produced by Dianne M. Finch-Claydon. Data source: Global Carbon Project. www.globalcarbonatlas.org/en/content/welcome-carbon-atlas

It’s not within the scope of this chapter to look at how scientists use ancient corals, ice cores, tree rings, plant stomata, and other organic time clocks to estimate CO2 levels in the atmosphere and water, but there are numerous studies showing CO2 levels from centuries ago5—and more are being generated as we speak, expanding that already big “big data” of climate change. China and the U.S.A. rank highest in global CO2 emissions, followed by India, as shown in the map in Figure 1.5. In 2018, China emitted over 10 billion tons of CO2, while the U.S.A. generated over 5 billion. See how to create the Keeling Curve and the global CO2 map on the website, and how the Keeling Curve can be misleading depending on how the axes are defined. Scientists say that 30 to 40 percent of that human-generated carbon dioxide is absorbed by the oceans, the largest carbon sink on earth. The atmosphere holds on to about half, while trees and plants take up the remainder.
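
The point about axes can be made concrete with simple arithmetic. The Python sketch below estimates how much of a chart’s height the same CO2 rise would fill under two different vertical axis ranges. The axis limits and the function name are assumptions chosen for illustration; they are not taken from the book’s website examples.

```python
# Illustrative sketch: the axis limits below are assumed, not from the book's charts.
def apparent_rise(start, end, axis_min, axis_max):
    """Fraction of the plot height covered by a change from start to end."""
    return (end - start) / (axis_max - axis_min)


START_PPM = 280  # pre-industrial concentration quoted in the chapter
END_PPM = 414    # recent concentration quoted in the chapter

# Same data, two hypothetical y-axis ranges.
zero_based = apparent_rise(START_PPM, END_PPM, axis_min=0, axis_max=450)
truncated = apparent_rise(START_PPM, END_PPM, axis_min=270, axis_max=420)

print(f"Zero-based axis: rise fills {zero_based:.0%} of the chart height")
print(f"Truncated axis:  rise fills {truncated:.0%} of the chart height")
```

The same 134 ppm rise looks modest on a zero-based axis and dramatic on a truncated one, which is why the axis range deserves a mention wherever such a chart is published.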

The ocean has always taken in CO2 from the atmosphere, a natural part of the “carbon cycle” we all learned in elementary science classes.6 But the ocean is taking on more than ever—since industrialization, that is.

Problem is, when CO2 meets with seawater, the pH drops, meaning that the ocean becomes more acidic. According to National Oceanic and Atmospheric Administration (NOAA) scientists, since industrialization the ocean’s pH has dropped by about 0.1 units, which amounts to an increase in acidity of about 30 percent,7 because of the absorption of excess CO2 generated by humans.
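
Because pH is a logarithmic scale (pH is the negative base-10 logarithm of the hydrogen ion concentration), a small drop in pH is a large change in acidity. The Python sketch below shows the conversion. The example pH values of 8.2 and 8.1 are commonly cited approximations for pre-industrial and present-day surface seawater; they are assumptions here, not figures from the Heron Island datasets, and the function name is hypothetical.

```python
# Illustrative sketch: pH values are assumed approximations, not project data.
def acidity_increase_percent(ph_before, ph_after):
    """Percent increase in hydrogen ion concentration when pH falls."""
    # pH = -log10([H+]), so the ratio of concentrations is 10 ** (pH drop).
    ratio = 10 ** (ph_before - ph_after)
    return (ratio - 1) * 100


increase = acidity_increase_percent(8.2, 8.1)
print(f"A drop from pH 8.2 to 8.1 raises acidity by about {increase:.0f} percent")
```

A drop of 0.1 pH units works out to roughly a 26 percent rise in hydrogen ions, in the same range as the figure of about 30 percent often quoted for ocean acidification.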

And a more acidic ocean spells trouble for corals, mussels, some types of phytoplankton, clams, oysters, and other shell- or skeleton-builders—many essential to the food chain that many fish species rely on for nutrition, as do humans. “The CO2 gas forms a weak acid called carbonic acid, which lowers the pH of the ocean and makes it more difficult chemically for organisms to build calcium carbonate, which is the basis for skeletons and hard shells,” said Kline.

Evidence has accumulated from ocean research worldwide that some sea animals are already affected negatively by ocean acidification. Mussels off the coast of Seattle, and pteropods in the waters in Antarctica and elsewhere, have been found with their shells partially dissolved by the acid. And many studies on corals and other animals have been conducted in labs around the world using aquariums injected with various levels of CO2, showing that shells either grow more slowly or dissolve in higher CO2 scenarios.

Kline has studied the effects of acidification on corals for several years, but the project described later in this chapter was the first to produce “high-resolution” data in a natural reef environment—rather than in lab aquariums. Most studies to date were conducted using lab aquariums at universities and research organizations. Or, when studies were conducted in natural ocean environments, they were shorter term, thus producing less data.

“I’d say moving these experiments into the field [ocean] is a critical step to provide additional data to verify which of the hypotheses we developed in the lab hold up when you test it in the field,” said Kline. “And also sometimes they [in situ field experiments] provide new insights, an important aspect of this. You need a variety of approaches. Each approach can give you good data on different aspects of the physiology or ecology, but only after integrating a variety of different approaches can you get the full picture you need to get a better understanding of ocean acidification or warming impacts on coral reefs.”

Obviously, “big data” is getting bigger and more varied as marine scientists continue to study ocean environments. We are getting smarter and smarter, but the question is, what are we going to do with all of the data? Before we jump into some of those datasets produced by the project, we need what I call the “data backstory.” In other words, as experienced journalists know, we need to understand the context before we begin to understand the data.

For example, how do corals live and interact with other species and why are scientists so concerned about coral extinction? What other “stressors” are corals facing? Ocean acidification, for instance, is a global stressor affecting ocean life worldwide, but so is bleaching of corals—caused by increasingly warmer oceans. We need to think at both micro (one reef off Heron Island) and macro levels (world reefs including many species) to understand the answers to such questions. Keep in mind that the demise of coral reefs will impact nearly one-quarter of the ocean’s fish species, as described below. That fact on its own provides some startling context—context that remains indelibly on the minds of scientists.

CO2 and pH

Just as a doctor tests your blood for CO2 and pH levels regularly, scientists are able to regularly measure both in the atmosphere and in the largest fluid on earth: the ocean. Like the ocean, our bodies don’t react well to excess CO2. Too much CO2 in humans leads to an illness called “respiratory acidosis,” which can be fatal. Remember, we continuously expel CO2 when we breathe out—that is, as long as we have working lungs. Plants take it in and use it in photosynthesis. Coral skeletons, akin to our bones, can dissolve when too much CO2 is absorbed by seawater because when CO2 is added the pH drops to more acidic levels, and those calcium minerals that must be absorbed become sparse.

We are not the only animals on earth that need calcium, a CO2–pH balance, or other nutrients provided by ocean, soil, atmosphere, and plants. We do forget that humans have not been on the earth as long as many other animals, and that as we continue to add species to our growing endangered lists, we might one day be on that list ourselves if we continue to use the planet’s resources as though they were infinite. Apologies for the digression!
“So what we found with ocean acidification is that corals have a harder time building their skeleton, their growth rates decrease, and there are potentially big ecosystem level impacts as rates of calcification and shell growth decline,” said Kline. Calcification is the process corals use to absorb calcium-based minerals from the surrounding seawater to build their skeletons. Corals and other animals that rely on those minerals are referred to as “calcifiers.”

“The skeletons,” Kline said, “are often less dense when exposed to lower pH [more acidic] environments. They are not as structured as they are under a higher pH and so it’s almost like corals are getting osteoporosis. The skeletons are becoming weaker.”

Today coral reefs face many other “stressors,” including bleaching events, coastal development, overfishing, and sedimentation. Together with ocean acidification, these stressors create the perfect storm for coral decline or even extinction, as Kline described below:

“When there are a lot of stressors on the ecosystem, just like when there’s a lot of stressors on us, when we’ve been staying up and not getting enough sleep, drinking too much coffee, not eating well, eating unhealthy foods, then germs come around and you can get really sick. And then when you get sick your immune system goes down—and you can be affected by another disease or another stressor in your environment. Something that might have just given you the sniffles for a day can lay you out for a week.” And it’s the same on a coral reef. “So when there’s overfishing, and there’s a bunch of pollution, and then there’s sedimentation [coastal construction causing corals to be smothered in sediment], and then there’s this little bit of extra [ocean] warming, all of a sudden the reef can’t deal with it as well. You can have a massive bleaching event and some of the corals die,” he said. Kline went on to say:

“So if we minimize the stressors that are impacting the reefs, they’re more able to deal with the global stressors that are around all the time, just like when we minimize the stressors in our life—we’re more able to deal with any new stressors that come up.”

As for global stressors, we’ll focus on ocean acidification, while coral bleaching caused by warmer seas is more widely known and discussed. But we’ll touch upon it in this chapter for context. As noted, local stressors include overfishing, coastal sedimentation, pollution, and tourism. Ocean acidification is relatively new in marine research and not as well studied as bleaching of corals. In this chapter, we’ll focus on Davey Kline’s 200-day acidification project and two of the many datasets it generated. But first we will take a look at hard corals: how they live, mate, build colonies, and provide sustenance to the many sea species that rely on them.

THE LIVES OF CORALS

Coral reef—the term elicits images of tropical vacations, colorful stony sculptures in clear shallow water (Figure 1.6), and white sandy beaches giving way to deep azure seascapes dotted with small green islands. For divers, coral reefs represent paradise—as Kline expressed in his description of his early dives in the Great Barrier Reef.

1.6 Orange-lined trigger fish. Courtesy of: Wikimedia. https://commons.wikimedia.org/w/index.php?curid=698177

As a recreational diver, I understand those sentiments. Every time I dive I find myself wondering why I didn’t choose a career in ocean research. Dropping backwards off a boat into a watery abyss, for me, begins with a slight panic. But once I feel oxygen coming from the regulator into my lungs, the panic gives way to anticipation. It’s a life-affirming experience to watch my exhaled CO2 entering the ocean in the form of silver bubbles. I watch them line up as they slowly make their way to the ocean surface—growing in size as they rise, just as my lungs will when I return to the surface. That experience calms me, and prepares me for the deeper dive. With fins pushing against currents, I glide toward that brilliantly colored and comical place where sea creatures of many sizes and shapes hover and hide and chase and feed and dart back and forth and up and down—an underwater menagerie. They all gather around the coral reef, and that’s why you are there. Sometimes they appear to circle around the reef over and over until they spot something of interest—just as divers do. And there is always something of interest, something new and different.

“Some people like to call them [coral reefs] the rain forests of the sea, but in fact you’d want to call rain forests the coral reefs of the land because coral reefs are even more biodiverse than rain forests in many respects,” said Kline.

1.7 Porcupine fish. Courtesy of Springcold at da.wikipedia, CC 2.5. https://creativecommons.org/licenses/by-sa/2.5

1.8 Parrot fish. Photo by Adona9. Courtesy of: Wikimedia, CC 3.0. https://commons.wikimedia.org/w/index.php?curid=5992414

When a marine scientist mentions “biodiversity” to someone who has never experienced it in a natural environment, I wonder if the term resonates. Coral reefs attract a vibrant cast of characters from microbial to gigantean species. I believe that if everyone had the chance to hover and glide around a coral reef, then maybe they would join the fight for cleaner energy.

I often giggle when I dive. It’s impossible not to when you see tiny spotted gobies with big wide frowns popping out of holes on the ocean floor, then back down again like a game of Whack-a-Mole, then back up to find out if your giant face is still watching. Or the porcupine fish (Figure 1.7) that puffs up into a large ball several times its natural size so that only very large-mouthed predators can eat it. Then there’s the long silver needlefish that moves elegantly past you in a straight line, like an arrow, aloof and uninterested. I revel in the striped and speckled, the frowning, pouting, smiling and sharp-toothed, the transparent and luminescent, the bullies and the wallflowers hiding in plain sight, shyly keeping an eye on you. The parrot fish (Figure 1.8) exemplifies the riot of color that awaits you.

Orwell could have written another novel based on coral reef life, like Animal Farm but under the sea. I suppose the sharks would be at the top of the social hierarchy, and slow-swimming seahorses would be pushed around by big fish.

More seriously, those reefs provide shelter and sustenance to many thousands of species along the food chain—from the tiny plankton to massive whale sharks. In fact, reefs offer nutrients, shelter, mating and birthing zones, protection from predators and hiding places to about one-quarter of all marine life.8 One-quarter of all marine life—a staggering responsibility! Scientists have documented more than 4,000 fish species—from large sharks to thumb-sized gobies—that rely on coral reefs. And they’re still counting. Coral colonies are known for attracting fish and invertebrates,9 but like humans, corals live symbiotically with microbes—an essential part of the reef community. “When you think about a coral colony, there are hundreds to thousands of organisms living within the colonies,” Kline said. “Symbiotic algae that allow coral to photosynthesize—you have a diverse community of bacteria, archaea and fungi, etc.—all of the different domains of life are found within one coral polyp.” It takes a biodiverse village to sustain a reef community. “So there’s a whole micro-ecosystem going on in each coral polyp—but then when you think about the coral colony, building a reef and making structure within that structure, there’s thousands of species of invertebrates and thousands of species of fish that use corals as a nursery,” he said. “And when there’s baby fish they hide within corals so that predators can’t eat them.” A hard coral, also known as “stony coral,” is made up of individual living polyps, as Kline described, that grow on hard surfaces on the ocean floor—such as rock or limestone. Those stony corals build reefs and form communities. They require clear and relatively shallow warm water to survive. Soft corals, such as sea whips or blue sponges, often join the stony coral reef communities, but many soft corals can withstand deeper, darker, and colder waters. This chapter focuses on the stony variety—the reef builders. 

Corals Are Animals

Stony corals may look like lifeless sculptures (Figure 1.9) to those who have never really learned about them. Not long ago I was on a beach in the Caribbean and heard a woman complaining about how the sharp “rocks,” which were corals, were in her way, and that she was tired of stepping on them. Next time you snorkel by a coral reef, watch closely. You might see a community of polyps waving their tentacles—hoping to catch something edible floating by. Visit a year later, and those corals may have grown an inch or two.

Corals are indeed animals. Their relatives include jellyfish and sea anemones, and corals belong to the phylum “Cnidaria” and the class “Anthozoa.” Individual coral polyps hunt and capture food—zooplankton being a favorite dish. They use those tentacles to detect predators and edibles. Many species also use stingers, or “nematocysts,” to disable or kill prey.

1.9 Flynn Reef outcrop. Great Barrier Reef near Cairns, Queensland, Australia. Photograph by Toby Hudson. Courtesy of: Wikipedia. CC 3.0.

Those tentacles, once they capture prey, push it down through their only orifice, the “mouth” that sits at the top of the polyp encircled by tentacles, as shown in Figure 1.10. Once inside, they digest their dinner in a gastrovascular cavity, or “stomach.” Leftovers are excreted back up through the mouth—the only way out. Then, when dinner is over, those tentacles clean around the mouth opening. No napkins necessary! Corals mate, give birth, establish colonies, and form symbiotic relationships with many thousands of sea creatures. One reef can establish many thousands of coral polyps, and can grow to weigh several tons.

1.10 Star coral. Note the corals’ tentacles are extended as they hope to catch a snack floating by. Courtesy of: Wikimedia. Public domain. https://commons.wikimedia.org/wiki/File:Montastrea_cavernosa.jpg#/media/File:Montastrea_cavernosa.jpg

1.11 Star coral colony: Montastraea cavernosa GS or great star coral. Wide view shows colony of polyps. Courtesy of: Wikipedia. Public domain.

In fact, coral reefs inhabit about 2 percent of the entire ocean floor. To put that in perspective, the ocean itself accounts for about 70 percent of the earth’s surface. “They are the largest living structures on earth,” said Kline. “Skeletons of corals form massive reef structures so big that they are visible from space.”

To date, scientists have identified about 800 different species of corals and say that many millions of species are yet to be identified.10 After all, about 95 percent of the ocean is still unexplored.11 And corals have been on this earth for millions of years—and some of the coral reefs live much longer than humans. “There are hundreds of thousands of individuals in a coral colony,” said Kline. “And there are massive coral colonies that were around when Christopher Columbus came to the United States—there are really old corals out there!” Figure 1.11 shows a colony of star coral.

Broadcast Mating

One of the most interesting things about corals is how they make babies. Most hard coral communities appear to synchronize their reproductive activities—usually once a year in spring, depending on the location and species. All at once, on a specific day and time, males “broadcast” sperm into the surrounding seawater. Females release eggs at about the same time. Hermaphrodites, which are both male and female, release fully fertilized packets. See Figures 1.12, 1.13, and 1.14. Search the web for “coral broadcast spawning” and you’ll find several videos of these events.12

Corals generally time these events following a full moon and a sunset, and scientists say that pheromones are likely emitted to alert nearby corals, perhaps those that aren’t paying attention to the lunar cycles. Just like some humans, they need a nudge to escape inertia.

1.12 Coral snowstorm: Dan Basta observes coral spawn in the water column. Photo by G.P. Schmahl. Courtesy of: Flower Garden Banks National Marine Sanctuary and G.P. Schmahl.

1.13 Gamete release by symmetrical brain coral (Pseudodiploria strigosa) during spawning. Courtesy of: Flower Garden Banks National Marine Sanctuary.

1.14 Star coral spawn: polyps of boulder star coral (Orbicella franksi) releasing egg and sperm bundles during spawning. Photo by G.P. Schmahl. Courtesy of: Flower Garden Banks National Marine Sanctuary and G.P. Schmahl.

And of course scientists have been collecting data on the timing of the broadcasts shown in Figures 1.12 and 1.13, big data on coral spawning! Coral babies, called “planulae,” generally live for a few days near the ocean surface to avoid predators before dropping back down to the ocean floor—where they attach to hard surfaces to join a colony or start a new one. Some planulae move with ocean currents seeking new horizons. If not eaten along the way they may travel up to a hundred miles before finding a new home on some rock or limestone surface. Hard coral polyps, when they die, become part of the limestone surfaces that young planulae use to take root and grow—graveyards of ancestral skeletons providing structure for new life.

Symbionts

Once the planulae are planted, an important tenant must move into the polyp quickly if they are to survive—a type of algae called “zooxanthella.” Zooxanthellae are the most important symbiont hosted by hard coral polyps (though there are many fascinating types of symbionts). Figure 1.15 shows why. Without zooxanthellae, corals bleach.

“Coral reef systems attract thousands of species for protection, feeding, breeding, and other purposes. Symbionts include seahorses, shrimp, wrasse, and many other types of fish that interact with coral in one way or another. At least one symbiont benefits from the relationship,” said Kline.

These relationships can be complicated. Zooxanthellae and corals both benefit as “mutualistic” symbionts, whereas seahorses enjoy a “commensal” relationship in that they benefit from coral, as described later, but don’t appear to give back. Sadly, invasive species are eating important symbionts known to maintain the zooxanthellae on corals—sort of like landscapers removing weeds. For example, parrot fish graze on coral polyps. They remove excess algae and help maintain reef equilibrium, and divers love to see the colorful, toothy parrots that seem to smile as they float by. See Figure 1.8.

1.15 Healthy coral hosting zooxanthellae (left); bleached coral after losing its zooxanthellae (right). Courtesy of: National Oceanic and Atmospheric Administration. Public domain.

1.16 Lionfish, an invasive species from Asia found in the Caribbean. Photo by Dianne M. Finch-Claydon.

But the colorful parrots have become a favorite food for lionfish—shown in Figure 1.16—an invasive species from the Indo-Pacific that has discovered the diverse and plentiful buffets of coral reefs in the Caribbean, Florida, Mexico and other areas. They are beautiful, but looks can be deceiving! A parasite to reefs, lionfish eat just about any species that floats by, giving nothing back—sort of like humans. And like humans, they have no predators in these new locations. Last time I was diving in the Caribbean, our lead divers were killing every lionfish we encountered. We all reported sightings—and there were many. Sadly, some Bahamian reefs have lost 65 percent of the fish species that traditionally lived in harmony with the reefs there13 after becoming prey to the beautiful but pernicious lionfish. As the photo shown earlier in Figure 1.15 illustrates, corals lose their color when zooxanthellae are expelled. They bleach and often die.

1.17 A before and after image of coral bleaching at Lizard Island on the Great Barrier Reef in March/May 2016. © XL Catlin Seaview Survey. Courtesy of: David Kline.

1.18 Another view of the coral bleaching at Lizard Island. © XL Catlin Seaview Survey. Courtesy of: David Kline.

The most brilliantly colored corals get their color from microbes, including zooxanthellae algae. The algae use carbon dioxide from the coral’s stomach lining to perform photosynthesis, which produces oxygen, sugar, and fats that corals use to survive and grow. The zooxanthellae also act as a sunscreen to corals, preventing them from bleaching. When sea water becomes too warm, those algae are expelled from the polyp, causing the corals to bleach (see Figures 1.17 and 1.18). Unless the seawater cools down quickly, the corals die.

One of the most fascinating coral symbionts, in my view, is the seahorse (see Figures 1.19, 1.20, 1.21, and 1.22). The first time I saw a seahorse in its natural environment—during a night dive in St. Croix—I was stunned to see how slowly it moved, the sloth of the sea, yet elegant. It appeared to be so vulnerable as it floated slowly by me in the midst of rapidly moving fish of all sizes. Seahorses give birth in reefs or seagrass where they are able to hide themselves and their offspring from predators. They mooch tiny shrimp off the backs of corals (see Figure 1.23), yet provide nothing in return. This is referred to as a “commensal” symbiotic relationship. Corals also protect seahorses from predators—but only because seahorses practice “mimicry” by changing color to match the corals, as shown in Figures 1.20 and 1.21. These fairytale creatures are in fact the slowest swimmers of all fish species, so hiding is essential to survival.

1.19 A seahorse camouflaged by coral. Photo by Haley Hendrickson during a night dive in St. Croix with the author.

1.20 Seahorses hide by mimicking the coral’s color. Photo by Haley Hendrickson during a night dive in St. Croix with the author.

1.21 A seahorse wraps its tail around coral and its color begins to mimic the coral. Photo by Haley Hendrickson during a night dive in St. Croix with the author.

1.22 Corals attached to a pier footing provide cover to seahorses. Photo by Haley Hendrickson during a night dive in St. Croix with the author.

Fortunately, for additional protection from predators, the long-snouted fish can see in two directions simultaneously—using one eye to search for floating food and the other to monitor for danger. Taking advantage of coral branches, they wrap their long tails around them to anchor themselves, blending in and poised to vacuum up unsuspecting plankton with their long snouts.

Seahorses generally mate for life, or at least remain monogamous during mating and birthing. But the most astonishing thing about these colorful little fish is that the males, not the females, go through pregnancy and give birth. The male releases sperm into the seawater, and the female deposits her eggs into the male’s pouch—a womb-like sac that sits on top of his stomach. After the female delivers the eggs to the male, the sperm, still swimming in the sea water, make a beeline for the pouch to fertilize the eggs before the gate closes. Who knows how the sperm are alerted that the pouch is about to seal? Within a month, or sometimes within a few days depending on the species, the male seahorse gives birth to up to 2,000 babies. Once born, parenting stops. Those babies are on their own.

If you are not inspired yet by these little creatures, try an online search for “seahorse mating dance” and watch one of the videos of a ritual “dance” that seahorse couples engage in every morning—sometimes referred to as a “morning greeting.” The ritual is so astonishingly eloquent that one can imagine it choreographed to a romantic concerto like Frederic Chopin’s Romanza Larghetto.

Threats Beyond Losing Reef Habitat

At least eight species of seahorses are threatened today from various stressors—from the loss of coral reefs to coastal pollution and trawl nets used in fishing.14


1.23 Tiny shrimp hides in corals to avoid predators, including seahorses. Photo by Haley Hendrickson during a night dive in St. Croix with the author.

They are also caught for use in traditional Chinese medicine—the most colorful believed to be the most effective, and the most profitable. While many fish species are moving north to cooler areas as a result of climate change, seahorses won’t likely survive such a strategy. The slow swimmers generally don’t stray far from their customary habitats. A study out of the University of Lisbon, “Seahorses under a changing ocean: the impact of warming and acidification on the behavior and physiology of a poor-swimming bony-armoured fish15” found that when seahorses are exposed
to environments that are both warmer and more acidic, they become lazier, eat less, and spend more time resting. Besides, they need corals or seagrass to survive, and such environments aren’t as common in the north. So, scientists concluded that the combination of ocean acidification and warmer seawater may add further stress to the already threatened species.

BLEACHING

As mentioned earlier, bleaching of corals has been widely discussed for decades, while ocean acidification is a more recent issue. Both are global stressors to corals in that they occur in oceans worldwide—and local reef managers can’t manage these events, while pollution, sedimentation, and overfishing can be addressed locally. Before digging into ocean acidification and its impact on corals, we’ll review bleaching (seen starkly in Figures 1.24 and 1.25), since when they occur together, the likelihood of corals surviving is quite low. Both negatively impact coral growth, but in different ways.

When the mercury rises to atypical levels, corals bleach. As discussed above, zooxanthellae live symbiotically with corals, providing nutrients, and also protection from the sun. But when the temperature increases too much, the algae leave. “And when the temperatures get too high the relationship between the corals and the algae that live within the coral tissue that allow them to build their skeletons, that relationship breaks down,” said Kline. “And so these corals that are normally tan brown, and even brightly colored, they lose the color that comes from the pigments that algae use to

1.24 American Samoa: healthy reef (left) and dying reef (right). © XL Catlin Seaview Survey. Courtesy of: David Kline.


1.25 The same reef in American Samoa before, during, and after coral bleaching. © XL Catlin Seaview Survey. Courtesy of: David Kline.

photosynthesize and make sugar, basically giving corals the energy to build their skeletons. They lose these algae, they lose this color, and they bleach.” Certainly this is not what most people think when they see beautifully colored corals—who knew that algae are responsible for the bright colors, though many healthy corals are brown? The colors depend on the type of algae. Notice that Kline mentioned that corals require sugar from those algae to build skeletons. This, combined with a drop in pH caused by acidification, makes it more difficult to predict the pace of skeletal breakdown.

Two years after Kline graduated from Carleton, in 1998, coral reefs went through the worst bleaching event on record at the time—affecting corals in more countries than previous events in 1987 and 1990. Before 1980, coral bleaching was usually limited to smaller local areas, was less severe, and corals would generally recover, according to David Kline and historical evidence.16 Bleaching events today are global in scale and more devastating to reefs because of increasingly warm seawater temperatures caused by human-generated CO2 as a result of burning fossil fuels, manufacturing cement, deforestation, and other activities. The Smithsonian Ocean Portal website says that while many reefs have died off completely, others are “a pale shadow of what they once were.17”

In 2016, another global bleaching event damaged more than 90 percent of the Great Barrier Reef, representing the most devastating worldwide bleaching event18 to date. It affected corals worldwide, not only in the Great Barrier Reef. “NOAA came out with data showing that on Jarvis Island—one of the most isolated reefs and a U.S. protectorate—the warm water sat over it for so long that pretty much all the corals bleached. They just went back and verified there was 95 percent coral mortality,” Kline said.
Jarvis, a coral island in the South Pacific between Hawaii and the Cook Islands, is an uninhabited territory of the United States designated as a wildlife refuge. According to Kline, the widespread mortality was a direct result of warming—as there were no other stressors present.


“There—in a place that has no human population, so no fishing, no nutrients, or [other] runoff—it was such a bad warming that basically the whole reef died,” he added. Kline said warm “blobs” of water can stay in one area too long, which sadly happened in the northern part of the Great Barrier Reef in 2016. “In Jarvis it [the ‘blob’ of warm water] sat there for about 34 weeks. It just cooked the corals basically. And the same thing happened on the Great Barrier Reef. And the interesting thing is there is the northern barrier reef, which is more isolated and incredible, and it had 35 to 50 percent [coral] mortality, and the southern reef had less than 5 percent mortality. But that’s because a cyclone or the remnants of the cyclone passed over and moved that warm water out, and they were able to recover,” said Kline. “That often happens. It often takes a tropical storm or a cyclone to move that warm water out,” he added in a voice lacking his characteristic enthusiasm.

Scientists say that the warming trends over the past five decades have actually slowed the natural process of “ocean mixing,” where water from the deep ocean mixes with surface water, replenishing it with cooler water and nutrients required by plankton and other tiny critters at the bottom of the food chain.

At the time of this writing, scientists were still studying the long-term effects of the 2016 bleaching event, which started in Hawaii. According to NOAA, bleaching events between 2014 and 2016 covered more of the world’s reefs than ever before, and lasted longer than any previous events. In 2016, bleaching hit 51 percent of reefs globally. The event was the first to bleach the Great Barrier Reef at its northernmost points, killing off about 30 percent of the corals that normally thrive in shallow water. The event also killed off half of the corals in the Seychelles, and hit the western reefs in the Indian Ocean.
The third global bleaching event, from 2014 to 2017, brought mass bleaching-level heat stress to more than 75 percent of global reefs; nearly 30 percent also suffered mortality-level stress. This bleaching event was the longest, most widespread, and most destructive on record.19 While it’s relatively easy to see bleaching impacts on coral reefs—they turn white after losing their algae—ocean acidification is a more complicated stressor, and studying the impact involves sophisticated chemistry, biology, and long-term observations.

OCEAN ACIDIFICATION

Today, approximately 26 million tons of carbon dioxide is absorbed by the ocean on a daily basis.20 “As CO2 increases, more is absorbed in the ocean. The pH goes down. We call this ocean acidification,” Kline said. The image in Figure 1.26 describes the chemistry behind ocean acidification and shows what happens to shells as the CO2 rises. The bottom line is that when CO2 enters the seawater, a chemical reaction takes place that creates more acidic conditions (a lower pH), which reduces the amount of the mineral aragonite that corals need to grow. Aragonite, a form of calcium carbonate, is particularly vulnerable to dissolving in acid. Without it, many species’ shells or skeletons dissolve.


1.26 Ocean acidification carbon chemistry. Notice the dissolution of the sea snail shell in more acidic conditions. Courtesy of: National Oceanic and Atmospheric Administration, Pacific Marine Environmental Laboratory (PMEL).

But this is not the first time that our ocean has become more acidic. Let’s look at ocean history for a moment. About 250 million years ago, most marine life was extinguished, including some forms of coral reef. Trilobites21 and other sea animals disappeared completely during that extinction. This period is referred to as the “Permian Crisis,” and widespread massive volcanic activity is believed to have caused the ocean to acidify, which caused the extinctions. But human-caused acidification started in the late 1800s with industrialization and use of fossil fuels. In less than 200 years the ocean has become 30 percent more acidic.22 “We are seeing a CO2 rise in about 200 years that is equivalent to a CO2 rise that occurred over hundreds of thousands of years,” said Kline. “There will be greater impacts. There is no time to adapt as it’s happening so quickly. With long-living organisms like corals, you don’t get enough generations for evolution to adapt and deal with high CO2 stress.” “Though some of the shorter lived organisms might adapt, it’s the rate of change that is the worrying part of the CO2 rise,” Kline added.

We are in fact rapidly changing the chemistry of the ocean—a chemical balance that marine animals from the bottom to the top of the food chain have relied on for millions of years. And we rely on that food chain. Remember the pH scale in high school chemistry? Notice in Figure 1.27 below that the pH scale runs from a high of 14 (most alkaline) to a low of 0 (most acidic). Pure distilled water is neutral at 7. The ocean is currently at 8.1 on average but, as you’ll see in Kline’s data, it can be lower or higher in coastal areas, where reefs are located.
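The percentage figures quoted for ocean acidity follow from the logarithmic definition of pH: pH is the negative base-10 logarithm of the hydrogen ion concentration, so each whole unit is a tenfold change. The short Python sketch below works through that arithmetic. The function names are illustrative, and the calculation assumes only the textbook definition of pH, not the particular method behind any cited figure. For a drop of exactly 0.1 units it gives roughly 26 percent more hydrogen ions, which is in line with the rounded "about 30 percent" commonly quoted.

```python
def hydrogen_ion_concentration(ph: float) -> float:
    """Return the hydrogen ion concentration (mol/L) implied by a pH value."""
    return 10.0 ** (-ph)


def percent_more_acidic(ph_before: float, ph_after: float) -> float:
    """Percent increase in hydrogen ion concentration when pH falls
    from ph_before to ph_after (negative if the pH rises)."""
    ratio = hydrogen_ion_concentration(ph_after) / hydrogen_ion_concentration(ph_before)
    return (ratio - 1.0) * 100.0


if __name__ == "__main__":
    change = percent_more_acidic(8.2, 8.1)
    print(f"pH 8.2 -> 8.1: about {change:.0f} percent more hydrogen ions")
```

Because the scale is logarithmic, the same function shows why small-looking pH changes matter: a full one-unit drop corresponds to a 900 percent increase, that is, ten times the hydrogen ion concentration.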


1.27 pH scale: as shown, 14 represents maximum alkalinity, while 0 is most acidic and 7 is neutral (like distilled water). Chart produced by Dianne M. Finch-Claydon.

On a logarithmic scale, the 0.1 change from 8.2 to 8.1 represents a 30 percent change—so the ocean is 30 percent more acidic than it was in the 1800s. From a non-scientific point of view, I find it interesting that the pH of human blood (average of 7.4) and seawater (8.1) are relatively close on the scale, given our ancestry! As John F. Kennedy said:

“We are tied to the ocean. And when we go back to the sea—whether it is to sail or to watch it—we are going back from whence we came.”

Amen, I say. The ocean, for tens of millions of years, has maintained a pH of 8.2, which is slightly “basic.” Just as humans rely on specific pH levels for healthy blood and for various organs to function, ocean life has evolved to rely on a pH of 8.2. Yet, since industrialization, human activities have made the ocean nearly 30 percent more acidic, lowering its pH from 8.2 to 8.1. That is solid proof that the ocean is becoming more acidic. If you want to test this fundamental principle, pour a glass of water. Test the pH. Then, using a straw, blow bubbles into that water. Test the pH again. The CO2 that you exhale will drop the pH in that water to a more acidic level. This is fundamental in chemistry. So unless we recover from our addiction to fossil fuel, the planet will continue to react and adjust, and chemistry doesn’t care whether extinctions occur. Chemistry just happens.

Why It Matters Economically

Beyond the magic of seahorses, corals, and that menagerie of fish species mentioned earlier, coral reefs are obviously essential to the economies of coastal communities. Reefs protect coastlines, including mangrove forests, by limiting wave impact on shorelines. Mangroves are often cut down for coastal development, causing sedimentation that covers reefs—and smothers them. Those algae that provide nutrients to corals die when covered in sediment. Humans are obviously at the top of the food chain, and theoretically could be viewed as coral reef symbionts—of the parasitic kind.


We benefit by catching and eating fish supported by coral reef ecosystems. But it’s certainly not a mutually beneficial symbiotic relationship. “They [coral reefs] are the home and the nursery for many of the commercial species of fish and fisheries that we depend on. So if we lose coral reefs we’ll likely lose a very important source of food,” Kline said. “And besides having all this diversity, they provide billions of dollars from tourism and fisheries. There are millions of coastal communities that are dependent on coral reefs for their livelihoods,” Kline said. In fact, as many as 900 million people live within 100 kilometers of coastal reef—and at least a quarter of them rely on the reef for their livelihoods.

In addition, reef communities are goldmines to scientists seeking treatments and cures for humans. Already, coral reef species have led to life-saving medicines including AZT for treating HIV/AIDS and Ara-A for a specific type of leukemia.23 Some corals transmit toxic chemicals to scare off predators—and those chemicals are under study for potential treatments for cancer and other diseases. Human bones are formed with calcium carbonate, like corals, which has led to the use of corals for grafting human bones! Who knew? I didn’t.

Economics—TEEB

Economist Pavan Sukhdev directs a project called “The Economics of Ecosystems and Biodiversity,” or TEEB. Started in 2008, the mission is to identify economic benefits that humans get from nature—including from coral reefs, tropical forests, mangroves, and other ecosystems. More than 80 different studies analyzed the financial benefits of coral reefs. A summary is shown in Figure 1.28, “The Economics of Ecosystems,24” and a detailed discussion can be found in the full report, which was sponsored by the United Nations Environment Programme (UNEP).
Value per hectare of coral reef
• Food, raw materials, ornamental resources: average $1,100 (up to $6,000);
• Climate regulation, moderation of extreme events, waste treatment/water purification, biological control: average $26,000 (up to $35,000);
• Cultural services (e.g. recreation/tourism): average $88,700 (up to $1.1 million);
• Maintenance of genetic diversity: average $13,500 (up to $57,000).

Sukhdev concluded that the total global value of coral reef ecosystem services reaches as high as $172 billion per year. Those numbers of course will be refined as the project continues to collect and analyze more data, but it certainly illustrates how many areas of our lives relate to coral reef ecosystems.

Two Hundred Days on Heron

Most studies covering OA’s impact on corals have been conducted in glass aquariums. Some have been conducted on reefs, but for short periods producing lower-resolution datasets compared with the Heron project.


1.28 “The Economics of Ecosystems and Biodiversity,” project sponsored by the United Nations Environment Programme (UNEP) and directed by economist Pavan Sukhdev.


While it’s clear that corals require the mineral aragonite to grow skeleton, and it’s clear that oceans are warming and becoming more acidic, it is not yet clear how long various coral species will go on before becoming extinct. To help address that question, Kline decided that instead of bringing corals to the lab—he’d bring the lab to the corals.

“I take these studies out of the aquarium and onto the reef itself. We actually increase the CO2 levels on the reef, where the corals are part of an interactive ecosystem and can interact with all members of that ecosystem with natural seawater, with totally natural environmental conditions, and see how much the artifacts associated with having corals in a glass box, how that affects their responses.” Indoor aquariums can introduce artifacts such as unnatural light, microbes, and other contaminants that don’t belong in a coral ecosystem. They also miss important elements, such as natural ocean currents and tides and the millions of microbes and other symbionts that inhabit coral reefs. And in the lab, feeding is controlled by scientists instead of the natural ecosystem. In the natural “in situ” environment, bio-eroders such as sea urchins, parrot fish, worms, and sponges pick at the corals—eroding the coral skeletons that form the reef structure. “Bio-erosion is greatly accelerated under high CO2 and we missed that in the aquarium mostly because the bio-eroding organisms weren’t there,” said Kline. “When you have the corals in an aquarium, many organisms that either erode the corals internally from inside their skeleton or externally, they were isolated from those organisms, and you wouldn’t realize you were missing it [in an aquarium].” Data culled from the Heron project, described below, continues to produce new insights into how coral reefs react to ocean acidification. A large study like this one produces so much data that scientists can spend many years analyzing it from various angles. Kline pointed out that when the data collection was completed, he didn’t have enough people who could analyze it all—and he said that in 2016, 5 years after the project was completed. This is just one project. Worldwide, less than 1 percent of data produced is actually under analysis. 
The field of marine science alone generates big data in its own right, each slice too large for one small team to analyze quickly. The 200 day experiment also established newer and more robust methods for analyzing OA and its impact on corals. When datasets are introduced below, we’ll start by looking at the baseline chemistry of the Heron reef itself. Then we’ll peruse through some biological
data culled from individual coral polyps in response to acidic conditions and take a look at the impact on the larger reef community. The data, as you’ll see, aren’t reporting all bad news. They actually help inform reef managers around the world with processes that may slow the rate of reef dissolution—buying some time as humans continue to burn fossil fuels and add more CO2 to the atmosphere and oceans, not to mention lakes and rivers. Marine scientists are realists, in my view—both hopeful and realistic. While they continue to warn us about coral reef extinctions and other climate crises, they worry that world leaders aren’t taking necessary actions to avoid oncoming extinctions. Thus, many are turning to mitigation methods—or Band-Aids—that may help at least some coral reef species survive longer than expected, from planting more resilient corals in existing reefs, forming new reefs on hard structures— including sunken ships—and working on designing and then planting genetically modified “Super Corals” that may survive OA, bleaching, or other stressors. All of the above require deep knowledge of individual reef characteristics by location and the biological responses of individual corals of various species within those reefs. Enter the Heron Island project. The Free Ocean CO2 Enrichment System (FOCE) To get started, Kline and team designed, built, and floated a “Free Ocean CO2 Enrichment System,” or FOCE, shown in Figure 1.29, placed in the Heron reef.

Akin to a time machine, the FOCE system allowed the team to look into the future—to see how hard corals respond to the more acidic environment predicted for the end of this century in the natural environment.

1.29 David Kline and his “Free Ocean CO2 Enrichment System” (FOCE) on Heron Island. Courtesy of: Davey Kline.


1.30 Project technician Thomas Miard adding doors to divide compartments into three sections for obtaining physiology specimens. Courtesy of: Davey Kline.

Ocean-based bottomless “flumes,” shown in Figure 1.30, housed living and dead corals from the surrounding reef—two treated with CO2, and two untreated for comparison.

Predictions for 2100

Predictions for 2100 show that the ocean’s pH will drop to more acidic levels by between 0.25 and 0.5 pH units. Those predictions are derived from models and CO2 scenarios from the United Nations Intergovernmental Panel on Climate Change (IPCC)25. For the project, Kline chose the less severe drop in pH of 0.25. The more severe 0.5 unit drop in pH would cause the ocean to become 150 percent more acidic, while the 0.25 pH drop would only double the acidity of the ocean. Let’s hope that the 0.25 drop is a more accurate prediction—though doubling the acidity isn’t exactly good news. As discussed earlier, the pH has already dropped by 0.1 units since industrialization, leaving the ocean 30 percent more acidic. And that change happened in less than 200 years. Damage from acidity has already been found in mussels, pteropods, and other “calcifiers” (sea animals that build shells or skeletons) in the ocean from Antarctica to the Gulf of Maine. But even at the lower and “less risky” end of the scale—the 0.25 drop in pH units—sea animals and humans will experience an ocean more acidic than seen in at least 20 million years.26 We really don’t want to go there. Consider this. If your blood pH drops by 0.25 or 0.3 pH units, you would likely experience seizures, go into a coma, or die from “acidosis.”

All life on earth—from humans and other animals to plants, sea creatures, and microbes—relies on a fragile chemical equilibrium, and the slightest changes can throw an organism off course and into dangerous territory.


Think about how you feel when your child’s body temperature rises by, say, 5 percent, to 103.5. I’m guessing that you would call a doctor and get treatment. We are not so different from other life forms, nor from our planet’s delicately balanced ecosystems. FOCE Setup and Challenges The Heron-based FOCE system was designed to gradually adjust the seawater chemistry by injecting CO2 into “treatment flumes” over 6 months. But it wasn’t easy to get all that to happen. “It took over a year of development working with a team of engineers, biologists, and chemists to develop, build, test, and make a functioning FOCE system. It took several iterations to get this complex multisensor array system to work properly. And the system deployed for the 200 day experiment was actually the third version of the CP-FOCE built and tested by our team,” Kline said. FOCE methods have been around for over a decade but were initially designed for experiments on land—some placed in rainforests to gather data on sustainability. So how does it all work? Let’s start with flumes—those bottomless open-ended acrylic boxes housing the corals on the sea floor. The flumes, as shown in Figure 1.30, house the corals but also store working sensors that test for pH, temperature, and other elements. “We ran a study with four experimental FOCE flumes where we gradually lowered the pH 0.25 units and studied the impacts on the calcification of living branching Porites,” Kline said, referring to the coral species Porites cylindrica. The Porites cylindrica species branch upward and also grow horizontally. Shown in Figures 1.31 and 1.32, they are known to build small islands, or atolls.

1.31 Porites cylindrica. Photo by Philippe Bourjon. Courtesy of: Wikimedia. CC 3.0. http://creativecommons.org/licenses/by-sa/3.0


1.32 Porites cylindrica in the Great Barrier Reef, Australia: an intertidal “micro-atoll.” Photo by Isobel Bennett. Courtesy of: Australian Institute of Marine Science.

Those CO2-treated “future scenario” flumes automatically reached a pH of 0.25 below the pH measured in the “natural environment” flumes in the final phase of the experiment (September through early December). But of course, before they could begin data collection from flumes, they had to hook up the entire system. Outfitted for diving, the team floated an array of FOCE apparatus on rafts supported by large yellow buoys to the reef flat, carefully maneuvering the rafts while stepping cautiously around living corals. Gear included the transparent flume housing, submersible pH, temperature, flow, light, salinity, and depth sensors, hoses and pumps and other gadgets for the CO2 injection system, a wind generator, solar panels, computers and waterproof cables, connectors—and all the wires, cinder blocks, spikes, pipes, and hooks needed to build, stabilize, and run the FOCE system underwater. They set up the wind generators and solar panels to provide energy to the system. They hooked up the controller and computers, and tested the system to ensure that data collected by sensors would upload via radio antennae to the onshore lab computers—where scientists could watch pH and other variables fluctuate in real time. Once up and running, the lab-based databases started to receive data from the underwater sensors, including pH—every second—temperature, and other measures. They also consistently collected water samples for lab-based work on land. Flumes were divided into three compartments. The center contained reef sediment, and the two outer segments contained the coral samples at one end and algal fragments at the other.


Flume-based sensors reported the seawater pH level every second to an underwater computer pod, which drove the CO2 injection system. If, for example, the pH sensor in the natural environment—used as a benchmark—reported a pH of 8.2, then CO2 injections into the future scenario flume would be increased until that flume’s seawater pH dropped to 7.95 units, 0.25 units lower than the natural environment pH—reflecting a conservative prediction for the future. The IPCC predictions are based on a future CO2 scenario for the open ocean. But seawater chemistry, including pH, varies widely in highly dynamic coastal areas like the Heron reef flat. Coastal seawater pH can reach lower and more acidic levels than the average open ocean pH of 8.1, as you’ll see in the data later. Some reefs are more dynamic than others, so it stands to reason that some species of corals may be more resilient to pH variations than others. And this study revealed some interesting data on resilience, as one of the included datasets will show.

Marine scientists also need to record the length of time and the intensity of reef exposure to anomalous pH levels, temperature, and other stressors to effectively understand OA’s impact on corals, and each study must be viewed as a step toward such understanding. The high-resolution pH data, collected every second, provide just that. Scientists must first document the “normal” conditions of a reef—measure its CO2, pH, temperature, minerals used for corals to grow, lighting, water flow rate, and other elements that play into reef development and decline.

While some team members worked on chemistry measurements in the lab, others spent time in the water maintaining instruments. “We had to clean all the instruments on the reef. All sorts of algae will grow on sensors, so if you don’t stay on top of it at least every day or every other day then the sensors are measuring the effect of algae rather than the surrounding seawater,” Kline said.
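The feedback idea described above, in which the treatment flume is held a fixed 0.25 pH units below whatever the natural-environment sensor reports, can be sketched in a few lines of Python. This is only an illustration of the principle, not the FOCE team's control software: the function names, the step size, the tolerance, and the "injection rate" units are all hypothetical.

```python
PH_OFFSET = 0.25  # target drop below the natural-environment pH, per the text


def target_ph(ambient_ph: float, offset: float = PH_OFFSET) -> float:
    """pH the treatment flume should reach for a given ambient reading."""
    return ambient_ph - offset


def adjust_injection(rate: float, flume_ph: float, ambient_ph: float,
                     step: float = 0.1, tolerance: float = 0.005) -> float:
    """Return a new CO2 injection rate (arbitrary, hypothetical units).

    If the flume is less acidic than the target, inject more CO2;
    if it has overshot, inject less; otherwise hold steady.
    """
    error = flume_ph - target_ph(ambient_ph)
    if error > tolerance:       # pH still too high: add CO2
        return rate + step
    if error < -tolerance:      # pH too low: back off, never below zero
        return max(0.0, rate - step)
    return rate


if __name__ == "__main__":
    # An ambient reading of 8.2 gives a target of about 7.95.
    print(round(target_ph(8.2), 2))
    print(adjust_injection(rate=1.0, flume_ph=8.2, ambient_ph=8.2))
```

Because the target is recomputed from each ambient reading, the treatment flume would follow the reef's natural daily pH swings rather than sitting at one fixed value, which is the point of using the natural environment as the benchmark.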
Since they had to study seawater and coral samples throughout the day and night, they were out at all hours—together with those beloved mutton birds. “So there’s periods of time, usually for about a week, where we go out every three hours. So this is including like three in the morning or five in the morning where we are out walking, freaking out a little because you’re walking out in the dark and everything you hear you think is a shark, to collect samples. But it’s kind of a fun experience”. Kline’s team also collected coral skeleton specimens by hand for analysis— diving with chisels and hammers for culling coral branches. Using a fine drill, researchers bored into coral skeletons for analysis at the end of the experiment and at different time points throughout the experiment— including in June, August, September, and November 2010. The project included a cellular physiologist to analyze internal “calcifying fluid” from the coral samples, and the results were published in 2015.27 “We’re trying to figure out how the changing conditions affect the corals on a cellular level. He [the cellular physiologist] stains for different enzymes and


we can look at internal pH and see how that’s affected by changes in the environment, changes in temperature, pH, light, nutrients,” said Kline. Kline wanted to measure the Porites’ internal “calcifying fluid” pH—and compare it against the seawater pH. That fluid contains ingredients required to grow skeletons. Aragonite, a calcium carbonate mineral found in seawater that tends to easily dissolve in acidic conditions, is the most essential growth ingredient for corals. Other similar minerals are required for other shell or skeleton builders.

1.33 Heron Island carbon chemistry spreadsheet. Courtesy of: Davey Kline.

The Reef Gets a Physical This chapter includes a few of the analyzed datasets from the project, but it’s a small slice of the bigger data generated on the Heron reef. We’ll see how individual polyps attempt to fight back against lower pH seawater, and how the larger reef—the community—reacts under various conditions in the natural environment. This project broke new ground in methods used to assess OA’s impact on coral reefs, driven by the data. First, we look at the baseline data—the data that describes the natural reef. Think of this data as the results of an annual physical—though this is the reef’s first full physical. All reefs under study will need such a starting point for analysis. Just as doctors test humans for pH, CO2, calcium and bone density, sodium, weight, and such, corals are tested by marine scientists for similar health properties. One can look at just about any living organism and find biological commonalities with humans. Last time I checked, my CO2 measured at 29 mmol per liter—quite good! What amazes me is that we trust medical scientists (doctors) to test us for so many different health issues using basic chemistry—the same chemistry principles that climate scientists use to measure the atmosphere, ocean, and other earth ecosystems. Yet, we still hear the word “hoax” when it comes to the climate crisis. In fact, corals, like trees, create time clocks. Trees produce rings every year, revealing their approximate age, and hard corals build layer upon layer of calcium carbonate, producing ring-like layers that, together with uranium–thorium dating,28 provide information on the reef and its surroundings over seasons and years. Anyway, Figure 1.33 contains the baseline data for the Heron Reef, with each datapoint recorded at a specific date and time. This spreadsheet wouldn’t likely be shown to the general public in its raw state, but we will visualize some


of this data in the exploration phase of a visualization project—to get to know more of the data backstory before we even consider designing an audience visualization from this cryptic dataset! This spreadsheet is all about carbon chemistry. The methods used to produce this data are outlined in Kline’s study.29 For this chapter, we’ll look at the dimensions (categories) that are most useful to a story. The rows of data are described by the header row—a series of labels in the first row. These labels are known as “dimensions” in statistics and in some software programs, including Tableau. And, by the way, most visualization software expects to see those dimensions in the first row, just as Figure 1.33 shows. You can in fact use two rows, but it’s not advised and the software must be told that you’re not using the “one row” header standard. An ugly and daunting spreadsheet at first glance, but we decipher it using what we’ve learned from interviews and studies—that data backstory. “Ocean acidification is a chemistry problem,” Kline said. “So, you have to understand the chemistry to study it [OA]. To do carbon chemistry, you need at least two aspects of carbon chemistry system. You have to measure things in the lab, you have to do titration where you add very small amounts of acid to figure out the buffer capacity of the alkalinity of the seawater. You also look at dissolved inorganic carbon. You’re basically looking at all the CO2 molecules that are dissolved in there.” So complicated to the non-chemist. But we, as communicators, need to focus on the dimensions that will tell a story for an audience. Aragonite saturation, shown in column G, is salient to any story on corals and OA because the mineral must be available in the seawater for coral absorption— an essential ingredient for growth. We’ll also focus on pH, which tells us how acidic the ocean is at a specific date and time, and of course CO2, which drives the pH level. 
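For readers who work in code rather than a spreadsheet program, the "one header row" convention described above looks like this in Python. The sketch uses the standard-library csv module, which treats the first row as the dimension names. The column names and the two data rows are placeholders invented for illustration; they are not values from the Heron spreadsheet.

```python
import csv
import io

# Placeholder data: column names and values are invented for illustration
# and do not come from the Heron Island spreadsheet.
SAMPLE = """datetime,pH,CO2,aragonite
2010-06-01 09:00,8.10,400.0,3.2
2010-06-01 12:00,8.25,300.0,3.9
"""


def read_dimensions(text, wanted):
    """Read CSV text whose first row is the header and keep only the
    named dimensions (columns) from each data row."""
    reader = csv.DictReader(io.StringIO(text))
    missing = [name for name in wanted if name not in reader.fieldnames]
    if missing:
        raise KeyError(f"dimensions not found in header row: {missing}")
    return [{name: row[name] for name in wanted} for row in reader]


if __name__ == "__main__":
    for row in read_dimensions(SAMPLE, ["datetime", "pH", "aragonite"]):
        print(row)
```

Selecting a handful of named dimensions this way mirrors the editorial choice made in the text: keep the columns that carry the story and set the rest aside.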
“That time series we generated on the reef flat is one of the most high-resolution long-term datasets with discrete water samples for anywhere on a reef,” said Kline. Some measures, including pH, were recorded at the highest resolution: every second, 24 hours a day, over 200 days. Aragonite was calculated from water samples. “From just the sensors you can get fine scale resolution, but you can’t do the full chemical analysis without the water samples and doing these chemical analyses in the lab,” Kline said. But every single datapoint, whether from sensors or samples, is timestamped. This spreadsheet rolls the pH data up from seconds to specific minutes in a day for use with the aragonite analyses using lab seawater samples from specific points in time. As newer robust research methods like this become standardized, datasets from reefs around the world will be shared with IPCC modelers, leading to more refined predictions.

Sensors of all sorts produce high resolution datasets. Wearable sensors keep track of our heartbeat, steps, mileage, running speed, bike speed, weight, sleeping habits, etc. Think about each of those items.


But sensors are also placed in cities, on ocean buoys, on boats and ships and satellites, in surveillance cameras, in jet engines, cars, trains. They are ubiquitous. It’s not surprising that sensors are the main driver of big data, since data is collected rapidly and in colossal volumes on a variety of topics, from health to pollution to finance and spending trends. In marine science alone, so much data has been collected that scientists sometimes spend years analyzing datasets from one project. Kline has published several papers based on data from the 2010 Heron project, the most recent in September 2019.

So this baseline spreadsheet reflects the periodic aragonite measurements aligned with the pH values recorded at the time of the collection of the aragonite samples. “These are factors that allow us to understand the chemical environments that corals live in,” Kline said. “And this is the chemical environment that’s going to change with increasing CO2 and potentially make it harder for corals to build their skeletons.”

CLEANING, WRANGLING, AND MUNGING DATASETS

Warning to instructors: this section on wrangling and cleaning data may result in class boredom and frustration. So, take it in steps and mix it up with lectures and presentations involving design, colors, shapes, and “fun data” related to the music industry, nutrition, fashion, or sports! And make sure that the class can use Excel or another spreadsheet program that allows filtering by rows and columns and the basic functions listed in the resources list on the website.

Before we start, note that data wrangling and munging are for the most part interchangeable terms. Cleaning a dataset can be simple, such as removal of blank rows, spelling fixes, and other fundamental modifications. Wrangling and munging imply more sophisticated steps that could involve code, but not necessarily. A simple example is provided in the data section below, where we reformat a date dimension. As with many things, people use these terms loosely.
Let’s get started with the dimensions in the header row in Figure 1.33. For training purposes, we’ll munge one dimension: the date dimension in column A. While it may sound dull to deal with a date format, keep in mind that all of the data measures, or datapoints, were recorded by date and time, the bedrock behind the carbon chemistry time series dataset. Without the date and time, scientists couldn’t connect the chemical reactions among variables such as pH, temperature, CO2, and AR (aragonite) over time, or roll up data such as pH into weekly, monthly, or even yearly averages. Nor could they identify an accurate baseline for the reef at various temporal scales, such as day, night, week, month, season, years, and centuries.

The cryptic-looking date in column A contains the decimal number 149.64583. It may be called “Julian” in the header, but some quick research on Julian dates tells us that there are many types of Julian formats. We can’t expect Excel


to convert it until we are able to tell Excel what is contained in that number. Year? Month? Day? Time in hours, minutes, seconds?

In my software career in the 1980s (showing my age), I frequently created Julian dates in computer programs, but they were simple five-digit integers containing the year, month, and day. We handled conversions back then in our code because spreadsheets were not yet a common tool for that work! In fact, if all code from those days had used the five-digit Julian format, then the Y2K “crisis” may not have occurred! But conversions to recognizable dates cannot be done if you don’t understand the format hidden in the raw data. You have to figure that out.

So, I went to my source, David Kline, and asked about that format. “149 is the day of the year,” said Kline, referring to the number to the left of the decimal. That number can range from 1 to 365, or 366 for leap years. “And the hour of the day [number to right of decimal point] is a decimal representation for the time,” Kline added, referring to the 64583. It turns out that the format used does not contain the year, unlike the Julian formats I used years ago, which are still in use today. But we know the year of the project, which was 2010. I learned that this particular Julian format is used often in science, particularly when sensors are involved.

Next we decide how the date and time should look in our visualization for a public audience. A Gregorian calendar date would make sense, at least for North America and Western Europe. No reason to force your audience to figure out the Julian calendar format, which dates back to ancient Rome. Or would your audience prefer the Alexandrian calendar? Berber?30 Since we now understand the contents, Excel can be used to convert 149.64583 to 5/29/2010 at 15:30, or 3:30 p.m. For training purposes, I created new columns in the spreadsheet in Figure 1.34 (B through F) to show you that the conversion required a few “wrangling” steps using simple Excel functions.
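Code is not required for this step, but for readers curious about how the same conversion looks outside a spreadsheet, here is a minimal Python sketch. It assumes, as Kline described, that the whole number is the day of the year and the decimal is the fraction of the day that has elapsed. The function name and the rounding to the nearest minute are illustrative choices, not part of the project's method.

```python
from datetime import datetime, timedelta


def day_of_year_to_datetime(value, year=2010):
    """Convert a decimal day-of-year (e.g. 149.64583) to a calendar date and time.

    Assumes the integer part is the day of the year (1 = January 1) and the
    decimal part is the fraction of that day that has elapsed. The year is not
    stored in the number, so it must be supplied (2010 for the Heron project).
    """
    day_number = int(value)            # 149 -> the 149th day of the year
    day_fraction = value - day_number  # 0.64583 -> fraction of 24 hours
    # Round to the nearest minute to absorb the truncated decimals.
    minutes = round(day_fraction * 24 * 60)
    return datetime(year, 1, 1) + timedelta(days=day_number - 1, minutes=minutes)


converted = day_of_year_to_datetime(149.64583)
print(converted.strftime("%m/%d/%Y %H:%M"))  # 05/29/2010 15:30
```

The result matches the Excel conversion described above: day 149 of 2010 is May 29, and 0.64583 of a day is 15:30.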
All of those steps could have been done in one long “nested formula” in Excel, but since this book targets beginners who may not be skilled in Excel formulas, I decided to show each step (one step per column), exposing the functions. The final date and time column is shown in column F in Figure 1.34. See the website to download the spreadsheets from Figures 1.33 and 1.34 with the formulas for the date conversions, along with the conversion steps and how to build a “nested” set of steps in one cell. Keep in mind that once you create the functions and steps needed to convert the first row of data for any type of wrangling, you can usually copy and paste the formulas to all rows of data.

Excel is just one of many tools used for wrangling data. Some inexpensive tools include OpenRefine, Trifacta, Tableau Prep, and others. There is no need to write code for wrangling data, at least not for the datasets introduced in this book.

So, make sure you have just one “header row,” the row that describes your dataset. In addition, don’t include any empty rows in your data because it can


1.34 Heron Island carbon chemistry spreadsheet, showing results of conversion in “date–time” column. Courtesy of: Davey Kline.

cause visualization software to come to a halt when it hits that row, assuming it’s found the end of your dataset. Make sure that you look for misspellings. For example, if you have data organized by country, make sure that each country is spelled the same in all instances. BVI versus British Virgin Islands will be viewed as two separate countries. USA and U.S.A. will also be seen as two different countries. But even spelling issues sometimes pop out once you visualize the data in the exploration phase, as happens in the chapter on global health.

VISUAL EXPLORATION

So we’ve moved past a bit of wrangling. For large or complicated datasets I like to visualize the datapoints after the initial cleaning and wrangling steps. You’ll be surprised at how patterns reveal characteristics that you missed while vetting and cleaning a spreadsheet, particularly if you have more than a few hundred rows of data. While the carbon chemistry spreadsheet contained several dimensions (column descriptors), I believe that CO2, pH, and aragonite are most important in my story since I want to show my audience what happens to coral growth under future predicted CO2 and pH levels. Will there be enough of the calcium-based mineral aragonite to allow shell or skeleton-building when pH drops to more acidic levels?

Dimension Ranges

To start the exploration, it’s important to identify characteristics of each dimension of interest, from the highs and lows to perhaps averages and outliers, depending on your dataset. This not only helps you to choose graph types, but it’s an essential part of your data backstory.
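Spreadsheet filters are enough to find these ranges, but the same summary can be sketched in a few lines of Python. The rows and column names below are hypothetical placeholders that stand in for the cleaned spreadsheet; they are not values from the Heron dataset.

```python
from statistics import mean

# Hypothetical sample rows; real values would come from the cleaned spreadsheet.
rows = [
    {"co2_ppm": 412.5, "ph": 8.05, "aragonite": 3.6},
    {"co2_ppm": 655.0, "ph": 7.88, "aragonite": 2.4},
    {"co2_ppm": 231.9, "ph": 8.31, "aragonite": 4.9},
    {"co2_ppm": 380.2, "ph": 8.10, "aragonite": 3.8},
]


def summarize(rows, dimension):
    """Return the low, high, and average for one dimension, skipping blanks."""
    values = [row[dimension] for row in rows if row[dimension] is not None]
    return {"min": min(values), "max": max(values), "mean": round(mean(values), 2)}


for dimension in ("co2_ppm", "ph", "aragonite"):
    print(dimension, summarize(rows, dimension))
```

Keeping the summary in one small function makes it easy to rerun after each cleaning step and notice when a range suddenly changes.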


Let’s start with the CO2, the primary cause of ocean acidification.

Carbon Dioxide (CO2)

Using Excel’s filtering function, I learned that CO2 ranged from 100.1 to 881.8 parts per million over 200 days. You might assume that the higher numbers may have caused pH to drop to more acidic levels, but we’ll use visual patterns to verify that assumption.

pH Range

The pH ranged over 200 days from 7.74 to 8.48, which surprised me, knowing that a 1.0 change is ten times more acidic or alkaline, depending on the direction. Remember that the global average for ocean pH is currently 8.1. Dynamic coastal reefs can obviously vacillate wildly around that average.

Aragonite Saturation (Ar)

Aragonite saturation state reached a healthy maximum of 5.9, well above the 4.0 threshold known for healthy coral growth, but at the other end it hit a very unhealthy low of 1.7 where shells would likely dissolve if exposed for too long.

Date and Time

Dates ranged from 5/29/2010 to 12/11/2010, based on those Julian days in column A. Keep in mind that CO2 treatments didn’t begin until July, reaching the most severe levels in September (–0.25 pH compared with natural environment pH).

Now that we have the top and bottom ranges of the four key dimensions, we’ll need to think about how to plot them together to look at how one dimension may relate to another. And we’re also hoping to spot any issues in the dataset. Some stand out clearly during visual explorations.

Graphing for Patterns

Let’s look at the CO2 patterns over the project span.

CO2 Pattern

Since we know that all datapoints are timestamped, we have a time series. Line charts and area charts are often good choices for viewing data over time. Think of the stock market: you often see stocks in line charts for specific periods of time. Figure 1.35 illustrates the variations in CO2. Since the lowest value in our dataset was 100.1, I thought it best to start the Y axis at 100, which strays from the “norm” of using zero as the axis origin.
But the narrative, or a note below the chart, should explain why the axis starts at 100. Notice the gaps in July and November where I’ve placed the red arrows. It would have been difficult to spot those gaps by eyeballing a spreadsheet. At first glance I wondered whether someone (maybe me) had inadvertently deleted some data, knowing the project was designed to run every day for 200 days.
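A chart makes gaps like these jump out, and a quick check on the timestamps can confirm them. The sketch below is one possible approach, assuming a list of sampling dates and a tolerance of one day between samples. The dates are invented for illustration and are not the project's actual record.

```python
from datetime import date


def find_gaps(sample_dates, max_gap_days=1):
    """Return (last_date_before_gap, first_date_after_gap) pairs."""
    ordered = sorted(set(sample_dates))
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if (later - earlier).days > max_gap_days:
            gaps.append((earlier, later))
    return gaps


# Hypothetical sampling dates, not the project's actual record.
sample_dates = [
    date(2010, 6, 28), date(2010, 6, 29), date(2010, 6, 30),
    date(2010, 7, 9), date(2010, 7, 10), date(2010, 7, 11),
]
for before, after in find_gaps(sample_dates):
    print(f"No samples between {before} and {after}")
```

A list of gaps is only a prompt for questions. Whether a gap is an error or a real interruption is something only the original file and the data producer can settle.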


1.35 The chart shows gaps in CO2 measurements—a red flag.

Once you have verified the gaps against your original spreadsheet (the one you saved in a backup file when you received it from the source), go straight to your source. Kline explained:

“A powerful tropical storm in November 2010 disabled many of the field instruments and limited access to the site, resulting in a data gap in November.”

He added that in July they didn’t have enough staff on the island to gather samples, but on the days they had sufficient staff, they collected strong high-resolution data.

So let’s look at what happens to pH when CO2 rises and falls. We’ll consider CO2 as our “independent” variable, since we know that rising CO2 is causing oceans to become more acidic. (Review basic statistics if you’ve forgotten about dependent and independent variables. Usually, the independent drives the dependent.) As in the CO2 chart above, we’ll plot the dates on the X axis for a time series. Again, we’ll use a line chart. But for pH on a new Y axis, should we start with a zero origin, or above zero as we did with CO2? Take a look at how pH graphs below in Figure 1.36.

pH Pattern

As you know, the pH values ranged between 7 and 9. The bottom chart uses 7 as the origin, and the top chart starts at 0. Both use 9 at the top. Since pH vacillation is so important to OA discussions, we need to see the data clearly. So I would plot the data as shown in the bottom graph in Figure 1.36. That said, keep in mind that it’s very important to plot data based on its full range, which we did in both charts because there were no datapoints below 7 or above 9.
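A little arithmetic shows why the choice of origin matters so much here. The sketch below computes how much of the vertical axis the observed pH range (7.74 to 8.48, from the dimension ranges above) occupies under each origin. The function name is an illustrative assumption.

```python
def share_of_axis(data_min, data_max, axis_min, axis_max):
    """Fraction of the axis height that the data range actually occupies."""
    return (data_max - data_min) / (axis_max - axis_min)


ph_low, ph_high = 7.74, 8.48  # observed pH range reported earlier in the chapter

zero_origin = share_of_axis(ph_low, ph_high, 0, 9)
seven_origin = share_of_axis(ph_low, ph_high, 7, 9)

print(f"Axis 0 to 9: data spans {zero_origin:.0%} of the height")   # 8%
print(f"Axis 7 to 9: data spans {seven_origin:.0%} of the height")  # 37%
```

With a zero origin, the entire pH story is squeezed into roughly a twelfth of the chart, which is why the variation is so much harder to see.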


1.36 These two graphs show the variation in pH over 200 days, which is what we want to see.

But playing with non-zero axes can be used to mislead. Notice that the top graph, at a glance, might leave you thinking that pH doesn’t change much at all. It’s flattened compared to the bottom graph. Flattening or heightening of curves by changing an axis can not only mislead, it is a known trick used by those who are trying to sell a point. Keep in mind that a one point change in pH, say from 7 to 8, is a tenfold change. From 7 to 9 represents a 100-fold change. So it’s even more important to reveal those pH vacillations! We’ve all seen biased images where data visualizations purposefully distort trends one way or another to push biased ideas.

The main point here is to be honest with your data. Make sure that all relevant data is included, whether you choose to start higher than zero for the axis origin or not. And always document the origin of your data below the chart. Hopefully that goes without saying!

Correlation

Now let’s turn to combining the pH and CO2 graphs to look at patterns and correlation. Notice the dual Y axes in Figure 1.37. CO2 is plotted vertically on the left, and pH vertically on the right, each on its own scale, while both share the timeline on the X axis. It’s fine to use two different scales, but it is sometimes clearer to generate two separate graphs instead for an audience. Right now, we are exploring, not visualizing for an audience.


1.37 CO2 and pH levels by day. The pH axis on the right uses the pH scale, which is logarithmic: a 1 point change represents a tenfold increase or decrease in acidity. The CO2 axis on the left is plotted on a standard linear scale. We are only looking at patterns at this point to get a feel for how the two dimensions correlate.
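The tenfold rule for the pH scale can be checked with a short calculation. This sketch relies on the standard definition of pH as the negative base-10 logarithm of hydrogen ion concentration. The function name is an illustrative assumption, and the last line applies the rule to the pH range observed on the Heron reef flat (7.74 to 8.48).

```python
def acidity_fold_change(ph_start, ph_end):
    """How many times more acidic ph_end is than ph_start.

    pH is the negative base-10 logarithm of hydrogen ion concentration, so a
    drop of one pH unit means ten times more hydrogen ions.
    """
    return 10 ** (ph_start - ph_end)


print(acidity_fold_change(8.0, 7.0))              # 10.0  (one-unit drop)
print(acidity_fold_change(9.0, 7.0))              # 100.0 (two-unit drop)
print(round(acidity_fold_change(8.48, 7.74), 1))  # 5.5   (Heron baseline range)
```

So the swing that looks like a modest 0.74 on the axis corresponds to seawater that is about five and a half times more acidic at its low point than at its high point.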

CO2 and pH Correlation

It’s quite clear that when CO2, in gray, rises then pH (in yellow) drops to more acidic levels throughout the experiment. Viewing this graph on the web allows you to zoom in and out to more or less detail. This graph, in essence, shows ocean acidification. When CO2 meets with seawater, pH drops. That is ocean acidification.

This inverse relationship is well known in chemistry, so scientists expected this to happen on the Heron reef, but they did not know how often CO2 and pH would vacillate, nor did they know the ranges of each dimension on the reef flat. The frequency of such changes is also important because Kline was hoping to learn how often and how long corals were exposed to lower pH scenarios.

But keep in mind that without the knowledge we acquired about how seawater reacts to CO2, we could not have assumed, based only on this chart, that CO2 caused pH to drop. In other words, determining whether the chart shows causation or just correlation depends on the data backstory, in this case the fundamentals of chemistry and an understanding of the project itself. We are certain at this point that CO2 is driving pH on the reef flat, but the datapoints also show us the frequency of change. Who knew how often pH would rise and fall until we saw this chart? If we wanted to see how long corals were exposed to lower pH scenarios, we’d dig into the pH datasets at the highest resolutions, which is every second for this project. See resources on the website for information on correlation versus causation.

Aragonite and pH Correlation

Next, we want to explore what happens to that essential growth mineral, aragonite, when pH drops.
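A pattern like the inverse CO2 and pH relationship can also be summarized as a single number. The sketch below computes a Pearson correlation coefficient, one common measure that was not necessarily used in the study. The paired readings are invented for illustration. A value near -1 indicates that one dimension falls as the other rises, a value near +1 indicates that they rise together, and neither says anything about cause.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical paired readings, invented to show the calculation only.
co2_ppm = [250.0, 380.0, 450.0, 600.0, 820.0]
ph = [8.35, 8.12, 8.05, 7.92, 7.78]

print(round(pearson_r(co2_ppm, ph), 2))
```

The same function could be pointed at pH and aragonite saturation, where the expectation from the chemistry is a positive value.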


1.38 The chart shows the reduction in the quantity of the mineral aragonite as pH drops to more acidic levels.

Remember that aragonite saturation in the seawater should be above a level of 4 for healthy coral skeletal growth, according to previous studies. When pH is low, will aragonite saturation diminish or increase? We’ll use another dual Y axis with pH on the right using the 7 to 9 scale, and aragonite saturation on the left, ranging from 0 to 7.

As you can see in Figure 1.38, when pH is high, aragonite is high. But when pH drops to more acidic levels, the shell growth mineral diminishes, sadly. Again, a very clear pattern formed directly from our reef dataset. I’ve added a dotted line at the aragonite 4.0 level. The mineral datapoints below that dotted line are at risk. Of course, it depends on how long the coral reefs are exposed to the acidic environments.

Scientists say that corals become stressed when aragonite drops from 4 to 3, and by the time it reaches 1, they begin to dissolve.31 Depending on the study, these numbers differ slightly, but most are in the same ballpark. Kline’s latest study presents new findings on this topic by looking at the reef overall, in addition to individual coral polyps. Now that we have some understanding about the three chemistry measures, let’s look at another dataset from the project.

CORALS FIGHT BACK

Turns out that the results in the in situ environment proved to be quite different from those in aquariums, at least for the reef off of Heron Island. And the results gave scientists a bit of hope, and boy do they need hope! First, it turns out that the coral species under study, Porites cylindrica, are biologically more sophisticated than previously thought. When faced with more acidic seawater, they appear to maintain or boost their own internal pH to higher, more alkaline levels. Internally, they build their shells using a “calcifying fluid” in their gastric system, and by maintaining an alkaline pH level, they are likely to grow even under OA scenarios, at least for as long as they have access to aragonite.


1.39 Data show corals’ internal pH measurements against seawater pH.

The dataset in Figure 1.39 reflects this discovery. Again, it’s ugly, but this dataset includes data from underwater flumes treated with CO2 as well as those that were untreated (control, or natural environment) flumes. As you can see, seawater pH is shown in the third column from the right, and after some due diligence (reading the study or interviewing Kline), you would discover that the last column on the right reflects the corals’ internal pH. The measures were rolled up into a monthly resolution.

It’s difficult to see any patterns in the dataset. But Figure 1.40, which was copied from Kline’s study,32 encoded the data into a box and whisker chart. I like these charts for datasets that have groups, where each group has its own range from top to bottom that can be presented in quartiles. Here, the data are divided into monthly groups, though keep in mind that the storms in November led to insufficient data, so it is best not to make any assumptions for that month.

The diamonds represent corals’ internal “calcifying fluid” pH, and the circles the seawater pH. Notice that the coral diamonds consistently stay above the seawater pH. They “up-regulate” metabolically, attempting to grow bigger, almost like those gym guys that take steroids. Shapes colored red represent measures from the lower pH treated flumes, while blue shapes represent the natural environment flume values. The vertical lines (whiskers) reflect the top and bottom of the range for each flume. The diamonds and circles mark the averages, and the short horizontal bars show the standard error around them.

Remember that the CO2 injections didn’t start until July, so the June values are about the same for treatment (red) and control (blue) in that month. But even in June, the corals’ internal fluid pH (diamonds) was boosted higher than the surrounding seawater pH (circles). See the website for help on building this box and whisker chart with interactivity and for information on box and whisker charts.
The Porites’ clever ability to boost pH offers hope that some coral species may survive longer than previously thought. And even better, it turned out that many of the coral samples—even those in lower pH treatment flumes—did in fact calcify and grow.


The table in Figure 1.41 shows the various measures used to assess growth in individual corals. We will look at growth rate, the last column on the right, which is based on calculations from coral weight, shell density, and the change in length along a primary growth area. It’s not easy to measure, as corals grow in different directions. We’ll build a box and whisker chart using those growth rates after we do a bit of data cleaning on this table. If you are a student, it’s likely you are scowling after hearing “cleaning” again, but that’s reality. Can you guess what we need to do to this table, which was copied from Kline’s study PDF, before we can import it into visualization software? Is there anything missing? Anything that might confuse the visualization software program? Notice how it’s quite easy for us to understand which rows of data belong to treatment versus control flumes. Our brains are accustomed to seeing

FOCE       Dry weight (g)  Wet weight (g)  Coral density (±0.02)  Ext. rate (cm/y) (±0.24)  Growth rate (g/cm2/y)
Control    7.48            0.14            1.02                   1.45                      1.47 (±0.31)
           2.44            0.20            1.08                   1.69                      1.83 (±0.28)
           4.28            0.43            1.11                   1.45                      1.60 (±0.28)
           3.63            0.55            1.18                   1.08                      1.28 (±0.29)
           1.05            0.15            1.16                   1.45                      1.68 (±0.28)
Treatment  4.93            0.23            1.05                   1.08                      1.13 (±0.31)
           3.49            0.14            1.04                   1.32                      1.38 (±0.32)
           1.21            0.18            1.17                   1.57                      1.84 (±0.30)
           2.38            0.14            1.06                   1.57                      1.66 (±0.27)
           5.73            0.40            1.07                   1.20                      1.29 (±0.29)


1.40 Box and whisker chart shows range (colored vertical lines) and the mean (diamond) and (±1) SE from the mean (black bars) of pHcf of Porites cylindrica nubbins taken at four intervals of growth (June, August, September, and November) grown under treatment (red) and control (blue) conditions. Also shown are the mean (circle) and SE (colored horizontal bars) of the pH of the seawater within both treatment (red) and control (blue) FOCE flumes. Chart and description copied from Kline’s study.

1.41 Spreadsheet data measures to assess growth in corals.


1.42 Spreadsheet data measures to assess growth in corals with additional datapoints added to column A for visualization purposes.

FOCE       Dry weight (g)  Wet weight (g)  Coral density (±0.02)  Ext. rate (cm/y) (±0.24)  Growth rate (g/cm2/y)
Control    7.48            0.14            1.02                   1.45                      1.47 (±0.31)
Control    2.44            0.20            1.08                   1.69                      1.83 (±0.28)
Control    4.28            0.43            1.11                   1.45                      1.60 (±0.28)
Control    3.63            0.55            1.18                   1.08                      1.28 (±0.29)
Control    1.05            0.15            1.16                   1.45                      1.68 (±0.28)
Treatment  4.93            0.23            1.05                   1.08                      1.13 (±0.31)
Treatment  3.49            0.14            1.04                   1.32                      1.38 (±0.32)
Treatment  1.21            0.18            1.17                   1.57                      1.84 (±0.30)
Treatment  2.38            0.14            1.06                   1.57                      1.66 (±0.27)
Treatment  5.73            0.40            1.07                   1.20                      1.29 (±0.29)

spreadsheets with titles and subtitles designed to lay out groups, many with subtotals, averages, grand totals. Obviously the ten rows of measures are ordered by the type of FOCE flume: control and treatment. But the visualization software will simply see blanks, so eight of those rows will not be identified with any flumes. And what if you sorted the table in Excel? You would lose the ability to figure out which rows of data are linked to the treatment or control flumes, which would fundamentally ruin your dataset and render it unusable. So we need to add the FOCE flume type to all cells in the first column, as shown in Figure 1.42.

Because we know the data backstory after reading the studies and interviewing Kline, we can be confident about filling in the blanks in this small dataset of only ten rows. But often, when you receive blanks in any dataset, you need to contact the data producer, as we did when we found gaps in the CO2 data earlier.

If you were to import the table now, you would discover yet another cleaning issue that would hinder the visualization software from working properly. During imports of datasets, Tableau and other visualization software generally attempt to identify each dimension’s data type: number, date, text (known as “string”), geographic data, Boolean, or other. Sometimes, when the software can’t figure it out, you end up with nulls instead.

Take a look at the column at the far right. How is it different from the numbers in other columns? If you are thinking about the parentheses, you’ve got it. Software, even code, would not expect a number to include non-numeric characters such as parentheses or a ± sign. If such characters are found, the dimension will be viewed as containing strings, not numbers. The growth rate needs to be a pure number in order for it to be used in any calculations, or for encoding into bars, circles, bubbles, maps (latitude and longitude), or any type of marks that need to be sized or positioned by value on a chart.
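The two cleaning problems described above (flume labels that appear only once per group, and growth rates that share a cell with their margin of error) can be handled by hand in a spreadsheet. For readers who want to see the logic spelled out, here is a minimal Python sketch using the flume labels and growth rates from Figure 1.41. The function names and the text pattern are illustrative assumptions, and the fill-down step is only safe because the row order is known from the study.

```python
import re
from statistics import median

# Rows as they come out of the PDF table: the flume label appears only once
# per group, and growth rate carries its margin of error in the same cell.
raw_rows = [
    ("Control", "1.47 (±0.31)"),
    ("", "1.83 (±0.28)"),
    ("", "1.60 (±0.28)"),
    ("", "1.28 (±0.29)"),
    ("", "1.68 (±0.28)"),
    ("Treatment", "1.13 (±0.31)"),
    ("", "1.38 (±0.32)"),
    ("", "1.84 (±0.30)"),
    ("", "1.66 (±0.27)"),
    ("", "1.29 (±0.29)"),
]

# Matches text such as "1.47 (±0.31)" and captures the two numbers.
GROWTH_PATTERN = re.compile(r"^\s*([\d.]+)\s*\(±\s*([\d.]+)\)\s*$")


def split_growth(cell):
    """Split '1.47 (±0.31)' into two pure numbers: (1.47, 0.31)."""
    match = GROWTH_PATTERN.match(cell)
    if match is None:
        raise ValueError(f"unexpected growth-rate format: {cell!r}")
    return float(match.group(1)), float(match.group(2))


def clean(rows):
    """Repeat the flume label on every row and separate growth from margin."""
    cleaned, current_flume = [], None
    for flume, growth_cell in rows:
        if flume.strip():
            current_flume = flume.strip()  # a new group starts here
        growth, margin = split_growth(growth_cell)
        cleaned.append({"flume": current_flume, "growth": growth, "margin": margin})
    return cleaned


cleaned_rows = clean(raw_rows)
for flume in ("Control", "Treatment"):
    rates = [row["growth"] for row in cleaned_rows if row["flume"] == flume]
    print(flume, "median growth rate:", median(rates))
```

Once the growth rate is a pure number, calculations such as the median per flume type become possible, which is exactly what a box and whisker chart needs.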


FOCE flume type  Dry weight  Wet weight  Density  Extension rate  Growth rate   Growth  Margin of error
Control          7.48        0.14        1.02     1.45            1.47 (±0.31)  1.47    (±0.31)
Control          2.44        0.2         1.08     1.69            1.83 (±0.28)  1.83    (±0.28)
Control          4.28        0.43        1.11     1.45            1.60 (±0.28)  1.60    (±0.28)
Control          3.63        0.55        1.18     1.08            1.28 (±0.29)  1.28    (±0.29)
Control          1.05        0.15        1.16     1.45            1.68 (±0.28)  1.68    (±0.28)
Treatment        4.93        0.23        1.05     1.08            1.13 (±0.31)  1.13    (±0.31)
Treatment        3.49        0.14        1.04     1.32            1.38 (±0.32)  1.38    (±0.32)
Treatment        1.21        0.18        1.17     1.57            1.84 (±0.30)  1.84    (±0.30)
Treatment        2.38        0.14        1.06     1.57            1.66 (±0.27)  1.66    (±0.27)
Treatment        5.73        0.4         1.07     1.2             1.29 (±0.29)  1.29    (±0.29)

1.43 Spreadsheet data measures to assess growth in corals with changes in the last two columns (G and H) for visualization purposes.

And PDF tables are often filled with characters within the same cells as numbers. Ugh. So, I’ve separated the margin of error from the growth rate, adding two new columns, G and H, for the two values that were combined in column F (Figure 1.43). Now we have clean data. This is one of the most common issues: separating one column into two distinct columns, or, in reverse, joining columns into one. For example, think of an address. Some visualization software expects an address to contain the street and number, the city, state, and zip code in one column. Other software may want the city, state, and zip separated. So you need to become adept at concatenation and separation of columns.

The growth rates can be explored using a box and whisker chart or a bar chart. I wouldn’t suggest using a line chart because there are no dates or other ordinal values for the lines to follow. For exploration purposes, I created a quick box and whisker chart to look at growth rates in Figure 1.44. Natural control flume data is represented in blue dots, and treatment flume data in orange. On the website, if you hover over the orange or blue dots, you’ll see the growth rates in a tooltip along with other information. Notice there are ten dots in total, just as there were ten rows of data.

As you can see, the growth doesn’t look drastically different between the two, so the corals appear to have continued to grow in the lower pH environment! However, the treatment flume growth numbers are slightly lower. You see this by looking at the horizontal lines that represent the top and bottom of each range. But neither is even close to zero growth, as you can see at just a glance. If I were to visualize this dataset for an audience, I would start my origin at about 0.8 since there is so much white space under the boxes.

A word of caution: It is not yet known how long those corals would survive such acidic scenarios since the pH was gradually dropped to the most acidic levels


1.44 Exploratory box and whisker chart to quickly assess growth rates in natural and treatment flumes.

over 6 months, reaching the most acidic level in September, but November was interrupted by storms. And when a coral polyp directs its energy to boosting its own pH, there are energy tradeoffs. Other needs, such as reproduction and feeding, could be slowed. Scientists refer to this as a “metabolic cost.” We animals only have so much energy!

The other uncertainty is that Heron Island’s reef is highly dynamic, and the Porites have likely adapted to the constant change. So the findings don’t tell us how other species in other locations will react. Many live in more stable reef environments where carbon chemistry (remember that baseline data) doesn’t vary as widely or as often.

Stressors, such as bleaching, weaken corals. Thus, boosting pH internally would likely be more challenging when the animals are weakened by a bleaching event, warmer waters, invasive species, or other stressors. The Heron Island reef flat wasn’t experiencing a bleaching event at the time of the project. No other stressors were present.

Science is difficult. Each study adds more evidence to an ongoing story. But each study also raises more questions. I find it hopeful that at least one coral species in one reef location may resist ocean acidification, even if only temporarily while humans get their act together. Scientists are looking for species that may be resistant to OA and even warming, and they are already planting corals grown in labs to help stave off extinction. But while the Porites did show growth individually, we also need to look at the bigger picture: the reef itself, the coral community.


YOUR REEF ON ACID

Remember that corals attach to hard surfaces, including limestone structures that were once living corals? This knowledge is essential at this point. Living corals make use of their ancestors, those skeletons that morph into limestone platforms to host new and old communities. Most reef communities, depending on the stressors they’ve been exposed to already, are hosting living and dead corals, both part of a larger 3D structure. And without those ancestral layers, the reef would no longer stand.

If you’ve ever hovered in and around large reefs, you know that they can be gargantuan, particularly those that have really cool arches large enough for sharks and bigger fish to pass through. And many have caves and small holes in which small and large fish or eels can hide. The thing is, if the ancestral limestone dissolves, all will come tumbling down.

So Kline’s most recent study involved looking at the entire reef, and how it would react to future more acidic scenarios, using many measures, including the change in weight from start to finish and whether those changes differed in natural versus treated flumes. “We found that reduced pH led to a drastic decline in net calcification of living corals to no net growth, and accelerated disintegration of dead corals. Net calcification declined more severely than in previous studies due to exposure to the natural community of bio-eroding organisms in this in situ study and to a longer experimental duration,” said Kline.

The bio-eroders appear to feast more on weakened reefs, such as those that have large portions of dead corals. They found damage from these parasites on corals internally and externally. Of course, the team also looked at that shell-building mineral, and they found some disturbing new thresholds. This data modeling showed that reefs dissolve when the aragonite saturation level reaches 2.3, and that is only when reefs are made up of 100 percent living corals.
“The threshold of ΩAR = 2.3 [aragonite] is a value that reefs routinely reach during the night and could become the daily average on most reefs by 2100,” said Kline in the study. But if 70 percent of the reef is already dead from bleaching or other stressors, the reef can start dissolving at an aragonite level as high as 3.5—very close to the 4.0 that we referred to as a “healthy growth” level for individual corals. In other words, the ratio of living to dead coral cover predicts to some extent a reef’s life expectancy. “In a high CO2 future, reefs are going to start dissolving; dissolve rather than grow and build massive reef structures. It’s really worrying,” said Kline. Sadly, many reefs worldwide have already lost large portions of living corals due to bleaching and other stressors, leaving them less likely to grow when aragonite levels are too low. And as you saw from our first spreadsheet, the baseline data, aragonite saturation was often below 3.5 on the Heron reef, as is the case for many reef environments globally.


1.45 Net growth or dissolution of reef. Small box tooltip provides more information, including aragonite level for each datapoint, when viewer hovers. Notice that most datapoints fall below zero growth.

The exploratory box and whiskers in Figure 1.45 show the weights of the ten corals in each flume. Notice how many of the datapoints (circles) are below zero—in the dissolve zone. See website for interactivity for this chart, which breaks down into smaller units by flume and provides tooltips for more information. Also, see “resources” for more information on box and whisker charts. The box-whisker chart in Figure 1.45 lays out dead and live corals from each of four flumes. Blue circles represent dead corals; live corals are colored orange. In the two future or CO2-treated flumes (Flumes 1 and 3), the lower pH stunted the growth of the resident corals by increasing bio-erosion. But Kline finds a silver lining in these results. He points out that as more marine reserves are created, existing reefs can be covered with living corals to stave off total erosion. Reefs around the globe have been heavily damaged by warm-water bleaching, meaning that the living:dead ratio should and can be addressed. “Conservation efforts to protect living coral cover and areas of lowest bleaching-related mortality would therefore delay the gradual OA-induced loss of three-dimensional reef structure, which is critical for reef biodiversity, fisheries habitat, fisheries production, coastal protection, and food security for coastal communities in many tropical island nations,” said Kline in the study. “The results suggest that having a higher fraction of a reef ecosystem occupied with living rather than dead coral structures would maintain reef integrity and resistance to OA for longer,” Kline wrote in the study.
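For readers who want to see what a box and whisker chart like Figure 1.45 is summarizing, here is a minimal Python sketch. It is not the Tableau workflow used in this book, and the numbers are invented for illustration; they are not values from the Heron Island dataset. The flume labels and the helper names `box_summary` and `count_below_zero` are assumptions made for this example.

```python
from statistics import quantiles

# Hypothetical weight changes for ten corals per flume (arbitrary units).
# These values are made up for illustration and are NOT from the study.
flumes = {
    "Flume 1 (CO2-treated)": [-0.8, -0.5, -0.4, -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3],
    "Flume 2 (natural)": [-0.3, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7],
}


def box_summary(values):
    """Return the five numbers a box and whisker chart draws."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": q2, "q3": q3, "max": max(values)}


def count_below_zero(values):
    """Count datapoints in the 'dissolve zone' (net loss of weight)."""
    return sum(1 for v in values if v < 0)


for name, values in flumes.items():
    print(name, box_summary(values), "below zero:", count_below_zero(values))
```

The box spans the first to third quartile, the line inside it marks the median, and counting the points below zero gives a quick check on the claim that most datapoints sit in the dissolve zone.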


So reef managers, in essence, become coral farmers “using reef restoration methods such as coral farming and possibly even genetically modified climate resistant corals or ‘Super Corals,’” he added. Scientists are already working to restore reefs by planting test tube babies—corals grown in labs—in many locations around the world. Some are working on designing those “Super Corals” as well after identifying various species, including Porites, and then digging into the biogenetics behind that resilience. Some grow faster than others, some may be more resilient to warmer seawater and others may adapt to stressors, such as OA, more quickly than others. Personally though, I find it difficult to be overly enthusiastic about planting corals all over the world when the cause of ocean acidification is so clear and could be reversed if world leaders would agree to take it on as a global crisis. We are still adding greenhouse gases to the atmosphere and acidifying our precious oceans, and corals are not the only animals affected as you’ll see in data collected from another calcifier—known as “sea butterflies,” described in Chapter 2. Though the magical little critter sits near the bottom of the food chain, it’s important food for big fish and whales. Meanwhile, I’d be happy to volunteer to plant corals around the world.

DATA VISUALIZATION

If you were to create a story from some of the information in this chapter, what would you focus on? My feeling is that this topic is filled with information that most readers or viewers would not find interesting. But I wanted to illustrate that data visualization projects are not about the tool of choice or the prettiest design, they are about the data backstory—getting to know the data. But ugly data and geochemistry aside, this topic is ripe with opportunities for attractive images and visualizations. I’ve uploaded a few samples of data visualizations that I think would work in a story for an intelligent yet non-scientific audience.
You’ll see that I’ve used some of the work from our exploratory visuals, but worked on removing excess lines, changing colors, refining shapes and sizes, and building dashboards or storyboards for context. The data visualizations remain simple and clean, but interactive—though not all data visualizations must be interactive. It depends on your platform and how much information you want to provide beyond the top level. You may want to drill down to more granular details or use pop-up tooltips to add details for specific datapoints. But the surrounding story requires a strong narrative that also includes breathtaking images, or artwork. In fact, in addition to beautiful coral reefs and sea creatures, or photos of people involved in the project at the project site, you might think about an image of a real human sitting down to a fish dinner to reflect the top of the food chain—or a fisherman from an island community who relies on the reefs to attract the fish that sustain his livelihood. (You can do a separate story on overfishing!)


1.46 Example of using story feature in Tableau. Each tab can be used to add to a story using images such as charts, tables, photos, or diagrams. In this example, we start with David Kline’s underwater lab before introducing data visualizations.

Below are static screenshots of some of the visuals I might use in a story about ocean acidification’s impact on coral reefs—driven by the datasets presented. The audience visualizations based on our datasets are mainly on the website, but the screenshots give you a few ideas on how to visualize such cryptic data in a story. Figure 1.46 shows one “slice” of a storyboard using Tableau. The story may describe the Heron Project in steps—with slideshow-like views of the data visualizations. Each tab at the top includes a short blurb about the image, assuming the viewer has read the context. Each page can include videos, images, data visualizations, narrative, or links. Figure 1.47 could be used in that same storyboard as one “page” or “slide,” or it could be used on its own embedded in the narrative. The scatter chart in Figure 1.47 illustrates how, as CO2 rises, datapoints for the shell-building mineral aragonite mainly fall below the healthy 4.0 threshold over the span of the project. The white box is a tooltip that pops up when the viewer hovers over a dot. For some audiences, the ocean background may be more appealing than a gray background. Of course the narrative surrounding this image would focus on the corals’ needs for aragonite, and how climate change will lead to more acidic oceans and less of the mineral aragonite. See resources on the website for more information on scatter charts and use of interactive tooltips. We used a box and whisker chart to explore this growth data earlier, but the version shown in Figure 1.48 is formatted for an audience.


1.47 Scatter chart illustrates visually how the shell-building mineral—aragonite—falls below the healthy 4.0 threshold over the span of the project as CO2 rises on the X axis.

1.48 Box and whisker chart shows how corals fight back using internal pH fluid. Formatted version for audience.

The narrative would point out that the corals, using their abilities to raise their own internal pH levels, were in fact able to grow even in the lower pH flumes. That is, as long as there was enough aragonite in the seawater. The web version is interactive, allowing the user to view more information.


The green boxes change from light green to dark green at median growth. Notice that the median is much lower in the treatment flumes, and that overall growth appears to be greater in the natural flumes, but not by that much! See website for more information on building these and other charts, dashboards, and storyboards using Tableau Desktop or Public.

STORY THOUGHTS FOR JOURNALISM STUDENTS

While this book targets anyone interested in working with data visualization for the first time, my hope is that journalism students will find it helpful in classes when they produce stories. Thus, I’ve included a few thoughts. Using the datasets discussed in this chapter, stories could be written from several angles.

1. Some Coral Species May Resist Ocean Acidification Better Than Others. Our data in this chapter informed us that one coral species appears to adapt to some extent to ocean acidification by boosting its internal pH to more alkaline levels. This was big news that came out of the Heron Island high-resolution datasets.33 Think about writing this for a non-science audience using images and data visualization. I like the idea of including an ocean background behind the line or bar charts as shown earlier. In addition, you would of course include images of corals to attract readers.

2. Scientists Are Designing “Super Coral” Farms in Attempts to Stave Off Reef Extinction. This story would require reporting beyond what we’ve done for this chapter. I mentioned that scientists are working to design corals that can withstand ocean acidification and bleaching. Interview some marine scientists involved in genetics and find out where the experimental coastal farms are and create a map showing the locations. An online search would be a good start. There are videos and web stories about these efforts.

See the website for samples and resources related to these ideas.

NOTES

1 www.youtube.com/watch?v=5SkGmGPxP90
2 www.uq.edu.au/heron-island-research-station/content/front-page
3 www.globalcarbonproject.org/carbonbudget/
4 Data behind image from https://scrippsco2.ucsd.edu/
5 www.ncdc.noaa.gov/data-access/paleoclimatology-data/datasets
6 https://oceanservice.noaa.gov/facts/carbon-cycle.html
7 www.noaa.gov
8 http://ocean.si.edu/corals-and-coral-reefs
9 www.montereybayaquarium.org/animal-guide/invertebrates
10 Bryant, D., Burke, L., McManus, J., and Spalding, M. 1998. Reefs at Risk: A Map-Based Indicator of Threats to the World’s Coral Reefs. World Resources Institute, International Center for Living Aquatic Resources Management, World Conservation Monitoring Centre, and United Nations Environment Programme.
11 http://oceanservice.noaa.gov/facts/exploration.html
12 http://video.nationalgeographic.com/video/coralreef_spawning
13 https://sanctuaries.noaa.gov/earthisblue/wk46-lionfish.html
14 www.iucnredlist.org
15 “Seahorses under a changing ocean: The impact of warming and acidification on the behaviour and physiology of a poor-swimming, bony-armoured fish”, a study by Filipa Faleiro, Miguel Baptista, Catarina Santos, Maria L. Aurelio, Marta Pimentel, Maria Rita Pegado, José Ricardo Paula, Ricardo Calado, Tiago Repolho and Rui Rosa. Conservation Physiology, Vol. 3, Issue 1, 2015. https://doi.org/10.1093/conphys/cov009
16 www.frontiersin.org/articles/10.3389/fmars.2018.00283/full
17 https://ocean.si.edu/corals-and-coral-reefs
18 www.noaanews.noaa.gov/stories2015/100815-noaa-declares-third-ever-global-coral-bleaching-event.html
19 www.ametsoc.net/sotc2017/StateoftheClimate2017_lowres.pdf
20 www.globalcarbonproject.org/carbonbudget/ and www.pmel.noaa.gov/pubs/PDF/feel2899/feel2899.pdf
21 http://inyo.coffeecup.com/site/latham/latham.html
22 www.pmel.noaa.gov/co2/story/A+primer+on+pH
23 Chivian, E., and Bernstein, A. 2008. Sustaining Life: How Human Life Depends on Diversity. Oxford University Press.
24 www.teebweb.org/publication/climate-issues-update/
25 Solomon, S., Intergovernmental Panel on Climate Change, Working Group I. Climate Change 2007: The physical science basis—contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change.
26 https://ocean.si.edu/ocean-life/invertebrates/ocean-acidification
27 www.pnas.org/cgi/doi/10.1073/pnas.1505586112
28 www.scientificamerican.com/article/cores-from-coral-reefs-hold-secrets-of-the-oceans-past-and-future/
29 www.pnas.org/cgi/doi/10.1073/pnas.1505586112
30 https://en.wikipedia.org/wiki/Julian_calendar
31 https://sos.noaa.gov/datasets/ocean-acidification-saturation-state/
32 www.pnas.org/cgi/doi/10.1073/pnas.1505586112
33 https://news.mongabay.com/2015/10/good-news-some-corals-show-surprising-resilience-to-ocean-acidification/


2 Sea of Butterflies

Some Damage Is Done
The Ocean Acidification Study
Shell Quality
Data Exploration
Shell Quality Data Exploration
Exploring Visually
Chart Structures
Sinking Speeds
Data Cleaning
Data Visualization
Story Thoughts for Journalism Students

“Nothing exists for itself alone, but only in relation to other forms of life.”
Charles Darwin

Most people on this planet have neither heard of nor seen a pteropod.

2.1 Pteropod species Limacina retroversa or “sea butterfly.” They use pods (wings) to stop from sinking. Healthy spiral-shaped transparent shell. Photo taken in a lab. Courtesy of Amy Maas.


2.2 Clione limacina or “sea angel.” Cousin to the sea butterfly. Wikipedia Creative Commons. Matt Wilson/Jay Clark, NOAA NMFS AFSC. Public domain. https://commons.wikimedia.org/wiki/File:Sea_angel_vertical.jpg

But these tiny winged zooplankton populate open oceans globally, generally hidden from boaters, swimmers, and divers because of their nearly invisible bodies and deep sea habitats. That said, divers do spot them from time to time when the pteropods are near the surface feeding, and they do sometimes sadly wash up on beaches, looking like miniature glass ornaments. But tiny doesn’t mean unimportant. Starring in the ocean acidification study covered below is the smaller-than-pea-sized “sea butterfly” pteropod, scientifically known as Limacina retroversa, shown in Figure 2.1. The adorable little butterfly is one of the most populous of the “thecosomes,” or pteropods that build shells, as opposed to “gymnosome” pteropods, which are shell-less such as the Gummy Bear-like “sea angel” shown in Figure 2.2. Sadly, the sea butterfly shells are quite fragile, which is why they were one of the first sea animals to display shell corrosion as a result of ocean acidification. The gelatinous sea angel is not directly impacted by ocean acidification—there are no calcium carbonate minerals to corrode because it does not have a shell or skeleton. But it is still at risk because, as far as scientists know, the angel eats only one thing—its cousin, the sea butterfly. So if the butterfly dies out, the angel may soon follow. And, by the way, that angelic image in Figure 2.2 is a bit misleading. When it spots an appealing sea butterfly, a rather grotesque process unfolds. From the top, between the ear-like appendages, a monster-like head pops out and devours its prey. Search for videos of these critters. It’s worth your time! At least 40 species of pteropods provide essential nutrients, including protein and calcium, to whales, herring, pink salmon, mackerel, sharks, seabirds, and many other marine animals.1 Whales are known to passively swallow them up en masse by opening their cavernous mouths as they move through large swarms of the free-swimming critters.


No wonder they are known as the “potato chips of the sea,” tasty little bites available in large quantities. While pteropods are easily captured when seen and are certainly near the bottom of the sea food chain, they have a few things going for them. First and foremost is their invisibility feature. “So these guys [sea butterflies] have these clear shells that help them hide in this habitat. They want to be seen through because they’re in the open ocean and if something can see them it can eat them,” said Amy Maas, a biological oceanographer working at the Bermuda Institute of Ocean Sciences (known to her colleagues as “The Pteropod Lady”).

“If predators can’t see them, they can’t eat them. So being transparent and invisible is a perk.”

Maas studies ocean acidification and its impact on pteropods, while simultaneously working the genomes to figure out how many species exist globally, how they differ from each other and what types of proteins are expressed as shells corrode, and more. As Maas notes:

“This is totally one of the problems in working with open ocean organisms—we don’t know exactly how many species there are and we don’t know how connected they are. So if you’ve got one in the Atlantic Ocean and one in the Pacific Ocean and one in the Indian Ocean, are they all the same or are they different?”

“And it’s beginning to look like there’s more diversity than was previously identified. At the same time sometimes it’s not as diverse as we previously thought. So we’re talking less than 100 [species] but more than 40,” she continued. But it appears that most or all species share at least one behavior. “These animals are ‘diel vertical migratory,’ which means they spend their days moving from a lit, well-oxygenated place near the sea surface to a cold, deep, dark, less-oxygenated place to avoid predators. Animals in the open ocean have no place to hide. In the daytime, the light is on; there are no trees to hide behind and no burrows to crawl into. To hide from predators they have to go someplace, and in the open ocean the only place they can go is down,” said Maas. A daily vertical ritual. After learning about these gorgeous critters, I’ve added “night diving with pteropods” to my bucket list. Maas and others think that ocean acidification not only damages the animals’ shells, but may slow them down.


That daily migration from top to bottom and back plays an important role in the ocean’s carbon cycle as many millions of these diminutive critters deliver carbon to ocean depths where it can stay for many hundreds of years. But delivery of that CO2 must happen quickly, which is perhaps one reason why they are so heavy for their size. Dropping quickly is as important as staying invisible. The slower they move, the more likely they will dissolve before delivering the CO2 to the ocean floor. “In some ocean basins these shells will never reach the ocean floor because they dissolve before they get there. This is normal,” said Maas, adding:

“However, with acidification if they sink slower they’re less likely to get deep enough to actually get out of that mixed layer. And so they never leave.”

And that mixed layer is where the ocean exchanges carbon with the atmosphere. To act as a carbon sink, that carbon needs to go deeper. “Down below [the mixed layer], where there’s lower exchange with what we consider our terrestrial world, once carbon or calcium carbonate sinks it takes hundreds of years for it to come back up and engage with the atmosphere.” And that’s what we want to happen, we want to sink that carbon to where it can become part of the ocean floor. So Maas’s sea butterfly study and the datasets it created, as you’ll see, shed light on shell quality as well as sinking speed under higher CO2 conditions.

SOME DAMAGE IS DONE

Pteropods have been found with damaged shells in ocean basins from the north to the south pole. There are pockets that seem to be more acidulous than others—including areas within the Gulf of Maine, where Maas and colleagues collected the sea butterflies for the ocean acidification study described in the next section. Back in 2014, the Proceedings of the Royal Society B published a study2 with images of damaged pteropods from waters off the West Coast of the United States, shown in Figures 2.3 through 2.6 below. The first two illustrate the difference between a healthy shell (Figure 2.3) and a shell showing signs of dissolution (Figure 2.4). Notice the clear shell in the healthy pteropod in the first figure. The wings are clearly defined and smooth. In the second, there are pockmarks and etchings in the shell, and what appears to be a bit of fraying in the pods (wings). These are signs of corrosion from a more acidic ocean. Figure 2.5 shows two shells, side by side—the healthier shell on the left and the shell with signs of OA damage on the right. And in Figure 2.6, shell corrosion of a pteropod’s surface can be seen using a powerful microscope.3


2.3 Healthy pteropod collected during the U.S. West Coast survey cruise. Notice the pods (wings) are shaped properly for propelling, and the shell is mostly transparent, as it should be. Credit: National Oceanic and Atmospheric Administration. Public domain. www.noaa.gov

2.4 Dissolving pteropod. Evidence of marine snails from the natural environment along the U.S. West Coast with signs of dissolving shells and damage to pods (wings). Credit: National Oceanic and Atmospheric Administration. Public domain. www.noaa.gov

2.5 Shells showing signs of OA. Notice the change in transparency in the unhealthy shell on the right. Credit: National Oceanic and Atmospheric Administration. Public domain. www.noaa.gov

Many other studies show similar results, but scientists are trying to figure out just how far and wide the problem is today, and whether higher CO2 scenarios will impact the populations and their daily behaviors.

THE OCEAN ACIDIFICATION STUDY

Maas and her colleagues spent many days and nights on research vessels in the Gulf of Maine in 2014 and 2015 collecting pteropods using large netted drums. On each voyage they had to separate out the sea butterflies from other types of plankton, placing them in jars to bring back to the lab (Figure 2.7). The work, funded by the National Science Foundation, shed light on ocean acidification’s impacts on shell quality and sinking speed.4 They also gained insight into pteropods’ genetic and physiological responses and published results in 2018 and 2019. Maas said that pteropod studies for the most part must be conducted in lab aquariums (Figure 2.8). An in situ project like Kline’s just wouldn’t work well for these critters.

2.6 Surface Dissolution. An image from a scanning electron microscope of dissolution on a pteropod shell. Credit: National Oceanic and Atmospheric Administration. Public domain. www.noaa.gov

2.7 Amy Maas separating out pteropods. Photo by Peter Wiebe. Courtesy of Amy Maas.


2.8 Amy Maas working with her pteropods in the lab at the Bermuda Institute of Ocean Sciences. Photo by Amalia Aruda Almada. Courtesy of Amy Maas.

“The problem is that we don’t do as well if we don’t have a place to stand up,” said Maas, adding that some pteropod research has been conducted in the open ocean using “mesocosms.” “They’re like big plastic bags meters and meters tall. They basically encapsulate a chunk of seawater and all the animals that live in it. And then they bubble it with CO2 or bubble it with air and then look at how the community responds over time,” said Maas. Sounds good, but there are issues. Pteropods are heavy for their size— another feature meant to help them sink quickly. “These guys actually do very poorly in mesocosms. In part we think because they are so heavy they tend to get snagged in the fabric, so unfortunately this is a very difficult group to work with. One, they’re really sensitive—which is why we’re trying to work with them in the first place. And two, they’ve got a lot of peculiarities of their biology that just makes it hard,” said Maas. So the indoor lab aquarium it is. But it’s still a challenge. “Remember this is an open ocean organism and they’re negatively buoyant which means if they don’t do something they sink. They just sink,” said Maas, pointing out:

“And so often when you’re trying to keep the animal in captivity it’s hard because they’re used to having much more space, and no sides, no bottom. And you put some of them in a jar and they just sink to the bottom, and they just sit on the bottom and bacteria collects and they’re more likely to get sick there.”

And it’s also challenging to feed them in captivity.


“They have these little strands that they use to pull water toward their mouth and pull particles or phytoplankton toward their mouth but they also produce a web out of mucus. It’s really quite interesting. So they produce this bubble and stuff sticks to it and then they can pull that in and eat everything that’s attached to it and we think that [bubble] actually helps them with buoyancy, with floating a little bit in the open ocean, but if they get stressed out they drop it and freak out.” Fortunately, Maas knows what they like to eat. Lab aquarium menu options include appetizing little phytoplankton such as Rhodomonas lens (red-colored microalgae with flagella for swimming) and Heterocapsa triquetra (gold-colored dinoflagellates with horns). Yummy.

SHELL QUALITY

While marine scientists are already aware of OA’s impact on shells, they still have much to learn about the animal’s physiological and behavioral response to the more acidic environments. Maas had a couple of key questions that influenced the direction of her research:

“We can see the shell changing but do the animals really care? Does it matter to the animals that their shell is changing? So we chose three metrics that we thought were ecologically relevant. One, how transparent they are because that relates to how their predators can see them, second how fast they sink, and third whether there’s a change in swimming behavior.”

In 2018 and 2019, Maas continued to analyze datasets from the 2014–2015 research trips in the Gulf of Maine, where CO2 changes dynamically on a regular basis. Like Kline, she is also digging into gene expression related to ocean acidification, and other physical and biological factors, which we won’t cover in this chapter. Maas said that most shell quality studies in the past relied on subjective visual assessments of pteropod shells. “Up to this point what we’ve been primarily seeing in the pteropod shell literature has been a rating scale like 5 to 1,” said Maas. “So a person will look at a shell under the microscope and say ‘this is really damaged so it’s a 5’ or it’s got something that looks like holes. And then a level 4 is a little bit better, and then pristine would be a 1.” But her team designed an objective mathematical model for assessing shells. “It took a standard photograph with a standard light source and then took it into a computer program that just measured black and white, and it was able then to turn that into a number. Completely quantitative rather than qualitative data,” said Maas. The program measured proportions of “opacity,” or how white the shell becomes in higher CO2, and the proportion of light that travels through the shell, labeled “transmittance.” Shells become whiter or more opaque under higher CO2—losing their “invisibility” perk, and they develop areas on their shells that block light completely. Both dimensions reveal changes that lead to loss of transparency or invisibility. “So basically you take an image and it converts to the gray scale. Then you’re looking at how white the shell is or how dark the shell is. The whiter the shell the further away from the natural state it is,” said Maas. Alex Bergan, who was a PhD candidate working with Maas at the Woods Hole Oceanographic Institution at the time, conducted the experiments. He placed lights at angles to image shells in two ways: opacity proportion reflects how white the shell becomes in higher CO2, and transmittance measures the amount of light that actually passes through. A few of Bergan’s images are shown in Figures 2.9–2.12, taken in two CO2 scenarios after 3 days of exposure. As summarized in Figure 2.13, scenarios include the ambient or natural CO2 levels at 400 ppm and the high CO2 scenario, predicted for the end of the century, at 1,200 ppm. Both shells had spent 3 days in their respective CO2 scenarios.
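To give a feel for how a grayscale photograph can be turned into a single number between zero and one, here is a rough Python sketch. It is only a stand-in for the idea Maas describes, not the team’s actual MATLAB program, whose exact calculation isn’t spelled out here. The function name `mean_brightness` and the tiny pixel grids are hypothetical.

```python
def mean_brightness(gray_pixels, max_value=255):
    """Average brightness of a grayscale image, scaled to the 0-1 range.

    gray_pixels is a list of rows, each a list of pixel values where
    0 is black and max_value is white. This is an illustrative measure,
    not the formula used in the study.
    """
    flat = [pixel for row in gray_pixels for pixel in row]
    return sum(flat) / (len(flat) * max_value)


# Two tiny made-up "photos" of a shell (2 rows x 3 pixels each).
clear_shell = [[30, 40, 50], [40, 30, 50]]          # mostly dark: see-through
cloudy_shell = [[200, 220, 210], [230, 190, 210]]   # mostly white: opaque

print(round(mean_brightness(clear_shell), 2))
print(round(mean_brightness(cloudy_shell), 2))
```

The whiter image produces the larger number, which matches the intuition in the text: the whiter the shell, the further it is from its natural see-through state.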

2.9 Shell is somewhat transparent or see-through in ambient CO2.

2.10 Shell turns almost completely opaque or white in high CO2 after only 3 days.

2.11 Shell’s see-through quality still intact in ambient CO2.


2.12 Shell allows very little light through. It is no longer transparent after only 3 days in high CO2.

Measure          CO2 at 400 ppm for 3 days    CO2 at 1,200 ppm for 3 days
Opacity          Figure 2.9                   Figure 2.10
Transmittance    Figure 2.11                  Figure 2.12

2.13 Table showing opacity and transmittance changes. Courtesy of Amy Maas and Alex Bergan.

The software program, MATLAB, calculated opacity and transmittance from the images on a scale of zero to one, as you can see below in the spreadsheet in Figure 2.14.

Data Exploration

Fortunately, this spreadsheet is quite clean and almost ready for visualization software—just one header row and no empty rows. There are some empty cells that should be vetted—some with notes that hint at the problem. As always we need to go to the source to confirm any assumptions. The source can be the study’s author or the study itself, since most studies explain issues encountered during experiments. Turns out that some of the shells were too damaged to photograph properly. Think about it—a crack in a shell would allow light through, which would skew results.

2.14 Data from this spreadsheet includes: CO2 levels in column C (Treatment), the number of days of exposure in column B (Day), transmittance in column J, and opacity in column K. Courtesy of: Amy Maas.


For an audience, I’m most interested in the data that presents evidence on the shell quality—and would use images as well as data in the story about OA’s impact on pteropod shells. As always, we’ve checked margins of error for the datasets. See web resources for useful links to statistics fundamentals. Thus, from the spreadsheet in Figure 2.14, the following dimensions will be used in visualizations: CO2 levels (Treatment), the number of days of exposure in column B (Day), transmittance, and opacity. While length, width, and weights are important to the study overall, it’s not within the scope of this chapter to cover them. That said, other datasets from the same study show start and end weights of shells exposed in ambient, medium, and high CO2 aquariums over 2 weeks—clearly showing that the shells were lighter after a week of exposure to medium and high-CO2 environments. Good to know as part of the data backstory. For our visualization, I will be deleting any rows that don’t have opacity and transmittance, since there were problems with those shells.

SHELL QUALITY DATA EXPLORATION

Let’s look at the ranges of our dimensions of interest. Use filters in spreadsheets or explore visually as shown in Chapter 1 to identify ranges for each dimension. We already know that opacity and transmittance values fall within the 0 to 1 range, and we think of them as proportions of 1, or 100 percent. The three CO2 treatment levels are 400 (ambient), 800 (medium), and 1,200 (high)—all measured in parts per million (ppm). Actual shell opacity numbers in this small slice of data fall between 0.12 and 0.31. Transmittance ranges from 0.06 to 0.54. Shells were exposed to the three CO2 treatment levels for 14 days, and measured on days 1, 3, 7, and 14. When I explored the data visually, as expected, the shell opacity increased (shells turned whiter) as CO2 and days of exposure rose.
But when I viewed transmittance, I was surprised to see the numbers also rise with higher CO2 and days of exposure. Not what I expected. Maas’s study described a downward trend in transmittance—as shown in her study chart in Figure 2.15 (based on a different dataset, but the same dimensions and measures). In her chart, as the level of CO2 rises (represented by colors) and days of exposure increase (X axis), transmittance falls. Transmittance, from what I read, was calculated based on the portion of light that passed through the shell. So if the shell becomes more opaque as CO2 rises, then wouldn’t the shell block more of the light from passing through? I turned to Bergan, who had prepared the spreadsheet before the final study was published.

[Figure 2.15: panel (a) plots transmittance and panel (b) plots opacity against days of exposure, with separate lines for the ambient, medium, and high CO2 treatments.]

Turns out that there are two ways to look at transmittance—and two ways to calculate it. First, the software (MATLAB) can be instructed to measure the “darkness” of the shell—or the portion of the shell that blocks light from passing through. Or, it can measure the “see-through” area where light passes right through. In the spreadsheet in Figure 2.14, the former was used. But our narrative, and the study’s narrative, used the latter. Either method reveals the same thing—that invisibility is diminishing. Think about the glass-half-empty or half-full analogy. If a glass were filled with a black liquid, would you calculate the dark portion of the glass or the clear transparent portion? This issue exemplifies why it’s essential to understand the data backstory, and that only the data creator really has the answer. The solution, however, was a piece of cake. Bergan suggested that I use the “darkness” data in my spreadsheet, but subtract each transmittance value from 1 (1 – transmittance) to get the number we wanted. In other words, we subtract the darkness to see the light! So I added a new column (I), shown in red in Figure 2.14, with the new values for use in our visualizations.
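For readers who prefer code to spreadsheet formulas, the conversion Bergan suggested might be sketched in Python as follows. This is only an illustration under stated assumptions: the rows, the column names, and the new key transmittance_light are hypothetical, and the numbers are invented rather than taken from the study's spreadsheet. The sketch also drops rows that lack opacity or transmittance, as described earlier.

```python
# Hypothetical rows; values are invented for illustration, not taken from the study.
rows = [
    {"Day": 1, "Treatment": 400, "transmittance": 0.10, "opacity": 0.13},
    {"Day": 14, "Treatment": 1200, "transmittance": 0.50, "opacity": 0.30},
    {"Day": 7, "Treatment": 800, "transmittance": None, "opacity": None},  # problem shell
]


def prepare(rows):
    """Drop rows missing either measure, then add the inverted transmittance."""
    cleaned = []
    for row in rows:
        if row["transmittance"] is None or row["opacity"] is None:
            continue  # rows without both measures are left out of the visualization
        new_row = dict(row)
        # The sheet stores "darkness"; 1 - darkness gives the see-through share.
        new_row["transmittance_light"] = round(1 - row["transmittance"], 4)
        cleaned.append(new_row)
    return cleaned


prepared = prepare(rows)
for row in prepared:
    print(row["Day"], row["Treatment"], row["transmittance_light"])
```

Keeping the original column and adding a new one, rather than overwriting, means the "darkness" values remain available if a question about the backstory comes up later.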


2.15 This chart shows that opacity increases as CO2 levels and days of exposure rise—reflecting a direct relationship. For transmittance, the lines move downward as CO2 rises, meaning that less light is passing through the shell—reflecting an inverse relationship.


2.16 Pteropod at ambient CO2 levels (400 ppm) for 14 days. Courtesy of: Amy Maas.

2.17 Pteropod at high CO2 (1,200 ppm) levels for 14 days. Courtesy of: Amy Maas.

See website for how transmittance was converted in the Excel spreadsheet. We’ve seen images of shells impacted by ocean acidification after 3 days of exposure. In Figures 2.16 and 2.17, the impact is stark against a black background that is not unlike a dark ocean. These pteropod shells had been exposed to ambient and high CO2 for 14 days. Commenting on the pictures, Maas said:

“The first photo shows what a natural looking pteropod is, pretty see-through. The second shows similar organism that had been held in captivity for the same number of days but had experienced a high CO2 level and you can see it is much whiter.” So why do these shells turn white under more acidic conditions? “I think what is happening is that the normal crystalline structure of aragonite allows light to pass through very very easily. And that’s why they [pteropods] do that on purpose because they want to be see-through because they’re in the open ocean,” said Maas. Maas has also been working with the shells at the molecular level to assess ocean acidification damage and it has given her an idea about what’s happening.


“As acidification occurs we think what happens is that the crystal structure gets wonky. They [crystals] are not where they’re supposed to be. We think there may be some changes with the proteins that hold things in shape and so the light then starts to bounce off all of these different crystals that are no longer aligned and you start to see white. So it [the surface] is a little bit rougher.”

Wonky crystals. She added that they are still working on shells at the molecular level to confirm the wonky crystal hypothesis.

EXPLORING VISUALLY

So how would we visualize this issue for an audience? First, I like to emphasize that the data visualization is just one component of the story—perhaps the component that adds evidence or gravitas. But the audience must be drawn to the story. Like art, data visualizations can help to break up the text, but unlike art, they are meant to be clear and simple. As Edward Tufte—the “father of data visualization”—counsels, the ratio of data to ink should be quite high. Less ink, more data. A story can be “data-driven,” but data alone wouldn’t likely draw an audience. In this case, images of pteropods and scientists working with the critters might be a better approach to drawing eyeballs to the story. At this point, we hopefully know the issue and the data backstory behind pteropod shell quality—and why it’s important. In journalism classes, when students reach such a point, I bring out colored pencils and drawing paper and ask them to design two or three charts, including interactivity ideas such as what to filter or group or tooltip information. If this task is too difficult, it’s quite likely that they don’t yet understand the data backstory. And without that understanding, they likely don’t have a strong story angle. Our story, for training purposes, is to tell our audience about ocean acidification’s impact on pteropods. I’d like to start with a rough draft of a box and whisker chart. My advice is to show the draft to someone, a friend or colleague, after explaining the context, and ask them how long it takes to understand what the chart is telling them. If they are confused, go to a simpler chart.

Chart Structures

See resources on the website for more information on chart types.


Starting with a box and whisker chart, we’ll plot the shell opacity on the Y axis and the days of exposure on the X axis. As you know, the shell quality data is grouped by CO2 scenario (400, 800, and 1,200 ppm) and by days of exposure (1, 3, 7, and 14). Box and whisker charts are known to work well with groups, but they aren’t as familiar to general audiences as they may be to some of us who work with business, finance, and science data—imagine a stock price chart for the day. The top and bottom lines outside the box represent the high and low for the day. The top and bottom ends of the box show the open and close, and inside the box are the stock prices between the high and low. That box can be divided into sections such as quartiles, or halved at the median, depending on the designer’s preferences. But since we have groups and highs and lows and averages for opacity and transmittance, this is a good chance to build a box and whisker, though we’ll choose the best chart for the audience later—and we’ll format the chosen chart. We will color the major grouping, CO2 level, in yellow (ambient, natural), green (medium), and red (high). The second grouping, days of exposure, will be encoded to “position.” See Figure 2.18 where each box is positioned over the number of days of exposure on the X axis. The shell quality measures are encoded as circles aligned with opacity values on the Y axis and circumscribed by the box. One could use squares or other shapes, but circles work well with boxes. The whiskers represent the minimum and maximum shell quality values—in this case, shell opacity. The viewer could switch between opacity and transmittance in an interactive chart. Not every group has the same number of measures, as you can see, because some shells had to be eliminated from the process—particularly those that spent 14 days in high (1,200 ppm) CO2. That box is smaller with fewer shells for that reason.
The box draws itself over the area where most of the values fall, while the top and bottom lines, or whiskers, represent the minimum and maximum values.
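To make the parts of the box concrete, here is a minimal Python sketch of the numbers such a chart draws for each group. It is an illustration only: the opacity readings are invented, and drawing the box from the first to the third quartile is an assumption about the chart settings, not a description of how the Tableau drafts were configured. The whiskers follow the text above and use the minimum and maximum.

```python
from statistics import median, quantiles

# Hypothetical opacity readings grouped by day of exposure (invented values).
opacity_by_day = {
    1: [0.12, 0.13, 0.14, 0.15, 0.16],
    14: [0.22, 0.25, 0.27, 0.29, 0.31],
}


def box_summary(values):
    """Return the five numbers a box and whisker chart draws for one group."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),        # lower whisker
        "q1": q1,                  # bottom edge of the box (assumed quartile box)
        "median": median(values),  # line inside the box
        "q3": q3,                  # top edge of the box (assumed quartile box)
        "max": max(values),        # upper whisker
    }


summaries = {day: box_summary(vals) for day, vals in opacity_by_day.items()}
for day, summary in summaries.items():
    print(day, summary)
```

A taller box (a larger gap between q1 and q3) signals a wider spread of opacity values within that group of shells.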

2.18 Rough draft of box and whisker chart. This chart shows the opacity for shells exposed to the lowest CO2 levels (400 ppm). The user chooses the CO2 level at the top of the chart. Produced by Dianne M. Finch-Claydon using Tableau.


2.19 Rough draft of box and whisker chart. Opacity: all three CO2 scenarios. Produced by Dianne M. Finch-Claydon using Tableau.

The viewer can choose one of three CO2 scenarios, or rotate through them. This works well since the boxes move up and down during the interactive rotation. Or, one could choose “ALL” CO2 scenarios to get a side-by-side view. See Figure 2.19. Hovering over a shell measure triggers an information box or tool tip. And hovering over the box itself provides information such as the median opacity level for the full set of shells associated with each box. The longer the box, the larger the range of opacity values. Notice how the minimum opacity measures move up after three days of exposure in Figure 2.19. The pattern is clear, but maybe not as clear as a line chart? In both cases above, one can easily see that opacity rises with CO2 and days of exposure—though the ambient scenario in yellow (400 ppm) is relatively stable, as there were no CO2 treatments to create a more acidic environment. See website resources for more information on box and whisker charts. The line chart is often a good choice, though it can be a bit funky when a dataset has natural gaps. For instance, we are using days 1, 3, 7, and 14 along the X axis. The shells were not imaged on the days in between, so we have gaps. That is why the chart in Figure 2.20 comprises straight vertical lines, where the opacity measure ranges are plotted, and slanted lines, which are there as connecters to the next set of opacity measures. The slanted lines simply fill the gaps. If one removes them, then the chart will show four straight lines. Not ideal. And when we plot the three CO2 scenarios together in Figure 2.21, it’s difficult to read. One reason for the odd line shape is that the datapoints, or shell opacity numbers, were measured on only four specific days—and that is where you see the straight vertical lines. The slanted lines are simply there to connect the data from one day to the next. The box and whisker looks good, but may be too complex for some audiences. 
The line chart doesn’t work because we can’t get a continuous line without continuous data. Let’s look at one more idea.


2.20 Rough draft of line chart. Opacity: high CO2 (1,200 ppm). Produced by Dianne M. Finch-Claydon using Tableau.

2.21 Rough draft of line chart. Opacity: all three CO2 scenarios (400, 800, and 1,200 ppm). Produced by Dianne M. Finch-Claydon using Tableau.

The charts below in Figures 2.22 and 2.23 might be a bit more friendly than the box and whisker, yet still show the same patterns. I’ve added an average line to each group to emphasize the trend direction over 1, 3, 7, and 14 day CO2 exposures. These circle groupings by days of exposure could just as easily be bars. The viewer would choose between the opacity and transmittance views, and the CO2 scenario colors (the level of CO2 used) would be the same for both charts for consistency. The Y axis would use the ranges we discovered earlier for the two shell quality values. Remember that a CO2 level of 400 parts per million is our current “norm.” But 800 and 1,200 parts per million are predicted for the end of this century or sooner—and would represent severe ocean acidity. These studies place animals in the severe CO2 treatments for a number of days or weeks, so it’s important to keep in mind that once the ocean becomes more acidic over the next 50 years, sea animals will spend all their time in that environment.


2.22 Rough draft of circle chart. Opacity rises with CO2 and days of exposure. Shells are losing their translucency in higher CO2 scenarios (red circles), turning whiter and rougher. Notice the distribution of the CO2 colors. Most of the yellow (natural seawater) shells are in the lower opacity range, while the green (medium CO2) and red (high CO2) begin to rise on day 3 of exposure as the shells turn whiter. Produced by Dianne M. Finch-Claydon using Tableau.

2.23 Rough draft of line chart. Transmittance falls as CO2 rises and days of exposure increase. Shells are losing transparency (less light getting through them). Notice the distribution of the CO2 colors. Most of the yellow (natural seawater) shells are in the higher transmittance range, while the green (medium CO2) and red (high CO2) begin to fall on day 3, as less light penetrates through the shells. Produced by Dianne M. Finch-Claydon using Tableau.

Notice that most of the ambient shells (yellow) in the first chart show a lower opacity than those in the green (medium) and red (high) CO2 scenarios. The circle chart could just as easily be a bar chart with some quick adjustments. As we’ll create a bar chart for sinking speed below, we’ll use the circle chart as our final visualization with formatting. See the interactive version on the website, and the formatted version shown in Figures 2.24 and 2.25.


2.24 Notice the gaps widen between the yellow and red circles. The yellow circles represent shells that are still nearly invisible (or lower opacity), while the red ones are whiter (higher opacity). That gap begins to widen on day 3, and continues through day 14. As the chart shows, the whitest and most opaque shells spent longer periods in the highest CO2 environments. Produced by Dianne M. Finch-Claydon using Tableau.

2.25 This is the same as the previous chart, but in this view the user chose to focus on high CO2 (red circles), and so the average for the view was recalculated, leaving out the numbers in the natural and medium CO2 groups. The tool tip or information box pops up and provides more information when the viewer selects a circle. Produced by Dianne M. Finch-Claydon using Tableau.

SINKING SPEEDS

As discussed, pteropods sink vertically and quickly to get to deep dark areas of the ocean in the wee hours of the morning—before sunrise. There they can hide from predators until darkness falls, and then rise again to feed at or near the surface for the night.


So it makes logical sense even to a non-scientist that when their shells erode and become whiter, brighter, and lighter—not to mention rougher—they just might lose that hydrodynamic efficiency they are known for. That “hydrodynamic” ability is not only known to marine scientists. While visiting Amy Maas at the Bermuda Institute of Ocean Sciences (BIOS), I talked with some visiting mechanical engineers who were analyzing pteropods with a view to designing micro underwater robots—wing propulsion in particular. The design process where nature is behind ideas is known as “bio-inspiration,” where the design is inspired by nature—but not copied. You’ve probably heard that sharks, bumble bees, and other insects and animals have been studied for bio-inspiration. Imagine pencil eraser-sized underwater robots dropping like bombs to dark ocean depths, then rising rapidly using pteropod-like wing propulsion mechanics. Maybe they collect data payloads and then deliver those payloads from the ocean depths—but to whom? Scientists? Governments? Military? Amazing stuff. So, bottom line is that we don’t want pteropod shells to erode and we don’t want them losing mass or weight. I doubt that whales would eat the tiny robots instead of their tasty little calcium-rich snacks. Then again, plastics of all shapes and sizes are found in the stomachs of whales and other fish. So maybe it’s better to state that the whales will not replace nutritious pteropods with tiny swimming robots. So how did the team measure pteropod speed? To summarize the steps, Bergan sucked up pteropods from their respective CO2 scenario aquariums using a soft pipette. Then, he dropped the little critters into a wider glass stationary pipette placed in a 13 liter cylindrical tank or “carboy.” Think of a very large glass and a wide glass straw, the bottom of the straw high enough in the glass to leave ample room for the animals to drop vertically into the seawater. 
The animals slowed upon arrival in that straw, then settled down and began to do their natural thing—dropping to the bottom. A high-speed camera photographed them sinking through a designated area measuring about 6 square centimeters. Drop speed was calculated from the video frames in centimeters per second. As you can see in the data, each pteropod was tested at least three times. In my visualizations below, I’ve used the average speed of those three drops. See the video on the website showing a sea butterfly dropping and rising in one of the carboys. Mirrors were placed in the tank so that the pteropods could be videotaped from different sides—as though in 3D. The data slice used here represents pteropod “races” conducted in the third week of CO2 exposure—a week longer than the data we used in the shell quality experiments. When pteropods opened their wings, the measurements were not included. Other experiments recorded speed with wings open, but they turned out to be difficult. With wings in against their bodies, they drop vertically more quickly—more like rocks than butterflies!
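The averaging step described above can be sketched in Python. This is a hedged illustration rather than the team's actual procedure: the trial records, the wings_open flag, and the speeds are all hypothetical, and the sketch simply skips wings-open drops before averaging each animal's remaining trials.

```python
from statistics import mean

# Hypothetical trials: (pteropod id, speed in cm/s, wings_open). Values are invented.
trials = [
    ("pter1", 1.0, False),
    ("pter1", 1.2, False),
    ("pter1", 1.4, False),
    ("pter1", 0.4, True),   # wings open: left out of the averages
    ("pter2", 0.8, False),
    ("pter2", 1.0, False),
    ("pter2", 0.9, False),
]


def average_speeds(trials):
    """Average the wings-closed sinking speeds for each pteropod."""
    speeds = {}
    for pter_id, speed, wings_open in trials:
        if wings_open:
            continue
        speeds.setdefault(pter_id, []).append(speed)
    return {pter_id: round(mean(vals), 2) for pter_id, vals in speeds.items()}


averages = average_speeds(trials)
print(averages)
```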


2.26 This spreadsheet was designed for an experiment to monitor the impact of CO2 on pteropods’ vertical dropping speed. Needs substantial cleaning before data can be visualized for exploration or audience.

Data Cleaning

Take a look at the spreadsheet on sinking speed in Figure 2.26. It is laid out with labels meant for the researcher to record and assess—and not designed with data visualization in mind. That said, this is a typical style of spreadsheet used in corporations, non-profits, and other organizations. They are organized for viewing—not for transforming into a visualization nor for importing directly into a database. In fact this dataset continues to the right and left with chunks of data for different weeks and years. When you hear that “big data” is made up of a variety of formats (one of the three Vs mentioned in the introduction), this spreadsheet exemplifies a format that is difficult for software programs to decipher. It obviously needs to be cleaned for data visualization. Format varieties refer to the internal features—such as how individuals lay out their data and labeling. In addition, there are also external formats—such as whether data is created in Excel, a relational database, XML, HTML, CSV, and many others. That’s where SAP’s Hana, Hadoop, and many other big data platforms come in—though they are not exactly affordable to journalists and other communicators. But we tend to work with slices of datasets anyway. So, there are many approaches to cleaning this data. And it depends on how you plan to use it. We are interested only in the speed data—and ensuring that each speed is aligned with its respective pteropod, as reformatting can be dangerous if you don’t focus on the relationships. I added two new dimensions (columns): “CO2” and “Feature.” Feature would categorize the numeric data into speeds 1–3, average speeds, maximum length, or area. That way I can grab any of those dimensions when needed in the visualization software. I removed the term “individual” and renamed the columns to pter1, pter2, … pter23 for clarity. It’s best to label appropriately in your data instead of renaming in the software program, though you can if needed. I then removed all empty rows and added the CO2 scenario data to the new column. In the original, it was labeled above the groups of data as “high (1200ppm)”, “medium (800ppm)”, and “ambient (400ppm)”, but we need it attached to each row of speeds so that the CO2 scenario can be selected in the visualization, and all pteropods and their speeds will be displayed in the view. See the results in Figure 2.27. If the data had hundreds or even millions of rows, I’d likely change the entire format from wide to narrow for efficiency. For example, we could have added one column called “Pteropod Number” and then each row, versus column, would contain the pteropod ID. Other columns would include CO2 Scenario, and all of the other dimensions, including speed1, speed2, speed3, and so on. This would be most effective in visualization software. Keep in mind that column labels, or dimensions, should be the filters. I’d like to filter on CO2 Scenario, as we do below. One needs to become savvy about switching rows to columns and columns to rows when reformatting datasheets. This frankly comes with experience. But start with your design on paper to figure out what dimensions you need to filter or categorize in your visualization. Keep in mind that you can always wrangle a dataset from wide to narrow or narrow to wide, depending on your needs. You’ll know when you need to do so, because your visualization won’t cooperate.
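As a rough sketch of the wide-to-narrow idea, the Python below turns rows with one column per pteropod into rows with one pteropod each. It is an illustration under assumptions: the wide layout mimics the cleaned sheet described above, but the exact column names and all of the values are hypothetical.

```python
# Hypothetical wide rows: one column per pteropod. Values are invented.
wide_rows = [
    {"CO2": "ambient", "Feature": "speed1", "pter1": 1.1, "pter2": 1.3},
    {"CO2": "ambient", "Feature": "speed2", "pter1": 1.0, "pter2": 1.2},
    {"CO2": "high", "Feature": "speed1", "pter1": 0.7, "pter2": 0.8},
]


def wide_to_narrow(rows):
    """Build one record per (CO2 scenario, pteropod) with a column per feature."""
    by_pteropod = {}
    for row in rows:
        for column, value in row.items():
            if column in ("CO2", "Feature"):
                continue  # these describe the row; they are not pteropod columns
            key = (row["CO2"], column)
            record = by_pteropod.setdefault(
                key, {"CO2 Scenario": row["CO2"], "Pteropod Number": column}
            )
            record[row["Feature"]] = value
    return list(by_pteropod.values())


narrow_rows = wide_to_narrow(wide_rows)
high_only = [r for r in narrow_rows if r["CO2 Scenario"] == "high"]
print(narrow_rows[0])
print(len(high_only))
```

Keying each record on both the scenario and the pteropod ID matters here, because the same IDs are reused across CO2 scenarios.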


2.27 Cleaned spreadsheet. Columns added and data descriptions filled in where needed. All empty rows and excess labels removed.


In Tableau, you can sometimes use a feature called “Parameters” to accomplish the same thing, depending on the spreadsheet and your needs. See the website for tips on wide-to-narrow wrangling. Notice in the cleaned spreadsheet that I include Month and Year. I don’t need those since there is only one slice of data from April 2015. For the full database created in the study, I would also add a column for week, and we would then need all three columns since the study produced data for several weeks over 2 years. Don’t be confused by the pteropod numbers. Each CO2 scenario used its own group of 23 pteropods—so while they use the same IDs, each is attached to a CO2 scenario grouping. As for gaps, you may have noticed that the “high” CO2 scenario appeared to include only 16 pteropods, while the medium and ambient counterparts tested as many as 23 animals—sort of like the hurricane gap in Chapter 1. So I contacted Bergan. He replied by email:

“I found that they didn’t have their wings fully withdrawn, because otherwise I wouldn’t have saved the video. All pteropods were alive during sinking and swimming trials. In a few instances, the pteropods were compromised/destroyed (they are very tiny after all) while moving them around with a pipette and therefore I couldn’t get as many trials as I would have liked.”

And as you can see in this slice of data, those that had already spent over 2 weeks in the highest CO2 scenario represent the largest loss. Since shell quality was proven to atrophy under higher CO2 conditions in this study and others, and sinking speed was also negatively impacted, Maas and her team concluded that the shell quality issue, caused by ocean acidification scenarios, most likely drove the slowdown in sinking speed, but they will conduct more studies to confirm this hypothesis.

DATA VISUALIZATION

Since I’d like to see 23 pteropods for each CO2 scenario, I thought I’d try a bar chart. Sometimes one has too many dimensions to fit the view—and the bars become too skinny or an annoying scroll bar is added to the bottom. The view in the unformatted chart in Figure 2.28 may be a bit wide, but one can turn the chart on its side for a longer and narrower view for smaller devices. Of course, you need to learn “responsive” web design to switch from large to small device views in a web page. That certainly can’t be addressed here, but keep it in mind. There are templates out there that can be used, but you do need to know how your own images and charts will shrink or expand or even disappear depending on the size of the device in use. Responsive design is discussed in some detail in the last chapter.


2.28 Rough draft of bar chart reflecting sinking speeds. CO2 scenarios by color.

Let’s try a quick draft of a bar chart. In Figure 2.28, each bar represents one pteropod. CO2 scenarios were encoded to color: blue for ambient, orange for medium, and red for high CO2. Each group of pteropods lived for 3 weeks in one of the three CO2 scenarios. The bar chart works well for sinking speed. It’s quite easy to understand what is happening by viewing the bar height trend—speed drops as CO2 rises from left to right. The dotted average lines added to the final chart (Figure 2.29) emphasize that same trend. The format you choose for any chart depends on so many factors. Will it be used on the internet or published in a print format? As each chapter attempts to reveal, chart choice is all about the data and the story angle. And again, any chart I’ve mentioned in this book requires a narrative around it, and captions under it with information on who produced it, where the data originated, and dates of data collection, as well as other pertinent information, such as margins of error, p values, or other significant statistical information. See website resources for reviewing statistics.
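The average lines mentioned above, and the way an average is recalculated when a viewer filters to one CO2 scenario, can be sketched in Python. Treat this as an illustration only: the records, field names, and speeds are hypothetical, and the visualization software's own calculation may differ.

```python
from statistics import mean

# Hypothetical per-pteropod average sinking speeds in cm/s (invented values).
records = [
    {"CO2 Scenario": "ambient", "Pteropod Number": "pter1", "avg_speed": 1.2},
    {"CO2 Scenario": "ambient", "Pteropod Number": "pter2", "avg_speed": 1.0},
    {"CO2 Scenario": "medium", "Pteropod Number": "pter1", "avg_speed": 0.9},
    {"CO2 Scenario": "medium", "Pteropod Number": "pter2", "avg_speed": 0.9},
    {"CO2 Scenario": "high", "Pteropod Number": "pter1", "avg_speed": 0.6},
    {"CO2 Scenario": "high", "Pteropod Number": "pter2", "avg_speed": 0.8},
]


def scenario_average(records, scenario=None):
    """Average speed for one scenario, or for every record when scenario is None."""
    selected = [
        r["avg_speed"] for r in records if scenario in (None, r["CO2 Scenario"])
    ]
    return round(mean(selected), 2)


average_lines = {
    name: scenario_average(records, name) for name in ("ambient", "medium", "high")
}
overall = scenario_average(records)
print(average_lines, overall)
```

Filtering first and averaging second is what makes the line move when a viewer narrows the view to a single scenario.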


2.29 Final bar chart. Each bar represents one pteropod. The sinking speed for each is an average of three sink tests, and measured in centimeters per second. The sinking experiments involved 23 pteropods from each CO2 scenario. The highest CO2, in red, had only 16 pteropods to work with due to losing some that didn’t survive the most acidic environment (1,200 ppm) for 3 weeks. This is just 1 week of tests from a database that is much larger (over 2 years), but the trends are the same. Viewers can hover over bars for more information on sinking speeds, weight, length, and other information on each pteropod.

STORY THOUGHTS FOR JOURNALISM STUDENTS

While this book targets anyone interested in working with data visualization for the first time, my hope is that journalism students will find it helpful in classes when they produce stories. Thus, I’ve included a few thoughts. The stories you could consider based on the above data include:

1. How Acidification of Oceans Impacts the Adorable “Sea Butterfly.” (See note 1.)

2. The Ocean’s Shell-based Animals at Risk—From Mussels and Oysters to Sea Butterflies. Pteropods are not the only sea animal facing shell or skeleton dissolution. You’ve seen how sea butterflies’ shells and sinking speeds are impacted by acidification. What about shells of other sea animals such as mussels, clams, scallops, oysters, abalone, or others? Contact marine scientists anywhere in the world who work with “calcifiers,” animals that use calcium carbonate to grow shells or skeletons. Interview them about their latest projects and what they are finding, and obtain the datasets they’ve built. Depending on what you find in your data, you will likely use visuals similar to those presented in this chapter. This “calcification” concept of growing strong shells and skeletons can be explained to non-science types by comparing it to osteoporosis—a human condition where bones weaken when calcium levels are too low. Again, for visualizations, use adorable images of sea butterflies or of people who farm or collect mussels as their livelihood. You can take a business angle—the mussel farmer, for instance—or the ocean food chain angle, highlighting the impact on fish and whales as pteropods, “the potato chips of the sea,” dwindle in numbers. Remember, they need to be strong and healthy to make it to ocean depths, and can dissolve before reaching the near bottom of the sea. Images should be easy to find. As for datasets, each scientist has many. It’s up to you to dig in through interviews and research, but the fundamentals of acidification in the first two chapters should give you a jumping off point. Find new angles. Follow a scientist at work. Dig into the data with the scientist and talk about how to present it for a public audience. Remember, maps are great when location data is available, and bar charts are the clearest form of data visualization and can be placed over an X axis timeline—or over categories/dimensions.

NOTES

1 Armstrong, J.L., et al. 2005. Distribution, size, and interannual, seasonal and diel food habits of northern Gulf of Alaska juvenile pink salmon, Oncorhynchus gorbuscha. Deep Sea Research, Part II, 52, 247–265. doi:10.1016/j.dsr2.2004.09.019

2 “Limacina helicina shell dissolution as an indicator of declining habitat suitability owing to ocean acidification in the California Current Ecosystem” by N. Bednaršek, R. A. Feely, J. C. P. Reum, B. Peterson, J. Menkel, S. R. Alin and B. Hales. Proceedings of the Royal Society B, Vol. 281, Issue 1785, 2014. https://royalsocietypublishing.org/doi/10.1098/rspb.2014.0123

3 www.noaa.gov/noaa-led-researchers-discover-ocean-acidity-dissolving-shells-tiny-snails-us-west-coast

4 Bergan, A.J., Lawson, G.L., Maas, A.E., and Wang, Z.A. 2017. The effect of elevated carbon dioxide on the sinking and swimming of the shelled pteropod Limacina retroversa. ICES Journal of Marine Science, 74, 1893–1905. doi:10.1093/icesjms/fsx008


3 Parasites and Armed Rebels

Mac Otten
Malaria and Its Vector Agent
The Mosquito as Vector
Rebel Eruption and Roaming Health-Care Facilities
The Project
Process
Trucks, Roads, and Rebels
The Data
Data Exploration
The Spreadsheets
Data Enhancement Using Code
Data Visualization
Medicines, Mosquito Nets, and Diagnostic Tests
Mosquito Nets
Project End
Story Thoughts for Journalism Students

“If you can’t fly then run, if you can’t run then walk, if you can’t walk then crawl, but whatever you do you have to keep moving forward.” Martin Luther King Jr.

MAC OTTEN

When Mac Otten, in Washington, D.C., receives a new monthly dataset from the Central African Republic (CAR), he sees much more than rows and columns of labels and numbers. He sees rural villages where moms look older than they should, holding babies dying of a preventable and treatable disease. He sees deaths going up, or down, and whether medical supplies are reaching villages, towns, and cities around the country, or not. Otten was tapped to manage the malaria eradication data reporting project in late 2013 by the International Federation of Red Cross and Red Crescent Societies (IFRC).


3.1 Children in the Central African Republic inside insecticide-treated mosquito net. Courtesy of: Benoit Matsha-Carpentier/IFRC.

As a pediatrician and epidemiologist—not to mention a data scientist in his own right—Otten spends all of his energy working to save babies and children from preventable death. This time his focus is on malaria. In the CAR, 70 percent of child deaths are down to malaria. The disease remains the number-one killer of adults and children in the country despite efforts in the past to turn it around. Among the many challenges in beating this preventable disease, Otten and his team must distribute myriad supplies to rural villages and cities— from malaria medications and testing equipment to the essential insecticide- treated mosquito nets that help families sleep through the night without becoming infected by a mosquito carrying the malaria parasite. Those nets are shown in Figures 3.1 and 3.2. The net is set up and ready for use in the first figure, and in the second, a mom holds her baby in one arm and a net in the other. For more than two decades, Otten’s worked in underdeveloped countries where his efforts are most needed. This project was sponsored by the Global Fund to Fight AIDS, Tuberculosis and Malaria. Otten’s role as the Monitoring and Evaluation (M&E) Director was to design and implement a data-driven strategy to eradicate malaria from the country. But even more important than the database technology was the training program to ensure that patient data was accurately reported from the field to the central health ministry and to Otten. Together with the IFRC and the Central African Republic health ministry, they designed a system and worked together to train staff in health centers


3.2 Mother and baby in the Central African Republic, mosquito bed net in hand. Courtesy of: Benoit Matsha-Carpentier/IFRC.

nationwide. Initially Otten spent two weeks there every few months to drive those efforts. Datapoints collected monthly from health-care facilities include new malaria cases, deaths, hospitalizations, stocks of medical supplies used and remaining, and other information. Malaria patients are tracked by age and associated with one health-care facility in their respective locale. Many datapoints raise red flags as Otten assesses the monthly data from across the country. For example, if 100 mosquito nets are dispatched by truck from the central warehouse in the capital, Bangui, to a health facility in the rural north, but data show that the facility received only 50 nets, Otten and his team investigate. Otten remembered a case where the number of malaria treatments used by one health-care facility was much higher than the reported malaria cases. He explained what had transpired:


“So one [health-care facility] was an HIV clinic that was asking for malaria drugs to give to every patient who was coming in for HIV. They [the HIV patients] are seen to make sure they’re OK every month and then they’re given their HIV drugs. Well they were giving them a dose of malaria treatment prophylactically [to prevent illness], and that’s not according to anybody’s guidelines—not the World Health Organization and not the [CAR] Ministry of Health.” In that case, the dataset was accurate—but the treatment process was wrong. In other instances, the data itself is inaccurate, even when the process is right. “A second example is that we saw a couple health facilities where—these were big health facilities—where there were the number of [malaria] cases reported was, say 200, and the number of malaria treatments was 1,000 ... So when we investigated that we found that what happened was that it was a big institution that was only reporting data of malaria cases from one of their wards,” said Otten. So, the hospital staff needed training. “You know malaria’s more common in children and you get them into the hospital more often because children with malaria have a higher rate of developing severe anemia,” Otten said. Only those cases were reported. But other wards, such as labor and delivery, adult ward, outpatient department, and others were not. Otten described the response:

“So you know that was a big process. We had to go there and we had to talk to the head of the health facility to get them to meet with all wards together so that all wards would report the number of malaria cases so that we’d have an accurate total from the entire facility, not just from a single ward.”

If you’ve ever worked on a large software implementation project, you’ll know that it takes time to get it right—whether your process is monitoring malaria eradication or manufacturing a jet engine. Sometimes it takes more than a year to get all staff on board and well trained. Once a system is installed, there are always bugs and additions and human errors.

MALARIA AND ITS VECTOR AGENT

While a malaria vaccine is under trial at the time of this writing, this project relied on distributing insecticide-treated mosquito bed nets and a drug known


3.3 This map shows the absolute number of deaths by country. Red dots are sized for number of deaths. Data from WHO. Deaths estimated, but not exact. Produced by Dianne M. Finch-Claydon using Tableau.

to clear the parasite from blood cells: artemether-lumefantrine, an artemisinin-based combination therapy referred to as “ACT” among global health types. The drug is recommended by the World Health Organization (WHO) as the first line of treatment for uncomplicated malaria. There are more severe forms of malaria that require other treatments. The rapid diagnostic tests (RDTs) to identify malaria are easily administered even in remote areas that don’t have laboratories readily available. The combination of nets and therapies and rapid tests changed the paradigm for eradication of malaria in developing countries—but only when those supplies are available, distributed successfully, and monitored using data. In fact, the disease was all but wiped out in the United States and most other developed countries in the mid-20th century, as the map in Figure 3.3 illustrates. Each red dot is sized for the number of malaria deaths in 2017. It’s quite easy to see where the disease is now concentrated in the world: sub-Saharan Africa. On the website, the interactive map rotates through the years from 2000–2017, and hovering on a dot produces a tooltip with more information on the country and number of deaths. Malaria deaths have dropped by more than 60 percent worldwide since 2000 via use of the nets and treatments, according to the United Nations, but the Central African Republic (CAR) and other underdeveloped sub-Saharan countries remained behind the curve. But even in the United States and other advanced countries, where malaria was eradicated long ago, monitoring and evaluation (M&E) is still essential in a world where microbes evolve and become resistant to drugs.


Outbreaks, even if relatively small, are often a sign of drug resistance or a future epidemic—or even pandemic. Those outbreaks raise big red flags in the field of public health, and M&E managers and public health experts watch for those flags. In fact, according to the Centers for Disease Control and Prevention (CDC), 63 outbreaks of malaria were reported in the United States between 1957 and 2015. Studies show that as the climate continues to warm, mosquitoes, ticks and other insects that normally live in warmer southern climates are finding their way north, some toting parasites with them. So it’s comforting to know that the WHO, CDC, and other organizations are watching for outbreaks, vaccine and drug resistance, and new microbes that take a liking to humans. Let’s hope they are well-funded and staffed.

THE MOSQUITO AS VECTOR

Malaria is caused by a parasite that is delivered into human cells when a female Anopheles mosquito bites. That human becomes infected as the pathogen replicates in the liver. Then, when another mosquito bites the newly infected human, the insect picks up the parasite, and carries it to more people—the so-called “vector” infection path. Mosquitoes may also pick up the parasite from a cow or other animal. Sadly for the citizens of the CAR, the most common malaria-carrying mosquitoes there prefer human blood to that of cows or other animals. The parasite they carry, Plasmodium falciparum (P. falciparum), causes nearly all of the malaria cases in CAR. Other Plasmodium species, including P. vivax, are more geographically widespread and can live in cooler climates. Malaria can appear as a flu-like disease. In some cases, symptoms don’t appear for days, weeks, or even months. At the severe end, the disease can cause convulsions, coma, and death.

3.4 Anopheles stephensi mosquito consuming blood from a human. Credit: CDC, Wikipedia: https://en.wikipedia.org/wiki/Anopheles#/media/File:Anopheles_stephensi.jpeg


Some mosquitoes, fleas, and ticks act as vectors as they distribute various pathogens to humans. Other vector-borne diseases include Lyme disease, West Nile virus, and dengue fever. There are many more. Beyond the mosquito bites, one of the primary forces behind the high number of malaria deaths in the Central African Republic is poverty. Without the Global Fund, IFRC, and other humanitarian agencies, medications and nets would not be widely available. But sadly, malaria was not the only scourge requiring focused attention by the Central African Republic government and its fledgling health-care facilities.

Rebel Eruption and Roaming Health-Care Facilities

Certainly, eradication of any disease from any country at any time represents a monstrous challenge to global health strategists, practitioners, and health staff on the ground. But soon after Otten got started, the country erupted into widespread violence in the wake of an armed rebel coup d’état. The president was ousted, and sectarian fighting between Christians and Muslims broke out across the country. The populations most vulnerable to malaria infections and deaths also became the most vulnerable to rebel violence, looted homes and villages, and health-care facility closures. At the time of this writing (5 years after the coup), the pillage continues, though there have been some more peaceful periods following negotiated agreements. The United Nations reported that nearly 700,000 citizens were displaced inside the country because of the violence, and over half a million fled to refugee camps in other countries. The population of the Central African Republic was over 4 million before the coup, and many think it has been halved since, though census numbers are questionable at this time. By 2017 the country had fallen into a wide-scale humanitarian crisis. People were suffering from lack of food, clean water, transportation, and basic infrastructure.
The education system came to a halt after teachers fled from villages, where rebels discovered that schools serve well as base camps. CAR is a mineral-rich country, and rebels took over some mines in the north and started selling rough diamonds. They also ramped up “road taxes,” by stealing supplies and cash from delivery trucks—including those with medical supplies and mosquito nets on the way to rural health facilities. And, of course, the exodus to safer countries left a paucity of essential medical services. As Otten observed:

“If you wiped out 50 percent of doctors [in the U.S.A.] tomorrow, health care would decline dramatically. So that’s what happened when the war started.”


Doctors could leave because they had resources, including access to transportation, unlike most villagers. Many teachers also left, and the government started training parents to teach children at home. Many health-care facilities closed or moved to safer territory. There are accounts of villagers walking for days to find health care, according to news reports. Hundreds of thousands fled on foot and crossed borders to Cameroon, Chad, South Sudan, and the Democratic Republic of Congo, while some refugees stayed in the country in makeshift camps in the bush. This led to a rise in snake bites and a shortage of anti-venom. Some remote villages lack any roads leading to them. In fact, many villagers have never seen cars or trucks. Red Cross workers walk to deliver goods in those situations. Cell phones are rare in the rural areas—though in peaceful times some villages may have a few cell phones that work if they are close enough to the scarce cell towers. Obviously, this was not the best time to start such a monumental project. But Otten, the IFRC, and the Central African Republic’s national health ministry continued working on the project, knowing that the mosquitoes would continue to bite—rebels or no rebels. Otten and his team had worked on global health projects in unstable conditions before, but not long after the project started there was hope for an election and a peaceful solution. Otten was well-known and respected in medical and epidemiological global health circles before he was tapped by IFRC for the M&E position. He had earned a reputation for heading M&E teams, including the one that wiped out polio in China in the 1990s, using a data-driven map strategy. His team analyzed 3,000 counties in China to identify polio cases and areas that were not reporting any cases to the central health ministry. Otten recalled:

“It wasn’t that there wasn’t any disease [polio] there. It’s that their surveillance wasn’t good. So we then used that to put effort into training and other resources, into contacting the folks at the health facilities in that district to get them to increase their surveillance.” Polio vaccination programs were implemented across China and data reporting became mainstream to prevent recurrence or to spot resistance to the vaccine. “I think there should be much more attention on that [data-driven M&E] for helping countries to build a culture of using their own data and managing the stockouts [shortages in medical supplies and drugs] and drawing conclusions about what’s happening to the cases, and whether the severe cases are going down or not—and if they’re not going down, why they aren’t going down,” said Otten.


That’s key to the success of the malaria project—accurate numbers that represent the disease numbers and trajectories each month. In his youth, Otten studied pediatrics and then spent a few years in practice treating children. He then became interested in public service, so he enrolled in a public health program at the University of Minnesota, followed by a move south. “I spent two years in Mississippi kind of doing public health for government and worked for the National Health Service Corps,” said Otten. The Service Corps hires health professionals like Otten to work in communities that lack adequate health services in the United States. Otten then found his way into global health where he was most needed—in developing countries. First he interned at the U.S. CDC’s Epidemic Intelligence Service, and remained at CDC for 20 years. Looking back to this period, Otten said:

“But I was actually loaned out at least half that time to the World Health Organization in China and Africa and then I retired from CDC and joined the WHO for 2 or 3 years, and then after 4 years I started to do the consultancy for the Red Cross in the data space.”

And somewhere in that mix he also worked for Save the Children and consulted for several non-governmental organizations (NGOs). Then came the CAR project.

THE PROJECT

To get started on the project, Otten first had to identify each health-care facility, called a “FOSA” in the CAR, that was still up and running—adding the name, location, and status of each to a growing database. That list of health-care facilities was crucial to the project since all malaria-related data is tied to a health facility serving a community. The geographic and administrative structure in the Central African Republic includes towns and villages, sub-prefectures, prefectures, and regions. The datasets are categorized by geographic units. The health-care facilities sit at the bottom of the geographic hierarchy. The country’s health ministry already had a list of health-care facilities, but the chaos rendered the list outdated. Some were still open. Some had closed. Some had moved. And some were mobile. “I wouldn’t say it’s easy [finding the mobile FOSAs], but we kind of find out about them because we communicate with the international NGOs,” he said, referring to non-governmental organizations such as Save the Children and Médecins Sans Frontières (MSF), among others.


Many NGOs became partners with the IFRC, which Otten said was helpful:

“They ask for supplies from us, you know, and we ask them to list the name of the health facility [in their purview], and if they know something new that we haven’t seen before then we get some more information and we put that on the list, and then we tag them as a special type of health facility.” For example, the partner NGO may tell them whether a new mobile FOSA was currently in a specific village. “Yes, some of them are fixed, they have been there since, you know, 10, 20 years. But then there are also some mobile ones. In an emergency setting you always have some mobile health facilities. Sometimes they’re associated with [refugee] camps, displaced persons camps, and they’re set up depending on where people are,” said Otten. A moving target—not the best situation for collecting data. As Otten said:

“When the warring parties and the population at risk starts to move, these kind of makeshift health facilities pop up. Some of them are closed down when the population goes home or goes someplace else, and then others pop up and we don’t have them [health facilities] in our dropdown lists.”

Those “dropdown lists” allowed Otten to select a health-care facility, to see all malaria data for that facility, and update it. If the data wasn’t received at the end of a month, he’d look into it. Did the facility close or move, or is the data late? The team planned to support 166 facilities for the project, but ended up working with at least 600 FOSAs by 2017, according to IFRC’s Geneva-based director, Jason Peat. Since children and babies were dying unnecessarily, time was of the essence, so the team continuously tried to work with FOSAs and prefectural health departments to streamline the monthly process. It’s not as though data collection was a new idea for the CAR government; it’s just that the system was not designed to eradicate a disease. “People at the ministry were used to looking at data [from FOSAs] by quarter—national level indicators,” said Otten. To make a dent in malaria cases, Otten and his team designed a system to monitor more datapoints more often.
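The month-end check described above (which facilities on the list have not reported, and which reports come from facilities not yet on the list) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the facility names, months, and function names are invented, and the project's actual tooling is not shown in the text.

```python
from collections import defaultdict

# Hypothetical facility list (the "dropdown list") and received monthly reports.
# All names and months here are invented for illustration.
facilities = ["FOSA A", "FOSA B", "FOSA C"]
expected_months = ["2016-01", "2016-02", "2016-03"]
reports = [
    ("FOSA A", "2016-01"), ("FOSA A", "2016-02"), ("FOSA A", "2016-03"),
    ("FOSA B", "2016-01"), ("FOSA B", "2016-03"),
    ("FOSA C", "2016-01"),
    ("FOSA D", "2016-03"),  # a pop-up facility that is not on the list yet
]

def missing_reports(facilities, reports, expected_months):
    """Return {facility: [months with no report]} for listed facilities with gaps."""
    received = defaultdict(set)
    for fosa, month in reports:
        received[fosa].add(month)
    gaps = {}
    for fosa in facilities:
        missing = [m for m in expected_months if m not in received[fosa]]
        if missing:
            gaps[fosa] = missing
    return gaps

def unlisted_facilities(facilities, reports):
    """Return facilities that sent a report but are absent from the list."""
    known = set(facilities)
    return sorted({fosa for fosa, _ in reports if fosa not in known})

gaps = missing_reports(facilities, reports, expected_months)
unlisted = unlisted_facilities(facilities, reports)

for fosa, months in gaps.items():
    print(f"{fosa}: no report for {', '.join(months)}")
print("Not on the list:", unlisted)
```

A gap in the output does not say why a report is missing. As the text notes, someone still has to find out whether the facility closed, moved, or is simply late.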

“So we’re shifting the entire paradigm from quarterly summary data to looking at it monthly,” said Otten. “The more


datapoints you have, the more effective your management is ... big data [in global health] has shown that to be true in every country in the world.”

The goals were noble, but the challenge to transform the system was formidable—an understatement of some magnitude.

PROCESS

I asked Otten how patient data was first documented before the project. “That’s the easy part. Patients come into them [FOSAs]. Every health-care facility has a register of sick patients. It’s a line item, no folders. If you have diarrhea, they write the [patient] name, village, diagnosis, and treatment,” Otten said, adding:

“And they count the patients, the numbers tested, and the numbers that tested positive for malaria. But it’s not typed into a computer at the health-care facility at all.” It was a paper trail—but data was collected. That’s a start. The Central African Republic’s health ministry, working with the Red Cross and Otten, provided local information and contacts at most health-care facilities in the country—but the lists were not up to date. The first phase of the project started in and around the largest city and capital of the CAR, Bangui. We’ll look at data from the first phase of the project in which bed nets, malaria medicines, and diagnostic tests were delivered directly to health facilities, called “FOSAs” in the datasets, or to their respective sub-prefectures for distribution to those facilities. But despite all of the support around the team, there was a shortage of trucks, which slowed delivery of essential supplies to rural villages and towns. And even when trucks were available and running well, there were other issues.

“Rebels and bandits attack [the trucks] in rural areas. Sometimes they set up roadblocks and take ‘taxes,’ ” said Otten. “They take bed nets and sell them, and then go on to the next city, to do the same.” Otten said that at times they can lose 30 to 50 percent of supplies for the FOSAs on account of the bandit “tax.”
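One way to make such losses visible in the data is to compare what left the warehouse with what a facility says it received, as in the earlier example of 100 nets dispatched and 50 received. The sketch below is a hedged illustration: the shipment records, the facility names, and the 10 percent tolerance are assumptions, not figures from the project.

```python
def loss_rate(dispatched, received):
    """Fraction of a shipment that never arrived (0.0 means nothing was lost)."""
    if dispatched <= 0:
        raise ValueError("dispatched must be a positive count")
    return (dispatched - received) / dispatched

# Hypothetical shipments: (destination, nets dispatched, nets received).
shipments = [
    ("FOSA North", 100, 50),   # the 100-versus-50 case described in the text
    ("FOSA East", 200, 200),
    ("FOSA West", 150, 120),
]

THRESHOLD = 0.10  # assumed tolerance before a shipment is flagged for follow-up

flagged = [
    (dest, loss_rate(sent, got))
    for dest, sent, got in shipments
    if loss_rate(sent, got) > THRESHOLD
]

for dest, rate in flagged:
    print(f"{dest}: {rate:.0%} of the shipment is unaccounted for")
```

A flag is only a prompt to investigate. The same discrepancy could come from a roadblock "tax," a counting error, or a late report.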


And drivers were at risk. Even with adequate transportation, there are few main roads in the country—limiting ways to get around rebel attacks. The country had been making great progress installing cell towers and internet infrastructure nationwide before the coup and subsequent civil war. To speed up the data collection process, Otten and team provided 81 phones to 81 sous-prefecture (sub-prefecture) supervisors in the country, allowing them to enter data into surveys using the telecommunications network. Otten explained:

“So we gave them [supervisors] phones, a smartphone, and a small monthly data contract for each of these supervisors. We asked them to collect the data each month from the health facilities in their district.” Those supervisors would visit the smaller health-care facilities (FOSAs) in towns and villages, or conduct surveys with them by phone—if the village had phones available. Either way, the data ends up on the mobile phone system and is then uploaded to cloud survey software called “Magpi,” which was developed by another pediatrician, Joel Selanikio, who also had roamed the world for the CDC and WHO to focus on children in need. Selanikio’s phone-based survey process has taken off in developing countries, and was used within the “Rollout of Rapid Mobile Phone-based (RAMP)” program—a combined technological and methodological process to collect and use data relatively quickly compared to 5 years earlier, when it was all done using paper-based surveys. “So I receive the data that comes in by phone to the Magpi web database so that I can go in and look at that data,” said Otten. But before he does that, the data goes through some vetting in Bangui. “First, our own staff and the ministry of health’s data manager in Bangui goes in and looks at the data and does data cleaning. After the data cleaning step, he shares it with me. Then I find some issues, and we go back and forth a little bit to get the data cleaner,” said Otten. The RAMP survey process, using Magpi as its survey platform, has been used in several countries for health and disease monitoring. Otten summed up progress:

“Just in the last 3 years it’s starting to work so there are a couple countries now—Zambia and Burundi—where their system is working. But in the rest, the really poor African countries, it’s not working at all. So they’re managing in health management with no data.”


Since my interviews with Otten and IFRC staffers, RAMP has been expanded beyond the malaria project; it is also used to manage and evaluate data for tuberculosis and HIV in the Central African Republic. “Now, fortunately, with the mobile technology, we have a chance actually to put the desktop computer aside and go for a new paradigm because we’ve had huge problems with the desktop paradigm where you need a data entry team in every district, and that just hasn’t worked very well,” said Otten. There were too many problems with internet lines and lack of training. And in Otten’s view:

“That’s why in a lot of these countries, the child mortality and the death rates still remain stubbornly high. And why there are stockouts [empty shelves]. There are not enough bed nets and not enough malaria treatments, plus child pneumonia is not getting treated effectively.”

TRUCKS, ROADS, AND REBELS

There’s nothing more important than getting the bed nets and medical supplies to the health-care facilities, but trucks are hard to come by in the CAR—the problem exacerbated by the war. “It’s not safe enough there. People are not willing to put their trucks on the road. They are keeping them, hiding them from looters,” said Freddy Munyaburanga, a private logistics entrepreneur from Rwanda hired to help resolve transportation issues for the malaria project in the Central African Republic (Figure 3.5). Munyaburanga had directed transportation logistics for humanitarian projects in Rwanda and elsewhere—some sponsored by the Clinton Fund and the Global Fund. “Before distribution I would send people to the field to run a status village by village. They know the number of people who live there and everything, so they would give me the raw data and I read the data and came up with distribution plans and other stuff,” said Munyaburanga (Figure 3.6). But the lack of cell phones in rural areas worried him—from a security perspective. He said:

“Truck drivers sometimes have to leave their trucks, filled with supplies, to go to find help when there is no cell phone service. And that is common,” Munyaburanga said, adding “And there is no security. You basically have to pay for the people to guard the truck. It’s very costly.”


3.5 Freddy Munyaburanga with Red Cross colleagues. Courtesy of: Freddy Munyaburanga.

3.6 IFRC staff and volunteers loading one of the large trucks for delivery of nets and medications. Courtesy of: Freddy Munyaburanga.

To make matters worse, the roads to rural areas are often too muddy to use, or in some cases have collapsed into rivers. “And you have bridges which are broken. In some cases you have to cross the bridge, but there is no way. We just had to make wooden bridges to allow a truck to cross to the other side. Or sometimes you have to turn around and go 300 miles to get to the other side,” said Munyaburanga.


3.7 Borrowed truck with bales of nets for delivery. When large delivery trucks break down, volunteers sometimes help out with deliveries using their own vehicles, including small trucks, cars, and even motorcycles. Courtesy of: Freddy Munyaburanga.

Some drivers have even resorted to building makeshift rafts to float supplies to the other side of a river when bridges were out. “We would try to arrange for another truck to come to the other side of the bridge and do the offloading and then loading on the other side.” But trucks weren’t always available on the other side. So they downgraded to two-wheelers or recruited volunteers to use their own vehicles (Figure 3.7). “Other times we would hire motorcycles. But the maximum they can take is four bales of nets. Imagine the logistics around offloading 300 bales to motorcycles where one can take just four bales!” said Munyaburanga (see Figure 3.8). There are more motorcycles than cars and trucks in the countryside, so that made sense. See Freddy Munyaburanga’s amateur video showing what happens when a road needs attention along the route. Fortunately, Munyaburanga was not attacked during his tenure on the project, but he recalled an incident in which one of his drivers encountered bandits.

“You know everyone is armed—it’s really a mix of the population and the militias. The militias live within the population in rural areas.”

Munyaburanga described a driver who got stuck. “It was during heavy rain and the truck was approaching the bridge, and of course they couldn’t cross the bridge because it was not functional. So they tried


3.8 Bale of 50 nets on the right and one net on the left. Courtesy of: Freddy Munyaburanga.

to turn around, but there was so much mud, so much mud. Then the truck was stuck in the mud,” he said. Munyaburanga said that the driver’s conductor, lacking a cell phone, walked away to find help, leaving the driver with the truck. Soon enough bandits with guns arrived, threatening to kill him. “So he ran away and hid [in the bush] and the truck was left by itself. I think they forced the padlock and opened the container and they just took everything from it,” he said, adding that they took about 500 bales of mosquito nets. When the colleague returned with law enforcement, the bandits were gone. Munyaburanga went on to say that rebels often stop trucks on their way to villages. “They ask for cash or goods—a ‘roadway tax.’ … They are not always violent, fortunately, particularly when they get what they want.” At times, to deliver salaries to health-facility staff, he piloted a small plane. People had to be paid in cash in rural areas because there aren’t any banks. Transportation logistics are not for the faint of heart.

THE DATA

While Otten’s work in D.C. uses all the latest technologies, he’s learned to embrace older systems in developing countries if necessary:

“You take what you know about data science, then you look at what’s going on in the field, and then you use the things that work, and try not to push things at a higher level that you’re familiar with in the north on the people on the ground.”

One of my first questions was whether he used data visualizations. I was thinking about web-based visualizations, which was quite naive. The answer was “no.”


“So you’re gonna use some of the technologies from 10 years ago or 15 years ago and that’s fine because they’re five times better than what they’re using, so that’s a key point that I have noticed and it works fine,” he added. Of course he visualizes the data for the team, but the charts are printed on monthly bulletins. Otten explained:

“You know having the bulletin on paper and distributing it in PDF on email versus having it on the web—that’s not a big deal. We have a lot to do just to get everybody caught up using the data on a PDF bulletin.”

Using the data to drive strategies, that’s important—and getting “buy in” from managers across the board is difficult and takes time. As with many data projects, if managers don’t use the data in decision making, it is a waste of time.

Data Exploration

When you explore data, including data from governments, universities, and other entities, you’ll often find a “schema” that describes the database “field names,” otherwise known as “dimensions” or “variables” or even “column labels” if using a spreadsheet. Below I’ve listed a few of the dimensions we’ll be using in our visuals—so that you can see how they are defined in the schema for the larger database. Field labels, in computer programs and database schema, are named by the programmer and defined by the type of data. For example, the data field for malaria medication is ACT, and the field type is “numeric,” because the field will contain counts of those medications. Our dataset was created mainly from a list of questions, a survey, filled out by individual health-care facilities every month. So if the question asked for the name of the FOSA’s supervisor, then the field would be of type “text” or “string.” At times, a field might be defined as “Boolean” if the answer has only two possible choices, like a 0 or a 1. For example, one could use a Boolean to store a 0 if a health-care facility is closed and a 1 if it is open. For this database, the schema included every survey question that health-care facilities are expected to answer at the end of every month. Remember, they enter their answers into a mobile phone—or they ask their sub-prefectural health administrator to enter it for them before uploading to the central health ministry. Before, the surveys were mainly on paper and delivered from the field up to the central health ministry. I’ve listed only a few of the field labels and associated survey questions from the database schema below.


When Otten exports sets of data from the larger database to Excel spreadsheets, those field labels become column labels that contain the raw data from the surveys—every row attached to a region, prefecture, sub-prefecture, and health-care facility by month.

5. Qui (nom) est le superviseur de la sous-préfecture pour cette FOSA? (Who is the sub-prefectural supervisor for this FOSA?)
(Asks for sub-prefectural supervisor’s name)
Data Type = Text
Data Field Name: superviseur_nom

60. Nombre de TDR utilisés ce mois (y compris les TDR perdus, détruits, utilisés par erreur, tests répétés, etc.) (Number of TDRs used this month, including TDRs lost, destroyed, used in error, repeated tests, etc.)
(Asks how many tests were used for the month at the FOSA)
Data Type = Numeric
Data Field Name: tdr_utilises

61. Nombre de TDR restants à la fin du mois (How many TDRs were left at the end of the month?)
(Asks how many diagnostic tests are still in stock)
Data Type = Numeric
Data Field Name: tdr_restants (this stores the number of diagnostic tests remaining in a health-care facility for a specific month).
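To make the idea of field types concrete, here is a minimal Python sketch of a schema and a check that one survey record matches it. The three field names come from the schema entries above; the Boolean field `fosa_ouverte` (facility open or closed) and the `validate` function are hypothetical additions for illustration, not part of the project's database.

```python
# Field names for the first three entries come from the schema excerpt above.
# "fosa_ouverte" is a hypothetical Boolean field (True if the facility is open).
SCHEMA = {
    "superviseur_nom": str,   # Text
    "tdr_utilises": int,      # Numeric: diagnostic tests used this month
    "tdr_restants": int,      # Numeric: diagnostic tests left in stock
    "fosa_ouverte": bool,     # Boolean (assumed field, for illustration)
}

def validate(record):
    """Return a list of problems found in one survey record (empty if none)."""
    problems = []
    for field, expected in SCHEMA.items():
        if field not in record:
            problems.append(f"{field}: missing")
            continue
        value = record[field]
        # In Python, bool is a subclass of int, so reject it for numeric fields.
        if expected is int and isinstance(value, bool):
            problems.append(f"{field}: expected int, got bool")
        elif not isinstance(value, expected):
            problems.append(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
        elif expected is int and value < 0:
            problems.append(f"{field}: count cannot be negative")
    return problems

good = {"superviseur_nom": "A. Example", "tdr_utilises": 120,
        "tdr_restants": 30, "fosa_ouverte": True}
bad = {"superviseur_nom": "A. Example", "tdr_utilises": "120",
       "tdr_restants": -5}

print(validate(good))
print(validate(bad))
```

Checks like these catch typing slips at entry time. They cannot catch a plausible but wrong number, which is why the back-and-forth data cleaning Otten describes still matters.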

The spreadsheet has over 80 columns and originally had more than 4,000 rows, but I had to cut back on data so that it could be small enough to use for training purposes. So let’s explore our spreadsheet and the dimensions we have chosen for our visuals including the medications (ACTs), diagnostic tests (TDRs), nets (Milda), malaria cases, and the associated health-care facilities (FOSAs).

The Spreadsheets

Figures 3.9 and 3.10 show portions of our spreadsheet, and I’ve added a row (in red) with English translations of each dimension or column label. Notice that the medications (ACTs) are separated by the weight of the patient in columns R through Y. Babies of course would not receive the same dose as a larger child or adult, so the numbers are divided into four categories. Columns Z and AA store the diagnostic tests—including how many were used for the month, and how many are remaining in stock. The number of mosquito nets delivered and remaining in stock are stored in columns AD and AE, and total malaria cases are stored in AH. All data in this specific spreadsheet is stored by region, prefecture, sub-prefecture, and FOSA (health-care facility) by the month of data collection. You’ll see in the visuals that not all FOSAs reported data for all months. This project was in progress when this data was created, so all of our visuals represent a snapshot in time—when Otten and managers were using the data to plan next steps.
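Because every row is tied to a sub-prefecture, a FOSA, and a month, a common first step is to roll the rows up by sub-prefecture and month. The sketch below uses only Python's standard library and a tiny invented extract. The column names (`act_band1` through `act_band4` for the four weight categories, `cas_palu` for malaria cases) and all values are placeholders, not the real French labels or data from the spreadsheet.

```python
import csv
import io
from collections import defaultdict

# Invented extract for illustration; the real spreadsheet has 80+ columns.
SAMPLE = """sous_prefecture,fosa,mois,act_band1,act_band2,act_band3,act_band4,cas_palu
SP-1,FOSA A,2016-01,10,12,8,20,55
SP-1,FOSA B,2016-01,5,7,3,9,30
SP-1,FOSA A,2016-02,11,9,6,18,48
SP-2,FOSA C,2016-01,4,6,2,10,25
"""

# Hypothetical names for the four weight-category ACT columns.
ACT_COLUMNS = ["act_band1", "act_band2", "act_band3", "act_band4"]

# (sub-prefecture, month) -> running totals and the set of FOSAs that reported.
totals = defaultdict(lambda: {"act": 0, "cases": 0, "fosas": set()})

for row in csv.DictReader(io.StringIO(SAMPLE)):
    key = (row["sous_prefecture"], row["mois"])
    totals[key]["act"] += sum(int(row[col]) for col in ACT_COLUMNS)
    totals[key]["cases"] += int(row["cas_palu"])
    totals[key]["fosas"].add(row["fosa"])

for (sub_prefecture, month), t in sorted(totals.items()):
    print(f"{sub_prefecture} {month}: {t['act']} ACT doses, "
          f"{t['cases']} cases, {len(t['fosas'])} FOSA(s) reporting")
```

Keeping the count of reporting FOSAs beside each total matters here: since not every facility reports every month, a drop in cases may reflect a missing report as much as a real decline.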


3.9 This screenshot shows the medication dimensions (ACTs). I’ve added English in red. On the far right there are also two dimensions for diagnostic tests (TDRs).

Otten said the project started in the south where it was safest, and gradually moved to all areas of the country.

Before moving to the visualizations, we need to understand the geographical and administrative units in the CAR. This is an essential part of our data backstory—since the data itself is circumscribed within this hierarchy of administrative units. There are seven regions in CAR. Bangui is the largest city and houses the national government, including the health ministry. The seven regions include Plateau, Equateur, Yade, Kagas, Fertit, Haut-Oubangui, and Bangui. Within those regions are 14 prefectures and 71 sub-prefectures. We focus mostly on sub-prefectures and health-care facilities within them since our datasets are reported from the ground up. The visualizations, unlike in other chapters, target project managers—or those who monitor malaria cases in every corner of the country on a monthly basis.

Take a look at Figure 3.11—the front page of the monthly bulletin that Otten creates from his analysis and sends by email or snail mail to all managers and stakeholders, including the project funder. All charts shown on the PDF reflect aggregated numbers at the national level—obviously of importance to the central health ministry. But the database of course contains detailed data for every region, prefecture, sub-prefecture, and FOSA. The box at the top center reads “stock actuel d’intrants essentiels,” which roughly translates to “essential supplies currently in stock.” It’s front and center for a reason. Nets or “milda” are essential to preventing new malaria cases and ACTs (medications) are needed to treat those who have the disease. And without TDRs (diagnostic tests), doctors wouldn’t know who needed the drugs. So all of these, at minimum, must be well stocked and properly used.

While all of the data outlined in the report is important to the project, Otten assessed one of the datapoints realistically. He referred to the TDR chart, table 5 on the front page of the report, which shows the rate of positive malaria cases for the month. He of course wants to see it going down every month. In French it reads “Taux de positivité des TDR.” From December to March the rate of positivity was between 62 and 68 percent. By April it was 52 percent. Otten noted:

“Now if, in the next 3 to 4 months, if it is still in the 50s, then that will be a positive sign. But before April there was no evidence that we had a widespread nationwide impact on Malaria.”

He added that, at that time, the mosquito nets had not yet been distributed to every prefecture. “You see the 52, but I’m a bit suspicious of one number for one month,” Otten said. The sample below is from an earlier month, so it shows 68, not 52. Knowing when to suspect a datapoint is an essential skill in data science. “Some data is not perfect,” Otten added.

Data Enhancement Using Code

I didn’t see the need to clean the data from Otten, though I did remove several rows of data simply to reduce the size of the spreadsheet for training purposes.

3.10 This screenshot moves farther to the right in the dataset and shows dimensions for nets (Milda) and malaria cases. English translations have been added in red.


3.11 Sample of a monthly report produced by Otten and colleagues in Bangui for managers in Bangui, Global Fund, and the IFRC in Geneva. Courtesy of: Otten/IFRC.

I also added some English column labels for training purposes, but the French labels are uploaded to Tableau. One can rename any field label in Tableau for the visuals. But there was one tiny datapoint missing that I felt would enhance our visualizations—and provide a teaching moment! Notice the legend in Figure 3.12 at the top right. It is a black box with a list of numbers and associated colors. The dots on the map are colored by CAR’s seven regions—the highest administrative level in the country. The dots are sized by the number of malaria cases.


The health ministry in the Central African Republic of course knows the region numbers, but other stakeholders who want to see the status of the project may not. I found it unclear myself, so decided to add the region names, though they were not included in the dataset. So this small issue offered the chance to introduce some simple coding—enough for you to understand how logical decisions work in code. By creating a temporary variable (or “field label” or “dimension”), I can use code to assign the region name to every row of data used in the visualization. But that region name will not be saved to the database—it will simply be created and stored in memory when the visualization is live.

Notice the white box hovering over the map in Figure 3.12. There, you’ll see the details of a “CASE statement.” If you search online for CASE statements, you’ll find that they exist in most programming languages, and are similar to the “IF-THEN-ELSE” statements you’ve likely heard about. The code, if you look closely, is almost self-explanatory. Basically it is telling the visualization software to look at the region number for each row of data, and if it equals “1” then set the new dimension, “RegionName,” to “Plateau.” If “2”, then “Equateur.” You can fill in the rest. “Null” is used for any row that doesn’t include a region number. Then every time you want to see the region in a visual, you can use the name in the view, and you can also use it to filter rows of data by region. It acts like any other dimension or field label. In geek-speak, we’d say that this new variable is “interpreted” from the raw data and doesn’t exist until we run the visuals—it hangs out in memory while the visualization is up and running. See the website for how to create a new variable with a CASE statement, and then use it in the visual.

3.12 The name of the region has been added using a CASE statement in Tableau. Produced by Dianne M. Finch-Claydon using Tableau.

When one needs to add new data to the dataset, such as the latest month’s malaria cases or nets distributed, it must always be done through the
database—not from visualization software packages like Tableau. Otherwise, project management becomes a nightmare. Who entered what from where? Was it entered more than once? The visualization software should be used only for visualization and exploration, but you can create as many temporary variables as needed.

I introduced this idea because as you get more into visualizations, you’ll often find that you need dimensions that don’t exist in the database. For example, maybe you want to visualize the crime rates by county on a map, but your data doesn’t include those rates. It does contain dimensions for the number of crimes and the population of the county. So you create your own dimension. Let’s call it “crimeRate.” Your new dimension, “crimeRate,” would be created by dividing the number of crimes by the county’s population: crimeRate = number of crimes / population. If you want to use it as a percentage, then multiply the result by 100. For strong database explorations and for visualizations, learning some basic coding is important, if not essential. But a review of basic math and statistics is certainly needed. See web resources for references to basic statistics and math used often in data reporting and visualization.

DATA VISUALIZATION

The visualizations presented below are not designed for a general audience. Instead, they are meant for monitoring a project from a few angles. All data used is a subset of the entire project. Let’s start with a map of malaria cases for context for the entire country of the Central African Republic. Keep in mind that the data you see on the map represents only what was reported by the health-care facilities at a specific point in time. We will see the reporting months in the detailed data views, but the year isn’t necessary for training purposes. In Figure 3.13 circle size represents the number of malaria cases for a sub-prefecture that reported cases. The sub-prefectures are encoded to colors, as the legend shows on the bottom right.
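For readers who want to see the two derived dimensions from the “Data Enhancement Using Code” section outside of Tableau, here is a minimal Python sketch. It is not the Tableau calculation shown in Figure 3.12. Only region numbers 1 (Plateau) and 2 (Equateur) are confirmed in the text; the numbers for the other five regions are an assumption based on the order in which the regions are listed in this chapter. The function names and sample rows are hypothetical.

```python
# Region numbers 1 and 2 come from the chapter text. Numbers 3 through 7 are
# ASSUMED to follow the order in which the chapter lists the regions.
REGION_NAMES = {
    1: "Plateau",
    2: "Equateur",
    3: "Yade",
    4: "Kagas",
    5: "Fertit",
    6: "Haut-Oubangui",
    7: "Bangui",
}


def region_name(region_number):
    """Mimic the CASE statement: return the region name, or None
    (the equivalent of Null) when the number is missing or unknown."""
    return REGION_NAMES.get(region_number)


def crime_rate(crimes, population, as_percentage=False):
    """crimeRate = number of crimes / population, optionally times 100."""
    if not population:
        return None  # missing or zero population: no rate can be computed
    rate = crimes / population
    return rate * 100 if as_percentage else rate


# Hypothetical rows standing in for spreadsheet records.
rows = [
    {"fosa": "FOSA A", "region": 1},
    {"fosa": "FOSA B", "region": 2},
    {"fosa": "FOSA C", "region": None},
]

# The derived dimension lives only in memory; the source rows are untouched.
region_names = [region_name(row["region"]) for row in rows]
```

As with the temporary variable in the visualization software, nothing is written back to the database: the names and rates are recomputed from the raw values each time the code runs.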
The figure is a dashboard—a combination of charts created individually. In the dashboard, they interact with each other. The line chart below the map provides details for each health-care facility within a selected sub-prefecture, and that selection is done by clicking on a map circle. When a circle is selected, the line chart below it reacts by displaying the cases for each month for each health-care facility. You can see the highs and lows for each reported month. Because the more detailed data includes the month of the malaria diagnosis, a line chart works well—using the month as a timeline on the X axis (Figure 3.14). The line colors represent the sub-prefectures to match the map color profile. Remember that for this snapshot in time, the data collection training was ongoing, so one must question the numbers for accuracy, which is a significant aspect of Otten’s job.


Otten’s database includes the names and contact information for every supervisor at the health-care facility and its sub-prefecture, so he can pick up the phone to question any oddities, and determine whether training or other assistance is needed for more accurate data.

3.13 Malaria cases dashboard by sub-prefecture. Circle size represents the number of malaria cases for a sub-prefecture. Sub-prefectures are encoded to color, and the lines at bottom show malaria cases for individual health-care facilities. Produced by Dianne M. Finch-Claydon using Tableau.

3.14 Selection view of 3.13 showing data for one sub-prefecture and its health-care facilities. Produced by Dianne M. Finch-Claydon using Tableau.

MEDICINES, MOSQUITO NETS, AND DIAGNOSTIC TESTS

Moving on, we’ll take a look at the status of medications (ACTs), nets, and diagnostic tests.

3.15 A dashboard containing a regional map as the top view. Click on a region and the two charts at bottom show medication gaps by prefecture, sub-prefecture, and health-care facility. Produced by Dianne M. Finch-Claydon using Tableau.

The dashboard in Figure 3.15 uses three charts reflecting medication gaps at three levels. Gaps are determined by subtracting the number of malaria cases from the ACTs used for patients for a specified month. When the number is negative, there were more patient cases than treatments administered—a red flag worth a follow-up. Was the data entered incorrectly? Was the facility out of ACTs? Were the patient cases lower than the data show? The two bottom graphs break down the ACT medication gaps by prefecture (on the left) and by sub-prefecture and health-care facility (on the right). So managers can monitor the quality of the data or the treatment process at any level—from regions to facilities on the ground. See the website for using the interactive filtering from top level to detail levels for regions, prefectures, sub-prefectures and health-care facilities.
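The medication-gap check described above can be sketched in a few lines of Python. This is an illustration only, assuming the gap is computed as ACTs used minus malaria cases, so that a negative value means more cases than treatments, as the text describes. The field names (`acts_used`, `malaria_cases`) and the sample records are hypothetical and do not come from Otten’s spreadsheet.

```python
def act_gap(acts_used, malaria_cases):
    """Medication gap for one facility and month.

    Assumed definition: treatments dispensed minus reported cases.
    A negative result means more cases than treatments.
    """
    return acts_used - malaria_cases


def flag_negative_gaps(records):
    """Return copies of the records whose gap is negative, with the gap attached."""
    flagged = []
    for record in records:
        gap = act_gap(record["acts_used"], record["malaria_cases"])
        if gap < 0:
            flagged.append({**record, "gap": gap})
    return flagged


# Hypothetical monthly reports from three health-care facilities.
records = [
    {"fosa": "FOSA A", "month": "March", "acts_used": 120, "malaria_cases": 150},
    {"fosa": "FOSA B", "month": "March", "acts_used": 90, "malaria_cases": 90},
    {"fosa": "FOSA C", "month": "March", "acts_used": 75, "malaria_cases": 60},
]

for item in flag_negative_gaps(records):
    print(item["fosa"], item["month"], item["gap"])
```

A flagged facility is a prompt for a phone call, not a verdict: the same negative number could come from a data-entry error, a stock-out, or overstated case counts.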


3.16 Dashboard with two graphs and a photo. Data is filtered by sub-prefecture. Blue bars show delivered nets and orange bars reflect stocks available for distribution.

Mosquito Nets

Logistics managers keep a watchful eye on net deliveries from the project start to the end. The dashboard in Figure 3.16 clearly shows the number of nets delivered (blue bars) and the number of nets in stock (orange). In the map, sub-prefectures are color coded, and bubble sizes represent the number of nets delivered in the community. The bar below can be filtered down to a sub-prefecture and the net stock levels as well as nets used (delivered). Orange bars show stock levels, and stacked bars reveal numbers for one health-care facility (FOSA) at a time. Blue bars show nets in use (delivered). The data in the map shows that one sub-prefecture in particular, Begoua, has used a substantial number of nets—over 70,000. The project was implemented in steps, so the outlier may make sense, but a manager monitoring the project would ask why if the answer wasn’t obvious. A sample of a selected sub-prefecture is shown in Figure 3.17. More information is provided when one hovers over a bar, including month of delivery and the name of health-care facilities.

As you’ll see when you visit the website, dashboards comprise charts and other types of media—from photos to videos and live web pages. But if a topic has multiple data angles, a “story” can be pulled together. If one is using Tableau, a story feature allows one to incorporate multiple dashboards into one slide-show-like story—adding titles or text above each dashboard, as shown in Figure 3.18.


3.17 View after user selects sub-prefecture Mbaiki from the map. The blue bars, stacked, show the number of nets in use by health-care facilities within the sub-prefecture—each box within one bar representing one health-care facility. The orange bars show existing stocks of nets for each facility.

The same could be done within a web page, where dashboards are surrounded by narrative. There are many ways to tell a story online. The malaria eradication project involves multivariate analysis—where project managers must constantly monitor the status of the project from local to national levels, by location and healthcare facility, and most complicated—by monitoring every delivery of medication, nets and other critical supplies based on need. The storyboard shown in Figure 3.18 uses Tableau’s story feature. The interactive visual includes several dashboards inside one “storyboard,” allowing the user to move from one dashboard topic to the next. Notice the clickable tabs at the top of the image—currently showing the net deliveries and stocks by sub-prefecture and healthcare facility. That dashboard includes an image with text below it, a map at the top showing net deliveries for each sub-prefecture encoded by color, total net deliveries encoded by circle size, and a detailed breakdown of net deliveries to healthcare facilities within a selected sub-prefecture. The map drives the chart below it. Project managers would move from one tab to the next to understand the status of each major topic—digging into details where needed. Keep in mind that links to documents can also be included in these dashboards. And when new data is added to the database behind the visualizations, those visuals can be automatically updated in Tableau. For the malaria project, in reality, it did not make sense to rely on web-based reports since so many stakeholders were without consistent internet access. In the Central African Republic, internet access was growing nationwide before the violence ensued.


3.18 Story feature starting with nets. Other tabs could illustrate medication gaps by location, deaths, births, treatments or other elements of the data-driven project.

In news or public relations, it might be appropriate to use a storyboard like this for a political election—each dashboard showing poll numbers by various dimensions, such as gender, age, state, city, etc. One could move through each dashboard—where text and images or even videos are used to add context to the numbers.

Project End

According to the IFRC’s Jason Peat, who directed the CAR project, the team had delivered about 2.8 million insecticide-treated mosquito nets by the end of 2017, providing coverage for over 5 million children as they sleep at night when mosquitos are most active. Overall, about 4.1 million treatments (ACTs) were dispensed to malaria patients nationwide. Unfortunately, the central supply warehouse in Bangui was robbed in 2017, incurring a loss of over $100,000 worth of medical supplies, including ACTs and nets. The IFRC took over the security of the facility after the incident, but around the same time the violence was increasing. Several NGOs decided to leave CAR after staff members, including volunteers, were killed by rebels. According to Jason Peat, victims included staff from the Red Cross. United Nations Peacekeepers were also killed.

Despite the violence and chaos, the interviews with Mac Otten and Jason Peat left me inspired. Many of those working for the IFRC and other NGOs were volunteers risking their own lives to save others, and I’ve no doubt that they’ll continue to do so. I asked Otten in the midst of the chaos if he ever felt like giving up. He replied:

“Well, it’s just frustrations, you know. I don’t think I’ve ever had a day where I thought of saying ‘let’s give up.’”

STORY THOUGHTS FOR JOURNALISM STUDENTS

While this book targets anyone interested in working with data visualization for the first time, my hope is that journalism students will find it helpful in classes when they produce stories. Thus, I’ve included a few thoughts. This chapter used the datasets from a managerial—not a journalism—point of view. I envision that the public relations officers might use the datasets while the project is in progress to report on successful milestones. For journalism students, it’s best to acquire the final datasets at project completion. That said, when the project is live, events can drive news, such as when rebels attack and steal supplies, or when volunteers are injured or killed, when innocent people are driven from their villages, when health-care facilities are raided or shut down, and so much more.

As for a data-driven story, students can look at the grants and the costs of the project, found on the funder’s website once the project is completed. The grants are published before the project starts. That is an area to dig into. Was the project as expensive or more expensive than previous malaria eradication projects in the same country? Was it more or less successful, and why? For this you’d have to include references to the war since running a project in such chaos will inevitably face more obstacles. Important to note is that reports are done by grantees, and sometimes the grantor grades the project based on the initial goals and outcomes. Take a look at the Global Fund website, and others, for story ideas in that realm. You will find reports on this very project, and the problems related to thefts.

A project like this one screams for maps. Maps of deliveries, thefts, and populations before and after the war (the war is still going on at the time of this writing). Of course bar charts and timelines make sense for projects such as this.
Show a monthly timeline of malaria cases where those cases are encoded on bars or lines along the X axis timeline. Maps, remember, can drive the details of other charts, including tables, bars, lines by location. One other idea: look for other projects worldwide that use data-driven monitoring and evaluation—which has become an aspiration for project managers around the world.


4 Open City, Open Data

From Dublin to Silicon Valley
Interview
City Hall
Stereotypes Are Just Stereotypes
Public Data Acquisition
Open Data
Data Exploration and Cleaning
Data Visualization
Responsive Design
Trees, Trees, and More Trees
Cleaning Trees
Palo Alto Open Data Portal
Story Thoughts for Journalism Students

“The liberties of a people never were, nor ever will be, secure, when the transactions of their rulers may be concealed from them.” Patrick Henry

FROM DUBLIN TO SILICON VALLEY

In 1980s Dublin, Ireland, a young lad watched his older brother type commands like PEEK, POKE, GOSUB, and ‘FOR x’ into a keyboard attached to a black screen with green characters. “So cool,” thought Jon Reichental, who was not yet 10 and in awe of his brother. At the time, the legacy “green-on-black” screen, a sample of which is shown in Figure 4.2, was beginning to be phased out, making way for multicolor monitors. And his brother had state-of-the-art equipment. “I used to look over his shoulder and watch what he was doing. I remember what he brought home first—it was a Vic 20 and then a Commodore 64 [see Figure 4.3],” said Reichental, who is currently the chief information officer (CIO) of Palo Alto, California’s City Hall. At the risk of revealing my age, I used those screens to write programs in Basic and FORTRAN in the early 1980s.


4.1 Palo Alto’s Open Government Data Portal. Each box leads to datasets and interactive visualizations on one or more aspects of the primary areas printed on the boxes above.

4.2 Green-on-black screen like the one Reichental remembers in the early 1980s. Photo by user: Gortu. Public domain: https://commons.wikimedia.org/w/index.php?curid=342925

Reichental was fascinated not only by the mysterious computer commands, but by the fact that his brother told him he could create games on it. He said:

“I remember writing programs in the early 1980s using green-on-black monitors—and the painstakingly slow process of placing text on the television-like monitor. We’d ‘POKE’ values to specific locations in memory, and ‘PEEK’ to retrieve them. ‘PRINT’ would display them on the low-resolution screen at specific rows and columns.”


4.3 Commodore 64 (C-64). Photo by Federigo Federighi. Wikimedia, CC BY-SA. https://creativecommons.org/licenses/by-sa/4.0

Back then, everything looked blocky. I remember writing similar programs using rows and columns to direct text to those green screens. “As for resolution, we were excited when displays allowed for more than 200 pixels across the screen,” said Reichental.

“By the age of 10, I was able to spin up a couple of lines of code and I could make a program work, and one of my brother’s friends—he was quite young, in his late teens—he had a little company and paid me to write a piece of software. So, at the ripe old age of 10 I wrote my first piece of software!” “And I was hooked,” he added.

And the same young entrepreneur then asked him to develop an educational game. “There was a very popular toy at the time called Speak and Spell,” he said. “It was considered state of the art, but when I look back at it now it was very basic. It would say the word ‘boat’ and you would type boat into the keyboard and if you did it right it would say ‘you are correct.’ And it would keep score. So it was a very basic sort of electronic game.”

Reichental produced a similar game and was paid $200—a fortune for a 10-year-old in those days. With a wide smile, he said:

“If I had been in Silicon Valley today and done that, I probably could have gotten $10 million dollars!” Reichental, not surprisingly, studied computer technology at the Dublin Institute of Technology, and joined a technology start-up in Dublin after graduation. With each step in his career as a young lad, he reveled in working with the latest technologies, but also in how those technologies impacted humans, culture, and society. And that combination of passions would surface in his career at various points, particularly when he made his transition to city government decades later.


While working at the start-up in Dublin, he found that he was adept at picking up new technologies, and wanted to be where innovation was rampant. He then heard about a program that would do just that. “There was an opportunity to come to America, a visa program that Irish people could apply for that would give you a green card and then you could apply for citizenship,” he said. “There was no cost to this, so why not just throw my name into the hat? And that’s all you really had to do, give your name and address, and that was it—and you were randomly chosen.” And 3 years after applying, he was randomly chosen.

“I felt privileged. I know lots of people in the world who have gone through hell to get to America. I just had to send a letter.” So off to the American embassy he went, and soon after that he found himself in Florida. Reichental spent about 15 years at PricewaterhouseCoopers (PWC) where he led teams on emerging technologies—from predictive marketing to governance as well as IT innovation and data-driven demographics projects. His interest in demographics relates directly to his desire to use technology to improve the lives of people.

“So we are in a unique time right now. We have shifted from being a rural planet to being an urban planet. Over half the population live in cities and people are very quickly moving into cities. Of the seven plus billion people, well over three and a half billion live in cities, and every day hundreds of thousands move into cities. Within a few years another billion and probably another billion after that in a few extra years,” said Reichental, adding, “And we are ill-prepared for it.”

After his long career at PWC, the company was shifting its IT to India. Reichental saw what was coming and so moved across the country to work for O’Reilly Media as it was starting to look at e-book technology in Northern California.

“It wasn’t an arbitrary decision. I like to go to industries that are either starting or in the middle of significant change because that’s really a fun place to play. So the book business looked interesting—this was just when e-books were like going crazy and everyone was asking is there any future in physical books? How are we going to monetize this whole thing? And what happens to authors and is the web going to take over? Just a lot of questions, and O’Reilly was at the front of that,” he said, adding that the company was also at the forefront of new conferencing tools. But he was a bit out of his element.


“It was gorgeous. It was in the apple orchards of Northern California. But for city-slicker like me that was a little difficult,” he said. During a meeting at O’Reilly, he picked up a phone call from a recruiter asking him whether he’d consider a position with Palo Alto City Hall as chief information officer. He recalled:

“It’s Palo Alto, the heart of Silicon Valley, and there was a very innovative and forward-thinking city manager there. I was curious about local government and the potential for me to make a difference.”

Like most of us, he knew the stereotype of local government, but he was also aware that larger cities in the United States and other countries were working toward more information transparency and open innovative services designed to engage citizens, to share relevant information with them, and to gain trust through technology and cultural innovations. The “smart city” movement was on his reading list. He said:

“So, you know, I could’ve done one of two things. I could’ve said ‘no thank you.’ You know, I sort of visualize government—I could’ve tapped into all the clichés and sort of bureaucratic, non-innovative slowness, just everything bad. But I chose the opposite—which was ‘tell me more, I want to hear what you have to say. What is it that you want to talk about today?’ ”

After all, they looked for him—a guy known for reinvention and technological innovation. So there must’ve been a reason to reach out to someone who had no experience in government. So he was poached by city government and started his job at the end of 2011.

INTERVIEW

I chose Reichental for this book chapter because I was looking for someone who was deeply involved with the open data–smart city movement. I listened to some of his talks from digital and smart city conferences, and read about his work at Palo Alto, a city of only 67,000. He was not only opening data in Palo Alto to citizens, but working with colleagues around the world to promote smarter cities that use data to serve communities. He says that the first step to becoming a “smart city” is to become a “digital city.” In other words, data collection and use of technologies to make life easier for citizens comes first. Then, once established, integrate the physical city into a data-driven strategy.
That would include traffic monitoring, sensors testing CO2, temperature, pollution, and other physical issues in the city—even potholes and broken street lamps. There are entire books written about smart cities. Take a look!


For my interview with Reichental, I expected to see a city IT department furnished with those typical grey metal desks and matching file cabinets stuffed with folders—some spilling out onto piles on desks topped with staplers and paperclip holders, coffee cups, framed photos, and non-descript plants—all blending into a backdrop where people are typing or filing or staring into space, looking up only when an annoying citizen shows up, interrupting that soundless lull that seems to slow time as staff wait for 5:00 p.m. to roll around.

I left the elevator and headed for the IT department where a receptionist was expecting me and knew my name. She greeted me and immediately printed out a guest badge from a small device that sat on her desk. While waiting with a cup of coffee for Reichental to complete a phone call, I watched people work. Many were young, possibly in their 20s and early 30s. Some were popping up and down like meerkats—chatting over cubicle walls about their latest technology projects, hack-a-thons, and such. What I felt in that office space was energy—energy one finds in a technology start-up, energy that might attract young people to city hall jobs.

Reichental led me to a small meeting room that felt larger than it was—brightly lit with one window facing the city and the opposite wall—glass from floor to ceiling—facing staff. Transparency, I thought. The furniture was colorful with a mid-century motif. A third wall was dominated by a large white board. I chose the bright orange chair, and turned on my recorder. “The place didn’t look like this back then [before he joined in December 2011]. I mean this is all my design, and it was pretty nasty here,” said Reichental. He added:

“There’s energy here now. When people walk in and do a double-take, they say is this the city? They think it’s a start-up.” I mentioned that I liked the bright atmosphere, something I had never seen in city government. “It’s the greatest compliment in the world to me that we created a perception that the government can be different, and can act differently, do different things,” he said.

CITY HALL

So, taking on city government wasn’t such a bad idea. Streamlining communications between citizens and their government servants is one of many steps Reichental had on his mind when he decided to leave the tech industry for city hall. Today, Reichental lectures globally about the future of cities—and shares ideas with counterparts from around the world on how to prepare cities for the future using technologies and social innovations.


He appears to thrive on change, which is why a city hall seemed like an odd fit. However, to his surprise he found kindred spirits there—city hall managers who were hoping to transform the status quo and better engage citizens.

Stereotypes Are Just Stereotypes

City governments worldwide are thought to be slow-moving and bureaucratic. But if you’ve ever worked in a large corporation, versus a small start-up, you may find similar traits. A Harvard Business Review survey1 found that the larger the size of a corporation, the more bureaucratic it becomes—and “bureaucratic drag” sets in. HBR scored such organizations using seven categories: bloat, friction, insularity, disempowerment, risk aversion, inertia, and politicking. The study authors concluded that bureaucracy “frustrates innovation” and “breeds inertia.” Similarly, when we visit government offices to ask questions about home assessments, property taxes, building permits or licenses, parking tickets, and other mundane issues, we generally don’t find efficiency. Yes, this is a stereotype, but many of us have experienced it. Reichental wanted to change this pattern as the new chief information officer at Palo Alto City Hall, at least in his IT department.

Public Data Acquisition

As you likely know, city databases are not traditionally easy to get one’s hands on despite the fact that most data produced is funded by citizens and meant to be for the benefit of those citizens. Journalists, in particular, can tell you that they often have to either wait for long periods after requesting databases or documents containing public data—or they are asked to pay ridiculous prices for them. Sometimes they are told that they can’t have it at all because the “computer won’t let them search for that.” I heard that one myself.
But Reichental believes that government data should not only be freely available to the public, but easily accessed in formats that can be directly uploaded into databases or spreadsheets for running calculations, conducting statistical analysis, or visualizing in graphs. “I saw a world where people didn’t have to come to city hall. They take out their smartphones and interact with government—like an app,” said Reichental. And if they don’t have to come to city hall, they don’t have to drive.

“Community expectations are also changing,” he said. “People want to cut back on driving. Fundamentally, we need to be thinking differently about the type of organization that hasn’t changed in a long time—city governments. Driving to get driver’s licenses, birth certificates, and other


things involves sitting in traffic, then finding parking. All of this needs to be rethought and reinvented.”

Reichental is not alone in his vision. Worldwide, city managers are working on “smart cities” ideas and projects. In a nutshell, these ideas and projects involve digital connectivity between citizens and government through engagement and open data on everything from city budgets to police and fire, and environmental issues such as pollution, traffic, recycling, parks, and livability overall.

“The good news is that this is happening more and more in Europe and in the U.S.A., Asia, and Australia,” he said. “Digitally connected communities is a global phenomenon. San Francisco and New York City are doing great things, as are Vienna, Melbourne, and Singapore, where they are super focused on a smart-city nation. London is picking up, along with France, Ecuador, Colombia, and Kuala Lumpur.”

And data-driven decision making is the keystone to smart cities: big data from small towns to large cities around the world. Imagine how these databases, as they continue to grow, could be used in climate mitigation alone. That is, if the data is shared.

The Smart City Council² defines smart cities as follows:

“A smart city uses information and communications technology (ICT) to enhance its livability, workability, and sustainability. First, a smart city collects information about itself through sensors, other devices, and existing systems. Next, it communicates that data using wired or wireless networks. Third, it analyzes that data to understand what’s happening now and what’s likely to happen next.”

And remember, Reichental hopes that citizens can cut back or even eliminate visits to city hall and instead communicate via smartphones and the internet. But, of course, big ideas take time, and opening data up to the public is a first step, and a big one, toward a digitally connected smart city. Reichental placed open data at the top of his list when he joined city hall.
He envisioned citizens becoming more educated and engaged with government if they could jump on the internet and easily see how the city was spending their tax dollars, read meeting notes in almost real time, comment on meeting agendas, and follow the progress of citizen-reported issues such as fallen trees, potholes, graffiti, and the like. Early on, Reichental spoke to his new colleagues as part of a panel discussion on priorities, and he was quickly brought down to earth.

“They, of course, were asking me ‘what does innovation of the government mean to me?’ and ‘what do I see as a future?’ And I started to describe a world in which people don’t have to come to city hall. They can just take out their smartphone and interact with government—like an app—and


I was talking more and more about this. And then it came down to, you know, this panel interview. We had about ten people, and one of them said, ‘John, your vision is awesome, but our phones don’t work.’”

“We just need some new phones.”

Oops. My reaction to this vignette was “This is Palo Alto! And they don’t all have phones?” But Reichental took the practical approach.

“I really remembered that, and took it to heart. I realized that if I can get the phones to work, I can buy credibility, and then I could actually do a lot of stuff.”

So his first priority shifted toward telephones: making sure that everyone could communicate with citizens and each other from their own desks. In fact, one of the city managers had been using a phone that lacked an 8 on the keypad, and had to ask a colleague to place any call to a number containing an 8. Otherwise, he was good. Reichental said that he found out later that a communications project, which was to include new telephones, had been funded about a decade earlier but was never completed. “That was government” as he knew it, Reichental added.

By the time I met Reichental in 2016, everyone had a working phone and his team had already built Palo Alto’s “Open Data Portal,”³ whose front-end web page is shown at the beginning of the chapter in Figure 4.1. From that portal, citizens could download data of all kinds, from police tickets to citizen complaints to budgets, expenses, and revenues by department. They could look at maps to find all parks and walking trails, or look at every tree in the city and its species. For anyone looking to buy a home or look at a new one, all building permits, both applications and final approvals, were available by address and date. Meeting minutes were uploaded the morning after the various committees met, and citizens could comment on topics. Upcoming meetings were also listed. Citizens could view the salary of any government employee, including Reichental’s.
And citizens could watch government responses to issues reported from communities. For instance, a citizen reports a pothole on Main Street through the website for all to see. From that point forward, one can monitor the response; how long did it take the city to repair that pothole? Using the same data, Reichental kept his eyes on citizen issues, monitoring them to determine whether the city was improving response rates, or not. This is, of course, data-driven monitoring and evaluation—similar to the work of Mac Otten in the malaria eradication chapter.
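The monitoring idea above, checking whether response times to citizen-reported issues are improving, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the report rows, the field layout, and the function name `median_days_to_resolve` are hypothetical and are not taken from the Palo Alto portal.

```python
from datetime import date
from statistics import median

# Hypothetical issue reports: (issue type, date reported, date resolved).
# These rows are invented for illustration; they are not Palo Alto data.
reports = [
    ("pothole", date(2015, 3, 2), date(2015, 3, 12)),
    ("pothole", date(2015, 9, 14), date(2015, 9, 22)),
    ("pothole", date(2016, 2, 1), date(2016, 2, 5)),
    ("pothole", date(2016, 6, 20), date(2016, 6, 26)),
    ("graffiti", date(2016, 7, 4), date(2016, 7, 6)),
]


def median_days_to_resolve(rows, issue_type):
    """Return {year reported: median days from report to resolution}."""
    days_by_year = {}
    for kind, reported, resolved in rows:
        if kind != issue_type:
            continue  # keep only the issue type being monitored
        elapsed = (resolved - reported).days
        days_by_year.setdefault(reported.year, []).append(elapsed)
    return {year: median(days) for year, days in sorted(days_by_year.items())}


pothole_trend = median_days_to_resolve(reports, "pothole")
print(pothole_trend)
```

A falling median from one year to the next would suggest that the response rate is improving. The median is used here rather than the mean so that one unusually slow repair does not dominate the picture.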


To the delight of journalists, there was no one “guarding” the databases. A reporter could just jump on the web page and select the data by various filters. In addition to opening up datasets, his team built friendly data visualizations of government spending, revenues, budgets, and much more, including those trees. New data continues to be added to the site. Suffice it to say, the open data project was in place, and the data showed that the rate of citizen engagement was rising.

Open Data

As for open data, Reichental equates it with open government. He said:

“I do think that data made available to anyone who wants it empowers democracy. We’re seeing good evidence of that beginning in Africa, for example, where corruption is very common. When people have knowledge and information they get power back from those in leadership positions.”

This is right up my alley. Under the Obama Administration, data.gov was created as part of the “open data movement” that was taking shape to empower citizens with knowledge about their country, its demographics, government spending, health-care quality, pollution, and many other data topics of importance to communities. But at the time of writing, the Trump Administration has been dismantling some of the open data sites, including data related to environmental protection. The Sunlight Foundation and other open government advocacy groups⁴ started copying and storing public records to prevent them from disappearing.

DATA EXPLORATION AND CLEANING

There is so much data on the Palo Alto website portal to dig into for this chapter, but I thought I’d use city budget numbers for our first visualization, from 2012 through 2018. This type of data is hierarchical, and so provides the opportunity to have a 100,000-foot view of the budget that will drive a more detailed chart. One of the great advantages of online data visualization is that you can start at the top level and interactively drill down to the details.

But first, let’s look at the budget dataset in Figure 4.4. Notice there are two dimensions, Division and Cost Center, in addition to six separate year columns/dimensions. Divisions represent the top level within the overall budget. They include police, fire, utilities, public works, sustainability, and others. Within those divisions are cost centers. So, for the police division, cost centers include investigations, parking services, field services, admin, traffic, and others.


So if a citizen wants to see where tax dollars are going, this is a good place to start, though one could also look at expense and revenue databases. The dataset was already quite clean and ready for visualization, but I did a bit of wrangling, mainly because I wanted to filter on year as well as by division and cost center. For filtering and grouping, I find that it’s more efficient, when possible, to have a single column for a filter.

In Figure 4.5, I added two new columns, year and amount (columns C and D). Notice how there are now only four dimensions, compared with the original dataset that had eight (columns A through H). I made the decision to go from wide to narrow. So we have more rows of data, but fewer columns. When the user selects a year to filter a view, only one column will be needed for the search. If I had wanted to show the years side by side, I might have left the spreadsheet alone, using all eight columns.

If you look closely at the original spreadsheet in Figure 4.4, notice that there are some rows containing totals for divisions, but most of the cells in those rows are empty. I removed those rows because the software produces totals in its own way, and I don’t want to confuse things. But first, I checked to make sure that the totals added up as expected in the dataset. So just a small amount of cleaning and wrangling.

This dataset is most easily analyzed using visualization. As soon as you encode a division to a color and the division’s total budget to size, you’ll clearly see where most of the city’s money goes at the high level. See Figure 4.6. Obviously the largest bubble belongs to utilities. On the website, hover over the tiniest bubble and you’ll find sustainability. Hope that one grows!
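The wide-to-narrow reshaping and the total-row check described above were done in a spreadsheet, but the same steps can be sketched in plain Python. This is a minimal sketch, not the method used for the book. The dollar figures are invented, only two year columns are shown, and it assumes that a division total row can be recognized by an empty Cost Center cell; the function names are hypothetical.

```python
# Hypothetical wide-format rows: one column per budget year.
# The figures are invented and are not Palo Alto budget numbers.
wide_rows = [
    {"Division": "Police", "Cost Center": "Field Services", "2017": 100, "2018": 110},
    {"Division": "Police", "Cost Center": "Traffic", "2017": 20, "2018": 25},
    # Assumption: a division total row has an empty Cost Center cell.
    {"Division": "Police", "Cost Center": "", "2017": 120, "2018": 135},
]
YEAR_COLUMNS = ["2017", "2018"]


def split_totals(rows):
    """Separate cost-center rows from division total rows."""
    detail = [r for r in rows if r["Cost Center"]]
    totals = [r for r in rows if not r["Cost Center"]]
    return detail, totals


def totals_match(detail, totals, year_columns):
    """Check that each total row equals the sum of its cost centers."""
    for total in totals:
        for year in year_columns:
            computed = sum(
                r[year] for r in detail if r["Division"] == total["Division"]
            )
            if computed != total[year]:
                return False
    return True


def wide_to_narrow(detail, year_columns):
    """Turn one row per cost center into one row per cost center per year."""
    narrow = []
    for row in detail:
        for year in year_columns:
            narrow.append({
                "Division": row["Division"],
                "Cost Center": row["Cost Center"],
                "Year": int(year),
                "Amount": row[year],
            })
    return narrow


detail_rows, total_rows = split_totals(wide_rows)
totals_ok = totals_match(detail_rows, total_rows, YEAR_COLUMNS)
narrow_rows = wide_to_narrow(detail_rows, YEAR_COLUMNS)

# Filtering on a year now touches a single column.
rows_2018 = [r for r in narrow_rows if r["Year"] == 2018]
print(totals_ok, len(narrow_rows), sum(r["Amount"] for r in rows_2018))
```

The totals are verified before the total rows are dropped, which mirrors the order of the steps in the text. After reshaping, each row carries exactly four fields: Division, Cost Center, Year, and Amount.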

4.4 The original data from the city before moving all the year data into one column. Public data.


4.5 Budget data spreadsheet after reshaping from wide to narrow. Produced by Dianne M. Finch-Claydon.

4.6 High-level bubble view of budget—used to interactively choose a budget division for exploration. Produced by Dianne M. Finch-Claydon.

DATA VISUALIZATION

Budgets provide an opportunity to use bubbles. I say that because most of my students loved bubble charts, but they aren’t appropriate for many types of visualizations because they make it difficult to estimate actual size. Bars are great for that, as the height equals the size along the axis. But when you are designing a top-down approach and simply want to show perspective, with a bit of detail via use of a hover and a tooltip, bubbles work well, in my view.

In the dashboard shown in Figure 4.7, a hierarchy exists. The entry point is the year, and the selector is at the top in the middle, colored orange so that it is noticed against the gray background. The years are


4.7 View of user selecting budget year. Produced by Dianne M. Finch-Claydon.

selected from a drop-down list. Next to that selector is the total budget amount for that year. The year drives the bubble chart on the left. That’s your overview, and clicking on a bubble, or division, drives the list chart on the right to display all the cost centers and budget amounts for that division. One can also hover on a bubble to see the total budget for the division.

Figure 4.8 shows the view when a user selects the police division; the blue bubble is highlighted, as you can see. In the detailed chart on the right, it’s quite clear that field services represent the largest portion of police division spending. On hover, the tooltip pops up as shown.

RESPONSIVE DESIGN

Now, you’ve likely heard about “responsive design” for websites. Built for viewing on various device sizes, web page designs must include components that can be resized in real time. For example, if you are viewing photos on a website using your desktop computer, and then decide to show a friend the site using your smartphone, those photos shrink to fit the smaller view, right? The same holds true for videos, visualizations, text, art, and other components found in web pages.

I believe that we, as data visualizers, should at least know the concepts and processes behind responsive design. There are many approaches to adjusting web page components to suit the device size. Someone may do this for you depending on where you work, but your visualization needs to be prepared for running inside a responsive web page.


4.8 View of user selecting police division bubble to see all cost centers in the police budget. Obviously, field services, by far, makes up the largest portion of the budget for the police division. User is hovering over the field services bubble to see the details. Produced by Dianne M. Finch-Claydon.

In web development classes, I taught undergraduates how to write about 5–10 lines of HTML code that handled resizing based on browser width. All well and good, but if you have a visualization that doesn’t resize, then you have to create two or three versions of it, and the HTML code would then call in the best-fitting version depending on the device size. The paragraphs below include some discussion on resizing views for phones, tablets, and desktops using Tableau. See the website for an example of the HTML/CSS code that selects a Tableau visualization based on the web page’s width, as well as suggestions on how to learn HTML, CSS, and more about responsive design.

If you are using Tableau, it now has the capability to create a phone-sized dashboard based on your desktop-sized or tablet-sized design. Or, you can design your own. Either way it’s actually quite easy. Once you create the device-based dashboards and then publish your final project to Tableau, an “embed code” will become available that includes the ability to resize the visualizations. If you’ve ever embedded videos from YouTube or Vimeo, then you likely know how to add the code to your website. You do need to make sure that your web pages are prepared for adjusting sizes, however.

Figure 4.7 was designed for a large screen, about 1,000 pixels wide. For the phone-sized version shown in Figure 4.9, I copied the larger version to a new dashboard, moved the detailed chart below the bubble chart, and set the width of the dashboard to 400 pixels. Then I resized the fonts. Tableau had suggested a phone dashboard that was quite similar to the samples below, but it needed some adjustments, such as reducing the font sizes and lining things up vertically. Generally speaking, one should design for the smallest device first and expand out from there, since most viewers will be using smartphones. But either approach works.
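The selection logic behind “call in the best-fitting version” can be sketched independently of any web technology. The Python below is not the HTML/CSS from the book’s website, which would normally do this in the browser; it only illustrates the decision rule. The 400 and 1,000 pixel widths come from the text, while the 700 pixel tablet width and the function name `pick_layout` are assumptions made for the example.

```python
# Design widths in pixels. The phone and desktop values come from the text;
# the tablet width is an assumed value for illustration only.
LAYOUTS = {"phone": 400, "tablet": 700, "desktop": 1000}


def pick_layout(viewport_width, layouts=LAYOUTS):
    """Return the name of the widest layout that fits the viewport.

    If no layout fits (a very narrow screen), fall back to the smallest one.
    """
    fitting = [(width, name) for name, width in layouts.items()
               if width <= viewport_width]
    if not fitting:
        return min(layouts, key=layouts.get)
    return max(fitting)[1]


for width in (375, 768, 1280):
    print(width, pick_layout(width))
```

Falling back to the smallest design when nothing fits is in keeping with the small-device-first advice above: a narrow phone still gets the layout that was built for phones.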


TREES, TREES, AND MORE TREES

The city of Palo Alto has been keeping an inventory of trees for several years in a database that includes exact tree location, species, health status ratings, height, width of trunk and canopy, notes on any potential obstructions such as electrical cables, date of last inspections, branch trimmings, and other datapoints. Since I love trees as much as I love the ocean, I couldn’t resist using this database. Even better, this green data provides the opportunity to use custom symbols instead of the usual circles, pins, and bars, as you’ll see below.

Keeping track of trees in a city is actually part of the global “smart cities” movement. It’s not as though cities didn’t keep track of trees in the past; many did. But today, world databases are expanding and being used in assessing tree stocks as CO2 sinks: big data on trees. Trees are vital to air quality, particularly in cities. They provide oxygen and absorb CO2. So those involved in climate and sustainability are also interested in knowing about the tree inventory, not to mention citizens!

First, let’s take a look at the maps that the city placed on its data portal (Figure 4.10). It looks like a satellite view of the city. Each circle contains a number representing the tree count for that circle’s area. Since the background is primarily dark green, the visualizers chose bright colored circles and pins to represent the tree stocks.

When designing visualizations, once we know our database, we often find that there are too many values to display on one view. While I’ve greatly reduced the tree database for training purposes, there are still too many trees to show on a map. This is one of the reasons we use groups and filters. Most visualization software provides the ability to divide data into groups and filters. So the city staff created groups by area or city block, encoded to circles. Citizens or city staff would then drill down through that circle to the individual


4.9 Budget sized for iPhone or other small device. This view was created to be 400 pixels in width. For a web page, one would use HTML5 code to check the width of the web page and display the design that suits it. Notice that one can create multiple views for multiple devices. Produced by Dianne M. Finch-Claydon.


4.10 View of all public trees in Palo Alto. At the top, trees are grouped into circles by area, and the numbers on those circles represent the number of trees in that area. A click on one circle opens the map at the bottom, revealing all trees for a given area, each encoded as a red pin. Credit: Palo Alto Data Portal.

trees. That makes sense, as most would be looking at trees in specific areas of the city. Keep in mind that citizens report tree issues to the city, such as branches leaning on cables or damage from storms. So it makes sense for the city to organize at the neighborhood level. For our visualization, we’ll target an audience interested in tree diversity in the city. So we’ll group trees by species and location. So that citizens can learn about various species, we’ll add photos of each species and also add a new column/dimension to the dataset for the common tree name, since not everyone knows the Latin species name.


And because we are filtering the data by species, there is enough room in a map to identify each tree by location, adding a tooltip for learning more about the chosen tree on hover. We’ll use tree images, colored green, to symbolize trees on the map. Because we want the trees to pop, we’ll use a lighter background map than the satellite image in the city’s maps above. See the website to learn how to create custom shapes for use in Tableau.

Cleaning Trees

Like the budget database, the trees spreadsheet is quite clean for a visualization: one header row and no empty rows, as shown in Figure 4.11. I’ve added the dimension “Common Name” in column P for a general public audience, as most would be unfamiliar with the Latin species label. Of course, I had to add those common names to the database for each tree species. One additional bit of cleaning involved removing all the “total” rows, since Tableau adds totals to views based on the rows chosen for the view, which may differ from those in the spreadsheet.

Fortunately, the database included latitude and longitude for every tree, so it was quite simple to plot them on a map. Sometimes we need to “geo-code” our dataset, adding geographic information to each row, before we can use maps.

Figure 4.12 displays all of the blackwood trees in their exact locations based on latitude and longitude. Viewers can hover on any tree to find out more information about it. But to see even more information, the map is added to a dashboard, shown in Figures 4.13 and 4.14. Because the dashboard adds a detailed list view, the tooltip doesn’t have to be overloaded with information. The tree selector, shown just above the species photo, drives the three components on the dashboard: the map, photo, and detail list.
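The cleaning steps described above (adding a “Common Name” column, dropping “total” rows, and filtering to one species for the map) can be sketched in Python as follows. The rows, coordinates, and column names other than “Common Name” are hypothetical, and the sketch assumes that a total row can be recognized by its missing latitude and longitude. It is an illustration only, not the Tableau workflow used for the chapter.

```python
# Lookup from Latin species name to common name, added by hand,
# much as the "Common Name" column was added to the spreadsheet.
COMMON_NAMES = {
    "Acacia melanoxylon": "Blackwood",
    "Acacia dealbata": "Silver wattle",
}

# Hypothetical rows with invented coordinates; not the city's records.
trees = [
    {"Species": "Acacia melanoxylon", "Latitude": 37.4419, "Longitude": -122.1430},
    {"Species": "Acacia dealbata", "Latitude": 37.4275, "Longitude": -122.1697},
    {"Species": "Acacia melanoxylon", "Latitude": 37.4530, "Longitude": -122.1817},
    # Assumption: a "total" row carries no coordinates.
    {"Species": "Total", "Latitude": None, "Longitude": None},
]


def clean_trees(rows, common_names):
    """Drop rows that cannot be mapped and attach a common name to the rest."""
    cleaned = []
    for row in rows:
        if row["Latitude"] is None or row["Longitude"] is None:
            continue  # total rows and unlocated trees cannot be plotted
        name = common_names.get(row["Species"], "Unknown")
        cleaned.append({**row, "Common Name": name})
    return cleaned


def trees_of(rows, common_name):
    """Filter to one species, like choosing it in the tree selector."""
    return [r for r in rows if r["Common Name"] == common_name]


cleaned_trees = clean_trees(trees, COMMON_NAMES)
blackwoods = trees_of(cleaned_trees, "Blackwood")
print(len(cleaned_trees), len(blackwoods))
```

Species missing from the lookup are labeled “Unknown” rather than dropped, so gaps in the hand-built name list stay visible instead of silently shrinking the dataset.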


4.11 The original tree dataset from the City of Palo Alto. The view is not complete as there are many more dimensions to the right. Column P, the common tree name, was added to this dataset by the author.


4.12 User hovers on tree symbol to see tooltip with information about the tree. This view shows all of the blackwood trees in the city, at least those in the reduced dataset we are using for this chapter. Produced by Dianne M. Finch-Claydon.

The photo changes when a new species is selected. Viewers, of course, can click on one tree, and the list view will filter down to that one tree for exploration. Figures 4.14 and 4.15 reflect views of trees at the neighborhood or street level, a feature that citizens may use to find out about the health of a tree near their home, or to make a complaint about overgrowth or obstructions. As you’ll see on the website, there are three Tableau “sheets” or views that make up this dashboard, all driven by tree species. See the website to learn how to use Tableau filters to rotate through photos.

There is so much more that we could do with the data from Palo Alto. What I am hoping is that you will ask your own town or city government to make its databases public, and that you can offer to visualize those datasets! While larger cities, such as Boston, San Francisco, and others, have been moving more quickly toward open data, Palo Alto jumped in as one of the smallest cities. Of course, financial resources come into play; not all towns and small cities have the staff to work on these projects. Take a look at Palo Alto’s open data portal, as well as others listed below.

Palo Alto Open Data Portal: https://data.cityofpaloalto.org/home
San Francisco: https://datasf.org/opendata/
Boston: https://data.boston.gov/
Seattle: https://data.seattle.gov/


STORY THOUGHTS FOR JOURNALISM STUDENTS

While this book targets anyone interested in working with data visualization for the first time, my hope is that journalism students will find it helpful in classes when they produce stories. Thus, I’ve included a few thoughts.

There are many stories to dig into from any financial or budget database. As reporters, you need to get that data backstory once you analyze the data. For example, what is behind the large bubble called “Non-departmental”? I don’t see a breakdown in that category, while other large budget bubbles include breakdowns. Notice the cost center breakdowns for “Police” and “Utilities,” for example. In this case, citizens might want to see how that non-departmental money is spent each year. Depending on what you find out, there may be a story of interest to local citizens. This requires reporting, of course!

For the visualization, you could use the bubble chart of the divisions, as we did above, and then home in on the details you uncovered in your reporting, allowing the bubble to drive the new breakdown. Hopefully you’ll include some good quotes from city officials about that breakdown. Local citizens like to know how their tax dollars are spent, obviously!

Another thought: download your town’s salary data for the past 5 years and look at the raises for specific people or positions. Explore the data visually. Look for patterns.


4.13 Full map view of all blackwood trees. The detailed list at bottom requires scrolling to the right to see more dimensions, and down to see all trees for the species. Produced by Dianne M. Finch-Claydon.


4.14 Zoomed-in view of three blackwood trees. Produced by Dianne M. Finch-Claydon.

For example, using the year on the X axis and the salary on the Y axis, draw one line for male employees and one for female. If it appears that female employees are generally making less than their male counterparts, do some reporting. Make sure you look at all angles. For example, how long was each person employed? What is their background? Are the gaps due to position responsibility, education level, other? Reporting is always critical to this process—particularly when you see patterns that appear intriguing. Don’t assume anything! Journalists do sometimes report public servant salaries without digging into their backgrounds because it’s one of those stories that everyone reads—even when it simply states each salary by city role. But you can do more than that with your visualizations and narrative. Another budget area I’d look into would be “sustainability.” As you can see on the bubble chart, it’s the tiniest bubble. Why is that? How is the money spent? Are there any projects that had been proposed and then rejected? Is the annual sustainability budget rising year over year? One would expect it to as we attempt to address environmental issues in cities worldwide. There is a story in there somewhere. It may even be a “good news” story. You can dig into town meeting minutes that include the sustainability committee. Are they adding another green area? Are they buying more public land for conservation? Or are they selling public lands for development? And, of course, call those at the town who oversee sustainability and conservation—as well as people who are opposing those trying to conserve land. Developers!
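The salary comparison suggested earlier in this passage (year on the X axis, salary on the Y axis, one line per group) needs one summary value per year and group before anything can be drawn. A minimal Python sketch follows. The salary records are invented, the “F” and “M” codes and the function names are hypothetical, and the median is one reasonable choice of summary, not the only one.

```python
from statistics import median

# Hypothetical salary records: (year, group, annual salary).
# Invented numbers for illustration; not from any city's salary data.
salaries = [
    (2019, "F", 70000), (2019, "F", 80000),
    (2019, "M", 78000), (2019, "M", 90000),
    (2020, "F", 72000), (2020, "F", 84000),
    (2020, "M", 80000), (2020, "M", 94000),
]


def median_by_year_and_group(rows):
    """Return {(year, group): median salary}."""
    grouped = {}
    for year, group, salary in rows:
        grouped.setdefault((year, group), []).append(salary)
    return {key: median(values) for key, values in grouped.items()}


def line_series(medians, group):
    """Return the (year, median salary) points for one line on the chart."""
    return sorted(
        (year, value) for (year, g), value in medians.items() if g == group
    )


medians = median_by_year_and_group(salaries)
female_line = line_series(medians, "F")
male_line = line_series(medians, "M")
print(female_line)
print(male_line)
```

A gap between the two lines is a question, not a finding. As the text stresses, tenure, role, and education have to be reported out before drawing conclusions.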


You can bet that the sustainability committee has asked for funding for various projects. What has been turned down? Approved? What is new on the agenda? What would those who oversee sustainability do if they had a larger budget? Is there a long-term plan? As is often the case, the data drives the questions, but the questions must be answered through reporting and analysis. There are some fascinating stories in small towns that involve developers and environmentalists. You saw the small sustainability budget number, but until you dig via interviews and research, you don’t have a story. Simply reporting that the budget grew by X percent over 5 years isn’t exactly a story! Another idea along these lines: go to the Securities and Exchange Commission’s website and dig into corporate salaries and benefits packages. When I taught business journalism at Elon University, all of my students went on SEC data hunts and learned about various forms that public corporations are required to submit to the SEC. Go and explore, build some datasets! One idea is to look at income statements. Did the corporation’s profits (net income) increase over the tenure of the current CEO? And did the CEO’s salary rise with profits? Or with share prices? There are many areas to explore here, and I’ve included references to books and other resources related to SEC filings and public company information for those who’d like to dig in. See the website for information on public company SEC filings and other financial and economic datasets for use in stories.
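The CEO question above (did pay rise with profits?) comes down to comparing two percentage changes over the same period. The sketch below shows the arithmetic in Python. The figures are invented and do not come from any SEC filing, and the field names `net_income` and `ceo_pay` and the function name `percent_change` are hypothetical labels for values you would collect yourself.

```python
def percent_change(start, end):
    """Percentage change from start to end, relative to the starting value."""
    if start == 0:
        raise ValueError("percent change is undefined for a starting value of 0")
    return (end - start) * 100 / abs(start)


# Hypothetical figures in millions of dollars for the first and last
# years of a CEO's tenure. Invented for illustration only.
filings = {
    2015: {"net_income": 200.0, "ceo_pay": 5.0},
    2019: {"net_income": 250.0, "ceo_pay": 8.0},
}

first, last = filings[2015], filings[2019]
income_growth = percent_change(first["net_income"], last["net_income"])
pay_growth = percent_change(first["ceo_pay"], last["ceo_pay"])

print(f"Net income: {income_growth:+.1f}%  CEO pay: {pay_growth:+.1f}%")
```

Dividing by the absolute value of the starting figure keeps the sign meaningful when a company starts from a loss. When pay has grown much faster than profit, that gap is the starting point for reporting, not the story itself.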


4.15 Zoomed-in view of the silver wattle tree. Only three exist in the small database used for this chapter. Produced by Dianne M. Finch-Claydon.


NOTES

1 https://hbr.org/2017/08/what-we-learned-about-bureaucracy-from-7000-hbr-readers
2 https://rg.smartcitiescouncil.com/readiness-guide/article/definition-definition-smart-city
3 https://data.cityofpaloalto.org/home
4 https://sunlightfoundation.com/2017/12/21/open-data-policy-participation-and-progress-sunlight-open-cities-2017-year-in-review/


Index

Note: Page numbers in italic refer to Figures. 200 day experiment see Heron project (2010) ACT medication see artemether-lumefantrine (ACT) algae 28, 31; zooxanthellae algae 21, 23, 27–28 Anopheles stephensi mosquito 92 aragonite (shell-building mineral) 39, 40, 43, 54, 57, 58; Heron project 44, 47–48 artemether-lumefantrine (ACT) 91, 105; data visualizations 104, 110–111; malaria eradication project 110–111, 113 Bahamian reefs 22 bar charts 4, 83, 127; mosquito nets 112, 113; pteropods drop speed 84, 85; pteropods shells 78, 79 Bergan, A. 69, 71, 80, 83 Bermudian project 3 big data 1–2, 11, 12, 33, 41, 81, 123 biodiversity 15 bio-eroders  33, 54 bleaching see coral bleaching book website 4 box and whisker charts 49, 75, 76; coral growth 50, 52, 58; Porites cylindrica 49, 50, 58; pteropods shells 74, 75–76, 75, 76 bubble charts 4, 127, 134; Open Government Data Portal 126, 127–128, 135 budgets 134–136; Open Government Data Portal 125–126, 127–128, 130 calcification 13, 85 CAR see Central African Republic (CAR) carbon dioxide (CO2) 10–13, 29, 30, 43, 64; Heron project 44, 45, 46, 47, 49; oceans 12, 13, 27, 28, 29; pteropods drop speed 83; pteropods shells 68–69, 70, 71, 73, 75, 76, 77, 83; see also ocean acidification (OA) Centers for Disease Control and Prevention (CDC) 92 Central African Republic (CAR) 87, 88–90, 93–94, 97–99, 105; malaria 88, 89–90, 91, 92, 93, 107–108, 109, 110, 111; malaria eradication project 90–91, 95–98, 99, 100–102; mosquito nets 88, 89, 90, 102, 112–113, 114

charts 3, 4; see also bar charts; box and whisker charts; bubble charts; circle charts; line charts China: carbon dioxide 10; polio 94 circle charts 76–78 city databases 122, 124–125, 134–136; see also Open Government Data Portal, Palo Alto city governments 120, 121, 122–123, 125, 134–136; see also Palo Alto cleaning see data cleaning climate change 11, 26 coastal communities 31, 32 coral bleaching 6, 8, 13, 14, 21, 22, 22, 23, 27–29, 53, 54, 55 coral broadcast spawning 19, 20, 21 coral dissolution 8, 13, 48 coral growth 50–52, 58 coral polyps 18, 20, 21, 22, 39 coral reef ecosystems 8–9, 10, 13–14, 16–17, 31–33 coral reef extinctions 8, 29, 31, 34 coral reefs 3, 6, 8, 9–10, 12, 13–17, 18–19, 21–22, 28–29, 31, 38, 39, 55–57; see also Heron project (2010); ocean acidification (OA) coral reef species 32–34 corals 12, 13, 16–19, 21, 22, 39, 40, 54, 58 coral skeletons 8, 13, 21, 28, 32, 33, 38, 54 coral species 8, 18–19, 32, 33, 38, 53; see also Porites cylindrica (coral species) dashboards 4, 112–113; artemether-lumefantrine 111; budgets 127; malaria 110; mosquito nets 112; responsive design 129 data 1, 2 data backstory 1, 3, 12, 40, 51, 56, 72, 74, 134 data cleaning 3, 41–43, 81–83 data collection 1, 2, 4 data.gov 125 data munging 3, 41–43 datasets 1, 3, 33, 40, 43, 85, 89–90, 114–115 data visualizations 1, 3–4, 56, 57–59, 74, 83, 103–104; coral growth 50–52, 58; Heron project 43–48, 49–51, 55; malaria eradication project 102–103, 104, 105–109, 110–111; Open Government Data Portal 125–126, 127–128, 130–133, 135, 136; pteropods drop speed 81,


84, 85; pteropods shells 71–72, 74–78, 79; trees 130–133, 134, 135, 136 data wrangling 41–43, 82–83 diel vertical migration 63, 64, 79–80 digital cities 120, 123 financial benefits 32 fish species 15–16, 21–22, 31 Free Ocean CO2 Enrichment System (FOCE) 34–35, 36–39 Gartner 2 global health 2 grafts 32 Great Barrier Reef 6; coral bleaching 22, 23, 28, 29, 37 greenhouse gases 10 hard corals 21, 39 Heron Island, Australia 6–8, 9, 39–40, 53 Heron project (2010) 6, 9, 10, 14, 33, 34–39, 41, 43, 48–49, 54–55, 57; carbon dioxide 44, 45, 46, 47, 49; coral growth 50–52, 58; data visualization 43–48, 49–51, 55; Free Ocean CO2 Enrichment System 34–35, 36–39; pH 44, 45–46, 47, 49 high resolution datasets 2, 12 IBM 2 India 10 indoor lab aquariums 32, 33; pteropods 66, 67, 68 International Federation of Red Cross (IFRC) 87, 88, 114 International Panel on Climate Change (IPCC) 35, 38, 40 Jarvis Island, South Pacific 28 journalism 3–4 journalism students 59, 84–85, 114–115, 134–137 Keeling curve 11 Kennedy, J. F. 6, 31 Kline, D. 7–10, 12, 21, 27–29, 33, 56; coral reefs 6, 13–14, 15, 18, 31–32, 55; corals 17, 19; Heron project 34, 35, 36, 38–39, 41, 45, 54; ocean acidification 30, 40, 55 line charts 76; pteropods shells 76, 77 lionfish 22, 22 Maas, A. 63, 64, 65–69, 71, 73–74, 83 malaria 3, 91–92; Central African Republic 88, 89–90, 91, 92, 93, 107–108, 109, 110, 111 malaria eradication project 3, 87, 114–115; artemether-lumefantrine 110–111, 113; Central African Republic 90–91, 95–98, 99, 100–102; data visualizations 102–103, 104, 105–109, 110–111; mosquito nets 88, 89, 90, 102, 112–113, 114 mangrove forests 31 maps 3, 4, 115

marine science 2–3, 33, 34, 38, 41
medicines, lifesaving 32
mesocosms 67
mosquito nets 88, 89, 90, 102, 112–113, 114
munging (data restructuring) see data munging
Munyaburanga, F. 99, 100, 100, 101–102
muttons (wedge-tailed shearwaters) 7, 38
National Institute of Standards and Technology, U.S. 2
National Oceanic and Atmospheric Administration (NOAA) 11, 28, 29
non-government organizations (NGOs) 95, 96, 114, 115
Obama Administration 125
ocean acidification (OA) 2–3, 6, 7, 8, 10, 12–14, 26, 27, 28, 29–34, 40, 47, 53, 56; coral reefs 32, 33, 34, 38, 55; pteropods 63, 66, 68, 73
oceans 12, 13, 26, 28, 29, 30, 31; pH 12, 13, 35, 38, 39, 40
open data 125, 133, 134
open government 2, 3, 125
Open Government Data Portal, Palo Alto: budgets 125–126, 127–128, 130; trees 130–133, 134, 135, 136
Otten, M. 87–90, 93, 94, 95, 104, 114; malaria eradication project 95–98, 99, 102–103, 105, 106, 110
Palo Alto 3, 120, 122, 133; Open Government Data Portal 117, 124–126, 127–128, 130–133, 134, 135, 136
parrot fish 16, 21
Peat, J. 96, 114
Permian Crisis 30
pH 30, 31, 43, 58; Heron project 44, 45–46, 47, 49; oceans 12, 13, 35, 38, 39, 40; Porites cylindrica 48, 49–50, 58
planulae (coral babies) 19, 21
Plasmodium Falciparum (P. Falciparum) mosquito 92
polio, China 94
porcupine fish 16
Porites cylindrica (coral species) 36, 37, 39, 52–53; coral growth 50, 52; pH 48, 49–50, 58
pteropods 61, 62–66, 67–69, 79–80, 85; drop speed 79, 80, 81–82, 83, 84, 85; sea angels 61–62; sea butterflies 56, 61–62, 64, 66, 80; shells 68–69, 70–78, 79, 83
reef communities 32, 54
reef dissolution 33, 54
reef ecosystems see coral reef ecosystems
reef managers 33, 56
Reichental, J. 116–120, 121–124, 125
responsive design 128–129, 130
Rollout of Rapid Mobile Phone-based (RAMP) survey 98–99
sea angels 61–62
sea animals 3, 7–8, 12, 14–16, 21–24, 25–26, 31

sea butterflies (Limacina retroversa) 56, 61–62, 64, 66, 80
seahorses 21, 23–24, 25–26
seawater 26, 28–29; see also ocean acidification (OA)
Securities and Exchange Commission (SEC) 136–137
Selanikio, J. 98
sensors 2, 40–41
shearwaters, wedge-tailed 7, 38
smart cities 120, 123, 133
Smart City Council 123
Smithsonian Ocean Portal 28
stony corals 17
Sukhdev, P. 32
Super Corals 56
symbionts 21, 23, 31; seahorses 21, 23–24, 25–26; zooxanthellae algae 21, 23, 27–28

Tableau Public 4, 57, 82, 129
The Economics of Ecosystems and Biodiversity (TEEB) 32
trees 130–133, 134, 135, 136
Trump Administration 125
Tufte, E. 74
turtles 7–8
United States (U.S.A.) 95; carbon dioxide 10; malaria 91, 92
University of Lisbon 26
vector-borne diseases 92–93
visualizations 1, 4, 40, 43, 85, 128–129, 130
wedge-tailed shearwaters 7, 38
wrangling see data wrangling
zooxanthellae algae 21, 23, 27–28
