Life, the Universe and the Scientific Method 0615267459

This book by the noted polydisciplinary scientist Steven Benner describes what scientists do to arrive at the 'trut


English Pages 209 Year 2008

Author / Uploaded
Steven Alfred Benner

Life, the Universe and the Scientific Method
Steven A. Benner
Foundation for Applied Molecular Evolution
The Westheimer Institute for Science and Technology
Gainesville FL 32601

STEVEN BENNER is a Distinguished Fellow at the Foundation for Applied Molecular Evolution and The Westheimer Institute for Science and Technology. His research seeks to combine two broad traditions in science, the first from natural history, the second from the physical sciences. Towards this goal, his group works in fields as diverse as organic chemistry, biophysics, molecular evolution, geobiology, and planetary science. He has contributed to the founding of several new fields, including synthetic biology, paleogenetics, and computational bioinformatics. He co-chaired with John Baross the National Research Council's 2007 panel on the "Limits to Organic Life in the Solar System", advised the design of missions to Mars, and invented technology that improves the medical care of some 400,000 patients each year suffering from infectious diseases and cancers.

THE FOUNDATION FOR APPLIED MOLECULAR EVOLUTION (www.ffame.org) and THE WESTHEIMER INSTITUTE FOR SCIENCE AND TECHNOLOGY (www.westheimerinstitute.org) are nonprofit research organizations that use private donations and peer-reviewed grants from government and private sources to pursue research that crosses boundaries to address "big questions" in science and technology. Current activities seek to apply human genomics to manage diseases such as cancer, hypertension, and alcoholism, as well as to ask: How did life originate? How did we come to be? Are we alone in the universe? The FfAME is currently a member of the NASA Astrobiology Institute, as well as the team sponsored by the National Human Genome Research Institute to lower the cost of acquiring human genomic information for the purpose of personalizing patient care.

The FfAME Press
720 S.W. Second Avenue, Suite 208
Gainesville, Florida 32601 USA

Published in the United States of America by the FfAME Press
www.ffame.org

© Steven A. Benner, 2008

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of the copyright holder.

First published 2008
Printed in the United States
ISBN. (paperback)

This book is for review purposes only, and is not for commercial sale.

Contents

Preface
Life and the Scientific Method
A Definition-Theory of Life
Four Approaches to Understanding Life
Working Backwards in Time from Life on Earth Today
Forward in Time: From Chemicals to the Origin of Life
Exploration to Expand Our View of Life
Synthetic Biology: If We Make It, Then We Understand It
Weird Life. Life as We do not Know It
Further Reading
Image Credits
Index

Preface

The opportunity to write this book came when Mark Courtney, the Program Director of the "Population and Evolutionary Processes" cluster at the National Science Foundation (NSF), called to say I had received an OPUS award. OPUS is an NSF acronym for "Opportunities for Promoting Understanding through Synthesis". The OPUS program reflects the history of books that integrate many fields to influence the development of biology. The Origin of Species, the 1859 book by Charles Darwin, is one example, but there are many others where a similar "synthesis" has created new research directions in biology. Thus, the OPUS program exploits inductive logic to expect that future books in this vein will create future research directions. Accordingly, the OPUS program gives scientists in "mid career" (a polite term) the opportunity to "synthesize" work from their own laboratories in books such as this one.

Now for full disclosure. I was trained as a chemist who focused on a corner of that field that proudly carries the title "physical organic chemistry". My corner of chemistry provides tools to explain how atoms and molecules work. Since almost everything on our planet is made from atoms and molecules, this corner is well-placed to launch research in many directions. Especially in the direction of biology. The last century has provided reason to believe that life is no more (and no less) than a self-sustaining chemical system capable of Darwinian evolution. For example, the recently sequenced human genome is nothing more (and nothing less) than a statement about how carbon, hydrogen, oxygen, nitrogen, and phosphorus atoms are arranged in the natural products that allow us to have offspring. Evolution is nothing more (but again, nothing less) than the change in inherited chemicals that, through complex (and largely obscure) interactions with their environments, produced the richness of living forms and behaviors that have fascinated humankind ever since self-awareness became one of our characteristic features.

Chemistry, however, is itself viewed as "difficult", by the public (of course) but also by the university science student who must take a chemistry course to gain entrance into medical school, veterinary school, or nursing school. Many biologists are victims of a poorly taught college course in organic chemistry. Many do not like chemistry at all. My career as an educator has focused on explaining to students why chemistry is not difficult, and how it can be used to unify our view of life in the cosmos.

One of chemistry's entertaining applications is to ask the question: Are we alone in the universe? Do aliens exist whom we might encounter, or who might come here? Are aliens already here among us? Here, I use such questions (and the associated sciences of exobiology and astrobiology) as the focus for this book. This focus allows for a balance between these lighter topics and heavier science, offering a light-hearted reading experience that includes some serious stuff.

This book was also timely because of a confluence of other events in my life, personal and professional. My child was just coming through middle school, where his science teachers had bravely attempted to teach something called "The Scientific Method". Being a faithful follower of my offspring's educational progress, I had read what he was being taught. As it turns out, "The Scientific Method" as taught in middle school scarcely resembles what professional scientists actually do. Therefore, I attempted to make this book readable by advanced high school students to convey a more realistic picture of how science actually works.

No effort to discuss science as it is actually done, even the science of alien biology, can avoid the sociological issues that surround real science. "Publish or perish" and "No bucks, no Buck Rogers" are just two aphorisms that describe the struggle by scientists to survive in the academic ecosystem; these are the academic equivalents of "survival of the fittest" in natural ecosystems.

While practicing scientists may not easily admit that their work is influenced by the need for publication or money, it is, and we weave this fact into the narrative as well.

Further, no discussion of scientific method can avoid the discrepancy between how scientists view their methods from the inside, and how philosophers and historians who study science view their methods from the outside. Many scientists think that progress in science involves "proving" propositions. Many scientists think that their hypotheses are "scientific" if (and only if) they are "falsifiable". Logicians and anthropologists of science know that these thoughts are incorrect as logical standards (as opposed to useful rules of thumb). Thus, even as we explain scientific methods to the educated layperson, we also hope to help our professional colleagues understand more of the logic behind science, especially in disciplines that are not their own.

The need to understand logic, process, and method in many fields is especially important in exobiology and astrobiology. Questions about life in the universe involve many disciplines, from astronomy, planetary science, and geology through biology (in all of its hyphenated forms, including evolutionary biology, cell biology, and molecular biology) to chemistry and physics. Those interested in aliens must worry about all of these. The challenge of multidisciplinary research arises only in part because different disciplines use different languages to describe different observations made using different instruments. Were it only so easy. The real problem presented by multidisciplinarity arises because different fields have different ways of judging evidence, different standards of proof, and different ways of professionally evaluating other scientists. Conferences degenerate into fights (with people throwing chairs) from disputes over these differences, not misunderstandings arising over what instruments were used to collect data.

In this regard, this OPUS book was particularly timely because I had just completed my service on a National Research Council panel that addressed the question: How would extraterrestrial life appear if it did not share common ancestry with the life that we know on Earth? John Baross, a noted microbiologist from Seattle, co-chaired the panel with me, and helped produce a book covering this topic. Even though that volume was targeted at a more professional audience than is this book, it gathered widespread interest, in the public as well as scientific communities. Carl Zimmer, a noted science writer, commented that exobiology could be a platform to teach almost any subject in science. A more popular book connecting life and the universe to the scientific method would, I thought, be useful if it did nothing more than explain what the National Research Council's report meant.

This book is also made timely by the new observations coming from the latest missions by NASA and the European Space Agency to Titan, comets, and Mars. These coincided with the selection of the non-profit foundation that I manage in Florida (the Foundation for Applied Molecular Evolution, www.ffame.org) to be a part of the NASA Astrobiology Institute. NASA pursues some of the most interesting scientific questions imaginable. This makes it easy to discuss scientific methods within NASA's "big questions", as well as in the context of the biggest question in biology: What is life as a universal concept?

Perhaps the most startling coincidence, however, was the decision by the John Templeton Foundation to consider adding chemistry to its collection of "big questions". The Templeton Foundation had previously emphasized "big questions" in physics (for example, do physical laws reflect something called an "anthropic principle"; is their precise nature required to permit humans to exist at all?) and human biology (such as the nature of consciousness). Dipping its toe into the community of chemists, the Templeton Foundation arranged a meeting in 2005 in Italy asking whether "water" was special, about a year before this OPUS award was made. The Templeton Foundation also funded a small project at the FfAME under the direction of Matthew Carrigan to explore the possibility that water might be uniquely suited to life. Charles Harper, who directs the Foundation's efforts, found chemists to be irascible. But it is clear that if the universe has a "Why?" it must have something to do with life, where chemistry cannot be avoided.

Most importantly, the OPUS opportunity came at a time when my diligent coworkers were producing a flood of results that was beyond remarkable in four fields that my laboratory had either helped to start, or had been active developing. Eric Gaucher and Michael Thomson had brought ancient proteins back to life that were last seen on the planet several billion years ago. This was far beyond my own optimistic expectations when we had set the new field of paleogenetics in motion two decades ago. Alonso Ricardo, Matthew Carrigan, Hyo-Joong Kim, and Heshan Illangkoon were resurrecting my faith in the potential of prebiotic chemistry to shed light on how life began. Daniel Hutter, Shuichi Hoshika, Nicole Leal, Fei Chen, Zunyi Yang, and Ryan Shaw were expanding the synthetic biology of nucleic acids, effectively producing a new chemical system that could support Darwinian evolution in the laboratory. Stephen Chamberlin and Ross Davis were breaking new ground in evolutionary bioinformatics, using new tools in genomics to understand still more deeply our history on planet Earth.

While this productivity gave me more than enough experimental data to synthesize, it also enhanced one possible downside of the OPUS format: The OPUS format encourages authors to emphasize their own work in their "grand syntheses". If this book has just one defect, it is an overemphasis of results from my own laboratory. I tried to mitigate this defect by integrating our work with the work of many others. This integration has been informal, as expected for a popular book on science. There are no lists of citations, although from time to time, I recommend additional reading. Nevertheless, very worthy work of many of my colleagues has been excluded, and I may have lost some friends as a consequence. To these, let me apologize at the outset.

Last, this book would not have achieved its balance between entertainment and education were it not for Jake Fuller. Jake was the political cartoonist at the Gainesville Sun before being downsized earlier this year. As you will see from cartoons throughout this book, Jake has talent. He could turn a cartoon idea that I suggested, often in just a sentence or two, into a work of art to illustrate the point that needed to be made. Enjoy.

I am also indebted to Faith Portier for copy editing, Stephen Chamberlin and Daniel Benner for reading and commenting on some of the chapters, and (as always) Romaine Hughes, who in addition to providing a steady hand on the administrative helm at the Foundation for Applied Molecular Evolution, helped track down copyrights for various images from the movies and popular culture that also add a dimension of entertainment to this effort.

Steven Benner
Foundation for Applied Molecular Evolution
Gainesville FL

Chapter 1
Life and the Scientific Method

The movie The Puppet Masters introduced America to a field of science known as exobiology. Adapted from a novel of the same name by Robert Heinlein (spoiler to follow), the movie featured an especially nasty species of aliens who glued themselves to the backs of their human victims, sent tentacles into their brains, and controlled them like, well, puppets. Julie Warner played Mary Sefton, a NASA exobiologist called to the spacecraft’s landing site. Sam Nivens, a government operative played by Eric Thal, asked Sefton about exobiology. The exchange went like this:

Nivens: So tell me Mary. What exactly do you do for NASA?
Sefton: My specialty is exobiology.
Nivens: Exobiology?
Sefton: Uh-huh. It’s a study of what alien life forms might be like.
Nivens: You actually make a living at that? Seems like it would be mostly guesswork.
Sefton: Well, we had a little joke in school. Ours is the only science that didn’t have a subject matter.

[Figure: The Puppet Masters (1994, Hollywood Pictures) describes a contest between the best minds on Earth (such as the planetary protection expert Andrew Nivens, played by Donald Sutherland, above) and a vicious species of aliens (napping below) in a battle that required the science of exobiology to win.]

To the American middle school student trained in the “scientific method”, this would be the end of the story about extraterrestrials, at least the part belonging in a science class. There, “the scientific method” is a prescription that begins with neutral observations of the world. Objective hypotheses then follow from those observations. Scientists test these hypotheses by deftly constructing experiments, preferably experiments that distinguish between alternative hypotheses.

This prescription pretty much rules out exobiology as a science. If no alien life is available to observe, how can we construct objective hypotheses about aliens by observing them? Even if we manage to construct hypotheses, how can we test them? Without observations, hypotheses, or tests, we have no scientific method. Therefore, no science of exobiology is possible. Only “guesswork”.

Yet the public is interested in questions like: Are we alone in the universe? As I write this, the NASA Phoenix laboratory is on the surface of Mars. The absence of reported results for just one week in June 2008 sent the internet into a real life episode of The X-files. Was NASA concealing Martian life that it had found, the blogosphere asked. “What do the Martians being concealed by NASA look like?” It went downhill from there.

In part, our fascination with aliens comes from our interest in other “big questions”. What is “life”? How did it arise? What is the future of our life in the cosmos? What activity other than science might effectively address such questions? Philosophers working without a scientific method have made less satisfactory progress on many of these questions than four centuries of science working with one. Thus, the public believes that “scientific” opinion is better than “non-scientific” opinion. A popular book by Thomas Kida subtitled “The six basic mistakes we make in thinking” exhorts us to “think like a scientist”. OK. Seems good. But how?

Understanding life as a “universal” (or, as philosophers like Carol Cleland at the University of Colorado say, as a “natural kind”) is a goal of research in my own laboratory. Accordingly, one goal of this book is to explain how that research is making progress towards answering these and other big questions.

A second goal is to show how “the scientific method” as taught in middle school is different from what real scientists actually do. Science often concerns things that are not observed. Observations are rarely neutral. Hypotheses are rarely objective. Proof is impossible for almost any interesting proposition. Disproof is also not easy. Experiments rarely distinguish alternative hypotheses. Thus, the real practice of science is very human, with weaknesses intrinsic to humans. In general, humans want to believe something. They then select from many observations only those that support that want. Like the dead people that Cole sees in the movie The Sixth Sense (1999, Hollywood Pictures), humans see only what they want to see.

The third goal of this book is to teach how scientists make progress despite this aspect of their humanity. Successful scientists develop within themselves an intellectual discipline that manages their intrinsic human propensity for self-deception. Scientists are “good” or “bad” depending on how well they use that discipline. Indeed, science itself can be defined as the human activity that has and uses such discipline to manage the tendency of humans to deceive themselves.

[Figure: Galileo Galilei (1564-1642) by Giusto Sustermans.]

This has a “good news” aspect. Given this discipline, science as performed by humans can also be more powerful than science that simply follows the recipe that we learned in middle school. Such science offers the opportunity for insight, intuition, and creativity, all positive features of the way humans think. These create the “leaps” in science.

Rather than write one book discussing what we might believe about alien life, another showing how scientists work, and a third covering the virtues of disciplined thinking, I thought that it would be fun to combine all three. We will explore how disciplined scientific methods (the plural is deliberate) tell us something about life as a universal (including aliens) as we explore how scientists actually work.

Science often concerns subject matter that is not observed

A syllogism connects propositions in a way that also connects their “truth values”. For example:

• All emeralds are green.
• X is an emerald.
• Thus, X is green.

or

• All emeralds are green.
• X is not green.
• Thus, X is not an emerald.

In both syllogisms, if the first two propositions are true, then the third must also be true. No reasoning is on firmer ground.

First, let us lay to rest the idea that scientists must directly observe their subject matter. We start with Galileo Galilei, the individual who, more than any other, ignited the development of modern scientific methods in the 1600’s.

In 1609, exactly 400 years ago, Galileo developed the telescope as a tool to observe celestial bodies. Why did he play around with telescopes? Perhaps he was having fun looking at the neighbors. Perhaps he saw military applications for a telescope. Regardless of his motivation, Galileo’s observations through his telescope led him to an interesting question: Does the Earth move around the Sun or does the Sun move around the Earth? Today, we prefer the first model for celestial mechanics over the second. In 1609, this was not the case.

How was Galileo to decide which model was correct 400 years ago? He could not observe the motion of the Earth directly; both of his feet were planted on terra firma. Therefore, Galileo did something else. He rolled balls of different weights down inclined planes and measured how fast they rolled. By measuring the times when the balls passed points on the slope, Galileo showed that the weights of the balls did not influence how fast they rolled. Or how fast they fell. Or when they were attached to a string, how fast the string swung backwards and forwards.

So what do rolling balls have to do with the motion of the Earth around the Sun? Nothing, it seems at first. To understand the connection, we first must recognize that Galileo was worried about a syllogism that disproved that the Earth moved around the Sun. The syllogism went like this:

Major premise: If the Earth were moving around the Sun, then we would sense the Earth’s motion.
Minor premise: We do not sense the Earth’s motion.
Conclusion: Therefore, the Earth is not moving around the Sun.

The logical force of this syllogism was not lost on Galileo. As with all syllogisms, if the major and minor premises are true, then the conclusion is true and the Earth must indeed not be moving around the Sun. To deny the conclusion that he wanted to deny, Galileo needed to deny the major premise. The minor premise was undeniable; we certainly do not sense the Earth’s motion around the Sun. Galileo used the fact that heavy balls and light balls roll downhill at the same speed (or more apocryphally, fall from the Tower of Pisa at the same rate) to deny the major premise of this syllogism. He used his experiments to assert:

Assertion: Even if the Earth were moving around the Sun, we would not sense the Earth’s motion.
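The logic of the syllogism above can be checked mechanically. The following is a minimal sketch, not anything taken from Galileo or from this book: it assumes the two propositions can be treated as simple true/false variables (here given the hypothetical names `moves` and `sensed`) and tries every combination to see whether the premises force the conclusion.

```python
from itertools import product

def is_valid(premises, conclusion):
    """An argument form is valid if no truth assignment makes every
    premise true while making the conclusion false."""
    for moves, sensed in product([True, False], repeat=2):
        if all(p(moves, sensed) for p in premises) and not conclusion(moves, sensed):
            return False
    return True

# moves  = "the Earth is moving around the Sun"
# sensed = "we sense the Earth's motion"
major = lambda moves, sensed: (not moves) or sensed   # if moves, then sensed
minor = lambda moves, sensed: not sensed              # we do not sense motion
conclusion = lambda moves, sensed: not moves          # the Earth is not moving

with_major = is_valid([major, minor], conclusion)     # both premises granted
without_major = is_valid([minor], conclusion)         # major premise denied
print(with_major, without_major)
```

With both premises granted, the conclusion cannot be escaped. Once the major premise is dropped, the minor premise alone no longer forces it, which is why Galileo aimed his experiments at the major premise.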

Here is how his argument went. The only way we can tell we are moving is if we pass by other things that are not moving. If the heavy Earth is speeding around the Sun and we are also, and if the air around us moves around the Sun together with us, then we would not sense our motion around the Sun. Everything is moving together. We are not passing by things that are not moving.

[Figure: A model of the inclined plane that Galileo used to show that balls of different mass accelerate at the same rate. Think of the Earth as a heavy ball, you as a lighter ball, and the air as the lightest ball. You are moving around the Sun. Since the only way you might detect your motion is by making reference to the heavier and lighter balls, you cannot tell you are moving.]

This logic allowed Galileo to connect observations he could make to the motion of Earth, which he could not observe.
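The inclined-plane result can be sketched with a little modern physics. This is an illustration under stated assumptions, not Galileo's own analysis: the balls are idealized as uniform solid spheres that roll without slipping, air resistance is ignored, and the function name `roll_time` and all of the numbers are invented for the example.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)

def roll_time(mass_kg, radius_m, length_m, angle_deg):
    """Time for a uniform solid ball to roll from rest, without slipping,
    down an incline of the given length and angle."""
    inertia = 0.4 * mass_kg * radius_m ** 2            # I = (2/5) m r^2
    # Newton's second law for translation plus rotation gives
    # a = g sin(theta) / (1 + I / (m r^2)); the mass cancels in the ratio.
    accel = G * math.sin(math.radians(angle_deg)) / (1 + inertia / (mass_kg * radius_m ** 2))
    # From rest, distance = (1/2) a t^2, so t = sqrt(2 * distance / a).
    return math.sqrt(2 * length_m / accel)

light = roll_time(mass_kg=0.1, radius_m=0.02, length_m=2.0, angle_deg=10)
heavy = roll_time(mass_kg=5.0, radius_m=0.02, length_m=2.0, angle_deg=10)
print(round(light, 3), round(heavy, 3))
```

The two printed times agree because the mass appears in both the force that pulls the ball down the slope and the inertia that resists it, so it drops out of the acceleration.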

So what makes us think that things having different masses (the Earth, us, and the air) all move at the same speed? Because balls roll downhill at the same speed regardless of their mass. The heaviest balls represented the Earth, lighter balls represented humans, and the lightest balls represented the air. All move at the same rate. Subtle, I know, but this was how Galileo is said to have used rolling balls to deny the major premise of the syllogism that ruled out motion of the Earth around the Sun.

Non-logical arguments may be more persuasive

Having trained in a middle school science classroom, we find Galileo’s argument compelling. Today, we understand that the Sun’s gravity is accelerating the Earth as it swings the planet around. But the Sun also accelerates us, the grass below our feet, and the air above us in the same way. Therefore, we do not sense the acceleration. General relativity provides a more advanced explanation. According to general relativity, the gravity of the Sun distorts space-time. We, the Earth, and its air are all traveling in a straight line through curved space. Hence we do not sense our motion.

This was not how Galileo tried to persuade his audiences. Nor could he have done so using such language without getting into more trouble than he actually did (Yeah, right, Galileo; space-time is warped). Galileo had a rhetorically better argument, however. As Galileo was developing the telescope, he looked at the planet Jupiter. He saw four dots next to Jupiter. You can see the same dots from your backyard with your own telescope. When Galileo observed Jupiter again a day later, the four dots had moved. In still later observations, Galileo saw only three dots. Galileo suggested that one dot had moved behind Jupiter. Still later, four dots were again observable. The four dots were orbiting Jupiter.

[Figure: The four moons of Jupiter as Galileo observed them, as you can see them today with a low power telescope. The planet Jupiter is at the center.]

[Figure: A collage of the Galilean moons of Jupiter as we observe them from NASA flyby missions (composite image).]

Galileo’s observations generated a hypothesis that these dots were four moons revolving around Jupiter. Further, the revolution of Jupiter’s moons around Jupiter looked like a model for the Earth orbiting the Sun. Jupiter allowed Galileo and his contemporaries to directly observe something that looked like a mini-solar system. Any Enlightenment gentleperson could, with a telescope, actually see the revolution of heavenly bodies around others. This had persuasive impact as a rhetorical argument.

Arguments by analogy and the persuasion of the public

Although no polls were taken in the 17th century, we suspect that more people were convinced that the Earth moved around the Sun by observing Jupiter’s moons revolving around Jupiter than by observing balls rolling down slopes. Any such conviction came outside of the logic that we teach to be a necessary part of scientific methods, however. No syllogism connects the motion of Jupiter’s moons around Jupiter with the motion of the Earth around the Sun. Instead, the argument (moons revolve around Jupiter, therefore the Earth revolves around the Sun) is an argument by analogy. We teach (as we should) that an argument by analogy is not scientific. Why? Because analogies can be found to support almost any argument, making analogies always available for us to deceive ourselves.

We may, of course, excuse this argument by analogy by saying that it was not intended for a scientist-to-scientist exchange, but rather to persuade lay people to arrive at a “correct” conclusion without their needing to understand the actual science. After all, while there is a connection between rolling balls and the physics of the Earth’s motion, that connection is obscure. Scientists might say: “The public cannot understand real science. So we may set aside rules of scientific argument required among ourselves to help the public arrive at the correct conclusion that we want them to arrive at.”

[Sidebar: Simplicius may have been modeled on Cesare Cremonini (1550-1631), a colleague in Padua who allegedly refused to look into a telescope. Unwillingness to inspect data in the fear that it might contradict one’s views is an antiscientific attitude inconsistent with the intellectual discipline required for successful science. But it is an intrinsic human trait.]

Galileo himself did not hold this view. Instead, he wrote a book, Dialogue Concerning the Two Chief World Systems, for the educated public. Here, he argued the Earth-centric and Sun-centric hypotheses in a discussion between a scientist (Salviati), an intelligent layman who was initially neutral (Sagredo), and Simplicius, who defended the Earth-centered model of the Solar System. Simplicius was actually the name of a sixth century philosopher, but this did not prevent the name (and its similarity to words like “simpleton”) from being taken as an insult to the authority of the Church. Some historians write that this impolitic choice got Galileo hauled before the Inquisition. Politics is also important in science.

Should scientists assume that the public cannot understand science? I do not believe so. Hence, this book. Further, many public policy issues, such as climate change on Earth and trans fats in our food, must be resolved using science. Public participation is needed to resolve these issues. If the public does not understand the science at some constructive level, then the public must rely on arguments from authority. In such arguments, the public must believe something on someone else’s say-so. As there are always authorities to be found to support any side of any dispute, the public must resolve disputes by selecting the authority rather than selecting the argument.

Arguments from authority are not scientific. They are better regarded as antiscientific, as they undermine science at its roots. The starting point for science is the presumption of the absence of authority, at least the human kind. In this way, arguments from authority are worse than arguments by analogy.

Setting aside rules of argument to “help” the public arrive at “the correct” conclusion carries a bigger risk: the corruption of scientists. Again, scientists are human. Further, they participate in human cultures that are easily driven by error, fads, and politics, as with all human cultures. The “correct” conclusion today is often not the “correct” conclusion tomorrow (or the correct conclusion, period). Therefore, in my view, it is better to try to find some way to explain somewhat complex science to the public than to resort to arguments from authority, even my own. Besides, science need not be so difficult to explain.

Galileo’s experiments did not distinguish alternative hypotheses

Even a willingness to tolerate arguments by analogy as expedients does not free us from more serious problems arising from the comparison between the Earth- and Sun-centered models in 1609. By rolling balls, Galileo created a way to deny the major premise of the syllogism that denied his hypothesis that the Earth revolved around the Sun. But nothing else. All that Galileo’s rolling balls showed is that we need not conclude that the Earth does not revolve around the Sun based on our inability to sense the Earth’s motion. This does not mean that the Earth does revolve around the Sun. Indeed, our inability to sense any motion of the Earth around the Sun is consistent with both the conclusion that the Earth does move around the Sun and the conclusion that the Earth does not move around the Sun. Remember our middle school science fair? Projects awarded blue ribbons distinguished alternative hypotheses. Galileo’s ball rolling project would not get that ribbon in a science fair today because it did not distinguish alternatives. Neither the observation of moons revolving around Jupiter nor our failure to feel the Earth’s motion around the Sun rules out the Earth-centered hypothesis.

We may refuse to have any belief about Solar System mechanics

So what do we do? We can, of course, escape the problem; we can choose to be agnostic on the subject. Because we are not compelled to believe either hypothesis, we may simply choose to believe neither. This approach is certainly pragmatic. We have no constructive need to believe that the Sun revolves around the Earth or the Earth the Sun. By “constructive”, we mean that as we go about our daily lives, we would not behave differently if we believed one or the other. As it makes no difference to our routine what we believe on this matter, why clutter our brains? The same thing might be said about alien life. Unless aliens arrive here and start making us into puppets, we really have no constructive need for any belief about extraterrestrials one way or the other. We may therefore choose to have none at all. We will not adopt this comfortably agnostic lifestyle in this book. First, aliens might some day arrive on Earth. We will then need a science previously having no subject matter to help us figure out how to prevent our all becoming puppets. I hope personally to be one of the exobiologists that NASA calls to greet the aliens upon first contact. Puppet-hood seems to be less likely for those people. Further, some might actually live their lives differently if they held a belief about the existence of alien life. Perhaps some will not choose to rob a bank or start a war, fearing alien retribution. It has happened before in the movies (The Day the Earth Stood Still, 1951, Twentieth Century Fox). I hope that my readers purchased this book because they are interested in big questions even if they have no immediate practical importance. Further, while you may not need to know about alien life, you will certainly need to have a constructive knowledge of scientific methods. A discussion of life in the universe is a fun place to develop that understanding.

Standards of proof are used to bring debates to an end

Let us decide to be constructively interested in life as a universal and how we might recognize alien life were we to encounter it. Constructive interest here is demonstrated by my having written and your having bought this book. We may ask: How might we come to believe any propositions about a kind of life that we cannot observe? Let us return to analogy, here between various scientific methods: How did we historically come to believe that the Earth moves around the Sun? Even though Jupiter and its moons offered an argument only by analogy and rolling balls failed to distinguish among alternative models for our Solar System, we (as a community) did eventually decide that the Sun-centric model was better. In part, our community’s conviction about the motion of the Earth came because of other observations made possible by Galileo’s telescope. The telescope also helped us observe mountains on our own Moon, for example. While mountains on the Moon do not drive any logical syllogism concerning the revolution of the Earth around the Sun, they do undermine any view of celestial bodies as objects different from what we might study on Earth. With a surface much like the Earth, the Moon seemed to be just another world. Another rhetorical argument.

Johannes Kepler (1610) did not begin his observations with the intention of showing that orbits of planets were elliptical. No, Kepler believed that the orbits of planets reflected the relative sizes of platonic solids (the cube, the octahedron, and the like). His own observations caused him to reject his belief. Accordingly, Kepler is cited as an example of how good scientists have sufficient intellectual discipline to discard their favorite hypothesis in the face of contradicting observation. This discipline is rare. Generally when observations contradict a favored hypothesis, humans discard the observations to protect their hypothesis.

These were not the only observations made by telescope that undermined a view that celestial objects were fundamentally different from objects that we might hold in our hands. For example, the telescope allowed Galileo to observe spots on the Sun, which thereby became another imperfect celestial body. Again, this observation drives no syllogism relevant to the revolution of the Earth around the Sun, but it provided another reason to doubt a broad cultural view of reality that might deny the possibility of such revolution. Ultimately, the acceptance of the Sun-centered model by the community came through additional observations of planetary motion made using telescopes. Many of these were obtained by the astronomer Johannes Kepler. Kepler made very precise observations of the positions of planets in the sky. He then used these observations to propose that the planets whose motions he could observe did not move around the Sun in perfect circles, but rather in ellipses, oval-shaped orbits that brought the planet closer to the Sun at some times than at others. Scientists did not stop here. Isaac Newton then developed mathematics that accounted for the elliptical orbits observed for planets other than Earth. His theory of gravitation was found to apply in laboratories on Earth, where bodies much smaller than planets were found to attract each other. Newton’s mathematics even provided a model-explanation for the behavior of Galileo’s rolling balls. These multiple observations, correlations and theories, some directly related to the movement of planets, others less so, and still others that were related only by analogy, together met a standard of proof acceptable within the community. The community came to believe that the preponderance of evidence placed the Sun at the center of the solar system (and not the reverse), and that it was time to move on to other things. This happened long before we were able to directly observe the Earth orbiting the Sun.
“Standard-of-proof” is a concept from law, not science. A standard is met when evidence favoring one view over another is sufficient to satisfy the community of interested people. In law, the standard-of-proof is defined by statute. Proof “beyond a reasonable doubt” is required to convict individuals of a felony. O. J. Simpson was not convicted of murder in criminal court under this standard. “Preponderance of evidence” is a weaker standard-of-proof used in civil court. O. J. Simpson lost his civil case to the Brown and Goldman families under this standard. In science, standards-of-proof are neither legislated nor dictated by authority. Instead, they develop as part of the culture of a scientific community. That development is poorly understood and does not follow clear rules. Because no authority stands above any field in science to dictate standards-of-proof, many arguments in science are arguments over what those standards should be. We will encounter many such arguments as we consider life as a universal and the search for aliens.

Peter Galison, a physicist, historian and film producer, asked in his book How Experiments End (1987) how a culture develops in a community that allows scientists to decide to stop doing experiments to test a theory long before the theory can be said to be “proven”.

What happens when a community has no accepted standard-of-proof?

The short answer to this question is: It depends. Thomas Kuhn, Michael Polanyi, and other historian-philosophers of science suggested that science comes in two forms (Kuhn, The Structure of Scientific Revolutions, 1962). The first is normal science. Here, scientists work within a paradigm, a framework of problem-solution examples that they learned as they were trained. They use this paradigm to solve new puzzles in analogous ways, applying tools and intellectual approaches normal in their craft. Funding is won, papers are written, and tenure is achieved. On occasion, however, scientists encounter observations that cannot be resolved by the processes of normal science. Instead, the observations are seen as anomalies; Kuhn speaks of the “crises” that these anomalies create. This creates a second kind of science, “revolutionary science”. In cartoon form, scientists in revolution realize that observations do not fit within their normal view of the world. They have no standards-of-proof to adjudicate disputes about those observations. The community lacks a paradigm to support further work. Therefore, according to this cartoon, scientists in the community flail. Wild and crazy things are tried. This leads to a “revolution”. A new idea emerges that resolves the anomalies. A new research paradigm is adopted, new standards-of-proof are accepted, and the community returns to normal science. More funding, publications, and favorable tenure decisions.

Science is never correctly viewed as “settled”

Sometimes individuals become convinced that an issue in science has been settled finally and for all time. We will frequently make the point that such convictions are anti-scientific; they undermine the intellectual discipline required for science, just as arguments from authority undermine science. This does not mean that individual scientists or scientific communities do not on occasion declare that “the science is settled”, “the debate is over”, and (more to the point) “I have won”. Again, scientists are human. Non-scientists who rely on science also make such declarations (witness recent statements that the causes of global climate change are “settled”). Such pronouncements are always problematic. Theories, interpretations, and even “facts” are always subject to re-examination. Those who deny this have abandoned a key tool to avoid self-deception, and are therefore doing inferior science. The science of celestial mechanics was revisited after Galileo’s position prevailed, at least in the language in which it had been framed. The debate over the Sun-centered model for the Solar System was essentially over by the 19th century (Galileo’s Dialogue was no longer prohibited by the Roman Catholic Church after 1835). This was also true for professional physicists given the elegance (a kind of rhetorical argument) and power of Newtonian mechanics. Physics and physicists went on to other things. As it turned out, the 19th century was too soon to end discussions of celestial mechanics. With the emergence of general relativity, famously proposed by Albert Einstein in 1915, it became clear that Newton was wrong in a fundamental way. The Newtonian framework had given us a good model for reality (the Earth moves in an approximately elliptical orbit around the Sun) but evidently not for the correct reasons.
While the observations made by Kepler remained correct to the accuracy with which they were made, they were not sufficiently accurate to detect the error in Newtonian mechanics as a theory. Any historian of science can adduce examples where “the science” once thought to have been settled turned out not to have been. Scientists may not want to daily revisit better-grounded propositions (we have hardly the time to do more pressing work). But the discipline associated with being a scientist requires a constructive belief that even the most settled proposition may need to be revisited. Here again, “constructive” means that we might actually act consistent with that belief.

How are challenges to “settled science” to be managed?

What scientific method should be applied when a community’s fundamental views are challenged, as they were by Galileo and as they will be throughout our discussion about the universal essence of life? Certainly, we cannot stop work to respond every time a crackpot decides to challenge what he failed to learn in middle school. Nor can we expect scientific communities composed of humans to be more liberal than other communities when orthodoxy is challenged; the standard human response is roughly: “Kill the heretic.” Scientists are supposed to apply their intellectual discipline to balance the fact that settled science may need revisiting with the need not to waste time responding to truly stupid challenges. One element of this discipline is knowledge of the primary data, the actual observations that underlie science. A disciplined scientist is able to answer the question: “So you believe that the Earth moves around the Sun. What primary data support your belief? What reasoning gets you from those data to your belief?” Likewise, disciplined scientists will deny themselves the luxury of an opinion (and the joy of stating that opinion at cocktail parties and to the press) if they do not know the primary data. If asked for an opinion when they do not know the primary data, a disciplined scientist will simply say: “I do not know.” We often ask students in science about primary data as we train them in scientific discipline. For example, everyone “knows” that water is “H2O”; a water molecule has two hydrogen atoms bound to one oxygen atom. Nevertheless, it is fair to ask a student in chemistry: “What primary data support this ‘knowledge’?” Entry into the fraternity of chemists requires that the student be able to answer such questions. This, in turn, requires another kind of intellectual discipline: a basic curiosity that finds interesting the history by which scientific facts were collected. Conversely, suppose this view is challenged by a crackpot who insists that water is H3O, claiming that water molecules are built from three atoms of hydrogen and one atom of oxygen. What do we do?
A true scientist is not allowed to dismiss this challenge out of hand or to ground that dismissal using an ad hominem attack (“The individual making the assertion is stupid.”). This is, after all, what the Inquisitors were tempted to say about Galileo (another argument by analogy). Instead, the response most appropriate to the intellectual discipline sought in science is: “Well now, let me see. Suppose water were H3O. What primary data must we have misunderstood? What else in our current view of reality would need to be revisited?” In this case, many things that we believe to be true (and where we have observations to support our belief) must be false for water to be usefully modeled as H3O. We might make a list of these and then decide whether it is worth taking time from teaching, research, and committee meetings to revisit those. If we decide to take that time, having both sides of a dialectic before us allows us to decide (as a jury) which side better meets a standard-of-proof.

Occam’s razor and “exceptional claims require exceptional proof”

This liberal approach to managing challenges to orthodoxy is rarely followed by humans, even scientists. Individuals who challenge orthodoxy rarely encounter a community that says: “How nice that you are attempting to overthrow our belief system. Let us see if your arguments have merit”. Part of the reason is that uneducated crackpots can waste hours of time. A simple cost-benefit analysis requires us to ignore most of the attacks against better supported propositions in our sciences. Further, those who do not want to be persuaded cannot be persuaded, no matter what the data. For example, many in the Inquisition felt that they had been insulted by Galileo’s (evident) ridicule of the Pope through the fictional character Simplicius. Insulted, they would refuse to accept the idea that the Earth revolved around the Sun regardless of the evidence presented. For example, Cesare Cremonini (1550-1631), Galileo’s colleague in Padua, famously refused to look into Galileo’s telescope to observe the moons of Jupiter (“Hey, who’s got the time?”). After all, Joshua had clearly commanded the Sun to cease its motion in the heavens. What would we have said to a 17th century audience to persuade them that our Sun-centered view was scientific while the Inquisition view was not, especially when alternative models involving epicycles (which had everything moving around the Earth in circles orbiting in circles) were also consistent with observation and our inability to sense Earth’s motion? For example, we teach students that Earth-centered models for the Solar System requiring epicycles were correctly dismissed in the 17th century because they are “more complicated” than a “simpler” Sun-centered model. Those who saw the movie Contact (1997, Warner Bros.) remember Ellie Arroway, played by Jodie Foster, an exobiologist determined to detect radio signals from aliens.
Arroway repeatedly mentioned something called Occam’s razor. Named after the 14th century philosopher William of Occam, this “razor” proposed to “shave” from models all but their essential features. Arroway insisted that all other things being equal, the simpler model must be correct. For her, it was simpler to believe that aliens existed and could be detected. The opposite view was held by David Drumlin, the director at the National Science Foundation who refused to fund her work. Many practicing scientists believe that Occam’s razor is a defining tool within the scientific method. It is not. In fact, simplicity is in the eyes of the beholder. Whether I think it is simpler to model the Earth moving around the Sun or simpler to model the Sun moving around the Earth depends on me and how I have been trained. Suppose I were a Jesuit in 1609, or William Jennings Bryan prosecuting John Scopes for teaching evolution in 1925. If I believe that the Earth revolves around the Sun, then logic having the force of syllogism requires that I also believe the Bible to be wrong when it records Joshua commanding the Sun to stop moving. The notion that the Bible is wrong creates (for me) all kinds of complexity. For example, accepting this notion requires me to revisit everything else that I believe from biblical authority, including how light came from the void, animals were created, and humans became sinful. Today, most modern individuals believe that the Bible might be less-than-literally true. Only for this reason, however, are we today prepared to regard a model that does not involve epicycles but does contradict the Bible to be simpler than a model that does involve epicycles but does not contradict the Bible.

Clarence Darrow (left), a lawyer, sits with William Jennings Bryan (right), a lawyer three times nominated for the US presidency, at the 1925 trial where John Scopes was found guilty of teaching Darwinian evolution. Bryan considered the Sun-centered model of celestial mechanics more complicated because it would require him to set aside a literal view of the Bible, which provided him simplicity in many other ways.
Darrow: The Bible says Joshua commanded the Sun to stand still for the purpose of lengthening the day, doesn’t it, and you believe it?
Bryan: I do.
Darrow: Do you believe at that time the entire Sun went around the Earth?
Bryan: No, I believe that the Earth goes around the Sun.
Complexity, for sure, but Bryan later held, Occam-like, that it was simpler to believe in the Bible:
Bryan: “…scientists differ from twenty-four millions to three hundred millions in their opinions as to how long ago life came here, I want them to come nearer together before they demand of me to give up my belief in the Bible”.
The same goes for the aphorism that “exceptional claims require exceptional proof”. A favorite of Carl Sagan, this aphorism is also not very useful for settling scientific disputes. The exceptionality of a claim depends on who is evaluating the claim, how that person was trained, and where that person finds his/her culture. It is not an intrinsic feature of the claim itself. Today, it is exceptional to argue that the Earth is stationary and the Sun revolves around it. In 1609, it was exceptional to argue the opposite. Even today in scientific and popular literature describing alien abductions (for example), individuals on both sides insist that the other side is making an exceptional claim, and therefore must meet an exceptional standard of proof. As science denies the existence of a human authority, no one is available to declare who is right.

Scientific propositions need not be disprovable

Philosophers who observe scientists generally understand the foregoing. Scientists themselves, however, often do not. For example, one belief tightly held by many scientists is the notion that scientific propositions are fundamentally different from non-scientific propositions because they are disprovable. This view, famously developed in the last century by Karl Popper, has much the same problems as the view that proof is a feature of science. Let us take a simple proposition that has the form of a scientific law: “All emeralds are green”. We may regard this proposition as “scientific” because we can conceive of an observation that contradicts it, forcing us to abandon the law no matter how much we wanted to believe it. We might observe an emerald that is not green. As “the scientific method” is taught, a single observation of a single non-green emerald would disprove the law, even if it had previously been supported by observations of thousands of green emeralds. But it turns out that whether or not an emerald is observed to be green depends on how it is observed and who is doing the observing. For example, an emerald may be observed to be red when observed under ultraviolet light. No problem, you say. The proposition can be modified to read: “All emeralds are green when examined under white light”. But even then, the proposition does not work if the examiner has red-green color blindness. We must further modify the proposition to read: “All emeralds are green when examined under white light by someone who does not have red-green color blindness”. This can go on for a very long time, and not unjustifiably, especially for propositions that are more interesting than those about green rocks.

Is the law: “All emeralds are green” scientific because it can be disproven by observing a non-green emerald? Observations come from experiments that have auxiliary assumptions. An emerald observed under white light is indeed green (top). The same emerald is red, however, when observed under ultraviolet light (below). To rescue the law from “disproof”, it must be amended ad hoc to read: “All emeralds are green when observed under white light”. Quine noted that we can make amendments of this type indefinitely, and not necessarily unjustifiably. This makes disproof almost as problematic in science as proof.

As philosophers Willard Van Orman Quine and Pierre Duhem pointed out, most interesting syllogisms connect their hypothesis to what is observable via many propositions. These are often called auxiliary propositions. When an observation is made that contradicts the conclusion of a syllogism, it is difficult to know which of these has been falsified. Is it the hypothesis itself or some other proposition in the chain of logic? For this reason, only bundles of propositions can actually be disproven by contradicting observation. Thus, observation of a non-green emerald disproves something: that all emeralds are green, that the observation was made under white light, or that the observation was made by someone without red-green color blindness. But it is not clear what exactly was disproven by the observation.
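The point about bundles can be made concrete with a small sketch. It is only an illustration, under the assumption that the emerald example involves exactly three propositions (the law plus two auxiliary propositions); the names `PROPOSITIONS`, `prediction_holds`, and `surviving_assignments` are invented here and do not come from Quine or Duhem. The sketch lists every combination of true and false propositions that remains possible after a non-green emerald is observed.

```python
from itertools import product

# Hypothetical labels for the law and its two auxiliary propositions.
PROPOSITIONS = (
    "all emeralds are green",
    "the emerald was viewed under white light",
    "the observer is not red-green color blind",
)

def prediction_holds(assignment):
    """The bundle predicts 'observed green' only if every proposition is true."""
    return all(assignment)

def surviving_assignments(observed_green):
    """Return the truth assignments still consistent with the observation."""
    survivors = []
    for assignment in product([True, False], repeat=len(PROPOSITIONS)):
        # A fully true bundle demands a green observation; anything else is open.
        if prediction_holds(assignment) and not observed_green:
            continue  # this assignment is contradicted by the observation
        survivors.append(assignment)
    return survivors

survivors = surviving_assignments(observed_green=False)
for assignment in survivors:
    false_ones = [name for name, value in zip(PROPOSITIONS, assignment) if not value]
    print("possibly false:", "; ".join(false_ones))
```

Of the eight possible combinations, the observation eliminates only the one in which all three propositions are true. The other seven survive, and in three of them the law “all emeralds are green” is still true.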

Willard Van Orman Quine argued that disprovability by experimental observation was a problematic criterion to distinguish scientific and non-scientific propositions. An observation that appears to falsify a proposition may simply be revealing a false auxiliary proposition that may not even be recognized by the individual who is attempting to use observation to do science.

Standards are specific for specific scientific cultures

Standards-of-proof and standards-of-disproof both depend on the cultures of specific scientific communities. These depend, in turn, on the training, beliefs and core assumptions of the individuals in those communities. Unfortunately, exobiology draws on data and models generated by many different communities having many different standards. Our view of life as a universal depends on data from astronomy and astrophysics that have their particular standards. It also depends on data from chemistry, a field with very different standards. It also depends on data from biology, with still different standards. As our discussion in this book proceeds, we will analyze the confusion that results as these communities and their cultures and standards clash. Scientists are, for the most part, rarely multidisciplinary in a constructive sense. This is because we have been trained in the culture of our own fields. We may not know the standards in other fields of science. We may not respect the standards from other fields. We may not have even been trained to ask the question: Do adjacent fields of science have different standards? This raises the question: What do we mean by “science” anyhow? I had the pleasure of being a Junior Fellow in the Harvard Society of Fellows in the 1980’s in the company of many excellent young scientists. One was Gary Belovsky, now a professor of biology at Notre Dame. He was interested in how animals search for food and how this search relates to population dynamics, species competition and nutrient cycling in the surrounding ecosystem. What he actually did for a living (in cartoon form) was travel around Montana chasing moose. Another was Lawrence Krauss, a cosmologist interested in the birth and death of the universe. Larry has just assumed leadership of the Origins program at Arizona State University. What Larry did for a living was sit in his office and work with equations. I was a chemist. I was interested in how the phenomenon of life could be explained by the interactions of chemical molecules. What I did all day looked much like what a chef does cooking in a restaurant kitchen. All of us called ourselves “scientists”. Yet what we did shared no more similarity than what is done by (for example) auto mechanics and symphony conductors. Why should we all have similar standards of proof and disproof? The problem of contrasting scientific cultures is not found only within multidisciplinary fields like exobiology. A similar problem exists even within single fields. Consider biology, for example. Organismic biologists (like Belovsky) share little by way of method with molecular biologists, even if both study some part of a moose. Belovsky runs with galloping moose. In contrast, a molecular biologist first shoots the moose, puts its pancreas (for example) into a blender, isolates a protein molecule from the resulting juice, and then determines its three-dimensional structure by interpreting Fourier maps (whatever these are) obtained by bouncing x-rays through its crystals. What do these have in common other than the moose?

Figure: Decisions biologists make

Essentially nothing. The different standards in the two biologies, molecular on one side and organismic on the other, create fights even within biology departments. Members on either side of the divide do not necessarily concede that the other is doing interesting science; I know molecular biologists who do not think that moose chasers are doing science at all. Thus, biology departments that cross the divide have a hard time agreeing on whom to hire, whom to tenure, or what to teach. Accordingly, many biology departments have split to become two departments. Harvard is an example. Consider the problem faced at Harvard as James Watson (of Watson-Crick DNA fame, but a former bird watcher) was trying to “modernize” (in his view) the department by adding more molecular biology. What was to be done about “old school” biologists who did not know much molecular biology and (in any case) thought that any branch of biology that begins by putting the living into a blender is not, well, “biology”? The faculty could not decide whom to appoint, whom to tenure, or whom to make Chairman. In the coffee houses and back rooms, each side made snide remarks about the other. The problem was resolved when the biology department split into two departments, one for Organismic and Evolutionary Biology, the other for Molecular and Cellular Biology. This is the direction opposite to the one that we must travel to develop a science of exobiology. When we ask about the essential nature of life or whether life is found on other planets, we must draw on practitioners from fields that have very different ways of evaluating scientific methods, including these. This creates strife that leads as often to ad hominem attacks as to reasoned arguments. We will point out some of these in the chapters that follow.

Sometimes, we must ignore disproof

But it gets worse. Sometimes, we must ignore “disproof” to get the right answer. For example, it is constructively believed today that the Earth was formed about 4.5 billion years ago and that life has been present on Earth for most of the time since it formed. This was not, however, a constructive belief in the 19th century as evolutionary theory was being developed by Charles Darwin, Alfred Russel Wallace, and other natural historians. These natural historians began to realize that hundreds of millions of years of history were going to be needed to explain the observed diversity of life under a model of gradual evolution. Accordingly, Darwin used an analysis of sedimentary rocks to propose that the Earth was at least 300 million years old. This proposal irritated William Thomson, a physicist who later became known as Lord Kelvin. Kelvin was a developer of the laws of thermodynamics, one of the most robust sets of laws that physics has ever produced. Kelvin’s contributions to physics were so significant that the absolute temperature scale is named for him.

We measure temperatures relative to absolute zero using “the Kelvin”, not “the degree Fahrenheit” or “the degree Celsius”. We do not even say “degrees Kelvin.” Starting in 1862, just three years after Darwin published the Origin of Species and continuing for 40 years, Kelvin used thermodynamics to argue that the Earth could not possibly be as old as evolutionists required. Why? Because the Sun around which the Earth revolved could not possibly be so old. Even if the Sun were made of the best coal possible, it could produce heat at its current rate for only a thousand years or so. To get more energy out of the Sun, Kelvin argued that meteors must continuously hit its surface, delivering their kinetic energy to the Sun. Even with this extra energy input, however, the Sun could have existed for no more than perhaps ten million years. Kelvin thus held that the laws of physics disproved the theory of Darwinian evolution. Here are two relevant syllogisms:

Major premise: If the Sun were made from high-grade coal, then the Sun could not have burnt for more than a million years.
Minor premise: The Sun is made from high-grade coal.
Conclusion: Therefore, the Sun could not have burnt for more than a million years.

Major premise: Darwin’s theory of evolution can account for the diversity of species only if the Earth, and therefore the Sun, existed for hundreds of millions of years.
Minor premise: The Sun has not existed for hundreds of millions of years.
Conclusion: Therefore Darwin’s theory of evolution cannot account for the diversity of species.
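The order of magnitude behind the coal premise can be checked with rough arithmetic. The sketch below is only an illustration, not Kelvin's own calculation: it assumes modern approximate values for the Sun's mass and power output and a typical heat of combustion for good coal, and it ignores the oxygen that burning would require. The function name and constants are introduced here for the example.

```python
# Rough check of how long a "coal Sun" could shine.
# All constants are approximate modern values, assumed for illustration.
SOLAR_MASS_KG = 1.989e30         # mass of the Sun
SOLAR_LUMINOSITY_W = 3.828e26    # energy radiated per second (joules/second)
COAL_ENERGY_J_PER_KG = 3.0e7     # assumed heat released by burning 1 kg of good coal
SECONDS_PER_YEAR = 3.156e7


def coal_sun_lifetime_years(mass_kg=SOLAR_MASS_KG,
                            energy_per_kg=COAL_ENERGY_J_PER_KG,
                            luminosity_w=SOLAR_LUMINOSITY_W):
    """Years a Sun-sized lump of coal could sustain the Sun's present output."""
    total_energy_j = mass_kg * energy_per_kg      # total chemical energy available
    lifetime_s = total_energy_j / luminosity_w    # seconds until the fuel is gone
    return lifetime_s / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"Coal Sun lifetime: about {coal_sun_lifetime_years():,.0f} years")
```

Under these assumptions the answer comes out at a few thousand years, many orders of magnitude short of the hundreds of millions of years that the natural historians needed.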

So whom are you going to believe? A physicist after whom temperature is named? Or a guy who studies bird beaks for a living (an argument from authority)? Even today, arguments based on physics using laws written in mathematics are often constructively believed to trump arguments based on biology, whose laws are often difficult to express mathematically. After all, isn’t physics a “harder” (= better) science than biology? Nothing supports an argument better than the assertion that it is supported by “computer modeling”. Today we know that the Sun generates its energy from nuclear fusion, not from burning coal or being hit by meteors. Fusion, like radioactive decay, involves the conversion of matter to energy under Einstein’s famous E = mc² equation. Kelvin did not know of this. Nor, however, did the bird beak natural historians who stubbornly continued to believe in evolution despite its having been “disproven” by a harder science.

Life and the Scientific Method

It turned out that Kelvin was wrong, not the bird beak guys. We say this confidently today because we know more physics. But think as someone living in the 1880’s. Had someone suggested that the geologists and biologists were correct and the physicists were incorrect because atoms fuse to give new atoms with a conversion of mass into energy, they would have been dismissed (Yeah. Right. Like that’s gonna happen). Just as someone today would be dismissed for suggesting that water is H3O. Historical examples such as this give serious scientists pause when someone says that a view has been “disproven”. Perhaps in the future a final “Theory of Everything” will emerge. Occasionally, physicists suggest that such a theory is just around the corner. Perhaps. But as we shall see with respect to life, as well as with our view of that part of the cosmos where life resides, discoveries continue to be made that, sometimes individually and nearly always collectively, change in fundamental ways what we constructively believe. Be prepared for this in the coming chapters.

Lord Kelvin (above) used thermodynamics to “disprove” the theory of evolution advanced by Charles Darwin (below). Kelvin argued that the Sun could not live long enough to offer Darwin the time that he needed for gradual change to account for the diversity of species. Only after radioactive decay and nuclear fusion were discovered could the geological and biological records be reconciled with what was “settled science” in physics.

Applying a criterion to exceptional hypotheses

As we apply scientific methods to exobiology, we will encounter hypotheses that we might not be inclined to accept, perhaps because we regard them as exceptional. Again, we must ask the question: What in our common experience must be revisited if the exceptional hypothesis is to be correct? What are the primary data? As one illustration, one exceptional hypothesis might be that the Sun revolves around the Earth. Does it? If we apply our liberal view of epistemology, the correct answer is: “We do not know, but if it does, then much else that we believe must be wrong. We would need to reinterpret experiments that measure the gravitational constant. We would need to re-think observations made by astronauts from space. We would need to redefine our concept of acceleration.” Upon reflection, it will be clear that so much would have to change in both common experience and scientific theory, some far removed from physics, that we would probably decide not to revisit this 17th century proposition, even if theories accounting for celestial mechanics remain active topics for research. An analogous response would be made to those who believe that the Earth was created by divine intervention 6000 years ago. Maybe. But if that is what happened, then much else in our view of the world must be wrong. We would need a new explanation for radioactive decay. We would need a new explanation for how the Sun gets its energy. We would be wrong about how we apply radioisotopes in medicine. Nearly every interpretation of the layers of rock that anyone can observe must be wrong. Again, so much would need to change that we would be inclined not to revisit this proposition even if models for Earth’s formation and the evolution of life on Earth remain active topics for research.

What must be revisited if life exists elsewhere? What about the proposition: Life exists elsewhere in the cosmos? For this to be true, what must be false in our 2009 view of the world? Absolutely nothing, as it turns out. We have, after all, life on Earth. Why not elsewhere? Planets like Earth almost certainly exist elsewhere. The chemistry of terran life is almost certainly universal. Life on Earth does not violate physical, chemical or biological law. Nor, then, would life elsewhere. This conclusion is remarkably robust with respect to our views in other areas. For example, the eminent physicist Paul Davies commented that the existence of alien life would demand a change in our theology. I disagree. Even if you do believe that the Earth emerged 6000 years ago, alien life generates no obvious problems to any major theology. After all, if God was personally involved with just two souls 6000 years ago (Adam and Eve), 100 million souls on Earth 2000 years ago, and six billion souls on today’s Earth, what is the problem with adding another ten billion souls on a planet circling a star somewhere in the vicinity of Betelgeuse? This is analogous to the problem that Joseph Smith pondered in upstate New York in 1827. Why did Jesus Christ visit the Old World when there was also a New World? The Book of Mormon offered a solution to the problem; Jesus visited both. And so, with nothing to preclude life elsewhere in the galaxy, we pick up our exobiology backpack and set out to see what inferences we might draw about subject matter that we cannot observe: Alien life.

Chapter 2 A Definition-Theory of Life

In 2002, I got a call from David Smith, a physicist who works for the National Academy of Sciences. "Steve," David asked, "the National Research Council has been commissioned by NASA to write a report on what alien life would look like. John Baross [a microbiologist at the University of Washington] said he would chair the committee, but only if you agreed to co-chair." Five years later, after defections, disease and delay, many hours on airplanes, and a separate trip to Washington, D.C. to sit with David to rewrite the entire draft, the report finally appeared in the summer of 2007. Entitled The Limits of Organic Life in Planetary Systems and published by the National Academy Press, the report provided an in-depth discussion of some of the topics presented here. Readers interested in a more technical and detailed discussion of exobiology than what is presented in this book are referred to the National Research Council report. One thing will not be found in the National Research Council report, however: A definition of life. This is no accident. Early in the committee's deliberations, a conscious decision was made not to include a definition of life in the report. Perhaps this reflected cowardice. It may, however, be better viewed as an expedient based on wisdom and experience. Nearly every member of the panel had spent hours in other committee meetings discussing that definition with little productive outcome, and did not want to spend more hours doing the same.

The book produced by the National Research Council committee co-chaired by John Baross and Steven Benner. The book is an excellent place to go for a deeper understanding of many topics covered here.

Why is it so hard to define "life"? Precision in language is a virtue, especially in science. Nevertheless, imprecision in language (which is intimately connected to imprecision in thought) has defeated many attempts to define "life". For example, Daniel Koshland, a distinguished professor of biochemistry at Berkeley and then president of the American Association for the Advancement of Science (which publishes the prestigious journal Science), recounted in 2002 his own experience with committee discussions attempting to define "life": "What is the definition of life? I remember a conference of the scientific elite that sought to answer that question. Is an enzyme alive? Is a virus alive? Is a cell alive? After many hours of launching promising balloons that defined life in a sentence, followed by equally conclusive punctures of these balloons, a solution seemed at hand: 'The ability to reproduce – that is the essential characteristic of life' said one statesman of science. Everyone nodded in agreement that the essentials of a life was the ability to reproduce, until one small voice was heard. 'Then one rabbit is dead. Two rabbits - a male and female - are alive but either one alone is dead.' At that point, we all became convinced that although everyone knows what life is, there is no simple definition of life." Immediately, one sees a problem with imprecise use of language: The "elite" have confused the concept of "being alive" with the concept of "life". This is not just confusing an adjective with a noun. Rather, it confuses a part of a system with the whole of a system. Parts of a living system might themselves be alive (a cell in our finger may be "alive"), but those parts need not be coextensive with the living system. Using language precisely, one rabbit may be alive, but it need not be a living system, and it need not be "life".


Other proposed definitions for life do not make such mistakes. For example, a panel was assembled in 1994 by NASA to consider the possibility of life in the cosmos. They discussed various definitions of life, following an extensive review on the topic by Carl Sagan. The committee eventually settled on a definition of life as a "self-sustaining chemical system capable of Darwinian evolution". By using the word "system", this definition recognizes that entities can be alive (including a cell, virus, or male rabbit), but may still not by themselves exemplify life. The phrase "self-sustaining" means that a living system does not require continuous intervention by a higher entity (a graduate student or a god, for example) to continue as "life". Within its meaning, the phrase "Darwinian evolution" carries 150 years of discussion and elaboration. Today the phrase makes specific reference to a process that involves a molecular system (DNA in terran life) that can be replicated imperfectly, and where mistakes arising from imperfect replication are themselves replicable. Thus, "Darwinian evolution" implies more than "reproduction", a trait that ranks high in many definitions of life. The requirement for reproduction with errors, where the errors are themselves reproducible, excludes a variety of non-living chemical systems that can reproduce. For example, a crystal of sodium chlorate can be powdered and used to seed the growth of other sodium chlorate crystals. Therefore, the crystal can reproduce. Features of the crystal, such as whether its atoms are arranged in left-handed spirals or right-handed spirals (in the language of chemists, the crystals are "chiral"), can be passed to its descendants. The replication is imperfect; a real crystal contains many defects. Indeed, to specify all of the defects in any real crystal would require an enormous amount of information, easily exceeding the 10 gigabits of information contained in the human genome. But the information in these defects is not itself inheritable. Therefore, the crystal of sodium chlorate cannot support Darwinian evolution, and a system of sodium chlorate crystals is not life.

The definition of life as a self-sustaining chemical system capable of Darwinian evolution also avoids other refutations by counterexample. These are often problems with "list definitions" that simply recite properties of terran life that we know (for example, "absorb compounds from their environment", "excrete waste", "grow", and the like). Fire is a convenient counterexample for many list definitions. Fire consumes "food", excretes "waste", metabolizes, moves and grows, but is not life. Fire as a chemical system misses a key element of the NASA definition: Fire is not capable of Darwinian evolution. Its growth may be imperfect, but those imperfections are not inheritable.
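The difference between copying errors that are inherited and copying errors that are not can be made concrete with a toy simulation. The sketch below is a deliberately simplified illustration, not a model of any real chemistry: each "individual" is a single number standing in for fitness, every copy acquires a small random error, and the fitter half of each generation is selected to reproduce. The function name, population size, and error size are all hypothetical choices made for the example.

```python
import random


def evolve(heritable, generations=200, pop_size=50, seed=0):
    """Return the mean trait value after repeated copying with selection.

    heritable=True : an offspring copies its parent, errors included
                     (errors are passed on, as with DNA).
    heritable=False: an offspring is copied from the unchanging template,
                     so a parent's errors are never passed on
                     (as with defects in a seeded crystal).
    """
    rng = random.Random(seed)
    template = 0.0
    population = [template] * pop_size
    for _ in range(generations):
        # Selection: the fitter half of the population become parents.
        parents = sorted(population, reverse=True)[: pop_size // 2]
        population = []
        for parent in parents:
            for _ in range(2):                    # two offspring per parent
                error = rng.gauss(0.0, 0.1)       # imperfect copying
                source = parent if heritable else template
                population.append(source + error)
    return sum(population) / len(population)


if __name__ == "__main__":
    print("errors inherited:    ", round(evolve(heritable=True), 2))
    print("errors not inherited:", round(evolve(heritable=False), 2))
```

In this toy setting, selection can accumulate improvements only when the errors are inherited. When each copy starts again from the same template, the population stays where it began no matter how strongly it is selected, which is the sense in which the crystal and the fire fail the definition.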


Combining thermodynamic, genetic, physiological, metabolic, and cellular definitions of life, Daniel Koshland coined the PICERAS definition of life, an acronym for "program, improvisation, compartmentalization, energy, regeneration, adaptability, and seclusion", and represented it as a temple with seven pillars. This is an example of a "shopping list" definition of "life".

Crystals of sodium chlorate (NaClO3, not to be confused with sodium chloride, NaCl) come in left-handed and right-handed forms, just like many biomolecules. In this image made with polarized light, the left-handed crystals are brown, and the right-handed crystals are white. A crystal can be powdered. Each speck in the powder will seed the growth of a new NaClO3 crystal. Thus, the NaClO3 crystals can reproduce, with the descendants having the parents' handedness. But the information in crystal defects cannot be passed to the progeny that they seed for the next generation. Hence, systems built from NaClO3 do not have access to Darwinian evolution, and therefore are not "life".

Definitions embody a theory of what is defined

Perhaps the most important feature of the NASA definition of life is the way that it provides information about what kinds of life its framers believed are possible. It captures, in fact, a theory of life, something that is necessary for any definition. A subtle point, perhaps, but one that we will make throughout this book. A definition is inseparable from the theory that gives it meaning.

Let us pursue further the notion that our definition of life is intricately connected with a theory of life. First, our definition excludes certain systems that are conceivable as forms of life simply because the community does not think they are possible. Forms of life that are not chemical systems capable of Darwinian evolution are easily conceived. For example, the crew of Star Trek has encountered conceptual aliens that do not fit the NASA definition. The nanites that infected the computer of the next generation Enterprise in Episode 50 ("Evolution") are informational (not chemical); their evolution is not tied to an informational chemical, like DNA (although they require a chemical matrix to survive). The Crystalline Entity of Episodes 18 ("Home Soil") and 104 ("Silicon Avatar") appears to be chemical, but not obviously Darwinian. The Calamarain (Episode 51: "Déjà Q") are made of pure energy, not chemicals. And Q (Episode 1: "Encounter at Farpoint") appears to be neither matter nor energy, flitting instead in and out of the Continuum without the apparent need of either.

Others have conceived of types of life that do not fit the definition. The physicist Freeman Dyson suggested that a form of life might be possible that reproduces without replication, has a compositional genome, and adapts without Darwinian evolution. Fred Hoyle developed a story about a black cloud, a fictional entity that floated into our solar system from the cosmos and blocked our sunlight, placing the Earth in distress. After the black cloud realized that the Earth held self-aware forms of life, it politely moved out of the way and apologized to us.

While many of the conceptual aliens from science fiction look like Hollywood actors with prostheses (such as the Ferengi above), others are not obviously chemical systems, such as the Calamarain (below, a pure energy life form) or apparently require neither matter nor energy, such as Q (bottom, talking with the android Data), who flits in and out of the continuum. We do not include these in our definition of life because our theory of life does not consider them possible.

If we were to encounter Q, the Calamarain, or any of these other conjectural entities during a real, not conceptual, trek through the stars, we would be forced to concede that they do represent living systems. We would also agree that they do not fall within the NASA definition of life, and would agree that we must get a new definition for life. Similarly, if a black cloud floats into our solar system, and begins to talk to us, we will certainly reject that definition. We do not reject our definition now because we do not constructively believe that any of this is possible.

We will use the word "constructive" to describe beliefs often in this book. A constructive belief is one that you might actually act upon. For example, many people say that they believe that global warming will raise sea levels in a few years. Those who constructively believe this do not buy condominiums in Miami Beach. Those who do buy condominiums there do not constructively believe that the sea level will rise.

According to our definition-theory of life, nanites and androids (Data of Star Trek, Marvin of Hitchhiker's Guide to the Galaxy) are examples of artificial life. We do not doubt that Darwinian evolution can be simulated within a computer. We do not doubt that androids can be created, including androids (such as Data) who wish to be human.
We do not, however, believe that either computers, their viruses, or Data could have arisen spontaneously, without a creator that had already emerged by Darwinian process (as is indeed the case with the android Data). Instead, our definition-theory of life regards these as biosignatures (evidence that true life exists or existed), not life itself. Likewise, no matter how intelligent they are, this intelligence is artificial. Following similar reasoning, the computer in which nanites reside would be taken as evidence that a life form existed to create it. The computer would be a biosignature, and the nanites would be an artificial life form, something that required a natural life form to create. Any intelligence they displayed would be artificial. The nanites as well as the computer are again biosignatures, built by a self-sustaining chemical entity capable of Darwinian evolution.

And what about humankind? Certainly, Darwinian theory requires that we got our present forms via a process where natural selection was superimposed upon random variation. By "random", however, Darwinian theory requires only that the variation not be "prospective". Whatever genetic mutations we pass on to our children in the DNA that we give them, whichever help those children survive, get married, and have their own children, and however they help them do so, those mutations may not anticipate their ability to make our children fitter.

However, a few million years ago, our ancestors learned to make tools. They then learned to convey information on how to make tools not in the DNA that they passed to their children, but instead through education. Babies and adults learned to point, and to look where the pointing directed them to look. Some tool discovery was undoubtedly Darwinian. Here, our ancestors tried tools randomly; those who tried the correct tools survived, while those who tried the incorrect tools died. But at least some tool development must have been prospective, with variations not tried randomly, but rather with foreknowledge of the outcome. Thus, some tool making has not been Darwinian. Certainly, modern engineers perceive a problem and try things that are more likely to be a solution, not things that are less likely. What works is passed via culture to the next generation.
What does not work is not. The culture evolves, as does its fitness. This is non-Darwinian evolution. Rather, it may be (somewhat imprecisely) called "Lamarckian" evolution, in that there is a direct feedback from what will work into what is passed on to the next generation, by teaching in middle school.

Homo habilis, whose tool making set in motion the possibility of a life form that does not rely on Darwinian evolution, but rather transmits fitness from generation to generation by culture, deliberately invents fitness, and may, in a few more generations, deliver fit DNA directly to germ cells that lead to children. This chap lived in the Olduvai Gorge in Tanzania about 1.8 million years ago (abbreviated 1.8 Ma).

But the inheritance need not remain cultural. For example, it may soon be possible to identify DNA sequences that will help our children survive, get married, and have their own children. We may soon gain the technology that allows our pediatrician to place those DNA sequences into our eggs and sperm, prospectively creating mutations that will improve the fitness of the species. If this happens, then the process that began with tool-making Homo habilis will allow us to escape Darwinian mechanisms, for our genes as well as for our culture.

There is a good news-bad news narrative that relates to this. The good news is that we may not forever in the future need to see our children die of genetic disease, the only mechanism available to Darwinian evolution for removing the mistakes that inevitably arise when copying DNA. The bad news, of course, is that we do not know at present how to beneficially change the sequence of the DNA molecules in our children, and may not be smart enough to learn before the technology causes us to kill ourselves off.

But what about our definition-theory of life, which so prominently features "Darwinian evolution"? Must we not also modify that definition-theory to read: "A self-sustaining chemical system capable of Darwinian or Lamarckian evolution"? Here, we cannot escape by saying that we do not believe that this type of "Lamarckian evolution" is possible, as we clearly believe that it is. We can, of course, fall back on the fact that even as we are happily becoming cerebral beings by prospectively altering our personal DNA (pick your episode of Star Trek or The Twilight Zone), we still are capable of Darwinian evolution. Last, we might argue that, like an intelligent android, we could not have come into being had our ancestors not first had access to Darwinian evolution. We will leave aside for this book the question of whether a definition of life as a "natural kind" can have a historical component.

Will our definition-theory be useful?

The purpose of asserting a definition-theory of life is to allow the rest of this book to engage the question of life as a universal head-on. We have adopted two propositions: (a) Every system that we will encounter that produces the behaviors that we require from life will prove to be a chemical system capable of Darwinian evolution. (b) Any chemical system that is capable of Darwinian evolution is capable of producing all of the behaviors that we require from life.
These propositions reflect the possibility that, at some time in the future, we may find that life has a range of properties larger than its range as we now understand it, in particular those that permit it to occupy places in the cosmos where the terran life that we know could not survive. For example, we do not know today of a life form that lives at 300 °C (572 °F or, if you want to talk like a scientist, 573 Kelvin) in sulfuric acid (as in the clouds above Venus) or in liquid methane at -179 °C (-290 °F, 94 Kelvin) as in the oceans of Titan. Never mind. Our definition-theory insists that if life exists in those environments, it will be a chemical system capable of Darwinian evolution. The virtue of committing to a boldly stated definition-theory is that it focuses any subsequent efforts to answer universal questions, even if they turn out to be wrong. As we shall see, many approaches constrain our concept of what life as a universal might look like, once we have committed ourselves to this particular definition-theory.
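The temperature conversions for the Venus and Titan examples can be checked with a short sketch. The two helper functions below are introduced here for illustration only; they apply the standard relations between the Celsius, Fahrenheit, and Kelvin scales.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a Celsius temperature to degrees Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to the Kelvin scale."""
    return celsius + 273.15


if __name__ == "__main__":
    for label, celsius in [("Venus clouds", 300.0), ("Titan methane", -179.0)]:
        fahrenheit = celsius_to_fahrenheit(celsius)
        kelvin = celsius_to_kelvin(celsius)
        print(f"{label}: {celsius:.0f} C = {fahrenheit:.0f} F = {kelvin:.0f} K")
```

Rounded to whole numbers, 300 °C corresponds to 572 °F and 573 Kelvin, and -179 °C corresponds to -290 °F and 94 Kelvin.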

If life exists in the hot acidic clouds above Venus (above) or the cold methane oceans of Saturn's moon Titan (below), the definition-theory of life holds that it will be a self-sustaining chemical system that came to be via natural selection superimposed on random variation in its genetic structure.


Chapter 3 Four Approaches to Understanding Life The Galileo quandary with respect to life as a universal Throughout this book, we will use the definition-theory that considers life to be a self-sustaining chemical system capable of Darwinian evolution. Unfortunately, this definition-theory creates a quandary analogous to the one faced by Galileo. We are considering life as a universal, a "natural kind" of thing. But this includes life that we have not observed, may not observe for some time, and (for most cases in the cosmos, we presume) will never observe. So how can we be certain that we have chosen the "correct" definition-theory, or at least one that is useful? Could our definition-theory be used to recognize alien life should we encounter it, even if it does not plaster itself to our backs to control us like puppets? First, don't panic. Science often concerns what it cannot directly observe. Next, we need to identify experimental approaches that serve the same role for exobiology as the rolling balls did for Galileo's studies of the solar system. Even though we cannot observe life universally, we need to do experiments here on Earth to help us decide whether we have chosen a good definition-theory for life with the potential for universality. The next chapters outline some of these experiments and methods, and tell the stories of how their pursuit created (and continues to create) new scientific methods. No bucks, no Buck Rogers First, a lesson about one factor that determines the direction of science. The lesson comes from Tom Wolfe's book, The Right Stuff, which includes the following exchange with test pilots in the 1940's: Operative: You know what really makes your rocket ships go up? Pilot: The aerodynamics alone are so complicated … Operative: Funding. That's what makes your ships go up. No bucks, no Buck Rogers. Whoever gets the funding gets the technology. Whoever gets the technology, stays on top. 
One factor driving science is not taught in middle school: the process that decides how to direct resources to fund science. Different organizations that offer such resources are influenced by the community in different ways. Some agencies and private foundations have specific mission statements, chosen with community input, and select research to meet these. Others claim to seek individual innovators, searching for "pioneers" or granting "genius prizes". Some propose to grant prizes after specific goals are met (like Lindbergh's crossing the Atlantic), with community input determining goals worthy of prizes. Some make decisions internally from their own panels of advisors, who come from the scientific communities. Others distribute proposals to individuals in the community and base funding decisions on "peer review". The sociology associated with funding science needs a book to describe. We will not address this topic except when funding decisions drove the science that interests us. As a general rule, however, community-guided efforts do not fund "big questions" or breakthrough research. Nor are they expected to. Galileo was not funded by the pope, and certainly not after the pope understood what Galileo was up to.

Historians have documented many examples showing that funding helps science, of course. A sampling bias distorts any general conclusions, however, about this help. The absence of funding for a project generally means that a project will not be done. If the project is not done, then we will not hear of it. Not having heard of it, we would not know how much better the global outcome would have been had the rejected project been funded in place of one that was. This is a common problem in history generally. We know well of Martin Luther and his protestant reformation. We know little of Jan Hus and his protestant reformation. Why is this so? Most simply, the political environment surrounding Luther allowed him to survive the Inquisition, just as Galileo's fame allowed him to survive. No analogous political environment surrounded Hus. Hus was burned at the stake, so little is heard of him (outside of the Czech Republic). As they say, the victors write the history. Analogously, the National Institutes of Health (NIH) often points to the successful research that it has funded and concludes that its procedures for identifying research projects must be successful. There is no doubt that the NIH funds some fine science, sometimes over the objection of its peer reviewers. For example, a stir was created when Mario Capecchi, winner of the 2007 Nobel Prize in Physiology or Medicine, mentioned that NIH peer review had told him not to do the research that won him the Nobel Prize. As we do not know about the projects that, if only they had been given bucks, would have produced the biomedical equivalent of Buck Rogers, such conclusions can always be disputed.

Sometimes crackpot research gets funding

Sometimes, however, innovators with "crackpot" ideas (like Galileo or Capecchi) manage to get the resources that they need to pursue their idea to success. Perhaps the archetypal example from modern science is Peter Mitchell.
As Mitchell's work will be relevant for many of the chapters to come, let us describe it in cartoon form. A compound called adenosine triphosphate (ATP) attracted Mitchell's interest in 1961, as it had his community in general. ATP is a "high energy" compound used throughout biology. It was known that life on Earth made ATP by oxidizing food using oxygen from the air and releasing carbon dioxide. Everyone understood that oxygen was used in a small structure inside of each of our cells called the mitochondrion. The community knew that reacting food (organic material) with oxygen yields energy. After all, it had been known for 200 years that fire was the reaction of oxygen with food (in the case of wood, food for termites). But how could controlled burning of organic carbon create ATP inside mitochondria? This was a puzzle. Certainly, little ATP is made by setting fire to wood.

Peter Mitchell proposed that ATP is made from ADP (upper left) by importing hydrogen ions (H+) through a membrane (blue circles). The ions get outside of the cell when food is burned in the citric acid cycle, using oxygen (O2) and making NADH. We will see ATP and NADH in Chapters 4, 5, and 6.

To answer that question, Mitchell conceived of something called the "chemi-osmotic hypothesis". That hypothesis held that the controlled burning of food did not make ATP directly. Rather, Mitchell argued, the metabolism that oxidized food pushed hydrogen ions (written as H+) from inside the mitochondrion "matrix" to a space outside. This meant that the concentration of hydrogen ions was higher outside the matrix than inside the matrix. Mitchell noted that a system that has more hydrogen ions on one side of a barrier than on the other side is not at equilibrium. Whenever a system is not at equilibrium, then it holds something that scientists call free energy. Free energy can do work. Mitchell argued that the difference in the concentration of hydrogen ions inside of the matrix and outside of the matrix held free energy, arising from the fact that H+ wants to move from outside of the mitochondrion to inside. Mitchell proposed that the work done by this concentration difference was to make ATP.

Mitchell's hypothesis was wildly original. At the time that it was proposed, no one had a clue as to how one could make a high energy molecule (ATP) from a situation where more hydrogen ions were in one location than another. The hypothesis, however, also turned out to be true. Mitchell was awarded the Nobel Prize in Chemistry in 1978 for this hypothesis. John Walker was awarded the Nobel Prize in Chemistry in 1997 for his studies on the protein that makes ATP using the free energy in this gradient. So this is important research, under a syllogism that is widely accepted in popular culture: If a piece of research was recognized by a Nobel Prize, then it was important. Mitchell's hypothesis was recognized by a Nobel Prize. Therefore, Mitchell's hypothesis was important.
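The idea that a difference in hydrogen ion concentration holds free energy can be put into numbers. The sketch below is only an illustration under stated assumptions: it uses the standard expression for the free energy of moving a dissolved species between two concentrations, it keeps only the concentration term (a real mitochondrial membrane also carries an electrical potential, which is ignored here), and the concentrations and temperature in the usage example are hypothetical values, not measurements from Mitchell's work.

```python
import math

GAS_CONSTANT = 8.314  # joules per mole per kelvin


def gradient_free_energy_kj_per_mol(conc_outside, conc_inside,
                                    temperature_k=310.0):
    """Free energy change (kJ/mol) for moving H+ from outside to inside.

    Only the concentration term R*T*ln(inside/outside) is included;
    the membrane's electrical potential is deliberately left out.
    A negative result means the ions move inward spontaneously
    and can, in principle, do work.
    """
    ratio = conc_inside / conc_outside
    return GAS_CONSTANT * temperature_k * math.log(ratio) / 1000.0


if __name__ == "__main__":
    # Hypothetical case: ten times more H+ outside the matrix than inside.
    dg = gradient_free_energy_kj_per_mol(conc_outside=1e-7, conc_inside=1e-8)
    print(f"Free energy change: {dg:.1f} kJ per mole of H+")
```

With equal concentrations on both sides the function returns zero, which is the equilibrium case described above: no difference, no free energy, no work.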

We would therefore have expected that research to develop the chemi-osmotic hypothesis would have been funded. However, work to explore the hypothesis was not funded. The hypothesis was viewed by most of the community as "crackpot". Mitchell himself was similarly viewed by some. Not at the level of Galilean heresy perhaps, but scientific communities also have orthodoxies, and Mitchell certainly offended those. As a consequence, articles supporting the chemi-osmotic hypothesis were rejected by journals. Like many revolutionary ideas, Mitchell's chemi-osmotic hypothesis appeared to be headed towards the junkyard of ideas that were too good for their time.

Fortunately, Mitchell was an individual "of means". He and Jennifer Moyle set up a non-profit research foundation (Glynn Research Ltd.) to support research into Mitchell's own hypothesis with an initial endowment of £250,000. This sum, valued at about $2.5 million today, was contributed equally by Mitchell and his older brother. In other words, Mitchell got the "bucks" that he needed from his own pocket. When Mitchell had something worth saying, he would self-publish a book to say it. This avoided his having to fight with peer reviewers from the community. Two notable books by Mitchell were published in 1966 and 1968. And so the chemi-osmotic hypothesis survived, to civilization's benefit.

The lesson to be learned is analogous to the lesson from an ancient Greek fable: "If your child's first lesson is obedience, then his second lesson will be whatever you choose." My lesson to students is analogous: If your first research project is valuable, your second can be what you choose.

[Figure: chart of percentage of time (0 to 100) against age of academic scientist (20 to 70), with wedges labeled research, teaching, committees, and lunches & dinners. Time is allocated for an academic scientist differently as the scientist ages. When we are young, time spent doing research is gradually lost to teaching and committee meetings, as famously noted by Albert Einstein. At age 70, we spend all of our time eating lunches and dinners. Notice the absence of a wedge labeled "get rich".]

The prescription (first get rich, then do science) is a classical approach to science. Many scientists that we will meet in this book were English country gentlemen who did this (generally by inheriting money). But in the modern academy, this prescription is not easy to fill. It generally takes a lifetime to build a career in science, especially in a university, leaving little time to get rich. Conversely, it generally takes a lifetime to get rich, leaving little time to do research.

Throughout this book, look for evidence that funding decisions guided the selection of research topics and the execution of research. The impact does not come (primarily) from researchers who have a company selling vitamin supplements on the side, and therefore do research to show that vitamin supplements are good. Rather, the more pervasive impact comes as scientists select and pitch research to their respective communities to raise funding. This has a mediocritizing impact on science, in a good news-bad news way. First, it removes the worst research (good news). But it also removes the most innovative research (bad news). Left behind is research of reasonable quality. Neither good news nor bad news, but (as Walter Cronkite would say), that's the way it is.

To Switzerland and back

For me personally, the story began in 1984, when I was setting up my laboratory as an Assistant Professor in the Chemistry Department at Harvard. My research group and I had expansive ideas; we very much wanted to begin research to connect the chemistry of life to the organism, the ecosystem, the planet, and the cosmos. The first book proposed from my laboratory in 1987 had the title Redesigning Life. But this brought us squarely into contact with the problem of bucks.

In 1984, as my laboratory first became operational, it got some funding from the NIH to extend work that I had done as a student. I used that funding to start two fields. The first, now called synthetic biology (Chapter 7), began in my laboratory in 1984 when Krishnan Nambiar and Joseph Stackhouse completed the first chemical syntheses of a gene for an enzyme. It built on work extending back into the 1970s developing tools for doing DNA synthesis and recombinant DNA biotechnology. The second field is now called paleogenetics.
This field resurrects ancient proteins from extinct organisms to test historical hypotheses, and is developed in Chapter 4. Together, these fields were to lead to new diagnostics and human therapeutic strategies, assisting today in the care of some 400,000 patients annually, just from my laboratory. Equally impressive results come via work in these fields from other laboratories.

But when we happily told the NIH that we had used their bucks to innovate, the NIH did what we now understand was inevitable: They threatened to end our funding. In what was the most significant miracle of my professional life, matched only in my personal life by my wife agreeing to marry me, just as my laboratory was about to be shut down, I got a call from Switzerland. The Swiss Federal Institute of Technology wanted to hire me in Zurich and fund work in my laboratory there. They were interested in the biggest research ideas that I could generate. All I had to do was learn to teach in German. I accepted the offer, bundled up my laboratory, and moved to Switzerland.

[Figure: The research team at the Swiss Federal Institute of Technology (ETH) in Zurich that started research in experimental paleogenetics as described in Chapter 4 (Scott Presnell, Arthur Glasfeld, Elmar Weinhold, Gerald McGeehan), paleogenomics (Andrew Ellington, David Berkowitz), and synthetic biology as described in Chapter 7 (Joseph Piccirilli, Rudolf Allemann, K. Christian Schneider, Lawrence MacPherson, Simon Moroney, Norbert Heeb, Tilman Krauch).]

The move to Switzerland gave my laboratory the opportunity to explore four approaches to understanding life. These approaches, illustrated in Figure 3.1, provide organization for this book.

Figure 3.1. Four approaches to understanding life as a universal

Life is represented as a black box at the center of four experimental approaches that generate observations hinting at life's universal properties, analogous to how Galileo's rolling balls hinted at the structure of the Solar System. The approach illustrated by the bottom wedge uses paleogenetics, where molecular structures from ancient life are inferred from the structures of their descendants, and then resurrected in the laboratory to test historical hypotheses about life. The top wedge illustrates the prebiotic "forward in time" approach. Here, molecules likely to have been present on early Earth are examined in the laboratory for their potential to generate life. The left wedge represents exploration, which hopes to encounter alien life that may tell us something about life universally. The right wedge represents synthetic biology, which makes the challenge: If life is nothing more than a chemical system capable of Darwinian evolution, then we should be able to actually make artificial life in the laboratory. None of these approaches directly addresses the question: What is life? Instead, they help us develop models about a universal life that we cannot directly observe.

Working backwards in time

The first approach that we will develop is represented by the bottom wedge in Figure 3.1. This approach starts with the life that we know on Earth, and works backwards in time. Terran life is, without a doubt, a chemical system capable of Darwinian evolution. Combining an analysis of terran life at a molecular level with Darwinian and historical perspectives should therefore deliver a better understanding of the interaction between chemistry and Darwinian evolution, at least in the life that we know. This understanding might be applied universally. Further, if we follow the history of terran life far enough back in time, it might inform us about life that was more "primitive" than the life that we observe today on Earth. More primitive life might reveal the "universals" of life better than modern life, where those universals are buried beneath remnants of four billion years of historical accidents.

This "backwards-in-time" approach will be discussed in Chapter 4. It is the most advanced of the approaches that have been developed recently to understand life. In particular, it is the approach that has generated a new scientific community that has an emerging consensus about scientific methods. While disputes still exist, the community has generated a general view about standards-of-proof which, in turn, enable "normal science".

This approach has also supported the development of a new experimental field called "paleogenetics". In a paleogenetic experiment, biotechnology is used to bring ancient genes and proteins back to life for study in the laboratory. These molecular resurrections allow the experimental method to be brought to bear on historical hypotheses. Before paleogenetics became experimental, many had constructively believed historical hypotheses to be intrinsically untestable. Paleogenetics experiments show this belief to be incorrect.

Last, the approach has been combined with whole genome sequencing to develop a new field called "paleogenomics". Here, entire genomes are compared to permit us to understand the evolution of organism-wide behavior. Paleogenomics is beginning to allow us to understand human diseases, some quite complicated, such as alcoholism, hypertension and diabetes.

In short, Chapter 4 offers a success story in the development of scientific methods. In less than 40 years, the field has progressed from a few simple ideas to a community doing normal research in many areas. Reviewing these successes will allow us to see how scientific methods develop when things go well. Further, the backwards-in-time approach has provided insight into the chemical structures of very ancient life quite different from that observable today on Earth.
In particular, the approach has driven the hypothesis that an early form of life on Earth used no proteins, but rather used ribonucleic acid (RNA) to play both genetic and catalytic roles. Four billion years ago, according to this model, Earth was an "RNA world". Thus, the backwards-in-time approach provides insight into life as a general phenomenon that allows us to consider in a substantive way potential structures for alien life.

[Figure: evolutionary tree with branch points labeled a to j, drawn against a horizontal axis of million years before present (approximate), from 50 to 0. Leaves: river buffalo, swamp buffalo, ox, eland, nilgai, impala, Thompson's gazelle, bridled gnu, topi, goat, moose, roe deer, reindeer, red deer, fallow deer, pronghorn antelope, giraffe, bovine seminal plasma, camel (acidic), camel (basic), hippopotamus, pig.]

Studying life by working backwards in time (bottom wedge in Fig. 3.1) combines geology (represented by the Brule Formation in Nebraska), paleontology (a fossil ruminant from the Brule), chemistry (DNA sequences inferred from that ruminant), and informatics (to determine a tree for the evolution of ruminants). The community implementing this approach by combining natural history, physical science and computers, is new.

Working forwards in time from chemistry

One result emerging from work backwards in time was an appreciation of the importance in natural history of ribonucleic acid (RNA) to terran life. RNA appears to have played a very important role in earlier life on Earth. Further, it appears that an early episode of life on Earth used RNA as the only encoded biopolymer. In this "RNA world", RNA played both genetic and catalytic roles. A life form based on RNA alone is not known on Earth today. It is substantially simpler than the life that we do know, and is perhaps more "essential", and therefore more representative of life as a universal.

Further, the RNA-world hypothesis suggests that life on Earth may have begun with RNA. This is the RNA-first hypothesis, which has given a focus to the field known as prebiotic chemistry. Prebiotic chemists start with organic compounds that may have been present on early Earth. Then, the community uses laboratory experiments to understand how rules of chemical reactivity, presumed invariant over time and space, might have allowed Darwinian chemical systems to have emerged in the past. The RNA-first hypothesis provides a target for prebiotic chemical activities.

[Figure: The approach to life that works forward from chemistry (top wedge in Fig. 3.1) starts with organic molecules found in the cosmos (represented here by the Eagle nebula, which is a star nursery), some organic molecules observed in interstellar nebulas, geology (represented by the Earth), its minerals (represented by tourmaline and colemanite, two minerals that contain boron), in the hope of obtaining ribose and other components of RNA (lower right). Setting prebiotic chemistry explicitly within a planetary context is new, and geology, planetary science, and organic chemistry are interacting in a new way to address a question as old as humankind: How did we come to be?]

Chapter 5 concerns how scientists have struggled to get RNA from primitive organic molecules, even as others in the community have doubted that such struggles will be successful. Exact events occurring billions of years ago in the history of Earth appear inherently unknowable. Therefore, disagreements in this field center primarily on standards-of-proof. In Chapter 5, these issues will be used to illustrate a less-than-successful development of scientific method that struggles with such standards. Unlike in paleogenetics, no consensus has yet emerged to say which standards-of-proof should be used, or how the community should develop an understanding of life's origins. There is not even an agreement about what form that understanding will take. Some in the community do not even believe that it is possible to find that understanding. Contrasting with the largely successful enterprise discussed in Chapter 4, the enterprise discussed in Chapter 5 will be contentious and mostly unresolved. Fundamentally, the field needs some new ideas.


Exploration to jolt our definition-theory of life

The third approach, represented by the left wedge in Figure 3.1, continues the tradition by which humankind has long generated new ideas: exploration. Through the activities of NASA, the European Space Agency, private enterprise, and others, we can now search for life in places that have, until now, been inaccessible to humans. Further, unlike activities being pursued under the title "Search for Extra-Terrestrial Intelligence" (SETI), missions that actually go to alien worlds do not require that alien life be intelligent to be found. Thus, explorations seeking life elsewhere are more likely (and possibly much more likely) to find life than a passive search of the heavens for radio signals, as Jodie Foster's character did in the movie Contact.

As discussed in Chapter 6, the community agrees that the discovery of alien life would be a landmark event in human history. It would drive our understanding of the essence of life more than any other, especially if alien life is very different chemically and/or physiologically from the life we know on Earth. The community does not agree, however, on a simple standard-of-proof: How would we recognize alien life if we were to encounter it? None of the molecular tests useful for detecting life on Earth will detect non-terran life if it is based on chemistry even slightly different from ours. Further, we expect that the alien life that we encounter first will not try to make us into puppets. Rather, we expect that our first encounter will be with alien life that is microbial, needing a microscope to observe. Further, we are likely to kill the microbial alien life that we encounter before we get a chance to study it. Thus, the standard-of-proof will need to identify biosignatures in a system that used to have the capability of Darwinian evolution, on a tiny scale.
Chapter 6 will discuss possible solutions to this problem, including a theory for the structure of genetic molecules universally. Here, considerations about funding are especially important. The exploration of space is expensive; this is Buck Rogers. To be realizable within funding constraints, the experiments that can be done on alien worlds are quite limited, often making their interpretation difficult. Further, to make the kinds of statements that lead to publication and funding, we will find researchers declaring that life is, or is not, possible on Mars, often with the weakest of scientific arguments.

[Figure: Nothing would jolt our view of life more than finding through exploration another version of it. Possible places to look in our Solar System include Europa, a Galilean moon of the planet Jupiter (upper left), Titan, a moon of Saturn (upper right, showing pebble-sized objects on Titan's surface), and Mars (from the Phoenix lander near the Martian pole). A combination of geology, chemistry, and instrumentation (represented by the MECA instrument package below) will look for life. But how would we recognize that life if we encountered it, and if it did not turn us into puppets? Are the striations on Europa, the objects on Titan, or the salts found in Martian soil by Phoenix indicative of the habitability of the environment? Or do they preclude life? The community does not agree.]

Synthesizing life from scratch

The fourth approach is represented by the right wedge in Figure 3.1. It comes under the title "synthetic biology", and reflects the fact that an understanding can be defined by an ability to create. If we truly understand automobiles, we should be able to build an automobile. Likewise, if we understand life, we should be able to construct our own life in the laboratory. Conversely, if we cannot build life from scratch, even if we have obtained the funding to attempt to do so, then we must not understand completely what life is.

Therefore, synthetic biology places before us a direct challenge to our definition-theory of life. If life is nothing more than a self-sustaining chemical system capable of Darwinian evolution, and if we understand how chemistry might support Darwinian evolution, then we should be able to synthesize, in the laboratory, an artificial chemical system capable of Darwinian evolution. That system should generate the behaviors that we consider to be characteristic of life, however those behaviors come to be defined. And if we cannot build our own artificial life, after we have tried for a fair amount of time and failed, we may conclude that our definition-theory of life is missing something. Indeed, in the synthetic effort, the nature of our failure should help us decide what is missing from our definition-theory of life.

[Figure: The building blocks of life are the chemical elements (top), and are presumed to be truly universal. If we do understand life as a chemical system, then we should be able to build life by deliberate assembly of the chemical elements to make molecules (middle), assembly of the molecules to make biopolymers (bottom), and assembly of the biopolymers to get a system capable of Darwinian evolution. Efforts to do this have created one branch of the new field now known as "synthetic biology".]

Chapter 7 will describe efforts to build, in the laboratory, artificial chemical systems capable of Darwinian evolution. Here, we are not constrained by what might have happened on early Earth, its geology or its prebiotic chemistry. Rather, we are permitted to use whatever tools we can get our hands on to get to the goal. According to our definition-theory of life, once an artificial chemical system capable of Darwinian evolution is in hand, it should be able to generate any property that we consider necessary for a system to be called living.

Of course, one kind of synthetic biology has created new forms of artificial life for centuries. Husbandry and horticulture rearrange pieces of naturally occurring terran life to get new forms of life. Indeed, the very act of generating baby guinea pigs from two parent guinea pigs is, in a very real sense, an experiment in synthetic biology. This type of synthetic biology took its next step forward in the 1970s, when biotechnology allowed the deliberate shuffling of genes. Indeed, Waclaw Szybalski coined the phrase "synthetic biology" in 1974 specifically to describe gene shuffling. More recently, Craig Venter, Hamilton Smith, and others are attempting to completely shuffle all genes in a bacterium, and thereby synthesize a functioning minimal bacterium that is capable of Darwinian evolution.

Other efforts in synthetic biology are going farther than simply rearranging genes from existing organisms. For example, George Church at Harvard is attempting to reorganize the coding structure of natural genes. My laboratory has developed artificial DNA, quite different from that found naturally on Earth, and placed it in systems that support Darwinian evolution (although they are not self-sustaining). The farther synthetic biology goes away from what is found on Earth, the more that synthetic biology will tell us about life as a universal.
The most information will come from entirely artificial genetic and catalytic systems that support Darwinian evolution. Should a research effort succeed in producing these, it should have as much impact on our understanding of life as a universal "natural kind" as the discovery of a form of alien life that shares no ancestry with terran life.

Keep a look-out for developing scientific methods

In Chapter 8, we move into the exotic. Following the lead of the National Research Council report on the Limits to Organic Life in the Solar System, we will let our imaginations range further, to ask what kinds of life might not fit our definition-theory. Rock life? Energy life? Dark clouds? What kinds of scientific method available today should be applied to define the limits of such weird life? Should we invest scarce bucks to go looking for it? And if we do, how would we recognize it cost-effectively?

The research activities described in these chapters are far from complete. In many cases, we are discussing research paradigms that are not even at the end of their beginnings. This is in part the reason for the excitement in telling these stories at this time. But this is also the reason why exobiology is a useful playground to develop concepts in scientific method and to acquaint you with some of the issues in science that concern method.

I thought about trying to develop these concepts within the context of an issue that is presently the focus of social concern, such as the energy crisis, global warming, or dietary prescriptions for better health. I thought better of it. Those issues are so intimately tied with political ideology that serious thinking about the underlying methods is generally obscured by the yelling and chair-throwing. Fortunately, very few people are invested politically in whether alien life exists (although we will spend a few paragraphs in Chapter 7 discussing controversies surrounding potential hazards of synthetic biology).
Therefore, we can have a cleaner discussion of scientific methods, standards-of-proof, and strategic decisions made about scientific direction by a scientific community and its culture when we discuss aliens than when we discuss "hot" issues of the day. But keep in mind: The lessons about scientific methods that you learn when you think about aliens are applicable elsewhere. Enough said.


Chapter 4
Working Backwards in Time from Life on Earth Today

Classification as a scientific method in biology

As humans, we have a special advantage as we seek to understand life as a universal: Unlike Galileo, who had no knowledge of the moons of Jupiter during his formative years, every child knows that life exists. Further, humans instinctively distinguish the living from the non-living, and do so long before we know intellectually how challenging it is to formalize a definition-theory that makes this distinction rigorously. Thus, biology is a science with a subject matter. Therefore, if we want to understand life as a universal, we at least have a place to start: the life surrounding us on Earth.

Many books offer information describing what is known about terran life. These are written at many levels, from picture books for young children to textbooks designed to train the next generation of biologists. We will not summarize their contents here. Our discussion of life as a universal relies on only a few features of terran biology, and we will explain these as we go along.

We start with a simple scientific method: classification. As Heinlein wrote in his book Have Space Suit, Will Travel (1958), library science is basic to all science. Much science begins by an attempt by humans, as librarians, to classify what our species has already observed. Humans are instinctive classifiers. Unfortunately, different instincts give different classifications. For this reason, as we shall see, classification systems and the language that they use often tell us more about the classifier than about the classified.

For example, we learn the "animal-vegetable-mineral" classification system early in school. It is associated with Charles Linnaeus, the Enlightenment scientist who based his General and Universal System of Natural History on this three-way division.
Leaving aside his exaggerated use of the word "universal" (Linnaeus had no access to extraterrestrials), this classification distinguishes the non-living from the living. It then divides the living in two classes, animal and vegetable.

In practice, this division of life is done by inspecting characters, attributes of the entity being classified. As with green emeralds, a character can be color. Or the ability to walk. For example, to make the animal-versus-vegetable classification, we ask: Is the living entity green? Or: Can it hop, jump, or slither? If the entity is not green and can move, it is animal. If it is green and cannot move, it is vegetable.

It might be argued that "able to move" is a better character for classifying terran life than "green in color". A frog is green, but can move. As we instinctively believe that a frog is better classified as an animal, mobility must trump greenness. A Japanese maple leaf is red, but the maple tree cannot move. Nevertheless, we instinctively believe that the maple is a plant. Immobility trumps non-greenness in our classification scheme.

[Figure labels: "Life backward in time"; Eucarya, Archaea, Bacteria; "Paleogenetics".]

[Figure: Title page of English translation of Systema Naturae (1816) by Charles Linnaeus. This introduced the "animal-vegetable-mineral" trichotomy.]

[Figure: This frog is green, but can hop, and is "therefore" an animal. Leaves of the Japanese maple tree are red, but the tree cannot hop, and is "therefore" a plant.]

Anthropologists might try to infer our constructive beliefs concerning classification by examining our behavior. If they did, the ease with which we discard classification characters when those characters fail to deliver the classification that we want would suggest that we constructively believe that the concepts "animal" and "plant" are more fundamental than the characters themselves. This is already metaphysical progress, as it suggests that to our minds, "animal" and "vegetable" are the natural kinds.

Nevertheless, the characters are useful, as they provide a way to determine the community-accepted name for a living entity. In classical systematics, students are given a classification key in the form of a decision tree. Each node in the tree corresponds to a question about the animal being examined. The answer determines which branch the students follow as they navigate the tree. If the questions are properly answered, the students follow the correct path through the tree to a leaf bearing the community-accepted name of the animal.

[Figure: classification key for vertebrates, drawn as a decision tree with yes/no branches. Key questions: Does it have a vertebral column? Does it have jaws? Does it have 4 limbs? Does it have an amniotic sac for its egg? Does it have a synapsid hole for the jaw? Does it have a placenta? Does it have a stirrup-shaped ear bone? Does it have a grasping hand? A further branch asks about a hole in the hip. Leaves: some other animal, lamprey, shark, frog, Tyrannosaurus, bird, kangaroo, lion, ox, human, gorilla.]
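A classification key of this kind can be sketched as a small data structure. The Python below is a minimal illustration, not the key used by any systematist. The questions are drawn from the figure, but the assignment of each leaf to a particular "no" answer is an assumption made here for illustration, and the names `Question`, `KEY`, and `classify` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Question:
    """One node of the key: a yes/no question about a character."""
    character: str
    if_yes: Union["Question", str]
    if_no: Union["Question", str]


# A toy key. Leaves are plain strings holding the community-accepted name.
# ASSUMPTION: which leaf hangs off which "no" branch is chosen here for
# illustration only; it is not read off the original figure.
KEY = Question(
    "vertebral column",
    if_no="some other animal",
    if_yes=Question(
        "jaws",
        if_no="lamprey",
        if_yes=Question(
            "four limbs",
            if_no="shark",
            if_yes=Question(
                "amniotic sac",
                if_no="frog",
                if_yes=Question(
                    "synapsid opening",
                    if_no="bird",
                    if_yes=Question("placenta", if_no="kangaroo", if_yes="ox"),
                ),
            ),
        ),
    ),
)


def classify(characters: Dict[str, bool], key: Union[Question, str] = KEY) -> str:
    """Walk the key from root to leaf, answering each question in turn."""
    node = key
    while isinstance(node, Question):
        node = node.if_yes if characters[node.character] else node.if_no
    return node


frog = {"vertebral column": True, "jaws": True, "four limbs": True, "amniotic sac": False}
print(classify(frog))  # frog
```

Only the questions on the path actually followed need answers, which is why the frog above can be named without saying anything about a placenta. The sketch also shows what a key cannot do: it delivers a name, but says nothing about why these questions, in this order, are the right ones to ask.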

[Figure: A classification tree for vertebrates with some key questions. Having jaws or a hole in the jaw are community-accepted classification characters.]

Correct and incorrect classifying characters

Today, biologists have accepted as a community common sets of characters for animals that permit classification. These came only through struggle, however. As these sets were developed, it was often not clear what characters provided the best classifications. In particular, human instinct did not necessarily deliver the character set that biologists have now come to accept.

For example, if the key asks: "Does the animal live in the ocean?" then whales and fish are classified together. If the key asks: "Does the entity fly?", then bats and many birds are classified together. Today, biologists agree that bats should be classified with whales before they are classified with birds. Why? The community answers this question thus: Because whales and bats are both mammals. But this is no answer at all; it simply states that the community thinks that the ability of a mother to suckle her young (a mammal) is a better classification character than domicile on land, air, or water.

We now enter a trail of "But why?" questions, not unlike those from a small child. But why does the community think that? Our instinctive understanding of biology does not make this thought obvious. Indeed, this is why the whale-bat grouping must be taught in school (and why some students fail middle-school science). This thought was not delivered by an authority; it certainly is not found in the Biblical story of Jonah (who was swallowed by a big fish). Therefore, we can reasonably ask: Why is the question: "Does the species suckle its young?" better in a classification key than the question: "Does the species live in water?" What objective metric allows us to say that any one character is better for classifying than any other? History can provide an answer to this question.
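The dependence of a classification on the character chosen can be shown in a few lines. The Python below is a small illustration using only the four animals and three questions mentioned above. The table of answers and the names `ANIMALS` and `group_by` are hypothetical, and "flies" is simplified to a plain yes for birds even though only many birds fly.

```python
from collections import defaultdict

# Answers to three candidate key questions for four animals.
# ASSUMPTION: a simplified yes/no table built for illustration.
ANIMALS = {
    "whale": {"lives in water": True, "flies": False, "suckles its young": True},
    "fish": {"lives in water": True, "flies": False, "suckles its young": False},
    "bat": {"lives in water": False, "flies": True, "suckles its young": True},
    "bird": {"lives in water": False, "flies": True, "suckles its young": False},
}


def group_by(character):
    """Split the animals into a 'yes' group and a 'no' group for one character."""
    groups = defaultdict(set)
    for name, characters in ANIMALS.items():
        groups[characters[character]].add(name)
    return dict(groups)


for question in ("lives in water", "flies", "suckles its young"):
    print(question, "->", sorted(group_by(question)[True]))
```

Each question is equally easy to answer, and each yields a tidy two-way split, yet the three splits disagree about who belongs with whom. Nothing inside the table says which split is the better one, which is the point of the "But why?" questions.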

Natural history provides a "correct" classification scheme

To understand how history helps in classifying, we turn to something closer to home: our family. A "natural" classification of our family would group siblings together before grouping first cousins. Again, a tree structure makes sense. We naturally place brothers and sisters on adjacent branches of the tree, placing first cousins on branches that separate from the tree root farther back in time. Lines leading to second and third cousins branch further back in time. The topology of branching within the cousin sub-trees would reflect relationships similarly. Such a classification would be defensible, even if your sister "has cousin Edna's eyes" and you do not. We would then seek characters (not cousin Edna's eyes) that would allow a student to look at you and your relatives and infer the correct branching.

[Figure: family tree against a vertical time axis (today, 10 years, 25 years, 40 years, root), with leaves sister, brother, cousin, and second cousin. In modern systematics, the correct classification reflects correctly natural history, the order in which the animals emerged, just as a correct family tree places siblings on adjacent branches.]

[Figure: tree of bat, whale, bird, and fish against a vertical time axis (today; 100 Ma, suckling emerges; 250 Ma, amniotic sac emerges; 350 Ma, root).]

As natural historians came to realize that all animals on Earth were related by common ancestry, they began to constructively believe a historical classification scheme was also better for animal biology. In natural history, bats shared a common ancestor with whales about 100 million years ago ("million years ago" is abbreviated "Ma"). In contrast, the last common ancestor of bats and birds lived 250 Ma. The lineage leading to bats, whales, and birds diverged from the lineage leading to fish about 350 Ma. The characters that the community has chosen reflect this history. Suckling is a character that bats and whales share; flying and swimming are not. Birds, whales, and bats are correctly classified together as amniotes.
The word refers to the water-impermeable, oxygen-permeable sac that encloses their eggs, truly one of the great wonders of the natural world. History thus gives biology a standard against which to resolve disputes of the form: "Whose classification system is better? Yours or mine?" That resolution accepts the existence of a historical reality, distinct from our classification instincts. If we can discern this reality, we can pick among competing classifications to arrive at one that is independent of our instincts. Cool.

So how can we know historical reality from millions of years ago?

Unfortunately, an epistemological problem remains even if we agree that the correct classification is the one that accurately models the historical reality. Since we (presumably) cannot build a time machine, we have no direct way to observe the historical past. We cannot observe, for example, the event 100 million years ago when the lineage leading to whales diverged from the lineage leading to bats, after the (bat-whale) lineage diverged from the lineage leading to birds. Thus, while we have a standard for defining the correctness in a classification scheme, we seem to have no way to apply that standard. We have no way of knowing whether the key questions that we have chosen infer the correct historical relation between animals.

Another science without a subject matter?

Fortunately, sedimentary rocks on Earth hold remnants of animals that lived approximately 100, 250 and 350 Ma. We can date these sedimentary rocks using radioactive isotopes found in igneous rocks that are associated with them. We then can use characters that classify modern animals to classify the ancient animals found in those rocks as fossils. Since the dates when the ancient organisms lived are constrained by the dates of the rocks where they were found, we can estimate geological dates for branch points in our
classification tree. This is a start towards connecting our classification scheme with the historical reality that makes it correct. An example of this method may be useful. Perhaps you saw the BBC documentary Walking with Prehistoric Beasts? This documentary followed a morning stroll of a leptictidium (a cute furry mammal) mother living 50 Ma in the Eocene. The mom was worried about being eaten by ambulocetus. So what is an ambulocetus? Judging by some of the characters that classify modern whales and modern animals having cloven hooves (camel, sheep, eland, and ox, as well as the non-ruminant pig and hippo), ambulocetus looked more like a whale. Judging by others, ambulocetus looked more like a cloven-hoofed animal. Thus, ambulocetus is a "transitional form". Its name (which means "walking whale") captures its intermediacy between two orders of mammals. The first order of mammals is the "artiodactyls" (the classification key asks the question: Does the animal have an even number of toes on its feet?). The second order of mammals is the "cetaceans" (Is it a whale?). The structure of ambulocetus suggests that artiodactyls and cetaceans are closely related, and diverged sometime before 50 Ma. Creating a classification system for modern animals that is robustly supported by the fossil record is not automatic. Only a tiny, tiny fraction of the vertebrates who have ever lived left a fossil for us to find, and only the hard parts (= bones) of vertebrates are commonly fossilized. But with thousands of paleontologists exploring the Earth for hundreds of years, many fossils have been discovered. Setting aside ongoing disputes over the detailed branching within various parts of the tree, the community has come to settle on a set of classification characters consistent with both fossilized animals and modern animals. 
Those are the characters that say that the lineage leading to whales (and oxen) diverged from the lineage leading to bats after the (whale-ox-bat) group diverged from the birds. Although uncertainties remain in dating divergence of various lineages (those uncertainties become bigger as we go farther back in time), a consensus has emerged here concerning the approximate dates for the divergence tree of all animal life on Earth. A corresponding consensus has emerged for plants. You may learn more about those consensuses by browsing the Tree of Life web page (www.tolweb.org).

Classification characters make implications about ancestral lifestyles

The characters that allow us to correctly infer the natural (that is, historical) organization of animals also provide a statement about the appearances and lifestyles of now-extinct animals that lived millions of years ago. Not perhaps to the level presented by the BBC. But if all animals who suckle their young form a natural class (mammals, meaning that no non-mammal diverged from any mammal after the last common ancestor of all mammals), then that ancestor suckled its young. If all of these animals had hair, then the ancestor had hair.

This ambulocetus lived in Pakistan ca. 50 Ma. It is one of many "missing links" that connect natural history to classification keys for modern animals. When key questions are asked, some answers indicate that ambulocetus is a whale (mammal order cetacean), while others say it is a pig (order artiodactyl). Such answers suggest that cetaceans and artiodactyls diverged from a common ancestor, and suggest a new combined order for mammals, the cetartiodactyls, which groups whales with pigs, and oxen.

Hair is taught to be an intrinsic feature of mammal-ness in middle school. I was never happy with this (do whales have hair?), and doubted a science that would instruct youth to accept without question the retort "Well, a whale fetus has hair." Are you sure? Did you look? Hair does not fossilize well. Nevertheless, it can leave imprints. This fossil was left by Eomaia ("dawn mother"), the oldest complete placental mammal fossil, from the Jurassic-Cretaceous, 125 Ma. It is sufficiently preserved that its fur imprint survives. This is direct evidence for what would otherwise be an inference: early placental mammals had hair.

These inferences may include characters that are not observed in fossils. Thus, to infer that a mammal (such as Eomaia) suckled its young, we need not find a fossil Eomaia nursing a baby. This is fortunate; mammary glands, like other soft tissues, are rarely preserved in fossils. Fortunately, characters derived from shapes of bones (which are better preserved) can be used to define mammals in other ways. From these fossilized characters, we infer features of ancient life that were not fossilized (mammary glands, fur), in anticipation of finding a very rare fossil that confirms those inferences. Incidentally, we are not invoking Occam’s razor, even though we might. We could say that if all modern mammals suckle their young, then it is simpler to infer that the ancestral mammal suckled its young, because (given our concept of simplicity) it is simpler for a parent who suckles its young to produce offspring who suckle their young, than to produce offspring who do not. But that is not our argument. Rather, we are saying that suckling is a feature intrinsic to "mammal-ness". Set within natural history, this means that suckling arose just as mammals emerged, and was present in the last common ancestor of all mammals. If mammals in one lineage subsequently lost hair or the ability to suckle, we would still call that lineage of descendants mammals. (Of course, we would no longer use the question: "Does it suckle?" in a classification tree, as it would no longer be a reliable classifier.)
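The inference described above (a character shared by every member of a natural class is assigned to that class's last common ancestor) can be sketched in a few lines of Python. This is only a toy illustration, not the method systematists actually use: the tree shape follows the bat-whale-bird-fish example in the text, while the node names, the character lists, and the simple "shared by all descendants" rule are assumptions made for the sketch. The rule would be fooled by a lineage that later lost a character, which is exactly the caveat raised in the text.

```python
# Toy tree: each internal node lists its children (node names are hypothetical).
TREE = {
    "vertebrates": ["fish", "amniotes"],
    "amniotes": ["bird", "mammals"],
    "mammals": ["bat", "whale"],
}

# Characters observed in modern animals (an illustrative, incomplete list).
LEAF_CHARACTERS = {
    "fish": {"lives in water"},
    "bird": {"amniotic sac", "flies"},
    "bat": {"amniotic sac", "suckles young", "flies"},
    "whale": {"amniotic sac", "suckles young", "lives in water"},
}


def leaves(node):
    """Return all modern animals (leaves) that descend from a node."""
    if node in LEAF_CHARACTERS:
        return [node]
    return [leaf for child in TREE[node] for leaf in leaves(child)]


def inferred_ancestral_characters(node):
    """Characters shared by every descendant are assigned to the ancestor."""
    character_sets = [LEAF_CHARACTERS[leaf] for leaf in leaves(node)]
    return set.intersection(*character_sets)


if __name__ == "__main__":
    for clade in ("mammals", "amniotes", "vertebrates"):
        print(clade, sorted(inferred_ancestral_characters(clade)))
```

In this sketch, flying and living in water never reach an ancestor, because no clade shares them throughout; suckling is assigned only to the mammal ancestor, and the amniotic sac to the ancestor of birds and mammals.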

Table 4.1 Some Geological Names

PALEOZOIC
Cambrian, 544-505 Ma. Marine invertebrates, such as trilobites, dominate
Ordovician, 505-440 Ma. First vertebrates, jawless fish
Silurian, 440-410 Ma. Fresh water and jawed fish appear. Animals/plants on land
Devonian, 410-360 Ma. First forests, land vertebrates
Mississippian, 360-325 Ma. Coal deposits begin. Sharks and large trees
Pennsylvanian, 325-286 Ma. First reptiles. Insects and fern forests spread
Permian, 286-245 Ma. Ends with extinction of ca. 90% of species.

MESOZOIC
Triassic, 245-208 Ma. First mammals, birds
Jurassic, 208-146 Ma. Age of dinosaurs
Cretaceous, 146-65 Ma. Flowers, placental mammal. Dinosaurs "extinct" at end

CENOZOIC (The age of mammals)
Paleocene, 65-55 Ma. Radiation of mammals
Eocene, 55-38 Ma. Whales diverge from cloven-hoofed animals
Oligocene, 38-28 Ma. Ruminants emerge
Miocene, 28-5 Ma. Climate cooling resumes
Pliocene, 5-1.8 Ma. Hominids appear in Africa
Pleistocene, after 1.8 Ma. Ice Ages

Inferences about ancient life = inferences about simpler life

So far, we have described how biologists practiced their classification scientific method a century ago. You can compare these activities with what physicists were doing in 1900, and wonder why the word "scientist" was used to describe both. Indeed, Ernest Rutherford, first Baron Rutherford of Nelson, a noted physicist who worked at this time, commented on such biology by saying: "In science there is only physics; all the rest is stamp collecting." Rutherford was not complimenting biologists.

Why did the community of biologists put up with this? Because their scientific methods led somewhere that the methods from physics could not go. As biologists examined more and more of the life on Earth, they inferred more and more of the characters of last common ancestors scattered throughout the "universal tree of life". Calling things found widely on Earth "universal" is, again, a stretch; these biologists never raised the funding needed to launch a spacecraft. But the choice of language reveals underlying beliefs. Those scientists were not constructively considering life elsewhere in the universe.

This aside, analyses of the terran biosphere that started with a classification of modern organisms and adjusted the choice of characters to include fossils allowed biologists to work backwards in time. Not a time machine, of course, where we might actually observe a leptictidium mom being eaten by ambulocetus 50 Ma. But maybe the next best thing.

Another reason exists to tolerate "soft" scientific methods as we dig up fossils to collect "stamps". If we can find enough forms of life on Earth that diverged long enough ago, we might be able to infer characters of very ancient life forms, perhaps even those that lived as long ago as 4000 Ma (four billion years), "soon" after the Earth formed (relatively speaking). This is also soon after life emerged on Earth.

Such ancient life forms should be primitive, not burdened by the baggage of historical accidents that "happened" to terran life in the billions of years leading to life on Earth today. Therefore, any information about those primitive life forms might suggest essential features for life as a universal. Not bad for stamp collecting.

Unfortunately, the most ancient animal does not even get us close to the origin of the Earth or the origin of life. The fossil record shows that a few different kinds of animals lived perhaps 550 Ma; the "backwards-in-time" approach certainly gets us this far back. But no reliable fossils are found of visible animals from rocks that are much older than this date, and these do not have structural features that allow their classification with any modern order of animals. This suggests that the last common ancestor of all animals (including molluscs) was older than 550 Ma, but probably not much older. Any extrapolation of the physiological change backwards in time would suggest that visible animals were probably not on Earth 1000 Ma. The Earth formed over 4500 Ma, four times farther into the mists of time. And we have no tools to relate animals with other terran life forms (plants and fungi, for example) by looking at fossil bones because the last common ancestor of animals, plants and fungi (evidently) did not have bones.

Biomolecular structures also help classify life

So where do we go next? Fortunately, two more developments help implement the "backwards-in-time" approach. Chemistry provided the first. Chemistry is not exactly "stamp collecting", at least not at its core. Rather, chemistry is a science that emphasizes analysis, a paradigm that dissects matter into its elements.

Analysis as a research strategy is illustrated by chemistry that emerged as the mineral part of the natural world was dissected. Early in civilization, mineralogists made the observation that by placing a certain kind of green rock (one that was not an emerald) in a fire with charcoal, globs of copper emerged as a reddish metal. This suggested that copper was an "element" of the mineral. Analogous dissection of other parts of the mineral world discovered other elements. Some discoveries were so important that ages in civilization are named after them. These include the element tin, which enabled the Bronze Age, and the element iron, which enabled the Iron Age.

This green rock comes from King Solomon’s mines. Heating with charcoal gives the element copper.

It took time, but chemists eventually recognized that the mineral world is constructed from just 90 of these elements. Some are familiar, like gold and silver. Others are less so, like scandium and indium. Stamp collecting perhaps, but application of the analysis paradigm successfully did what the best philosophical minds of the ancient world failed to do: It determined the essence of the mineral world (no, it is not earth, air, fire, and water). At least to one level of reduction.

For a modest fee, you can buy a collection of elements in their pure form. This box holds my collection, a gift from my students, with some of the natural and refined elements that I have assembled over the years myself.

The history of chemistry then became the history of the
isolation of elements. Sometimes, isolation was easy. The element carbon, for example, could be isolated simply by heating organic matter (charcoal is essentially elemental carbon). You can do this in your kitchen. Likewise, oxygen is all around. It needed only to be recognized as the element in the air that supported combustion (and life). Nitrogen was the element in air that did not support combustion (or life). Hydrogen became known as the element that, when combined with oxygen in a 2:1 atomic ratio, gave water ("hydro" = water; "gen" = generating). As elements were discovered, it became clear that they could combine in different ways to form different compounds that behaved differently. Water is a potable liquid (at least at room temperature at sea level on Earth) formed by combining two gaseous elements, hydrogen and oxygen, in an atomic ratio of 2:1. Ammonia is a pungent alkali obtained by combining the elements hydrogen and nitrogen in a ratio of 3:1. Methane is a flammable gas obtained by combining the elements hydrogen and carbon in a ratio of 4:1. Applying chemistry to classifying life Biochemistry began when the analysis paradigm from chemistry was applied not to the mineral world, but rather to the animal and vegetable worlds. Living systems clearly contained carbon (they gave charcoal when heated). They also contained oxygen, hydrogen, nitrogen, sulfur, phosphorus, and a few other elements in smaller amounts. Indeed, we teach in middle school that if the human body were combusted, the elements in the combustion products would be valued today at about $5.00. However, the greatest progress in biochemistry was made when the reductionist analysis of terran life stopped before life was fully converted to elements. Instead, analysis was stopped at the level of the molecules in living systems (biomolecules) that were formed from these elements. 
This analysis began in the second half of the 18th century, just as modern chemistry was identifying the elements of the chemical universe. At that time, the pharmacist Carl Wilhelm Scheele crystallized the first organic molecule (the barium salt of lactic acid) from sour milk. Crystallization was well known in the mineral kingdom. Here was a mineral-like crystal extracted from the animal and vegetable kingdoms. This drove the conclusion that the animal and vegetable could also look like mineral. It took just two centuries for chemistry to be developed to the point where it could solve problems from systematics in biology. But there is a direct path connecting Scheele's work with lactic acid to the sequencing of the human genome, and it is chemistry all the way along.

Carl Wilhelm Scheele was a Galileo of chemistry in the 18th century. He discovered the elements oxygen, chlorine, barium, manganese, molybdenum, and tungsten, as well as many biochemicals, such as citric acid, lactic acid, and glycerol. He invented a process like pasteurization, and a way of mass-producing phosphorus, which allowed Sweden to become a leading producer of matches. A straight line connects Scheele's work to the human genome.

Elements in Humans

Oxygen       65%
Carbon       18%
Hydrogen     10%
Nitrogen     3%
Calcium      1.5%
Phosphorus   1%
Potassium    0.35%
Sulfur       0.25%
Sodium       0.15%
Chlorine     0.15%
Magnesium    0.05%
Iron         0.0004%
Iodine       0.00004%
Fluorine     trace
Silicon      trace
Manganese    trace
Zinc         trace
Copper       trace
Aluminum     trace
Arsenic      trace

All terran life built from the same three biopolymers built from the same sets of building blocks

First, chemists discovered that proteins could be isolated from every form of terran life examined, including animals, plants, fungi, bacteria, and viruses. Proteins are polymers (loosely translated, "poly" =

many, "mer" = building blocks); all proteins from all known life on Earth are linear strings joining amino acid building blocks. When analyzed, the proteins from all terran life were shown to be built from just 20 amino acids. Further, the amino acid building blocks of proteins were the same 20 regardless of what form of life delivered the protein. Likewise, nucleic acids in the form of ribonucleic acid (RNA) could be isolated from every form of terran life. RNA molecules are another class of polymer. Here, the linear strings were built from four ribonucleotide building blocks. Remarkably, RNA was found in all terran life and, wherever it was isolated, it was built from the same four ribonucleotides. Last, nucleic acids in the form of deoxyribonucleic acid (DNA) were found to be a third biopolymer found throughout terran life. Analysis of DNA showed that DNA molecules are linear strings of joined building blocks; these were called deoxyribonucleotides, and came in four varieties. Again remarkably, the same four deoxyribonucleotides were obtained from DNA from every organism examined. Why is all of terran life built from the same building blocks? So now we can apply the scientific method exactly as it is taught in middle school. We have started with the community observation that all terran life, when subject to analysis, yields three biopolymers, protein, RNA, and DNA. Further, the community has observed that each of these biopolymers is built, in every case, from the same common set of building blocks. Now, to hypothesis formulation. These observations are consistent with only a few classes of hypotheses. First, we might hypothesize that the only chemical system capable of Darwinian evolution has these three specific biopolymers, and that the only types of protein, RNA, and DNA that can support Darwinian evolution are built from these exact sets of blocks. 
If this hypothesis were correct, then it would imply that inspection of terran life has solved the problem that we set out to solve four chapters ago. Protein, RNA and DNA are the essence of life. We are done-done.

Or perhaps this set of biopolymers does not represent the only chemical system capable of supporting Darwinian evolution, but rather represents the only one that could emerge spontaneously out of whatever prebiotic soup was present on early Earth. Perhaps other chemical systems could support Darwinian evolution as well, but there was no way to get them without the intervention of a divine creator.

Alternatively, the uniformity of the chemistry of life could indicate that all life on Earth is descended from a common ancestor, just as the ancestor of all mammals suckled her young and had hair. According to this hypothesis, the ancestor of us all had proteins, RNA, and DNA, and used these 20 amino acids in its ancestral protein molecules, these four ribonucleotides in its ancestral RNA molecules, and these four deoxyribonucleotides in its ancestral DNA molecules. In other words, these building blocks are the ultimate set of characters for classifying life, and all of the life on Earth belongs in one big class.

Set in the world-view of Lord Kelvin, a Genesis-like creation, and an Earth that formed 6000 years ago, other hypotheses come to mind. For example, God might have chosen one chemical solution for his bio-creation, not because others were impossible, but because one was sufficient (the "lazy God hypothesis"). Alternatively, the uniformity of building blocks among all life on Earth may have been designed for useful purpose. For example, the fact that vegetable proteins have the same amino acids as animal proteins may have been designed to allow vegetables to nourish animals.

THE LAZY GOD HYPOTHESIS: WHY ALIENS LOOK LIKE HOLLYWOOD ACTORS WITH PROSTHESES.

As long as we are being good scientists and correctly applying scientific methods to consider the universe of possibilities as much as our imagination allows, we may consider more exotic hypotheses. For example, vegetables may have originated on Earth using amino acids that are now familiar to us. Animals may have arrived from Mars with proteins built from a different set of amino acids. These animal arrivals would have immediately experienced Darwinian selection pressure to replace their Martian amino acids with terran amino acids, as these were what was now available to eat. Ultimately, Earth-living Martians would evolve proteins built from the same amino acids as the proteins found in the native terran vegetables.

We have not exhausted all hypotheses within the Darwinian world-view, of course. The building blocks might be the fittest, with Darwinian processes evolving over four billion years to select the fittest and reject less fit building blocks. They may have been the best subject to constraints presented by the conditions where life originated. They may be remnants of historical accidents. And so on.

Protein amino acid sequencing supports classification Just as decomposing life to its elements is too reductionist to be the best way to understand life's underlying structure, decomposing proteins, RNA and DNA to their constituent amino acids, ribonucleotides and deoxyribonucleotides (respectively) proved not to be the most informative way to use these biopolymers to understand life's natural history on Earth. In the protein biopolymer, amino acids are strung together in a specific order. This order is called the protein's "sequence". Typical proteins have a few hundred amino acids in that sequence. Analogously, the nucleotides in RNA and DNA are arranged in strings in defined order. RNA molecules can have up to thousands of ribonucleotides strung together. DNA molecules are typically longer; chromosomes can have tens of millions of deoxyribonucleotides strung together in a particular order. These RNA and DNA molecules also have sequences. In the 1950's, analysis technology in chemistry matured to the point where chemists could determine the amino acid sequences of proteins. Chemists then set about sequencing ("sequencing" is now a verb) proteins from any source that they could get their hands on. As always, funding drove the science. For example, many humans like to eat hamburgers made from oxen. Fewer like to eat pancreas. This meant that 20th century biochemists could get a lot of pancreas tissue from oxen cheaply at the local slaughterhouse. Ox pancreas is rich in certain proteins, and these were the proteins that ended up being studied. One of these proteins, called ribonuclease (RNase), is an enzyme that catalyzes the digestion of RNA. It has a mass some 13,000 times larger than the mass of the hydrogen atom and 145 times larger than the mass of the lactic acid isolated by Scheele two centuries earlier. RNase is abundant in ox pancreas, where it is made. 
It was so abundant that chemists could isolate RNase in amounts large enough to grow crystals of the protein, crystals analogous to those of barium lactate grown by Scheele two centuries earlier.

A cartoon showing the protein ribonuclease (RNase). The amino acid sequence is represented by the ribbon, which folds in three dimensions. The positions of two histidine (His) and one glycine (Gly) amino acids are indicated, with the numbers being their position in the chain.

A crystal is generally a pure collection of molecules, and a pure collection of molecules is where chemists like to begin any analysis. In this case, chemical analysis showed that the RNase protein strung 124 amino acids together. Analysis technology was then applied to determine the sequence of the amino acids in RNase. That sequence was found to be:

KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHESLADVQAVCSQKNVACKNGQTNCYQSYSTMSITDCRETGSSKYPNCAYKTTQAQKHIIVACEGNPYVPVHFEASV

A word about these letters. The Roman alphabet has 26 letters, more than the number of amino acids (20) in terran proteins. Therefore, we can represent each amino acid by a single letter, with enough letters to go around. For example, "K" stands for lysine, known to those who visit health food stores. "E" stands for glutamic acid, known as its sodium salt as a flavor enhancer (monosodium glutamate). "T" stands for threonine. Thus, the RNase protein begins with the amino acid sequence Lysine-Glutamic acid-Threonine (=KET).

Protein sequences contain many classification characters

One cannot look at the sequence of the RNase protein and not be struck by the amount of information that it contains. At one level, the information appears to be no more than a jumble of letters. However, in 1963, Linus Pauling and Emile Zuckerkandl realized that the information contained within the amino acid sequences of specific proteins might allow the chemical structures of proteins to help solve problems in systematics in biology.

Pauling and Zuckerkandl again returned to the view of natural history that relates all terran life by common ancestry. This view implies that the components of various organisms are descendants of ancestral components found in ancestral organisms. Thus, the RNase proteins in oxen and humans are descendants of an RNase protein that was present in the animal that was the last common ancestor of oxen and humans. This ancestor lived about 100 Ma in the Cretaceous.

Each amino acid in a protein can be viewed as a character to determine family relationships of the proteins themselves. The sequences of sister proteins should be more similar to each other than the sequences of first cousin proteins. The sequences of first cousin proteins should be more similar to each other than the sequences of second cousin proteins.
And if the family relationships between proteins could be determined, as Pauling and Zuckerkandl conjectured, these might indicate the family relationships between the organisms that provided the proteins.

This is a classification scheme, for sure. But the ability to fly or swim has a far more direct relation to survival than the amino acids in a sequence. Therefore, such abilities might have arisen independently in different lineages (biologists call this convergent evolution), as they apparently did in bats and birds. Amino acid characters, in contrast, were proposed to be linked only indirectly to survival. Further, with 20^124 (20 raised to the power 124) different possible amino acid sequences for proteins having the 124 amino acid length of RNase (this number is super-cosmically huge), Pauling and Zuckerkandl hypothesized that the probability that two sequences would converge by random chance would be vanishingly small. Therefore, the sequences of proteins should provide better characters to discern the true, historical relation between terran life forms than their physiology.

And further, since the set of amino acid building blocks is fixed at 20 (for whatever reason), biologists doing systematics would not have the opportunity to argue over which character was better suited to discern the true historical relation between animals. By avoiding conferences where biologists threw chairs at each other arguing over whose classification characters were better, this was worth money.
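To give a feel for how large the sequence space for a 124-residue protein is, the count can be computed exactly, since Python integers have arbitrary precision. The function name below is hypothetical, and the comparison figure of roughly 10^80 atoms in the observable universe is a commonly quoted order-of-magnitude estimate brought in as an assumption; it does not come from the text.

```python
def sequence_space(alphabet_size: int = 20, length: int = 124) -> int:
    """Number of distinct linear sequences of a given length."""
    return alphabet_size ** length


if __name__ == "__main__":
    n = sequence_space()
    digits = len(str(n))
    print(f"20^124 has {digits} decimal digits")  # 162 digits, about 2.1e161

    # Assumption: ~10^80 atoms in the observable universe (rough estimate).
    atoms_estimate = 10 ** 80
    print("larger than the atom estimate:", n > atoms_estimate)
```

Even this crude comparison shows why chance convergence of two whole 124-residue sequences was judged to be vanishingly unlikely.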

The Pauling-Zuckerkandl approach can easily be applied to RNase. The first six amino acids in RNase from oxen are KETAAA. The first six amino acids of RNase from sheep, a close relative of oxen that diverged perhaps 20 Ma, are KESAAA. Not identical to KETAAA, but also not very different. The RNase proteins from oxen and sheep appear, therefore, to be related by common ancestry, just like the oxen and the sheep themselves.

The use of amino acid sequences to classify proteins is best illustrated when we add sequences for RNases from a few other mammals, such as buffalo, eland, sheep, deer, camel, pig, hippo, whale, bat, and human, and (while we are at it) chicken (a bird). These are represented as an alignment constructed so that the similarities between the aligned protein sequences are the most perspicuous (Figure 4.1).

GETRYEKFLRQHVDHPRTLGLMGHYCAVMLARRQVTAGRCKPSNTFVHAPAEDLVATCTR  chicken
KESWAMKFQRQHMDPDGYPTNSSSYCNLMMRRRKMTEGRCKPINTFVHEPLVDVQAICLQ  fruit bat
-ESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPVNTFVHESLEDVKAVCSQ  minke whale
KESRAKKFQRQHMDSDSSPSSSSTYCNQMMRRRNMTQGRCKPVNTFVHEPLVDVQNVCFQ  human
KETAAEKFQRQHMDTSSSLSNDSNYCNQMMVRRNMTQDRCKPVNTFVHESEADVKAVCSQ  hippo
KESPAKKFQRQHMDPDSSSSNSSNYCNLMMSRRNMTQGRCKPVNTFVHESLADVQAVCSQ  pig
-ETAAEKFERQHMDSYSSSSSNSNYCNQMMKRREMTDGWCKPVNTFIHESLEDVQAVCSQ  camel
KESAAAKFERQHMDSSTSSASSSNYCNQMMKSRNLTQDRCKPVNTFVHESLADVQAVCSQ  sheep
KETAAAKFERQHMDSSTSSASSSNYCNQMMKSRDMTKDRCKPVNTFVHESLADVQAVCSQ  eland
KETAAAKFERQHMDSSTSAASSSNYCNQMMKSRNLTKDRCKPVNTFVHESLADVQAVCSQ  ox
KETAAAKFQRQHMDSSTSSASSSNYCNQMMKSRNMTSDRCKPVNTFVHESLADVQAVCSQ  river buffalo
KETAAAKFQRQHMDSSTSSASSSNYCNQMMKSRSMTSDRCKPVNTFVHESLADVQAVCSQ  swamp buffalo

Figure 4.1. The first 60 amino acids in the sequences of ribonucleases (RNases) from some vertebrates. Each letter represents one of 20 amino acid building blocks. Blue letters represent amino acids that have been conserved in RNase in the 250 million years since birds diverged from mammals. Red letters represent characters that show that bats are correctly clustered with mammals, not birds. Vertical lines show amino acids that are identical in bats and birds (there are 25), and in bats and whales (there are 41). Chemistry supports classification in biology.

What can we say from these sequences? First, they are quite similar to each other. A bit over a third of the amino acids are the same in all proteins in the 60 sites shown in this alignment (letters in blue). This implies that the amino acids at those sites have been conserved in the 250 million years since birds diverged from mammals. Further, the sequence of RNase from bat is more similar to the sequence of RNase from whale (41 identical amino acids from these 60 sites) than the bat sequence is to the sequence of RNase from bird (just 25 identities).

An eland, a close relative of the ox.

Fewer hypotheses are consistent with the similarity in the amino acid sequences of RNases from these animals than are consistent with the observation that these animals all use the same amino acid building blocks. Clearly, more than one protein sequence can function as an RNase; indeed, it appears that many sequences can in these different animals. Sequence similarity is not necessary to permit one animal to feed on another. After all, protein food is digested in the stomach to amino acids; it makes no difference what the amino acid sequence was before digestion. The only hypotheses that adequately explain the similarities in these RNase sequences from various animals use models for the natural history of these proteins set within the natural history of these animals. The sequence similarities imply that all of these RNase proteins share a common ancestor. In the words of biologists who do molecular evolution, the sequences are all homologous. As before, the alignment is correct if and only if it accurately represents the history of individual aligned sites. The amino acids in aligned sites must be "descendants" of a single site in the ancestral protein.
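The identity counts quoted above can be checked with a few lines of Python. The following is a minimal sketch, not the author's procedure: the dictionary `ALIGNMENT` and the function `count_identities` are names made up for illustration, and the three rows are transcribed from Figure 4.1. It counts the aligned sites at which two sequences carry the same letter, skipping gaps.

```python
# Rows transcribed from Figure 4.1 (first 60 aligned sites; "-" marks a gap).
# The names ALIGNMENT and count_identities are illustrative assumptions.
ALIGNMENT = {
    "chicken":     "GETRYEKFLRQHVDHPRTLGLMGHYCAVMLARRQVTAGRCKPSNTFVHAPAEDLVATCTR",
    "fruit bat":   "KESWAMKFQRQHMDPDGYPTNSSSYCNLMMRRRKMTEGRCKPINTFVHEPLVDVQAICLQ",
    "minke whale": "-ESPAMKFQRQHMDSGNSPGNNPNYCNQMMMRRKMTQGRCKPVNTFVHESLEDVKAVCSQ",
}


def count_identities(seq_a, seq_b):
    """Count aligned sites where both sequences have the same amino acid."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # A gap paired with a gap is not counted as an identity.
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")


if __name__ == "__main__":
    bat = ALIGNMENT["fruit bat"]
    print("bat vs whale:  ", count_identities(bat, ALIGNMENT["minke whale"]))
    print("bat vs chicken:", count_identities(bat, ALIGNMENT["chicken"]))
    # The six-residue example from the text: ox KETAAA vs sheep KESAAA.
    print("ox vs sheep (first six):", count_identities("KETAAA", "KESAAA"))
```

On the rows as transcribed here, the bat and whale comparison reproduces the 41 identities quoted in the text. The bat and chicken comparison gives far fewer; its exact tally is sensitive to how the figure's rows are transcribed.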

Evolutionary trees

Common ancestry is further supported by an analysis of the details of the similarities and differences among the various RNase sequences. For example, the sequences of the RNases from ox and buffalo are more similar to each other than either is to the sequence of the RNase from eland. The sequence of the RNase from eland is more similar to the ox and buffalo sequences than any of these are to the sequence of the RNase from sheep. And RNases from the set (ox-buffalo-eland-sheep) are more similar to each other than any are to the sequence of the RNase from camel.

Evolutionary tree for cloven-hoofed mammals (artiodactyls) from analysis of their RNase sequences. Letters at nodes designate intermediates in the evolution of the protein family. The whale sequence is an outgroup in this tree, and is not shown. Also not shown are the bat, human, and bird sequences, which diverged still earlier than 50 Ma. The eland is shown below, in case you have never seen one. It looks sort of like an ox. Leaves of the tree, top to bottom: river buffalo, swamp buffalo, ox, eland, nilgai, impala, Thompson's gazelle, bridled gnu, topi, goat, moose, roe deer, reindeer, red deer, fallow deer, pronghorn antelope, giraffe, bovine seminal plasma, camel (acidic), camel (basic), hippopotamus, pig. Internal nodes are labeled a through j. Horizontal axis: million years before present (approximate), from 50 to 0.

Just as in human genealogy and systematics, these relationships can be represented in a tree. This tree provides a graphical model for the history of this protein family. The leaves of the tree represent modern sequences from contemporary organisms. Each internal node, or branch point, in the tree represents a protein that was the last common ancestor of the proteins at the leaves above it. Thus, the RNases from oxen and buffalo are siblings, and these are both first cousins to eland RNase. And so on.
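The nesting described above can be written down as a small data structure. The sketch below is an illustration only: it assumes a simplified five-leaf tree (with river and swamp buffalo lumped together as "buffalo") whose nesting follows the similarities listed in the text, and the names `TREE`, `leaves`, and `last_common_ancestor` are hypothetical. Each internal node is a pair of subtrees; the last common ancestor of two proteins is the smallest subtree that contains both.

```python
# Assumed, simplified tree: a leaf is a string, an internal node is a pair.
# The nesting follows the text: (ox, buffalo), then eland, then sheep, then camel.
TREE = (((("ox", "buffalo"), "eland"), "sheep"), "camel")


def leaves(node):
    """Return the modern sequences (leaf names) found under a node."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def last_common_ancestor(node, a, b):
    """Return the smallest subtree containing both leaves a and b, or None."""
    if isinstance(node, str):
        return None
    # Prefer a deeper (more recent) ancestor if one of the children has both.
    for child in node:
        found = last_common_ancestor(child, a, b)
        if found is not None:
            return found
    names = leaves(node)
    return node if a in names and b in names else None


if __name__ == "__main__":
    for pair in [("ox", "buffalo"), ("ox", "eland"), ("ox", "camel")]:
        clade = last_common_ancestor(TREE, *pair)
        print(pair, "->", leaves(clade))
```

The closer two leaves are in the nesting, the fewer other leaves share their last common ancestor: ox and buffalo share theirs with no one else, while ox and camel share theirs with every sequence in the tree.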
Remarkably, the tree relating the species based on analysis of RNase protein sequences has essentially the same topology as the tree given by classification keys that analyze the bones of oxen, buffalo, elands (which look like oxen), sheep, and their fossilized ruminant ancestors. Many ancient ruminants were herd animals, just like many modern ruminants. Thus, they left many fossils in the 40 million years since the lineage leading to modern camels diverged from the lineage leading to sheep, eland, and bovids, so many that fossils of ruminants can be purchased on eBay (many are authentic). Because of the abundant fossil record representing ancient ruminants, natural historians did not actually need the RNase sequences to establish the history of the ruminants that made them. However, the correspondence between the RNase sequence tree, the fossil tree, and traditional classification supports the use of sequence-based trees to infer details of natural history when the fossil record is less complete.

The sequences of ancestral proteins can be inferred from the sequences of descendent proteins

In addition to inferring the connectivity of a tree describing the family relationship of RNase proteins, we can use the sequences of modern RNases from extant ruminants to infer the sequences of ancestral RNases from the now-extinct ruminants. In making an alignment, we already used an Occam's razor argument. We assumed that the correct alignment made the similarities between two sequences most perspicuous. Thus, the alignment process itself assumes that the correct alignment is the one that has the fewest differences (the Occam criterion) between amino acids at the aligned sites. Our next goal is to infer the amino acid sequences in ancestral proteins (represented by nodes, or branch points, in the tree). The best ancestral sequence, under Occam, is the one that gives the derived proteins with the fewest changes. This is often called a rule of "minimum evolution" or "maximum parsimony". This rule is based on a model of evolution that holds that the absence of an amino acid replacement is more likely than a replacement (an Occam's razor argument). As the RNase sequences of human and ox still bear striking similarities, this seems reasonable.

Site                   37   38   39   40   41   42   43
Pachyportax (node P)   Arg  Asn  Met  Thr  Lys  Asp  Arg
Eland                  Arg  Asp  Met  Thr  Lys  Asp  Arg
Ox                     Arg  Asn  Leu  Thr  Lys  Asp  Arg
River Buffalo          Arg  Asn  Met  Thr  Ser  Asp  Arg
Swamp Buffalo          Arg  Ser  Met  Thr  Ser  Asp  Arg
Replacements marked on branches: Asp38Asn, Met39Leu, Lys41Ser, Asn38Ser.

Amino acids inferred for ancestral RNases. To better connect amino acids with what is found in health food stores, we use 3 letter abbreviations. "Arg" is the amino acid arginine (one letter code R), Asn is asparagine (N), Met is methionine (M), Thr is threonine (T), Lys is lysine (K), Asp is aspartic acid (D), Leu is leucine (L) and Ser is serine (S). Changes along individual branches are noted by a transecting double line, with the starting amino acid, the site number, and the mutant amino acid. Note that the amino acid at the node in the tree marked P is hypothesized to have had a Met at position 39. This Met was converted to a Leu in the branch leading to ox, as indicated by a Met39Leu label.

Let us now apply this "backwards in time" reasoning to infer sequences of RNases from a few ancient ruminants that are now extinct. We extract the part of the tree that includes the buffalos, oxen, and eland, and consider just alignment sites 37 through 43. We focus on the node labeled "P", the RNase in the last common ancestor of oxen, swamp buffalo and river buffalo. What can we infer about the amino acids in this protein from an animal that lived ca. 4 Ma?

At site 37, inference under an Occam rule is easy. RNases from the ox, swamp buffalo and river buffalo all have an Arg (the amino acid arginine, or R) at site 37.
Therefore, it is most parsimonious to infer that the RNase from ancestor P also had an Arg at site 37. This way, the R in the descendent sequences can all be derived from the Arg in the ancestral sequence with no changes at all. In a cartoon version of the historical model, the ancestral organism had two calves, one of which founded a lineage leading to oxen, the other founding a lineage leading to buffalo. The first baby bovid had an Arg at site 37 of RNase, as did its babies and its babies, all of the way to the ox that ended up in your hamburger. The baby founding the lineage to buffalo also had an Arg at position 37 of its RNase, as did its babies and its babies. Right down to the buffalo that pulls a plow in Asia. Minimal motion. No amino acid replacements must be invoked to explain the common Arg at position 37 in all modern bovid RNases.

A fragment of horn core and skull of Pachyportax latidens, from (now) Pakistan. This approximates the bovid ancestor of the buffaloes and modern oxen, and is rendered artistically below.

In contrast, at site 39, ox RNase differs from the others in having a Leu (leucine). All other RNases have Met (methionine) at that site. Therefore, a model for the history of site 39 that has no changes at site 39 is not possible. At some point in the evolution of RNases, the amino acid at site 39 was replaced. A most parsimonious model (having one replacement at site 39) infers that Met was present in ancestor P. The model then requires just one replacement to account for the diversity of amino acids at site 39 in the RNases from ox, swamp buffalo and river buffalo. Some time in the evolution of the modern ox from ancestor P, a mutant was born whose protein had a Leu replacing the Met in the RNase that had been present in its mother. And that mutation managed to be distributed in a population until it gave rise to the oxen that were converted into your hamburger.
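The site-by-site reasoning above can be mimicked by brute force. The sketch below is a toy illustration, not the software used for the published reconstructions. It assumes the small tree read from the figure: a root that splits into eland and node P, with P splitting into ox and a buffalo ancestor (called `B` here) that in turn splits into river and swamp buffalo. The names `LEAVES`, `EDGES`, `most_parsimonious`, and `infer_p` are hypothetical. For each site it tries every assignment of observed amino acids to the three internal nodes, keeps the assignments with the fewest replacements along branches, and reports what they put at P.

```python
from itertools import product

# Amino acids at alignment sites 37-43, as listed in the figure.
SITES = [37, 38, 39, 40, 41, 42, 43]
LEAVES = {
    "eland":         ["Arg", "Asp", "Met", "Thr", "Lys", "Asp", "Arg"],
    "ox":            ["Arg", "Asn", "Leu", "Thr", "Lys", "Asp", "Arg"],
    "river buffalo": ["Arg", "Asn", "Met", "Thr", "Ser", "Asp", "Arg"],
    "swamp buffalo": ["Arg", "Ser", "Met", "Thr", "Ser", "Asp", "Arg"],
}

# Assumed topology: root -> (eland, P); P -> (ox, B); B -> (river, swamp).
INTERNAL = ["root", "P", "B"]
EDGES = [
    ("root", "eland"), ("root", "P"),
    ("P", "ox"), ("P", "B"),
    ("B", "river buffalo"), ("B", "swamp buffalo"),
]


def most_parsimonious(column):
    """Return (fewest replacements, amino acids at P in all optimal models)."""
    alphabet = sorted(set(column.values()))  # only observed residues are tried
    best_cost, p_options = None, set()
    for states in product(alphabet, repeat=len(INTERNAL)):
        assignment = {**column, **dict(zip(INTERNAL, states))}
        # One replacement is counted for every branch whose ends differ.
        cost = sum(assignment[a] != assignment[b] for a, b in EDGES)
        if best_cost is None or cost < best_cost:
            best_cost, p_options = cost, {assignment["P"]}
        elif cost == best_cost:
            p_options.add(assignment["P"])
    return best_cost, p_options


def infer_p():
    """Apply the rule to each site; return (site, replacements, options at P)."""
    results = []
    for i, site in enumerate(SITES):
        column = {name: residues[i] for name, residues in LEAVES.items()}
        cost, options = most_parsimonious(column)
        results.append((site, cost, sorted(options)))
    return results


if __name__ == "__main__":
    for site, cost, options in infer_p():
        print(f"site {site}: {cost} replacement(s), P = {' or '.join(options)}")
```

Under these assumptions the search recovers the row given for Pachyportax in the figure, including Met at site 39 with a single replacement on the branch to ox. Exhaustive search is only practical for a tree this small; it is used here because it states the "fewest changes" rule directly.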


So what did the animal that held the RNase labeled P look like? The fossil record is incomplete. Therefore, one can essentially never find in the fossil record a bone that came from the individual, or even the population of individuals, that generated two divergent lineages that eventually evolved to give two descendent species. Nevertheless, one can seek fossils that approximate this last common ancestor. To do this, I traveled to museums in Basel (with a fine collection of fossil ruminants), the Musée d'Histoire Naturelle in Paris (where many original fossils collected by Cuvier are on display) and the British Museum of Natural History in London. There, I met curator Alan Gentry. From the musty shelves in the back room, Alan pulled out a horn core of the genus Pachyportax. Not much left of the beast from Pakistan, but we decided that this was the closest fossil that we had to the animal represented on the tree by the letter P.

History as a tool to understand biological chemistry

The reader might at this point ask: So what? We can reconstruct on paper the family relationships of proteins, such as RNases, from an analysis of their amino acid sequences. We can infer the sequences of ancestral proteins. We can reconstruct the history of amino acid replacements in various protein lineages. We can even infer the sequence of ancient RNases. But what good is this? Is this not just arcana? Hardly. Any system, natural or human-made, can be better understood if we understand both its structure and its history. We would not understand the computer QWERTY keyboard, the Windows operating system, or the Federal Reserve Bank (for example) if we simply deconstructed each into its parts. An understanding of the history of each is essential to an understanding of the systems themselves. Our definition-theory of life, a self-sustaining chemical system capable of Darwinian evolution, has two components within it: chemistry and history.
It has taken us 14 pages in this chapter to get here, but we now have the tools needed to make the connection between chemical structure and natural history in the terran biosphere. This is big. According to theory, Darwinian evolution at the molecular level arises from natural selection imposed upon random variation in the sequences of the DNA molecules, chemicals that encode the sequences of the proteins (like RNase) found in the organism. We will not care much about which of two similar alternative trees is more correct (we are happy with an approximate tree). We will not care much about whether individual amino acid replacements (like the Met39Leu replacement in the lineage leading to ox RNase) helped the mutant baby survive better, or whether this mutation had no impact on survival (which experts in the field call "neutral") and became fixed in the population by accident. These are interesting questions, but for the experts. Rather, we want to show how the connection between chemistry and natural history can be used to understand the life that we have before us. Then, we want to show how this combination of chemical analysis and history allows us to begin to understand life as a universal, as best as the terran record allows.

An analogy with the history of languages

We begin with another analogy, here between scientific methods used in natural history and scientific methods used in the field of historical linguistics. Historical linguistics studies the sequences of sounds (represented by letters) within words taken from contemporary languages. From these sound sequences, linguists identify languages that are related to each other by common ancestry. For example, the words for "one", "two" and "three" are strikingly similar in languages as diverse as Latin, Greek, German, Russian, Celtic, and Sanskrit.
This led historical linguists in the 19th century (just as Darwinian evolution was coming up to speed) to infer that these languages should all be classified as "Indoeuropean". Indoeuropean languages were set apart from other languages in the same general geographical space, such as Basque (in northern Spain), Finno-Ugric (in Finland and Hungary), and Semitic (in the Middle East and northern Africa). Historical linguists argued that this classification was correct because it correctly represented a historical reality, one that hypothesized that all Indoeuropean languages are related by common ancestry. But historical linguists did not stop there. By applying their own versions of Occam's parsimony and general rules of language change, historical linguists inferred the sequences of sounds in words in languages ancestral to various Indoeuropean tongues. Such inferences were made using a process that is quite analogous to the process that we just used to infer the sequences of the RNase from the now-extinct Pachyportax. Historical linguists looked at similarities and differences in the descendent words across the languages. They generated empirical rules that described patterns in language change. They then used these rules, together with parsimony (the Occam criterion), to infer the structures of words in extinct languages.

Indoeuropean Numbers

Language   Numbers
English    one    two    three
German     eins   zwei   drei
Greek      heis   duo    treis
Sanskrit   ekas   dva    trayas

The similarities between simple numbers in various Indoeuropean languages are too great to be accounted for by random chance. Rather, those similarities indicate a common ancestral language. Analogous words from Japanese, not an Indoeuropean language, make the similarities between the Indoeuropean words more obvious: In Japanese, hitotsu (one); futatsu (two); mittsu (three).

Figure 4.2. Evolution of the word for "snow" in Europe-centered Indoeuropean languages. From the existence of a word for "snow" in inferred ancestral Indoeuropean, we infer that Indoeuropeans lived in a place where it snowed. This is therefore an example of how a model for a past structure can tell us something more than simply the past structure.

Many words were reconstructed for the common ancestral language, called Proto-Indoeuropean. For example, the Proto-Indoeuropean word for "snow" (*sneigʷh-; the asterisk means that the word is inferred, rather than found in a written text; the superscript "w" implies a particular sound value) was reconstructed from the descendent words for "snow" in the descendent Indoeuropean languages (German schnee, French neige, Irish sneachtu, Russian sneg, Sanskrit snihyati and so on). Other features of the histories of these languages, such as the universal replacement of "sn-" by "n" in the Romance languages, can also be inferred from this analysis.

The reconstruction of the ancestral Proto-Indoeuropean language provided paleoanthropological information as well, about the people who spoke that language. For example, Proto-Indoeuropean evidently had no words for the elements "gold" or "silver" (or for that matter, copper, tin, or iron). This suggests that those elements (and the appropriate cultural transformations that they wrought) were discovered only after the various people speaking the various descendent languages had dispersed across thousands of miles from Ireland to India. However, the existence in Proto-Indoeuropean of a word for snow (with some concessions: The Sanskrit word actually means "he gets wet") tells us something about Proto-Indoeuropeans: They lived in a place where it snowed.


The application of the scientific method of the historical linguist to paleogenetics is directly analogous. Here, the fact that we can reconstruct the sequence of RNases from the last common ancestors of ox, buffalo, and eland, or the last common ancestor of ox, whale, and human, or the last common ancestor of ox, human, bat, and bird, means not only that these common ancestors all had RNases, but also that they all had RNA, and were all able to metabolize that RNA. A small inference for sure, but we have some 40,000 families of proteins in mammals to work with. So if we can make small inferences for each of these families, perhaps a very comprehensive set of inferences would emerge about ancient mammals. This would tell us much about our ancestors, and explain much about us living today, with our QWERTY keyboards and Federal Reserve banks. It will also tell us about the behavior of Darwinian evolution at the chemical level, two parts of our definition-theory of life as a universal. What scientific methods must we apply to extract those inferences?

Tablet from 2045 BCE arranging for a purchase of an ox for silver. Comparison of the words for silver in modern languages dates the first use of silver to a time after the divergence of principal Indoeuropean languages, and before this "fossil". The universal was, of course, the need for funding (here, silver). About as much history transpired between the writing of this tablet and the payment of Judas in silver, as between that payment and today.

Ribonucleases, "just-so stories", and the scientific method

Let us return to RNases, mentioning again that these proteins digest RNA. RNases are made in the pancreases of ruminants (animals with cloven hooves who chew their cud) and secreted into the ruminant digestive tract. RNases are abundant in the digestive tract of ruminants (but not in humans). Further, because RNase is secreted from the pancreas into the digestive system (where powerful juices that digest dietary protein are also present), digestive RNases are themselves quite stable against digestion (as proteins go). RNase even survives in dilute sulfuric acid for a time. These behaviors certainly made RNases easy (as well as inexpensive) to isolate and study. This is good for funding, and explains their prominence in biochemical research.

But why does the digestive tract of oxen contain such large amounts of RNase? According to the Darwinian paradigm, such "Why?" questions have only one answer: Because oxen having digestive RNase are fitter than oxen lacking digestive RNase. That is, the axiomatic Darwinian explanation requires that oxen having digestive RNase were more likely to survive, get married, and have kids than oxen lacking digestive RNase. The alternative explanation (oxen have digestive RNase for no reason at all) is excluded under molecular Darwinian theory by the fact that the siblings, cousins, and ancestors have had digestive RNase for millions of years. If digestive RNase did not confer fitness, then it would have been lost in this time through the imperfections that characterize reproduction in any Darwinian chemical system.

But this answer simply raises another in the series of "But why?" questions that characterize good scientific method. But why does digestive RNase make oxen (and their sisters, cousins, and aunts) fitter? In 1969, Eric Barnard, at the State University of New York in Buffalo, proposed a hypothesis to answer this question.
Barnard noted that pancreatic RNase was abundant primarily in ruminants and certain other herbivores. While humans have various RNases, those are not present in the human digestive tract. Barnard therefore used one element of "the scientific method" as we teach it in middle school: correlation. However, Barnard did not stop there. He offered a hypothesis that connected the observed correlation with Darwinian theory. He suggested that the ruminant digestive system created a special need for a digestive RNase. He suggested that if that need were not met, the ruminant would be less likely to survive, get married, and have children.

Ruminant digestive physiology is considerably different from human digestive physiology. Ruminants have multiple stomachs that serve as vats to hold fermenting bacteria. Oxen deliver grass to these bacteria, which produce digestive enzymes that the oxen cannot. Important among these are enzymes that digest cellulose, which is abundant in grasses. The bacteria digest the cellulose, converting its carbon into a variety of products, including low molecular weight fatty acids. The fatty acids then enter the circulation system of the ruminant, providing energy. Then, the oxen digest the bacteria downstream in the digestive tract for further nourishment (need we be more graphic?). Fair enough.

According to Barnard's hypothesis, this digestive physiology answers a "But why?" for the large amounts of digestive RNase seen in ruminants. Rumen bacteria are packed with RNA. Indeed, between ten and twenty percent of the nitrogen in the diet of a typical ruminant enters the lower digestive tract in the form of RNA from rumen bacteria. Therefore ruminants need digestive RNases. Therefore, ruminants have digestive RNases. In contrast, we humans get very little nutrition from the RNA in bacteria. This seems like a reasonable model-explanation for the correlation.

A woodcut from Kipling's just-so story explaining the wrinkly skin and bad temper of the rhinoceros.

But one of the peculiarities of natural historians is their cultural fear of something called a "just-so story". These are exemplified by stories written by Rudyard Kipling around 1900 that had titles like: How the Rhinoceros got his Wrinkly Skin. In this particular story, a rhinoceros having smooth skin (which buttoned beneath) stole a fruitcake from a Parsee (a practitioner of Zoroastrianism) who lived on an island in the Red Sea. To retaliate, the Parsee put stale crumbs from another fruitcake in the rhinoceros' skin after he took it off to go swimming a few weeks later.
The thieving rhino failed to notice the crumbs when he put his skin back on and, rather than simply shaking the skin out when the crumbs itched, he scratched himself against a tree until his skin was wrinkled and the buttons fell off. And so (if "so" is the appropriate word to use here) ever since, rhinos have had wrinkly skins and bad tempers. A just-so story is the archetype of an ad hoc explanation. The events behind a "just-so story" (the Parsee, the thieving rhino) cannot be independently verified. Further, the story could easily be replaced by a different story, just as compelling, had the observations needing explanation been the opposite. For example, if the rhino had smooth skin and a pleasant disposition, the story may have been set in a vacation retreat on the Red Sea where they served free fruitcake. Thus, a just-so story is not an explanation at all. It puts no constraint on our understanding of the natural world. In the cultural wars between biologists who focus on living organisms (like the living moose in Chapter 1) and dead organisms (like the dead and dissected moose in Chapter 1), biologists who examine biochemicals think that their approach to biology is "harder" than the approach of biologists who studied moose droppings. Accordingly, biologists who studied moose droppings were defensive and sought to impose intellectual discipline on their own. First on their list was to exclude just-so stories, or anything that might resemble one, from their academy. Was the Barnard hypothesis merely a just-so story? Had the facts been the opposite, if digestive RNase were abundant in the human digestive system and scarce in ruminants, could Barnard not have generated an equally convincing explanation for that different set of facts? "Scientific method" since Galileo has held that experiments can be used to tighten up intellectual discipline. But how would we test Barnard's hypothesis experimentally? 
We might synthesize an ox that lacked a digestive RNase, push him onto the savannah, and see how well he survives and reproduces. But this is beyond anything that we can fund. We need a Galileo experiment, analogous to rolling balls, that can indirectly support or deny Barnard's hypothesis. Something that we can do now with today's funding.

Bringing experiment to bear on historical hypotheses

With these thoughts in mind, we set out in the 1980s to develop the field of experimental paleogenetics. Our goal was simple: to bring experimental method to bear on such "Why?" questions, without needing to build our own personal ox from scratch. Here is how paleogenetics experiments were used to add something to the Barnard hypothesis, to make it less of a just-so story.

As discussed above, available RNase sequences were adequate to support the inference, with little ambiguity, of the sequence of the RNase represented (approximately) by the fossil ruminant Pachyportax. In fact, enough sequences were available to go farther back in time, generating sequences of ancient RNases from a variety of ancestral ruminants. These included an early antelope such as Eotragus, which lived in the Miocene, early deer, and so on. We then sequenced more RNase genes from material that we got from the San Diego Zoo. Then, Nathalie Trabesinger-Ruef, Katrin Trautwein-Fritz, Jochen Opitz, Mauro Ciglic, Joseph Stackhouse, and Thomas Jermann had enough sequences from modern ruminants to infer the sequences of RNases all of the way back to Archaeomeryx, the mammal in the fossil record believed to represent the first ruminant, and further back to Diacodexis, a cloven-hoofed predecessor of Archaeomeryx that was not a ruminant. Diacodexis may have been an ancestor of the ruminants, pigs, and (approximately) whales, the last two also not being ruminants.

With the inferred ancestral sequences on paper, we used recombinant DNA technology to bring these ancient proteins back to life. Genes were synthesized that encoded the inferred ancient RNase sequences. The synthetic genes were then cloned in bacteria.
The bacteria then used the information in the synthetic genes to make the ancestral RNases from the long extinct ancestral animals. The ancient RNases were isolated from the bacteria and purified. Some were crystallized. This was not quite a time machine, but this method did bring back to life RNases that had not been on Earth for some 50 million years. These RNases were in our hands, available for study by experiment. At last, a science with a subject matter.

Leptomeryx, an early ruminant related to Archaeomeryx. This fellow lived in South Dakota in the Oligocene, ca. 40 Ma and, according to the Barnard hypothesis, needed a digestive RNase.

Diacodexis, an ancestor of the pig, the camel, and all ruminants, but that lived in the Eocene ca. 50 Ma before ruminant digestion arose. Diacodexis was not a ruminant and, according to Barnard's hypothesis, did not need a digestive RNase.

We first set out to see if the evolution of behavior within ancient RNases as biochemicals was consistent (or not) with the answer that Barnard offered to the "Why?" question. Again, the community's classification systems suggested that Archaeomeryx was the first ruminant artiodactyl. Its predecessor, Diacodexis, was a non-ruminant. Add to this some geological dates, and one concludes that ruminants arose on Earth ca. 40 Ma, some time after Diacodexis lived but some time before Archaeomeryx, along the lineage representing point i in the tree and point g.

Next, we drew on a scientific method that is widely used in paleontology: Function can be inferred from form. Thus, if Tyrannosaurus rex has teeth suited for tearing meat, then we infer that T. rex ate meat. We then added a concept from geology's scientific method, captured in the aphorism that "the present is the key to the past". Present-day digestive RNases act best on the kinds of RNA that are found in the digestive tracts of present-day ruminants: They act less well on kinds of RNA not found there. Further, as noted above, modern ruminant digestive RNases are themselves much more stable against digestion than typical proteins (a "digestive behavior").

We then measured in the laboratory the ability of the resurrected ancient RNases to act on digestive and non-digestive RNA, and their own stability against digestion. The observations made on the resurrected RNases provided a confirmation of Barnard's hypothesis. All of the RNases back to Archaeomeryx behaved as expected for digestive enzymes. They were stable against digestion. They acted well on digestive RNA. They acted poorly on forms of RNA not found in the digestive tract.

In contrast, RNases coming from organisms that lived before Archaeomeryx did not behave like digestive enzymes. The more ancient enzymes were themselves more easily digested by stomach enzymes. They had less activity against RNA found in the digestive tract, and had more activity against types of RNA not found in the digestive tract.

The letters at nodes in the tree represent the ancestral RNases that were resurrected in this paleogenetics experiment. Throughout the tree, RNases represented by red had digestive behavior, while RNases represented by blue did not. This suggested (under the "function follows form" rule) that RNase evolved a digestive function 40 Ma during the episode of evolution that connects RNase i to RNase g. This correlates with the paleontologist's view from standard classification keys that ruminant digestion arose as a physiology also ca. 40 Ma in this lineage. Note the position of the moose at the center of the tree. Leaves of the tree, top to bottom: river buffalo, swamp buffalo, ox, eland, nilgai, impala, Thompson's gazelle, bridled gnu, topi, goat, moose, roe deer, reindeer, red deer, fallow deer, pronghorn antelope, giraffe, camel (acidic), camel (basic), hippopotamus, pig. Fossil genera marked on the tree: Pachyportax, Eotragus, Archaeomeryx, Paleomeryx, Diacodexis. Groups marked: ruminant artiodactyls, non-ruminant artiodactyls. Legend: digestive behavior, non-digestive behavior. Horizontal axis: million years before present (approximate), from 50 to 0.

The observation that RNases displayed (in today's laboratory) digestive behaviors when they were resurrected from the ruminant Archaeomeryx and its ruminant descendants led to the inference that those RNases were digestive enzymes. The observation that RNases resurrected from non-ruminant animals ancestral to Archaeomeryx did not display digestive behaviors drove the inference that these more ancient RNases were not digestive enzymes. This also implied, therefore, that digestive behavior in RNases arose in natural history when ruminant digestion arose. This was exactly as expected under Barnard's hypothesis: Digestive RNase is present in the ox pancreas because oxen are ruminants.

Of course, we are still using correlation as our scientific method. Our correlation is, however, different from the correlation that prompted Barnard's hypothesis in the first place, and covers quite different subject matter. The properties of the ancestor of the ox and the moose that we are discussing would, if we possessed a time machine, come after we killed our prehistoric artiodactyl. Correlations do not prove anything. But, as discussed in Chapter 1, proof has little to do with the practice of science. Experiments end when the community decides that enough has been done to meet a standard of proof to allow the community to move on. Increasing the number of interconnecting observations and correlations helps the community come to that decision with respect to RNases. The fact that paleogenetics allows some of those correlations to be testable by experiment makes this so much the better.
Putting a planetary perspective on paleogenetic data

These paleogenetic experiments converted the Barnard hypothesis (or, pejoratively, the Barnard just-so story) into a broader narrative that interconnects ruminant physiology, the molecular behavior of RNases, and the historical changes in features of mammals with cloven hooves (artiodactyls). But we need not stop here. We can tie this narrative into other facts drawn from natural history, to make the narrative still broader. Remember, the greater the number of interconnecting narratives, the more likely the community will end experiments and move on to the next problem.

Let us return to the next in the trail of "But why?" questions. But why did ruminants emerge 40 Ma? It helps in considering this question to realize how important ruminants have become on Earth over the past 40 million years, especially in competition with another order of herbivores known as perissodactyls. Perissodactyls are animals with an odd number of toes on their feet. Perissodactyls are represented in the modern biosphere by just three species groups, the horses, the tapirs, and the rhinoceroses. But in the tropical Eocene, 50 Ma, some 250 species of perissodactyls lived. In other words, in just 50 million years, ~99% of the perissodactyls have gone extinct. In contrast, the number of species of artiodactyls, who have an even number of toes (oxen, buffalo, sheep, camels, pigs and hippos, for example) has increased to over 200.

March of artiodactyls in the Paris Jardin des Plantes. Today, more than 200 species of ruminants co-habit the planet with us. All emerged since the Oligocene over the past 40 million years.

This major extinction and reworking of the large vertebrate herbivore ecosystem correlates (there is that word again) with dramatic changes in Earth's climate that began in the Oligocene, a geological epoch that started 38 Ma. This climate change converted a planet that was covered with tropical and temperate forests (Antarctica was forested) into the cold and relatively dry Earth that we have today. Since the Eocene, the mean temperature of Earth has dropped by approximately 17 °C (31 °F, 17 Kelvin). In turn, this drop in temperature almost certainly helped grasses emerge over much of the globe. Grasses became the predominant source of vegetable food on the prairies of Nebraska, the steppes of Asia, and the savannahs of Africa. As the forests and rain forests receded and grasslands emerged, the interactions between herbivores and their foliage changed.

Why has the Earth cooled dramatically in the past few million years to give Ice Ages? The isthmus of Panama was formed over the past 20 million years by continental drift. This isolated the Atlantic and Pacific oceans, forcing a reworking of ocean currents and global energy transfer. The consequence: a rapidly changing environment that, in addition to selecting grass and ruminants, conferred fitness on species that evolve by making tools: humans. Humankind is developing a kind of evolution, more Lamarckian than Darwinian, which may require revisiting our definition-theory of life. Thus, planetary change is responsible for human features, tool-making, consciousness, and self-awareness. Had there been no Ice Ages, we would not be here.

Grasses offer poor nutrition compared to other types of flora. Try it. Eat some grass and compare the experience with eating whatever leafy vegetables you have in your kitchen from the grocery. Ruminant physiology appears to have substantial adaptive value when eating grasses. And therefore, RNases.

Correlations are everywhere in these interconnecting narratives. Why RNase? The record of natural history, supported by paleogenetics experiments, says that RNases are needed for ruminant digestion. But why ruminant digestion? Because it is needed to digest grasses. Why grasses? Because of the planetary cooling that characterizes Earth's recent history. And why the global cooling? This is an active area of geological research, but current thinking blames the drift of Antarctica away from other continents and the later formation of the Panama land bridge that connects North and South America. The first permitted circumpolar currents to isolate Antarctica. The second isolated the Atlantic and Pacific oceans, redirecting ocean currents. Together, these are the most dramatic changes in the climate wrought by geological change in recent times.

These correlations create a narrative that extends from the protein to the planet, and involves interconnecting narratives from many disciplines. They eventually become convincing. Just when they become convincing to you depends on your culture. And this is just one of the many narratives that can be built from the 40,000 families of proteins from modern animals.

Mikhail Matz resurrected a set of ancestral fluorescent proteins from corals, and used these to paint an evolutionary tree that shows how the color emitted by these proteins changed over time.

A paleogenetics community has developed its own standards-of-proof

Resurrecting genes and proteins has caught on as a way to explore origins in physiology. For example, Belinda Chang resurrected ancestral visual proteins that work in the eyes of birds to detect light. This paleogenetics experiment took her back in time to the Jurassic. From the behavior of the resurrected proteins, she drew inferences about the vision of bird-dinosaurs that lived 150 million years ago. Joseph Thornton resurrected proteins that bound steroids in animals that lived 500 million years ago. Mikhail Matz resurrected ancient proteins that cause corals to fluoresce, and showed how their color changed.

As these cases were developed, certain problems were encountered. For example, it is not always possible to say with certainty which amino acid was historically present at a specific position in an ancestral protein. This is called ambiguity, and needs to be managed. The simplest way to manage such ambiguity (if, for example, one does not know if site 102 holds a P or a Q) is to resurrect two candidate ancestral proteins, one having a P at site 102 and the other having a Q at site 102. The experiment then measures the behavior of both ancestral candidates to see if the measured behavior is robust with respect to the ambiguity.

The community has worked over the past few decades to create metrics and standards. Disputes still remain, and some are heated. But we are observing a process where a community coalesces around a scientific method after a new field was established. This is success: the development of a new science.

Today, over two dozen examples of paleomolecular resurrections have addressed questions in biology. Table 4.2 lists many of these done in the 20 years since paleogenetics emerged as an experimental science.

Extant genes

Table 4.1. Examples of molecular resurrections

Digestive ribonucleases Digestive ribonucleases Lysozyme L1 retroposons in mouse Chymase proteases Sleeping Beauty transposon Tc1/mariner transposons Immune RNases Pax transcription factors SWS1 visual pigment Vertebrate rhodopsins Fish opsins (blue, green) Steroid hormone receptors Yeast alcohol dehydrogenase Green fluorescent proteins Isopropylmalate synthase Isocitrate dehydrogenase Elongation factors

Ancestral organism

Ancestor of buffalo and ox First ruminants Jurassic bird Ancestral rodent ancestral mammal mammals ancestral fish Ancestral fish Ancestral primates Ancestral paralog ancestor of bony vertebrates Jurassic bird Fish Ancestral paralog Dinosaur yeast Ancient bacteria Ancestral eubacteria Ancestral archaebacteria Very ancient eubacteria

Approximate age (million years) 5 40 10 6 80 10 10 31 600 400 240 30-50 600 80 ca. 20 2000 2000 2500

Laboratory Benner Benner Wilson Hutchison Husain Ivics Ivics Rosenberg, Benner Sun, Merugu Shi & Yokoyama Chang Chinen, Kawamura Thornton Benner Matz Dean Iwabata et al. Benner/Gaucher
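The ambiguity-management strategy described before the table (resurrect one candidate for each residue that might have occupied an uncertain site) can be sketched in a few lines of Python. This is only an illustration: the function name `candidate_ancestors`, the six-residue toy sequence, and the placeholder letter X are all invented here, and real studies work with full-length inferred sequences.

```python
from itertools import product


def candidate_ancestors(consensus, ambiguous_sites):
    """Return every candidate sequence implied by the ambiguous sites.

    consensus: the inferred ancestral sequence, as a string.
    ambiguous_sites: dict mapping a 1-based site number to the residues
        that cannot be told apart at that site (hypothetical input format).
    """
    sites = sorted(ambiguous_sites)
    candidates = []
    # One candidate per combination of residues across the ambiguous sites.
    for choice in product(*(ambiguous_sites[s] for s in sites)):
        residues = list(consensus)
        for site, residue in zip(sites, choice):
            residues[site - 1] = residue
        candidates.append("".join(residues))
    return candidates


# Made-up toy sequence: site 3 (written X) may hold a P or a Q.
toy_candidates = candidate_ancestors("MKXLVA", {3: "PQ"})
print(toy_candidates)  # ['MKPLVA', 'MKQLVA']
```

Each candidate would then be made and measured separately. If the measured behavior is the same for all of them, the conclusion is robust to the ambiguity. Note that the number of candidates doubles with every additional two-way ambiguous site.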


The microscope created new classification problems

Whatever the challenges faced by Linnaeus to select characters to classify animals according to their natural history, those challenges became greater as the Earth was further explored in subsequent decades. Fungi, for example, presented a special kind of challenge. They are not green (a character that indicates the ability of an organism to photosynthesize, creating organic molecules from carbon dioxide and sunlight). But nor could they walk like animals to get their carbon. Animal or vegetable? And how were we to classify the slime molds? At times in their life, slime molds crawl across the forest floor (an animal?). At other times, slime molds stop crawling, send up stalks, and look like a vegetable.

[Figure: Two pictures of the same organism, the fungus Physarum polycephalum. The left picture shows the organism in its feeding stage, where it moves as a slime to capture food. The right picture shows the organism when it is fruiting. Animal or vegetable?]

These types of questions led to disputes where otherwise distinguished scientists would end up throwing things at each other during conferences. Often, such disputes were managed simply by separating the disputing parties. Universities had departments of botany separate from departments of zoology. These separated the vegetable scientists from the animal scientists in the academy. In many cases, these divisions remained long after cell and molecular theories suggested a grand unification of biology.

To make things worse, the compound microscope (which Galileo himself helped to develop) uncovered an entire biosphere of tiny organisms built from just one cell. In many cases, single-celled organisms could be classified with animals and vegetables, as they shared many of the characters of cells from multicellular animals and vegetables. These organisms, such as the amoeba, had nuclei, mitochondria (which use atmospheric oxygen to make ATP via the paths elucidated by Peter Mitchell's self-financed research), and other things that made them seem like animal cells. The common structures of cells implied that the microscopic biosphere shared some features of the macroscopic biosphere, and perhaps a common history. All of these, plants and animals included, were therefore called eukaryotes (eu, from Greek for “true”, and karuon, Greek for nut; go figure).

The real problem came when trying to classify single-celled life that did not look like cells cut from an animal or a vegetable. These single-celled organisms were called prokaryotes, because their cells did not have a nucleus. The public knows these organisms by the name "bacteria". Even in the name, biologists allowed theory to creep into their supposed-to-be-neutral observations of Nature. Had they named these bacteria akaryotes (without a nucleus), the name might have been neutral. "Pro-" implies "before in time", a term that implies that bacteria inhabited the Earth historically before cells with nuclei arose. Maybe true. Maybe not. More later.

How should we classify prokaryotes? Various characters were suggested. Some are rod shaped; others are round. Some can grow in the presence of air; some of these require air. Others require the absence of air. These all became characters that were used in various classification keys to organize bacteria. Especially important were schemes that classified bacteria according to what they ate and what they excreted. After all, metabolism was incorporated into many definitions of life, so it made sense to use metabolism as a classification character.

Classifying the microbial world: Ribosomal RNA sequences

Soon, however, it became clear that classifications based on microbial metabolism were not going to do the trick. A useful classification character is one that does not convergently arise over historical time.
Unfortunately, characters that are useful to an organism can arise convergently. We have already met one: flight. The last common ancestor of bats and whales did not fly. The last common ancestor of birds and mammals did not fly. Flight arose twice, once in a lineage leading to birds, and again in the lineage leading to bats. Hence, flying is a lousy classification character.

Bacterial metabolism turned out to be the same. Bacteria evolve to eat what they encounter, as fast as they encounter it. When well fed, bacteria can have children in less time than an episode of Star Trek, giving them more than enough opportunity to make random changes in their genome, kill off the organisms whose changes are not adaptive, and allow the mutants with adaptive changes to cover the territory. Thus, metabolic features are gained and lost too fast to be useful classification characters.

Chemistry again came to the rescue, here through its analysis of the sequence of the RNA biopolymer. RNA molecules are found as part of a sub-cellular particle called the ribosome. The ribosome is a molecular machine that is used to make proteins in living cells. Ribosomes are found in all forms of life known on Earth.

[Figure: The ribosome is the machine that makes proteins. It is found in homologous form in all life on Earth. It contains many RNA parts (shown in orange and white), as well as some protein parts (in blue). The RNA parts are the parts of the ribosome that actually make the protein. The blue proteins on the outside of the ribosome do not.]

Funded by the NASA Exobiology program, Carl Woese at the University of Illinois set about to determine the sequences of RNA molecules that were part of the ribosomes from as many organisms as he could manage. Analogous to the methods used to analyze RNase sequences, Woese aligned the sequences of ribosomal RNA molecules to form trees. He found that trees built from the ribosomal RNA of plants and animals were consistent with trees obtained by classical and protein-based systematics. Analysis of ribosomal RNA sequences from bacteria carried, however, a classification surprise. Woese found that the ribosomal RNA sequence from prokaryotes could be separated into two classes.
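One simple way to compare aligned sequences, before any tree is built, is to count the fraction of positions at which each pair differs. The sketch below does this for three invented 12-letter sequences. The sequences, their labels, and the function name `mismatch_fraction` are assumptions made for illustration; this is not a description of Woese's actual procedure, and real ribosomal RNA is well over a thousand nucleotides long.

```python
from itertools import combinations


def mismatch_fraction(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Invented toy alignment; the labels are hypothetical placeholders.
toy_alignment = {
    "eukaryote": "AAAACCCCGGGG",
    "prokaryote_class_1": "AAAAGGGGCCCC",
    "prokaryote_class_2": "CCCCCCCCCCCC",
}

# Compare every pair of sequences once.
distances = {
    (m, n): mismatch_fraction(toy_alignment[m], toy_alignment[n])
    for m, n in combinations(toy_alignment, 2)
}
for pair, d in distances.items():
    print(pair, round(d, 2))
```

In this toy data every pair differs at the same fraction of sites, so neither prokaryote class sits closer to the other than to the eukaryote. A table of pairwise distances like this is the usual starting point for distance-based tree building.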
The two classes were as different from each other as either was from the ribosomal RNA of eukaryotes. Woese concluded that the prokaryote "kingdom" contained two kingdoms. Again, the naming of these kingdoms told us more about the classifiers than the classified. One type of prokaryote was called "eubacteria" (for "true" bacteria). The other was called "archaebacteria" (for "ancient" bacteria).

[Figure: The "universal" tree of life organizing all life presently known on Earth (from Norman Pace) shows as much distance separating Archaea and Bacteria as separating both from Eucarya. Note your position in the tree (Homo, look on the right side of the Eucarya branch) next to Zea (which is corn).]

Why is this a problem? We evolutionists constantly remind everyone that any organism living today is no older (and no younger) than any other organism. We both live today, and just as much time separates humans from our last common ancestor with frogs as separates frogs from their last common ancestor with humans. Therefore, frogs are no more ancient than humans, and we should not call frogs a "lower" life form.

The same goes for humans, archaebacteria, and eubacteria. We are all related by common ancestry. But just as much time separates that ancestor from modern archaebacteria as from modern humans. None of the modern representatives of any of the three kingdoms is any older than any other.

The naming of the three kingdoms has created an impossible problem in the public perception of biology. Even on the BBC, archaebacteria are said to be "older". Why? Because of their name. The problem was made worse when the kingdoms were renamed (they are now Eucarya, Bacteria, and Archaea). This reinforces the view that one of the kingdoms is older than the others. This view is not correct. Unfortunately, the names will not be changed, meaning that educators will need to fight the impression left by the name forever.

But this makes a point important to this book. Listen to the language that scientists speak. It will often tell you more about what they constructively believe than their syntax.

Inferring the lifestyles of life deep in the history of life on Earth

When we resurrected words from the Indoeuropean languages, we inferred something about how the Indoeuropeans lived. The "backwards-in-time" research activities in paleogenetics can also tell us something about how our ancestors lived. Again, our goal is to go so far back in time that we might infer the structure of a very primitive form of life, something that will reveal the essence of life as a universal.

To learn how far back in time paleogenetics could take us, Eric Gaucher (now on the faculty at Georgia Tech) started working in my laboratory with a focus on a family of proteins called elongation factors. These proteins work with the ribosome to make proteins.
Their sequences have been conserved for billions of years, so much so that we were able to infer the sequences of elongation factors from bacteria that lived billions of years ago (with some ambiguity). Eric then resurrected a sample of ancestral elongation factors to study in the laboratory.

He made a remarkable discovery. In an example of behavior following function, elongation factors in modern life are adapted to perform best at the temperature where their host organisms live. For example, the bacterium E. coli lives in the human intestine at 37 °C (98.6 °F). Accordingly, the elongation factor from E. coli performs best at 37 °C. Analogously, elongation factors from bacteria that live in 65 °C hot springs work best at 65 °C.

Using again a method from geology ("the present is the key to the past"), Eric measured the optimal temperatures for the functioning of the ancestral elongation factors that he resurrected from bacteria that lived perhaps two billion years ago (the date is subject to much uncertainty, as are the ancestral sequences). In the laboratory, the resurrected elongation factors performed best at 65 °C (149 °F). This, we suggested, was evidence that these ancient bacteria lived at high temperatures. For comparison, this is the temperature of typical hot (but not the hottest) springs in Yellowstone National Park.

[Figure: Grand Prismatic Springs from Yellowstone National Park. The laboratory behavior of elongation factors resurrected from bacteria that may have lived two billion years ago suggested that the last common ancestor of bacteria may have lived in an environment analogous to that found on the periphery of such hot springs or on a planet that was 65 °C overall. The stripe at the lower right is a road.]

There are, of course, many auxiliary hypotheses built into this inference. We cannot be sure that laboratory experiments done with isolated resurrected proteins measure the same behaviors as would be measured in whole ancient bacteria if we built a time machine and traveled back to measure them. But it remains remarkable that we can draw an inference by experiment not only that Indoeuropeans lived where it snowed, but that two-billion-year-old bacteria lived where it was hot.
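The inference above rests on finding the temperature at which a resurrected protein works best. As a minimal sketch of that last step, assume the activity of one protein has been measured at a handful of temperatures. The readings below are invented purely for illustration (they are not data from the experiment), and the function name `optimal_temperature` is hypothetical.

```python
def optimal_temperature(activity_by_temp):
    """Return the temperature (in °C) with the highest measured activity.

    activity_by_temp: dict mapping temperature in °C to relative activity.
    """
    return max(activity_by_temp, key=activity_by_temp.get)


# Invented readings for illustration only: temperature (°C) -> relative activity.
toy_assay = {25: 0.10, 37: 0.35, 50: 0.70, 65: 1.00, 80: 0.40}

print(optimal_temperature(toy_assay))  # 65
```

With a real assay one would also want replicate measurements and a finer temperature grid around the apparent optimum, since the answer can be no more precise than the spacing of the temperatures tested.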

[Figure: chemical structures of RNA-containing cofactors, each labeled with the kind of metabolism involved. Adenosine triphosphate (ATP): phosphate transfer, energy transfer. Nicotinamide adenine dinucleotide (vitamin = niacin): oxidation, reduction. Flavin adenine dinucleotide (vitamin = riboflavin): oxidation, reduction. Coenzyme A (vitamin = pantothenic acid): carbon-carbon bond formation.]

Inferences about still more ancient terran life

These elongation factors are, to date, the most ancient fragments of life that have been resurrected for laboratory study. Significantly more sequence data will be required from the modern terran biosphere before more ancient resurrections can be attempted, or even before resurrections of the most ancient elongation factors can be said to be secure. Nevertheless, it is possible to use this backwards-in-time research route to draw broader inferences about the structure of very ancient life on Earth.

For example, in 1988, Andrew Ellington, then a graduate student in my laboratory (now on the faculty at the University of Texas), set out to apply this approach to learn as much as he could about the nature of the organism represented by the node that joined the three kingdoms of terran life. We called the genome in that organism the protogenome (using the prefix "proto", as in Proto-Indoeuropean, to indicate an inferred form). A rule of parsimony was applied. If a biomolecular trait were found in all three kingdoms, then that trait was inferred to have been present in the organism having the protogenome.
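The parsimony rule just stated is, in effect, a set intersection: keep only the traits found in all three kingdoms. A minimal Python sketch follows. The function name `infer_protogenome_traits` and the short trait lists are illustrative assumptions; the lists reflect only statements made in this chapter (ribosomes and the RNA cofactors occur in all three kingdoms, while a nucleus is found only in eukaryotes).

```python
def infer_protogenome_traits(traits_by_kingdom):
    """Keep only the traits that appear in every kingdom (the parsimony rule)."""
    trait_sets = [set(traits) for traits in traits_by_kingdom.values()]
    return set.intersection(*trait_sets)


# Tiny illustrative trait lists, not a real survey of the biosphere.
toy_traits = {
    "archaebacteria": {"ribosome", "ATP", "coenzyme A"},
    "eubacteria": {"ribosome", "ATP", "coenzyme A"},
    "eukaryotes": {"ribosome", "ATP", "coenzyme A", "nucleus"},
}

inferred = infer_protogenome_traits(toy_traits)
print(sorted(inferred))  # ['ATP', 'coenzyme A', 'ribosome']
```

The nucleus drops out because it is missing from two of the three kingdoms. The rule is deliberately conservative: a trait that the ancestor had, but that one kingdom later lost, would also be excluded.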
[Figure: chemical structure of S-adenosylmethionine (nutrient = methionine): one carbon transfer. Caption: For many cofactors involved in metabolism across the terran biosphere, RNA portions (shown in magenta) are appended to reactive portions (in black) from vitamins. The RNA portions do not participate in metabolism, and are believed to be vestiges of a time on Earth when life used RNA as the only encoded biopolymer. In bold is the kind of metabolism involved. In parentheses is the vitamin you eat. At the top is adenosine triphosphate (ATP), the high energy compound studied by Peter Mitchell in his self-funded work.]

As Andy and I constructed our model for the metabolism surrounding the protogenome, we were guided by a hypothesis that addressed one of the major paradoxes that underlies modern molecular biology. In modern terran life, proteins are needed to make DNA, while DNA is needed to make proteins. The obvious question then arises: Which came first? Proteins or DNA?

In the early 1960's, Alexander Rich at MIT had proposed a simple solution to this problem based on the third biopolymer found in modern terran life: RNA. Rich suggested that proteins were initially made by RNA catalysts. That is, encoded proteins came after encoded RNA.

Rich's hypothesis was supported by the RNA molecules found in the same ribosomes that Woese used to classify archaebacteria. As noted above, ribosomes are the machines that terran life uses to make proteins. These machines contained many molecules of RNA, and it is these RNA molecules in the ribosome that actually make the proteins. The syllogism is: If A makes B, then A arose before B. RNA makes proteins. Therefore, RNA arose before proteins.

This suggested that before life on Earth evolved the three biopolymer system known today (the three biopolymers being proteins, RNA, and DNA), there existed on Earth an episode of life that used RNA as the only encoded biopolymer. Under this "RNA world" hypothesis, RNA supported both genetics and metabolism.

Already in the 1970's, Harold White at the University of Delaware, and Cornelius Visser and Richard Kellogg at the University of Groningen in the Netherlands had amplified on this hypothesis by examining structures not involved in genetics, but involved instead in metabolism. They noted that many cofactors had small pieces of RNA attached to them. These pieces are shown in magenta in the picture to the right. The other pieces, in black, are often found as vitamins in our food.

[Figure: chemical structures of adenosine triphosphate (ATP): phosphate transfer, and of pyrophosphate, which works just as well in phosphate transfer as ATP. Caption: Pyrophosphate does not have the magenta RNA portion of ATP. Chemistry suggests that pyrophosphate should work just as well as a phosphate donor as ATP, and isolated examples are found in natural biology where it does. Therefore, we cannot explain the existence of the RNA portion of ATP by arguing that it is essential for function, as examples exist where it is not.]

Cofactors are relatively small molecules that help enzymes catalyze reactions in metabolism. Unlike in the ribosome, to play these roles, the cofactors do not need the RNA pieces. The RNA pieces do not get involved in the actual chemistry that supports any metabolic reaction. They are handles that many enzymes use to bind the cofactor, a problem for which RNA is not the only possible solution. And yet these RNA pieces were found in all three kingdoms of life. Under the rule of parsimony that Andy and I used, this suggested that these cofactors were present in the last common ancestor of all life on Earth.

One of these RNA cofactors is adenosine triphosphate, the ATP that we met in Chapter 3 when Peter Mitchell was doing his self-funded research. ATP is a universal donor of phosphate in terran metabolism. Whenever a phosphate group is needed, ATP provides it and becomes adenosine diphosphate, or ADP. The adenosine part of ATP is the magenta ribonucleotide.

Why would the most ancient forms of life on Earth have a metabolism based on cofactors that included RNA structures? Again, we can construct several hypotheses. One is that unless these cofactors had RNA pieces, they would not work in metabolism.

Chemistry, however, suggested that this hypothesis was unlikely. For example, a large number of studies have examined how phosphate groups are transferred in chemistry that is not biological. These studies have shown that one does not need the RNA piece of ATP to donate phosphate. Indeed, a simple molecule containing just two phosphate groups joined together, called pyrophosphate, is entirely adequate as a phosphate donor.

Once chemistry showed in the laboratory that simpler molecules can donate phosphate groups, a search through the biosphere on Earth found biological examples of it. Organisms are now known that transfer phosphate using pyrophosphate as a phosphate donor, instead of ATP. However, unlike ATP (which is everywhere), metabolic steps that use pyrophosphate as a phosphate donor are scattered across the evolutionary tree. This implies that, like the scattered mammals who have lost their hair, the replacement of ATP by a molecule that lacks an RNA piece occurred in just a few isolated lineages. The ancestor had ATP with its RNA handle, just like the ancestral mammal had hair.

But the point is nevertheless made: Given the appropriate enzymes, pyrophosphate is entirely adequate as a phosphorylation agent. It does not need the RNA portion of ATP to do the job.
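The step from "pyrophosphate use is scattered across the tree" to "the ancestor had ATP" can be illustrated with a standard textbook method, the bottom-up pass of Fitch parsimony. This is offered as an assumption about how one might formalize the reasoning, not as the procedure actually used in the work described here. The tree shape, the lineage names, and the assignment of phosphate donors to leaves are all invented.

```python
def fitch_root_states(tree, leaf_state):
    """Bottom-up Fitch pass: return the most-parsimonious state set at the root.

    tree: a leaf name (str) or a 2-tuple of subtrees (rooted binary tree).
    leaf_state: dict mapping each leaf name to its observed state.
    """
    if isinstance(tree, str):
        return {leaf_state[tree]}
    left, right = (fitch_root_states(child, leaf_state) for child in tree)
    shared = left & right
    # Keep the states both children agree on; otherwise keep all of them.
    return shared if shared else left | right


# Hypothetical eight-lineage tree; two scattered lineages use pyrophosphate.
toy_tree = ((("L1", "L2"), ("L3", "L4")), (("L5", "L6"), ("L7", "L8")))
toy_donor = {name: "ATP" for name in ("L1", "L2", "L3", "L5", "L6", "L8")}
toy_donor.update({"L4": "pyrophosphate", "L7": "pyrophosphate"})

print(fitch_root_states(toy_tree, toy_donor))  # {'ATP'}
```

Because the two pyrophosphate users sit on separate branches, the cheapest explanation is an ATP-using ancestor with two independent replacements, which is the pattern the text compares to scattered hairless mammals.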

The fact that the RNA portions of RNA cofactors are incidental to their chemical reactivity helps make stronger the argument that they reflect common ancestry. The argument is analogous to that proposed by Pauling and Zuckerkandl to assert that amino acid characters are better indicators of history than physiological characters. It is also analogous to the argument used by Stephen J. Gould in his book The Panda's Thumb, that nonfunctioning features of living systems are those most instructive about history. If the RNA portion of ATP were necessary for phosphate donation, then it could have emerged by convergent evolution (like flight in bats and birds), whether or not it was present in the last common ancestor. The fact that the RNA portion of RNA cofactors is not necessary for phosphate donation implies, in contrast, that the RNA portion was present in the last common ancestor.

[Figure: chemical structures of coenzyme A (vitamin = pantothenic acid): carbon-carbon bond formation, and of a truncated form of coenzyme A (vitamin = pantothenic acid): carbon-carbon bond formation in fat synthesis. Caption: In natural history, in some organisms, the RNA portion of coenzyme A (shown in magenta) has been lost, most frequently for fat synthesis. The SH group containing sulfur (on the right) survives in this example of molecular vestigiality.]

For other RNA cofactors, exploration of the biosphere has found examples that show a history of loss of the RNA portions. For example, the coenzyme A molecule, which includes the unit pantothenic acid (check a box of Cheerios®; pantothenic acid is added as a vitamin) has an RNA piece at one end. The essential reactive part of coenzyme A is, however, a sulfur-containing -SH group at the other end. If one searches sufficiently among the leaves of the tree of life, organisms can be found that use fragments of coenzyme A that have lost first the RNA part, and then later more of the part of the cofactor that is incidental to the reactivity of the SH group.

Argument:
If RNA portions of cofactors are required for metabolism, then they will be present whenever a cofactor is used in metabolism.
Truncated coenzyme A is used in metabolism, yet has no RNA portion.
Therefore RNA portions are not required in coenzyme A for metabolism.

These all point to one conclusion. If we work backwards in time, it appears as if RNA was more and more important in the working of more and more ancient, and more primitive, organisms.

RNA parts not essential for RNA cofactor function arose in an era when RNA was all that we had

So why do so many molecules found throughout terran metabolism have pieces of RNA that they do not need, but were evidently present in life that lived in the distant past? One class of hypothesis relates to the concept of "vestigiality", and ties in directly to Rich's hypothesis that an episode of life on early Earth used RNA as a catalyst. A vestigial structure is one that previously had a use, but has it no longer.

In cartoon form, the RNA-world hypothesis holds that before proteins arose on Earth, terran life had RNA as its only encoded biopolymer. RNA was used for genetics (to have kids). RNA was used as a catalyst (for example, in the ribosome to make proteins). RNA was used for metabolism. In the RNA world, the pieces of RNA on RNA cofactors served as "handles" for the RNA molecules that catalyzed individual steps in the RNA-world metabolism.

Eventually, the ribosome emerged, proteins were made, and protein catalysts made by the ribosome started to displace RNA catalysts. According to the model, the protein catalysts were superimposed on an already existing metabolism involving RNA cofactors, which emerged when RNA was all that we had. The new biopolymer did not change that metabolism. Instead, the protein catalysts evolved to accept those pre-existing metabolites, including the RNA cofactors with the RNA handles. Even in the three-biopolymer world, it was too complicated to re-work the entire complex metabolism to get rid of the RNA cofactors. And so they persisted, except in an odd organism here and there.

What did the RNA world look like?

The RNA world is another scientific model without a directly observable subject matter. But perhaps, Andy and I thought, we could set up a scientific method that might make inferences about how it operated. This would require some formal rules, invented for the purpose. Such rules would be arguable. However, one is allowed in science to draw inferences from whatever rules one likes, as long as one states those rules and obtains something that is predictive and experimentally testable from those inferences.

After we did the best that we could inferring what was present in the metabolism surrounding the protogenome, Andy and I set out to develop such rules. We began by examining all of the gene and protein sequence families that we could find. For each family, a sequence alignment, an evolutionary tree, and ancestral sequences throughout these trees, were inferred. Eventually, Stephen Chamberlin built the Master Catalog, a collection of over 100,000 families of proteins from the modern terran biosphere.

It was already clear in 1989 that the last common ancestor of archaebacteria, eubacteria, and eukaryotes was metabolically and genomically quite complicated. Long before total genome sequencing was done on any organism, enough sequences had been collected to allow us to infer approximately 400,000 nucleotides of the ancestral protogenome.

Andy and I then asked: What molecules could be inferred to have been present in the protogenome that contained RNA pieces, where RNA was not necessary for their molecular reactivity? To answer this question, we looked across the terran biosphere and made a list.
We then established a rule: If a molecule: (i) could be placed in the protogenome by working backwards-in-time from modern biochemistry, (ii) contained a piece of RNA, and (iii) had no chemical use for that piece, then that molecule was hypothesized to have arisen in the RNA world. Like the words inferred for Proto-Indoeuropean, the molecules could be inferred for the RNA world.

We then asked: What inferences could be drawn about the lifestyles of RNA world organisms from the biochemicals that we hypothesized were present there? The reasoning was analogous to historical linguistic reasoning for the Indoeuropean society. By inferring the presence of words in the language that the Indoeuropeans spoke (wheel, snow, ox, pig, and so on), linguists built a model for that society. By inferring the presence of RNA molecules in the RNA world "society", we inferred how that society functioned.

[Figure: This is an RNA molecule that catalyzes the joining of two molecules of RNA. The RNA enzyme was artificially evolved in the laboratories of Jack Szostak and David Bartel, and its crystal structure was solved in the laboratory of Jennifer Doudna. RNA-world organisms are modeled to have many such RNA enzymes catalyzing complex metabolism.]

For example, if ATP, an RNA cofactor, donates phosphate groups to metabolites, and if ATP were present in the RNA world, then the metabolism of the RNA world must have included steps where phosphate groups were donated to metabolites. If coenzyme A, an RNA cofactor, makes new carbon-carbon bonds in metabolism, and if coenzyme A were present in the RNA world, then the metabolism of the RNA world must have included steps where new carbon-carbon bonds were formed using coenzyme A.

These inferences are not logical necessities, of course. Rather, they exploit two scientific methods from geology and paleontology, that "the present is the key to the past" and "function follows form".
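The three-part rule and the inference drawn from it can be written down as a filter followed by a lookup. The sketch below is only an illustration under stated assumptions: the `Cofactor` record, its field names, and both function names are invented here, and the true/false flags simply encode claims made in this chapter (for instance, that pyrophosphate carries no RNA piece and is used only in scattered lineages).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cofactor:
    name: str
    in_protogenome: bool    # (i) placed in the protogenome by parsimony
    has_rna_piece: bool     # (ii) contains a piece of RNA
    rna_piece_needed: bool  # (iii) fails if the RNA piece is chemically needed
    reaction: str           # kind of metabolism the cofactor supports today


def rna_world_candidates(cofactors):
    """Apply the three-part rule and return the cofactors that pass it."""
    return [
        c for c in cofactors
        if c.in_protogenome and c.has_rna_piece and not c.rna_piece_needed
    ]


def inferred_metabolism(cofactors):
    """Reaction types inferred for the RNA world from the passing cofactors."""
    return sorted({c.reaction for c in rna_world_candidates(cofactors)})


# Flags below restate this chapter's claims; they are not independent data.
toy_cofactors = [
    Cofactor("ATP", True, True, False, "phosphate transfer"),
    Cofactor("coenzyme A", True, True, False, "carbon-carbon bond formation"),
    Cofactor("NAD", True, True, False, "oxidation and reduction"),
    Cofactor("S-adenosylmethionine", True, True, False, "one-carbon transfer"),
    Cofactor("pyrophosphate", False, False, False, "phosphate transfer"),
]

print(inferred_metabolism(toy_cofactors))
```

As the text stresses, the output is an inference rather than a logical necessity: it assumes that each cofactor did in the RNA world what it does today.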
In 1989, based on these inferences, we suggested that life in the RNA world included metabolic pathways

involving the transfer of one-carbon fragments, oxidation reactions, and reduction reactions, in addition to the formation of carbon-carbon bonds and the transfer of phosphate. This implies that even before the first encoded proteins were made on Earth, life was already complex. More recently, Eugene Koonin and his group at the NIH have broadened the foundation that supports this implication.

What does the backwards-in-time model tell us about life as a universal?

As paleogenetics emerged from a concept to generate a community with developing standards-of-proof and research methods, it provided us with many narratives showing how individual proteins, pathways, and metabolisms have adapted on the Earth of the past. As these further accumulate, an interconnecting narrative about life on Earth will undoubtedly emerge, at least for the epochs for which the backwards-in-time approach can be supported by genomic data. In time, these interconnected narratives will support a grand unified "synthesis" that connects chemistry to the planet by way of biology. This synthesis already captures two of the themes (chemistry and Darwinism) that feature prominently in our definition-theory of life. But what about life as a universal? Curiously enough, although paleogenetics (including experimental paleogenetics) has not stepped off planet Earth, it has managed to say something about universal life that is not obvious from direct inspection of the terran life that we know. Life on Earth today is a Darwinian system that uses three biopolymers. We use DNA to do genetics. We use proteins to do catalysis, including catalysis for most steps in metabolism. We use RNA to carry information from DNA to proteins, as well as to do bits and pieces of catalysis in processes that are vestigial of the RNA world (according to the model).
Many have believed that genetics and catalysis are best done using different biopolymers, one specialized for genetics, the other specialized for catalysis. There are good reasons to have a view of reality that drives this belief. For example, to catalyze a reaction, a biopolymer is best if it, by folding in three dimensions, forms a cavity to surround the metabolite. The cavity is needed to deliver the chemical units needed to catalyze the reaction from all sides. In other words, molecular systems that are best for catalysis must fold. This is exemplified by the folded RNase pictured earlier in this chapter.

In contrast, the genetic molecules are best if they do not fold. The essence of the Watson-Crick model for genetic duplication requires that a DNA strand lie out in an extended structure. This allows it to serve as a "template", where a second, complementary DNA strand is synthesized on it. This is the process by which DNA replicates. Folding, it would seem, would obstruct templating.

This is what a piece of double helix RNA looks like.

The logic would seem to be Aristotelian. To "do" catalysis, a molecule must fold. To "do" genetics, a molecule must not fold. Under this logic, it would seem that no single class of molecule could do both genetics and catalysis. Hence, a form of life that has just a single biopolymer would appear to be impossible.

The backwards-in-time approach contradicts this view. It implies that there was on Earth a form of life based on a single biopolymer: RNA. This, in turn, means that one-biopolymer life forms are possible. Chemical systems do not need proteins and RNA and DNA to gain access to Darwinian evolution. They may not even need two biopolymers, one specialized for genetics and the other specialized for metabolism. This paradox requires us to ask a question about our understanding of chemical reality, following the liberal strategy outlined in Chapter 1: What must be wrong in our view of reality if the RNA world actually existed? The answer is: Some biopolymers must be able to both fold (to catalyze) and not fold (to template). Ultimately, this was established by experiment for RNA. Thomas Cech, Sidney Altman, Jack Szostak, Andrew Ellington, Donald Burke, Faqing Huang and many others now have RNA molecules that fold and catalyze reactions under some conditions, but can template under others. Several of these molecules come from contemporary terran biology. Others come from laboratory experiments where RNA molecules were forced to evolve under artificial, human-made selection pressures. This juxtaposition of two dialectical perspectives drives an inference. For a single biopolymer to support both catalysis and genetics, that biopolymer must have delicately balanced folding properties. The ability of RNA to play a genetic role (as in the human immunodeficiency virus, which causes AIDS) shows that RNA can do genetics. The participation of RNA in the ribosome shows that RNA can do catalysis (here, to synthesize proteins). These two together suggest that RNA is a molecular system that, at least for some of its sequences, can achieve that balance. The inference has repercussions throughout biology. We will discuss these as we discuss the origin of life in Chapter 5, the exploration for alien life in Chapter 6, and the synthesis of life in Chapter 7.
The RNA-world hypothesis does not give us a clear description for the first forms of terran life

This is the good news. The bad news is that working backwards in time has not delivered a model for the very first form of life on Earth, the chemical system that first gained access to Darwinian evolution. Unfortunately, the RNA world appears to have left us just a single descendant lineage, one that postdated the emergence of proteins as a second encoded biopolymer. This means that we cannot use any three-lineage triangulation to infer by parsimony any farther back in time. Because we have found only one lineage descended from the RNA world (so far), we are at the "end of history", at least for the time being. Judging by rule-based inferences stretched to their (current) limits, the RNA world seems to have been quite complicated, at least at the time that the ribosome emerged to begin the biosynthesis of encoded protein. Further, by the time that the descendants of the RNA world diversified into the three modern kingdoms of life, many of the RNA enzymes that had catalyzed metabolism in the RNA world had already been replaced by protein catalysts. Further, by that time, DNA appears to have already emerged and been established as the universal genetic molecule. This means that while the backwards-in-time strategy has delivered a model for a simpler form of life, it has not delivered to us a model for the simplest form of life. The backwards-in-time approach has not delivered to us a model for terran-type biology so simple that we can confidently extract from it a view of the essence of life, free of the baggage of historical accident and contingency on Earth.

What is the future of the backwards-in-time approach as a research paradigm?

Paleogenetics has evolved in just 45 years from a concept in the minds of Pauling and Zuckerkandl to an experimental science with its own methods and its own standards-of-proof.
Paleogenetics experiments now give inferences that are constructively believed by individuals in the community.


A normal science has emerged. Today, approximately two dozen examples are available where experimental paleogenetics has addressed a scientific question. In each case where a resurrection has been done, it has delivered a new dimension of understanding of the family of proteins involved. About ten Nature and Science papers cover these examples, a remarkable percentage for any field. Let us now work on a few scientific methods to predict the future of this field. With 40,000 families of proteins in vertebrates, and some 100,000 families within the Master Catalog, this particular scientific method can certainly be applied over a far broader scope. We can even imagine a paleogenomics approach, where entire segments of the genomes of ancestral organisms might be inferred. David Haussler and his group in Santa Cruz have done this for the region on the mammalian genome that is relevant for the disease known as cystic fibrosis. Thus, using a method that predicts the future of a scientific field from scientific considerations alone, the backwards-in-time approach has "legs" that should allow it to run for some time as a "normal science", producing discoveries about the biosphere as it does. But, as noted in Chapter 3, funding drives science. Paleogenetics rests on sequence data that require bucks to collect. Today, funding agencies have purchased the sequences of whole genomes for organisms scattered throughout the "universal" tree of life. Those of us wishing to do a historical analysis of life, especially in its deep branches, wish to have more. The greater the density of sequencing, the better we are able to apply this particular brand of scientific method to address problems. Indeed, for understanding human disease, we would like to have whole genome sequencing densely placed throughout the primate tree (a subject discussed in the companion book: The Future of Medical Research, 2009). Will more sequence data be forthcoming? This depends on funding.
The community that funds research may think that 15 vertebrate genomes are enough and stop the collection of more sequence data. Balancing this is the behavior of large institutions, which suggests the opposite. Whole genome sequencing is "big science", requiring paramilitary-style large organizations with heavy capital investment and large teams. Indeed, in the 1980s, many life scientists opposed doing whole genome sequencing because of this, arguing that this would change the nature of biological research. Some even argued that bringing "big science" into biology would destroy innovation in biology. That story is yet to be told. What is clear, however, is that many genome centers are up and running today with (largely) federal funding. Even though program managers at the NIH and NSF can be heard to grumble about needing to "feed the beasts" (these centers consume tens of millions of dollars each), simple institutional inertia associated with past funding decisions will ensure that these centers will continue to generate more sequence data, at least for a while. Things also look good considering the cost structure of DNA sequence collection. The cost of collecting sequence data is dropping. Many speak of "Moore's law" (which is not a law in the sense that we like to use that term, but never mind). The technology lowering the cost of genome sequencing is being driven by potential applications in human medicine. So this also suggests that more sequence data will emerge. Thus, our vision for the future of the backwards-in-time approach is likely to materialize. With a community in place, problems being solved, and a stream of supporting data coming at ever lower cost, there is little doubt that the "backwards-in-time" approach to understanding life will continue to be followed for some time. As it develops, we will learn more about our more primitive ancestors. And this will tell us more about the basics of life as a universal.


Chapter 5

Forward in Time: From Chemicals to the Origin of Life

[Figure: Prebiotic chemistry (interstellar organics) works forward in time along a path to the simplest first life; paleogenetics works backward in time from Eucarya, Archaea, and Bacteria to simpler life (infer ancestral life forms; resurrect for laboratory study).]

The backwards-in-time approach to understanding life represented by the bottom wedge in Figure 3.1 is teaching us about the lifestyles of our ancestors via scientific methods that organize our observations of terran biology set in a model for Earth's history. Experimental methods are brought to bear on consequent historical hypotheses through resurrection of ancestral genes and proteins. In the two decades since paleogenetics began as an experimental science, over two-dozen studies have developed those methods. More paleogenetics studies are possible, and still more will become possible as sequences emerge from medical genomics. A scientific community has emerged, developing its own research methods and standards-of-proof. The field has matured to support normal science. In short, Chapter 4 reports success in the development of a new field. But what about life as a universal? The backwards-in-time approach has also helped, but not decisively. Paleogenetics has provided broad support for the RNA-world hypothesis, a historical model that suggests that an earlier episode of life on Earth used RNA as its only encoded biopolymer. This, in turn, suggests that simpler single biopolymer forms of life are possible, even though they have never been observed. However, the realities of natural history on Earth mean that the backwards-in-time approach cannot do it all. Known terran life apparently diverged well after the terran biosphere had acquired three biopolymers, proteins, DNA and RNA.
Further, as best as we can model it, the RNA world was quite complex metabolically, and evolved far past its original state, before it began to produce encoded proteins. This means that we cannot triangulate our way from modern biology to a model for the most primitive form of life that existed on Earth. This, according to our definition-theory of life, would be the form just after chemistry gained access to Darwinian evolution, the one most reflective of the "essence" of life, the closest thing to non-biology on this side of the non-biology/biology frontier. To model this most basic life form, we must look elsewhere.

Working forwards in time from chemistry

Fortunately, Figure 3.1 suggests where else we might look. Complementing the bottom wedge in Figure 3.1 is a top wedge, which represents research that works forwards in time. This research starts with a list of organic molecules that might have been present on Earth before life formed. It then tries to build a model for how Darwinian systems might have emerged from that chemistry. We can begin that list by identifying organic compounds that arrive from the cosmos to the Earth continuously on meteorites today. Some meteorites are carbonaceous, meaning that they bring in large amounts of carbon. Sandra Pizzarello, George Cooper, and many others have extracted and identified organic molecules from carbonaceous meteorites. These include amino acids, glycerol, and adenine, all of which are important to our discussion in this book. Observational astronomy has expanded the list of organic molecules that form without the assistance of life. Telescopes that detect microwave radiation (like the microwaves in your kitchen) can observe microwaves that are emitted and

The gray color of meteorites known as carbonaceous chondrites comes from the organic compounds that they contain.


absorbed by organic molecules in dust clouds. Many of these dust clouds are "star nurseries", giving rise to new solar systems as we observe them. These dust clouds contain many organic molecules, some of which are shown at the right. Unless we want to consider these clouds as life forms (a consideration that the community believes belongs only in the science fiction of Fred Hoyle), we postulate that these organics form without the intervention of life. Further, missions launched by NASA and ESA have identified organic molecules present on other worlds that may resemble today the Earth of four billion years ago. These include Titan, a moon of Saturn, which was recently explored by the Cassini-Huygens mission. The atmosphere of Titan is hazy brown, the color arising from a complex mixture of organic molecules. A few components of that mixture were identified by Cassini-Huygens. Last, laboratory experiments on Earth with simulated primitive atmospheres suggest compounds that might emerge abiotically in a planetary context. For example, carbon dioxide is abundant in the cosmos, constitutes the majority of the present atmosphere of Mars, and was a likely component of the early atmosphere above Earth. When hit with lightning, ultraviolet light, and other sources of energy, moist carbon dioxide yields mixtures of organic molecules. One of these is formaldehyde, built from two hydrogen atoms, one carbon atom, and one oxygen atom (CH2O). Formaldehyde will play an important role in our story. Using a scientific method taken from natural history (the present is the key to the past), we postulate that organic species observed today in planet-forming interstellar gas clouds, Earth-arriving meteorites, Titan, and laboratory experiments were all present on an early, pre-biotic Earth. All that we now need do is make from them a self-sustaining chemical system capable of Darwinian evolution. Seems easy.

[Figure: structures of molecules observed in interstellar gas clouds. Labeled examples include water, carbon dioxide, ammonia, carbon monoxide, formaldehyde, acetylene, hydrogen cyanide, glycolaldehyde, and formamide.]

Some molecules containing carbon (C) combined with hydrogen (H), oxygen (O), nitrogen (N) and sulfur (S), observed by telescope in interstellar gas clouds analogous to those that formed Earth. No need to memorize; we will repeat the structures for those that become important in later parts of this chapter.

RNA as the first form of life?

Unfortunately, it is not so easy. Conceptually, the number of compounds in gas clouds, meteorites, Titan, and laboratory simulations of early Earth is enormous, too many for any but a super-human imagination to start puzzling over. Each of those n compounds (where n is a large number) can react with any of the other compounds (for the mathematically inclined, this gives $n^2$ reactions). Of course, each of these $n^2$ products can react further. Thus, any useful scientific method must begin by constraining the enormity of possibilities that observations present to focus the minds of us mortal scientists. Fortunately, the backwards-in-time approach gives us such a focus: the RNA-world hypothesis. In the hypothetical RNA world on Earth, RNA played roles both in genetics and metabolism. The paleogenetics community believes that the RNA-world hypothesis offers an adequate explanation-model for the role of


RNA in the ribosome, the magenta RNA portions of RNA cofactors, and RNA as a messenger. Developing a model for the RNA world is part of the "normal science" in the field introduced in Chapter 4. The RNA-world hypothesis may also constrain our chemistry-forward discussion, however, by suggesting a hypothesis for the origin of life: the "RNA-first hypothesis". This hypothesis holds that not only was modern life on Earth preceded by a life form that used RNA as its only encoded biopolymer, but also that life on Earth began with RNA. The RNA-first hypothesis offers a target for our efforts to extract Darwinian chemical systems from the long list of organic species that are made by the cosmos without biology. Our goal is no longer to find a way to extract just any Darwinian system from meteoritic, interstellar, and planetary organics. Rather, with RNA first, we must find a way to extract RNA from those organics. Chapter 4 has provided a target for Chapter 5. The RNA-first hypothesis in cartoon form. In cartoon form, here is how the RNA-first hypothesis models the origin of life. About four billion years ago, abiotic chemistry generated pools of RNA molecules, perhaps averaging 100 building blocks in length. Among these were RNA molecules that found themselves in an environment where they could generate copies of themselves. However, these RNA "children" were imperfect copies of their parent RNA sequences. Some were able to have grandchildren more efficiently than others; some could not have grandchildren at all. The imperfections that enabled the more potent RNA molecules to reproduce faster were passed to the grandchildren, again with imperfections. Some of those imperfect grandchildren could reproduce faster than others, so they had great grandchildren faster. And so Darwinian evolution began. The first Darwinian RNA molecule and its descendants then helped themselves to the benefits of Darwinian mechanisms to improve themselves. 
The mutant RNA children who were better able to synthesize children captured more resources from the planet, and came to dominate other RNA life. RNA molecules then emerged from random sequences that catalyzed transformations within an emerging RNA world metabolism. Those RNA enzymes used RNA cofactors. Eventually, the first ribosomes emerged, using RNA catalysis to assemble peptides following instructions from genes. The rest is history, reconstructable under the backwards-in-time scientific methods exploited in Chapter 4.
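The cartoon above can be turned into a toy calculation. The sketch below is not a model of RNA chemistry; it only illustrates the logic of imperfect copying plus differential reproduction. Every number in it (population size, copying error, number of generations) and the function name `simulate` are assumptions chosen for illustration, and each "molecule" is reduced to a single number, its replication rate.

```python
import random


def simulate(generations=100, population=200, copy_error=0.05, seed=0):
    """Toy Darwinian loop: faster replicators leave more (imperfect) children.

    Returns the mean replication rate of the population in each generation.
    """
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    rates = [1.0] * population         # all parents start out equally good
    history = [sum(rates) / population]

    for _ in range(generations):
        # Reproduction: parents are chosen in proportion to their rate,
        # so faster replicators have more children.
        parents = rng.choices(rates, weights=rates, k=population)
        # Imperfect copying: each child's rate differs slightly from its
        # parent's, sometimes for the better and sometimes for the worse.
        rates = [max(1e-9, r * (1.0 + rng.gauss(0.0, copy_error)))
                 for r in parents]
        history.append(sum(rates) / population)

    return history


history = simulate()
print(f"mean replication rate: start {history[0]:.2f}, end {history[-1]:.2f}")
```

Nothing in the loop "wants" to improve. The copying errors are as likely to hurt as to help, yet the average rate climbs, because the helpful errors are the ones that get copied most often. That is all that "gaining access to Darwinian evolution" requires in this cartoon.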


In cartoon form, the RNA-first hypothesis for the origin of life requires us to find conditions where simple organics found in the cosmos are converted to RNA without help from an intelligent designer.

What must be true for the RNA-first hypothesis to be true?

The forwards-in-time approach (from chemistry) and the backwards-in-time approach (from modern biology) are complementary. We hope that the chemistry derived from the first approach will meet the inferred ancestral biochemistry derived from the second. While the initial contact is expected to be tentative, we hope that normal science will then fill in the gaps. If we could establish nothing more than that life on Earth could have emerged in this way, our definition-theory of life as a universal would become more tangible. An RNA molecule able to catalyze the template-directed synthesis of more RNA (with inheritable imperfections) would be a very simple chemical system capable of Darwinian evolution. It may not be the only such system. But even just one working example of such a minimal system would have an enormous impact on our definition-theory of life, and the community acceptance of it. But careful. Despite its appealing promise to unify everything that has gone before in this book, the RNA-first hypothesis can be considered to be "crackpot", almost as bad as the hypothesis that the Earth is four billion years old (in 1850), or that ATP was synthesized in our cells from a gradient of hydrogen ions (in 1961), or that water is H3O not H2O (today). Following the liberal procedure for evaluating crazy suggestions outlined in Chapter 1, we ask: What in our common experience must be wrong for the RNA-world hypothesis to be correct? Further, since the RNA world is not easily observable, what kinds of Galilean rolling-ball experiments might we do to develop it? We will spend much of this chapter on the first question. As for the second, we can easily envision two research strategies to develop the RNA-first hypothesis by experiment. First, we can seek non-biological processes that generate RNA molecules from organic species that were available on early Earth. Second, when built from four nucleotides, RNA molecules 100 nucleotides long have many different possible sequences. This makes $4^{100} \approx 10^{60}$ different sequences of RNA molecules built from four building blocks 100 nucleotides long (a typical star holds $10^{57}$ hydrogen atoms; a galaxy holds $10^{69}$ hydrogen atoms). We may ask: What fraction of these could actually self-replicate, have kids, and initiate Darwinian evolution? This question is also approachable by an experimental strategy. We can imagine synthesizing, in the laboratory, a sample of the $10^{60}$ different RNA sequences and counting among them the fraction that can self-replicate.
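For the mathematically inclined, the arithmetic behind these numbers is easy to check. The short Python sketch below counts the sequences exactly and compares the count with the star figure quoted above. The last function, `library_size`, is an added illustration and not a result from any experiment: it assumes that sequences are drawn independently at random, and the fraction passed to it (one active sequence in $10^{14}$) is a purely hypothetical placeholder.

```python
import math

BASES = 4       # four nucleotide building blocks
LENGTH = 100    # nucleotides per RNA molecule

# Exact number of distinct sequences (Python integers do not overflow).
n_sequences = BASES ** LENGTH
log10_n = math.log10(n_sequences)        # about 60.2, so roughly 10^60

# Figure quoted in the text: hydrogen atoms in a typical star.
HYDROGEN_ATOMS_IN_STAR = 10 ** 57
stars_needed = n_sequences // HYDROGEN_ATOMS_IN_STAR


def library_size(fraction_active, chance=0.5):
    """Random sequences needed for the given chance of at least one hit.

    Assumes independent random draws, so that
    P(no hit among N draws) = (1 - fraction_active) ** N.
    """
    return math.ceil(math.log(1.0 - chance) / math.log1p(-fraction_active))


print(f"4^100 is a {len(str(n_sequences))}-digit number (log10 = {log10_n:.1f})")
print(f"one hydrogen atom per sequence would use up about {stars_needed} stars")

# Hypothetical fraction, for illustration only.
print(f"library size for an even chance: {library_size(1e-14):.2e}")
```

The point of the last function is the shape of the relationship, not the number: the rarer the self-replicators, the larger the prebiotic library must have been, in inverse proportion.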
Such a count should allow us to estimate the size of the RNA library that must have been formed prebiotically on early Earth to create a reasonable chance of starting life. That estimate could then be used to judge the likelihood that life emerged as modeled by the RNA-first hypothesis.

There is little constructive belief behind the RNA-first hypothesis

Despite the logic behind the two types of strategies and the significance of the question (can you imagine a bigger question?), very few research groups are actually pursuing them. To understand why this is, we must turn to the culture of the community that might do (and fund) these two specific research projects. The relative scarcity of scientists doing such experiments begins with the overall lack of a constructive belief in the role of RNA in early life. On one hand, many scientists (if asked by a pollster, for example) would say that they believe in the RNA-world hypothesis. Few, however, actually believe that work that they might do could develop a model for the RNA world. Still worse, a sizeable fraction of the community does not have even a constructive belief that the origin of life is knowable. For example, a prominent Harvard chemist, George Whitesides, is quoted as describing life's origins as a "metaphysical singularity". Albert Eschenmoser, a famous Swiss chemist who took up the problem in his later years, is famously quoted as saying that the origin of life is a problem suitable for "only old men" (implying that origins research is no place for a young scientist to build a credible professional reputation). Nobel laureate Christian de Duve, citing Gerald Joyce, a noted scientist who is now dean at The Scripps Research Institute, wrote that "even the staunchest defenders of the RNA world have expressed despondent views on the future prospects of this line of research". It is as if Galileo, not believing that Jupiter could have moons, did not bother to build a telescope to look at Jupiter.
Those of us who specialize in contrarian science always see opportunity in despondency, non-constructive belief, and admonitions to avoid, especially when logic presents such clear ways to develop


a well-defined hypothesis (even if it proves wrong). We might try to finagle funding for it. If successful, we might even be able to pull off a trick analogous to the one that led to paleogenetics. A new community might emerge with a new scientific method and its own standards-of-proof to address the origins question. But that requires that we manage the culture of despondency among those who believe and an absence of a constructive belief among the larger scientific community. To do that, we must start with a "But why?" question: But why does the community have this culture? The answer turns out to be a fascinating illustration of how science is actually practiced.

Scientific methods and standards-of-proof for "origins" questions

If something originated during the time when humankind was present on Earth, we would turn to historians and anthropologists to help us model what happened. They would show us how to collect documents and relics relevant to that origin event, draw inferences from these, and resolve evident discrepancies. They would help us produce narratives that mesh with other historical narratives bracketing the same time. They would help us evaluate the interconsistency of those narratives. These steps would follow well-trodden paths within their scientific methods. This community has done this type of thing before. For origin events occurring before humans began to litter the planet with relics, we would consult our natural historian friends. They would show us how to search for strata, rocks, and fossils, draw inferences from these, and resolve apparent discrepancies between narratives covering similar periods of time. Again, they would apply methods well established in the various natural history communities. These communities have done this type of thing before. The narratives that would first emerge would certainly contain gaps, puzzles, and controversies. These might require years of scholarly work to resolve.
Eventually, however, these would be expected to be resolved, experiments would end, and the community would move to other things, of course with the recognition that at some point, the narratives might need to be revisited should new data emerge. For many of these origins questions, the community is confident that new data will emerge. For example, the next generation of exploration of Earth will almost certainly uncover new strata, rocks and fossils that address specific historical questions back one or two billion years. New rocks relevant to mass extinctions, origins of mammals, and origins of multicellularity are especially desired by today's natural historians. The community constructively believes that exploration will almost certainly find relevant new rocks. Therefore, the community explores. Origins questions not of this Earth also appear to be in hand. For example, no insolvable paradoxes appear to obstruct our modeling of the processes that originate stars, nor elements within stars, nor the planets that orbit stars. Analogous processes are happening today in the observable universe. This means that observation (via the Hubble telescope, for example) can support narratives about how our star and planet formed. Many of these narratives are correlated, Galileo-like, with Earth-based laboratory experiments. Further, the community has little doubt that the next generation of observational astronomy and theory will generate new data that will confirm (or deny) those narratives, just as the last generation has.

A half century ago, it was thought that fossils of multicellular animals appeared in an "explosion" ca. 542 Ma. This created a puzzle: How did multicellularity arise? Further exploration discovered older fossils of multicellular life, such as the Dickinsonia above, allowing biologists to revisit the problem using methods that they have always used. The community constructively believes that further exploration of the "normal" type can lend further insight into this problem, but not into the origin of life, as the record for this origin problem is sevenfold older.


We have not done the origin of life "thing" in the past

There are no analogous paradigms with track records to develop an understanding of the origin of life. No "worked examples" are available. No solutions for analogous problems offer examples for how future solutions should develop. Therefore, from the perspective of scientific method, our position is analogous to that of Cuvier and Lamarck as they considered the first fossils, Kelvin as he disputed the age of the Earth, or us when we first contemplated resurrecting ancestral proteins from now-extinct animals. Nor is the community confident that new data will emerge. When the oldest sedimentary strata now known were deposited, life on Earth already apparently had much of the complexity (at the molecular level) of modern life. Further, in the oldest rocks where biosignatures are suspected, the signs of life have been cooked, warped, and distorted. This means that we have no reason to be optimistic today that a wider search of our planet's surface will find rocks substantially better able to inform us about the origin of life than we have now. Further, even if we find older rocks, there is little reason to be confident that information from early Darwinian systems will have survived within them. We may hope, of course, but hope does not drive cultures or the agencies that fund them. Reason for despondency.

Carbon structures found in 3.4 billion year old Apex chert from Australia. An interesting afternoon was spent at an astrobiology conference watching William Schopf duke it out with Martin Brasier over whether these are fossils of 3.4 billion year old life. The same rocks contained other structures that did not look so convincingly like bacteria. With intelligent questions from the floor, it was clear that any claim was disputable. And we were still a billion years short of the origin of Earth. The community believes that the Earth is running out of places where undeformed older rocks might be discovered. Maybe the Moon?

The community's beliefs about the origin of life

Nevertheless, some have pressed on. Preparing to write this book, I went to the shelf in the University of Florida library at call number QH325. This number collects books covering the origin of life. I read them all, a total of 62. This exercise in "due diligence" produced an interesting observation (the first step in the middle school scientific method).

Some books treated life as something easy to originate, an inevitable consequence of the laws of physics and chemistry. Consequently, they viewed life as being abundant in the cosmos. Reading these books, I began to believe that aliens are everywhere.

Other books presented an opposing view. These provided pages of reasons why life could not easily emerge by any known process. Their authors saw the emergence of life as a highly improbable event, suggesting that life in the cosmos should be scarce. Reading these books, I felt lucky to be here myself.

With these observations in hand, I took the next step in standard scientific method: Seek a correlation. One was immediately evident. Those who viewed life as easy came from the cultures of mathematics, physics, biology, and the law. Those who saw nothing but problems came from chemistry.

The optimistic view of mathematicians, physicists, biologists and lawyers was captured by an exchange reported by Stuart Kauffman, a mathematician and MacArthur "genius prize" recipient, in his book Investigations (2000). Kauffman mentioned a day-long meeting that he had attended in Washington D.C. in 1997. The meeting was hosted by Albert Gore, then vice president. Gore, a Harvard-trained lawyer, was seeking advice from scientists on how to handle a meteorite from Mars found in Antarctica. That meteorite appeared, to some, to hold evidence of life on Mars. The usual elite of science were present, including Stephen Jay Gould (Harvard), Andrew Knoll (Harvard), and Kauffman himself. Kauffman described what happened.

"The vice president, it appeared, had read At Home in the Universe", a book that Kauffman had written a few years earlier. "In At Home, I explore a theory … that asserts that, in complex chemical reaction systems, self-reproducing molecular systems form with high probability. The vice president looked across the table at me and asked: 'Dr. Kauffman, don't you have a theory that in complex chemical reaction systems life arises more or less spontaneously?'"

"'Yes.'"

"'Well, isn't that just sensible?'" Gore asked.

Kauffman reported that he was "rather thrilled, but somewhat embarrassed. 'The theory has been tested computationally,'" he told the vice president, "'but there are no molecular experiments to support it.'"

"'But isn't it just sensible?' the vice president persisted."

"I couldn't help the response," Kauffman replied. "Mr. Vice President, I have waited a long time for such confirmation. With your permission, sir, I will use it to bludgeon my enemies."

Remarkable, I thought. A lawyer, the individual who was (nearly) elected President of the United States just a few years later, believed it "just sensible" that life would emerge from a complex chemical reaction system, even though no Galileo-type experiments were available to support that belief.

The view that life is easy to originate is everywhere. For example, physicist Paul Davies quoted Richard Terrile, who wrote: "Put [water and organics] together on Earth and you get life within a billion years" (The Fifth Miracle, 1999). Whenever you hear someone say such a thing, your instant thought should be: How could anyone possibly know that? Have they tried?

The reasoning behind the "life-is-easy" world view

What prompted such optimism among the mathematicians, physicists, biologists and lawyers? Nearly all of the QH325 books presenting the "life-is-easy" perspective repeatedly mentioned the same classical experiments in prebiotic chemistry. These included experiments by Stanley Miller, Alexander Oparin, Juan Oró, and other heroes of the field. Each of the experiments had generated a bit or a piece of life by putting energy into organic matter.
Each of these experiments is widely regarded as a "classic". For example, most books prominently featured some work of H. G. Bungenberg de Jong and Alexander Oparin in the 1930's. These scientists observed (microscopic) aggregates of organic matter that appeared to resemble cells simply by heating protein matter (such as gelatin and gum Arabic). They called these aggregates "coacervates". This observation is easily repeatable. You can make it in your kitchen. Coacervates are widely presented in the literature as examples of organic self-organization. The physicist J. D. Bernal is reported as saying that coacervates are "the nearest we can come to cells without introducing any biological substance". Without disrespect (OK, maybe just a little), the stamp collectors would interject that both gelatin and gum Arabic come from terran living systems.

The books also cited work by Stanley Miller that started the field of prebiotic chemistry in the early 1950's. Miller sparked electricity through mixtures of methane (CH4), ammonia (NH3) and water (H2O), compounds that (in 1954) were thought to have been present in Earth's primitive atmosphere. Miller found some amino acids among the products, including amino acids that are the building blocks of modern terran proteins.

Stanley Miller and a mockup of the apparatus that he used to launch the field of prebiotic chemistry.

Juan Oró took the field another important step. He started with hydrogen cyanide (HCN, one atom each of hydrogen, carbon and nitrogen), a molecule observed in abundance in interstellar dust clouds. He showed that, given a little energy, adenine was formed from HCN.

What is adenine? There are many ways to answer this question. First, adenine is a molecule made of five atoms each of hydrogen, carbon and nitrogen. That is, adenine has the molecular formula H5C5N5, which is the same number of atoms as five HCN molecules combined. Another answer emphasizes adenine as a part of adenosine, a part of many RNA cofactors, including the ATP that Peter Mitchell studied in self-funded research. This is ubiquitous on Earth as donor of phosphate in modern terran metabolism. Further, adenosine is one of the four building blocks of RNA.

5 hydrogen cyanides, adenine. Adenine is a part of RNA, and is the "A" in ATP. With 5 hydrogen, 5 nitrogen, and 5 carbon atoms, it might be assembled from 5 molecules of HCN. Juan Oró showed that adenine could, in fact, be made from 5 HCN molecules.

While we might complain that coacervates are not "non-biological" and that Miller's experiments used an incorrect model for Earth's early atmosphere, the result from Oró prompts only one question: What more could you ask for? Oró started with a molecule that is abundant in the non-biological cosmos. He ended with a building block (adenine) of ribonucleoside (adenosine) that is itself a building block of RNA and RNA cofactors. Even better, various scientists observing organic material in meteorites found adenine arriving to Earth from outer space. In other words, Oró's laboratory result corresponds to natural reality.

The first round of prebiotic experiments was very persuasive

These observations were very persuasive to the community, and for good reason. The first attempts at prebiotic chemistry did not give actual proteins built from amino acid building blocks, or actual RNA. The cell-like structures obtained by Oparin did not contain proteins or nucleic acids (or much of anything else that might support Darwinian evolution). But give us a break! To get this far in the first attempt! Would not more normal science demonstrate self-assembly of whole proteins and RNA?
None was more optimistic that chemistry would spontaneously give Darwinian systems than the noted physicist Freeman Dyson [Origins of Life, 1985]. Dyson wrote: "Orgel demonstrated that nucleotide monomers will, under certain conditions, polymerize to form RNA if they are given an RNA template to copy." Dyson then suggested that "Eigen and his colleagues have done experiments which show us biological organization originating spontaneously and evolving in a test tube. More precisely, they have demonstrated that a solution of nucleotide monomers will, under suitable conditions, give rise to a nucleic acid polymer molecule which replicates and mutates and competes with progeny for survival." [p. 9, citing Eigen et al. Scientific American 244 (4) pp. 88-118].

Reading these QH325 books, I said to myself, "So, what's the problem?" "Nothing at all", I said to myself. Life is easy. Christian de Duve agreed: "Life has often been represented as a highly improbable phenomenon" (De Duve (2005) Singularities, p. 157). Dismiss this idea, de Duve suggested, as it is "little more than a quantified sense of wonderment". The universe is "pregnant with life." (Vital Dust, 1995).

Chemists disagree

But then there were the books on the QH325 shelf written by chemists. One of my favorites was by A. Graham Cairns-Smith (Genetic Takeover, 1982). Cairns-Smith, at the University of Glasgow in Scotland, began by reviewing the very same experiments that had generated the optimism of the mathematicians, physicists, cell biologists, and vice presidents. Remarkably, he arrived at the opposite conclusion.

A. Graham Cairns-Smith, who proposed that minerals could support genetics in the first form of life.

The Miller experiment, Cairns-Smith conceded, did generate some amino acids. No doubt, amino acids are parts of terran proteins. But, Cairns-Smith noted, the amounts of amino acids formed in the Miller experiment were tiny. Further, amino acids were only a very minor part of a gooey, tarry mixture of organic molecules that came from Miller's experiments; this had otherwise no obvious relation to biology. Cairns-Smith wrote: "No sensible organic chemist would hope to get much out of a reaction from starting materials that were tars containing the reactants as minor constituents."

There it is again, Vice President Gore's word: "sensible". No chemist would try to make a protein from Miller's goo by assembling those trace amounts of amino acids that it contained, even by a deliberate chemical act. And no chemist would think it "sensible" that they would assemble spontaneously.

As a chemist, Cairns-Smith saw further reasons for pessimism. Even if those amino acids had managed to assemble into proteins, Cairns-Smith noted, "All the major biopolymers [including proteins] are metastable in aqueous solution in relation to their (deactivated) monomer. Left to itself in water, a [protein] will hydrolyze to its constituent amino acids." That is, even if proteins managed to self-assemble from the amino acids in the goo, the chemist found it "sensible" to expect those proteins to fall apart, not go on to generate Darwinian chemical systems.

This shows what is actually produced when one sparks electricity through mixtures of methane, water, and ammonia: brown tar. Notice the rhetorical value of an image by comparing this photo with the photo two pages earlier.

Kauffman had acknowledged to the vice president that no experiments contradicted the view offered by Cairns-Smith. But what about the computational tests that Kauffman mentioned? To those coming from the community of organic chemistry, such tests do not meet any standard-of-proof.
Most organic chemists will point out, if asked, that numerical computation cannot model simple processes in chemistry, like the dissolving of salt in water, or the melting of ice. How can we expect computations to "test" anything more complicated? An a fortiori argument, our logician friends would say. See more in Chapter 7.

Cairns-Smith had even harsher things to say about the spontaneous formation of nucleic acids (like RNA) from such goo. These pieces, including ribose (the "R" in RNA), are also formed in very small amounts in Miller-like experiments, if they are formed at all. And if the pieces were formed, and if they were to self-assemble to give genes, those RNA genes would fall apart in water, just like proteins.

So Cairns-Smith considered the same experimental data as the mathematicians, physicists, and lawyers when they argued that life is easy. But in these data, Cairns-Smith saw evidence that life is hard. Further, Cairns-Smith clearly contemplated something outside of "normal science". He wrote: "Few would deny that there are difficulties in the doctrine of [prebiotic formation of biomolecules]. The question at issue is whether these are to be taken as puzzles or anomalies."

In other words, should we approach the failure of complex chemical reaction systems to self-assemble to gain access to Darwinian evolution as a puzzle that we might solve by doing a bit of normal science? Or does the failure of complex systems to generate life reflect a fundamental defect in how we view the world? Are we facing a crisis of the type that Thomas Kuhn suggested precedes a scientific revolution?

The need for anthropologists for scientists

So far, in examining the community concerned with the origin of life, we have not seen anything analogous to what we saw in Chapter 4, where a community came together to define scientific methods and standards-of-proof to drive the development of paleogenetics as a new field. Instead, we encounter authorities who conclude that life is easy from exactly the same experimental data that other authorities use to conclude that life is difficult.

One way out of this mess might be to become an anthropologist. Why not stake out the laboratory of chemists interested in the problem and observe what they do? We might infer from their actions what their constructive beliefs are, just as in Chapter 4 we identified what classification systems biologists really accepted by way of classification characters. We might then draw upon the scientific method of our anthropologist friends to address the question: What do these scientists constructively believe?

Organic chemists tackle the "origin of life"

Unfortunately, not many chemists work on the origin of life. This is not as pressing as the search for a cure for cancer, for example. Chemists can apply their talents to either search and, perhaps not surprisingly, more money is available to chemists who choose to study cancer. No bucks, no Buck Rogers. This means that only a few chemists doing origins research are available to observe.

But there are a few, including a few who have a constructive belief in the RNA-first model for the origin of life. What do we observe them doing? Hardly unexpectedly, they are doing what their community has trained them to do, as Cairns-Smith suggested. They seek reactions that generate precursors of RNA in high yield (not as minor components of tars) from purified starting materials (not tars). This is the culturally correct response (for an organic chemist) to the challenge presented by Cairns-Smith.
Chemists understand that it takes several steps to convert a simple molecule like HCN into adenine, or a simple molecule like formaldehyde (HCHO) into ribose, the R in RNA. Therefore, chemists trained in synthesis, when working in the field, do not start with HCN or HCHO. Rather, they say to themselves: "Oró showed that one could get adenine from HCN. So that step is good. Butlerov showed that you can get ribose from HCHO. So that step is also good. So let us buy a sample of pure adenine and pure ribose and see if we can join them to get adenosine. Let us see if we can get that step under control."

A relay synthesis yielding one building block of nucleic acids

This is called a "relay synthesis". As in a relay race, no molecule actually goes from the start of the race to its end. Just the baton. Unfortunately, the use of relay syntheses in efforts to get RNA from cosmic organics has created a dispute within the community. Let me illustrate this dispute using one of my favorite relay syntheses from the chemist John Sutherland (University of Manchester, England).

Sutherland recently proposed a potentially prebiotic synthesis to make a building block of RNA (cytidine-5'-phosphate). To test this idea, Sutherland purchased (from a chemical supply house) two (pure) precursors, called "2-aminooxazole" and "glyceraldehyde-3-phosphate". His students then went into the laboratory, dissolved these two compounds in a ratio of 1:1 in water, and adjusted the acidity of the mixture until the solution was neutral. They then allowed the mixture to incubate, stopping the reaction after two days at 25 °C (77 °F, 298 Kelvin). Analysis showed that some of the glyceraldehyde-3-phosphate had decomposed, so that particular reactant was doubled in subsequent experiments. The students then added some freshly prepared cyanoacetylene, and allowed the reaction to go on for a bit longer. Then, they isolated the product, an isomer of the cytidine.
They reported a yield of 80% (100% is a perfect yield). The product that Sutherland's student made (an isomer of the cytidine) is not exactly a building block for RNA (which is cytidine itself). Atoms in molecules are arranged in three-dimensional space. While the molecule that Sutherland made in these two steps has the right number of atoms, those atoms are arranged in space incorrectly. To get the actual cytidine building block for RNA, a third step is required.


This step is effected by light. In a separate paper, Sutherland's group examined the ability of light to convert the product that they got in 80% yield into the product that they wanted. They discovered that light destroyed most of the cytidine; they got just 4% of what they wanted. And, like the good chemists that they are, they found out why the yield was low.

Sutherland and his students were doing what good chemists do. The 80% yield was published in the Journal of the American Chemical Society, arguably the most prestigious place to publish work in chemistry of any kind. The reaction involving light (which produced a 4% yield) was published in what the community likes to call "a specialty journal". Such specialty journals are places where less spectacular, but still solid, results are published.

Chemists versus chemists

I myself like Sutherland's work because (I admit it) I come from the culture of physical organic chemistry, just like Sutherland. Further, I think that this work explores the realm of the possible in chemistry, something that we need to know more about.

But I do not represent the entire community of those who call themselves "organic chemists". Another member of that community is Robert Shapiro, a professor of chemistry at New York University. Shapiro has written a number of layperson-accessible books (see especially: Origins, a Skeptic's Guide to the Creation of Life on Earth, 1986; Planetary Dreams, 1999). Shapiro has long been a skeptic of the life-is-easy view of reality, in part because his research laboratory discovered many of the ways that DNA and RNA can fall apart in water (the same "water problem" that Cairns-Smith mentioned).

To illustrate the problem, Galileo-style, let us set up a dialog between opposing individuals. Let us have Salviati defend Sutherland's position and Sagredo defend Shapiro's position. Since both Shapiro and Sutherland are my professional friends (at least until this book comes out), I will have no one's views represented by a character called Simplicio.

[Figure: reaction scheme. Glyceraldehyde-3-phosphate and 2-aminooxazole combine, and cyanoacetylene is then added, to give a product labeled "wrong arrangement of atoms in 3 dimensions"; light converts this product (4%) to the molecule labeled cytidine-3-phosphate.]

A proposed relay synthesis for a building block of RNA, here, a cytidine carrying a phosphate. It is difficult to represent three-dimensional objects on two-dimensional sheets of paper. Chemists do the best they can by making bonds to atoms that lie in front of the sheet of paper bold wedges, and making atoms in front of the paper a bit bigger, while making bonds to atoms that lie behind the sheet of paper dotted, and those atoms a bit smaller. It turns out that what was made in high yield (80%) has the wrong 3D structure, while the molecule with the correct 3D structure is made only with low yield (4%).
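The two yields quoted in the caption can be combined into one number for the whole sequence. What follows is only a rough sketch in Python, resting on two assumptions that the text does not state outright: that the yields of consecutive steps simply multiply, and that the 80% figure covers everything from the purchased precursors up to the isomer. The function name `overall_yield` is ours, introduced purely for illustration.

```python
from math import prod


def overall_yield(step_yields):
    """Return the fractional yield of a sequence of steps.

    Assumes each step acts only on the product of the previous step,
    so the fractional yields multiply.
    """
    return prod(step_yields)


# Yields quoted in the text: 80% for the isomer with the wrong
# 3D structure, then 4% for the light-driven conversion.
relay_steps = [0.80, 0.04]

print(f"Overall yield: {overall_yield(relay_steps):.1%}")
```

Under those assumptions, roughly 3 of every 100 precursor molecules would end up as the wanted building block, and that count starts from pure, purchased precursors, not from HCN or formaldehyde.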

Salviati: Sutherland's group has produced one of the four building blocks needed for RNA in just two steps, without requiring any life. A few more steps, and prebiotic chemists will have RNA!

Sagredo: Uh, what do you mean, Signor Salviati? Are not Sutherland's students themselves forms of life (and intelligent life at that)? Was not their life (and their intelligence) needed to get the correct ratio of starting materials into the flask? To adjust the acidity of the mixture? To stop the reaction after just two days before it turned to tar? To then add a freshly prepared solution of cyanoacetylene? And stop the second reaction at just the right time?

Salviati: Well yes, but it could also have happened without them. A rainstorm could have come after two days on early Earth to wash the first product into a mixture of cyanoacetylene. And cyanoacetylene is a molecule observed in interstellar space.

Sagredo: So why did the student need to use a freshly prepared solution of cyanoacetylene? Does this material not easily polymerize to form tar?

Salviati: Well yes it does if it is left around, but perhaps a meteorite delivered needed cyanoacetylene to Earth at just the right time.

Sagredo: And what about glyceraldehyde-3-phosphate? Is that found in interstellar space?

Salviati: Well no, but Albert Eschenmoser suggested that glyceraldehyde-2-phosphate might have arisen prebiotically from an aziridine followed by phosphorolysis followed by hydrolysis followed by condensation with formaldehyde.

Sagredo: I do not know what these words mean, but isn't glyceraldehyde-2-phosphate different from glyceraldehyde-3-phosphate? Does glyceraldehyde-2-phosphate work just as well?

Salviati: Well, actually no, but we hope that we will eventually think of some way to get glyceraldehyde-3-phosphate from glyceraldehyde-2-phosphate.

Sagredo: What about this aziridine? Is it seen in interstellar space?

Salviati: Well no. They have looked for it, and have not seen it. But you know, Quine and all. The absence of proof is not proof of absence.

Sagredo: Fair enough. But what about 2-aminooxazole? Is that found in interstellar space or meteorites?

Salviati: Well no, but 2-aminooxazole can be formed from glycolaldehyde and cyanamide, both of which are found in interstellar space.

Sagredo: Is 2-aminooxazole formed from glycolaldehyde and cyanamide in the presence of glyceraldehyde-3-phosphate?

Salviati: Well no, Sutherland and his students tried this, and they found that cyanamide destroys glyceraldehyde-3-phosphate.

Sagredo: So Signor Salviati, how could we have 2-aminooxazole together with glyceraldehyde-3-phosphate if the precursor for 2-aminooxazole destroys glyceraldehyde-3-phosphate? Could any sequence of this type happen without direct intervention of intelligence? Are you an advocate of the "intelligent design theory" that will emerge in the New World in four centuries?

Salviati: Not at all! Sutherland suggested that 2-aminooxazole was formed in a location different from glyceraldehyde-3-phosphate, sublimed from that location, floated in the air, and was rained into the pond holding the glyceraldehyde-3-phosphate. To show that this was possible, they left some 2-aminooxazole on a bench at 50 °C (120 °F, 323 Kelvin) for a day and found that half of it sublimed into the atmosphere.

Sagredo: Are you trying to tell me that 2-aminooxazole, by accident, was formed in water, the water then evaporated at 50 °C (120 °F, 323 Kelvin), the 2-aminooxazole sublimed into the atmosphere of the entire planet Earth, where it was diluted into 3 x 10^19 cubic meters of air, but nevertheless managed to rain out into a pond at 25 °C where it accidentally found itself in a 1:1 ratio with glyceraldehyde-3-phosphate, which was somehow formed from glyceraldehyde-2-phosphate, which came from an aziridine that no one can find any evidence was available on early Earth?

Salviati: Could-a-happened.

Sagredo: So what is your standard-of-proof? Are you going to say that anything that you can get going in your laboratory with the hands of talented students is plausibly prebiotic, no matter how long the series of fortunate events must be postulated ad hoc? Do you have any Occam criteria that constrain your proposals?

Salviati: I do not understand your weenie philosophical concepts, but Albert Eschenmoser has said that such experiments are "necessary".

Sagredo: Who is Albert Eschenmoser? Are you making an argument from authority?

Salviati: Well, Signor Eschenmoser is a famous chemist. Some have called Eschenmoser a "pope".

Sagredo: Isn't Signor Eschenmoser the man who said that prebiotic chemistry should not be done by young scientists?

Salviati: Well yes, but isn't that an ad hominem?

Simplicio left the Gordon Conference with Salviati and Sagredo throwing chairs at each other.

In 2007, in an article in Scientific American, Shapiro carried the argument forward. He suggested that what Sutherland (and many others) do is analogous to a golfer: "… who having played a golf ball through an 18-hole course, then assumed that the ball could also play itself around the course in his absence. [The chemist-golfer] had demonstrated the possibility of the event; it was only necessary to presume that some combination of natural forces (earthquakes, winds, tornadoes and floods, for example) could produce the same result, given enough time."

This, Shapiro argued, is good synthetic organic chemistry. It may even represent what an intelligent designer would do to make RNA. But it has absolutely nothing to do with the origin of life. Further, since it has no clear intellectual constraint, it will never provide us any insight into how life emerged on early Earth. This type of work is the chemist's equivalent of a just-so story, which we considered and discarded in a different context in Chapter 4.

Disputes between dueling experts?

When faced with disputes among experts, it is common for laypeople to try to pick the better expert from the better field rather than to try to pick the better argument. This is similar to the choice that we considered in Chapter 1 when thinking about the age of the Earth. Should we pick the representative of the "harder" science (Lord Kelvin, representing physics) or the "softer" science (Darwin, representing zoology)?

At least in 1895, the two sides were relatively cleanly divided. The 1895 natural historians were pretty much agreed that the Earth looked rather old. The 1895 physicists were in fair agreement that they could not identify the source of the Sun's energy if the Earth was rather old.

Here in Chapter 5, the bad news is that we cannot even pick a field (chemistry) and run with its experts.
Even though its experts share a common culture, common ancestry, common methods, and common standards-of-proof, chemistry does not agree on how to resolve the problem, even when presented with a formally simple hypothesis for life's origins. And so, even if we wanted to let the reader arrive at a conclusion by picking the field rather than evaluating arguments, it will not work here.

But it is worse. The community does not agree as to how we would recognize a solution to the origins problem even if it were to materialize before us. In the best spirit of Galileo, Sutherland is doing what he can do. Nevertheless, his work lacks the philosophical constraints that others in the community demand. In other words, Sutherland could do his brand of prebiotic chemistry until the cows came home, and may arrive at an explanatory model satisfactory to him, but not meet any standard of proof acceptable to Shapiro.

John Sutherland (1988), Jack Baldwin (1964), Derek Barton (1942), Isador Heilbron (1909), Arthur Hantzsch (1880), Wilhelm Heintz (1842), Heinrich Rose (1821)

Robert Shapiro (1964), Robert Woodward (1937), Avery Ashdown (1924), James Norris (1895), Ira Remsen (1870), W. R. Fittig (1858), H. F. P. Limpricht (1850), F. Woehler (1823)

J. J. Berzelius (1802), Johann Afzelius (1776)

We may try to use historical tools from Chapter 4 to understand the different cultures in a field, or perhaps even to choose our favorite expert. Here, we trace academic ancestry by Ph.D. supervisor (with the dates of their dissertations) to get historical models for the evolution of chemists Robert Shapiro and John Sutherland. They are academic sixth cousins once removed. Evidently, common ancestry is not a good predictor of constructive belief or world view. It was also not so in the past. For example, Berzelius (the last common academic ancestor of both Shapiro and Sutherland) long argued that living and nonliving systems were fundamentally different, while Woehler, his student, did the first experiments showing that life was made of the same stuff as non-life.

Two commonplace observations

Let us introduce another scientific method: the search for paradoxes. To seek paradoxes, we first set aside all of the reasons that we might have, personal, political, and cultural, for believing what we constructively believe (this is the hard part). Then, we ask: What in our common experience would need to be wrong for the RNA-first hypothesis to be correct? We then look for apparent paradoxes within our answers to these questions.

With our definition-theory of life in place, the method identifies an apparent paradox that arises from the contrast between two commonplace observations:

(a) When one puts energy into an organic system not having access to Darwinian evolution, one gets tar (a complex mixture of many different compounds).

(b) When one puts energy into an organic system having access to Darwinian evolution, one gets babies.

Before delving into the impact of this contrast on models for the origin of life, let me emphasize "commonplace" by challenging you to make the observations yourself. Let us start with the first. Put down this book, go to your kitchen, and pull off the shelf a sample of organic matter. It makes no difference which; it can be a can of clam chowder or some Cheerios®. Add some water (or not, your choice), and give that sample some energy by putting it into your oven and turning the temperature to 220 °C (450 °F, 493 Kelvin). Then, come back, settle in, and return to reading the book, until the stench is overwhelming, the smoke alarm goes off, or the fire company arrives. Then go see whether that complex chemical mixture did something "sensible".
See if it generated life. Nothing in the experience of any organic chemist suggests that it will. Instead, that experience suggests that the mixture will have "evolved" to look like asphalt. And the outcome, by external physical appearance, will not depend much on what organic material you pulled from the pantry. Things will look pretty much the same if you started with chowder or Cheerios. You may try this in other ways. You may exclude oxygen (you probably should if you want to make the experimental environment more like early Earth). You may add propane. You may mix different organics. You may use energy from lightning or the household current instead of heat. But the outcome will be analogous. Tar. Which is also what Stanley Miller made in his experiments, as Cairns-Smith and Shapiro noted. Analogous transformations happen on Earth as a planet, of course. Organic materials from previously living systems are often heated without oxygen. The products are petroleum, tar sands, and coal. Chemists can infer something about the organic material that fed the process, if they examine the tar closely enough, just as chemists could infer whether the tar created in your kitchen began with chowder or Cheerios. But the more energy put into the system, the less the outcome resembles life. Contemplate that as you clean out your oven. Organics plus energy gives asphalt, something good for paving roads. Not the self-organized matter worthy of the name "life". This observation is commonplace. This is why chemists complain when biologists suggest that chemistry is "pregnant with life", or mathematicians, physicists and lawyers assert that life emerges "sensibly"


by cooking complex mixtures of organic species. Maybe it can, and maybe it did, but to believe so would require us to disregard thousands of experiments of the type that you just did in the kitchen.

Now, let us consider the second commonplace observation. Go to the pet store. Buy two guinea pigs, one male and one female. This pair exemplifies a chemical system that is capable of (and has historically had access to) Darwinian evolution. Now, put the two pigs in a cage and give them some Cheerios (we recommend Cheerios over chowder, but that is a preference specific to guinea pigs) and a few other essentials (like water). Settle back to read this book. Do this for about 40 days. The odds are that after this time, when you return to the cage, you will discover that the Cheerios were converted not into asphalt, but into more guinea pigs. Cute furry babies that display all of the attributes expected of the living state.

The product of energy plus a self-sustaining chemical system with access to Darwinian evolution: babies.

Our definition-theory suggests that the only relevant difference between the two systems is that the second has access to Darwinian evolution, while the first has lost that access. But independent of that theory, the pair of observations suggests a paradox surrounding origins. If chemical systems spontaneously and intrinsically make tar when provided energy, how can they ever spontaneously generate a Darwinian system? RNA or otherwise?

Paradoxes and revolutionary science

These two commonplace observations (tar versus baby guinea pigs) seem to create a bona fide crisis, in the sense discussed by Thomas Kuhn in Chapter 1 of this book. The paradox is even recognized in the popular press. Consider this exchange recorded in The DNA Files, a science education broadcast that pitted journalist John Hockenberry against a know-it-all computer named Mnemosyne [The DNA Files: Astrobiology, 2001].
The exchange went like this:

Mnemosyne: Well, chemists think that if you could recreate the conditions of the Earth about four and a half billion years ago, you'd see life happen spontaneously. You'd just see DNA just pop out of the mix.
Hockenberry: Can they do that in a laboratory?
Mnemosyne: Well, no. Actually, they've tried, but so far they can't seem to pull it off. In fact chemists have a little joke about that, you know: they say that life is impossible. Experience shows that it can't happen. That we're just imagining it. Ha ha ha ha....
Hockenberry: Right. Those chemists...

Let us put this into a chemical context. Figure 5.1 shows just some of the products that might emerge from the reaction of just two compounds known in the cosmos: formaldehyde (HCHO) and glycolaldehyde (HOCH2CHO). All of the compounds in Figure 5.1 have the formula (HCHO)n, where n is an integer. For formaldehyde, n is 1; for glycolaldehyde, n is 2. The compounds are arranged in Figure 5.1 in rows, with increasing n. Compounds in the row below are obtained from compounds in the row above by adding one HCHO molecule (and therefore increasing n by 1).

The layperson should think of just one word when looking at Figure 5.1: complexity. We will worry about the details of that complexity a bit later in this chapter. The point to be made now is that this complexity will eventually evolve to give tar.

Talented organic chemists can prevent the formation of tar from this manifold of reactions. The easiest way to do this is to remove the energy source at the right time to stop the reaction process. But this is exactly what Sagredo complained about in Sutherland's experiment. Having someone intervene to stop the reaction at exactly the right time cannot possibly be what happened on early Earth, unless one moves into

the realm of intelligent design. Only if conditions changed by themselves accidentally at the right time would the formation of tar have been avoided. And if that is postulated, Salviati will start calculating the number of times such change is needed, and the probabilities associated with that number. If conditions needed to change an arbitrarily large number of times, then those probabilities would be very low.

Figure 5.1 legend (arrow types): enolization; aldol at primary C; aldol at secondary C; retroaldol; bond broken in retroaldols; Fenton reaction. Labels in the figure: formaldehyde, glycolaldehyde, dihydroxyacetone, glycerol; reagent label H2O2, Fe++.
Figure 5.1. This figure is intended to convey a single word to the layperson: "complexity". The chart shows the structures of organic molecules made of only carbon atoms (C), hydrogen atoms (H), and oxygen atoms (O), in a ratio of 1:2:1. The compounds are ordered by size, with compounds containing two and three carbon atoms (C2 and C3, respectively) at the top, with compounds containing four, five, six, seven and eight carbon atoms (C4, C5, C6, C7, and C8) ordered in rows. The arrows show reactions that interconvert these compounds. The heavy black arrows show the addition of formaldehyde molecules (HCHO), which add a carbon atom to make a compound one carbon larger. The open arrows show reactions that interconvert species having the same number of carbon atoms. Red arrows show reactions that fragment a larger molecule to give two smaller molecules, where the bond that is broken in the fragmentation is red. Blue compounds are dead-end compounds that accumulate in the reaction. Except for these, one needs to have a chemist intervene to generate any particular species in any amount. Without timely intervention by a chemist, this mixture evolves further to give tar.
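The bookkeeping behind the rows of Figure 5.1 can be sketched in a few lines of Python. This is only an illustration of the (HCHO)n formula and the 1:2:1 ratio described above; the function names are invented here and are not part of any chemistry library.

```python
def hcho_formula(n):
    """Atom counts for a compound whose overall formula is (HCHO)n."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return {"C": n, "H": 2 * n, "O": n}


def add_formaldehyde(counts):
    """Move one row down the chart: add one HCHO unit (one C, two H, one O)."""
    return {"C": counts["C"] + 1, "H": counts["H"] + 2, "O": counts["O"] + 1}


def formula_string(counts):
    """Write atom counts in C, H, O order, omitting a count of 1."""
    return "".join(
        element if counts[element] == 1 else f"{element}{counts[element]}"
        for element in ("C", "H", "O")
    )


# One line per row of the chart, from formaldehyde (n = 1) to the C8 compounds.
for n in range(1, 9):
    print(n, formula_string(hcho_formula(n)))
```

For n = 1 this gives CH2O (formaldehyde) and for n = 2 it gives C2H4O2 (glycolaldehyde). The formula says nothing about how the atoms are connected, which is exactly where the complexity of the figure comes from: many different structures share each row's formula.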

Galileo's strategy: Address the (apparent) paradox, not the unobservable

Galileo, when arguing that the Earth was speeding around the Sun, was presented with an apparent paradox: We do not sense the Earth's motion. His experiments were directed towards resolving that apparent paradox. Hence, he rolled balls down inclined planes. Recognizing the paradox is the first step towards doing something to resolve the paradox and, indeed, to go beyond the Sagredo vs. Salviati debate (which will otherwise continue forever). Salviati must concede one point. He needs to do something that might plausibly have happened naturally (that word again from
Chapter 4). But both must concede the larger point: To resolve the paradox (and to learn whether it is apparent or real), the goal is not to make RNA. Rather, the goal must be to not make tar.

This goal might also manage strategically the cultural reality in this field. Given the culture of despondency in 2008, it may not be timely to ask what actually happened on Earth to create RNA. It may not be timely to do the two experiments outlined above to develop the RNA-first hypothesis. The tar-versus-pigs paradox may be so strong that neither funding agencies nor coworkers have a belief sufficiently constructive to enthusiastically do experiments to answer those questions at this time. As in many fields, enthusiasm is a near-requirement for success in science.

Rather, a more timely question may be: How can the propensity of organic molecules to form tar be naturally constrained? The strategic plan is simple. Should an example of a natural constraint be provided, the constructive beliefs of the community might change. This would be a Kuhnian "paradigm shift". One example of a natural constraint on organic tar formation might create a constructive belief that others exist. This might generate funding to find more examples of natural constraints on tar formation. This, in turn, might eventually uncover natural ways to assemble RNA itself from the complexity of organic molecules available on early Earth. Ultimately, we might hope to pull off the same trick described in Chapter 4, where a few examples generated a community with a normal research plan and accepted standards-of-proof.

Feel free to skip ahead to page 87

Stephen Hawking's publisher told him that his book, A Brief History of Time, would lose sales for every equation it contained. Analogously, I was advised that this book would lose sales for every chemical structure it contained.
But, like Galileo, I am not comfortable trying to convince readers of any model of reality based on my rhetorical ability or my authority. I am certainly uncomfortable trying when the scientific questions offer so many teaching opportunities, including the opportunity to study the emergence of scientific methods in real time. Further, those who have reviewed this manuscript have suggested that it explains organic chemistry to non-chemists rather well, and this might be the most valuable part of this book. After all, chemistry is at the core of any substantive discussion of the origin of life. As many of you have signed on to actually gain a substantive understanding of life as a universal, and studies of its origin are an important approach to that understanding, I will try to provide just enough chemistry to let you understand the issues associated with the origin of life and their potential resolution. For those who want to skip the chemistry, either because you already know it or because you do not want to know it, the narrative resumes on page 87.

Atoms are formed from a nucleus and orbiting electrons

First, let us talk about atoms and the bonds that hold atoms together when they are assembled into molecules. As taught in middle school, atoms are the irreducible forms of the 90 elements that chemists have isolated from the mineral world. At the center of an atom is its nucleus, which is built from protons (which are positively charged) and neutrons (which have no charge). The number of protons in a nucleus determines the name of an atom. Thus, atoms that have one proton in their nuclei are called "hydrogen atoms". Atoms that have two protons in their nuclei are called "helium atoms". Atoms that have six, seven, and eight protons in their nuclei are "carbon", "nitrogen", and "oxygen" atoms, respectively.


Electrons orbit the nucleus. Electrons each carry a negative charge, and (in a cartoon version of quantum mechanics) those negative charges are attracted to the positive charges of the protons in the nucleus. A neutral atom has the same number of orbiting electrons as it has nuclear protons. For example, a neutral hydrogen atom has one orbiting electron to match the one proton in its nucleus. A carbon atom has six orbiting electrons to match the six protons in its nucleus. And so on.

Atoms form bonds to obtain a desired number of electrons

Anthropomorphizing for just a bit, atoms "want" to have certain numbers of electrons orbiting around their nuclei in "shells". For example, both hydrogen atoms and helium atoms "want" to have two electrons orbiting around them in their first shell. As a neutral atom, helium already has two orbiting electrons to match the two protons in its nucleus. Helium is, therefore, "happy" as an isolated atom. This means that a neutral helium atom does not react with other atoms to form molecules. Thus, you can buy a tank of helium atoms to inflate balloons. You can spark the helium atoms, smash the helium atoms, or torch the helium atoms, but they will remain individual helium atoms.

Hydrogen is not so fortunate. The neutral hydrogen atom has only one of the two electrons that it would like to have orbiting in its first shell. Accordingly, hydrogen atoms form bonds with other atoms to get a second electron by sharing.

Two hydrogen atoms, each bringing one electron, form an H2 molecule by sharing these two electrons. The big circle centered on the left H holds two electrons (one red, one blue). The big circle centered on the right H also holds two electrons. By sharing, each hydrogen atom "thinks" it has two electrons in its outer shell, the number that each atom "desires".
To illustrate this in cartoon form, consider dihydrogen (H2). Dihydrogen, as its name implies, is built from two hydrogen atoms. Therefore, H2 has two protons and two electrons, one from each hydrogen atom. To form the H2 molecule, two hydrogen nuclei share the two electrons. By sharing, each hydrogen atom "thinks" it has two electrons in its first shell. Therefore, each hydrogen atom is "happy", in the same sense that the helium atom was happy with the two electrons it has in its first shell. Accordingly, the H2 molecule is rather stable.

The H2 molecule is represented on paper by two H letters joined by a single line. The H letters represent the two hydrogen nuclei. The line represents the two electrons that the two hydrogen atoms share, the bond that holds the two atoms together. Each hydrogen atom is said to have a single "valence"; that is, a hydrogen atom wants to form exactly one bond.

For carbon, nitrogen, and oxygen atoms, things are a bit more complicated. Those atoms put two of their six, seven, and eight electrons (respectively) in the first (inner) shell, as does helium. Once in that inner shell, those two electrons are happy, and can be ignored. This leaves four (for carbon), five (for nitrogen), and six (for oxygen) electrons to orbit in the outer shell. It is these four, five, and six electrons (respectively) that form bonds with other atoms. Now, in their outer shells, carbon, nitrogen, and oxygen atoms each "want" to have eight electrons. Their desire to get those eight electrons drives these atoms to share electrons with other atoms. Each pair of electrons shared with another atom forms a bond.

We illustrate this by drawing representations for three basic molecules in the cosmos involving hydrogen atoms bonded to carbon, nitrogen, and oxygen. These are methane (H4C), ammonia (H3N), and water (H2O). Consider methane.
The carbon atom in a methane molecule is represented by the letter "C"; that atom contributes four bonding electrons to the molecule. The four hydrogen atoms are each represented by a letter "H"; each hydrogen atom contributes one bonding electron to the molecule. Each bond is represented by a single line connecting each H to the central C. This line represents two electrons shared by the bonded hydrogen and the carbon. The carbon atom is therefore surrounded by eight bonding electrons, four from the outer shell of the carbon atom itself, and one from each of the four hydrogen atoms. The carbon atom is now "happy". So are the hydrogen atoms, as they (through sharing their one electron with one electron from carbon) think they have two electrons in their first shell.

Representing the ammonia (H3N) molecule is a bit more complicated. The nitrogen atom in ammonia is represented by the letter "N". The N atom contributes five bonding electrons to the molecule. As before, each hydrogen atom is represented by an "H"; each hydrogen atom again contributes one bonding electron. Again, the bond between each H and the central N is represented by a single line. Again, each of these lines represents two electrons shared by the bonded hydrogen and the nitrogen. The nitrogen atom is therefore surrounded by eight bonding electrons, five from the outer shell of the nitrogen atom, and one from each of the three hydrogen atoms. The nitrogen atom is now happy with the eight electrons that it has, six shared and two not shared. The hydrogen atoms are also happy with two electrons in their outer shell, both shared.

Ammonia is different from methane, however, in that the central atom has an unshared pair of electrons, not involved in binding to any hydrogen atom. Each of these electrons is represented by a dot. In contrast, the carbon in methane does not carry any unshared bonding electrons. This will become important in a moment.

Water (H2O) is next. In water, the central oxygen atom is represented by an "O". The oxygen atom contributes six bonding electrons to the molecule. Each hydrogen atom is again represented by "H", and contributes one bonding electron. In the line representation of the water molecule, each H is joined to the central O by a single line, which represents a pair of electrons shared by those two atoms forming a single bond between H and O. The oxygen atom is also surrounded by eight bonding electrons (and is happy), six from the outer shell of the oxygen atom, and one from each of the two hydrogen atoms. Again, each electron not involved in a bond is represented by a dot. In the water molecule, the oxygen atom carries two unshared pairs of electrons. The hydrogen atoms are likewise happy, as they have around them two shared electrons.

Representations of methane (H4C, often written CH4), ammonia (H3N, often written NH3), and water (H2O). Electrons on each atom are color-coded. The electrons on the central atoms (C, N, or O) are represented by black dots; the electrons from each hydrogen atom are represented by dots having the same color as the atom. The circles centered on each atom in the representation of the molecule show the electrons that each atom "thinks" it has in its outer shell. In each molecule, the hydrogen atom has the 2 electrons that it wants, while each of the heavier atoms has the 8 electrons that it wants. The nitrogen (N) atom has one unshared pair of electrons (the two black dots at the top of the molecule); the oxygen (O) atom has two unshared pairs of electrons (the two black dots at the top of the molecule, and the two black dots on the right of the molecule). Each pair of electrons that forms a bond is represented in the right-most structures as a line joining the atoms. Carbon forms four bonds, nitrogen forms three bonds, and oxygen forms two bonds.
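The electron counting used for methane, ammonia, and water can be written out as a small Python sketch. It encodes only the cartoon rules given in this chapter (two electrons fill the first shell; the outer shell "wants" eight), and it is meant for the five atoms named here, not as a general model of bonding. The function names are illustrative choices made for this sketch.

```python
# Number of protons for the atoms named in this chapter.
PROTONS = {"H": 1, "He": 2, "C": 6, "N": 7, "O": 8}


def outer_shell_electrons(symbol):
    """Electrons in the outermost shell of the neutral atom.

    A neutral atom has as many electrons as protons. For C, N and O the
    first two electrons sit in the inner shell and are ignored.
    """
    z = PROTONS[symbol]
    return z if z <= 2 else z - 2


def desired_outer_electrons(symbol):
    """H and He 'want' 2 electrons; C, N and O 'want' 8."""
    return 2 if PROTONS[symbol] <= 2 else 8


def bonds_formed(symbol):
    """Each bond brings in one electron from the partner atom."""
    return desired_outer_electrons(symbol) - outer_shell_electrons(symbol)


def unshared_pairs(symbol):
    """Outer-shell electrons left over after bonding, counted in pairs."""
    return (outer_shell_electrons(symbol) - bonds_formed(symbol)) // 2


for atom in ("H", "He", "C", "N", "O"):
    print(atom, bonds_formed(atom), unshared_pairs(atom))
```

Under these assumptions the sketch gives one bond for hydrogen, none for helium, four for carbon, three for nitrogen (with one unshared pair), and two for oxygen (with two unshared pairs), matching the drawings of CH4, NH3, and H2O.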


Distribution of charge determines properties of a molecule

Now that we can represent on paper these archetypal molecules, we can draw representations for most molecules that interest us. Further, we can use these representations of molecular structure to anticipate the properties of the molecules themselves. One thing to remember: The number and distribution of electrons (and, consequently, of charge) is key to the behavior of molecules. We will worry about these often.

In some molecules, the number of electrons is not equal to the number of protons. In this case, the species carries a full charge, and is said to be an ion. The ions in sodium chloride (NaCl, table salt) are a good example. Sodium chloride is formed by the reaction between a neutral sodium atom (Na, with one electron in the outermost shell) and a neutral chlorine atom (Cl, with seven electrons in the outermost shell). In that reaction, an electron is transferred from the sodium atom to the chlorine atom. After the transfer of one electron, the chlorine atom has one more electron than its number of nuclear protons, and therefore carries a negative charge. As chemists like to change the names of things when their structure changes, this is now the "chloride" ion (or "anion", which means a negatively charged ion). As a result of the transfer of a single electron, the sodium atom carries a positive charge, and is called a sodium "ion" (or "cation", which means a positively charged ion).

Table salt (sodium chloride, or NaCl) is formed by transferring an electron (represented as a red dot) from a neutral sodium atom (with 11 protons in its nucleus) to a neutral chlorine atom (with 17 protons in its nucleus). The resulting chloride ion (called an anion because it is negatively charged) has the 8 electrons that it wants in its outer shell. Stripped of its outermost electron, the sodium ion (called a cation because it is positively charged) now is left with just 10 electrons (not shown), 2 electrons happy in the first shell, and 8 electrons happy in the second shell.

Molecules with equal numbers of protons and electrons are not charged overall. Nevertheless, as electrons orbit the nuclei, the negatively charged electrons need not be distributed exactly on top of the positive charges in the nuclei. When they are not, parts of the molecule where the electrons spend more time are more negatively charged, while parts where the electrons spend less time are more positively charged. If the distribution of charge is especially uneven, the molecule is said to be polar.

The more uneven the distribution of charge in space, the more polar the molecule is. This uneven charge distribution is the source of nearly all of the physical properties of the molecules that we will discuss.

The distribution of charges can be anticipated using the concept of "electronegativity". Oxygen and nitrogen atoms are more electronegative than carbon and hydrogen atoms. Thus, oxygen and nitrogen atoms embedded in a molecule generally attract the electrons towards themselves, and have more negative charge. In contrast, hydrogen atoms, especially those bonded to oxygen and nitrogen atoms, are more positively charged.

Oxygen and nitrogen are both more electronegative than hydrogen. Therefore, the O and N atoms attract electrons more than H. Therefore, O-H and N-H bonds are polarized to place a partial negative charge on the O and N atoms (δ- indicates this), and partial positive charges on the H atoms (indicated by δ+).

For example, the O-H bond connects two atoms (oxygen and hydrogen) that have very different electronegativities; oxygen is much more electronegative than hydrogen. The oxygen atom therefore carries a partial negative charge (represented in models for molecules by δ-), while the hydrogen atom bonded to the oxygen atom carries a partial positive charge (represented by δ+). Thus, O-H units are "polar", and a molecule that has O-H units is itself polar. Water, having two of those O-H units in a bent structure, is very polar.

In contrast, atoms with similar electronegativities share electrons equally. Bonds between such atoms are not particularly polar. For example, carbon and hydrogen atoms have similar electronegativities. Hence, the electrons in a C-H bond are distributed rather equally between the two atoms. Consequently, the C-H bond is not very polar. This means that hydrocarbons, molecules built just from hydrogen and carbon atoms and having only C-C and C-H bonds, are not very polar.
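The comparison of bond polarities can be made concrete with numbers. The sketch below uses Pauling electronegativity values, which are not quoted in this chapter and are brought in here as an assumption; the cutoff of 0.5 for calling a bond "polar" is likewise only a rule of thumb chosen for this illustration, not a sharp physical boundary.

```python
# Pauling electronegativities (standard tabulated values; not from this chapter).
PAULING = {"H": 2.20, "C": 2.55, "N": 3.04, "O": 3.44}

# Assumed rule-of-thumb cutoff for calling a bond "polar".
POLAR_THRESHOLD = 0.5


def bond_polarity(a, b):
    """Return (difference, atom carrying the partial negative charge, is_polar)."""
    difference = round(abs(PAULING[a] - PAULING[b]), 2)
    if difference == 0:
        negative_end = None  # identical atoms share the electrons equally
    else:
        negative_end = a if PAULING[a] > PAULING[b] else b
    return difference, negative_end, difference >= POLAR_THRESHOLD


for pair in (("O", "H"), ("N", "H"), ("C", "H"), ("C", "C")):
    print("-".join(pair), bond_polarity(*pair))
```

With these values the O-H and N-H bonds come out as polar, with the partial negative charge on O and N, while C-H and C-C fall below the cutoff, which is the pattern described in the text.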

The interaction between the partial positive charge on a hydrogen atom of one water molecule and the partial negative charge on the oxygen atom of another allows an attraction between water molecules, providing an explanatory model for the structure of ice (below) and the persistence of a liquid phase of water over a range of 100 °C (180 °F, from 32 to 212 °F) at terran sea level pressure.

Charge and partial charge are important

Positive charges attract negative charges. Likewise, partial positive charges attract partial negative charges. Interaction between charges, full or partial, on molecules is a defining feature of molecular interactions, and we will return to it again and again.

Water provides an example that we will encounter again and again in our discussion of life. Its oxygen atom has a partial negative charge; its two hydrogen atoms have partial positive charges. The partially positively charged hydrogen atom of one water molecule can interact with the partially negatively charged oxygen atom of another water molecule. This interaction, called a hydrogen bond, causes two water molecules to stick together. This interaction offers an explanation-model for many of the common properties of water. At low temperatures, these interactions allow water molecules to assemble in an ice crystal. At higher temperatures, after the ice melts, these interactions hold a collection of water molecules together in a liquid state until its boiling point. The hydrogen bonds in water are typically 5% as strong as the O-H bond itself.

Hydrogen bonding is, however, a general interaction involving the partial positive charge on a hydrogen attached to any electronegative atom, including nitrogen. For example, the hydrogens on ammonia (NH3) also have partial positive charges, and these interact with the partial negative charge on the nitrogen atom of a neighboring NH3 molecule. With three hydrogen bond donors and only one hydrogen bond acceptor on NH3, the interaction between NH3 molecules is weaker than that between H2O molecules, and so NH3 is a liquid at much lower temperatures than H2O (it boils at -33 °C, -27 °F, 240 Kelvin). As we shall see in Chapter 7, hydrogen bonds using the N-H hydrogen as a donor are central to the formation of the DNA double helix. Further, science fiction writers use NH3 as an alternative solvent on other worlds that are much colder than Earth; liquid NH3 is found in clouds above Jupiter.

Compare, now, methane (CH4). Carbon and hydrogen have quite similar electronegativities. Therefore, the C-H bond has much less charge separation; the C-H bond is not polar. Further, the central atom of methane does not carry an unshared pair of electrons. This provides an explanation-model for the fact that methane is a liquid only at much lower temperatures. Methane boils at -162 °C (-259 °F, 111 Kelvin) at one (terran) atmosphere of pressure. Methane freezes at -182.5 °C (-297 °F, 91 Kelvin). Liquid methane is therefore important in the cosmos only at very low temperatures. For example, on Titan, a moon of Saturn, methane serves the same role in the weather as water on Earth. Titan has methane snow, methane rain, methane lakes and methane vapor, all because its temperature is so low.

Molecular structures anticipate the solubility of molecules


The partial charges in a water molecule can interact with charges, partial or full, on other molecules. When those interactions are favorable, water dissolves those molecules, which become "solutes" (water is the "solvent"). Such interactions account for the ability of water to dissolve table salt (sodium Na+ as a positive ion, chloride Cl- as a negative ion). The hydrogens in water carrying partial positive charges surround the negatively charged chloride ions, while the oxygens in water molecules carrying a partial negative charge surround positively charged sodium ions. Water also dissolves polar molecules that do not have any overall charge. Ribose, for example, has many C-O and O-H bonds. As these bonds join atoms with different electronegativities, the bonds are polar as well. The H atoms have partial positive charges; the O atoms have partial negative charges. Because the partial charges on the H and O atoms of water can interact with the partial charges on the H and O atoms of ribose, we anticipate from its structure that ribose dissolves easily in water, even though it is not an ion. This is in fact the case. Water does not easily dissolve molecules that are not polar. Hydrocarbons, non-polar molecules that contain many C-H and C-C units, are the oils and fats that you may have heated in your kitchen. Their non-polarity is the general reason why oil (petroleum oil and vegetable oil analogously) and water do not mix. For example, octane, a component of gasoline, has only carbon and hydrogen atoms, 8 and 18 respectively. Thus, the molecule has only non-polar carbon-carbon and carbon-hydrogen bonds. Octane is therefore anticipated to be non-polar, insoluble in water, but soluble in other oils. The contrast between molecules with many C-H and C-C units that do not dissolve in water and those with many O-H and N-H units that do dissolve in water is found throughout biology. The first molecules are hydrophobic (water fearing). 
The second are hydrophilic (water loving). More anthropomorphizing, but chemists like to do that.
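The contrast between octane and ribose can be put in rough numbers by counting bonds. The sketch below is a deliberately crude heuristic, not a solubility model: it tallies the bonds in octane (C8H18) and in the open-chain form of ribose (C5H10O5), and reports what fraction of them join atoms of different electronegativity. Treating C-O, C=O, O-H and N-H as the "polar" bonds, and using the open-chain rather than the ring form of ribose, are assumptions made for this illustration.

```python
# Bond tallies written out by hand from the structures.
OCTANE = {"C-C": 7, "C-H": 18}
RIBOSE_OPEN_CHAIN = {"C-C": 4, "C-H": 6, "C-O": 4, "C=O": 1, "O-H": 4}

# Bond types treated as polar in this sketch (an assumption, see lead-in).
POLAR_BONDS = {"C-O", "C=O", "O-H", "N-H"}


def polar_fraction(bonds):
    """Fraction of all bonds in the molecule that are of a polar type."""
    total = sum(bonds.values())
    polar = sum(count for kind, count in bonds.items() if kind in POLAR_BONDS)
    return polar / total


def hydrogen_count(bonds):
    """Every hydrogen forms exactly one bond, so count the X-H bonds."""
    return sum(count for kind, count in bonds.items() if kind.endswith("-H"))


for name, bonds in (("octane", OCTANE), ("ribose", RIBOSE_OPEN_CHAIN)):
    print(name, hydrogen_count(bonds), round(polar_fraction(bonds), 2))
```

Octane has no polar bonds at all, while nearly half of the bonds in ribose are polar. That difference in the tally is the structural basis for expecting octane to be hydrophobic and ribose to be hydrophilic.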

Sodium chloride dissolves in water via interaction of partial charges on water molecules with negative and positive charges on its sodium (Na+) and chloride (Cl-) ions, which are very polar.


Ribose, the "R" in RNA, has no charges, but plenty of partial charges in its polar O-H bonds interact with water. Ribose, like most sugars, therefore dissolves easily in H2O.

Molecular reactivity

Molecules that contain only C-H and C-C single bonds are relatively unreactive. These bonds are relatively strong, and do not break easily, at least at standard temperatures. Single bonds between hydrogen and nitrogen, and between hydrogen and oxygen, are different. These are made and broken rapidly, especially in water, with the hydrogen coming off the molecule without any electrons. A hydrogen atom without any electrons is simply a proton, and is represented by H+.

Octane does not have any polar bonds; just non-polar C-H and C-C bonds. Therefore, octane does not interact with polar water molecules and does not dissolve in water.

Chemists love to invent weird words, and we introduce two of these here. Bonds are shared pairs of electrons. Thus, when a new bond is formed, we may ask: Where did the bonding electrons come from? If both came from one of the atoms forming a new bond, that atom is called a nucleophilic center, and the molecule holding that atom is called the nucleophile. The atom to which the bond will be

formed is called the electrophilic center. The molecule holding that atom is called the electrophile; the electrons in the new bond did not come from the electrophile. Reactions that make new bonds can be described by showing the movement of electron pairs associated with the bonds being made and being broken. Chemists use curved arrows to show that movement. The arrow begins with the pair of electrons that will form the new bond (on the nucleophile). The arrow is drawn to end at a position (on the structures of the reactants) where the electron pair will be after the bond is formed, somewhere between the nucleophilic center and the electrophilic center. For example, consider NH3 as the nucleophile. The nitrogen atom in NH3 has an unshared pair of electrons. This electron pair (the two red dots) is therefore available to form a new bond with an electrophile. The electrophile is a proton (H+), a hydrogen atom that has no electrons around it. A curved arrow describes the movement of electrons in the reaction of NH3 with H+ to give the ammonium ion (NH4+). It starts with the red pair of electrons on N and ends between the N and the H+, where those electrons will be (bonding N to H) when the reaction is finished. The analogous reaction can occur with H2O. Here, the oxygen atom has two unshared pairs of electrons. One of these pairs (in red) on the oxygen forms a new bond (also red) to H+. This reaction gives H3O+ (a hydronium ion). A related example illustrates the reaction that transfers a proton from a water molecule to an ammonia molecule. A curved arrow describes the transfer. A curved arrow starting with the unshared pair of electrons on nitrogen ends half-way to the hydrogen atom that is to be transferred from the water molecule. The bond holding that hydrogen to the oxygen must be broken. This process is described with curved arrows that start with the line joining the hydrogen being transferred to the oxygen, and ends on the oxygen itself. 
The result of the two arrows is to create ammonium hydroxide (NH4+ OH-) from NH3 and H2O.

Formaldehyde (HCHO): A compound with a C=O unit

The ability of hydrogen, carbon, nitrogen, and oxygen atoms to share electrons can be summarized by rules that are so simple even a caveman can use them. Hydrogen forms one bond, oxygen forms two bonds, nitrogen forms three bonds, and carbon forms four bonds. Behind these rules are the "desires" of hydrogen atoms to have two electrons in their outer shells, and of carbon, nitrogen and oxygen atoms to have eight electrons in their outer shells. These rules can be met with only certain combinations of hydrogen, carbon, nitrogen and oxygen atoms. Consider, for example, a molecule built from two hydrogen atoms, one carbon atom,

[Figure: ammonia (nucleophilic center) and H+ (electrophilic center) give ammonium.] A curved arrow shows movement of two electrons (red dots) that begin unshared on the N (the nucleophilic center) to form a new bond (the red line) between N and H+ (the electrophilic center), converting ammonia (H3N:) into the ammonium ion (H4N+). The curved arrow starts with the pair of electrons that will form the new bond, and ends at a point in the starting structure where that pair will be in the product structure (between the newly bonded N and H atoms).

[Figure: water (nucleophilic center) and H+ (electrophilic center) give hydronium.] A curved arrow shows formation of a new bond between the O of a water molecule and H+ to give H3O+ (the hydronium ion). The red electrons on O are originally unshared, and provide the pair of electrons used to form the new bond, also red. Hence, the O is the nucleophilic center. The curved arrow starts with the pair of electrons that will form the new bond, and ends at a point in the starting structure where that pair will be in the product structure.

[Figure: ammonia and water give ammonium and hydroxide.] Curved arrows show H+ transfer from H2O to NH3 to give ammonium (H4N+) hydroxide (HO-). N is the nucleophilic center; one H on H2O is the electrophilic center. The red electron pair forms the new N-H bond. This reaction models the formation of an alkaline solution when NH3 is added to H2O.


and one oxygen atom. Can we build a molecule that satisfies all of the rules of valence from this set of atoms, so that each of the hydrogen atoms is surrounded by two electrons and the carbon and oxygen atoms are each surrounded by eight? The answer is yes. The valence rules are met if the carbon atom shares four electrons (not two) with the bonded oxygen atom. Two of the shared electrons come from the carbon atom, while the other two come from the oxygen atom. The two lines joining the oxygen atom to the carbon atom represent a double bond, with each line in the double bond representing two shared electrons.

To complete the discussion of the structure, the carbon atom then shares two electrons with each of the hydrogen atoms. Each hydrogen atom forms one bond to carbon. The oxygen atom forms two bonds, and the carbon atom forms four bonds. The resulting molecule is called formaldehyde (H2C=O, or HCHO). A complete model for the formaldehyde molecule also shows the positions of electrons not involved in covalent bonding. The oxygen atom carries two pairs of unshared electrons. This is similar to the oxygen in water and, as with water, each unshared electron is represented by a dot.

Reactive centers are recognized from molecular structure

Nucleophilic and electrophilic centers can be recognized by examining a molecule's structure. Atoms that are nucleophilic centers because they carry an unshared electron pair are easy to

[Figure: three drawings of formaldehyde.] Formaldehyde is made from 2 H atoms, each with 1 electron; 1 carbon with 4 electrons; 1 oxygen with 6 electrons. In the molecule, each hydrogen has 2 electrons; carbon and oxygen each have 8 electrons. Representation of formaldehyde with lines, each representing 2 electrons and 1 bond. Note the C=O double bond.

The structure of formaldehyde (H2CO, or HCHO) follows rules of valence: Hydrogen forms one bond, oxygen forms two bonds, and carbon forms four bonds. Thus, there is a double bond between the C and the O atoms. Also, each hydrogen thinks it has 2 electrons around it, while the C and O each think that they have 8 electrons around them. The oxygen has a partial negative charge; the carbon has a partial positive charge. Therefore, HCHO is a polar molecule, and is very soluble in water.

spot. The nitrogen atom of an ammonia molecule and the oxygen atom of a water molecule are both nucleophilic centers.

Electrophilic centers are frequently less easy to spot. The proton (H+) is an obvious electrophilic center, as it has no electrons around it. But the hydrogen atom on a water molecule can also be an electrophilic center, if the bond attaching it to its oxygen atom is broken. This was exactly what happened in the reactions discussed previously where ammonium hydroxide was formed.

Carbon atoms can be electrophilic centers as well when one of the four bonds to carbon is broken. This is especially true when the carbon atom is doubly bonded to an oxygen atom. For example, the carbon of formaldehyde (H2CO) has four bonds. That carbon atom does not seem to have any need to form a new bond with anything. But breakage of the second of the two bonds between carbon and oxygen can allow the carbon to form a new bond to an incoming nucleophile. This reaction turns out to be one of the most important in biology.

[Figure: formaldehyde and water (nucleophilic center) give the hydrate of formaldehyde.] Reaction of electrophilic C of formaldehyde (HCHO) with the nucleophilic oxygen of water gives the hydrate of HCHO. Note how the C atom loses one of its two bonds to O (the red one), with the red pair of electrons forming a new (red) bond to H.

This reaction between HCHO and H2O (as a nucleophile) can be described using a few curved arrows. The first curved arrow represents the motion of one electron pair, which moves from a position between


the C and the O atoms up to a position between the oxygen and a proton, to which it will form a new bond. The second curved arrow shows the movement of electrons holding one of the hydrogen atoms to the oxygen atom in water. This leaves behind a proton. The resulting product is known as the hydrate of formaldehyde (H4CO2), because it is the combination of one HCHO molecule and one H2O molecule.

When electrophilic C=O and nucleophilic –OH units are in the same molecule, cyclic structures form

Ribose contains a C=O unit, where the carbon atom is an electrophilic center. Ribose also contains nucleophilic oxygen atoms that are parts of its various OH groups. Therefore, one of the nucleophilic O atoms within ribose can react as a nucleophile with the electrophilic carbon of the C=O unit, also within ribose, to give a cyclic structure. This reaction is directly analogous to the hydration of formaldehyde. In fact, the reaction is so good that ribose exists nearly entirely in one of various cyclic forms. More on this in a moment.
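The valence bookkeeping used above for formaldehyde and its hydrate can be sketched in a few lines of Python. This is only an illustration and not something taken from the book: the atom labels (C1, O1, and so on) and the function name valence_violations are made up for the example, and the bond lists are typed in by hand from the structures described in the text.

```python
# Valence rules quoted in the text: H forms 1 bond, O 2, N 3, C 4.
VALENCE = {"H": 1, "O": 2, "N": 3, "C": 4}


def valence_violations(atoms, bonds):
    """Return {label: (bonds_found, bonds_expected)} for atoms that break the rules.

    atoms: dict mapping a (hypothetical) atom label to its element symbol.
    bonds: list of (label_a, label_b, bond_order) tuples.
    """
    found = {label: 0 for label in atoms}
    for a, b, order in bonds:
        found[a] += order
        found[b] += order
    return {
        label: (found[label], VALENCE[element])
        for label, element in atoms.items()
        if found[label] != VALENCE[element]
    }


# Formaldehyde, H2C=O: one C=O double bond and two C-H single bonds.
formaldehyde_atoms = {"C1": "C", "O1": "O", "H1": "H", "H2": "H"}
formaldehyde_bonds = [("C1", "O1", 2), ("C1", "H1", 1), ("C1", "H2", 1)]

# The same atoms joined by a single C-O bond: carbon and oxygen fall short.
single_bonded = [("C1", "O1", 1), ("C1", "H1", 1), ("C1", "H2", 1)]

# Hydrate of formaldehyde, H4CO2: the C=O unit is replaced by two C-O-H units.
hydrate_atoms = {"C1": "C", "O1": "O", "O2": "O",
                 "H1": "H", "H2": "H", "H3": "H", "H4": "H"}
hydrate_bonds = [("C1", "O1", 1), ("C1", "O2", 1), ("C1", "H1", 1),
                 ("C1", "H2", 1), ("O1", "H3", 1), ("O2", "H4", 1)]

print(valence_violations(formaldehyde_atoms, formaldehyde_bonds))  # no violations
print(valence_violations(formaldehyde_atoms, single_bonded))       # C1 and O1 flagged
print(valence_violations(hydrate_atoms, hydrate_bonds))            # no violations
```

Counting bond orders is the whole trick: the double bond counts twice for both carbon and oxygen, which is why formaldehyde passes and the single-bonded version does not.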

[Figure: ribose (open form) and ribose (cyclic form).]

Formation of a cyclic form of ribose. Here, a new bond (red) is formed between the C=O carbon and the O on the carbon four atoms down the chain. Because 5- and 6-membered rings are favored, the red and green OH groups preferentially act as nucleophiles to form the cyclic structures.
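The ring sizes mentioned in the caption can be counted with a small Python sketch. It assumes the usual numbering of the ribose chain, which the text does not spell out: carbon 1 carries the C=O unit and carbons 2 through 5 each carry an OH group. The function names are hypothetical, and the preference for 5- and 6-membered rings is taken from the caption, not calculated.

```python
def ring_size(carbonyl_carbon, hydroxyl_carbon):
    """Atoms in the ring formed when the O on one carbon bonds to the C=O carbon.

    The ring holds every chain carbon from the carbonyl carbon to the carbon
    carrying the attacking OH, plus the attacking oxygen itself.
    """
    return abs(hydroxyl_carbon - carbonyl_carbon) + 2


def favored_hydroxyls(carbonyl_carbon, hydroxyl_carbons, favored_sizes=(5, 6)):
    """Return the OH-bearing carbons whose oxygen would close a favored ring."""
    return [c for c in hydroxyl_carbons
            if ring_size(carbonyl_carbon, c) in favored_sizes]


# Ribose (assumed numbering): C=O at carbon 1, OH groups on carbons 2 to 5.
for carbon in (2, 3, 4, 5):
    print(f"OH on carbon {carbon}: {ring_size(1, carbon)}-membered ring")

print(favored_hydroxyls(1, [2, 3, 4, 5]))  # [4, 5]
```

Under these assumptions only the OH groups on carbons 4 and 5 close a 5- or 6-membered ring, which matches the two OH groups that the caption singles out.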

Why do organic molecules from our pantry form tar?

As we will see below, carbohydrates (named because they are hydrates of carbon, combinations of n atoms of carbon and n molecules of water) have fascinating reactivity in part because they carry both nucleophilic centers (the oxygens of their OH groups) and electrophilic centers (the carbons of their C=O groups). Their C=O unit is the structural feature that allows them to form tar.

But before we go there, we now know enough chemistry to be able to handle a real problem in prebiotic chemistry. So let us return to the library shelf holding books with call number QH325. We focus on a book written by Robert Hazen (Genesis. The Scientific Quest for Life's Origins, 2005). Hazen is a planetary scientist who wanted to address the "tar paradox" outlined above.

Hazen and his colleague Harold Morowitz, a well-known biophysicist from George Mason University, understood the paradox that had Salviati and Sagredo throwing chairs at each other at the conference. They realized that the field would move forward only if they (or someone) could find some way to naturally constrain the intrinsic propensity of organic material to form tar. And they had a hypothesis to do so.

They hypothesized that tar formation might be naturally constrained by high pressure expected at the bottom of an ocean near a hydrothermal vent. Further, they envisioned other advantages for prebiotic chemistry going on in this environment. High temperatures (=energy) would be available to allow chemical reactions. Metal ions would come from the Earth's sub-surface to catalyze those reactions.

This is quite a different strategy from what we discussed earlier. Hazen and Morowitz start with the natural setting, not with a synthetic organic chemical goal. So they immediately engage the paradox that obstructs progress in the field.
To test their hypothesis, Hazen and Morowitz designed an experiment to mix two reactants containing carbon to create a third, more complex organic product. The first reactant was carbon dioxide (CO2). The argument for the presence of CO2 on early Earth is compelling by many lines of reasoning. CO2 is present in the atmospheres of Venus and Mars today, is abundant in the cosmos, is expelled by rocks, is found in comets, and figures prominently in any model of the atmosphere for early Earth.


The second reactant was more exotic. Called pyruvic acid, it contains three carbon, three oxygen, and four hydrogen atoms (C3H4O3). As its name implies, pyruvic acid is an acid, and will dissociate in water to give H+ and the pyruvate anion. But we will draw it in its acid form for simplicity here.

Hazen and Morowitz hoped to obtain a product that was the sum of pyruvic acid and CO2. This molecule is called oxaloacetic acid, and has the molecular formula C4H4O5. Notice that the number of atoms of each type in oxaloacetic acid is the sum of the numbers in pyruvic acid and carbon dioxide. In the words of the chemist, the equation CO2 + C3H4O3 = C4H4O5 "balances".

To understand the reaction, we first write the structures for carbon dioxide, pyruvic acid and oxaloacetic acid, using some colors to distinguish different bonds. Then, we draw a curved arrow mechanism that gets us from the two reactants to the desired product. Two steps are required. The first is a loss of a proton (H+) from pyruvic acid. This reaction is called an enolization, and creates an enol as the product.

Then, in a second step, the enol reacts as a nucleophile to form a new carbon-carbon bond. The C=O unit in CO2 reacts analogously to the reaction of the C=O unit in HCHO. The carbon atom of CO2 is the electrophilic center in the reaction. A carbon of the enol is the nucleophilic center. The second bond in the C=O double bond is broken during the reaction, which is called an addition.

[Figure: pyruvic acid; enolization of pyruvic acid; enol of pyruvic acid; carbon dioxide; addition; oxaloacetic acid.] The Hazen-Morowitz experiment to create oxaloacetic acid via the reaction of the enol of pyruvate (as the nucleophile) with carbon dioxide (the electrophile). Notice how the green bond to hydrogen in pyruvic acid is first broken, with the green electrons stored as a C=C double bond in the enol. These green electrons then are used to form the new C-C bond to the carbon of carbon dioxide.

The reaction does not create RNA, of course, but at least it makes a product molecule that is larger than either of the starting molecules. The difficulty arises because all chemical reactions are reversible, and carbon dioxide is a gas. That means that it is just as easy for oxaloacetic acid to break a carbon-carbon bond to create pyruvic acid and CO2 as it is to make oxaloacetic acid from pyruvic acid and CO2.

Hazen and Morowitz were prepared for this problem. They placed the system under high pressure with lots of CO2. This was, of course, the environment expected deep in the ocean at a rift venting CO2, where pressure is also high. Cramming CO2 into the system should drive the 1 + 3 = 4 reaction to make oxaloacetic acid.

Hazen then writes how he mixed the reactants at high pressure in a sealed tube, waited for a time, and then opened the tube. The result was as anticipated from commonplace observations. No oxaloacetic acid was seen. Instead, what was observed was a "yellow brown oily substance". Hoping to at least find some of the desired oxaloacetic acid in the mixture, they went to their colleague George Cody, who had a gas chromatography (GC) instrument able to identify even small amounts of oxaloacetic acid. No luck. Hazen reports their disappointment:


"Pyruvate had reacted in our capsules, to be sure. But instead of the simple 3 + 1 = 4 reaction that Morowitz had proposed, we had produced an explosion of molecules, tens of thousands of different kinds of molecules. A bewildering array."

Another prebiotic experiment encounters tar, just as you saw in your kitchen. High pressure did not solve the tar problem.

We already have enough chemistry to guess why the Hazen-Morowitz experiment failed. First, all atoms in pyruvic acid have the number of electrons and bonds that they want. In particular, the CH3 carbon, the one that needs to react as a nucleophile, does not have an electron pair available to form a new carbon-carbon bond to CO2.

To get that electron pair, one of the C-H bonds in the CH3 group must be broken, H+ must be removed, and a pair of electrons formerly holding that proton must be left behind, with a negative charge. But carbon is not an electronegative atom; it does not like having a negative charge.

Pyruvic acid solves this problem with its C=O group. Described by the appropriate curved arrows, the pair of electrons that formerly held the H+ is "stored" as a second carbon-carbon bond, the C=C unit in the enol. At the same time, the electrons that form the second bond in the C=O unit go sit for a time on the oxygen, which is electronegative and can tolerate a negative charge. And when a carbon dioxide molecule comes along, the electrons can flow back through the system to make a new carbon-carbon bond.

[Figure: oxaloacetic acid; enolization to the right; enolization downwards.] Once oxaloacetic acid is formed in the Hazen-Morowitz experiment, the electrons in the green bond have two ways to go in an enolization reaction: (a) up and to the right, and (b) down and to the left. Thus, in cartoon form, this means that oxaloacetic acid will enolize faster than the starting pyruvate, and will therefore not accumulate.

But look at the product oxaloacetate. Here in the product, the CH3 carbon has become a CH2 carbon. It is now flanked on both sides by C=O units. This means that there are now two places to
put the electrons that hold the H+ to the carbon of oxaloacetate, making it (in this cartoon description) twice as easy to lose H+. It is easier to make a reactive nucleophilic species from oxaloacetic acid than from pyruvic acid. This means that oxaloacetic acid reacts more rapidly than it is formed from pyruvic acid.

What can the enol of oxaloacetic acid react with? Briefly, any electrophile. This includes CO2, of course, but also another molecule of pyruvic acid. While pyruvic acid can lose a proton to become a nucleophile, it also has a C=O unit which can react at its carbon as an electrophile.

This leads to a general rule in chemistry. If a molecule (like pyruvic acid) can generate both nucleophilic and electrophilic centers, then it can react with others of its own kind. Molecules that can do so can form tar.

In short, Hazen and Morowitz discovered a reaction very much like the reaction that you discovered when you heated Cheerios or chowder. These foodstuffs also have molecules containing C-H units that can lose H+, and other molecules with C=O units that can react as electrophiles. The details of the outcome are different, depending on the exact components of the starting organic mixture. But the overall appearance of the outcome is the same: asphalt, tar, or road pavement.
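Looking back at the reaction that Hazen and Morowitz hoped for, the claim that CO2 + C3H4O3 = C4H4O5 "balances" is just atom counting, and can be checked mechanically. The Python sketch below is an illustration only: the helper names are made up, and the parser handles nothing fancier than plain formulas like the ones in the text (no parentheses, no charges).

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C3H4O3' (no parentheses or charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def is_balanced(reactants, products):
    """True when both sides of the equation contain the same atoms."""
    left = sum((parse_formula(f) for f in reactants), Counter())
    right = sum((parse_formula(f) for f in products), Counter())
    return left == right


# Carbon dioxide + pyruvic acid -> oxaloacetic acid (formulas from the text).
print(is_balanced(["CO2", "C3H4O3"], ["C4H4O5"]))  # True

# Hydration of formaldehyde: HCHO + H2O -> H4CO2.
print(is_balanced(["CH2O", "H2O"], ["CH4O2"]))     # True
```

A balanced equation says only that no atoms are lost or invented. It says nothing about whether the product survives, which is exactly where the experiment ran into tar.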

Thermodynamic stability as a solution to the tar problem


So how can we find natural ways to constrain the intrinsic propensity of organic species having interesting functionality to form tar? One way to manage the tar problem is to generate a product that is the most stable arrangement of the atoms available. For example, many ways exist to arrange six carbon atoms and six hydrogen atoms in a way that satisfies valence rules. The most stable way to do so, however, is to put the six carbon atoms in a ring, have them joined by alternating double and single carbon-carbon bonds, and complete the molecule by attaching each of the carbon atoms to exactly one hydrogen. This is the arrangement of atoms in the molecule known as benzene. Organic chemists call the special stability of this arrangement aromaticity (further illustrating their propensity to create weird names for things).

Benzene is a very stable molecule, and is abundant in the cosmos. Even on Titan, a moon of Saturn where much tar is present, a large amount of benzene is present.

Fortunately for those believing in the RNA-first model for the origin of life, and seeking to get the components of RNA out of prebiotic organic molecules and energy, all four nucleobases that store information in RNA have the type of stability that benzene shows. In the language of the chemist, adenine, guanine, cytosine and uracil are all aromatic. Accordingly, these molecules may be made by prebiotic reactions without the guidance of a human chemist and without forming tar simply because they are relatively happy arrangements of atoms, and would survive after they are formed no matter how they are formed.

[Figure: benzene, adenine, cytosine, uracil, guanine, xanthine.] Molecules with 6 or 10 electrons in a ring are especially stable ("aromatic"). The second bonds between atoms, and the unshared pair of electrons on N in magenta, show the 6 electrons (in benzene) and the 10 electrons (in adenine) that make these molecules intrinsically resistant to decomposing to tar. Uracil, found in meteorites, and cytosine are two of the other components of RNA that also have 6 electrons in a ring. Xanthine, found in meteorites, and guanine, the fourth nucleobase component of RNA that might be made from xanthine, each have 10 electrons in a ring.

This was the discovery by Juan Oro, mentioned in many books at QH325 on the library shelf, and discussed previously. Once adenine is formed, its aromaticity makes it reasonably stable, especially in the absence of molecules (like water) that might react with it to make something else. Uracil, another nucleobase in RNA, is similarly stable. Uracil is made in interstellar space and makes it to Earth via meteorite, including the Murchison meteorite that we saw earlier in this chapter. Good news. Phosphate, another part of RNA, is also a stable combination of phosphorus and oxygen. More good news.

The prebiotic synthesis for ribose, the "R" in RNA?

So what is the problem? In a word: ribose, the "R" in RNA. Ribose turns out to be a big problem, not obvious by reading the life-is-easy books having the QH325 call number. These books often cite something called the formose process as a way of making ribose without the guiding hand of an intelligent designer. The formose process was discovered in the 1860's (that's right, 150 years ago) by the Russian chemist Aleksandr Butlerov. Butlerov incubated formaldehyde (HCHO) in hot solutions of calcium hydroxide (Ca(OH)2, pH 12.5). Initially, nothing happens. After a time, the formaldehyde is rapidly consumed.
The mixture turns brown: the anticipated tar. But the brown mixture has a sweet taste and smells like toasted marshmallows. Further, if the reaction is stopped at the right time, some carbohydrates can be isolated from it.

[Figure 5.2 structures: ribose, arabinose, lyxose, and xylose (pentoses), and ribulose and xylulose (pentuloses), each drawn in its open form and its cyclic form.]

There is strong evidence that formaldehyde was present on early Earth. Formaldehyde is abundant in the interstellar medium and could come to Earth via meteorite. Formaldehyde is also made by electrical discharge, ultraviolet light, and other sources of energy acting on atmospheres containing methane or moist CO2. And just as adenine is the sum of five HCN molecules put together, ribose (C5H10O5) is five formaldehyde molecules put together: C5H10O5 = (HCHO)5. What could be simpler?
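For readers who like to see the sums, the two "five of a kind" relationships can be written out as simple arithmetic on the formulas. Nothing here goes beyond counting atoms; the formula of adenine (C5H5N5) follows directly from five HCN units.

```latex
\begin{align*}
5 \times \mathrm{CH_2O} &= \mathrm{C_{5}H_{10}O_{5}}
  && \text{ribose: } 5\ \mathrm{C},\quad 5 \times 2 = 10\ \mathrm{H},\quad 5\ \mathrm{O} \\
5 \times \mathrm{HCN} &= \mathrm{C_{5}H_{5}N_{5}}
  && \text{adenine: } 5\ \mathrm{C},\quad 5\ \mathrm{H},\quad 5\ \mathrm{N}
\end{align*}
```

Note the difference in hydrogen count: each formaldehyde brings two hydrogens, so ribose has ten, while each HCN brings only one, so adenine has five.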

Figure 5.2. We might as well hit you with it. Here are carbohydrates with 5 carbons (and called pentoses or pentuloses, from the Greek penta = 5). Pentoses have a –HC=O group in their open form; pentuloses have a C=O unit flanked by two carbons. All have the formula C5H10O5. All can exist in both open and cyclic forms. The different pentoses differ in how their atoms are arranged in 3-dimensional space. The C of the C=O unit is an electrophile, and can react with the O of an –OH group as a nucleophile to form the cyclic forms. In many cases, more than one cyclic form is possible. Chemistry is complex.

Unfortunately, the formose process creates quite a bit of chemical complexity, some of which was shown in Figure 5.1. The complexity arises from the same kinds of reactions, occurring over and over again, that created the tar in the Hazen-Morowitz experiment: (i) removal of a proton (H+) from a carbon next to a C=O (carbonyl) group (enolization, in the language of organic chemistry), and (ii) attack of the resulting enediolate (a nucleophile) on HCHO (an electrophile) to form a new carbon-carbon bond (addition, in the language of organic chemistry).

Figure 5.3. Two reaction types create the complexity in the formose process, a part of which is shown in Fig. 5.1.

Top. Enolization is the abstraction of a proton (H+) by a base (here, hydroxide) to give an enediol (which can then act as a nucleophile in an aldol addition reaction).

Bottom. An aldol addition is the reaction of an enediol with a carbonyl compound (the electrophile). When the carbonyl species is formaldehyde, R" = R''' = H.

By repeating these reactions again and again, the complexity in Figure 5.1 can be generated. Eventually, the material undergoes still more reactions that are too many to capture in that figure, going on to tar. Unfortunately, because five-carbon carbohydrates (including ribose) also have C=O groups, they are also unstable to further reaction, either enolization or attack by an enol, especially at high pH.

The formose reaction is not a plausible way to get carbohydrates under prebiotic conditions

The complexity of the formose process also underlies its unsuitability as a prebiotic source of carbohydrates. The QH325 books that cited the formose process as a way to get ribose to support the RNA world were too optimistic. In fact, the formose process does not generate any pentose or pentulose (Figure 5.2) as a major (or even significant) end product under the alkaline conditions needed to initiate and carry it out. Under those alkaline conditions, pentoses and pentuloses react further to form tar.

This was recognized by the community over a decade ago. In 1995, Stanley Miller measured the rate of decomposition of carbohydrates. Ribose, for example, decomposes with a half-life of 73 minutes at 100 °C (212 °F, 373 Kelvin), even at neutral pH. Tar formation was more rapid at higher pH. Thus, we cannot get around the tar problem with ribose as we did with adenine. Ribose is not aromatic and is not the most stable end product arrangement of carbon, hydrogen, and oxygen atoms in a ratio of 1:2:1.

This led Miller and his coworkers to "preclude ribose or any other sugar" from the first genetic polymers. This, in turn, was a direct denial of the RNA-first model for the origin of life, from no less a scientist than the individual who created the field of prebiotic chemistry. The syllogism is simple: RNA cannot be formed on early Earth unless ribose was present prebiotically. Ribose cannot be present prebiotically. Therefore, RNA was not formed on early Earth.
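To get a feel for what a 73-minute half-life means, here is a rough Python sketch. It assumes simple first-order decay, which is what quoting a single half-life implies but which the text does not state explicitly; the function name is made up, and only the 73-minute figure comes from the passage above.

```python
def fraction_remaining(minutes, half_life_minutes=73.0):
    """Fraction of ribose left after a given time, assuming first-order decay.

    half_life_minutes defaults to the 73 minutes quoted for 100 degrees C at
    neutral pH; each half-life cuts the remaining amount in half.
    """
    return 0.5 ** (minutes / half_life_minutes)


for label, minutes in [("one half-life", 73), ("two half-lives", 146),
                       ("one day", 24 * 60), ("one week", 7 * 24 * 60)]:
    print(f"{label:>15}: {fraction_remaining(minutes):.3g} of the ribose remains")
```

Under this assumption about twenty half-lives pass in a single day at 100 °C, leaving roughly a millionth of the starting ribose. On geological timescales, that is no ribose at all.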
[Figure: Death Valley.] A salt flat and the mountains at Death Valley, a good place to accumulate alkaline borate minerals.

[Figure: colemanite.] Borate minerals are formed by crystallization from water solution. Shown is colemanite, an alkaline calcium borate.

Don't forget the rocks on early Earth

The instability of ribose under conditions where it is formed is one reason why Joyce referred to RNA as a "prebiotic chemist's nightmare" and was reportedly despondent about research based on the RNA-first hypothesis for the origin of life. Working back in time had suggested an intensive role for RNA in life early on Earth. Was the intrinsic instability of the "R" in "RNA" now going to deprive us of the hypothesis-by-extension that RNA started life?

At a conference in 2003 at Colby Sawyer College in Maine, I was confronted by Jack Szostak, a believer in the RNA-first hypothesis (at least at the time), an inventor of methods to select for catalytic RNA molecules, and a contributor to other parts of origins research. Szostak was also frustrated by the "so near and yet so far" aspect of the RNA-first hypothesis, having the nucleobases and the phosphate but not the sugar in this elusive molecule. "Why don't you chemists just do some work and solve the [ribose] problem?" mused Jack.


We actually already had a solution to the problem, and spent the day searching Colby Sawyer to find material to demonstrate it to Jack. That solution came from my youth, where I had collected rocks as a hobby. Accordingly, I knew the basic set of minerals to complement my "animal" and "vegetable" knowledge. I had collected a fair number of them and knew how to walk across landscapes and read strata.

In particular, I knew that some rocks contained boron. Best known, of course, is borax, which is mined in Death Valley. Borax is a sodium borate salt. Other boron minerals found in Death Valley include ulexite (a sodium calcium borate, sold in museum gift shops under the name "TV stone", since it is a natural fiber optic material) and colemanite, calcium borate.

Borate minerals get to Death Valley because most borate salts are relatively soluble in water. Borate is excluded from many minerals forming in igneous rock. It is therefore concentrated in the residual molten rock, eventually crystallizing as tourmaline. Tourmaline is well known as a gemstone. In igneous outcroppings in the Sierra Nevada mountains, brown iron tourmaline is widespread. When it comes in contact with water, the tourmaline is eroded, the borate salts enter the water, and the salts are concentrated downstream when the water enters a place like Death Valley.

The water that carries borate to Death Valley is also alkaline. One source for alkali in geology is the mineral peridotite (its gem form is peridot). This green mineral is a magnesium iron silicate, and is widespread in rocks called serpentines. When serpentines erode with water, a reaction called serpentinization occurs. This creates both reducing power (and reduced organic molecules can be formed by serpentinization) and alkalinity. For example, Mono Lake, just to the east of the Sierra Nevada mountains in California, has a pH as high as 12, like the conditions of the formose reaction.

Water-soluble borates often come from tourmalines, known as both gems and as black igneous iron "schorls".

Peridotite, a magnesium iron silicate, is a green mineral found in serpentine and igneous basalts. It is the gem "peridot".

[Figure: boric acid and hydroxide give the borate anion.]

Boron contributes three black electrons to a molecule. In boric acid (H3BO3), however, boron is surrounded by only 6 electrons. It can get 8 by adding a hydroxide anion (HO-). Now, the boron has four bonds, a negative charge, and is surrounded by four oxygen atoms.

Boron binds to and stabilizes organic compounds with adjacent hydroxyl (-OH) groups

Boron offered a solution to the tar problem with ribose. Standing right before carbon in the Periodic Table, boron has only three electrons in its outer shell. This means that boron must pick up five more electrons to get the eight that it wants in its outer shell. That is a lot, and boron spends its life looking for things with electrons to bind to.

Oxygen atoms in water, or oxygen atoms in –OH groups of organic molecules, offer those electrons. Each of these oxygens has two unshared pairs of electrons. Therefore, the borax in your laundry room has boron atoms surrounded by water oxygens. In such compounds, as in geology, borate is an anion whose central boron atom is bonded to four oxygen atoms.

For this reason, boron likes to bond to organic molecules that have two adjacent –OH (hydroxyl) groups. These two oxygens give the boron four of the five additional electrons that it needs. Such organic molecules are known as 1,2-diols.
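The electron counting behind borate's negative charge (and behind the positive charges on ammonium and hydronium met earlier) can be sketched in Python. The standard formal-charge rule used here is a chemist's convention that the text does not spell out, so treat it as an assumption of the example: each atom is credited with its unshared electrons plus one electron from each bond. The function names are hypothetical.

```python
# Outer-shell electrons that each neutral atom brings to a molecule.
VALENCE_ELECTRONS = {"H": 1, "B": 3, "C": 4, "N": 5, "O": 6}


def electrons_around(bonds, unshared_electrons):
    """Electrons surrounding an atom: two per bond plus its unshared electrons."""
    return 2 * bonds + unshared_electrons


def formal_charge(element, bonds, unshared_electrons):
    """Formal charge: electrons brought, minus unshared electrons, minus one per bond."""
    return VALENCE_ELECTRONS[element] - unshared_electrons - bonds


examples = [
    # (description, element, number of bonds, unshared electrons)
    ("B in boric acid, B(OH)3", "B", 3, 0),
    ("B in borate, B(OH)4-", "B", 4, 0),
    ("N in ammonia, NH3", "N", 3, 2),
    ("N in ammonium, NH4+", "N", 4, 0),
    ("O in water, H2O", "O", 2, 4),
    ("O in hydronium, H3O+", "O", 3, 2),
]

for name, element, bonds, unshared in examples:
    print(f"{name:26s} electrons around atom: {electrons_around(bonds, unshared)}"
          f"   formal charge: {formal_charge(element, bonds, unshared):+d}")
```

Boron in boric acid comes out with only 6 electrons and no charge. Adding a fourth oxygen brings it to 8 electrons and a charge of -1, the negative charge that matters in the borate chemistry of this chapter.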


And guess what? Pentoses like ribose in its cyclic form (Figure 5.2) are 1,2-diols. Their adjacent -OH groups bind borate tightly. Still better, their cyclic borate complexes lack the C=O groups necessary for carbohydrates to react to form tar. The borate complex of ribose is stable, even at high pH.

Boron minerals guide the reactivity of carbohydrates

But would a borate-rich environment like Death Valley actually generate ribose in a stable form? With bucks from NASA and the Templeton Foundation, Hyo-Joong Kim, Alonso Ricardo, Matthew Carrigan, and Heshan Illangkoon set out in my laboratory to explore the interaction between formaldehyde and higher carbohydrates in the presence of boron. We found something better than we expected.

[Figure 5.4 scheme. Labels: glycolaldehyde; Ca(OH)2, enolize; glyceraldehyde; borate inhibits enolization; glyceraldehyde borate; aldol addition; ribose open form; ribose closed form (borate complex).]

Figure 5.4. In this cartoon showing the interaction between mineral borate and two prebiotic carbohydrates, glycolaldehyde and formaldehyde, the intrinsic propensity of organics to form tar is constrained in two ways. First, borate binds (weakly) the adjacent hydroxyl groups of glyceraldehyde. This inhibits the enolization of glyceraldehyde, forcing it to enter reactions as an electrophile, with the C of its C=O unit being the electrophilic center. The 2 + 3 aldol reaction between the C2 nucleophile and the C3 electrophile gives a C5 pentose (ribose is shown in its linear form), not a pentulose (ribulose or xylulose) (Fig. 5.2). The pentose cyclizes to remove the C=O unit, and complexes borate to give a stable pentose (shown is ribose) complex that no longer forms tar, even at high temperatures.

Again, when boron has four oxygens around it, the boron atom has a negative charge. Charges are important, especially to enolization reactions, which lose an H+ to leave behind a negative charge. And here is the important part. If the carbohydrate is already bound to a boron, the bound complex already has a negative charge. As like charges repel, a carbohydrate bound to boron enolizes more slowly than a carbohydrate not bound to boron.

This can be illustrated with glyceraldehyde, the C3 carbohydrate. Glyceraldehyde is too short to form a five-membered cyclic structure, and therefore still has a C=O group. In its open form, the 1,2-diol unit of glyceraldehyde binds borate. Once bound, the borate and its negative charge slow enolization. This means that glyceraldehyde cannot easily lose H+ and react as a nucleophile. The C=O unit of glyceraldehyde can still react as an electrophile, however. In particular, the C=O unit of the borate-glyceraldehyde complex can react with the enol of glycolaldehyde as a C2 nucleophile in a 3+2 = 5 process. The product is a pentose like ribose. The product is then stabilized by the same borate that controlled the reactivity of glyceraldehyde and guided the formation of a C5 pentose.

There is no need for a graduate student to mix reagents in the correct amount, or adjust the acidity by hand, or prepare fresh solutions. There is no need for an intelligent designer to stop any prebiotic reactions before their products generate tar. An igneous outcropping containing tourmalines and peridots will drive glyceraldehyde and glycolaldehyde to ribose and other 5-carbon sugars. And there they stop.

Are we done? Is the science settled now that we have all three pieces of RNA? Is the problem solved? If you have come this far, you know that this is something that I will never say. Further, because my laboratory developed the idea that borate might control the intrinsic propensity of carbohydrates to form tar, and because the model fits my desire to couple mineralogy with chemistry, I have a special affinity for the idea. Since love of one's own ideas is the first step towards self-deception, the discipline of science requires me to be especially opposed to saying "mission accomplished".

When a new idea emerges, a community often splits into groups. One chooses to extend the new idea using its own expertise and insight. The other chooses to oppose the new idea. Fights break out in the literature. Chairs are thrown at conferences. Papers are rejected. Scientists become personally invested in ideas. And this slows progress. It is helpful, therefore, for a laboratory to adopt an intellectual discipline that challenges its own ideas as an internal adversary. Thus, the dialectic (as philosophers might call the juxtaposition of opposites) is established within a research group and, still better, within an individual researcher. This allows flaws to be found in a theory more rapidly than by simple advocacy, where others from the outside must become involved.

Let us try this scientific method here, and try to shoot down "borate as a solution to the prebiotic ribose problem". Let us begin with a simple question of abundance. Boron is scarce in the cosmos, in part because it easily absorbs neutrons to become other elements. But some boron is made; it is just not abundant in the mass of the Earth. But then, many borate minerals easily dissolve in water.
Thus, if the borate that we had on early Earth came in contact with the bulk ocean, it could easily have been diluted to the point where it became an inefficient stabilizer of ribose. For borate to have stabilized ribose on early Earth, borate must have been present at reasonable concentrations at the places on Earth where ribose was formed. On modern Earth, a "Death Valley" (or should we call them "Life Valleys"?) is exactly the environment that is needed. But were such Life Valleys present on early Earth? Did early Earth even have dry land? Or was early Earth always flooded? We are not certain. Models for planetary formation currently picture the earliest surface of Earth as molten rock. Later in Earth's formation, water was delivered to the surface from beneath and from above via comets. Setting aside the ongoing discussion of the relative contributions of each process to the water in Earth's early history, we really do not know the inventory of water over the time in Earth's history when life might have been formed. Joseph Kirschvink, a planetary scientist at the California Institute of Technology, offered a creative "good news" solution for this possibility. Kirschvink pointed out that while dry land might not have been available on early Earth, Mars was a drier planet than Earth (according to models currently accepted by the community). Kirschvink argued that Mars always had dry land, including Life Valleys where borate minerals might concentrate. He therefore proposed that the ribose needed to form RNA formed on Mars. This material (or RNA made from it) may have been ejected from Mars by impact and then landed on Earth. If Kirschvink is correct, then we are all Martians. The Phoenix lander now on the surface of Mars is making observations consistent with this possibility. It appears as if Mars has igneous rocks that (by analogy with terran rocks) could have delivered borate to salty evaporite basins.
Those rocks might also have contained serpentine, which (upon weathering) would create alkaline solutions, just as in Death Valley. The 2008 Phoenix lander has found alkaline soil of exactly the type needed to make ribose from formaldehyde in a Martian Life Valley. Good news. Is ribose present today on the surface of Mars? The instruments delivered by the Phoenix were not designed to tell.

Borate shuts down paths to ribose as well as paths from ribose

But we are not yet finished with establishing the dialectic internally. While borate solves a problem by preventing carbohydrates of a certain size from reacting to form further tar, borate creates a new problem for those attempting to make pentoses like ribose prebiotically. Let us return to the reaction scheme (Figure 5.4) where one molecule of glycolaldehyde reacts with one molecule of formaldehyde to give one molecule of glyceraldehyde in a 2+1 = 3 reaction. The glyceraldehyde then reacts with one molecule of glycolaldehyde to give one molecule of pentose in a 3+2 = 5 reaction. That scheme proposes a model where borate "supervises" the net process overall, a 2+1+1+1 = 5 outcome. This is fine if the environment has plenty of glycolaldehyde in excess over formaldehyde, under the argument:

Formaldehyde was present on early Earth.
Borate was present on early Earth.
Glycolaldehyde was present on early Earth.
One molecule of glycolaldehyde reacts with three molecules of formaldehyde to give one molecule of pentose in the presence of borate, which then reacts no further.
Therefore, ribose was present on early Earth.

But suppose we start by dialectically denying some of the premises. The first and second hypotheses are not easily denied (given current community-accepted models for planetary atmospheres). But the third is not so certain. For sure, glycolaldehyde is abundant in interstellar nebulas. If we assume (as an auxiliary hypothesis) that organic molecules present in the interstellar nebulas were also present on early Earth, no problem.
But what if Earth went through a stage where it was a ball of molten rock after the time that the nebula that formed it condensed but before the time that water arrived? Any glycolaldehyde available from the start would be toast (which is worse than tar). The only glycolaldehyde that would be available when organics actually had a chance to form would be what came later, by meteorite, in relatively small amounts.
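The carbon bookkeeping behind this worry can be sketched in a few lines of Python. This is only an illustration, not part of the laboratory work: the names `CARBONS`, `carbons_in`, and `pentoses_possible`, and the supply of 100 glycolaldehyde molecules, are made up for the example. It compares the route drawn in Figure 5.4 (2 + 1 = 3, then 3 + 2 = 5) with the 2 + 1 + 1 + 1 = 5 route, assuming that formaldehyde is in excess and glycolaldehyde is the scarce ingredient.

```python
# Carbon atoms per molecule for the species named in the text.
CARBONS = {"formaldehyde": 1, "glycolaldehyde": 2, "glyceraldehyde": 3, "pentose": 5}


def carbons_in(inputs):
    """Total carbon atoms supplied by a dict of {species: molecule count}."""
    return sum(CARBONS[name] * count for name, count in inputs.items())


def pentoses_possible(glycolaldehyde_supply, route):
    """Pentoses obtainable when glycolaldehyde is limiting and HCHO is in excess."""
    return glycolaldehyde_supply // route["glycolaldehyde"]


# Route A (Figure 5.4): 2 + 1 = 3, then 3 + 2 = 5.
route_a = {"glycolaldehyde": 2, "formaldehyde": 1}
# Route B: one glycolaldehyde plus three formaldehydes, 2 + 1 + 1 + 1 = 5.
route_b = {"glycolaldehyde": 1, "formaldehyde": 3}

for label, route in (("A", route_a), ("B", route_b)):
    # Both routes must deliver exactly the five carbons of a pentose.
    assert carbons_in(route) == CARBONS["pentose"]
    # The supply of 100 molecules is a hypothetical number for illustration.
    print(label, pentoses_possible(100, route))
```

Under these assumptions the yield of pentose can never exceed the number of glycolaldehyde molecules delivered, which is why a meteorite-only supply of glycolaldehyde is a problem.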


For those who believe in the Butlerov process from the 1860s, there is no problem. Under one model, large amounts of formaldehyde are the only organic ingredient that Butlerov needed to get formose. The difficulty is that to get formose, the process needs to have the complexity shown in Figure 5.1, cycle after cycle of reactions that involve intermediate carbohydrates, their enolization, and their aldol addition. The problem: Borate inhibits those cycles by binding to the 1,2-diols that are everywhere in this process. This means that in a 2+1+1+1 = 5 process, one gets only one ribose molecule out for every glycolaldehyde molecule in. And if the only glycolaldehyde present is what comes from meteorites after Earth has cooled, this limits severely the amount of ribose that was available on a prebiotic Earth. Bad news.

And so Hyo-Joong and I trudged back to the laboratory to try some new experiments. There is nothing quite like doing some experiments to make discoveries. One concerned the C4 carbohydrate product, called erythrulose, which comes from the reaction between formaldehyde and glyceraldehyde in a 3 + 1 = 4 reaction. Erythrulose, like glyceraldehyde, is too small to form a cyclic species with a five- or six-membered ring. Unlike glyceraldehyde, however, erythrulose has C-H units on both sides of its C=O unit. Therefore, erythrulose can enolize in two different directions, towards its "CH2-OH" side, or to its CHOH-CH2OH side. More complexity. More bad news.

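[Figure: erythrulose and its borate complex. Labels: loss of the red H gives enolization to the left side, which borate inhibits; loss of the blue H gives enolization to the right; HCHO reacts as an electrophile at the nucleophilic sites of the resulting enol; the branched products cyclize, bind borate, and are stabilized; the linear product cannot cyclize and reacts further.]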

Erythrulose, the four carbon species in this picture, can enolize in two ways, giving multiple nucleophilic sites. But if borate binds to the diol, it guides enolization to the right. Then, once formed, the enol can react to form either a linear species (bottom left), which cannot cyclize, or two branched species that differ in the arrangements of groups in space, which can cyclize, bind borate, and be stabilized. Again, borate moderates the reactivity of carbohydrates that would otherwise form tar.


[Figure 5.5 reaction scheme. Labeled species: glycolaldehyde (C2a) and its enediol (C2e); glyceraldehyde (C3a) and its enediol (C3e); glycerol, oxidized by H2O2 and Fe++ to dihydroxyacetone; erythrulose (C4k) and its enediol (C4e); the linear C5 species (C5l); the branched pentoses (C5b), which cannot enolize; threose; the pentuloses xylulose and ribulose (xylulose + ribulose, 9:1); and the pentoses xylose, lyxose, ribose, and arabinose. Labeled steps: enolization, addition of HCHO, 3+2 addition, and retroaldol fragmentation (slowed by borate complexation). Borate inhibits some enolizations, controls diastereoselectivity, and stabilizes threose, the pentuloses, and the pentoses, which equilibrate in borate.]

Figure 5.5. A borate-constrained abiotic "metabolism" that forms pentoses (including ribose) and pentuloses, and proceeds in clockwise cycles "fixing" formaldehyde (HCHO). Compounds in green are expected prebiotically from meteorites, electrical discharge, photochemistry, minerals, or the interstellar nebula. Species are labeled based on the number of carbon atoms they contain (e.g., C2 has two carbon atoms), and whether they are aldehydes (a), enediols (e), ketones (k), branched (b) or linear (l). The red bond in the C5b species can break in a fragmentation process that is the reverse of the addition of C3e and C2a.
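The bookkeeping of this cycle can be sketched as a toy simulation. The sketch below is a deliberately crude illustration under stated assumptions, not a kinetic model of the chemistry: the names `Pool`, `step`, and `run` are invented for the example, every species advances exactly one stage per round, each aldehyde is lumped together with its enediol, and the competing exits from the cycle (the 3+2 addition to pentoses and the linear C5l branch) are ignored. It tracks only how many molecules of each size exist and how many HCHO molecules have been fixed.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    c2: int = 0          # glycolaldehyde (C2a, lumped with C2e)
    c3: int = 0          # glyceraldehyde (C3a, lumped with C3e)
    c4: int = 0          # erythrulose (C4k, lumped with C4e)
    c5b: int = 0         # branched pentoses (C5b)
    hcho_fixed: int = 0  # formaldehyde molecules consumed so far

    def carbons(self):
        """Carbon atoms currently held in the cycle intermediates."""
        return 2 * self.c2 + 3 * self.c3 + 4 * self.c4 + 5 * self.c5b


def step(p):
    """Advance every molecule by one stage of the cycle."""
    return Pool(
        c2=p.c5b,                 # C5b fragments to give back one C2a ...
        c3=p.c2 + p.c5b,          # ... and one C3e; each C2 adds HCHO to give C3
        c4=p.c3,                  # each C3 adds HCHO to give C4
        c5b=p.c4,                 # each C4 adds HCHO to give branched C5
        hcho_fixed=p.hcho_fixed + p.c2 + p.c3 + p.c4,
    )


def run(pool, rounds):
    """Apply `step` the given number of times and return the final pool."""
    for _ in range(rounds):
        pool = step(pool)
    return pool


# Start from a single (hypothetical) glycolaldehyde molecule.
final = run(Pool(c2=1), 7)
print(final)
# Carbon is conserved: two carbons from the seed plus one per HCHO fixed.
assert final.carbons() == 2 + final.hcho_fixed
```

Even in this simplified form, a single seed molecule becomes several intermediates after a few turns, with all of the added carbon coming from formaldehyde.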

Once again, borate helps. On its CHOH-CH2OH "left" side, erythrulose has a 1,2-diol unit. This can bind borate. Once bound, borate inhibits the enolization of erythrulose to the left side. Borate does not bind to the right side of erythrulose, however. Therefore, borate should direct enolization to the CH2-OH side of erythrulose. Less complexity. Good news.

The enol arising from the borate-controlled enolization of erythrulose to the right side has two nucleophilic centers. Therefore, it can react with formaldehyde in two different 4 + 1 = 5 reactions. The first nucleophilic center is at the end of the carbon chain (1C, in the figure). If the formaldehyde reacts there, a linear C5 species is formed, with its C=O unit in the middle of the five-carbon chain. Further, with its C=O in the middle of the chain, this species cannot cyclize to form a ring. Therefore, it will itself enolize and react further to give (as we will see in a moment) two pentuloses, ribulose and xylulose, which bind borate only slowly. More good news.

The second nucleophilic center is the next carbon in on the carbon chain. If the formaldehyde reacts there, two branched C5 species are formed. These differ in how their atoms are arranged in three dimensions, but never mind. Each has a C=O unit at the end of a chain. Therefore, they can each form cyclic species by having that C=O unit react as an electrophile with an -OH unit reacting as a nucleophile to form a five-membered ring. The ring species present adjacent OH groups that bind to boron tightly, in a complex that no longer has any C=O unit. Further, the branched carbohydrates do not have a C-H unit adjacent to their C=O units, even in the open form. Therefore, the branched carbohydrates cannot enolize. Last, in the laboratory, we discovered that if formaldehyde is present in excess over glycolaldehyde, a condition likely on early Earth, these branched species are formed in this borate-guided sequence of reactions in borate-stabilized form. More good news.

But what is the long-term fate of the borate-stabilized branched C5 carbohydrates? Remarkably, they can do only one thing: fragment. This fragmentation, known to chemists as a retroaldol reaction, breaks the C5 species into a C2 species (glycolaldehyde) and a C3 species, the enol of glyceraldehyde. These are, of course, the very two species that react in the presence of borate to form ribose and other pentoses.

A metabolism without life

This is a metabolic cycle, but without life. Work with me here to see the magic of minerals combined with organics. Think of an early Earth with an excess of HCHO being continuously generated in a primitive CO2 atmosphere. What happens if some glycolaldehyde (C2a) comes to Earth in small amounts via a comet? Well, it will enolize to give the enol C2e. This reacts as a nucleophile with HCHO (C1) acting as an electrophile to form glyceraldehyde (C3a) in a 2 + 1 = 3 reaction. This C3a product enolizes to form another enediol (C3e). The enolization may be a bit slow in the presence of borate, since borate binds to the 1,2-diol of glyceraldehyde. But since this 1,2-diol is in an open chain, not in a cyclic structure that presents two OH units directly to borate, borate binds weakly, and enolization proceeds. Now, HCHO adds to C3e to give erythrulose (C4k) in a 3 + 1 = 4 reaction.
Erythrulose cannot cyclize to give a five-membered ring (no OH group is correctly placed), so it enolizes under the direction of borate to give the 1,2-enediol (C4e), which fixes a third HCHO in 4 + 1 = 5 reactions to give either the linear or branched C5 species at the top (C5l) and top-right (C5b, various stereoisomers) by reaction at the less hindered or more hindered enediol carbon of C4e. The branched C5b species do not carry enolizable hydrogens. Therefore, they cannot react as nucleophiles. They can, however, form cyclic structures that bind borate and accumulate. These C5b species can also undergo a fragmentation to generate C2a and C3e (5 = 2 + 3). This fragmentation is the reverse of an addition reaction between C2a and C3e, both of which can proceed around the cycle again in the presence of HCHO, or collapse to form five-carbon sugars.

Finally, a reason to learn organic chemistry

We can push the dialectic farther, and consider some more potential bad news. The presence of glycolaldehyde on early Earth is conjectural. As noted above, glycolaldehyde is seen in interstellar gas clouds. But it has not been reported in meteorites, and its reactivity in the absence of borate is such that one does not expect to see much of it in meteorites. Meteorites do, however, contain glycerol (C3H8O3) in large amounts, the three-carbon species at the left of Figure 5.5. Glycerol lacks a C=O and is therefore quite stable. This means that glycerol was almost certainly present on early Earth. To allow this three-carbon species to support the cyclic prebiotic metabolism to fix HCHO, it must first be oxidized to create a species that has a C=O unit. This turns out to be easily done. One of the other compounds that arises when lightning shoots through a stormy atmosphere on early Earth is hydrogen peroxide, H2O2. Hydrogen peroxide, which may be found in your bathroom as a mouth rinse, is an oxidant. Divalent iron (that is, Fe++) helps in that oxidation.
In the presence of Fe++, hydrogen peroxide will oxidize glycerol to give dihydroxyacetone, the brown compound at the left of Figure 5.5. Dihydroxyacetone, unlike glycerol, has a C=O unit. Therefore, dihydroxyacetone enolizes to give C3e. Remarkably, the C3e enol of dihydroxyacetone is exactly the same as the C3e enol formed from glyceraldehyde. Therefore, C3e reacts with formaldehyde in exactly the same way, and is processed around the pathway to give C5 species.

Ribose is no longer precluded from the first genetic molecules on Earth

We set out to resolve the paradox that was at the frontier obstructing the development of the RNA-first hypothesis. We needed to address the paradox: Why did carbohydrates not form tar on early Earth? We can now say that if an early Earth had some dry land, some water, some igneous rocks, and some atmospheric CO2, it would be difficult not to generate C5 pentoses here, including ribose. C5 carbohydrates were the very molecules that Stanley Miller had asserted could not have accumulated on early Earth because of their intrinsic instability. Without ribose, a simple syllogism excluded the RNA-first model for the origin of life on Earth:

RNA cannot be formed on early Earth unless ribose was present prebiotically.
Ribose cannot have been present prebiotically on early Earth.
Therefore, RNA was not formed on early Earth.

Now, we can deny the second premise of the syllogism: Ribose could have been present prebiotically as its borate complex on early Earth. Therefore, we cannot exclude the RNA-first model for the origin of life on Earth in the way that Miller did. To exclude the RNA-first models, we must find problems in subsequent steps in the prebiotic assembly of RNA, those that occur after ribose-borate.

The next problem: Water is corrosive

Of course, just as Galileo's rolling balls did not establish that the Earth did revolve around the Sun, these experiments do not show that life on Earth originated in the form of RNA. Unfortunately (or fortunately, if one wishes to make a career in this field), huge steps separate ribose-borate, nucleobases, and phosphate from RNA itself. Some of these smack of paradox as strong as the tar problem.


Figure 5.6. The red bonds in RNA are each unstable in water. Each of these bonds represents a problem for prebiotic synthesis of RNA in water, even after the building blocks are in hand, since the synthesis of these bonds requires the loss of water. Further, even if the RNA could be made, the red bonds would break in water. In modern life, damage done by water to RNA and DNA is repaired. Such repair systems were presumably not present prebiotically. Another paradox.

We have already mentioned one issue discussed by both Cairns-Smith and Shapiro: The water problem. In water, adenine, guanine, and cytosine all eventually lose their NH2 units, the phosphate backbone of RNA hydrolyzes, and the nucleobases will fall off of ribose. Did life originate in a solvent other than water? Here, the scientific method requires some radical thoughts, which do not lend themselves easily to funding. Fortunately, the Templeton Foundation was interested in radical thoughts and, in the spring of 2005, became interested in water. They provided us with the bucks to seek solvents other than water to support an RNA-first model for the origin of life. The alternative must be a polar molecule able to dissolve RNA and available prebiotically. There are not many choices. One choice is the solvent formamide (HCONH2). At modern terran atmospheric pressure formamide is a liquid over a wider range than water (-20 to 220 °C, -4 to 428 °F, which is 253 to 493 Kelvin). Further, formamide is very polar, dissolving most things that water dissolves, including RNA and common salts.
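The unit conversions for the Celsius range quoted above can be cross-checked with the standard formulas. The short sketch below is only arithmetic on the two Celsius endpoints given in the text; the function names `c_to_f` and `c_to_k` are chosen for this example, and the code says nothing about whether the endpoints themselves are right.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def c_to_k(celsius):
    """Convert degrees Celsius to kelvin."""
    return celsius + 273.15


# The two endpoints of the liquid range quoted in the text, in °C.
for c in (-20, 220):
    print(f"{c} °C = {c_to_f(c):.0f} °F = {c_to_k(c):.0f} K")
# prints:
# -20 °C = -4 °F = 253 K
# 220 °C = 428 °F = 493 K
```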


Further, formamide does not dissolve many things that water does not dissolve, including oils like octane. Membranes form in formamide from oily things. The best news is that formamide does not corrode RNA. On the contrary, it allows the "red bonds" in RNA to form. This was shown nearly 30 years ago by Allen Schoffstall, who found that mixing phosphate with nucleosides in formamide gives nucleoside phosphates. Daniel Hutter, working in my laboratory, did further work. He established a temperature-humidity correlation that defined the conditions over which nucleoside phosphates could be assembled from nucleosides and inorganic phosphate at temperatures above 65 °C. Interestingly, formamide is also a precursor of nucleobases such as adenine, something known for 30 years as well. More recently, Saladino, Di Mauro, and their colleagues have been developing the possibility that formamide might have been the source of prebiotic adenine.

But did any of this actually happen historically? The community is again divided, especially with respect to our view that formamide could have been present in substantial amounts on early Earth. Many in the community view this as being crackpot. We return to applying the liberal scientific method when considering crazy hypotheses. What in our common knowledge must be wrong should formamide have been the solvent where the first libraries of RNA assembled on Earth? Formamide has been detected in interstellar nebulas. Further, formamide is spontaneously formed from water and hydrogen cyanide. With excess water, formamide spontaneously hydrolyzes to give the salt, ammonium formate. However, at temperatures about 65 °C, formamide-water mixtures spontaneously dry to give formamide having very little water. On the other hand, present models for early Earth suggest that the ratio of water to hydrogen cyanide was substantially larger than 1:1.
Therefore, direct delivery of (for example) cometary HCN would be associated with the delivery of excess water, which would be (again, according to current models) supplemented by water coming from the Earth itself. Thus, if formamide were to have been the prebiotically relevant solvent on Earth, Earth could not have been flooded. Earth must have had dry areas (like Death Valley), not only for the borate mineralogy to constrain the intrinsic propensity of organic materials, but also to allow formamide to be sufficiently dry so as to solve the water problem.

And that is the way it is

Walter Cronkite, a newscaster from my childhood, used to end his nightly broadcast with this signature sign-off. We repeat it here, even as we note that we have failed by either the backwards-in-time approach of Chapter 4 or the forwards-in-time approach here to get to the simplest, essential, form of life. That life form has been constrained to reside in a box much smaller than the box that held it when we began. We have developed scientific methods that offer hope for future progress. And we still have two more wedges in Figure 3.1 to help us constrain that box further. Onwards.


Chapter 6

Exploration to Expand Our View of Life

In the past two chapters, we have gone as far as we could to explore life as a universal based on what we know about the present. In Chapter 4, we described an approach, based on natural history, that began with living species that we know today and worked backwards in time. This approach achieved some success, in that a community of scientists has emerged that is now doing normal science with the methods that have been developed. In Chapter 5, we described an approach, based on chemistry, that begins with universal organic species and works forwards in time. Even though the communities involved do not share standards-of-proof or criteria for evaluating relevance, progress of a sort can be claimed. Paradoxes have been identified that are central to origins. Once focused on those paradoxes, exotic solvents and less exotic minerals have allowed progress to be made towards partial solutions under an "RNA first" hypothesis.

Unfortunately, although these approaches have placed new constraints on our view of life as a universal, these constraints are not very tight. Because of the realities of natural history, it is difficult to triangulate from today's biosphere back in time to a form of life that is simple enough to capture life's essentials. Known life on Earth evidently all diverged from a rather advanced common ancestor, an ancestor that already carried baggage from perhaps a billion years of historical accidents and contingency in the history of life on Earth. Even if we could complete a model for this last common ancestor, its baggage would obscure the "essence" of truly primitive life. Further, even though the forwards-from-chemistry approach offers some proposals for natural ways to constrain the intrinsic propensity of organic molecules to form tar and fall apart in water, we still do not have a convincing narrative to get from prebiotic organic molecules to RNA.
Further, even if we find ways to get pools of RNA on early Earth, we still cannot estimate the likelihood that those pools would have contained some RNA molecules that would ignite Darwinian evolution. As a consequence, the culture still lacks constructive belief in the possibility of the RNA-first hypothesis for the origin of life. Therefore, two essential ingredients for success in science (funding and enthusiasm) are not in hand.

We need some new ideas

This is no reason for despondency. Every science worthy of the name has had similar issues at some point in its history. Nevertheless, one thought comes to mind as one reads Chapters 4 and 5: It would be really nice to have some new ideas. Previous chapters have already provided examples (if examples are needed) of the value of new ideas, even as we acknowledge our human propensity to reject these. The idea of resurrecting ancient proteins brought experimental methods to bear on historical hypotheses, something that many had thought was impossible. The idea that borate minerals might stabilize ribose as it is synthesized under prebiotic conditions resurrected serious thinking about the RNA-first hypothesis. The idea of formamide as a solvent to manage the intrinsic destructive power of water mitigates some of the problems associated with water.

But the prescription ("Get a new idea") is more easily written than filled. The human brain does not easily create new ideas. This is undoubtedly a consequence of our Darwinian history. Too much imagination is dangerous. New ideas cause us to try things not known to be safe and effective. Darwin Awards are as likely as Nobel Prizes. For this reason, new ideas often come via discovery. Discovery is the constructive observation of something that has not been constructively observed before. Discoveries can, of course, come by accident, after something from the outside intrudes into our daily routine. For example, if alien life pops from the pavement and starts using ray guns to deprive us of our Darwinian capabilities (as in War of the Worlds), or if Jodie Foster starts hearing prime numbers broadcast from Vega on a radiotelescope (as she did when playing Eleanor Arroway in the movie Contact), one positive from the experience would be the discovery (at last) of an answer to the question: Does intelligent alien life exist?

Travel as a recipe for discovery

Outside intrusions are rare, however, especially of the alien kind. This makes the pace of discovery arising from them correspondingly slow. Suppose we wanted to increase that pace, increasing the number of conceptual jolts that drive our science. How would we do it? Nothing creates jolts like the act of leaving home. Indeed, leaving home created much of today's view of life on Earth. Just as Galileo was developing the telescope, Europeans were beginning a historic episode of travel and exploration. Travel to Africa, Australia, and the Americas dramatically increased the number of living species known to those who did biological classification. This delivered a huge conceptual jolt to biology.
In the medieval barnyard with just a few species of cloven-hoofed animals (oxen, sheep, and pigs), classification schemes were hardly necessary. They became necessary only when the number of cloven-hoofed species increased, now to over 200. It is therefore no accident that the "animal-vegetable-mineral" scheme of Linnaeus came after Columbus. Exploration also provided the jolts that drove Charles Darwin to develop his theory of evolution. Nearly every middle school student knows of Darwin's voyage in the sailing ship Beagle to the Galapagos Islands off the west coast of South America. There, isolated micro-environments made perspicuous the gradual changes in physiology that characterized his mechanism for evolution. This created Darwin's "Aha!": Natural selection superimposed upon small, un-directed variation drives the emergence of new species.

The author in the Galapagos Islands behind a swimming iguana, not known elsewhere. This discovery expanded the concept of "iguana-ness" on the Beagle, and in the crew in the movie Master and Commander.

Discoveries on Earth made during this period also influenced classical views of natural history. For example, Noah's ark was recorded in the Hebrew Bible as being 450 feet (about 137 meters) long, 75 feet wide, and 45 feet tall. This was more than enough to hold seven pairs of every clean animal species (those with cloven hooves that chew their cud) and one pair of every unclean animal species known to the Hebrew fathers, with space left over for Noah and his family. However, as exploration discovered new ruminants by the dozen and new unclean species by the tens of thousands, and added still more fossil species (which, according to certain interpretations of Noah's story, were killed by the big flood), it became clear to Enlightenment naturalists that Noah's ark was not big enough to have held them all. This gave rise to what, in the 17th century, might have been (but was not) called "creation science". People attempted to apply the scientific method (as it was then emerging) to the problem: How did all of these animals fit on the ark? Where did Noah keep the fodder for 40 days and 40 nights? How did Noah prevent the carnivores from eating the herbivores? And what did they do with the poop? Exploration of the Earth coupled to scientific method and simple mathematics drove the conclusion: Maybe there was a flood and maybe there was an ark, but it could not have been exactly as described in Genesis 6.

The exploration of the terran biosphere is not at an end. As we shall discuss in Chapter 8, Earth may hold an entirely new form of life awaiting discovery if we only knew how to look for it in the right way. Further, NASA, the European Space Agency (ESA), and others are opening extraterran environments to exploration. The rocket ship provides humankind the opportunity to use a radically new type of exploration to understand the nature of life in the 21st century, just as sailing ships and navigation tools did in the 16th and 17th centuries.
Community consensus on the search for alien life

We have emphasized in previous chapters how the constructive beliefs of a scientific community determine which research is done and how it is done. Even with their common focus on a "single" theme (life as a universal), Chapters 4 and 5 have presented very different kinds of disagreements on scientific method and standards-of-proof. In this chapter, we will encounter several more. Here, the community lacks standards that would lead to an agreement over what observations would force the conclusion that we have discovered alien life, making exobiology no longer a science without a subject matter.

While Gallup does not seem to have done the survey, there is little doubt that a poll of biologists generally would find that most agree that the discovery of alien life would revolutionize our understanding of life as a universal. Indeed, I doubt that anything could jolt our view of biology more than dissecting an alien who does not share common ancestry with life on Earth. This is undoubtedly why aliens in the movies are either trying to shoot biologists or trying to hide from biologists. Being dissected is painful.

This consensus, as well as the fascination of the public with the possibility of alien life, drives NASA, ESA, and others to launch missions to other planets. These missions carry instruments designed, if not to detect life itself, at least to detect habitable environments, those that could hold life.

But here constructive community belief loses consensus. Many believe it unlikely that we will find alien life on any location that NASA can soon access. Further, the community has no accepted scientific method to tell us how to look for life. Unless the aliens are intelligent enough to talk to us, shoot at us, or plaster themselves to our backs to make puppets of us, the community has no standard to judge whether or not alien life has been or is being observed. Thus, the problem that exobiology must address in this chapter comes not only because it is a science without a subject matter. More to the point, it is a field without a method saying how to get a subject matter.

Mars, Martian meteorites and the scientific method

Nothing illustrates this better than an actual example where alien life has been sought. And no example is more relevant here than the search for life on Mars.

[Figure: A map of Mars drawn in 1877 by Giovanni Schiaparelli showing "canals" on Mars. What would you think if you saw this about the potential for Martian life?]

Arguably, the search for life on Mars began in 1877 when Giovanni Schiaparelli, an astronomer at the Milan Observatory in Italy, announced his telescopic observation of "canali" on Mars. Although the best English translation of "canali" might have been "channels", the word was translated as "canals." With the opening of the Suez Canal just eight years earlier and the Panama Canal not far in the future, the implication-by-analogy was clear: Schiaparelli had found evidence of intelligent life on Mars that included engineers.

Thoughts about life on Mars then became interconnected with the work of Percival Lowell. Lowell, an astronomer "of means" (again, funding is important in science), understood that telescopes were better placed on dark mountains far from large cities, away from light pollution. He therefore built his telescope in Flagstaff, Arizona, and turned it towards Mars. There, he reported an intricate system of Martian canals, changing colors consistent with vegetation appearing and disappearing during Martian seasons, and other indications of Martian life.

Today, we ascribe all of Lowell's observations to a combination of optical illusion, the intrinsic propensity of the human eye to connect dots, and the intrinsic propensity of humans to see what they want to see. There are no canals on Mars.
This was shown clearly by the Mariner mission, which flew past Mars in the 1960s without landing. Observations by Mariner also suggested to the community a stronger inference: There could not possibly be canals holding liquid water on the surface of Mars.

How is it possible for a mission that did not land on Mars to speak authoritatively about the presence of something like water on the Martian surface? Here, the scientific method applied knowledge about the properties of water on Earth and the assumption that those properties are universal (in the correct sense of the term). Mariner provides an excellent teaching example of this combination.

First, experimentalists on Earth observe that water boils at temperatures lower than 100 °C (212 °F, 373 Kelvin) when the atmospheric pressure is lower than that at terran sea level. If your kitchen is in Miami, the pressure is one atmosphere (in international scientific units, this is ~100,000 pascals). At that pressure, water boils at 100 °C (212 °F, 373 Kelvin). If your kitchen is in Denver, however, one mile above sea level, the pressure is lower (about 84,000 pascals) and water boils at 95 °C (203 °F, 368 Kelvin). At the top of Mt. Everest, the pressure is still lower (only 26,000 pascals) and water boils at only 69 °C (156 °F, 342 Kelvin).

In fact, if the pressure is low enough, ice "boils" before it melts in a process called sublimation. Below the triple point of water (611 pascals), water cannot exist as a liquid. Water ice moves directly to water vapor without going through an intermediate liquid state. In your kitchen, sublimation of ice is used in self-defrosting freezers. If the freezer pulls a little vacuum after you close the freezer door, whatever ice formed from humidity outside will vaporize and the water vapor will be pumped away, without there being any liquid water in between.
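The dependence of boiling point on pressure described above can be sketched numerically. The following is a minimal illustration, not the calculation used by the Mariner team: it assumes the integrated Clausius-Clapeyron relation with a constant enthalpy of vaporization (an approximation that drifts at pressures far below one atmosphere), and the function names `boiling_point_kelvin` and `liquid_water_possible` are hypothetical names chosen for this example.

```python
import math

R = 8.314            # gas constant, J/(mol*K)
DH_VAP = 40_660.0    # enthalpy of vaporization of water near 100 C, J/mol (assumed constant)
P_REF = 101_325.0    # reference pressure, Pa (one standard atmosphere)
T_REF = 373.15       # boiling point of water at P_REF, K
P_TRIPLE = 611.0     # triple-point pressure of water as quoted in the text, Pa


def boiling_point_kelvin(pressure_pa: float) -> float:
    """Estimate the boiling point of water at the given pressure.

    Integrated Clausius-Clapeyron relation with constant enthalpy of
    vaporization; only a rough estimate far from one atmosphere.
    """
    if pressure_pa <= 0:
        raise ValueError("pressure must be positive")
    inv_t = 1.0 / T_REF - R * math.log(pressure_pa / P_REF) / DH_VAP
    return 1.0 / inv_t


def liquid_water_possible(pressure_pa: float) -> bool:
    """Pure liquid water requires a pressure at or above the triple point."""
    return pressure_pa >= P_TRIPLE


if __name__ == "__main__":
    for label, p in [("sea level", 101_325.0), ("Denver", 84_000.0)]:
        t_c = boiling_point_kelvin(p) - 273.15
        print(f"{label}: {p:.0f} Pa, water boils near {t_c:.0f} C")
    print("liquid water possible at 600 Pa:", liquid_water_possible(600.0))
```

With the Denver pressure quoted in the text, the estimate lands within about a degree of the 95 °C given above. Any pressure below 611 pascals fails the triple-point check, which is the step that matters for the argument.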


The Mariner flyby measured the atmospheric pressure at the Martian surface to be only 600 pascals, about 0.6% that of Earth. This pressure is much lower than the pressure at the top of Everest, and just below the pressure at the triple point of water. This implies that if you put a pan of pure liquid water on the surface of Mars, it will start to boil. The vaporizing water will carry away heat, lowering the temperature of the water that remains until the water freezes into ice. Then, the ice will sublime, lowering the temperature further until an equilibrium is achieved between cold ice and the water vapor in the air above the ice.

The implication of this to the community was that pure liquid water cannot exist on the surface of Mars. Not in canals (if there were canals). Not in lakes. Not anywhere. As Mariner's observations were interpreted, water is either ice or vapor at the surface of Mars.

The Viking 1976 exploration of Mars

Just as the image of moons circling Jupiter was more persuasive to laypeople than the mathematical elegance of Newton's mechanics, the cultural tradition that placed canals on Mars, amplified by H. G. Wells's War of the Worlds, Ray Bradbury's Martian Chronicles, and countless Hollywood movies, undoubtedly influenced biologists who designed the mission to Mars that followed Mariner. That mission was called Viking. Unlike Mariner, Viking did land on the surface of Mars at two locations in 1976; two landers were sent with identical payloads. And the Viking payloads carried instruments designed to detect life on Mars.

Those who designed the Viking life-detection instruments were fully aware of the data from the Mariner flyby that drove the conclusion that liquid water was not possible on the Martian surface. And yet, anthropologists studying the scientists as they designed the Viking payload would be forced to infer that this conclusion was not entirely constructive.
As evidence, many of the experiments were designed to detect life on Mars that lived in liquid water, the very substance that was supposed to be impossible on the Martian surface. The experiments were pressurized to prevent the water from boiling away as they were run on Mars.

Let us look at the four life-detection experiments originally planned for the Viking lander. One dipped some Martian soil into a large amount of water containing nutrients. If life were present in the soil sample, then the solution should become turbid.

A second added Martian soil to an aqueous solution of seven organic compounds that contained carbon-14, a radioactive isotope of carbon. These included lactic acid, the very organic compound that was isolated by Scheele 200 years earlier. Life in the Martian soil was expected to oxidize these compounds to give off radioactive carbon dioxide.

A third tool also placed a sample of Martian soil into a nutrient broth, but with a different detection scheme. If life were present and capable of doing photosynthesis, molecular oxygen should evolve, and this could be detected upon exposure of the mixture to Martian light.

A fourth tool presented a mixture of radioactive carbon dioxide and carbon monoxide to the dry Martian surface and sunlight (no water). Any life on the surface might be able to convert these gasses, both present naturally in the Martian atmosphere, into organic compounds that were characteristic of (terran) life.

[Figure: structures of formic acid, glycine, glycolic acid, D-lactic acid, L-lactic acid, L-alanine, and D-alanine. These compounds, labeled with radioactive carbon-14, were fed to the Martian soil in water under pressure. D- and L-lactic acid differ only in the arrangement of their atoms in 3-D; the same distinguishes D- and L-alanine, an amino acid. Life on Earth would oxidize these to generate radioactive carbon dioxide. The Viking landers looked for the exhalation of this radioactive carbon dioxide.]

The Viking designers thought that metabolism was key to life

The Viking scientists did not clearly state what their definition-theory of life was. But as anthropologists, we can infer their constructive definition-theory by observing the tests that they built to detect life. Evidently, the Viking designers placed "metabolism" high in their list of criteria for life. Why? Because all of their life-detection tests essentially tested for metabolism.

The first experiment is a classical test for bacteria on Earth, one that you may have inadvertently run in your kitchen. If you leave a glass of clear fruit juice out on your counter for a week, it will become turbid. Why? Because microorganisms drop from the air into the juice and grow, metabolizing the nutrients in the juice to create bacterial offspring. These offspring make the juice cloudy.

The radiolabel release experiment also reflects metabolism known in terran life. If you swallowed the seven radioactive organic molecules in the second life-detection tool and then exhaled into the Viking detector, that detector would conclude that you are alive. You would breathe in oxygen, your metabolism would use that oxygen to oxidize the radioactive organic compounds to form radioactive carbon dioxide, and the detector would detect the radioactive carbon dioxide that you exhaled.

The third and fourth tests for life were reminiscent of terran photosynthesis. On Earth, if you sprinkle water and nutrients on soil and expose that soil to sunlight, photosynthetic organisms in the soil will emit oxygen. If you expose a terran plant to radioactive carbon dioxide and give it sunlight, the plant will fix the CO2 into organic compounds in the plant, which will become radioactive.

Eventually, only the last three experiments flew. Why? First, the spacecraft was found to have room for only three experiments.
In choosing which to send among the four that had been designed, the scarcity of liquid water on the Martian surface became a constructive consideration. The first three experiments all involved the interaction of liquid water with the Martian surface, so all of them were problematic. The first experiment, however, required more liquid water than the second or third. Therefore, the first test lost out in a Darwinian-analogous fight for space resources.

The Viking life-detection tests all gave positive results

So what happened on the surface of Mars when the three life-detection experiments were landed in two locations by Viking? Without dwelling on the details, all of the life-detection experiments gave positive results.

Radiolabeled carbon dioxide was released when the seven radiolabeled organic compounds were added in water to pressurized Martian soil. This result was analogous to the result that we would obtain if you personally ate these organic compounds. Release was not observed after the Martian soil was heated, as expected if the heat had killed something living in the soil. Likewise, oxygen was released when water was added to Martian soil, just as if the soil contained photosynthetic organisms. Radioactive carbon was fixed on the Martian soil when exposed to light and radiolabeled carbon mono- and dioxide, also consistent with the presence of photosynthetic organisms.

The results could have been more definitive. It would have been nice if more radioactive carbon had been fixed from the atmosphere, not the small amounts that were observed. The pattern of oxygen release was a bit perplexing upon heating and cooling. But Gilbert Levin, who designed the label release instrument, still holds that his test detected life on Mars. And why not? The results were positive.

Nevertheless scientists concluded that Mars had no life

As good scientists, what should we conclude from the Viking tests for life on the Martian surface?
Here is one argument: If a system holds life, then it transforms carbon-containing molecules. The surface of Mars transforms carbon-containing molecules. Therefore, the surface of Mars is a system that holds life.


This is, of course, not a valid syllogism. Those trained in logic see this immediately. Those not trained in logic can see this by drawing a Venn diagram. Here, the blue circle, which represents all living systems, sits inside the red circle, which represents all systems that transform organic compounds. Non-living processes exist that interconvert carbon-containing molecules. These might be present on the Martian surface. And these might account for the Viking observations of "metabolism".

But why did the scientists set aside the very results that they had (by design) intended to interpret as positive signs for life? As soon as positive signs for life came in from Mars, why did the community instead seek non-biological mechanisms to explain the fact that carbon-containing molecules were metabolized by the Martian soil?

Briefly, the positive indicators of life were set aside because of results from another pair of instruments delivered to Mars. The first was a gas chromatograph (GC for short). The second was a mass spectrometer (MS).

Here is how the GC-MS instruments worked. First, a scoop of Martian soil was delivered to a cup. The cup was sealed and then heated. Vapors that emerged from the heated Martian soil were whisked away by a stream of hydrogen gas into a column. In this column, different compounds moved at different rates, causing their separation. This was the GC part of the instrument. Then, the separated compounds were injected into the MS. There they were converted to ions, which were accelerated and then deflected in a magnetic field. The presence and the molecular masses of the ions were thereby determined.

Mass spectrometry can detect a very small number of ions. Indeed, MS is so sensitive that some had suggested that the GC-MS experiment would produce positive results irrelevant to detecting life on Mars. As noted in Chapter 5, meteorites containing organic material continuously fall to Earth, and therefore to Mars.
Thus, many expected that the GC-MS experiment would simply detect the meteoritic organics, which we already knew from studies on Earth. These were not interesting when considering Martian life.

Viking scientists valued carbon above metabolism for life

Exploration once again generated a conceptual jolt. When it was finally run, the GC-MS instruments did not detect organic molecules from meteorites. They did not detect organic molecules from Martian life. In fact, the GC-MS detected no organic molecules at all, other than what the instrument had brought with it from Earth. The fact that organics brought from Earth were detected ruled out the Quine-Duhem auxiliary hypothesis that the instruments were not working. They were working, but saw nothing Martian by way of a carbon-containing organic molecule.

[Figure: Venn diagram. Inner circle: all living systems. Outer circle: all systems that transform organic compounds. A Venn diagram illustrates the propositions that while all living systems transform organic compounds, not all systems that transform organic compounds are living. Thus, the detection of a living system guarantees detection of a transforming system, but detecting a transforming system does not guarantee detection of a living system.]
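The logical point behind the Venn diagram can also be checked mechanically. The sketch below is an illustration added here, not something from the Viking literature: it enumerates every truth assignment for two propositions about a single system (L, "the system holds life", and T, "the system transforms carbon-containing molecules") and asks whether any assignment makes the premises true and the conclusion false. The helper names `implies` and `is_valid` are hypothetical.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication: 'if a then b'."""
    return (not a) or b


def is_valid(premises, conclusion, n_vars: int) -> bool:
    """An argument form is valid when no truth assignment makes
    every premise true while the conclusion is false."""
    for values in product([False, True], repeat=n_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False
    return True


# The argument drawn from the Viking results:
# "If life, then transformation. Transformation. Therefore, life."
viking_argument = is_valid(
    premises=[lambda L, T: implies(L, T), lambda L, T: T],
    conclusion=lambda L, T: L,
    n_vars=2,
)

# For contrast, the valid direction:
# "If life, then transformation. Life. Therefore, transformation."
forward_argument = is_valid(
    premises=[lambda L, T: implies(L, T), lambda L, T: L],
    conclusion=lambda L, T: T,
    n_vars=2,
)

print("Viking argument valid:", viking_argument)
print("Forward argument valid:", forward_argument)
```

The counterexample the search finds for the first form is the assignment in which T is true and L is false, which is exactly the region of the outer circle that lies outside the inner one.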

The Viking GC-MS instrument was tested in flight, before it arrived on Mars. In these tests, the instrument detected benzene and a silicon-containing compound, both from Earth. These were also detected once the instrument arrived on Mars. This established that the instrument was working on Mars, ruling out one auxiliary hypothesis (the instrument was broken) that might have explained the failure of the GC-MS to detect organic molecules on Mars.


This result drove the community to conclude that the Martian surface contained no life at all, even though all three experiments that had been designed on a definition-theory of life focusing on metabolism had produced results that were positive for life.

As anthropologists of scientists, we can say: "Aha! We have made a discovery". The fact that the community so easily abandoned metabolism as a criterion for life in the presence of a separate failure to observe organic species by GC-MS says something about the definition-theory of life that was actually constructively held by Viking scientists. That definition-theory placed a higher value on organic composition than on metabolism. Evidently, the Viking scientists subscribed to the argument:

All life requires organic molecules containing carbon.
No organic molecules containing carbon exist on the Martian surface.
Therefore, there is no life on the surface of Mars.

This argument drove the community to consider not-life ways to release CO2 from radioactive organics, release oxygen from a surface, and fix atmospheric carbon dioxide in the presence of Martian sunlight. Illustrating another feature of science as it is practiced, the instant that non-biological mechanisms were constructively believed to be needed to explain away the positive life-detection results, such mechanisms were found. Still worse, those mechanisms were immediately thought to be obvious. This despite the fact that they were not thought of before they were constructively believed to be necessary.

Non-life hypotheses to explain Martian metabolism without life focused on the thin atmosphere of Mars, the same thin atmosphere that made liquid water unlikely on the Martian surface. Unlike Earth, Mars does not have an ozone layer to filter out hard ultraviolet radiation. Thus, the Martian atmosphere allows harsh UV rays to come from the Sun to the surface. And this hard UV light can do chemistry.
For example, UV light can split water molecules. Splitting H2O gives a hydrogen atom (H•, one proton with one orbiting electron) and the hydroxyl radical (HO•). As discussed in Chapter 5, two hydrogen atoms can combine to give hydrogen gas (H:H). H2 is very light (it can float balloons). Thus, H2 should escape from the Martian atmosphere into outer space, leaving behind the HO• radical.

The HO• radical is itself a very powerful oxidant. In particular, HO• will (on contact) oxidize most organic molecules containing C-H units to give C-OH units, C=O units, and eventually carbon dioxide.

Alternatively, two HO• radicals can combine to give hydrogen peroxide (HO:OH, or H2O2), which you can find in your kitchen pantry under "sterilizing agents". Hydrogen peroxide is also an oxidant. For example, H2O2 oxidizes radiolabeled glycolic acid, lactic acid, formic acid, and alanine to give radioactive carbon dioxide. Just like you, a living respiring organism. A trace of iron helps.

[Figure: ultraviolet light splits two water molecules into two hydroxyl radicals and two hydrogen atoms; the hydrogen atoms recombine into dihydrogen, which escapes into space, while hydrogen peroxide remains in the soil and can give oxygen gas. Route proposed to generate hydrogen peroxide (H2O2) from two water molecules present as vapor on Mars, to explain the evolution of dioxygen (O2) from the Martian soil in the Viking experiment and the absence of organic molecules expected from meteoritic infall.]

In 1976, before television cameras and with a sinking feeling recorded in contemporary reports, Viking scientists realized that they might be able to get this positive sign of life (exhalation of radioactive carbon dioxide) without life. In fact, a sterile Martian soil would "exhale" carbon dioxide if it contained nothing more than the H2O2 formed via action of UV light on water vapor above the Martian surface.

The Viking scientists then generated ways to explain (as the products of not-life) the other positive life signs delivered by the other life-detection experiments, once the GC-MS data made them necessary according to the definition-theory of life that Viking scientists actually constructively held. Hydrogen peroxide in the presence of water can give water and molecular oxygen, which bubbles away. This was invoked to explain the release of oxygen from the surface of Mars that might otherwise be explained as the consequence of Martian photosynthesis.

Even carbon fixation could be explained without invoking the presence of life, once an explanation was constructively sought. Adding energy from UV light to carbon dioxide and (especially) carbon monoxide generates tar. This tar would stick to the Martian soil. Here is that word "tar" again; we thoroughly explored the concept of "tar" in Chapter 5.

All of these rationales had been known as the Viking life-detection experiments were being designed. The oxidation of the organic species by hydrogen peroxide had been discovered by Henry J. H. Fenton in the 1890s. Norman Horowitz, a biologist at the California Institute of Technology who helped design the carbon fixation experiment, had published a paper just a few years earlier showing that non-biological processes could fix carbon in the presence of ultraviolet light. The community certainly knew that the thinness and composition of the Martian atmosphere (which contains no ozone to shield against ultraviolet light) would allow UV light to come to the Martian surface.

But although all of this was "known" in the sense that libraries contained papers and books discussing it, it was not constructively known by those who designed the Viking life-detection tests. This is an excellent example of how the act of exploration forced the scientists to look at old data in new ways. A jolt.

But what about the missing meteoritic organic molecules?
The ones that fall daily to the Martian surface from outer space, but were nevertheless not detected by the Viking GC-MS? According to the explanation that explained the Viking observations without Martian biology, these organic molecules were not detected by the Viking GC-MS because they had been oxidized to carbon dioxide by the HO• radical in the presence of UV light. And now for a "Quine-Duhem moment".

Be careful when you claim disproof

A tidy explanation? It seemed so. Indeed, the explanation was so persuasively tidy that it drove the culture of the community involved in Mars exploration for 20 years. Most planning for missions to Mars after Viking focused on learning more about the "powerful oxidant" that rendered the Martian surface "self-sterilizing". Martian rovers sent in the early 21st century did not search for life, but rather focused on the geology of the surface. Just as Lowell's telescope had biased scientists in the 1970s towards the presence of life on Mars, the oxidant theory biased the community away from a Martian biology in the 1980s.

But there was another problem. Organics from meteorites fall to Mars at a certain rate. According to the model, organics on Mars are also oxidized at a certain rate. Input minus output equals accumulation (an aphorism from chemical engineering). Thus, if the amount of organics on the Martian surface is negligible, none have accumulated. The destruction of organics must be as fast as their arrival. This, in turn, drives the conclusion not that the Martian surface is oxidizing, but that it is very oxidizing. Too oxidizing to be likely to support any kind of organic life, given our constructively held definition-theory of life that includes organic molecules as a key constituent. Right? Careful.
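The mass-balance step in that argument ("input minus output equals accumulation") can be made concrete with a toy model. The sketch below assumes, purely for illustration, a constant infall rate F and first-order destruction with rate constant k, so that dM/dt = F - k*M; the functional form, the function names, and every number are hypothetical and are not taken from any Mars data.

```python
def steady_state_inventory(infall_rate: float, k_destruction: float) -> float:
    """Inventory M* at which input equals output: F - k*M* = 0."""
    if k_destruction <= 0:
        raise ValueError("destruction rate constant must be positive")
    return infall_rate / k_destruction


def minimum_rate_constant(infall_rate: float, max_inventory: float) -> float:
    """Smallest k that keeps the steady-state inventory at or below
    max_inventory (for example, an instrument's detection limit)."""
    if max_inventory <= 0:
        raise ValueError("max_inventory must be positive")
    return infall_rate / max_inventory


def simulate_inventory(infall_rate: float, k_destruction: float,
                       years: float, steps_per_year: int = 100) -> float:
    """Integrate dM/dt = F - k*M from M = 0 using simple Euler steps."""
    dt = 1.0 / steps_per_year
    m = 0.0
    for _ in range(int(years * steps_per_year)):
        m += (infall_rate - k_destruction * m) * dt
    return m


if __name__ == "__main__":
    F = 2.0  # hypothetical infall, arbitrary mass units per year
    for k in (0.5, 5.0, 50.0):  # hypothetical destruction rate constants, per year
        print(f"k = {k}: steady-state inventory = {steady_state_inventory(F, k)}")
```

In this toy model, a smaller observed inventory forces a larger destruction rate constant, which is the inference the community drew. The model says nothing about whether the inventory was actually small, only about what follows if it was.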
Scientific method and precision in language do not require the conclusion that no organic compounds are on the surface of Mars, but rather simply that the Viking GC-MS did not detect organic molecules on Mars. The statement that Viking did not detect any organic molecules on the surface of Mars is different from the statement that organic compounds do not exist on the surface of Mars. In other words, we are having a "Quine-not-Popper moment". The syllogism is solid:

Life requires the presence of organic compounds.
No organic compounds are present on the Martian surface.
Therefore, no life is present on the Martian surface.

The Viking GC-MS might seem to have provided a disproof of the presence of life on Mars. But there are always auxiliary hypotheses. And, as Quine explained, a negative experimental result can never be said to overturn a specific hypothesis, about emeralds or Martian organics. The negative outcome can always be the consequence of a failed auxiliary hypothesis. An observation that appears to disprove the hypothesis of interest may in fact simply disprove an auxiliary hypothesis required by the larger logical framework.
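The role of an auxiliary hypothesis can be shown with a small truth-table check. This is a sketch under stated assumptions: H stands for "organic compounds are present on the surface", A for the auxiliary hypothesis "the instrument would detect any organics that are present", and D for "the instrument detects organics". These propositional labels and the helper `is_valid` are illustrative choices, not Quine's or Duhem's own formalism.

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication: 'if a then b'."""
    return (not a) or b


def is_valid(premises, conclusion, n_vars: int) -> bool:
    """Valid when no truth assignment makes all premises true
    and the conclusion false."""
    for values in product([False, True], repeat=n_vars):
        if all(p(*values) for p in premises) and not conclusion(*values):
            return False
    return True


# Shared premises: (H and A) implies D, and the observation "not D".
base = [
    lambda H, A, D: implies(H and A, D),
    lambda H, A, D: not D,
]

# Does the negative result refute H on its own?
refutes_hypothesis = is_valid(base, lambda H, A, D: not H, 3)

# It does refute the conjunction of hypothesis and auxiliary.
refutes_conjunction = is_valid(base, lambda H, A, D: not (H and A), 3)

# Only if the auxiliary is independently granted does "not H" follow.
refutes_given_auxiliary = is_valid(
    base + [lambda H, A, D: A], lambda H, A, D: not H, 3
)

print(refutes_hypothesis, refutes_conjunction, refutes_given_auxiliary)
```

The negative observation alone leaves open the assignment in which H is true and A is false. That is the logical room an auxiliary hypothesis occupies.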

The missing organic molecules on Mars

The goal of this book is not to deliver facts, conclusions, or even judgments concerning the Viking design. Rather, our goal is to develop an understanding about the scientific method as actually practiced while discussing an interesting problem: Alien life. And so it is important to see how this "good news-bad news" story about the possibility of Martian life evolved as discovery drove the development of standards-of-proof and forced knowledge to become constructive.

I myself became involved in this problem in 1997 after my laboratory returned from Switzerland to the United States. I had volunteered to serve on a NASA panel to evaluate proposals to fund research in exobiology. The panel met at the Lunar and Planetary Institute (LPI) in Houston for three days. We panelists worked all day and had dinner together. But the evenings were free, and we had access after hours to the library at LPI. There, I found a shelf that held the reports of the Viking mission to Mars, by then two decades old. I spent the evenings reading them. And I came to a horrible realization.

First, a word about the organic compounds that fall to Mars via meteorite. While these include many compounds, including some interesting for those thinking about the origin of life (see Chapter 5), the bulk of meteoritic organic material is best described as tar (there is that word again). That is, it is a complex mixture of many different organic species, none individually present in large amounts.

From Chapter 5, this is not surprising. Space has a reasonable amount of energy. Cosmic rays, charged particles, and UV light are all zipping around the cosmos, putting energy into any organic matter that is floating about. True, the density of radiation is not very high. But then again, organic matter wandering around in space is being bombarded for billions and billions of years.

During that time, carbon-hydrogen bonds are broken to create H•, which combines to give H:H (H2, that is), which evaporates into space. What is left behind is carbon, bound in molecules that have higher and higher molecular weight. Tar, that is. Meteoritic tar is essentially carbon on the way to graphite. Graphite is a hexagonal chicken wire network of carbon atoms, while fragments of graphite are called polycyclic aromatic hydrocarbons (PAHs).

[Figure: structures of benzene (C:H ratio 1:1), naphthalene (C:H ratio 5:4), anthracene (C:H ratio 7:5), and graphite (C:H ratio large). Polycyclic aromatic hydrocarbons (PAHs) are abundant in the organic species that fall to Mars on meteorites. They were expected to be so abundant that the GC-MS would be overwhelmed by their presence and be unable to see organics from Martian life.]
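The drift of tar toward graphite can be followed by tracking the carbon-to-hydrogen ratio as ring systems grow. The sketch below uses standard molecular formulas; coronene and circumcoronene, and the hexagonal-flake series C(6n^2)H(6n) they belong to, are examples added here for illustration and are not discussed in the text.

```python
from fractions import Fraction

# (carbon atoms, hydrogen atoms) from standard molecular formulas
HYDROCARBONS = {
    "benzene": (6, 6),          # C6H6
    "naphthalene": (10, 8),     # C10H8
    "anthracene": (14, 10),     # C14H10
    "coronene": (24, 12),       # C24H12 (illustrative addition)
    "circumcoronene": (54, 18), # C54H18 (illustrative addition)
}


def c_to_h_ratio(carbons: int, hydrogens: int) -> Fraction:
    """Exact carbon-to-hydrogen atom ratio."""
    return Fraction(carbons, hydrogens)


def hexagonal_flake(n: int) -> tuple[int, int]:
    """(C, H) counts for the n-th hexagonal graphite fragment in the
    series benzene (n=1), coronene (n=2), circumcoronene (n=3): C(6n^2)H(6n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 6 * n * n, 6 * n


if __name__ == "__main__":
    for name, (c, h) in HYDROCARBONS.items():
        print(f"{name}: C:H = {c_to_h_ratio(c, h)}")
    for n in (1, 2, 3, 10, 100):
        c, h = hexagonal_flake(n)
        print(f"flake n={n}: C{c}H{h}, C:H = {c_to_h_ratio(c, h)}")
```

In this series the ratio equals n, so it grows without bound as the fragment gets larger: hydrogen sits only on the edge, and the edge becomes a vanishing fraction of the sheet.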


So what happens when graphitic tar and PAHs are exposed to oxidants on the surface of Mars? Again, our scientific method assumes that the behavior of organic species is a universal; what happens to PAHs on Mars is assumed to be analogous to what happens to them in our laboratory on Earth. Unfortunately, the specific oxidants on Mars are not known. They may include the HO• hydroxyl radical, hydrogen peroxide, metal oxides, or even the perchlorate that was discovered by the 2008 Phoenix mission to Mars. Oxidation may be catalyzed by Martian minerals and facilitated by UV light. This uncertainty makes it difficult to know that experiments on Earth reproduce the conditions on the surface of Mars.

Fortunately, most PAHs react with most oxidants in analogous ways. In general, the process of oxidation burns the PAH down to a core ring, a circle of six carbon atoms (called a benzene ring), an arrangement that is especially stable. Each of the carbon atoms is attached to a COOH unit (called a carboxylic acid). The product is therefore called a benzenehexacarboxylic acid. This species is rather stable to further degradation.

[Figure: part of the extended hexagonal chicken-wire structure of graphite; oxidation converts it to benzenehexacarboxylic acid (mellitic acid), which reacts with aluminum oxide to give mellite.]

This process happens naturally on Earth. For example, when carbon from below the Earth's surface (coal, for example) comes to the surface and is exposed to our atmosphere, it encounters oxidants. Most abundant of these on Earth is, of course, molecular oxygen. But the oxidation on the surface of Earth is also mediated by the hydroxyl radical, oxidized minerals, and light. Coal is, like meteoritic tar, carbon on its way to graphite and contains many PAHs. On Earth's surface, coal is also converted to benzenehexacarboxylic acid.
O C O As the name implies, benzenehexacarboxylic acid is an acid. Therefore, O O it reacts with any base to form a salt. For example, if the base is aluminum mellite oxide, the aluminum salt is formed. Aluminum benzenehexacarboxylate is a mineral, called mellite (after the Greek word for honey; many specimens of mellite are honey brown in color). As many organic molecules are named after the source from which they were originally isolated, benzenehexacarboxylic acid is often called "mellitic acid". What did this suggest to an organic chemist who has a bit of constructive knowledge about mineralogy? I expected meteoritic organic compounds landing on Mars to be converted to mellite (or perhaps the analogous salt of benzenehexacarboxylic acid made with another ion). This gave rise to the horrible thought as I sat on the floor of the LPI library in Specimen of mellite (the the evening. Salts are not volatile. They would not have passed through aluminum salt of the Viking GC column in its stream of hydrogen to be detected in the Vibenzenehexacarboxylic king MS. In other words, the Viking lander could have been sitting on a pile of mellite from meteoritic organics and not observed any of it. Not acid) formed from the of PAHs because it had been toasted by a very oxidizing Martian soil, but because oxidation it was not volatile. (polycyclic aromatic The more I read the mission logs, the more it became clear that this chemistry and geology had not hydrocarbons) by the been known to those who designed the Viking experiments or interpreted their results. The GC-MS had radical from the heated the Martian soil sample, of course, creating the possibility that some ofhydroxyl any benzenehexacarboxyatmosphere Earth. late salt present could have decomposed to give a volatile organic compound that wouldonhave made it C


through the GC to get to the MS. Unfortunately, in addition to being rather stable against further oxidation, benzenehexacarboxylate salts are also rather stable against decomposition upon heating. Depending on its salt, benzenehexacarboxylates generate volatile products upon heating at 600 °C. But Viking had heated the Martian soil only to 500 °C. The literature contained a warning of the possibility that non-volatile organics would have been missed by the Viking GC-MS. The scientist in charge of the Viking GC-MS, a professor at the Massachusetts Institute of Technology named Klaus Biemann, knew how to use language precisely. His paper reported exactly correctly the result, even in its title. That title said that his instruments had not detected volatile organic molecules on Mars. Which they had not. But as reports emerged covering the reports of the reports of that paper, the critical word "volatile" was lost. The failure to observe volatile organic compounds on the surface of Mars was constructively absorbed by the community as the failure to observe any organic compounds on the surface of Mars. This interpretation drove the community to conclude that the oxidant must be very powerful; otherwise it could not burn completely the large amounts of meteoritic organics that arrived on the Martian surface daily. Kevin Devine and I went back to the laboratory and did some experiments. We found that 500 °C was just not hot enough to decompose the benzenehexacarboxylate expected to be present on the surface of Mars from meteoritic infall, assuming partial oxidation. The Viking GC-MS could have scooped a substantial amount of these salts and still have observed nothing. And while we do not know what bioorganic compounds could be present on the surface of Mars from indigenous life, those compounds too, if they were not completely oxidized to carbon dioxide, could be converted to salts that would not have been easily converted into volatile species in the Viking GC-MS. 
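The reasoning in this passage can be restated as a small sketch. The two temperatures are the ones quoted above; the function name and the simple yes/no rule are illustrative assumptions, not a model of the actual Viking instrument.

```python
# Temperatures quoted in the text, in degrees Celsius.
VIKING_OVEN_MAX_C = 500.0          # hottest the Viking GC-MS heated the soil
MELLITATE_DECOMPOSITION_C = 600.0  # where the salts give volatile products


def reaches_mass_spectrometer(is_volatile: bool,
                              decomposition_c: float,
                              oven_max_c: float) -> bool:
    """Hypothetical rule: a compound gets through the GC to the MS only if
    it is volatile itself, or if the oven is hot enough to break it down
    into volatile fragments."""
    return is_volatile or oven_max_c >= decomposition_c


# A non-volatile benzenehexacarboxylate salt in the Viking oven:
salt_seen = reaches_mass_spectrometer(
    is_volatile=False,
    decomposition_c=MELLITATE_DECOMPOSITION_C,
    oven_max_c=VIKING_OVEN_MAX_C,
)
print("Salt detected by Viking GC-MS:", salt_seen)
```

Under these assumptions the salt is never seen, which is the point of the passage: a null result for volatile organics says nothing about non-volatile ones.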
Of course, just as Galileo's rolling balls in Chapter 1 did not establish that the Earth revolved around the Sun, and the experiments with borate in Chapter 5 did not establish that life on Earth began as RNA, the fact that benzenehexacarboxylate salts on Mars could have been overlooked by the Viking GC-MS does not mean that they are present. All that these experiments showed was that a line in a syllogism need not be true. Organics might be on Mars, and missions need to be designed to detect them recognizing their likely structures.

Mars comes to us

As noted in Chapter 5, scientists are humans who suffer despondency when things do not go well. The Viking results created a despondency about the possibility of life on Mars that lasted for two decades. The Viking results persuaded most in the community that the surface of Mars would toast any organics that might indicate life, even life below the surface. Further, they persuaded the community that life was not possible at all on the top surface of Mars; we would need to dig on Mars to get below the "self-sterilizing" surface presumed to be there. But still worse, the community came to believe that it was not possible to design a definitive life-detection experiment that could fit onto a reasonably sized rocket. After all, the scientific elite, from Caltech and elsewhere, including Nobel laureates, had done the best job that could possibly be done. And failed.

This rock was recovered from the top of an Antarctic ice field in the Allan Hills, implying that it fell from above. Examination of gases trapped inside showed that the rock came from Mars, where it had been ejected by an impact there before it voyaged to Earth.

That despondency disappeared almost overnight in 1996 with a report from David McKay and his colleagues at the Johnson Space Center. It had been known since the early 1980s that pieces of Mars have come to Earth by a process that begins with a meteorite strike on Mars


that ejects rocks from Mars into solar orbit. After wandering in the solar system, some of these ejected pieces of Mars get trapped by Earth's gravity and fall to Earth. If they land in the ocean or a tropical rainforest, they are lost. But if they land on the top of an Antarctic ice field or the Sahara desert, they stand out as anomalies and can be recovered. Both Antarctica and the Sahara are rich sources of meteorites, so much so that you can buy on eBay pieces of Mars that came to Earth in this way. Some are actually authentic.

McKay and his colleagues had focused on a meteorite that had been collected from the top of an ice field in Antarctica near the Allan Hills. Appropriately, the meteorite was called Allan Hills 84001. Its Martian heritage was established by showing that gas trapped in its shock glass was similar to gas in the Martian atmosphere. So why did McKay's report dispel despondency? Because McKay published images of small, cell-like structures in Allan Hills 84001. These, he suggested, could be the remnants of microbial life on Mars.

Cell theory of life trumps metabolism and carbon

McKay's suggestion raises yet another definition-theory of life, the cell theory of life. Back we go to Galileo and the compound microscope that he helped develop. As discussed in Chapter 4, the compound microscope allowed the discovery of an otherwise unobservable biosphere of single cell organisms.

The small structures in the Allan Hills 84001 meteorite, which was ejected from Mars by an impact there and landed in Antarctica. These were reported by David McKay and his colleagues in 1996, resurrecting interest in the possibility of life on Mars. Are these cells? The structures are only 0.00001 cm across (about 0.000004 inches). For scale this is approximately 1000 water molecules across.

Historically, the microscope did more than that. It drove the development of a general theory of biology: Cell Theory. For example, Robert Hooke (1635-1703) used the microscope to examine slices of cork, reporting in 1665 that some of this plant tissue appeared to be made up of small containers or "cells". Other studies showed that animal tissues were also made from cells.

It took some time, but Theodor Schwann and M. J. Schleiden suggested in 1847 that animal and vegetable biology could be unified under a theory that all living systems are built from cells. Indeed, some definition-theories of life capture this requirement with a passion; both the C (compartmentalization) and the S (seclusion) components of the PICERAS definition of life proposed by Daniel Koshland (Chapter 2) are closely related to cell theory.

You can decide for yourself whether you think that the structures discovered in the Allan Hills meteorite look like cells from living systems. The debate went back and forth, with passion on both sides. Here it is in a good news-bad news format.

First the bad news. McKay's structures appeared to be too small to be living cells; they are only 100 nanometers across. To give you a sense of scale, the ribosome, the molecular machine discussed in Chapter 4 used by terran life to make proteins, is approximately 25 nanometers across. This means that the "cells" in Allan Hills 84001 can hold only four ribosomes across. Many argued that this was too few for a viable cell.
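The scale argument is simple arithmetic, and a short sketch makes it easy to check. The two sizes are those quoted in the text; the function names are illustrative, and the unit conversions use only the standard definitions of the centimeter and the inch.

```python
# Sizes quoted in the text, in nanometers.
STRUCTURE_NM = 100.0  # width of the Allan Hills 84001 structures
RIBOSOME_NM = 25.0    # approximate width of a terran ribosome

NM_PER_CM = 1e7       # 1 cm = 10,000,000 nm
CM_PER_INCH = 2.54    # exact by definition


def ribosomes_across(structure_nm: float, ribosome_nm: float) -> float:
    """Number of ribosomes that fit side by side across one structure."""
    return structure_nm / ribosome_nm


def nm_to_cm(nm: float) -> float:
    """Convert nanometers to centimeters."""
    return nm / NM_PER_CM


def nm_to_inches(nm: float) -> float:
    """Convert nanometers to inches."""
    return nm_to_cm(nm) / CM_PER_INCH


print("Ribosomes across:", ribosomes_across(STRUCTURE_NM, RIBOSOME_NM))  # 4.0
print("Width in cm:     ", nm_to_cm(STRUCTURE_NM))
print("Width in inches: ", nm_to_inches(STRUCTURE_NM))
```

This counts ribosomes along a single diameter only; how many would fit in the whole volume depends on the shape assumed for the structure, which the text does not specify.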


My laboratory got involved at that point. In Chapter 5, we noted that the ribosome was the machine invented in the RNA world to make proteins. What, we asked, if Martian life had not already invented proteins? What if the structures being observed in the Allan Hills meteorite were remnants of life from an RNA-world on Mars, the same RNA-world that was proposed to have existed on early Earth? Such a form of life would not need ribosomes. It therefore need not be large enough to hold ribosomes. This led us to the question: How big would RNA-world cells need to be? About 70% of a typical terran bacterial cell is used to make proteins as it supports its three biopolymer lifestyle. If Martian life were RNA life, its cells (viewed under Cell Theory to be necessary for life as a universal) would not need to hold that 70%. Therefore, such Martian cells could be 70% smaller than the cells of the three biopolymer life form known on Earth. Further, the likelihood of finding RNA-world life depends on both the rate at which such life emerges and the rate at which such life invents more biopolymers. As the second rate might be slow, one might expect to encounter single-biopolymer life more frequently than "higher" life. So McKay's structures were not too small for the cells expected for the kind of life that might be encountered most frequently. Good news. But there is bad news. Other images of the structures in the Allan Hills meteorite from a different angle argued that what McKay interpreted as cells looking at the specimen from one direction looked more like mineral ridges when viewed from another. Bad news. So what about other biosignatures? For example, did the structures contain organic molecules? Richard Zare and his group at Stanford detected some polycyclic aromatic hydrocarbons (PAHs again) from the structures. Good news? No, bad news. Unfortunately, the isotopic signatures of these PAHs suggested that the PAHs within Allan Hills 84001 came from Earth, not from Mars. 
PAHs are contaminants everywhere. And what about benzenehexacarboxylates? Alonso Ricardo in my laboratory developed a very sensitive assay for these and applied it to a different sample from Mars. A weak signal was seen. Unfortunately, benzenehexacarboxylates are used as plasticizers in plastics on Earth, and these meteorites had been stored in plastic. Thus, we could not tell whether the weak signal that Alonso saw indicated a small amount of benzenehexacarboxylates in a Martian meteorite, or contamination from the plastic in which the meteorite had been stored.

Others looked at the same meteorites in the hope of finding a mineral biosignature. For example, Joseph Kirschvink, the same scientist from Caltech who had suggested that Mars may have supported the borate-moderated prebiotic synthesis of carbohydrates, called attention to small grains of magnetite (an iron oxide having the formula Fe3O4) in Allan Hills 84001. Magnetite is made by bacteria on Earth and by other organisms that use its magnetic properties to orient themselves relative to the Earth's magnetic north and south poles. Kirschvink suggested that some of the magnetite particles might have arisen from life on Mars, commenting that their sizes and shapes were similar to the sizes and shapes of magnetite particles made by microbes on Earth.

(Left) Magnetite (Fe3O4) crystals in Allan Hills 84001 meteorite from Mars. (Right) Magnetite crystals from terran bacteria that use magnetite to orient themselves in the Earth's magnetic field. Are they sufficiently similar to meet a standard-of-proof that the second, being biotic, implies that the first are as well? The community has no consensus.

This too was disputed. Some observed that not all of the magnetite in Allan Hills 84001 had this distinctive shape.


Kirschvink responded by saying that, well, some do. But can the shape be generated by non-biological processes? Since all non-biological processes for creating magnetite can never be completely explored, we can never prove that no non-biological process generates magnetite in the shape ascribed by Kirschvink to biology. Another victim of applied epistemology.

The anthropologist in us is happy. We are observing a scientific community recognizing that the problem in its field is the absence of a community-accepted standard-of-proof, and is actively seeking one to define a universal biosignature. This is more progress than the origin-of-life community has made. And so we can segue to the question: How do we define a universal biosignature to support exploration that might jolt us with a discovery that exobiology is a real science?

Searching for what is necessary and unique to life

We now know that neither metabolism nor Cell Theory enabled definition-theories of life that rule in, or rule out, life on Mars using data obtained from Mars as input. At least not to the satisfaction of all of the interested communities. Instead, results from both Viking and Allan Hills 84001 persuaded the community that it needs to be more selective about structures and chemical transformations that it seeks as biosignatures.

Of course, the most we can reasonably hope to find in a meteorite from Mars are remnants of Martian life that has long been dead. Any but the most robust form of life would be killed by being ejected from Mars, wandering the solar system (and its proton storms) for thousands of years, and then falling to Earth. It would almost certainly be easier to recognize Martian life if it still had the capability of Darwinian evolution, or perhaps if it were only freshly dead. At least the bio-chemicals would be fresh. So let us advance the discussion by assuming the best of all possible worlds, that we have a fresh sample of Mars.
What might be sought in that sample that would drive the conclusion that alien life, living or freshly dead, was present? Fifty years ago, we might have easily made a list. For example, we might have argued that the presence of adenine (the nucleobase), glycine (an amino acid), or ribose (a sugar) would be signs of the presence (or the recently past presence) of life in a sample. After all, these three compounds are parts of terran nucleic acids, terran proteins, and terran carbohydrates.

No longer are things so easy. As noted in Chapter 5, adenine can be found in carbonaceous meteorites (presumably arising from non-biology, unless there is life on comets). Glycine can also be found in meteorites and is generated abiologically by sparking electricity through moist methane atmospheres in the presence of ammonia. Ribose comes from prebiotic compounds in the presence of borate minerals (Chapter 5). While successful prebiotic experiments increase the possibility of spontaneous generation of life (and help us out in Chapter 5), every successful prebiotic experiment diminishes the number of molecules that can be used as reliable biosignatures. A non-biological route to a biomolecule makes that biomolecule no longer unique to life.

Can our definition-theory help?

We are left with a conundrum. We need some chemical structures that are universal in living systems (that is, found in life regardless of its genesis) and unique to living systems (that is, not generated by any non-life processes). We have a definition-theory of life: a self-sustaining chemical system capable of Darwinian evolution. But this definition-theory is not obviously operational. That is, it is not clear how one directly applies it to a sample taken from an alien world. The criticism is, effectively: "So Steve, tell us. How long do you plan to wait around on Mars until the glob you are observing evolves?"

But perhaps we can be a bit more sophisticated.
We might ask: What features must organic compounds have in general to support Darwinian evolution? If we can agree on those, then we might look for organic


compounds having those features in our fresh sample of Mars. Instead of observing actual Darwinian evolution in a sample from an alien world, we can observe the molecular capability for Darwinian evolution.

Again, we begin with the only example of such molecular systems available to us, from life on Earth. This self-sustaining chemical system, indisputably capable of Darwinian evolution, is a three biopolymer system built from DNA, RNA and their encoded proteins. The "three biopolymer" system contains a well-recognized paradox relating to "origins". On Earth, the biopolymers are specialized to perform three functions (genetics, information transfer, and catalysis, respectively for DNA, RNA, and proteins). The sequence of each biopolymer has an encoder-encoded relation to the sequence of the other two. It is difficult to imagine all three arising spontaneously and simultaneously in an environment lacking life. Further, it seems to be astronomically improbable that the three biopolymers arose in this way with an encoder-encoded relationship already built into them.

As noted in Chapter 4 and exploited in Chapter 5 and our analysis of the Allan Hills structures, research that works backwards in time suggests that single biopolymer life based on RNA might also work. Indeed, that approach suggested that our own ancestors on Earth used RNA as the only genetically encoded biopolymer. At the very least, this RNA-world hypothesis addresses the "chicken or egg" problem in deciding which biopolymer came first.

In fact, we might view single biopolymer life as the most likely form of life that we will encounter as we explore the cosmos. Here is the argument: Single biopolymers capable of self-sustenance and Darwinian evolution are possible. The "breakthrough" to create a second biopolymer specialized to be catalysts (on Earth, proteins) is slow.
With the rate of formation of single biopolymer life being faster than its rate of conversion into multipolymer forms of life, single biopolymer life forms should dominate in the cosmos.

Fair enough. But what molecular features does a single biopolymer need to allow it to support Darwinian evolution? What molecular structures join replicatability, mutability, and adaptable function (e.g., catalytic activity) in a single molecular system that can evolve in response to natural selection?

The molecular system should at the very least support template-directed replication. Templated replication is widespread in chemistry; many molecular systems can seed the formation of copies of themselves. For example, as discussed in Chapter 2, a crystal of sodium chlorate (NaClO3), if fragmented, will nucleate (or "seed") the formation of more crystals of NaClO3.

But again, replication is not enough. The replicate must be an imperfect copy of the template. The descendants should at least have the possibility of being different from the parents. And those imperfections (the differences) must themselves be replicatable. The mutation cannot disrupt the next generation of replication.

Crystallization does not have this property. A NaClO3 crystal might be said to be a "mutant" if some of its sodium atoms are replaced by potassium at specific sites in the crystal. These are known as crystal defects. These defects carry information, and make the replicate crystal different from the parent crystal. But the defects in the child crystal cannot be passed on to the grandchild crystals. Thus, the NaClO3 recrystallization cannot support Darwinian evolution.

Chemical systems that replicate imperfections are scarce

Nor can proteins easily support Darwinian evolution. Examples are now known of proteins that can join fragments to make more of themselves; several were developed in the laboratory of Reza Ghadiri at Scripps. That replication can, in principle, be imperfect.
Unfortunately, different protein molecules with slightly different amino acid sequences can behave in very different ways. For example, single amino acid replacements in a protein can convert the protein from one that is soluble in water (perhaps essential for


replication) to one that is insoluble in water. Nothing stops Darwinian evolution faster than having your genetic molecule precipitate.

This behavior in proteins is common in other molecular systems. In general, molecular behavior changes dramatically with small changes in molecular structure. This is put to good use in protein evolution. Once one has a ribosome, messenger RNA, and other pieces of a system that allows proteins to be genetically encoded and undergo Darwinian evolution, biological systems can acquire new proteins with new behaviors and functions by mutating existing proteins. Just a few changes in the protein sequence allow this to happen.

But encoding molecules cannot have this property. Instead, they must not change their properties upon mutation, to allow the properties of the encoded molecule to be the focus of selective pressure. Proteins do this poorly. The textbook case is sickle cell hemoglobin, a protein examined by Linus Pauling as the first example of a genetic disease. Native hemoglobin is itself a very soluble protein; it must be very soluble, as it is present at high concentrations in red blood cells. But if a single amino acid is replaced in the protein, it starts to precipitate. Not entirely (see the structure to the right), but enough to distort the form of the red blood cells that carry it. This causes disease (bad news) and creates a degree of resistance to malaria (good news).

The upshot of this is that while hemoglobin (and by analogy proteins in general) is a great molecule for transporting oxygen in a rapidly changing world where adaptation in behavior is needed, it is a poor molecule to do genetics. Darwinian evolution in a two biopolymer system would be hopelessly constrained if a change in one of the building blocks in the encoding would create adaptive pressures different from those guiding the evolution of the encoded molecule.
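The contrast drawn above, between a molecule whose physical behavior survives mutation and one whose behavior does not, can be illustrated with a toy enumeration. Everything in this sketch is a hypothetical simplification: the two solubility rules, the short alphabets, and the example sequences are invented for illustration and are not measured chemistry.

```python
from typing import Callable, List


def single_site_mutants(seq: str, alphabet: str) -> List[str]:
    """Every sequence that differs from `seq` at exactly one position."""
    mutants = []
    for i, original in enumerate(seq):
        for letter in alphabet:
            if letter != original:
                mutants.append(seq[:i] + letter + seq[i + 1:])
    return mutants


def fraction_still_soluble(seq: str, alphabet: str,
                           is_soluble: Callable[[str], bool]) -> float:
    """Share of single-site mutants that stay in solution and so could
    still take part in another round of replication."""
    mutants = single_site_mutants(seq, alphabet)
    return sum(is_soluble(m) for m in mutants) / len(mutants)


def charged_backbone_soluble(seq: str) -> bool:
    # Hypothetical rule: solubility is set by the backbone, not the sequence.
    return True


HYDROPHOBIC = set("VLIF")  # toy choice of "greasy" building blocks


def protein_like_soluble(seq: str) -> bool:
    # Hypothetical rule: more than two greasy residues and it precipitates.
    return sum(c in HYDROPHOBIC for c in seq) <= 2


gene_like = fraction_still_soluble("GATTACA", "ACGT", charged_backbone_soluble)
protein_like = fraction_still_soluble("VLKEDS", "VLIFKEDS", protein_like_soluble)
print("Gene-like mutants still soluble:   ", gene_like)
print("Protein-like mutants still soluble:", round(protein_like, 2))
```

With these invented rules every mutant of the gene-like molecule remains usable, while a sizable share of the protein-like mutants is lost to precipitation before selection can act on what they encode.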
The repeating charge in DNA and RNA allows it to support genetics

To support genetics in a Darwinian chemical system, a class of molecules is needed that can change its information content by changing its molecular structure, but without greatly changing its physical properties. This molecule should be able to store millions of bytes of information, where the molecular structure holding each byte at each position can be changed to change the information encoded, but without having the physical properties of the genetic molecule change much. We want to be able to change baby's hair color without having the molecule carrying the information encoding a new hair color precipitate from water, turn to tar, or otherwise cease to be able to further support Darwinian evolution.

Of course, we already have examples of such molecules: RNA and DNA. Every (or almost every) RNA or DNA sequence dissolves in water. Every (or almost every) RNA or DNA molecule binds to its complement following two very simple rules (A pairs with T, G pairs with C); these are the rules by which DNA copies itself. Every (or almost every) RNA or DNA molecule will direct the synthesis of a

The hemoglobin protein, depicted as a molecule above, is very soluble. But if a single amino acid is mutated, the variant undergoes partial precipitation, distorting the red blood cells into a characteristic sickle shape that causes disease (below). Thus, the hemoglobin protein would be a poor encoding molecule.


daughter molecule given the building blocks and an appropriate enzyme. And the replacement of a nucleotide at any site in an RNA or DNA molecule by any of the other three nucleotides in the genetic alphabet generally does not disrupt these properties. Mutant RNA and DNA retain their ability to dissolve in water, pair with complements, and template the synthesis of a child.

Not many real organic molecular systems have these properties. In fact, in virtually no other chemical system (including proteins, discussed above), can we change molecular structure with confidence that the change will not dramatically alter molecular behavior.

So what exactly are the molecular structures within RNA and DNA that allow these biopolymers to change their sequence without dramatically changing their physical properties? What structural features allow DNA and RNA to support Darwinian evolution as an encoder, in ways that are not found in proteins, carbohydrates, or other biopolymers? Asking these questions in the late 1980s in Switzerland, my laboratory set out to get an answer by synthesizing analogs of RNA and DNA that removed, one at a time, the structural features characteristic of both molecules.

[Figure label: the repeating dipole in the backbone of proteins.] A linear molecule that has a repeating dipole in its backbone easily folds on itself, as the partial positive charges can interact with partial negative charges. You can show this yourself by tying a series of magnets to a string. The magnets, having both north and south poles, will aggregate, folding the string. The analog from biology is the protein backbone. Each of the -CO-NH- units in the backbone contains a more electronegative atom (O) and a more electropositive atom (H). Proteins, linear strings of these, fold.

So now some chemistry. RNA and DNA molecules are built from three kinds of pieces. In the ladder structure that supports the double helix, the uprights of the ladder are an alternating sequence of phosphate-sugar-phosphate-sugar. The sugar is ribose in RNA and deoxyribose in DNA. The rungs of the ladder are the nucleobases: adenine, guanine, cytosine, and uracil (thymine in DNA). The sequence of the nucleobases carries the information, while the structure of the uprights remains the same in all RNA molecules, regardless of sequence.

As discussed in Chapter 5, the distribution of electrons in a molecule is the most important behavior-defining feature of a molecule. The most striking feature of electronic distribution in a molecule is a charge. Thus, the most distinctive thing that one can say about a sodium cation (Na+) and a chloride anion (Cl-) is that they each have a charge. Indeed, as discussed in Chapter 5, this is why salt dissolves in water.

The backbones of RNA and DNA contain many charges. Indeed, every phosphate in the uprights of the ladder carries a negative charge. These repeating negative charges dominate the physical properties of nucleic acids. They account for the solubility of essentially all RNA and DNA sequences in water. They account for the inability of RNA and DNA molecules to pass through a gas chromatograph. The negative charges on the backbone on one strand of the double helix repel each other, causing RNA and DNA molecules to stretch out, ideal for templating. Repeating negative charges on the backbone on one strand of the double helix repel those on the other strand. This forces the strands to interact with each other as far from the backbone as possible; this is ideal for Watson-Crick pairing. Last, the repeating negative charges on the backbone so dominate the physical properties of the molecule overall that those properties do not change much if one changes the nucleobases in the rungs in the ladder, the nucleobases that determine the encoded information. These are exactly the properties that one wants in an encoding biopolymer.
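Two of the properties just listed, rule-based pairing and a backbone charge that does not depend on sequence, can be sketched in a few lines. The pairing table uses the two rules quoted earlier (A with T, G with C). Reading the partner strand in the opposite direction, and counting exactly one negative charge per nucleotide, are simplifying assumptions made for this illustration; the function names are hypothetical.

```python
# The two pairing rules for DNA, written out in both directions.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(seq: str) -> str:
    """Partner strand under the pairing rules, read in the opposite
    direction from the template (the two strands run antiparallel)."""
    return "".join(DNA_PAIRS[base] for base in reversed(seq))


def backbone_charge(seq: str) -> int:
    """Simplifying assumption: one phosphate, and so one negative charge,
    per nucleotide, whatever the nucleobase attached to it."""
    return -len(seq)


original = "GATTACA"
mutant = "GATGACA"  # a single nucleobase changed

for seq in (original, mutant):
    print(seq, "pairs with", complement_strand(seq),
          "| backbone charge:", backbone_charge(seq))
```

The mutation changes the information and the partner strand, but under these assumptions the charge that governs solubility and strand repulsion is untouched, which is the behavior the text asks of an encoding molecule.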
To demonstrate the importance of the repeating charges in allowing DNA to encode, Clemens Richert, Zhen Huang, Monika Blaettler, and others working in my laboratory tried to make RNA and DNA analogs that did not have them. They prepared a series of molecules that looked like RNA and DNA in most ways, except that they did not have the repeating charge on the backbone phosphates.


These uncharged RNA and DNA analogs behaved remarkably. First, they folded. The absence of the repeating negative charges meant an absence of the backbone charge interaction that causes natural RNA and DNA to intrinsically extend themselves. Thus, although short sequences displayed the same rule-based molecular recognition displayed by RNA and DNA, longer sequences of the uncharged analogs did not. Removing the repeating charges therefore cost these molecules the ability to do what RNA and DNA must do to make copies of themselves.

Further, once the repeating charge was gone, the physical behavior of the uncharged RNA and DNA analogs became very sensitive to changing the sequence of the nucleobases that carry the information. A very few changes in their sequences changed their solubility in water, their folded structures, and other features of their behavior. In this respect, these uncharged analogs of RNA and DNA behaved like proteins, and were similarly unsuited as genetic molecules.

[Figure labels: biopolymer with complementary charges; regularly spaced charges on surface; anions Cl-, Br-, HPO4=, SO4=.]

The polyelectrolyte theory of the gene

These results suggested that a repeating charge is essential in a biopolymer if it is to support Darwinian evolution in an encoding capacity. The repeating charge does this in many ways:

A molecule that has a repeating negative charge can be pulled from a solution by a solid support that has a repeating positive charge.

(a) The repeating backbone charge keeps RNA and DNA dissolved in water.
(b) The repeating backbone charge forces the interaction between strands to occur as far distant from the backbone as possible, as the backbone charges from one strand repel the backbone charges from the other strand. This prevents sugar-sugar interactions, sugar-backbone interactions, interactions between the sugar and backbone groups of one strand and the back side of the nucleobases on the other, and so on, from competing with the Watson-Crick interactions needed to make replicates.
(c) The repeating backbone charge keeps the RNA and DNA molecules from folding up, allowing them to act as templates.
(d) The repeating backbone charge in RNA and DNA dominates the physical behavior of a DNA molecule, which is therefore largely independent of its sequence.

Based on these arguments, we proposed that this particular structural feature (a repeating charge) was general to all life in water, universally. Being chemists, we called this the "polyelectrolyte theory of the gene". A polyelectrolyte is a fancy word to describe a molecular structure that has a repeating charge. The polyelectrolyte theory of the gene provides a working hypothesis that today guides our search for universal chemical structures that support Darwinian evolution. Proposition: As a universal, living systems must contain at least one biopolymer having a repeating charge. This biopolymer will play both catalytic and genetic roles in single-biopolymer forms of life, and the genetic role in multi-biopolymer life forms. The polyelectrolyte theory of the gene constrains our search for life in the cosmos. We need not consider all biopolymers that we might encounter in our exploration of the cosmos, just those with repeating

Repulsion between repeating backbone negative charges on one strand of a DNA molecule and the repeating backbone negative charges on another strand of a DNA molecule causes the two strands to interact with each other as far from their backbones as possible.


charges. Further, a biopolymer carrying a repeating charge is easily extracted from solution. A surface having an absorbent with a regularly spaced positive charge will bind to RNA and DNA more tightly than competing single anions (like chloride). If the polyelectrolyte theory is correct, we need not wait around to see if an alien chemical system can evolve. All that we must do is see if it has a genetic structure that is capable of Darwinian evolution.

Operational definitions

Combining chemical theory with our definition-theory of life has done something that is worth comment. First, our definition-theory of life, by speaking of "capabilities" and being broadly cast, appeared to be less tangible than definition-theories of life that focused on metabolism and cell structure. Accordingly, members of the community raised what are called operational objections. They conceded that the definition-theory might be universal from a metaphysical perspective. But they contended that it was useless as a practical tool to search for life in the cosmos. Why? Because they held that it could be applied only by observing living systems and waiting for them to evolve in a Darwinian sense, something that rendered the definition-theory useless as an operational definition.

An operational definition produces a series of operations that, recipe-like, are applied to an object. Like a key to assign a name to a species of organism, the operations generate observations that give simple and objective answers to questions: Is the object green? Does it have a hardness of 8 on the Mohs scale? Does it crystallize in a hexagonal space group? If so, the object is defined to be an emerald.

Operational definitions are important in science, so much so that some scientists believe that they are the only definitions that are truly "scientific". Thus, "water" is defined as a substance that passes simple tests.
If the substance freezes at 0 °C (32 °F, 273 Kelvin), boils at 100 °C (212 °F, 373 Kelvin), and so on, then the observations based on the operations are considered to be good definitions of water. The quality of the operational definition, in this view, depends only on how easily the operations can be carried out. We have chosen a definition-theory of life that perhaps suggests an observational test. To determine whether a system is life, place it under stress and observe whether it generates replicates that have heritable changes in its molecular structure. This is, however, a "bad" operational definition because the test is not easily executed. Bad news. But a bit of theory from chemistry permits this to be turned on its head. As chemists, we need not resort to simple observational science to decide whether a glob might support Darwinian evolution. Rather, we can devise chemical laws that will say whether the chemical structures in those globs can support Darwinian evolution. As chemistry, those laws are (truly, we believe) universal. And so we can test them, Galileo-like. Not by rolling balls, but by synthesizing different forms of matter in the laboratory directed to the test of the law, not of the definition-theory which, after all, concerns a subject matter that we cannot (yet) observe.

Might "weird life" still be discovered on Earth?

Armed with this approach, we can search the cosmos for life. First, we go to a place where the criteria of life are met. According to the community, these would include a place where liquid water is possible, where organic materials are available, where energy is accessible, and so on. Then we must raise the funding to get there. But what about here on Earth? About a decade ago, with Christopher Switzer, now a professor at the University of California at Riverside, we asked: What is the chance that a new form of life might be here on Earth, under our feet, unrecognized?
Before you dismiss the idea, let us make an analogy-by-history argument to persuade ourselves that there is no reason to be confident that the terran biosphere has exhausted its potential to yield discoveries about the fundamental nature of life. Just 50 years ago, having


full advantage of microscopes of many kinds (compound and electron), biologists thought that all life could be divided into two kinds, eukaryotes and prokaryotes. This distinction was based on Cell Theory. Eukaryote cells had a nucleus; prokaryote cells did not. We thought we were "done-done". That conclusion was so entrenched that in the early 1980s, biologists were dividing the living world based on cell structure and other physiological and metabolic characters. For example, in 1982, Lynn Margulis organized life into five kingdoms: animal, plant, fungi, single-celled organisms with nuclei, and single-celled organisms without nuclei [Five Kingdoms, 1982]. The Five Kingdoms book holds many lessons in the scientific method, including yet another example (if one were needed) of a new way of looking at the world disrupting "settled science". Woese’s experiments with the sequences of RNA molecules had established that "prokaryotes" contained taxa as different from each other as eukaryotes were from canonical prokaryotes. Yet only twice in Five Kingdoms did Margulis mention "some biologists" who considered that the taxonomy should be driven by chemical criteria. Margulis was, of course, referring to Carl Woese, whom we met in Chapter 4. The omission of Woese's classification scheme (published in the previous decade) from Margulis' 1982 book reflects a dispute in the community that was to continue for another decade. As late as 1998, Woese and the evolutionary biologist Ernst Mayr were arguing about whether chemical criteria should trump others in classification schemes for biology. The fact that just 10 years ago our fundamental view of life was still changing because of biodiscoveries on Earth cautions (if further caution is needed) anyone who thinks that science is “settled”, or that exploration on Earth can no longer yield discoveries relevant to the nature of life. Exploration of Earth continues to yield discoveries. 
For example, exploration of the floor of the Atlantic Ocean in 2000 near the Mid-Atlantic Ridge discovered spires of rocks rising from the ocean floor. The spires were so magnificent that the structure was called the "Lost City". On closer observation, it became clear that the microbes inhabiting the Lost City had a distinctive metabolism, supported by the erosion of rocks containing peridotite, the same mineral that generates the alkaline deposits that we found so useful in Chapter 5 to generate prebiotic carbohydrates. The microbes inhabiting the Lost City turned out to be representatives of the standard terran life that we already knew. But the discovery further jolts any complacency that might lead us to dismiss out of hand the question: Can further exploration on Earth uncover a form of life that does not share common ancestry with the terran life that we already know? Or, if ancestry is shared, is it possible that this life form would have diverged in history before the three kingdoms of life identified by Woese diverged?

For those who seek to look on Earth as a "low cost" approach to discover life that might jolt our view of life as a universal, no place is better than the ocean floor. Here is an image of the "Lost City", discovered just off the mid-Atlantic Ridge, a geological structure that harbors distinctive forms of life.

Might RNA-world life still be living on Earth?

A decade ago, NASA arranged a conference on this question. At that conference, I argued (there was dispute, so this argument does not reflect the community view) that what we knew about life almost forces the conclusion that the possibility must exist. Here is how. The argument starts with the hypothesis, from Chapter 5, that an early episode of life on Earth used RNA as the only genetically encoded molecule (the “weak” version of the RNA-world hypothesis). In the RNA world, RNA served as both the genetic molecule and as the catalyst.
Protein synthesis was, according to this view, later invented by this single biopolymer life form.


RNA LIFE STILL SURVIVING ON EARTH

Descendents of this breakthrough innovator expanded the list of amino acids included in proteins to the 20 standard amino acids. Since these same amino acids are present in all of the terran life that we know, we conclude that all of the terran life that we know descended from organisms that had already learned how to make proteins using ribosomes. This argument prompts the question: What happened to the descendents of the RNA world organisms that did not learn how to synthesize encoded proteins? The RNA-world hypothesis is silent on this. When asked, many of its proponents (including me) started by saying: We don’t know, but perhaps they were all eaten by the organisms that could make proteins. The argument is:

• Organisms that can make encoded proteins are better than organisms that cannot, which are inferior.
• Better organisms eat inferior organisms.
• Over a billion years, better organisms eat all inferior organisms.
• Therefore, as a billion years (at least) has passed since organisms that biosynthesize encoded proteins emerged, all of the organisms that cannot biosynthesize encoded proteins have been eaten.

Fair enough. But an inferior organism can be eaten by a superior organism only if the superior organism can find it. If the two live in different environments, perhaps the inferior organisms might have survived. This led us to the question: In what kinds of environments might a single biopolymer organism left over from the RNA world have survived? We offered three thoughts. First, the encoded biopolymer of RNA organisms does not require sulfur; proteins, one encoded biopolymer of modern organisms, do. RNA organisms may therefore survive in an environment on Earth that has very little sulfur. This is a place to search for RNA organisms left over from the RNA world. Second, some 70% of the volume of a typical three-biopolymer microorganism is consumed by the translation machinery needed to make proteins. This machinery is not needed by an RNA organism. Therefore, RNA organisms can be much smaller than protein-based organisms. This suggests we might look for RNA organisms on Earth by looking for environments that are space-constrained. Many minerals have pores that are smaller than one micron across. These might hold smaller RNA organisms. Third, RNA is easier to denature and then refold than proteins are. This might create a niche for an RNA organism in environments that cycle between very high and very low temperatures. Another place to look. Other conjectures are conceivable. High hydrostatic pressures may favor RNA over proteins. Increasing pressure appears to favor hydrogen bonding and aromatic stacking, and disfavor more general hydrophobic effects. As proteins depend on the latter for their folds, while nucleic acids depend on the former, RNA life might more likely be found at higher pressures, deep in the ocean. Another place to look.

Would we have missed this "weird" life?

The search for weird life on Earth immediately runs again into the same problem: How do we look for forms of life whose structures are not known to us on Earth?
Woese's "universal" tree of terran life was defined by RNA molecules that formed part of the ribosome, the machine that makes proteins. Indeed, when new microorganisms are sought, sequences found in ribosomes are used as probes to find them. This will not work to discover RNA-world life, which (by definition) does not have ribosomes to make proteins by translation. Therefore, just as the Viking GC-MS could not have detected meteoritic organics even if it was sitting on a pile of them, the tools commonly used to search for life in the terran biosphere would not detect any life that survived from the Earth's past, even if the sample was swimming with it. What else can we look for? RNA, of course, but any RNA that we find could originate in the three-biopolymer life that we already know about. Those who place weight on a cell definition-theory of life might look for small cells. Interestingly, these have been reported; they are called "nanobacteria". It will require a separate book to go through the good news-bad news story on these. Early evidence that very small bacteria were responsible for certain kinds of kidney stones was disputed as being better explainable as the consequence of mineral crystallization, and this is how the community appears to view this at present. Reminiscent of the Allan Hills dispute, these are questions for which the community lacks an accepted standard-of-proof. Indeed, one of the most unfortunate features of this literature is that papers are rejected by the community for illogical reasons. For example, some referees have rejected papers because their authors failed to report finding DNA in the nanostructures. Well, duh? If these are RNA world organisms, they do not need DNA.

Follow the water

The difficulty that the community has had agreeing on life-detection tools and standards-of-proof for evaluating their output has caused some to go in a different direction. Instead of looking for life, the community has developed the concept of a habitable zone. This is a region in the solar system, or on a planet, where life might exist. Most habitable zone calculations are based on a definition-theory of life that emphasizes the importance of liquid water. Influenced by Chris McKay (no relation to David McKay) and others, the community has come to accept the idea that life on Earth is found wherever liquid water is available.
If liquid water is absent, then life is absent, even in environments (such as the ultra-dry Atacama desert in Chile) that have ample opportunity to be infected by life floating in from wetter environments elsewhere on Earth. Defining a habitable zone in terms of the availability of liquid water has some advantages. Because of our understanding of the universal physical properties of water, we can decide before we fly a NASA mission which bodies in the solar system are likely to have environments that hold liquid water. For this reason, NASA adopted as its exploration motto: "Follow the water". Many locales in the solar system may well have liquid water. These include the sub-ice regions of Europa (one of Galileo's moons of Jupiter) and sub-surface Mars. Unfortunately, Europa is sufficiently far from Earth that it is not inexpensively accessible. Again, no bucks, no Buck Rogers. No NASA mission has visited Europa, although people talk from time to time about going there.

THE HAZARDS OF EXPLORATION MEETS THE NEED FOR WATER.

Is there liquid water on the surface of Mars?

But Mars is now back in the picture. In part, this is because liquid water is not necessarily excluded from the accessible surface of Mars. "But," I hear you say, "I thought we had that settled!" If there is one idea that I want you to take home from this book, it is: Nothing is ever settled in science. As discussed previously, a useful process to enforce intellectual discipline in science is to require ourselves, every now and then, to revisit "settled" science and ask: What must be wrong were we to reject what we believe is a well-validated fact? What primary data must be set aside?


And so we establish the dialectic. The community belief is that there is not liquid water on the Martian surface. The dialectical position is there is liquid water on the Martian surface. Without violating even universal law, how might we get water to the surface of Mars? One way to do so is to revisit atmospheric pressure as a function of elevation. Just as the boiling point of water decreases as we go up in elevation, from Miami to Denver to Mt. Everest, the boiling point of water increases if we go down in elevation. In Death Valley, for example, 282 feet (86 meters) below sea level, the boiling point of water is higher because the atmospheric pressure is higher. The same is true in the Dead Sea, at 1378 feet (420 meters) below sea level, the lowest land surface on Earth. So one way we might get liquid water on Mars is by going to its lowest point. Mars does indeed have deep canyons where the atmospheric pressure is higher. We know the depth of those canyons. Gas pressure, gravity, and the properties of water are (we believe) universals. And indeed, water has a greater chance of being liquid there. But it is close. The triple point of water is 611 pascals. Above this pressure, water can be a liquid. This is quite close to what Mariner estimated as the typical pressure on Mars (~ 600 pascals). On the surface, the Viking and Pathfinder missions never measured a pressure below 700 to 900 pascals at the sites where these missions landed. So liquid water has a narrow range of existence at lower elevations on Mars. But what about putting something in the water? Like salt? It is well known that adding some sodium chloride to water increases its boiling point. At sea level, a saturated solution of NaCl boils at 109 °C, up from 100 °C for pure water. At the top of Everest, where water boils at 69 °C (156 °F, 342 Kelvin), a saturated solution of NaCl boils at 74 °C. And so on down to Martian pressures. 
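To see how narrow that range is for pure water, here is a minimal sketch in Python that estimates the boiling point of water at Mars-like pressures. It rests on assumptions that are not from this chapter: the Magnus approximation for the vapor pressure of water, an empirical fit whose coefficients (611.2 Pa, 17.62, 243.12 °C) are reliable only from roughly -45 to 60 °C, and function names invented for the illustration. It ignores salt entirely.

```python
import math

# Magnus approximation for the saturation vapor pressure of pure liquid water.
# These coefficients are an assumption of this sketch (a widely used empirical
# fit, roughly valid from -45 to 60 degrees C); they do not come from the text.
MAGNUS_P0_PA = 611.2   # saturation pressure at 0 degrees C, in pascals
MAGNUS_A = 17.62       # dimensionless
MAGNUS_B_C = 243.12    # degrees C


def saturation_pressure_pa(temp_c: float) -> float:
    """Vapor pressure (Pa) of liquid water at temp_c (degrees C)."""
    return MAGNUS_P0_PA * math.exp(MAGNUS_A * temp_c / (MAGNUS_B_C + temp_c))


def boiling_point_c(pressure_pa: float) -> float:
    """Temperature (degrees C) at which pure water boils under pressure_pa.

    This is the Magnus formula solved for temperature.
    """
    x = math.log(pressure_pa / MAGNUS_P0_PA)
    return MAGNUS_B_C * x / (MAGNUS_A - x)


def liquid_range_c(pressure_pa: float, freezing_point_c: float = 0.0) -> float:
    """Width (degrees C) of the window between freezing and boiling.

    Returns 0.0 when the estimated boiling point lies below the freezing point.
    """
    return max(0.0, boiling_point_c(pressure_pa) - freezing_point_c)


if __name__ == "__main__":
    for p in (600.0, 700.0, 900.0):
        print(f"{p:6.0f} Pa: boils near {boiling_point_c(p):6.2f} C, "
              f"liquid window {liquid_range_c(p):5.2f} C wide")
```

Under these assumptions, pure water at 900 pascals boils only about five degrees above its freezing point, and at 600 pascals the window closes altogether. The numbers are rough, but they show how little room pure water has on Mars.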
Salt also helps water stay liquid on the other end of the temperature range. Salt is an anti-freeze; salt in Martian water is expected to help it stay liquid at the cold temperatures on Mars. Prompted by these thoughts, a few years back, George Landis at NASA pointed out that the surface of Mars might have salty liquid water. On Earth, life was known to live in brine. Indeed, both archaebacteria and a few eukaryotes live on Earth in nearly saturated NaCl. Good news. The Opportunity and Spirit Rovers wandering around on Mars found more good news: evidence for salt. Indeed, magnesium sulfate (known in your bathroom as Epsom salt) appears to be an important component of the Martian soil where they looked. Additional evidence was found for the bromide ion as part of salt deposited by past liquid water on the surface of Mars. A saturated solution of magnesium sulfate (about 20% by weight) freezes at 263 K (-10 °C, 14 °F); the boiling point of that solution is also higher (383 K, 110 °C, 230 °F). Thus, a saturated aqueous solution of magnesium sulfate can be a liquid on today's surface of Mars. Magnesium bromide does better, with the most optimistic calculations having a range for liquid water to be between -30 and -20 °C (-22 to -4 °F, 243 to 253 Kelvin).

Evidence for recent flow of a fluid on the surface of Mars.

Good news, you say. Salt gets us liquid water. No, to some, it is bad news. Wired Magazine, for example, just reported on a paper published in the prestigious journal Science, stating that a Harvard research team led by Nicholas Tosca had calculated that water on Mars was "probably too salty and too acidic to support the development of life." At the very least, life on Mars "would require biology that was completely different from any we know on Earth." The story went further: "Early in the 20th century, hopes were high that intelligent life existed on the planet, largely due to erroneous interpretation of geological features seen through telescopes.
The Viking missions of the 1970s dashed hopes that Mars was home to pervasive life. But since then, scientific momentum has been building that life could have existed in Mars' distant past, when it is believed that liquid water flowed on the planet. The new study's innovative investigation of Martian water's chemistry, however, could start the pendulum swinging back the other way. Tosca and his colleague were able to calculate a maximum water activity level, a common measure of salinity, of 0.78 to 0.86. By comparison, pure water has a water activity of 1.0, the world's oceans come in at 0.98, and the Dead Sea, the saltiest body of water on Earth has a water activity of 0.69."

Back to despondency?

As we learned in the case of the Martian GC-MS, scientific conclusions have a way of evolving in the press to mean more than they were originally presented to mean. For this reason, a key part of the discipline of good science is to actually read the paper.

A separate book must be written about scientific communication in "fashion journals" like Science, and in the science press. As noted in earlier chapters, language reflects culture, and both determine science. One cannot get a paper published in Science unless one argues for a dramatic, yet simple point. This accounts for the frequency of papers that "rule out" or "rule in" alien life in that journal. Most of the time, these can be safely ignored, as the truth is nearly always a more complex combination of facts and theories.

Here is what Tosca actually said: "Our calculations indicate that the salinity of well-documented surface waters often exceeded levels tolerated by known terrestrial organisms." Certainly no reason for despondency, you might say. First, they are "calculations". Next, these are couched in "weasel words"; the calculations only "indicate". Further, the salinity only "often" exceeds some level. And the salt is too high to be tolerated by "known terrestrial organisms". So, why should we worry about this paper at all? We are talking about life on Mars. No doubt it could have biochemistry different from biochemistry on Earth. Even on Earth, biochemistry works at high salt concentrations. Those who worry that life is impossible in fluids where the activity of water is below 0.7, be reassured: the activity of water inside of your living cells is approximately 0.6. After all, the proteins and nucleic acids inside of a cell are themselves polyelectrolytes (read "salts").

Evidence for water on Mars was found by the 2008 Phoenix Mars lander; the frozen water (white) sublimes. One salt in the frozen water is a stable oxidant (perchlorate) that may indicate the oxygen history of the Martian surface.

Exploration continues

Even as this chapter is being written, the Phoenix mission is on the surface of Mars near the polar region. There, it found ice, salts, and a high pH. Curiously enough, these are the very conditions that we might want in order to create borate-containing minerals that would stabilize ribose formed from formaldehyde created in the Martian atmosphere above the surface, which is rich in carbon dioxide. Unfortunately, the instruments that this mission landed on the Martian surface were not designed to search effectively for specific organic molecules. Nevertheless, this discovery represents the next in a long line (we hope) of continuing missions that might find evidence of an alien form of life that is unrelated to the life that we know on Earth.
If found, such evidence will be the single most important thing to drive our understanding of what life is, and provide what is so difficult to generate in science without exploration: New ideas.


Chapter 7 Synthetic Biology: If We Make It, Then We Understand It

So far, we have emphasized how different communities of scientists differ in how they practice their methods. Now, let us devote some pages to doing the opposite, discussing intellectual strategies that unify science. To the extent that different sciences must be drawn together to understand life, things that different scientific fields have in common will be important. For example, observation is practiced by essentially every scientist. Auto mechanics, symphony conductors, and others who do not call themselves scientists also observe. Indeed, it is difficult to conceive of a human activity that does not involve observation of some kind. Of course, different scientists observe in different ways. Moose in Montana are observed using binoculars. Moons around Jupiter are observed with telescopes. Molecules of methane in interstellar clouds are observed by microwave spectroscopy. However, observation has something in common in all disciplines: Observation does not alter the system. The moose, the moons, and the methane are all unaware of their being observed. They behave no differently because of it.

Perturbation is another strategy that is used in various sciences, as well as by auto mechanics and symphony conductors. Here, the system is probed. The scientist, the mechanic, or the conductor then observes how the probe causes the system to behave differently. The conductor might poke the first violinist and see if he plays faster. The auto mechanic might change the brake pads and see if the squeaking stops. The scientist might drop hay near a moose and see whether he eats it. Of course, useful perturbations may also come naturally. For example, when fragments of the Shoemaker-Levy 9 comet hit Jupiter in 1994, the Jovian atmosphere was perturbed. Even though planetary scientists did not deliberately throw the comet at Jupiter as a probe, they certainly used observations of the planet after Jupiter was naturally perturbed to test their models for the Jovian atmosphere.

Scars were formed where pieces of Comet Shoemaker-Levy 9 crashed into Jupiter. The cometary probe revealed features of Jupiter's atmospheric behavior in ways that simple observation could not.

Analysis generates lists of parts

Another strategy that sciences have in common is analysis. Analysis does more than observe after probing. Analysis dissects a system to produce a list of parts. Such lists cannot be obtained by simple observation of the system, or even by observation following perturbation. The system must be dissected. In geology, dissection found that green emeralds, green peridots, and green rocks from Solomon's mine contain beryllium, magnesium, and copper (respectively). In chemistry, analysis showed that methane is built from four hydrogen atoms and one carbon atom (CH4), ammonia is built from three hydrogen atoms and one nitrogen atom (NH3), and water is built from two hydrogen atoms and one oxygen atom (OH2). In biology as classically done, analysis begins by killing the system. Then, the life-that-was is physically dissected, and the parts encountered are listed. Only technology limits what ends up in a parts list. When applied to living systems with a scalpel and the eye, analysis generated lists of organs and bones. When supported by a microscope, biological analysis generated lists of cell types, such as the neurons in the brain or types of cells in the blood. Once supported by electron microscopy, analysis generated lists of sub-cellular structures such as the nucleus, ribosome and mitochondrion. The value of analysis in biology is indisputable. Indeed, progress in biology over the last century has been measured in what analysis has produced. Christian de Duve, whom we met in Chapter 5 discussing the origin of life, won his Nobel Prize analyzing structures within cells. Peter Mitchell, whom we met in Chapter 3, won his Nobel Prize analyzing the mitochondrion.

THE BEGINNINGS OF ANALYSIS IN BIOLOGY

Biological analysis is now based on chemistry

One feature of 20th century biology that set it apart from earlier biology was the listing of the chemical parts of living systems, including protein molecules and genetic molecules. Chemistry then determined the parts of these parts. These included the 20 amino acids that form protein molecules, the four nucleotides that form DNA molecules, and the four nucleotides that form RNA molecules. It is difficult to overestimate the impact that chemical analysis had on our view of life in general. Indeed, the definition-theory of life as a chemical system comes directly from the last century of analysis. Analysis of living systems at the molecular level is now remarkably advanced. For example, we today have a nearly complete list of all of the chemical parts of an E. coli cell, including its genes, its proteins, its RNA molecules, and its metabolites. We can even estimate to within a few percent how many water molecules are present in an individual E. coli cell. For humans, the list of parts is not as complete, but is coming along nicely. We have a nearly complete list of all DNA molecules in two human individuals (Craig Venter and James Watson). We can list a majority of the proteins encoded by that genome. The same is true to a lesser degree for many other forms of life on Earth.
Indeed, in Chapter 4, we saw how this surfeit of molecular information from modern terran biology allows us to build models for the history of all known life on Earth. Once analysis has provided a list of parts within a living system, scientists may perturb the parts and observe a response. For example, after the mitochondrion was identified as a part of a cell by analysis, scientists perturbed it, observing how it responded. Eventually, Peter Mitchell's chemiosmotic hypothesis for forming ATP emerged. Likewise, after the ribosome was listed as a cell part, perturbation followed by observation identified it as the place where proteins were synthesized. Analysis of the parts of the ribosome showed that its RNA parts actually made the proteins. From here, it was simple enough to generate the RNA-world hypothesis, which (in turn) made a small but important statement about the potential scope of life universally. So analysis has been important in advancing the theme of this book. Sickle cell anemia illustrates how analysis has value in technology. Thus, analysis of whole blood identified red cells as one of its parts. Analysis of red blood cells identified hemoglobin as a part of those cells. Analysis of hemoglobin identified the 20 amino acids that are its parts, and their sequence in hemoglobin. From this came the hypothesis that sickle cell disease arises when one amino acid is different in the sequence of hemoglobin. As well as a more universal statement that mutations that alter the sequence of amino acids in proteins might be the sources of many diseases.


Analysis need not provide "understanding"

But analysis has limits. We do not, for example, expect to understand human consciousness from a list of neurons found in the human brain. We do not expect such an understanding from a list of molecules inside those neurons. At the very least, the parts of a system must be related to each other, in time and space, to give a model that reflects how the parts interacted in the whole system, before we killed it. What language is most productively used to build such a model? This depends on the field of science. Chemists, for example, use the "language" of plastic. Chemists model molecular reality with plastic balls, white balls representing hydrogen atoms, black representing carbon atoms, red representing oxygen atoms, and blue representing nitrogen atoms. Such low-tech models are (perhaps surprisingly) valuable to chemists seeking to understand how molecules behave as wholes built from atomic parts. Stepping up in the biological hierarchy, we could build a plastic model of the human brain that represents collections of neurons. Plastic brains are found in the middle-school science classroom (no, we are not speaking of the students). There, they help students understand the workings of the brain, even though the models are constructed in the "plastic language". But plastic models do not take advantage of modern analytical technology. Indeed, a plastic model of the brain represents a level of analysis that is little better than the analysis available to the cave man dissecting a mastodon. We would like something more, expressed in a language that exploits all of the technology that was applied in the last century to biology.

Chemists use physical models made from plastic and metal to help them picture the whole molecule as a composite of its parts. Above are models of the cytidine, a part of RNA (top) and benzene, an aromatic molecule (bottom). Hydrogen, carbon, oxygen, and nitrogen atoms are white, black, red and blue, respectively.

A plastic model of the human brain. The model helps explain how the brain is organized, but does not take advantage of current molecular technology in biology.

Computers as tools to model the whole from the parts

And so we turn to computers. We have read six chapters without a mention of computers as one of the transformative technologies of our time. It is time now to mention computers and their role in various scientific methods. Computer programs can describe interactions between parts of a system that have been identified by analysis. In cartoon form, computer modeling begins with a program that represents each of those parts, their amounts, and their interactions. An initial state is set that describes those parts and their location. Then, the system is set in motion, with numerical simulation used to model how the parts move, interact, and react. This motion is simulated for just a moment of time. After that moment, some of the parts have moved, interacted, and perhaps changed in amount. This creates a new state, which becomes the starting point for a simulation modeling what happens in the next moment in time. In the next moment, the amounts and positions of parts change again to yield yet another state. This, in turn, is the starting point for the simulation covering the next interval. The process is repeated as long as one has the bucks to pay for time on the computer. If each interval lasts a microsecond, repeating the simulation steps one million times will show how the system will look one second (one million times one millionth of a second) after the moment when the simulation was set in motion. If the program correctly models its parts and their interactions, the model should predict the behavior of the system as a whole over time. We can even go to lunch as the computer works. Cool stuff.
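The cartoon just described can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the "system" has two hypothetical parts, A and B, and a single made-up interaction in which A converts to B at a rate proportional to the amount of A. The names `step` and `simulate` and every number are invented here; nothing about them comes from a real biological model.

```python
def step(state: dict, dt: float, rate_per_s: float) -> dict:
    """Advance the system by one small interval dt (in seconds).

    Toy interaction, assumed purely for illustration: part A converts
    to part B at a rate proportional to the amount of A present.
    """
    converted = rate_per_s * state["A"] * dt
    return {"A": state["A"] - converted, "B": state["B"] + converted}


def simulate(initial: dict, dt: float, n_steps: int, rate_per_s: float) -> dict:
    """Repeat the one-moment update n_steps times.

    Each new state becomes the starting point for the next moment.
    """
    state = dict(initial)
    for _ in range(n_steps):
        state = step(state, dt, rate_per_s)
    return state


if __name__ == "__main__":
    dt = 1e-6            # each interval lasts one microsecond
    n_steps = 1_000_000  # one million intervals cover one simulated second
    final = simulate({"A": 1000.0, "B": 0.0}, dt, n_steps, rate_per_s=1.0)
    print(f"simulated time: {dt * n_steps:.1f} s")
    print(f"A = {final['A']:.1f}, B = {final['B']:.1f}")
```

The design mirrors the text: an initial state, a rule for one moment, and a loop that feeds each state into the next. A real simulation differs mainly in having many more parts and many more interactions inside `step`.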


Computational simulation also offers the potential for predicting the behavior of the system upon perturbation. If our first program accurately represents the reality of the initial system, we can model its perturbation by simply changing a few parameters. Again, we may go to lunch. When we return, the computer should tell us how the system is predicted to behave upon perturbation. If we then perturb a real system and observe its changed behavior, we can see whether the prediction was correct. Again, cool. Enabled by the dramatic growth in computer power, numerical simulations of this type are found everywhere. Simulations are used in planetary science to predict the path of hurricanes. Simulations are used in economics to model tomorrow's close for the stock market. Simulations are used to model the folding of proteins and their binding to small molecules as potential drugs. You yourself may have used your screen saver to do a simulation, perhaps in the hope of helping to design a new anti-AIDS drug. In such simulations, a protein from the human immunodeficiency virus (HIV, which causes AIDS) is modeled in the computer. Then, molecules that are candidates for drugs that might bind to the protein are modeled. Numbers are introduced that represent the attractions and repulsions between individual atoms in both molecules. The system is set running within your screen saver while you go to lunch. If the attraction between the potential drug and the protein is simulated to be large enough, the simulated molecule binds and a drug candidate is found. If the repulsive forces dominate, the simulated molecule does not bind. In that case, the program discards that molecule and sets your screen saver to start simulating the binding of the next drug candidate to the protein.

Models for artificial life in silico

Given the explosive growth of computer power, it is hardly surprising that simulations would be proposed as a tool to better understand life as a universal.
From this has emerged a field called artificial life. This approach does not generate self-sustaining chemical systems capable of Darwinian evolution. It does, however, generate computer programs that represent how components might interact in a living system. The goal is to have the simulation produce as output something that looks like the output of an actual living system. For example, the computer might generate the patterns of colors on the skins of animals or the shapes of the leaves found on a Japanese maple tree.

One goal of artificial life is to learn something about the number and nature of rules and parameters that are needed to generate order and complexity at various levels. For example, how many rules are required in a model that generates the patterns of colors on the skin of fish? How simple can the program be to generate the shapes of maple leaves? This approach has provided some interesting answers. In some cases, rather complex output (to the human eye) can be obtained from very few rules. Programs generating such output, some quite aesthetic, can even be used as screen savers.

Some biologists dismiss artificial life research as "game playing". It is, for biologists. However, it does have value as an experimental way to examine patterning, order, and complexity. For example, it is interesting to know the smallest number of parameters required to generate the color patterns on a fish, even if this is not the number of parameters that natural fish actually evolved to create those patterns.

Some patterns created in silico (meaning in the silicon of a computer chip) by artificial life programs that implement a simulation using only a few parameters.
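To make the "very few rules" point concrete, here is a minimal sketch in Python. It is not one of the fish or maple-leaf programs mentioned above. It uses a standard stand-in, an elementary cellular automaton, in which a single number between 0 and 255 (called `rule` here) fixes the entire program. The function names, the grid width, and the choice of rule 30 are all assumptions made for illustration.

```python
def step(cells, rule):
    """Apply one update of an elementary cellular automaton.

    Each cell looks only at itself and its two neighbors (edges wrap around).
    The 8 bits of `rule` say what a cell becomes for each of the 8 possible
    neighborhoods, so the whole "program" is one number from 0 to 255.
    """
    n = len(cells)
    new_cells = []
    for i in range(n):
        left, center, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        neighborhood = 4 * left + 2 * center + right  # a value from 0 to 7
        new_cells.append((rule >> neighborhood) & 1)
    return new_cells


def run(rule, width=31, steps=15):
    """Start from a single 'on' cell and return each generation as a string."""
    cells = [0] * width
    cells[width // 2] = 1
    rows = []
    for _ in range(steps + 1):
        rows.append("".join("#" if c else "." for c in cells))
        cells = step(cells, rule)
    return rows


if __name__ == "__main__":
    # Rule 30 is an illustrative choice; try 90 or 110 for other patterns.
    for row in run(rule=30):
        print(row)
```

Changing the one parameter `rule` changes the whole pattern, which is the kind of experiment on patterning and complexity described above. Nothing here says how a real fish makes its stripes.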


We can try to put life back together by simulating real chemical systems

Despite its value in exploring the type and amount of information needed to create natural biological form, artificial life research need not help us understand how a chemical system might support Darwinian evolution, either in the particular or as a universal. In part, this is because in silico models of life may have parts that behave like no real chemical parts could. The behavior of the parts residing within an artificial life form "living" in a computer need not be constrained by real chemical rules.

We might, however, come to learn something about how real life works by building computer programs that numerically simulate real chemical parts listed from real biological systems. This thought has led to the new hyphenated sub-field of biology called "systems biology".

Systems biologists accept some simple premises. First, they assume that the analysis is essentially complete for the living system that captured their interest; essentially all of the parts in that system must already be on a list. As noted above, this assumption may be reasonably accurate for bacteria such as E. coli. Systems biologists working with more complex systems (including humans) hope that the assumption is actionably accurate for those systems as well.

Next, systems biologists must have in hand technology that can measure the amounts of the parts inside of a living system. These amounts are put into the computer program as parameters that describe the system at the start of the numerical simulation of the bacterium.

Also necessary are observations that quantitatively describe how the parts interact. Does a particular metabolite bind to a particular protein? If so, how fast and how tightly? What happens to the metabolite after it is bound? The program must contain parameters describing these interactions, as these determine how the system changes over time once the numerical simulation starts running.
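In cartoon form, the smallest possible simulation of this kind has one metabolite, one protein, and the complex they form. The Python sketch below is an illustration under stated assumptions, not a method taken from any systems biology package. The starting amounts are the "amounts of the parts", and the two rate constants (`k_on` for how fast the parts bind, `k_off` for how fast they fall apart) are the interaction parameters. Every name and number is hypothetical, and the simple fixed-step update is chosen only for readability.

```python
def simulate_binding(metabolite, protein, k_on, k_off, dt=0.01, t_end=20.0):
    """Simulate metabolite + protein <-> complex with simple fixed time steps.

    metabolite, protein: starting amounts of the free parts (arbitrary units)
    k_on, k_off: interaction parameters (binding and unbinding rate constants)
    Returns the amounts of free metabolite, free protein, and complex at t_end.
    """
    bound = 0.0  # no complex at the start of the simulation
    for _ in range(int(round(t_end / dt))):
        # Net rate of complex formation at this instant.
        net_rate = k_on * metabolite * protein - k_off * bound
        metabolite -= net_rate * dt
        protein -= net_rate * dt
        bound += net_rate * dt
    return metabolite, protein, bound


if __name__ == "__main__":
    # Hypothetical amounts and rate constants, chosen only for illustration.
    print(simulate_binding(metabolite=1.0, protein=1.0, k_on=1.0, k_off=1.0))
    # A "perturbation": double the metabolite and run the simulation again.
    print(simulate_binding(metabolite=2.0, protein=1.0, k_on=1.0, k_off=1.0))
```

Even this toy needs two measured amounts and two measured rate constants. A simulation of a whole cell needs the corresponding numbers for every part and every interaction on the list.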
Might a computer simulation, built with all of those parameters, accurately simulate the behavior of the living system as a whole? If so, this might lead to predictions for the system and perhaps even an understanding of the system under a new scientific aphorism: "If I can simulate it, then I understand it". For example, might a simulation predict how frequently E. coli divides given a certain amount of food? Still better, might a more complex simulation for humans allow us to predict how humans would behave when perturbed, say, by the administration of a drug?

Systems biologists constructively believe that the answers to all of these questions are "yes". Accordingly, departments of systems biology have been established at major universities, including the University of California at Berkeley. Leroy Hood, an effective advocate for systems biology, set up his own Institute for Systems Biology in Seattle (www.systemsbiology.org), not unlike the institute that Peter Mitchell established to explore his chemiosmotic hypothesis or the institute that I set up in Florida to merge the physical sciences with natural history (www.ffame.org).

But much more has happened in systems biology. Hoping that highly parameterized computational models of human biology will help doctors do "predictive medicine", investors have provided money. Companies have been founded from investor bucks. Journals have been initiated. Annual symposia meet. Today, systems biology is (to its advocates) an important new trend in biology. To its detractors, systems biology is biology's latest fad.

Hypothetical metabolic network showing the links between enzymes and metabolites that interact with the pathway that oxidizes acetate in thale cress, a plant. The enzyme and metabolite parts are represented by red dots. The interactions between the parts are represented by black lines. A total of 43 enzymes and 40 metabolites are shown.


The challenge of developing and using highly parameterized models

It is too soon to assess the value of systems research to our understanding of life, either in the particular or the general. One thing is clear, however: Those who call themselves systems biologists are generating huge amounts of data. Further, although some thoughtful commentaries have wondered whether systems biology is any different from classical anatomy and physiology, the fad-like atmosphere does not mean that systems biology will not be productive. Gold rushes do, on occasion, produce gold. In a decade, it will be possible to assess the value of the combination of analysis and numerical simulation offered by systems biologists as a strategy to understand life.

However, it is not too soon to comment on the challenges of computer modeling as the scientific method standing behind systems biology. As computer modeling becomes important in scientific methods everywhere, it is important for the lay public to understand its strengths, its weaknesses, and its potential impact, both for good and bad.

Central to this discussion are the difficulties of using highly parameterized computer simulations to model complex systems. The problem is simply stated for biology. Biological systems have many parts; let us call the number of parts p. The number of interactions between the p parts scales with the square of p (p²). Doing the math, this means that if E. coli has 100 different kinds of parts, the number of pairwise interactions is 10,000 (= 100²). If E. coli has ten thousand kinds of parts, the number of pairwise interactions is 100 million. One or two numbers can generally be used to describe the pairwise interaction between each kind of part. Thus, a computer model for E. coli will have on the order of millions of different parameters.

We now take a scientific method from geography, which teaches us that: "Everything influences everything, with the extent of the influence diminishing with distance".
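The arithmetic is easy to check for oneself. The short Python sketch below computes the p² count used above. As an added assumption that is not in the text, it also counts unordered pairs (each pair counted once, a part not paired with itself) and unordered triples. The function names are hypothetical.

```python
from math import comb


def pairwise_count(p):
    """Number of pairwise interactions as counted in the text: p squared."""
    return p ** 2


def unordered_pair_count(p):
    """Pairs counted once each, excluding a part with itself: p(p-1)/2."""
    return comb(p, 2)


def unordered_triple_count(p):
    """Three-way combinations of distinct parts: p(p-1)(p-2)/6."""
    return comb(p, 3)


if __name__ == "__main__":
    for p in (100, 10_000):
        print(p, pairwise_count(p), unordered_pair_count(p),
              unordered_triple_count(p))
```

Counting each pair only once halves the number but does not change the scaling: for ten thousand kinds of parts it is still about fifty million. The count of triples grows with the cube of p.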
The interaction between two parts is often influenced by a third. This means that an accurate computer simulation of a system must account for that dependency. If our model must consider many multi-way interactions, the number of parameters becomes astronomical. I count many systems biologists among my friends (or I did before this book). They are smart enough to know that if I am allowed to continue in this vein for much longer, the systems biology enterprise will be undermined. So let us have them join the discussion in a dialog, similar to dialogs presented by Galileo four centuries ago. Sysbio:

With the parts list complete for E. coli, we will do a computer simulation of an entire E. coli cell.
Skeptico: Uh, really, Signor Sysbio? How many of the interaction parameters have you measured?
Sysbio: In E. coli, maybe 1000 protein-substrate binding interactions and a similar number of protein-protein binding interactions.
Skeptico: Of the millions of interactions that are possible???
Sysbio: True, the number of parts in a living system is large, and the number of their potential interactions is still larger. But most parts inside of an E. coli cell do not interact with most other parts, meaning that most of the possible interaction parameters can be ignored (that is, set to zero).
Skeptico: Let's hope so.

Systems biologists must constructively believe that most of the possible interaction parameters are approximately zero, enough of the non-zero parameters can be measured, still more of those parameters can be constrained to a range imposed by observations, and the rest can be guessed.

Highly parameterized model fitting is dangerous

So what are the problems with using a highly parameterized computer program to simulate the whole from its parts? To illustrate, let me start with a chemical system much simpler than E. coli, remembering the time as a student when I first encountered the enormity of the problem.

In May, 1978, I attended a Leermakers Symposium at Wesleyan University. The topic of that symposium was the use of computation to model molecules. Many statesmen of science were present, including Henry Eyring, the inventor of transition state theory, and Frank Westheimer, the inventor of molecular dynamics, a kind of numerical simulation.

To students, then as now, nothing is more entertaining than seeing statesmen fight. At the Leermakers Symposium, we were more than entertained. The fight began when Westheimer presented a plot concerning a computation for the energy of a molecule called methylene. Never mind the details (they are not important to our discussion of scientific method). All that you need to know is that methylene has just eleven parts: two hydrogen nuclei, one carbon nucleus, and eight electrons.

The goal of the computation was to apply an underlying theory (quantum mechanics, which is undisputed within the community) to estimate a value for the difference in energy between two forms of methylene that differed only in the direction in which one of its eight electrons spins. Westheimer plotted that energy difference (blue points) on the y axis as a function of the year in which the computation was done.

[Figure: energy difference (y axis, -10 to 40) plotted against year of computation or experiment (x axis, 1950 to 1980). Annotations on the plot mark the period when no experimental value was known and published computed values scatter; the first experimental value, after which published computed values suddenly fall in line; the incorrect experimental value, after which a published computed value retrodicts the wrong value; and another correct experimental value, after which published computed values again fall in line.]

Values computed for the energy difference between two forms of the molecule methylene (CH₂, represented by blue points) tracked the experimental values (red points) after experimental values became known, but not before. This suggests that scientists doing computer simulations selected data to fit the hypothesis that they constructively believed, even if the hypothesis ended up being based on an incorrect experimental value. Data in this reconstructed plot are from Curt Wentrup's book Reactive Molecules.
Westheimer then used the plot to make a point about scientific method. Prior to 1971, various groups published values that they had computed for the energy difference of the two states of methylene. Those values ranged widely, from +40 to -15 kilocalories per mole. Never mind the units. They are not relevant to the discussion other than to say that the disagreement between various computations was rather large.

Then, in 1971, an experiment was done that provided an experimental value for that energy difference. That number (about +8.5) is represented by the first red point on the plot. Soon thereafter, four groups published calculations based on parameterized programs that reproduced the experimental value (the following four blue points in the plot). No longer did the published computed values vary widely. They were now all essentially the same as the experimental value.

Fair enough. As an anthropologist of science, you might hypothesize that just as the experiment was done, computational methods improved to allow computational chemists, for the first time, to calculate a correct value. The experiments are hard. The computations are hard. Both improved together.

However, in 1976, yet another experimental value was published. Represented by the second red dot in Westheimer's plot, this new experimental value was higher, +20. Today, it is accepted that this second, higher experimental value is incorrect. The community came to believe that the experimental observations were not interpreted correctly. Nevertheless, immediately after the incorrect experimental value was published, a computed value was published that reproduced the higher incorrect value.

Hmmmm, you might say as an anthropologist of science. The observations seem to be consistent with a less respectable hypothesis than the one just mentioned. It appears that computational chemists wrote many computer programs that generated a range of different values for this energy difference. Then, the chemists selected among those, publishing only those values that came closest to the experimental value (once an experimental value was known).
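The less respectable hypothesis can itself be put in cartoon form. The Python sketch below is a toy model, not a reconstruction of what any chemist did. It assumes that computations scatter uniformly across the -15 to +40 range mentioned above, and that only values within a hypothetical `tolerance` of the currently accepted experimental number get published. The function name, the number of computations, and the tolerance are invented for illustration.

```python
import random


def published_values(experimental_value, n_computations=200, tolerance=2.0,
                     seed=0):
    """Toy model of selection: compute many values, publish only the 'good' ones.

    The computed values know nothing about the experiment. The selection step
    alone makes the published values track the experimental number.
    """
    rng = random.Random(seed)
    computed = [rng.uniform(-15.0, 40.0) for _ in range(n_computations)]
    published = [v for v in computed
                 if abs(v - experimental_value) <= tolerance]
    return computed, published


if __name__ == "__main__":
    # The same pool of computed values, filtered against the two experimental
    # numbers quoted in the text (+8.5, and the later, incorrect +20).
    for experiment in (8.5, 20.0):
        computed, published = published_values(experiment)
        mean = sum(published) / len(published)
        print(experiment, len(computed), len(published), round(mean, 2))
```

Under these assumptions, the published values agree with whatever experimental number is fed to the filter, correct or not, which is the pattern in Westheimer's plot.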


Westheimer then noted that a few years later, a third experimental value was published. This value was back near the original (and correct) +8.5 experimental number. Immediately thereafter, three independent computations were published that reproduced that experimental value.

Westheimer concluded that the correlation between the published computational values and the known experimental values meant that published computations were driven by the scientists' knowledge of the experimental values. This was the only easy way to explain the systematic differences between the published output of computations done before any experimental numbers were known versus the computations that retrodicted the energy difference after they were known.

Westheimer pointed this out before an audience of scientists, many of whom were constructively committed to believing that computational chemistry was, well, a science. More to the point, if Westheimer's point was that the computational emperor had no clothes, then the people sitting in his audience were naked. As you might imagine, the chair throwing started immediately. Eyring opened:

Eyring: "Dammit Frank, if you are going to give a complete history of computation in chemistry you should have mentioned Polanyi's work, not to mention my work".
Westheimer: I did not give a complete history of computation in chemistry.
Eyring: That's for sure.

At this point, David Loewus, my lab mate sitting beside me, whispered in my ear "Oooh. Black Frank". This was a shorthand way of saying: "Westheimer has scored."

The fight went on for another 15 minutes, and was darn entertaining. But the attacks were ad hominem, about who should be given credit for what, or whose model was better. None were directed to the point that Westheimer had made: The values published appeared to have been selected from the many computed to give the values the simulators wanted.
To students of experimental science, as David and I both were, this was more than damning. If there is one thing that we were taught as students about method, it is that we are not allowed to select data from among many experiments to support the conclusion that we want to support.

Why not? As noted in Chapter 1, experiments can give a range of outputs for reasons that the experimentalist need not understand. The instrument making the observation might not be working correctly on Monday. Reagents might be contaminated on Tuesday. The test tubes might be dirty on Wednesday. A thousand other variables that we do not know of may give different results on Thursday. If we are allowed to pick the data that we want, constructing an ad hoc explanation to disregard the data that we do not want, then we can effectively draw any conclusion that we want.

Not that this does not happen in experimental science. Yet deep in the training of experimental scientists is the recognition that we must manage our personal propensity to believe what we want to believe. This requires first that experiments be repeated, regardless of the data that they produced. Next, if the results are different the second time we do an experiment, we may not simply pick the results that we want. We must do the experiment yet again. Indeed, we must repeat the experiment until the data are consistent, and we understand the reasons why they were not. If we abandon that discipline, we are no longer doing science.

Westheimer's point was that (at least some) computational scientists seemed not to have such a management process in place, at least when it came to calculating the energy of methylene.

Many scientists like me can recount in detail the time when they first learned to distrust parameterized computations.
For example, Freeman Dyson, the physicist whom we met in Chapter 5 considering the origin of life, recounts his student days when he suggested a multi-parameter explanation for a physical phenomenon to his boss, Enrico Fermi. Fermi was a physicist famous for having developed a nuclear chain reaction beneath a stadium in Chicago. Fermi rejected Dyson's suggestion because it had too many parameters. Rejection by one's boss is, in science and elsewhere, a powerful teaching tool.


Publication mechanisms drive data selection

So computational scientists selected data to fit their desires concerning the energies of various forms of methylene. Naughty naughty? Actually, not really, as it is hard to see how it could be otherwise in a highly parameterized computation. Here is why.

First, it is practically impossible to measure all of the pairwise interactions between all of the parts in a system having any interesting complexity. Further, even if we could, those measurements would have errors, and the computed output may be sensitive to those errors.

Further, the interaction between any two parts of a system generally depends on their distance and orientation (repeating the methodological aphorism from geography). This dependency can, in principle, be described by a mathematical function. But we generally cannot know this function in detail. Therefore, our computation will approximate that function. Thus, a computation starts with incomplete and approximate parameters.

Recognizing all of this, what would you do as a student if your computation did not retrodict a published experimental value for the energy difference in methylene? You know that the parameters in your simulation could not possibly be precise. So it is entirely justifiable for you to fiddle with the parameters until you get your computation to better approximate the "correct" experimentally observed value. After all, if your computation does not produce the "correct" observation, your parameters must be wrong (or worse), and you should adjust them.

And suppose you did not, but rather took the failed retrodiction to your Ph.D. supervisor. What would he/she do? Odds are, your supervisor would have you return to the computer and fiddle with the parameters until the simulation retrodicted more closely the experimental value. After all, your supervisor also constructively believes that the experimentally observed value is the correct one. If the simulation does not reproduce the correct value, then the parameters must be wrong. One should change wrong parameters.

Suppose you and your supervisor decided nevertheless to write up for publication a manuscript that described the simulation that did not retrodict the observed value. What would the editor for the journal that receives your manuscript do? Especially if the editor had a competing manuscript from a competing laboratory that did retrodict the right value (perhaps because a student in that laboratory had adjusted the parameters)?

Of course, the editor would reject your manuscript and publish the competing manuscript. Entirely justifiably, since one should not publish a simulation that did not retrodict the "correct" value, as its parameters must be wrong. At least the simulation behind the competing manuscript has a chance of having the correct parameters.

We now enter a new Darwinian realm involving survival in the world of academic science. The nearly universal academic Darwinian law is "publish or perish". Funds go to those who publish (and publish the right answer). No bucks, no Buck Rogers. Unless you and your supervisor fell into line with the standards of the community, you all would be one step closer to academic extinction.

Attempting to reverse engineer the model

But what about the reverse? Could we not use computers to surmise the correct parameters between parts of a complex system? Here is how we could do it, again in cartoon form. First, we would use numerical simulation to predict something observable about the system. Initially, the simulation would fail, as the parameters would be wrong. But no problem. We would just adjust the parameters until the output of the simulation matches the observations. Could we not get the correct parameters that way?

In principle, yes, and this hope stands behind much of systems biology.
But the pitfall of this approach is represented by a dictum that has been attributed to various individuals in various forms. In the form attributed by Freeman Dyson to John von Neumann: "With four parameters, I can fit an elephant. With five, I can make him wiggle his trunk."

Not as eloquent as Occam's razor, and not entirely accurate (see the box). But the dictum captures the difficulties encountered when we attempt to use simulation to infer how the parts interact from observations of the whole. Many different combinations of parameters get the same result.

Take a piece of paper and a pencil. Scrawl across it something that you think looks like an elephant. Give your scrawl to a mathematician. He/she will then generate a mathematical equation that approximates your scrawled elephant.

The first equation might simply describe your drawing of the elephant as a circle. One parameter describes a circle in an xy plane (x² + y² = radius²); it is the radius, if we do not care where on that plane our model for the elephant drawing exists. In fact, if we are not worried about the size of our model for the elephant (something that a scientist would call the "scale"), the only choice that our mathematician friend must make in modeling our elephant at this level is to say: "I have chosen to approximate your elephant drawing as a circle".

The next more advanced description might approximate the elephant as an ellipse. This requires the choice of an equation ("let us describe the elephant as an ellipse") and one parameter (how oval the ellipse is). The next more advanced description would have more parameters.

How many parameters does it take to fit an elephant, defined in a "connect the dots" drawing with 36 dots (top)? Here are shown fits with 5, 10, 20, and 30 parameters (B, C, D, and E); a model with 30 parameters (bottom right) might not satisfy a middle school art teacher, but would carry most engineers into a preliminary design. But these are not the only set of 30 parameters that will get a "good" fit to your elephant, and it is subjective what a "good fit" is.
These would give an ellipse with a wiggle, and so on. With each parameter added to a model, the fit to your elephant will become better. Eventually, the model will become indistinguishable from your drawing. At the same time, the number of parameters will become large. People who have studied the problem (yes, they exist) can get a fairly good representation of an elephant with about 30 parameters. So why will a similar process not allow us to model the whole system from its parts? Why not add parameters to a computer model of a living cell until it reproduces some experimentally measured property, such as rate of growth as a function of food added? If we reproduce the observed behavior perfectly, will we not have inferred the correct interaction parameters for the parts? Actually, no. It turns out that an output (like growth rate) can be generated by a very large number of different sets of parameters. Further, any observations made on a real system's behavior will include noise. By adding more parameters until the model fits perfectly the observations, we will have succeeded only in fitting the noise in the observations, not their information. The noise, of course, will not be the same the next time we observe, even if we observe the same system. This means that as a model is refined to contain more and more parameters so that it fits known observations better and better, the model comes to fit future observations less and less. Further, within the noisy observation are models that use a different set of parameters, and a different set of equations that use those parameters. In fact, an infinite number of parameterized mathematical models can define an arbitrarily precise observation.
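The claim that a perfect fit captures the noise rather than the information can be tried out directly. The Python sketch below is a cartoon under several assumptions that are not from the text: the "real system" is a straight line, the noise is Gaussian, and the heavily parameterized model is a polynomial with as many parameters as there are observations, so that it passes exactly through every point. It is compared with a two-parameter straight-line fit. Both models are fitted to one set of noisy observations and then scored against a second set of observations of the same system. All names and numbers are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Two-parameter model: least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def lagrange_eval(xs, ys, x):
    """Many-parameter model: the polynomial through every (xs, ys) point."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def mse(predicted, observed):
    """Mean squared difference between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def compare_models(trials=2000, n_points=8, sigma=0.5, seed=0):
    """Average errors on the fitted observations and on the next observations."""
    rng = random.Random(seed)
    xs = [float(i) for i in range(n_points)]
    truth = [2.0 * x + 1.0 for x in xs]  # assumed "real" behavior
    totals = {"line_fit": 0.0, "line_next": 0.0,
              "poly_fit": 0.0, "poly_next": 0.0}
    for _ in range(trials):
        first = [t + rng.gauss(0.0, sigma) for t in truth]   # data we fit
        second = [t + rng.gauss(0.0, sigma) for t in truth]  # the next look
        slope, intercept = fit_line(xs, first)
        line_pred = [slope * x + intercept for x in xs]
        poly_pred = [lagrange_eval(xs, first, x) for x in xs]
        totals["line_fit"] += mse(line_pred, first)
        totals["line_next"] += mse(line_pred, second)
        totals["poly_fit"] += mse(poly_pred, first)
        totals["poly_next"] += mse(poly_pred, second)
    return {name: value / trials for name, value in totals.items()}


if __name__ == "__main__":
    for name, value in compare_models().items():
        print(name, round(value, 4))
```

In this cartoon the eight-parameter model has zero error on the observations used to build it and a larger error than the straight line on the next set of observations. Real models and real noise are messier, but the direction of the effect is the one described above.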


One way out of this dilemma is to make the number of observed measurements and their precision outrun the number of parameters used in the model for the living system. For that to happen, the number of measurements must be large compared to the number of parameters. The challenge to systems biology arises from the need for millions of measurements to win the race with complexity.

This is heavy stuff, for everyone, not just those who hope to learn about life by putting parts back into a whole using a computational model. It is not just that non-scientific considerations influence the published output of scientists. Nor is it simply that the value of a scientific statement is determined in part by the mental state of the individual who makes it. No one who attempts to formalize the practice of science likes that.

Rather, the problem arises because things cannot be any other way. We have no philosophical reason not to ignore computational data that disagrees with observation, because we know when we begin the simulation that the parameters are wrong. Further, a computational experiment exactly reproduced always produces the exact same outcome. Therefore, a failed computation provides less of a constraint on the human propensity to self-deception than a failed experiment. No amount of admonition, education, or peer review can change this fact.

This means that we should not set out to create a utopian scientist (computational or experimental) who is always objective, always observes the environment objectively, always constructs models objectively, and always tests them objectively. Rather, the goal is to construct scientific methods that manage the natural non-objectivity of real human scientists, for computational science as well.

Managing propensities to believe computers

To this end, we must first manage a feature of the human species: a propensity to believe computers. No matter how skeptical individual humans are, distrusting everyone and everything from used car salespeople to televised political advertisements, humans will believe essentially anything that a computer tells them. I have no idea how this propensity evolved. It is certainly not Darwinian.

Anyone who lives in my part of the United States nervously watches the approach of hurricanes from time to time. We recognize that our ability to survive, get married, and have children can depend on our ability to get out of the way of a particularly bad hurricane.

Computer simulations predicting the path of Hurricane Paloma in November, 2008. Different lines reflect different models; the point from which the lines spread out is where all of the models begin their predictions. (top) The prediction on Saturday starting from the then current position of the hurricane just southwest of Cuba. (bottom) The prediction on Sunday starting from the then current position of the hurricane, over Cuba. Where would you go to preserve your capability for Darwinian evolution? Predicting hurricane paths is hard.


Accordingly, we have more than enough motivation to distrust parameterized computational models. And more than enough experience with computer models that predict the track of a hurricane to know their unreliability. We have a joke in Florida: If the computer simulation tells you that the hurricane will come ashore in Tampa in three days, the only thing that inductive logic tells us is that the hurricane will not come ashore in Tampa in three days.

Nevertheless, if I see on the web a prediction made by a computer for the future course of a hurricane, I have to bang myself on my head to keep myself from believing it. Here is how the hypothalamic-limbic section of my brain sees things. These are computers. They have billions and billions of circuits operating at the speed of light. They must be right.

I cannot blame my instinct on a lack of education. Nor can I blame the education of the scientists who do numerical simulations that predict the path of hurricanes. Everyone understands that this is a difficult problem. As I prepared this book, I went to the library to review textbooks in statistics, modeling, and numerical simulation. They all discuss the challenges outlined above facing those who attempt to discover cause-and-effect laws using parameterized models that retrodict data through simulations.

Some simulations in biological molecules

So maybe numerical simulation cannot today put back together the parts of an entire biological system like an E. coli cell to deliver an understanding of the whole. What about the parts of those systems, such as a single protein molecule inside a cell? Can we simulate single molecules from biological systems from their parts, hoping to capture rules that define the essence of their biomolecular wholes? If so, we will leave for later the task of re-synthesizing the whole of the cell from the other parts.

[Figure: chemical structure of a segment of a protein chain, with its amide units labeled.]
Attempts to understand how proteins behave in terms of the interactions between their amino acid parts is where my professional interaction with parameterized computational models began two decades ago. First, some background. As described in Chapter 4, analyses of proteins determine the sequence of their constituent amino acids. This sequence is the starting point for any effort to understand the protein as a whole. However, the linear sequence is generally not the ending point. The biologically important form of a protein that contributes to our ability to survive, select a mate, and reproduce is generally the folded. In the fold, the linear chain of amino acids scrunches itself up into a ball. This means that if To illustrate how proteins with we are ever going to use computation to understand life as a universal, a repeating dipole fold, get at the very least we should be able to go (as the first step) from the eight magnets with north (red) amino acid sequence of a protein to its folded form. and south (blue) poles and tie Why do proteins fold? As shown in Chapter 6, each amino acid in a them on a string (left). Unless protein is joined to its neighboring amino acids by -CO-NH- units you are careful, they ill find each other and collapse to give ("amide" units). Amide units have an uneven distribution of electrons. a folded structure (right). The oxygen (O) atom of the amide has a partial negative charge, while the hydrogen (H) atom of the amide unit has a partial positive charge. Therefore, a chain of amide -CO-NH- units behaves like a string of magnets on a string, where the north and south poles of the magnets are analogous to the positively and negatively charged ends of the amide dipole. The string has a hard time staying extended because the north poles of some of the magnets attract


the south poles of others; the entire thing collapses. Likewise, a protein has a hard time staying extended because the positive charges of some of the amide units attract the negative charges of others; the linear protein collapses to give a folded form. In Chapter 6, we noted that folding is not desired in a genetic biopolymer. A genetic biopolymer must be extended to permit templating as part of a process by which it is copied. According to the polyelectrolyte theory of the gene, this requires a repeating charge in the backbone, not a repeating dipole. With a repeating backbone charge, folding is obstructed. With a repeating backbone dipole, folding is facilitated.

The level of models that allows the prediction of protein folds

Can we predict how proteins fold? After all, this is the first step in trying to understand the whole protein after we have killed it by dissecting it into its parts. Let us reintroduce ourselves to Linus Pauling. This is the same Linus Pauling who helped found molecular evolution with Emile Zuckerkandl and the same Linus Pauling who discovered the genetic defect in hemoglobin that causes sickle cell anemia. This time with Robert Corey, Pauling asked how the repeating dipoles in the backbone amides of a protein chain might interact with each other to help the chain fold. Pauling and Corey used two simple theories: (i) opposite charges attract, and (ii) the atoms in the folded structure cannot sit on top of each other. With these, they set out to find ways to fold a protein chain so that the positively charged ends of some amides got close to the negatively charged ends of other amides without having atoms sit on top of each other.

An alpha helix. The white hydrogens form pink intra-chain hydrogen bonds to red oxygens.

To do so, Pauling and Corey made sure that they first understood the parts, using every analytic technology of their time. They made careful observations (using X-rays) of the arrangement of atoms in amide linking units.
They noticed that those atoms all lay in a plane. They carefully observed the distances between those atoms in the parts.

Linus Pauling (1922) as he graduated from what is now Oregon State University.

Then, Pauling and Corey considered two ways that the partial positive and negative charges in the backbone amides might interact. In the first, amides in the protein chain interacted with amides nearby in the chain. Here, the need to avoid the superimposition of atoms was important. The amide joining amino acids 1 and 2 cannot interact with the amide joining amino acids 2 and 3 without having atoms overlap and bonds bend in ways impossible for real organic molecules. However, it is possible for the oxygen of the amide unit joining amino acid 1 to amino acid 2 to interact with the hydrogen atom of the amide joining amino acid 4 to amino acid 5. Once this interaction takes place, the oxygen of the amide unit joining amino acid 2 to amino acid 3 can interact with the hydrogen atom of the amide joining amino acid 5 to amino acid 6. Continuing the pattern, the oxygen of the amide unit joining amino acid 3 to amino acid 4 interacts with the hydrogen atom of the amide joining amino acid 6 to amino acid 7. These interactions cause the protein chain to curl around to give a spiral structure. Pauling and Corey called this spiral structure an alpha helix. In an alpha helix, the side chains of the amino acids all point outwards to the water surrounding the helix.
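The pairing rule just described (the oxygen of the amide joining amino acids 1 and 2 meets the hydrogen of the amide joining amino acids 4 and 5, and so on down the chain) is regular enough to write out mechanically. The following is only a minimal sketch of that bookkeeping for an idealized helix; the function name `helix_hydrogen_bonds` and the tuple representation of an amide are illustrative choices, not anything from Pauling and Corey.

```python
def helix_hydrogen_bonds(n_residues):
    """List the backbone hydrogen bonds of an idealized alpha helix.

    Amino acids are numbered from 1. The amide joining amino acid k to
    amino acid k + 1 is written (k, k + 1). Following the pattern in the
    text, the oxygen of amide (k, k + 1) pairs with the hydrogen of
    amide (k + 3, k + 4).
    """
    n_amides = n_residues - 1          # a chain of n residues has n - 1 amides
    bonds = []
    for k in range(1, n_amides - 2):   # stop while amide k + 3 still exists
        oxygen_amide = (k, k + 1)
        hydrogen_amide = (k + 3, k + 4)
        bonds.append((oxygen_amide, hydrogen_amide))
    return bonds


if __name__ == "__main__":
    for oxygen_amide, hydrogen_amide in helix_hydrogen_bonds(7):
        print("O of amide", oxygen_amide, "-> H of amide", hydrogen_amide)
```

For a seven-residue chain this lists exactly the three interactions named in the text. A chain of four residues has no pair of amides far enough apart, so the list is empty.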


Next, Pauling and Corey considered the possibility that the amide dipoles from one part of a protein chain might interact with amide dipoles from distant parts of that chain. They considered two models. In the first, the two interacting protein chains run in the same direction. This they called a pair of parallel beta strands. In the second model, the two protein chains run in opposite directions. Pauling and Corey called this a pair of antiparallel beta strands. In both cases, the amide N-H units from one strand, where the hydrogen atom is partially positively charged, interact with the amide C=O units from the other strand, where the oxygen is partially negatively charged. Additional strands could be added to the first two to give a beta sheet composed of three or more strands. Pauling and Corey gave the interaction between partially positively charged hydrogen atoms and partially negatively charged oxygen atoms a special name. They called this interaction hydrogen bonding. A hydrogen bond can be formed by any hydrogen atom attached to a more electronegative atom (a nitrogen or an oxygen, sometimes called a heteroatom), and another heteroatom (a nitrogen or oxygen). We will see many hydrogen bonds in a few minutes when we discuss DNA.

1 2 3 + R - R + R + + 4 5 6 - R + R - R + + +   

+

-

H

O

N

C

C

HR

C

N

H C

H

O

hydrogen bond between these

-

+

RH

O

H

C O

+

N H

+

-

+ RH

C

C HR

N

C

HR

-

N

O C

C

C

C

H

N H

+

C O

-

+

RH

O

RH N

-

+

O

H

C

C

N

HR

amide unit

The hydrogen atoms in the N-H units have a partial positive charge, represented by +. These +'s repeat going down the protein chain, and stand ready to interact with the repeating C=O units of another chain, where each oxygen atom has a partial negative charge, represented by -. This holds together two strands of an antiparallel beta sheet. The natural ability of a polymer with a repeating dipole to fold, the hypothesis that folding is essential for catalysis, and the hypothesis that catalysis is essential to life together suggest that polymers with repeating dipoles will be found in life universally.

The protein fold may be a universal in catalytic biomolecules

Pauling and Corey actually predicted the folding of protein molecules. They did not first know that proteins form helices and strands, and then retrodict these. Further and remarkably, their predictions were essentially correct. As observations were subsequently made of real proteins, protein chains were found to fold to give alpha helices and beta strands. These assemblages of amino acid parts came to be called secondary structural elements. And where were the computers? Nowhere. No computations stood behind the Pauling-Corey models. Instead, their models were built of metal and plastic, like the plastic brains in school science classrooms. This had advantages. Because the modeling was done "by hand" and involved a good bit of human expertise, the resulting models provided a human-scale "understanding" of what was going on. This is quite different from computer simulations, where "what is going on" is buried deep within thousands of lines of computer code that are generally inaccessible to the human, who therefore must accept the computer's output without understanding its "Why?" Last, it was impossible not to be impressed by the simplicity of the theory that Pauling and Corey applied. The only theory was that opposite charges attract and that atoms cannot bump into each other. The simplicity and elegance of the helices and strands formed by proteins made many statements that had rhetorical impact, then as now. First, in less than two centuries since Scheele isolated lactic acid from sour milk, and less than a century after the twenty amino acid constituents of proteins had been identified, the simplest theories in chemistry were adequate to predict the folding of the catalytic and structural molecule of all terran life. By doing so, the exercise defined the level of theory that created understanding, defined here as being what confers predictive power.


Finally, any scientist looking at these secondary structural elements can easily accept the proposition that these structures must be close to universal. While the universal genetic molecule (which should not fold) should have a repeating charge in its backbone, as discussed in Chapter 6, the universal catalytic molecule (which should fold) should have a repeating dipole. Klingons, Vulcans, and Ferengi would have something like our proteins in this respect, even if their amino acids were not the same as the amino acids used on Earth. Faaaaantastic! Another bit of a possible answer to the theme of this book.

Can we predict more details of protein folding from basic theory?

It was not long before scientists took the next step. They knew that some segments of a protein chain fold to give helices. Others fold to give strands. The remainder did neither (we call these neither-nor structures coils). Suppose we wanted to know how a particular protein folded. After all, this is the next step in understanding the whole of this particular part after we have killed it by dissection. The Pauling-Corey models showed the power of theory in chemistry and the rapid progress it had made. By 1960, to get the whole protein fold, it seemed as if we needed to decide only which segments within the protein folded as helices and which folded as strands. Once this is done (and this would seem to be the hard part), we would simply need to assemble the 10-20 predicted helices and strands into an overall protein fold. This would be the fold that actually creates the function that allows the host to survive, select a mate, and reproduce! To implement this vision, a few ideas were needed beyond those proposed by Pauling and Corey. All of the amino acid backbone amides were essentially the same. Given this similarity, it was clear that something else must determine which segments of a protein folded as helices and which folded as strands.
Fortunately, those who studied proteins were assembling data that might address such questions. Many proteins, it turns out, form crystals. Crystallization arranges many protein molecules in a uniform pattern in space. By diffracting X-rays through the crystal, crystallographers could determine the arrangements of atoms within those proteins. In those arrangements one saw helices, strands and coils. Since the sequence of the protein was also known, it was possible to say, one protein at a time, which amino acids lay in helices, which amino acids lay in strands, and which amino acids lay in coils. This suggested an opportunity for statisticians. Statisticians, without needing to know much chemistry, might hope that some amino acids would prefer to form helices, while others would prefer to form strands. Still others would prefer to form coils. As a typical protein has 300 amino acids, each crystal structure would provide about 300 data points to parameterize a tool to discern "rules" for folding proteins. To this end, statisticians began to generate parameterized computer tools to predict protein folds, associating each of the 20 amino acids with three parameters. For example, for the amino acid lysine, the first parameter would represent the probability that lysine would be found in a helix. The second parameter would represent the probability that lysine would be found in a strand. The third parameter would represent the probability that the lysine would be found in a coil. One amino acid, three parameters. Then, we would do the same thing for glutamate; three more parameters would indicate analogous probabilities for glutamate. Twenty amino acids times three kinds of secondary structure gives 60 parameters. Unfortunately, things were not so simple. It turned out that the propensities, as these were called, for individual amino acids to be present in the three different folds (helices, strands, and coils) were not terribly high.
Thus, glycine and proline were found in coils with higher probability than lysine or glutamate, which were more likely to be found in helices. But the probabilities were not overwhelming. Therefore, predictions of secondary structures based on propensities were not very good.
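The "one amino acid, three parameters" scheme can be made concrete with a few lines of counting. What follows is a minimal sketch, not the procedure of any published tool: the two toy "crystal structures" in `TOY_EXAMPLES` are invented for illustration, and the one-letter state codes (H for helix, E for strand, C for coil) and the function names are assumptions made here.

```python
from collections import Counter, defaultdict

STATES = "HEC"  # H = helix, E = strand, C = coil (labels assumed for this sketch)


def fit_propensities(examples):
    """Estimate, by counting, how often each amino acid sits in each state.

    examples: list of (sequence, labels) pairs of equal-length strings,
    one state letter per residue.
    Returns {amino_acid: {state: fraction}}, i.e. three parameters per
    amino acid seen in the data.
    """
    counts = defaultdict(Counter)
    for sequence, labels in examples:
        for residue, state in zip(sequence, labels):
            counts[residue][state] += 1
    table = {}
    for residue, seen in counts.items():
        total = sum(seen.values())
        table[residue] = {state: seen[state] / total for state in STATES}
    return table


def predict(sequence, table):
    """Give each residue its single most frequent state (coil if never seen)."""
    result = []
    for residue in sequence:
        fractions = table.get(residue)
        if fractions is None:
            result.append("C")
        else:
            result.append(max(STATES, key=lambda state: fractions[state]))
    return "".join(result)


# Hypothetical training data: invented sequences with invented labels.
TOY_EXAMPLES = [
    ("KEKEGP", "HHHHCC"),
    ("VIVGKE", "EEECHC"),
]

if __name__ == "__main__":
    table = fit_propensities(TOY_EXAMPLES)
    print(table["E"])              # glutamate: mostly helix, sometimes coil
    print(predict("KVGE", table))
```

With all 20 amino acids present in the training data, the table would hold the 60 parameters counted in the text. Note that the predictor looks at each residue in isolation, which is exactly the limitation the text goes on to describe.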


But statisticians are drawn to parameters like moths are drawn to headlights. Certainly, it was reasoned, whether an amino acid at a position i wanted to form a helix or a strand would be influenced by the amino acid that was before it in the chain (say, at position i minus one), and by the amino acid that was after it in the chain (say, at position i plus one). This gave six more parameters, three for the amino acid that lay before the site of interest, and three for the amino acid that lay after the site of interest. Unfortunately, parameterized tools that included the interaction between the part before and the part after the part of interest also proved not to be predictive. To try to make predictions better, P. Y. Chou and Gerald Fasman at Brandeis added some real chemistry to the statistics. They noted that since it took four amino acids to form one turn of an alpha helix, perhaps one should seek four amino acids in a row that had high propensities to form helices before predicting a helix. Thus, they aggregated parameters for amino acids i + 2, i + 3, and i + 4 in the protein chain. But even this still more parameterized tool did not work well. One of the most ambitious efforts to parameterize a computer tool to predict protein folds came from Jean Garnier, David Osguthorpe and Barry Robson. This came to be called the GOR prediction method (the acronym GOR combines the first letters in the last names of its three developers). These scientists argued that amino acids present in a protein sequence up to ten sites before and up to ten sites after might influence the fold at a site. They set out to develop an "information theoretic" model that extracted the dozens of parameters needed to describe those influences. They then wrote a computer program incorporating those parameters to predict where helices and strands lay in any protein sequence. The GOR method departed from the scientific methods of Pauling and Corey.
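The "four in a row" idea attributed above to Chou and Fasman can be sketched as a sliding window. This is only an illustration of the windowing step as the text describes it; the published Chou-Fasman procedure has additional rules that this sketch does not attempt to reproduce. The numbers in `HELIX_PROPENSITY`, the threshold, and the function name are all hypothetical.

```python
# Hypothetical helix propensities: made-up numbers for illustration only,
# not the published Chou-Fasman values.
HELIX_PROPENSITY = {
    "A": 0.70, "L": 0.65, "K": 0.60, "E": 0.60,
    "V": 0.35, "S": 0.30, "G": 0.20, "P": 0.10,
}


def helix_windows(sequence, threshold=0.5, window=4):
    """Mark residues that lie in a run of `window` consecutive residues
    whose helix propensities all exceed `threshold`.

    Returns a string with 'H' at marked positions and '-' elsewhere.
    Residues missing from the table are treated as propensity 0.
    """
    marked = [False] * len(sequence)
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        if all(HELIX_PROPENSITY.get(residue, 0.0) > threshold for residue in segment):
            for position in range(start, start + window):
                marked[position] = True
    return "".join("H" if flag else "-" for flag in marked)


if __name__ == "__main__":
    print("GKEALPKES")
    print(helix_windows("GKEALPKES"))
```

In this toy sequence only the stretch KEAL passes the test; the proline that follows breaks every later window.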
Instead of using underlying chemistry, the GOR tool required that protein sequences as strings of letters speak to fold. But did the GOR tool work? Of course, one is not allowed to test a parameterized prediction tool by retrodicting the folds of exactly the same proteins that had been examined to generate the parameters. This reasoning would be cleanly circular, something recognized to be incorrect scientific method in any field. Instead, the GOR method was tested using a process known as "cross validation". This process used some of the known protein folds to generate parameters for the prediction program. Then, the remaining protein folds were retrodicted using the parameterized program. This process is repeated with different sets of the known folds used to modify the parameters, with the remainder used to test the program. When tested by retrodiction, the published evaluations of the GOR program showed it to be very successful. However, whenever the parameterized program was challenged by a new protein, it failed to predict secondary structure with the same success. One commentary wrote: "There is the sobering experience that the 60-70% successful predictive schemes of the literature, tested on known crystal structures, appear to operate in the unsuccessful percentage range whenever they are applied to a real case of interest." Why? Westheimer's Leermakers lecture provides a hint to the answer. Knowledge of the correct answer provides information even when the observations used to parameterize are kept separate from the observations used to test the parameterized program. At the very least, knowing the correct answer tells the computational biochemist when to stop parameterizing the secondary structure prediction program. Again, it could not be otherwise. If the program does not retrodict the known protein folds correctly, the student should continue to parameterize until it does.
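The mechanics of cross validation as described above (parameterize on some known cases, retrodict the rest, rotate, repeat) are simple to sketch. The code below is a generic illustration, not the GOR evaluation itself: the "model" is a deliberately trivial majority-label guesser, and `TOY_LABELS` and all function names are invented for this example.

```python
from collections import Counter


def k_fold_splits(items, k):
    """Yield (training, testing) lists so that each item is tested exactly once."""
    folds = [items[i::k] for i in range(k)]
    for i, testing in enumerate(folds):
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, testing


def cross_validate(items, k, fit, score):
    """Average the test score over k folds.

    fit(training) returns a model; score(model, testing) returns a number.
    """
    scores = [score(fit(training), testing)
              for training, testing in k_fold_splits(items, k)]
    return sum(scores) / len(scores)


def fit_majority(training):
    """Trivial stand-in for 'parameterizing': remember the most common label."""
    return Counter(training).most_common(1)[0][0]


def score_majority(model, testing):
    """Fraction of held-out labels that equal the remembered label."""
    return sum(1 for label in testing if label == model) / len(testing)


# Hypothetical observed states for six residues (H helix, E strand, C coil).
TOY_LABELS = ["H", "H", "C", "H", "E", "H"]

if __name__ == "__main__":
    print(cross_validate(TOY_LABELS, 3, fit_majority, score_majority))
```

The sketch keeps training and testing data apart within each round, which is all that cross validation formally guarantees. It cannot show the subtler leak the text points to: a modeler who sees the averaged score and keeps adjusting the tool until that score looks good has, in effect, used the held-out answers after all.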
And if the student does not, his/her competitor in the academic ecosystem will, and will publish. The student, and his/her supervisor, will lose funding. A Buck Rogers with no bucks, academic extinction will soon follow. Today, we know that the chemical reality behind protein folding does not lend itself to such a simplistic analysis. Whether a lysine, for example, is found in a coil, a helix, or a strand depends on its interaction with many other amino acids in the protein sequence and its interaction with the solvent (water) that surrounds the protein. For example, an amino acid buried within a protein fold and away from the water that dissolves the protein interacts with as many as ten other amino acids. Many of the interacting parts do not


come from the immediately adjacent part of the protein chain, even if the preceding and trailing ten amino acids are considered. These interactions are not captured by any rule that assigns propensities to individual amino acids. But those interactions must be considered to predict the structure of a protein.

And so, following the glimpse into the universal provided by Pauling and Corey, momentum in the field of protein structure prediction vanished. Throughout the 1980's, the field dragged, with computer modelers generating parameterized prediction tools that were claimed to be successful using retrodiction, only to find that as new protein folds were observed, the tools failed. Reviewing the state of the field in 1992, Mary Purton and Timothy Hunt wrote: "The ability to predict folding patterns from amino acid sequences is still, we understand, more a matter for soothsayers than scientists, despite lavish support from optimistic protein and drug designers." Ouch.

Unconventional approaches to the folding problem

My group entered the protein folding problem from a different direction. As you might expect, our approach was grounded in evolutionary considerations rather than data mining or numerical simulation. It was based on one generalization and two ideas that had, by 1990, been supported by many observations: (a) As proteins within a family of proteins divergently evolve under functional constraints, their amino acid sequence diverges faster than their fold. In fact, homologous proteins that have different amino acids at as many as 70% of their sites still have folds that are indistinguishable to the human eye. (b) How fast a site suffers amino acid replacement depends on where it is in the fold. Changing an amino acid that lies inside of the folded structure is more likely to disrupt the fold (and the function) of the protein than changing an amino acid that lies on the surface.
Therefore, amino acids inside the fold are more likely to be conserved than amino acids on the surface of the fold. (c) Given a set of proteins whose sequences have diverged under functional constraints, one can infer which amino acids lie (in this case) on the surface of the fold and which lie inside of the fold, simply by looking at the number of amino acid replacements per unit time. It actually turns out to be more complicated than this, but you get the idea. By observing the rates and patterns of amino acid replacement over the evolutionary history of a protein family, one can extract information about which amino acids are on the outside of the fold, and which are inside the fold. The tools discussed in Chapter 4 are ideal to learn about these rates and patterns.

"Inside" and "surface" patterns predict protein secondary structure

Once we know the orientation of the side chains of amino acids relative to the inside and outside of a protein in a segment of sequence, we can infer whether that segment folds as a helix, strand or coil. To illustrate, consider the secret code that you may have used in your youth. Wrap a strip of paper around a stick of a certain size. Write a secret message on the paper once it is wrapped. Then, throw away the stick and send the paper by carrier pigeon to your friend. The meaning in the message cannot be extracted by any extraterrestrial who might grab it in flight. But your friend, having a stick of the same size, will be able to read the message simply by wrapping the linear sequence around that stick. The same can be done to predict where surface helices lie in a protein. All one needs to do is wrap the sequence around in a spiral having the dimensions of an alpha helix, which completes a turn

A strip of paper holding apparently meaningless letters (above) reveals its secret message when wrapped around a stick having the correct dimensions (below)


every 3.6 amino acids. If the amino acids inferred to lie on the surface end up on one side of the putative helix and the amino acids inferred to lie inside the fold on the other side, one can predict a helix. For example, suppose we infer from the evolutionary history of a protein family that the side chains of the amino acids in a segment of a protein lie, with respect to the fold, in the pattern: "inside-surface-inside-inside-surface-surface-inside-inside-surface-inside". We predict that this segment of the protein chain forms a helix. In contrast, if the pattern is "inside-surface-inside-surface-inside", we infer that the segment forms a beta strand, because the side chains in a strand alternately lie above and below the strand.
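The "wrap it around a stick" test can be written as a small calculation. The sketch below is one simple way to do it, chosen here for illustration and not taken from the author's published method: each residue is placed at an angle around an axis (100 degrees apart for a helix with 3.6 residues per turn, 180 degrees apart for a strand), inside positions count +1 and surface positions count -1, and the segment is assigned to whichever geometry sorts the two kinds onto opposite sides most cleanly. The function names and the single-letter codes `i` and `s` are assumptions.

```python
import cmath
import math


def periodic_moment(pattern, degrees_per_residue):
    """Length of the vector sum of inside (+1) and surface (-1) labels
    placed at successive angles around an axis.

    A large value means that, at this spacing, inside and surface
    residues fall on opposite sides of the axis.
    """
    step = math.radians(degrees_per_residue)
    total = 0j
    for position, label in enumerate(pattern):
        if label not in "is":
            raise ValueError("pattern may contain only 'i' and 's'")
        weight = 1 if label == "i" else -1
        total += weight * cmath.exp(1j * step * position)
    return abs(total)


def classify_segment(pattern):
    """Return 'helix' or 'strand' for a string of 'i' (inside) and
    's' (surface) assignments. Coils are not modeled in this sketch."""
    helix_score = periodic_moment(pattern, 360 / 3.6)  # about 100 degrees per residue
    strand_score = periodic_moment(pattern, 180)       # side chains alternate faces
    return "helix" if helix_score > strand_score else "strand"


if __name__ == "__main__":
    print(classify_segment("isiissiisi"))  # the ten-residue pattern from the text
    print(classify_segment("isisi"))       # the alternating pattern from the text
```

Applied to the two patterns quoted above, the sketch gives the same assignments as the text. As the author notes, the real analysis is more complicated than this; the sketch ignores coils entirely and treats every inside or surface assignment as certain.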


Testing tools using evolutionary history to predict protein folds

Like Pauling and Corey, who used very little theory to predict helices and strands, our approach for predicting protein secondary structure relied on little theory. We knew, of course, of the existence of helices, strands, and coils, as well as their dimensions. Our approach also drew heavily on evolutionary theory and the availability of families of protein sequences. But beyond that, we used neither millions of parameters nor numerical simulations.

If one looks down the long axis of an alpha helix, one observes that the side chains in a segment with an assignment pattern "inside-surface-inside-inside-surface-surface-inside-inside-surface-inside" sort themselves nicely into a surface of the helix that makes contact with the protein (left) and a surface that contacts water (right).

But how were we to test our approach? We certainly could not use our tool to retrodict helices and strands in proteins as a convincing test of our tool. This would certainly not persuade you, who have paid good money to buy this book, and suffered through six chapters to get to this point. You know better. Accordingly, in the late 1980's, we joined many individuals who were attempting to change the culture in the community of scientists working to predict protein folds from protein sequences. We began by proposing, and then insisting, that protein structure prediction programs be tested not by retrodicting the folds of known proteins, but rather by making blind predictions of unknown folds. We predictors needed to stick our necks out. We needed to make and publish a prediction before we knew whether it was correct. If our prediction was wrong, that had to be plainly obvious to everyone.
Accordingly, we asked some friends at Sandoz, a pharmaceutical company in Switzerland, what they considered to be the most interesting protein, from a medical perspective, whose structure was not already known (but might soon become known). In an instant, an answer came back: protein kinase. Protein kinase is a protein that takes a phosphate group from ATP and transfers it to another protein. This transfer is key to regulating growth of healthy cells. Further, the process becomes uncontrolled in many kinds of cancer. Scientists at Sandoz were targeting the protein kinase for anti-cancer drugs; drugs such as Gleevec have subsequently emerged that work by binding to single protein kinases. The family of protein kinases already in 1990 had over 70 members with known amino acid sequences. These sequences aligned nicely. This meant that we could easily learn something about the historical patterns of replacement and conservation at different sites in the protein family. Applying our approach using evolutionary analyses, we predicted the strands and helices in protein kinase. We missed a long internal

286

313

274

301

307 7

active site 262 12

152 153

131 22

Peptide Binding Site 241

156

42

201

186

124

general base g-phosphate 46

182

237

ATP Binding Site

48

115

61

161

67

Peptide Binding Site

74

163 177

111

84

g M

+ +

helix in the second part of the fold. Nevertheless, we were able to predict that strands in the first part of the protein aligned in an antiparallel fashion to give a nice antiparallel beta sheet. I wrote up a manuscript describing the prediction in Madrid in the summer of 1990, with the window open to the sounds of a bullfight in the distance. Dietlind Gerloff, now on the faculty of the University of California at Santa Cruz (and the creator of one of the cartoons in this volume) finished it off. Together with a description of how the prediction was done, the overall report was some 200 manuscript pages.


How do you publish a prediction?

We immediately encountered a sociological problem. A prediction is no good until it is published. We predictors need to run the risk of being embarrassed by failure. But how could we publish our manuscript predicting the structure of protein kinase? We approached several journal editors, but got back a disappointing response: The editors would publish our prediction, but only if it was known to be correct. Their logic was inarguable; certainly one should not publish incorrect predictions. However, if only correct predictions are published, another epistemological problem would emerge: The published literature would convey the impression that prediction methods are very good, no matter how bad they actually are. Even random predictions will, from time to time, be correct. If we publish just those, it will appear as if all predictions are correct. This offers another example of how scientific methods associated with publication distort the science.

Predicted fold of protein kinase published in 1990 in Advances in Enzyme Regulation.

Fortunately, I had met a few years earlier George Weber, a distinguished cancer researcher at Indiana University. George was also interested in the evolution of disease, and we hit it off splendidly. George ran an annual conference in Indianapolis that generated a book in a series entitled Advances in Enzyme Regulation. Participants in the conference were expected to write a chapter for the book. That chapter had no page limits. The chapters were not refereed by our peers. This, Dietlind and I thought, was an opportunity to publish our protein kinase prediction before we could know whether or not it was correct. We therefore converted our 200-page manuscript into a paper for publication in Volume 31 of Advances in Enzyme Regulation. The paper laid out our tool at the same time as it applied the tool to a true prediction of a protein structure.
To stick our necks out still farther, Dietlind and I sent a copy of the prediction to Susan Taylor, a biochemist at the University of California at San Diego who was collaborating in the team trying to solve the structure of protein kinase. Susan later told me that when she got our prediction, she put it in her drawer, pulling it out only after she and her colleagues had solved the crystal structure of protein kinase. When she had the experimentally observed structure in hand in 1991, she looked at the prediction that we had made a year earlier. Her reaction appeared in the paper that reported that experimental structure:

[Figure key: beta strand with bend indicated; alpha helix; coil, loop, or turn.]

"Although most of the predictions of secondary structure in the C subunit have been quite inaccurate and do not correlate well with the actual structure, the recent prediction by Benner and Gerloff is an exception. Their prediction of the secondary structure, based on chemical information and homologies within the protein kinase family, is remarkably accurate, particularly for the small lobe." Others concurred. Janet Thornton, at the University of London, commented in the journal Nature:


"Benner and Gerloff tackled secondary structure prediction; this was essentially a case study of the catalytic domain of protein kinases, the structure of which was then unknown. The cause for excitement is that the structure has since been solved by X-ray crystallography, and Benner and Gerloff's prediction of core secondary structures was much better than that achieved by standard methods." Arthur Lesk and Ross Boswell agreed. "Particularly noteworthy is the work of Benner and Gerloff who have achieved remarkable results in the a priori structure prediction of the catalytic domain of protein kinase C, by a method that involves careful analysis of the patterns of residue variation in a family of related sequences. This is a spectacular achievement and it is our feeling that it will come to be recognized as a major breakthrough, but so far it is an isolated success." The computational community strikes back. Living the Kuhnian vita loca A second theme of this book is that multiple fields of science must work together if we are to understand life as a universal. Here, as practicing scientists, we had a ring-side view of what happens when multiple fields interacted. Here, the fields were evolutionary biology, protein chemistry, and computational science. We had some guidance from none other than Thomas Kuhn. When he analyzed the structure of scientific revolutions, Kuhn reached some startling conclusions. A new idea is not so much rejected as it is misunderstood by the community of scientists being "invaded" by the idea. Those who view the world in the light of the new idea often speak the same language as those trained in the predecessor idea. However, the words mean different things to those from the new and predecessor paradigms. The discussion becomes disengaged; people talk past each other. We have seen this in Chapter 1 (organismic versus evolutionary biologists at Harvard), Chapter 4, and Chapter 5. 
As it turned out, the 1990's took us on a fantastic ride through the world of Kuhn as we attempted to get evolutionary-based fold prediction into the mainstream of the protein fold prediction community. First, those who made a living building highly parameterized prediction tools via information theoretic processes were not enthused that someone who did neither statistics nor numerical simulations, nor anything else that looked like what a prediction tool should look like, was still evidently able to predict protein folds. In particular, after we published our second prediction (for a protein called the SH3 domain), Robson and Garnier went on the hunt for our hides. We awoke one morning to read their commentary in Nature: "Benner et al. have fallen afoul of the mistake of forgetting that one swallow does not make a summer. GOR methods have survived for more than a decade because of their formal correctness, ease of reproduction, and methods of objective testing. By seeking to incorporate intuition, insight and expertise, Benner et al. do not satisfy these criteria. Reproducibility is the cornerstone of science, and although that may appear hampering to the creative, it does have very considerable benefits." Ouch. And curious in its content. We were more than prepared to agree that one swallow does not make a summer. After all, the SH3 domain prediction was our second prediction. But we found it interesting to see a nearby scientific culture, one that certainly needed to be included in any effort to understand life, arguing that intuition, insight, and expertise are bad. In our culture of chemistry, intuition, insight, and expertise are considered to be good. So we found ourselves in a polydisciplinary purgatory. In our field, Robson and Garnier were paying us the highest compliment, using the very words that in their field were evidently the nastiest of insults. Remarkable. But that was not all.
As mentioned above, our first prediction (for protein kinase) had been fairly good (other than missing an internal helix). Our second prediction, for a cancer protein called the Src homology domain 3, was about as good. But success did not seem to impress the GOR culture. Instead, it seemed that it valued a tool that had "formal correctness", "ease of reproduction", and easy "testing" but failed to predict protein folds correctly over a tool that succeeded but did not meet these formal criteria. Again remarkable.

The field was driven to change

As it turned out, the GOR criticism helped to drive the field to shift its standard-of-proof. This shift was further driven by the fact that several groups had begun to predict protein folds using evolutionary analysis without highly parameterized computer programs. For example, Geoff Barton applied such an analysis to predict the fold of the SH2 domain, without foreknowledge of the fold, again successfully. Fernando Bazan applied an analogous approach to interleukin 2; his predicted fold was so successful that it corrected an error in the experimental structure. This is remarkably different from the results reported in the Leermakers symposium, where modelers reproduced experimental error.

The jolt of these predictions convinced many in the community that its members needed to join us in doing bona fide predictions if they were to remain credible to scientists. This conviction was implemented in 1995, when a project was organized under the title "Critical Assessment of Structure Prediction" (CASP). Here, crystallographers were asked to present proteins whose folds had not yet been determined, but soon would be. The prediction community was then challenged to predict their folds and submit them for judging. The community then met in December to hear the judges' reports. Again, what happened is instructive as we seek to understand scientific method as it is actually practiced. Two proteins were submitted as de novo fold prediction targets in the CASP project: phospho-beta-galactosidase and synaptotagmin.
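A blind test of this kind needs a scoring rule that is fixed before the experimental structure is revealed. The text does not say how the CASP judges scored submissions; the sketch below assumes a simple per-residue comparison of predicted and observed secondary structure (often called a Q3 score), with H for helix, E for strand, and C for everything else. The function name and the two ten-residue strings are hypothetical, chosen only for illustration.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state (H, E, or C) matches experiment."""
    if len(predicted) != len(observed):
        raise ValueError("prediction and observation must cover the same residues")
    if not predicted:
        raise ValueError("empty sequences cannot be scored")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Hypothetical toy data, not taken from any CASP target.
predicted = "HHHHCCEEEE"  # submitted before the structure is known
observed = "HHHCCCEEEC"   # assigned afterwards from the experimental structure

print(q3_score(predicted, observed))
```

A rule like this is reproducible in the narrow sense that anyone can recompute the score, whether the prediction itself came from a program or from a person reading a sequence alignment.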
Those using our evolutionary approach or one of its variants successfully identified the folds of both of these targets, either directly (for phospho-beta-galactosidase) or as one of a small number of alternative folds (for synaptotagmin). This was the case in both my laboratory and in the laboratory of Chris Sander and Burkhard Rost, who had developed a neural network that also used evolutionary information to predict secondary structure. The GOR method did less well. CASP also had contestants who tried out highly parameterized numerical simulation tools to predict protein folds. They failed badly. The distinction between the success of predictions made by evolutionary analyses and the lack of success of predictions made by numerical simulation was noted by Thomas DeFay and Fred Cohen, who served as judges for the contest.

[Figure: Predictions made for phospho-beta-galactosidase in the 1995 CASP project from the Benner laboratory (top) and using the GOR tool (bottom). Correct helices are in blue. Correct strands are in green. Incorrect predictions are in red. The Benner-Gerloff prediction allowed the overall fold to be correctly predicted.]

Did the community then move accordingly? In part, yes. For the large part, however, the response was Kuhnian. Those from the community who did numerical simulation could not understand how it was possible for tools that did not involve numerical simulations to generate a correct prediction. Accordingly, many in that community concluded that evolution-based tools had not provided a correct prediction, even though they had. John Moult, one of the organizers of CASP, insisted that no successful predictions had been made in CASP. From the middle school perspective that holds that scientific theories are confirmed by making successful predictions, method had failed. As often happens in science as it is actually practiced, success of an extra-community view is simply denied if it contradicts a community view.

The 1995 CASP project was followed two years later with CASP2. Here, heat shock protein 90 (Hsp90) was offered as a prediction target. Biologists did not know a function for Hsp90; it was recognized only as a protein that was made by cells after they had been shocked by heating (hence its name). Two groups (mine and the group of Fernando Bazan) independently predicted a fold for Hsp90. Both laboratories used an evolutionary analysis; both produced results that were not obtained by numerical simulations. Both concluded that Hsp90 was a distant homolog of a protein called gyrase.

Several features of the Hsp90 prediction were interesting. First, two different laboratories predicted the same fold using tools that relied heavily on human intuition, insight, and expertise, those bugaboos of Robson and Garnier. This laid to rest the question of "reproducibility" of prediction methods that exploited evolutionary analyses. Even though the predictions were not fully computational and exploited human expertise, the methods were transferable. In this respect, the prediction methods were similar to most methods used in chemistry.

Further, the prediction about function contradicted experimental observations of Hsp90 that had driven the conclusion that Hsp90 could not be a gyrase. Once the CASP2 predictions emerged, experimentalists re-examined their data. As with the methylene discussed at the Leermakers symposium by Frank Westheimer, it was possible for experimental data to be wrong. But this time, theory did not follow the experiments. Rather, theory corrected experiments. The crystallographers who had solved the structure of Hsp90 were also appreciative. These scientists wrote: "The tertiary fold of Hsp90 N-domain has a remarkable and totally unexpected similarity to the N-terminal ATP-binding fragment of DNA gyrase B protein.
This similarity was not initially recognized by the authors of either the human or yeast structures but was determined within the CASP2 structure prediction competition."

How experiments end

By the end of the 1990s, evolution-based tools for predicting protein folds had been applied in laboratories around the world to predict the folds of several dozen proteins and to apply those predictions to solve real biological problems. Foremost among those was the use of predicted folds to confirm or deny suspicions of distant homology, and to support or deny efforts to transfer functional annotation from family to family. In these regards, both the protein kinase and Hsp90 predictions were examples. These tools relied on the growing availability of proteins as families in the biosphere. These predictions have been used to analyze the function of proteins as well as their evolutionary history. While the predictions are far from perfect, as one might expect from an understanding of how they are generated, they have been useful to confirm distant homology and impute new function (as in the case of Hsp90 and gyrase) as well as to deny distant homology (which was the case with the very first of these predictions, for protein kinase).

But the Kuhnian model for conceptual change described well the interaction between evolution-based predictors, information theoretic predictors, and numerical simulators. Among the first, a new community emerged that accepted the fact that useful predictions of protein folds could be generated by evolutionary analyses of families of proteins. This community developed those tools, used them, and came to an understanding of their limitations. Today, luminaries such as David Baker in Seattle build on this ground almost without acknowledging it. That process was understood by experimentalists, including those who determined protein folds using X-ray crystallography.


In contrast, many members of the numerical simulations community simply could not believe that useful predictions could come from any "formally incorrect" tool. Therefore, they denied the observation that such tools provided useful predictions.

What was to be done next? For us, the story became complicated. My group moved from Switzerland back to the United States by the end of 1996. Upon our return, we were funded by the National Institutes of Health to carry this work forward. When it came time to renew the grant, we were funded again. However, after the money was promised, I got a telephone call from the National Institutes of Health saying that instead of giving us bucks to carry this work forward in an academic environment (where it would be available to the community as a whole), the NIH would instead give the bucks to a company that I had founded to commercialize an evolutionarily organized database (called the MasterCatalog) that would incorporate structure predictions into its family structure.

The idea was not fundamentally bad. A commercial software product can also be available to the community as a whole. Initially, things went smoothly. The MasterCatalog product was developed, used to organize and distribute a large collection of data from Genome Therapeutics, and generated some $3.4 million in sales. But bringing technology into private hands exposes it to the vicissitudes of start-up company economics. The management at the company charged with developing the product changed; the new management made strategic errors that caused the product to lose much of its commercial value. Today, we are working to extract the MasterCatalog from corporate limbo. As you might imagine, this is an entirely different culture, and will require an entirely separate book to describe.

When can we say that we understand?

What have we learned so far about how science is done as we attempt to reassemble the whole from the parts? A lot.
We have seen how the processes of publication and funding influence the outcome of science. Anecdotes show the difficulty of managing the natural human propensity for self-deception when doing highly parameterized computer modeling. Accordingly, many theorists suggest that the only theory worth doing is theory that directly leads to experiments, or directly interacts with experiments; interaction with experiments is a way to manage that propensity. Further, anecdotes show how paradigm changes are problematic. Fields like exobiology, where multidisciplinarity is required, must manage all of this.

But it remains clear that we will not understand any whole, biological or otherwise, until we somehow assemble the whole from its parts. We have failed to identify a solution to get the whole from the parts. And except for a small advance here and there, we are having a difficult time getting insight into life as a universal.

Perhaps we can make some progress by once again becoming epistemological. Let us ask a simple question: What do we mean by "understanding"? One way to approach this deeply difficult problem is to develop an operational definition (of a sort) for "understanding". Here, "understanding is what understanding does". Specifically, if we truly possess understanding, we should be empowered. We should be able to do things, manipulate things, and build things that someone who lacks understanding cannot do, manipulate, or build. In this respect, this definition-theory is similar to a pragmatic definition-theory of knowledge. Knowledge is what confers power to manipulate, predict, or design.

With this in hand, let us ask: Do we understand proteins? The ability to predict the fold of a protein can be interpreted as evidence that we do.
To the extent that Pauling and Corey could predict the existence of alpha helices and beta strands, or that Barton, Bazan, Benner, and Gerloff can actually predict the fold of heat shock protein 90, the SH2 domain, the SH3 domain, phospho-beta-galactosidase, and protein kinase, it can be argued that something must be understood by these individuals that is not understood by (for example) the computer programs that attempted the same thing.


This view of understanding has some practical value. First, it automatically places boundaries on the theory that is needed to empower, and therefore needed to understand. Thus, to predict the existence of helices, strands, and sheets, Pauling and Corey needed only very precise descriptions of the parts of a polypeptide chain, and two theories represented by the aphorisms "unlike charges attract" and "two atoms cannot be at the same place at the same time." This means that these two aphorisms constitute the full extent of the theory that is needed to empower at this level. This, in turn, means that these two aphorisms constitute the full extent of the theory that is needed to understand at this level. Further, the language of understanding is defined by the language of the theories needed to empower.

Likewise, to predict the fold of heat shock protein 90, the SH2 domain, the SH3 domain, phospho-beta-galactosidase, and protein kinase, Barton, Bazan, Benner, and Gerloff needed only the tools to build precise multiple sequence alignments and theory relating amino acid change to neutral and adaptive theories of evolution. This defines the theory and its associated language needed for this level of empowerment. It also defines the theory needed to understand proteins at this level.

Synthesis as a way of demonstrating understanding

We need not stop here, however. It is conceivable that a higher level of understanding can be demonstrated or, if it is lacking, be developed, by attempting to meet further challenges associated with the assembly of parts. For example, a level of understanding beyond that demonstrated by the prediction community would be demonstrated by a community that can design proteins from their parts. That demonstration would come by an experiment where the designed proteins are then synthesized in the laboratory and observed to show that they fold as designed. A still higher level of understanding would come if the designed proteins also functioned in some way, say as catalysts for a chemical transformation. Again, the language of the design would also be the language of understanding, as this would be the language that enables empowerment.

[Figure: In The Hitchhiker's Guide to the Galaxy, Slartibartfast explains to Arthur Dent how planets are synthesized on Magrathea, with some representative models in the display case of planets they have made. Planets synthesized according to a plan could test hypotheses that relate planetary behavior to planetary structure. Unfortunately, the synthetic technology boasted by Magratheans is today only fictional, meaning that synthesis is unavailable to planetary scientists as a research strategy.]

There is no reason to limit this process to proteins. Synthesis as a strategy can be applied more broadly. For example, if life is nothing more than a chemical system capable of Darwinian evolution, and if chemists can go into a laboratory and whip up a new chemical system that does Darwinian evolution, that artificial chemical system should (if our definition-theory were correct) display all of the properties that we associate with "life". If we do and it does, we have demonstrated our understanding of life. If we do and it does not, then something must be missing in our understanding of life.

[Figure: Synthetic biology. Life: construct from scratch a Darwinian chemical system.]

Synthesis as a strategy to test hypotheses in chemistry

Synthesis is the fourth wedge in our four-part approach to understanding life, shown in Figure 3.1. It is a strategy, like observation, probing, analysis, and simulation. Further, synthesis has attributes that are not offered by observation, analysis, or simulation. First, it provides a powerful way to test hypotheses. Second, it offers a mechanism to manage the propensity of humans to self-deception. Further, it offers a recipe for discovery that does not require travel. Last, following the Feynman dictum that "I understand what I can make", synthesis defines understanding.

Matching these advantages is one disadvantage: Synthesis, like analysis, depends on technology. To apply synthesis as a strategy to a scientific problem, including the one that is the focus of this book, one needs to have the technology to synthesize things. Therefore, the history of the application of synthesis in science is the history of the development of synthetic technology. Conversely, synthesis is not universal as a strategy for all sciences, simply because some sciences lack synthetic technology.

Historically, synthesis has been best developed as a research paradigm in chemistry, simply because chemistry has had the technology to do synthesis for 150 years. As we start our discussion of scientific methods that support synthesis as a scientific strategy, it is therefore useful to start with chemistry.

[Figure: Cyclooctatetraene arranges 8 carbon atoms and 8 hydrogen atoms in a ring with alternating single and double bonds. Willstätter synthesized this molecule and found that it was not especially stable, using synthesis to uncover a counterexample of the rule that organic molecules that place alternating single and double bonds in a ring are especially stable.]

Let us return to benzene. By 1860, analysis had shown that benzene is built from six carbon atoms and six hydrogen atoms. By 1860, analysis of many other molecules suggested that carbon atoms make four bonds and hydrogen atoms make one bond. Accordingly, in 1860, Kekulé proposed a model for the molecular structure of benzene consistent with these rules.
That model placed six carbon atoms in a ring, held together by alternating single and double bonds. These used up three of the four valences of carbon. The fourth valence of each carbon was then consumed by a bond to one hydrogen atom; the six hydrogen atoms therefore lay around the outside of the benzene ring.

[Figure: August Kekulé claimed that as he was dozing before the fireplace, he dreamt of a snake biting its tail, inspiring him to imagine a cyclic structure for benzene (pictured) where 6 carbon atoms and 6 hydrogen atoms are in a ring having alternating single and double bonds. The second bond of each double bond holds 2 electrons. These 6 electrons in a ring were suggested to be a particularly stable arrangement.]

Benzene was recognized as being an unusually stable molecule, especially when compared to other molecules containing carbon atoms doubly bonded to other carbon atoms. In Chapter 5, we discussed this in the language of modern theory. There, we noted that benzene has six electrons circling in a ring, and used the term aromatic to describe molecules having such features. Aromaticity was the concept used to anticipate that the nucleobases of RNA might be intrinsically stable products in the pre-biotic processing of organic species.

This is, however, the modern way of talking about benzene. For Kekulé and others in the 19th century, what was distinctive about benzene's structure was the alternation in the double and single bonds in the benzene ring. Thus, it was natural to hypothesize that all carbon-hydrogen compounds that have alternating single and double bonds in a ring would have exceptional stability. One such molecule has eight carbon atoms in a ring, and the formula C8H8. This molecule was unknown to Kekulé, but was given the name cyclooctatetraene.

How was such a hypothesis to be tested? One could return to the analysis of coal tar to try to observe other cyclic compounds such as cyclooctatetraene that also had alternating single and double bonds.


Many chemists did. But a logical problem immediately presents itself. The compounds that are not stable might not be found in coal tar, just as compounds that are not stable do not survive in the prebiotic soup. Failure to observe other compounds having alternating bonds, such as cyclooctatetraene, would at best be negative evidence against the hypothesis that organic species with alternating single and double bonds are distinctively stable. Many who think about scientific methods hold that negative evidence is not particularly persuasive. After all, failure to observe cyclooctatetraene in coal tar may mean nothing more than that we did not look hard enough.

Enter synthesis as a research strategy. By 1905, chemists had developed the technology needed to synthesize organic molecules like cyclooctatetraene. Accordingly, the chemist Richard Willstätter synthesized a sample of the compound and made a discovery. Unlike benzene, cyclooctatetraene was not especially stable; it was actually rather reactive. Thus, synthesis of a new form of matter allowed chemists to deny a hypothesis that related molecular structure to molecular behavior. Organic molecules with alternating single and double bonds in a ring were not necessarily stable.

The discovery made by observing synthesized material led to new hypotheses to relate the stability of organic molecules to their structure more generally. One of these was that molecules having an odd number of alternating double bonds in a ring are particularly stable, while rings with an even number of alternating double bonds are not. This rule came to be understood as the consequence of quantum mechanics, a theory from physics applied to chemistry most broadly by (perhaps you see a pattern here) Linus Pauling. Subsequently, additional molecules were synthesized to study how quantum mechanics influences molecular behavior. Efforts to test such hypotheses drove the synthesis of many new compounds.
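The odd/even rule described above can be written as a short check. A ring with k alternating double bonds holds 2k electrons in those second bonds, and k is odd exactly when 2k has the form 4n + 2; chemists usually call this Hückel's rule, a name the text does not use. The sketch assumes a single planar ring of carbons and ignores every other factor that affects stability. The function names are hypothetical.

```python
def ring_pi_electrons(double_bonds: int) -> int:
    """Each double bond contributes the 2 electrons of its second bond."""
    return 2 * double_bonds


def fits_4n_plus_2(double_bonds: int) -> bool:
    """True when the ring's electron count equals 4n + 2 for some n >= 0."""
    electrons = ring_pi_electrons(double_bonds)
    return electrons >= 2 and (electrons - 2) % 4 == 0


# The two rings discussed in the text: benzene (3 double bonds)
# and cyclooctatetraene (4 double bonds).
for name, k in [("benzene", 3), ("cyclooctatetraene", 4)]:
    print(name, ring_pi_electrons(k), fits_4n_plus_2(k))
```

Benzene passes the check and cyclooctatetraene fails it, which matches what Willstätter found at the bench.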
Further, as we saw in Chapter 5, these hypotheses and the associated concept of "aromaticity" allowed us to say that adenine and other nucleobases are likely to be stable enough to survive tar-formation following their prebiotic synthesis.

The 20th century was to produce many, many examples where chemists developed theory using synthesis. This is a history for another book. For now, it is sufficient to note that the construction of new forms of matter supported the development of chemical theory in a way that neither observation, probing, simulation nor analysis, alone or in combination, could have.

Synthesis of proteins that fold and catalyze reactions

Given my own training as a chemist set in this tradition, it was natural for my laboratory to use synthesis to take the next step to demonstrate our understanding of proteins or, if the synthesis failed, to demonstrate a lack of this understanding. To this end, we set out upon arriving in Switzerland to assemble by design a protein of our own, from its amino acid parts.

We used no numerical simulations as we designed this protein. Rather, we added to the Pauling-Corey model one theoretical concept that was clearly followed by natural proteins. This concept, developed by Thomas Kaiser first at the University of Chicago and then at the Rockefeller University, noted that protein chains would fold to give helices if they placed hydrophobic amino acid side chains on one side of the helix, and hydrophilic amino acid side chains on the other side. Again, from the Greek, where "hydro" means "water", "philic" means "brotherly love" (as in Philadelphia), and "phobic" means "fear" (as in "phobia"), such helices were called amphiphilic ("amphi" means "both sides"). This is, of course, the physics approach to the inside-outside dichotomy used when we predicted protein folds.

Hydrophobic molecules "fear water", which means that they do not dissolve in water.
They are made of bonds between atoms having similar electronegativities, like carbon-carbon and carbon-hydrogen bonds. Returning to your kitchen, pull a specimen of olive oil out of the pantry (it is next to the chowder). Olive oil has molecules that contain chains of -CH2-CH2-CH2- units. Try to get the oil to mix with water. You


will demonstrate for yourself the adage "oil and water do not mix." In the language of chemists, this means that the molecules in oil are hydrophobic.

Amphiphilic helices made from amino acids search for oil to bury their hydrophobic side. Kaiser, for example, showed that amphiphilic helices liked to stick to cell membranes, where they could get their hydrophobic side out of water by burying in the oils in the membrane. In the absence of a membrane, amphiphilic helices tend to stick to each other. By doing so, they bury their hydrophobic sides in a helix-helix contact with the hydrophobic side of another helix.

With this low level theory, it is simple to design an amphiphilic helix. One needs only to place amino acids having hydrophobic side chains (with -CH2- units, such as leucine or alanine) and hydrophilic amino acids (such as lysine, which has a positively charged -NH3+ unit; remember, charged species like to dissolve in water) into the inside-inside-surface-inside-inside-surface-inside-inside-inside-surface-inside-inside-surface-surface pattern. Doing this substitution, we designed the amphiphilic helix in the short protein which has the sequence "leucine-alanine-lysine-leucine-leucine-lysine-alanine-leucine-alanine-lysine-leucine-leucine-lysine-lysine".

[Figure: Four amphiphilic helices are expected to form a tetramer, so that their hydrophobic sides are buried away from the water in the helix-helix contact sites. This drawing shows the four helix bundle that we designed in 1988 projected down the helix axes.]

In 1988, we wanted to do more than design a protein that folds into a cluster of amphiphilic helices through the contact between their hydrophobic sides. Kaiser had effectively done that, as had an M.D.-Ph.D. student working in the laboratory of David Eisenberg at UCLA in collaboration with William DeGrado, then at DuPont Central Research. We wanted a protein that did something that natural proteins do, such as catalyze a chemical transformation.
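The design rule above can be checked with a helical wheel, the same end-on projection used in the drawing of the bundle. The sketch below assumes an ideal alpha helix with 3.6 residues per turn, so that each residue is rotated 100 degrees from the one before it; that number comes from the standard geometry of the Pauling-Corey helix and is not stated in this passage. The class assignments follow the text (leucine and alanine inside, lysine on the surface), and the function names are hypothetical.

```python
DEGREES_PER_RESIDUE = 100.0  # assumption: ideal alpha helix, 360 / 3.6

# Side-chain classes as described in the text.
SIDE = {"leucine": "inside", "alanine": "inside", "lysine": "surface"}

# The 14-residue designed sequence quoted in the text.
SEQUENCE = [
    "leucine", "alanine", "lysine", "leucine", "leucine", "lysine", "alanine",
    "leucine", "alanine", "lysine", "leucine", "leucine", "lysine", "lysine",
]


def design_pattern(sequence):
    """Translate residue names into the inside/surface pattern."""
    return "-".join(SIDE[residue] for residue in sequence)


def wheel_angles(sequence, side):
    """Sorted wheel angles (degrees) of all residues of one class."""
    return sorted(
        (index * DEGREES_PER_RESIDUE) % 360.0
        for index, residue in enumerate(sequence)
        if SIDE[residue] == side
    )


def arc_span(angles):
    """Smallest arc that contains every angle: 360 minus the largest gap."""
    ordered = sorted(angles)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    gaps.append(ordered[0] + 360.0 - ordered[-1])  # gap across 0 degrees
    return 360.0 - max(gaps)


print(design_pattern(SEQUENCE))
print(wheel_angles(SEQUENCE, "surface"), arc_span(wheel_angles(SEQUENCE, "surface")))
print(wheel_angles(SEQUENCE, "inside"), arc_span(wheel_angles(SEQUENCE, "inside")))
```

Under this idealized geometry the five lysines fall within a 100-degree arc of the wheel and the nine hydrophobic residues fall in the remaining part of the circle, which is the one-sided arrangement the design calls for.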
For this purpose, we turned to oxaloacetic acid, the same compound that we met in Chapter 5 that Robert Hazen and Harold Morowitz sought as a product of their prebiotic reaction of pyruvic acid and carbon dioxide under high pressure. The reverse reaction, involving the conversion of oxaloacetate to pyruvate and carbon dioxide, had been introduced into an industrial process for the manufacture of phenylalanine. Like any catalyst, our designed protein would work in both directions, decarboxylating oxaloacetic acid at low CO2 pressures, and fixing CO2 to make oxaloacetic acid at high CO2 pressures.

So why the lysines? The side chain of lysine carries an -NH2 amino group which, as we learned in Chapter 5, is a nucleophilic center. It carries an unshared pair of electrons. At neutral pH, enough protons (H+) are around as electrophiles that these groups are protonated -NH3+ units. Therefore, they are unreactive as nucleophiles. To design a protein that could get these protonated -NH3+ units to become reactive -NH2 groups at neutral pH, we turned to work from Frank Westheimer (the same Westheimer who spoke at the Leermakers symposium). With James Kirkwood, Westheimer had developed a theory that described what happens when several protonated -NH3+ units are brought together in space. Because like charges repel, Kirkwood-Westheimer theory predicts that one of the -NH3+ units will lose a proton to become a reactive nucleophilic -NH2 unit, even at neutral pH, when the lysines are all brought together on one side of an alpha helix. Peter Schultz pointed out to us that there was actually one more -NH3+ unit, the -NH3+ unit at the end of the helix, the first amino acid in the chain. In other words, since like charges repel, with six positive charges together, one of the -NH3+ units is expected to lose a proton and become a reactive nucleophilic -NH2 unit, even at neutral pH.

Hold that thought for a moment, as we return to oxaloacetic acid. This molecule, as the name implies, is an acid. Actually, it has two acidic -COOH units, one on each end. At neutral pH, therefore, oxaloacetic acid loses an H+ from each end. Once it does so, it has two negative charges.

How will oxaloacetic acid, with its two negative charges, interact with our designed protein, which has six or (after losing one proton) five positive charges? Again, the theory is easy: Opposite charges attract. The designed protein, with five positively charged protonated -NH3+ units, is expected to bind oxaloacetate, with two negatively charged -COO- units. That is, if theory at this level is adequate to empower.

Now, back to the thought. In cartoon form, let us have two of the positive charges on the designed protein bind to the two negative charges on oxaloacetate. This places the middle nucleophilic -NH2 unit right next to the electrophilic C=O unit of oxaloacetate (remembering from Chapter 5 that C=O carbons are electrophilic). The nucleophilic -NH2 unit is expected to react with the C=O unit to form a new bond (lysine-N=C) that holds oxaloacetate to the protein. This is called an imine (not to be confused with an amine, which is -NH2).

Imines have special properties. First, they can also be protonated. In the oxaloacetate-protein complex, which now has two negative charges from oxaloacetate balancing the positive charges on the lysines, the imine can be protonated. Then, the protonated imine can pull away the electron pair that holds the carbon dioxide unit to the rest of the oxaloacetate. This releases carbon dioxide, the desired product. Imines were known to do this. Indeed, Frank Westheimer (again) had shown that imines were intermediates in natural proteins that catalyzed reactions analogous to the decarboxylation of oxaloacetate.
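How much reactive -NH2 is present at a given pH follows from the standard acid-base equilibrium (the Henderson-Hasselbalch relation), which this passage uses only qualitatively. The sketch below assumes a typical textbook pKa of about 10.5 for an isolated lysine side chain. The lowered pKa of 8.0 is a purely hypothetical value standing in for a lysine crowded by other positive charges; it is not a measurement from this work.

```python
def fraction_deprotonated(ph: float, pka: float) -> float:
    """Fraction of an amine present as neutral -NH2 rather than -NH3+."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


NEUTRAL_PH = 7.0
ISOLATED_LYSINE_PKA = 10.5  # assumption: typical textbook value
CROWDED_LYSINE_PKA = 8.0    # hypothetical, for illustration only

for label, pka in [("isolated", ISOLATED_LYSINE_PKA), ("crowded", CROWDED_LYSINE_PKA)]:
    print(label, fraction_deprotonated(NEUTRAL_PH, pka))
```

With these assumed numbers, only a few amines in ten thousand are reactive when the lysine is isolated, while roughly one in eleven is reactive when the pKa is pushed down. That is the qualitative effect the Kirkwood-Westheimer argument predicts from charge repulsion.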
Let us keep track of the theories that we have accumulated so far in the hope of empowering our design. We started with the alpha helices of Pauling and Corey ("opposite charges attract" and "atoms cannot sit on top of each other"). We added to it the "oil and water do not mix" theory, the Kirkwood-Westheimer theory ("like charges repel" theory), and the "positively charged proteins bind negatively charged substrates" theory (a version of the "opposite charges attract" theory). For catalysis, we exploited the "imines catalyze decarboxylation" theory. In the design, we did not use any higher level theories, neither quantum mechanics nor numerical simulation nor highly parameterized computer models. Accordingly, the question is: How much catalytic power can be gotten out of a protein designed to decarboxylate oxaloacetate using theory at this level, cast in such language? Remember, the extent to which theory at this level empowers is the extent to which it can be said to confer "understanding".

[Figure: In our designed oxaloacetate decarboxylase, the protein folds to bring together many positively charged -NH3+ units from lysine side chains. The repulsion between the positive charges causes one to lose H+ and become an -NH2 unit. That nucleophilic unit reacts with the C=O of oxaloacetate, held to the protein by the attraction between its two negative charges and the remaining -NH3+ units. The product of that reaction is a protonated imine, which pulls away the electrons that hold CO2 to oxaloacetate, breaking the bond and releasing CO2.]

To answer this question, Rudolf Allemann (now a professor at the University of Cardiff) and Kai Johnsson (now a professor at the University of Lausanne), both graduate students in my group at the time, synthesized the designed peptide. They immediately found that it did indeed catalyze the decarboxylation of oxaloacetate.

But how much? And why? Oxaloacetate might be decarboxylated in many different ways. For a synthesis by design to demonstrate understanding, the activity of the designed protein must occur for the reasons intended. Without belaboring the details (which were published in 1993), Rudolf and Kai were able to show that the designed protein did indeed form an alpha helix, where four of these helices came together to bury their hydrophobic sides. Score one point for the "oil and water do not mix" theory. Further, they showed that a reactive amine on the protein lost H+, the expected -NH3+ unit in one version of the protein, an unexpected -NH3+ unit in another version. Another point for Kirkwood-Westheimer theory. Further, they showed that an imine was formed between the protein and oxaloacetate, and catalyzed a decarboxylation faster than a simple amine by a factor of about ten thousand. Score one for this level of theory. Then, by comparing catalysis by the designed protein with analogous catalysts that lacked a binding site for oxaloacetate or that had an -NH3+ unit that was not as easily deprotonated, they estimated that about 60% of the catalytic power came from binding (the "opposite charges attract" theory), and about 40% of the catalytic power came from the ease of deprotonation (the Kirkwood-Westheimer theory).

Can numerical simulation help design a better catalyst?

Do we understand protein folding as it relates to catalysis? The answer is "yes", to a factor of 10,000, and as it relates to this level of theory, if we accept our operational definition-theory of "understanding".
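A rate enhancement can be restated as an energy. Under transition-state theory, which the text does not invoke and which is assumed here, a catalyst that speeds a reaction by a given factor lowers the free-energy barrier by RT ln(factor). The sketch below applies that relation to the factor of ten thousand reported above. The temperature of 298.15 K and the function name are assumptions made for illustration.

```python
import math

GAS_CONSTANT = 8.314462618  # J / (mol K)


def barrier_lowering_kj_per_mol(rate_factor: float, temperature_k: float = 298.15) -> float:
    """Barrier reduction (kJ/mol) equivalent to a given rate enhancement."""
    if rate_factor <= 0:
        raise ValueError("rate enhancement must be positive")
    return GAS_CONSTANT * temperature_k * math.log(rate_factor) / 1000.0


# The factor of about ten thousand measured for the designed protein.
print(round(barrier_lowering_kj_per_mol(1e4), 1))
```

Because the relation is logarithmic, each further factor of ten in rate corresponds to the same additional increment of barrier lowering.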
We know the theory that confers understanding, because it is the theory that provided empowerment. Now for some numbers. Typical proteins that come from terran life via the process of Darwinian evolution are better as catalysts. While different natural proteins increase the rates of chemical reactions over a wide range, the rate enhancement is generally greater than a factor of ten thousand. Typical rate enhancements are factors of ten million or ten billion. In a few cases, the rate enhancement can be as high as a factor of 1,000,000,000,000,000 (one quadrillion, for those who are interested). So while synthesis-by-design has demonstrated that we understand a part of protein catalysis, we clearly need more to understand the rest of it. More synthesis is called for, with more advanced design based on improved theory, you say. Fair enough. But since our work, an intriguing pattern has emerged. Factors of 10000 have appeared throughout many attempts to design proteins that fold and catalyze reactions. For example, Carlos Barbas III at The Scripps Research Institute developed an antibody that catalyzed a similar reaction, also via an imine. The rate was faster by a factor of 10000. Can numerical simulation help? Just recently, David Baker, Ken Houk and a team of collaborators from the United States and Europe attempted to improve the rate enhancement of a protein that catalyzed a reaction identical to that catalyzed by the Barbas protein by adding numerical simulation to the theories used to design proteins. They first attempted to numerically simulate contacts between the protein and the substrate directly. All proteins that they synthesized using those designs had no catalytic activity at all. If, however, the numerical simulation separated their designed protein from their substrate by about the size of a water molecule, they got catalysts that operated via an imine mechanism, analogous to the mechanism described above.
Remarkably, these also produced a factor of 10000 in rate enhancement. So numerical simulations are no better here.

Over the past half century, technology developed to apply synthesis to biology

There is little doubt that the deliberate synthesis of new forms of matter could speed the development of any science. Imagine, for example, how quickly scientists interested in stellar physics could test hypotheses that relate the behavior of a star to the composition of the star if they could only make a few stars having various compositions and observe how they behaved. Or, as in The Hitchhiker's Guide to the Galaxy, what better way to test a hypothesis relating planetary behavior to planetary structure than to have the custom planet builders of Magrathea whip up a few planets with structures as specified, just to observe how they behaved? Unfortunately, we do not have the technology today to synthesize new designed stars or new designed planets. So the power of synthesis as a scientific research strategy will need to wait. Throughout the sciences, the history of the application of synthesis as a strategy has been the history of the development of synthetic technology. This is certainly true in biology. Classically, deliberate synthesis was not possible in biology. New forms of life were created every time a life form reproduced, of course. So, like Shoemaker-Levy falling into Jupiter, opportunities existed to observe perturbations that arose naturally. Humankind has long speculated on the deeper significance of the two-headed pig or other defects that Nature occasionally produces by accident. But such speculations have been largely fruitless. In the late 18th century, synthesis of new forms of life came to be guided by increasingly sophisticated husbandry. Here, farmers deliberately selected animals that had desired traits, and allowed only those animals to reproduce. This might be argued to have provided the spark that first suggested the concept of natural evolution, and then the concept of undirected natural selection. But still, animal husbandry or plant horticulture technology could not arrange the deliberate test of specific hypotheses. Just as the astrophysicist cannot today synthesize stars, or geologists planets, with deliberately altered structures, biologists could not synthesize life with deliberately altered structures.
That technology might have been useful to our ancestors. For example, if the caveman had a hypothesis that a particular nerve in a mastodon gave the animal the ability to gore his fellow hunters, he could have learned if he was right by synthesizing a mastodon lacking that nerve. Once the mastodon lacking the nerve was in hand, he would see if the mastodon behaved as predicted by the hypothesis. If it did, our mastodon hunting ancestors would have known better where to direct their spears. We still cannot synthesize mastodons. But starting in the late 1960s, synthetic technology began to emerge in biology that allowed biologists to deliberately change the genetic structure of microorganisms. That technology was initially called "recombinant DNA technology". Recombinant DNA technology was founded on tools that allowed the scientist to cut pieces of DNA from existing organisms and place them back into bacterial cells in new "recombined" arrangements. Since DNA controls the synthesis of RNA, RNA controls the synthesis of proteins, and proteins control structure and metabolism in those cells, recombinant DNA technology made possible the test of specific hypotheses that related genetic structure to observed bacterial behavior. Nobel prizes abound in this field. For example, a Nobel went to Hamilton Smith who, in 1970, reported the isolation of an enzyme (called a restriction enzyme) that could cut DNA at specific sequences. For example, a restriction enzyme from E. coli, the bacterium common in the human gut, cuts any DNA double helix that has the sequence G-A-A-T-T-C. This sequence of six nucleotides occurs, on average, only about once every 4000 nucleotides in a DNA molecule (four possible letters at each of six positions gives 4096 combinations). Therefore, treating a chromosome-sized piece

of DNA (which may be millions of nucleotides in length) with this restriction enzyme creates manageable pieces of DNA that can be isolated. Over the following decade, over 100 other restriction enzymes were isolated that cut at different sites. Many of these enzymes can be purchased. Extraction technology also emerged to support synthesis in biology. As we learned in Chapter 6, DNA is a polymer with a repeating backbone charge. This means that DNA moves in an electric field, towards the positive charge, and so DNA molecules can be separated by electrophoresis. The smaller the DNA molecule is, the faster it can move through a viscous medium placed in an electric field. This allowed individual fragments of DNA cut from natural DNA to be purified. Next, one needed to know the sequence of the DNA that one had extracted. This technology was developed by Walter Gilbert and Frederick Sanger. Back to Sweden for more Nobel prizes. Once one had pieces of DNA, a technology was needed to paste the DNA fragments back together in new combinations. Enzymes called ligases were found to do this (this discovery was not recognized by a Nobel prize). Tools were then developed that placed the recombined DNA back into a cell, in a form where the DNA could direct the synthesis of new forms of RNA and proteins. Later, through the work of Marvin Caruthers, Robert Letsinger, and Robert Merrifield (only the last won a Nobel prize), technology was developed to deliberately synthesize new DNA, one nucleotide at a time. Today, for just a few pennies a nucleotide, you can order a DNA molecule 75 nucleotides in length having any sequence you want. Very few who use the services of a DNA synthesizing company appreciate the remarkable nature of the product provided. Some 750 specific chemical transformations are required to make that DNA molecule happen. Each transformation must go to over 99.99% completion. The chemistry that makes this happen is remarkable.
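Two of the numbers in this passage are easy to check for yourself. The sketch below is only an illustration. It assumes that the four DNA letters occur at random and with equal frequency (real genomes do not quite behave this way) and that every chemical step in a synthesis has the same yield. The function names are made up for this example.

```python
def expected_site_spacing(site: str, alphabet_size: int = 4) -> int:
    """Average spacing between occurrences of a site in random DNA.

    Assumes all letters are equally likely and independent (an idealization).
    """
    return alphabet_size ** len(site)


def find_sites(dna: str, site: str = "GAATTC") -> list[int]:
    """Return the 0-based start position of every occurrence of the site."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


def overall_yield(steps: int, yield_per_step: float) -> float:
    """Fraction of correct product left after a chain of sequential steps.

    Assumes every step has the same yield (a simplification).
    """
    return yield_per_step ** steps


if __name__ == "__main__":
    print(expected_site_spacing("GAATTC"))        # 4 ** 6 possible six-letter words
    print(find_sites("AAGAATTCGGGAATTCTT"))       # where the enzyme would cut a toy sequence
    print(round(overall_yield(750, 0.9999), 3))   # 99.99% per step
    print(round(overall_yield(750, 0.99), 5))     # 99% per step, for contrast
```

At 99.99% per step, roughly 93% of the material survives 750 steps. At a merely good 99% per step, well under one percent would survive, which is why the per-step figure matters so much.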
The field of synthetic biology (in its classical version)

And so dawned the era of "classical synthetic biology". By 1974, it was recognized that recombinant DNA technology had delivered to biology at least a bit of the same synthetic power that chemists had enjoyed for over a century. In that year, Waclaw Szybalski suggested that recombinant DNA technology was creating a new hyphenated biology, which he called "synthetic biology". According to Szybalski's definition, synthetic biology was the field that used recombinant DNA technology to rearrange natural genes and proteins in new contexts, and therefore to create a new form of life. Having kids also rearranges natural genes and proteins into new contexts. Szybalski's synthetic biology was different because it was deliberate, conscious, and planned. As with Willstätter's synthesis of cyclooctatetraene, the process of synthesis in biology was first used to test very specific hypotheses in biology, and first in bacteria. For example, in 1973, Stanley Cohen, Annie Chang, Herbert Boyer and Robert Helling in California reported the synthesis of a DNA molecule that recombined two DNA fragments, each separately encoding resistance to one antibiotic. When the recombinant DNA molecule was placed into an E. coli, the synthetic bacterium was resistant to both antibiotics. Even one with modest vision could see that the power of chemical synthesis applied with restriction enzymes, ligases, and the like might eventually be applied to yeast, plants, and animals. Thus, Paul Berg developed the synthetic technology to support the transfer of DNA fragments into mammalian cells. More Nobel prizes. Classical synthetic biology has had enormous impact in medicine. Like our (fictitious) ancestor synthesizing a mastodon lacking a specific nerve to learn whether spears should be directed at that nerve, molecular biologists have removed genes from infectious agents to see if they are rendered non-infectious.
If they are, the removed gene is a target for the modern spear, an antibiotic. It was not long before people realized that they could create a new metabolism in an organism by collecting genes that encoded enzymes that catalyzed steps in that metabolism from any organism that might

provide them. Today, this process continues. For example, Jay Keasling and his colleagues in the Department of Synthetic Biology at the University of California at Berkeley are picking pieces of DNA from various sources encoding various enzymes that catalyze chemical transformations that they wish to have in a specific order. The apotheosis of classical synthetic biology is now being pursued by Craig Venter and Hamilton Smith (already mentioned). They are seeking to create a living cell whose entire DNA complement comes from somewhere else in biology.

Synthesis as a recipe for discovery and to manage self-deception

While we can marvel at this ultimate exercise in classical synthetic biology, it clearly will not provide much information about life as a universal. After all, classical synthetic biology simply re-arranges the pieces of life that have been delivered to us by four billion years of biological evolution on Earth. Baggage and all. Nevertheless, the pursuit of classical synthetic biology to create new cells illustrates an additional value of synthesis, here as a recipe to make discovery. Synthesis can do more than test hypotheses. When applied to meet a "grand challenge", scientists employing the synthetic strategy are dragged across uncharted territory where they are forced to encounter and solve unscripted problems. Should their theories and/or hypotheses be inadequate, they will fail, perhaps in public and in an embarrassing way (much as an incorrect prediction of protein kinase would have been a public embarrassment). Therefore, the synthetic goal does not allow scientists to respond as they usually do when facts contradict their theories (ignore the facts rather than reject their theory). Because of this, synthesis drives innovation and paradigm change in ways that analysis cannot. The synthesis of an entire cell where all of its components, although natural, come from somewhere else using recombinant DNA technology, is such a grand challenge.
It is beyond, but perhaps not far beyond, current synthetic capabilities. Having just seen Hamilton Smith give a talk, I find it clear that the struggle to meet this goal is forcing his team to create new synthetic technology. If understanding is defined by how we are empowered, this activity is creating new understanding as well.

Modern synthetic biology. Building life from the atom up in the search for interchangeable parts

It is more than that. Recombinant DNA technology, as it is classically applied in synthetic biology, makes assumptions about the interchangeable parts in a biomolecular system. Specifically, the assumption throughout modern implementations of classical synthetic biology is that the gene, and its encoded enzymes, are the interchangeable parts. Still others, including Drew Endy, Tom Knight, and other engineers doing synthetic biology, argue that these interchangeable parts can be independently validated, permitting them to be recombined just as transistors and integrated circuits can be. There is no reason to expect this to be true. Indeed, work in Wendell Lim's laboratory shows that considerable adjusting is required to rearrange proteins in a biological regulatory circuit before the designed new regulatory output is achieved. But no matter. The challenge drives research. This, in turn, creates discoveries, new technology, new empowerment, and therefore new understanding. We have already discussed the type of synthetic biology that is likely to let us know more about life as a universal. This type of synthetic biology attempts to build biological systems starting with the atom, which is believed to be universal. Again, if we could only develop the synthetic technology to build a

self-assembling chemical system capable of Darwinian evolution, we would have the opportunity to test our definition-theory of life. We would also learn what we can learn as we cross uncharted territory to address one of the grandest challenges of all: Create synthetic life. And to meet that grand challenge, we must return to the molecule that (we think) is central to all of life: DNA. We already noted in Chapter 6 some features of the structure of DNA that are likely to be universal in genetic molecules. For example, the repeating charge of DNA allows the nucleotides to behave as interchangeable parts. How these came to be recognized relates to another story that is related to Linus Pauling. Nowhere was the influence of Pauling and Corey greater than with two young scientists named James Watson and Francis Crick. Watson began as an amateur ornithologist, a bird guy like Darwin. Crick, at the time he met Watson, was a 35-year-old who had not yet finished his Ph.D. Watson was supported by a Merck Fellowship to do research in Denmark when he saw a talk from Maurice Wilkins, who was working in England. Watson asked to have his fellowship extended for a year and transferred to England to learn how to use X-rays to determine biomolecular structures. Watson's mother, on hearing of this, reportedly telephoned the chairman of the Merck Fellowship Board imploring him to decline Watson's request because "that boy needs for once to learn a lesson". Talk about a tough critic.1 Watson nevertheless got permission to move to England, and set out with Crick to apply the Pauling-Corey method to DNA. DNA had been identified to be a polymer built from four nucleoside parts by chemists such as Lord Todd (another Nobel laureate). These parts were known to be joined by phosphate linkers, which provide the repeating backbone charge that we discussed in Chapter 6 (the polyelectrolyte theory of the gene). The nucleosides themselves have two parts, a sugar and a nucleobase.
A model of DNA assembled from the pieces used by Watson and Crick. The nucleobase pairs are the silver flat pieces; the uprights are metal frameworks.

As discussed in Chapter 6, the nucleobases are stable, flat molecular units having many N, H, and O atoms that might interact with each other, just as the N, H, and O atoms of the flat amides in proteins interact with each other. Watson and Crick understood all of this. Further, they had some pirated data from Rosalind Franklin that suggested that DNA had some sort of helical structure, perhaps analogous to the helical structures in proteins. Crick had solved the equations that would connect helical structures to observations made by bouncing X-rays off of DNA. For a time, however, their modeling went nowhere. This had a simple explanation: Watson and Crick did not know the correct arrangement of atoms in the nucleobases. In other words, they had not done the analysis correctly. This was because they had lifted the structures of the nucleobases from a review by James Davidson, who also did not know the correct structures of the nucleobases. Fortunately, Jerry Donohue was spending a few months at Cambridge on a Guggenheim Fellowship. Donohue was a chemist trained in the laboratory of (guess who?) Linus Pauling. Donohue therefore knew the correct structures of the nucleobases, based in part on his knowledge of calculations using quantum mechanics (the same as discussed for the structure of methylene). As the story goes, Donohue one day looked over Watson's shoulder, saw that Watson was attempting to fit together the wrong nucleobases, and told Watson the correct arrangement of atoms in the nucleobases. Once he did so, Watson fiddled for

1 I promised no footnotes, but some things in the history of science absolutely require a reference. This comes from Robert Olby's book, The Path to the Double Helix (Dover, 1994, p. 309).

a bit, and then went to play tennis. When he came back, he solved the structure. Donohue was not a coauthor of the Watson-Crick paper, but was acknowledged. Watson and Crick both won Nobel Prizes, but not Donohue.

Application of the Pauling and Corey method again

Again, Watson and Crick did not use a computer program to model the structure of the DNA double helix. The level of theory was little more than "opposite charges attract" and "two atoms may not occupy the same space." But the Pauling-Corey approach was to create another "swallow". Once the analysis was done correctly, with a bit of help from Franklin's X-ray data, Watson and Crick were able to predict (not retrodict) the structure of DNA. In doing so, they constructed one of the most elegant structural hypotheses ever constructed in biochemistry. Not only did the model account for the X-ray data; it also accounted for genetics. In doing so, it accounted in the language of chemistry for much of biology. The Watson-Crick "first generation" model for the duplex DNA molecule begins with an analogy to a ladder. The uprights of the ladder are the backbone sugars and phosphates of two linear polymers of DNA, the strands. These two strands run in opposite directions (they are antiparallel). Further, the strands are twisted around each other to form a "double helix". The key to the structure, however, was the rungs of the ladder. These are formed by two nucleobases, one from each strand. Here, Watson exploited the same kind of hydrogen bonding between N-H and O units as was exploited by Pauling and Corey when they got amides to come together. Again, nitrogen is more electronegative than hydrogen. Therefore, any hydrogen attached to a nitrogen will have a partial positive charge, and the nitrogen will have a partial negative charge. Conversely, both oxygen and nitrogen are more electronegative than carbon. Therefore, oxygen or nitrogen atoms attached to carbon atoms have a partial negative charge. With an amide, the H atom of the N-H unit of the amide forms a hydrogen bond with the O atom of the C=O unit of the amide.

In the first generation model, base pairs fit following two simple rules of complementarity. The first, size complementarity, pairs small pyrimidines with large purines. The second, charge complementarity, pairs hydrogen bond donors from one nucleobase with hydrogen bond acceptors from the other. From this come the simple rules that govern genetics: C pairs with G and T pairs with A.

The rungs of the DNA ladder (small cytosine paired with big guanine; small thymine paired with big adenine). Watson and Crick fit together hydrogen bond donors (red H atoms) with hydrogen bond acceptors (blue N: and O: atoms, where the two dots are electron pairs) for small cytosine and thymine nucleobases and large guanine and adenine nucleobases. The remarkable outcome is that the donor-acceptor-acceptor pattern of cytosine matches the acceptor-donor-donor pattern of guanine, while the acceptor-donor-acceptor pattern of thymine matches the donor-acceptor pattern of adenine (adenine misses the bottom hydrogen bonding unit). R is where the sugar is attached.

You can do this yourself, and have a Watson-Crick discovery moment. Cut out models of the structures of the nucleobases. Then fit them together. It helps if you play a game of tennis first. Either way, however, you will notice that small cytosine (the C in DNA) presents, from top to bottom, a hydrogen bond donor unit (the -NH2), then a hydrogen bond acceptor unit (the ring N with its unshared pair of electrons), and then another hydrogen bond acceptor unit (the C=O unit, with its unshared pair of electrons). That is the small piece. Then, its partner, the large guanine (the G in DNA), presents (again from top to bottom) a hydrogen bond acceptor unit (its C=O unit), a hydrogen bond donor unit (the ring N-H), and then another hydrogen bond donor unit (the -NH2). Thus, the donor-acceptor-acceptor pattern on the small cytosine is complementary to the acceptor-donor-donor pattern on the large guanine. The cytosine-guanine nucleobase pair is joined by three hydrogen bonds. Fit the C and G cut-outs together. Look at how the donor and acceptor units fit. The same is true for thymine and adenine, the T and A of DNA. The small thymine presents, from top to bottom, a hydrogen bond acceptor unit (the C=O unit) followed by a hydrogen bond donor unit (the ring N-H) followed by another hydrogen bond acceptor unit (another C=O unit). The large adenine has only two hydrogen bonding groups, from the top a hydrogen bond donor (the -NH2), then a hydrogen bond acceptor (the ring N with its unshared pair of electrons). The thymine-adenine nucleobase pair is joined by two hydrogen bonds. Fit the T and A cut-outs together. See how donor and acceptor units fit. Watson reports that when he saw this structure, he immediately knew that it was correct. In part, this was because the model explained so easily how DNA might replicate. All that was necessary was to pull the two strands apart, and then synthesize a new strand to complement each of the old strands. This synthesis would put A in the new strands opposite T in the old strands, T in the new strands opposite A in the old strands, G in the new strands opposite C in the old strands, C in the new strands opposite G in the old strands. Magic. Two identical DNA duplexes now exist where previously only one existed. The model could even account for mutation, the ingredient in addition to replication that is essential for Darwinian evolution. Mutation was explained by Watson and Crick as the consequence of error in the replication process. Such error would arise whenever the wrong nucleobase was incorporated into a new strand opposite a nucleobase in the old strand.
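If you would rather not reach for scissors, the cut-out exercise can be mimicked in a few lines of code. This is only a sketch of the two rules as stated above. The donor and acceptor patterns are taken from the text (read top to bottom); the names BASES, hydrogen_bonds, partner and copy_strand are invented for this illustration, and the geometry of real molecules is ignored entirely.

```python
# D = hydrogen bond donor, A = hydrogen bond acceptor, read from top to bottom.
# Adenine has only two hydrogen bonding units (its bottom position is missing).
BASES = {
    "C": ("small", "DAA"),
    "T": ("small", "ADA"),
    "G": ("large", "ADD"),
    "A": ("large", "DA"),
}


def hydrogen_bonds(x, y):
    """Return the number of hydrogen bonds if x and y pair, else 0."""
    size_x, pattern_x = BASES[x]
    size_y, pattern_y = BASES[y]
    if size_x == size_y:                  # size complementarity fails
        return 0
    # zip stops at the shorter pattern, so adenine's missing unit is skipped
    facing = list(zip(pattern_x, pattern_y))
    if all(p != q for p, q in facing):    # every donor faces an acceptor
        return len(facing)
    return 0


def partner(base):
    """The one base that pairs with `base` under the two rules."""
    matches = [other for other in BASES if hydrogen_bonds(base, other) > 0]
    assert len(matches) == 1
    return matches[0]


def copy_strand(template):
    """Write out the complementary strand.

    The two strands are antiparallel, so the new strand is written in
    reverse order relative to the template.
    """
    return "".join(partner(base) for base in reversed(template))


if __name__ == "__main__":
    for base in "CTGA":
        mate = partner(base)
        print(base, "pairs with", mate, "via", hydrogen_bonds(base, mate), "hydrogen bonds")
    print(copy_strand("GATTC"))
```

Nothing in the code says "C goes with G". That rule falls out of size and hydrogen bonding complementarity alone, which is the point of the exercise. Copying a copy returns the original sequence, which is replication in miniature.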
It took some time before the details of the Watson-Crick model were confirmed by experiment. But they were. And so the thought: Something this elegant simply must be universal to life.

Could it be so simple? Synthesis provides an answer

Could the chemistry behind genetics, evolution, and life possibly be so simple? Could just two theories used by Watson and Crick (opposite charges attract, and atoms do not occupy the same space) together with possibly a third (aromatic nucleobases stack) be all there is to it? If we had another example of life by independent genesis (Chapter 6), we could observe whether a life form unrelated to ours arrived independently at the same molecular solution to the general problem of life defined theoretically as a self-sustaining chemical system capable of Darwinian evolution. As noted in Chapter 4, no amount of analysis of known life on Earth provides such a perspective, as all of this life is related by common ancestry. Synthesis provides a way out of this conundrum.

[Figure: four unnatural nucleobase pairs, labeled by their hydrogen bonding patterns: pyADD with puDAA, pyAAD with puDDA, pyDDA with puAAD, and pyDAD with puADA. R is where the sugar is attached.] If all that a genetic system needs by way of nucleobases is a flat aromatic large system and a flat aromatic small system, with red hydrogen bond donors having a partial positive charge pairing with blue hydrogen bond acceptors, then the four unnatural nucleobase pairs drawn above should support artificial genetics. To design these, we moved hydrogen bond donors (H) and acceptors (N: and O:) as interchangeable parts to create new nucleobases that give 4 new pairs with Watson-Crick geometry (small pairs with large) joined by different patterns of hydrogen donors and acceptors.

If nucleobase pairing were indeed so simple, it should be possible to create a new genetic system by moving atoms around (on paper) within the nucleobases to have different arrangements of partially positive and partially negative hydrogen bonding units. Here, the hydrogen bonding groups would be the interchangeable parts. We would, of course, make the nucleobases aromatic. Then, we should use synthetic technology to synthesize unnatural nucleotides that implement these different arrangements. If such a simple set of theories provides all of the understanding that we need for replication and mutation in a genetic molecule (while retaining the repeating charge in the backbone, which we have argued is universal), it should empower us accordingly. If the chemistry behind genetics were so simple, our synthetic nucleotides would still pair following rules of size and hydrogen bonding complementarity, but differently from the natural nucleobases. We would have a synthetic genetic system.

Applying synthetic technology

In my laboratory in Zurich, Lawrence MacPherson, Joseph Piccirilli, Tilman Krauch, Ulrike von Krosigk, Chris Switzer, Simon Moroney, Jennifer Horlacher, Johannes Vögel and others took up the grand challenge of making a synthetic genetic system using this set of theories. By shuffling the hydrogen bond donor and acceptor groups, we came up with eight new nucleobases that fit together to give four new nucleobase pairs (cut them out and try for yourself). As with the four standard nucleobases examined by Watson and Crick, our synthetic nucleobases were predicted to pair with size complementarity (large pairs with small) and hydrogen bonding complementarity (hydrogen bond donor pairs with hydrogen bond acceptor). Next we needed to use the synthetic technology to create these new forms of matter, put them into DNA molecules, and see whether they paired effectively. We will not leave you in suspense. We made them all, and put them into synthetic DNA (and, for good measure, RNA). The synthetic genetic system worked. We could bind DNA containing the eight new synthetic nucleotides according to rules that formed four additional base pairs. So empowered, we are able to say several things. First, nucleobase pairing is as simple as the 1953 "first generation" model proposed by Watson and Crick.
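The "shuffling" of donor and acceptor groups can be written down as a small enumeration. The sketch below is a paper exercise only, under stated assumptions: three hydrogen bonding positions per nucleobase, each either a donor (D) or an acceptor (A), with the small (py) partner always facing a large (pu) partner carrying the opposite pattern. The all-donor and all-acceptor patterns are skipped simply because the pairs described here do not use them; that filter is an assumption of this example, not a claim about chemistry. The function names are hypothetical.

```python
from itertools import product

FLIP = {"D": "A", "A": "D"}

# Patterns on the small partner that the text assigns to the natural bases:
# cytosine is donor-acceptor-acceptor, thymine is acceptor-donor-acceptor.
NATURAL_SMALL = {"DAA": "C", "ADA": "T"}


def complement(pattern):
    """Swap every donor for an acceptor and vice versa."""
    return "".join(FLIP[unit] for unit in pattern)


def candidate_pairs():
    """All small/large pairs with mixed donor and acceptor patterns."""
    pairs = []
    for units in product("DA", repeat=3):
        small = "".join(units)
        if len(set(small)) == 1:   # assumption: skip all-donor and all-acceptor
            continue
        pairs.append(("py" + small, "pu" + complement(small)))
    return pairs


def unnatural_pairs():
    """Candidate pairs whose small partner is not cytosine-like or thymine-like."""
    return [(py, pu) for py, pu in candidate_pairs() if py[2:] not in NATURAL_SMALL]


if __name__ == "__main__":
    for py, pu in unnatural_pairs():
        print(py, "pairs with", pu)
    print(len(candidate_pairs()), "pairs,", 2 * len(candidate_pairs()), "nucleobases in all")
```

The enumeration recovers the four unnatural pairs described above, alongside the two natural ones, for six pairs and twelve nucleobases in total.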
Three simple theories (charge, atoms in space, and aromaticity) are enough to create that empowerment, and therefore (according to our definition-theory of understanding) provide the language to support understanding. Synthesis generated a genetic alphabet with up to 12 independently replicable nucleobases, forming six nucleobase pairs, supported by an extended set of Watson-Crick rules.

Synthetic genetics supports human health care

Perhaps the best demonstration of empowerment is the use of our synthetic genetic system in the clinic to support the care of human patients. The synthetic genetic system allows diagnostic tools to move DNA around without interference from DNA containing A, T, G, and C. Accordingly, our synthetic genetic system is now used in tools to manage the care of patients infected by HIV; these tools, developed at Chiron and now marketed by Siemens, allow the physician to follow the level of virus in the patient's blood. This, in turn, allows physicians to personalize the care of their patients, allowing them to change the drugs administered just as the virus becomes resistant to the drugs first applied. Our synthetic genetic systems also are applied to the management of care of patients infected with the hepatitis B and C viruses. Still other applications of synthetic biology come in the analysis of cystic fibrosis, respiratory infections, and influenza. Today, synthetic genetic systems help manage the care of approximately 400,000 patients infected with HIV and hepatitis viruses each year. With other synthetic genetic systems, we are developing tools that will allow the genomes of patients to be sequenced more rapidly and less expensively. This is being done with the support of the National Human Genome Research Institute. When applied in human medicine, such tools will ultimately allow your physician to determine the genetic component of whatever malady might afflict you.

The next grand challenge. Can artificial synthetic genetic systems support Darwinian evolution?

So what about the grand challenge? The next step in testing our empowerment required us to learn if a synthetic genetic system can support Darwinian evolution. For this to happen, we needed the technology to copy DNA built from the synthetic nucleotides. Of course, copying alone would not be sufficient. The copies must, from time to time, be imperfect, and the imperfections must themselves be copyable.

To copy a synthetic genetic system, we turned to enzymes called DNA polymerases. These enzymes copy standard DNA by synthesizing new DNA strands, pairing A with T, T with A, G with C, and C with G. With natural DNA, polymerases are now routinely used to copy the copies, and then copy the copies of the copies. If done enough times, this is called the polymerase chain reaction, or PCR. PCR was developed by Kary Mullis, who was awarded a Nobel Prize.

Natural DNA polymerases have, of course, evolved for billions of years to accept standard nucleotides. As a consequence, we encountered problems when we tried to use natural DNA polymerases to copy our synthetic genetic system. Our synthetic genetic system differed from the natural genetic system in enough ways that natural polymerases regarded ours as "foreign". Fortunately, synthetic methods were available from classical synthetic biology to allow us to synthesize new DNA polymerases that would accept the new genetic system. Michael Sismour and Zunyi Yang, working in my group, examined a number of these. Without going into details, combinations of polymerases and synthetic genetic alphabets were found that worked together. As Figure 7.1 shows, a six-letter genetic system can support Darwinian evolution.

Figure 7.1. Synthetic biologists label DNA with a radioactive isotope of phosphorus, allowing it to be detected. Then, the DNA molecules are separated by placing them in an electric field. The repeating negative charge causes DNA to move towards the positive end of the field. When the movement is through a gel, small DNA molecules move faster than large DNA molecules, allowing their separation. The image was obtained by Zunyi Yang from one of these "gel electrophoresis" experiments. Here, a DNA primer, 17 nucleotides long, is built from six genetic letters (like ET, Steven Spielberg's extraterrestrial) and labeled with radioactive phosphorus; its presence is indicated by the black smudges at the bottom of the lanes in the gel. Then, DNA polymerases are challenged to generate its children, grandchildren and great-great-grandchildren by repeated copying of the synthetic genetic system in a polymerase chain reaction. The appearance of descendants, DNA molecules 81 nucleotides long, is demonstrated by the smudge at the top of the gel. The gel shows, following good scientific method, both a positive control, where standard templates are copied with standard primers (these generate standard children), and negative controls, where standard templates are copied with synthetic primers and synthetic templates are copied with standard primers (these should give no children at all). The demonstration that this synthetic genetic system can support Darwinian evolution is in the right four lanes of the gel, where increasing amounts of synthetic children are detectable with increasing numbers of cycles of copying.
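The logic of repeated, occasionally imperfect copying can be made concrete with a toy simulation. The sketch below is only an illustration under loud assumptions: the letters "X" and "Y" are hypothetical placeholders for the two added synthetic letters, the function names and the error rate are invented for this example, and real polymerases synthesize the complementary strand rather than an identical one. It shows only that each cycle doubles the population and that a miscopied letter is inherited by later copies.

```python
import random

STANDARD = "ATGC"
SYNTHETIC = "XY"  # hypothetical placeholder letters, not the real synthetic nucleotides
ALPHABET = STANDARD + SYNTHETIC


def copy_with_errors(template, error_rate, rng):
    """Copy a sequence letter by letter; each letter is miscopied with probability error_rate."""
    copied = []
    for letter in template:
        if rng.random() < error_rate:
            # A mistake: substitute any other letter of the six-letter alphabet.
            copied.append(rng.choice([c for c in ALPHABET if c != letter]))
        else:
            copied.append(letter)
    return "".join(copied)


def amplify(founder, cycles, error_rate, seed=0):
    """Toy amplification: in every cycle each molecule present is copied once."""
    rng = random.Random(seed)
    population = [founder]
    for _ in range(cycles):
        children = [copy_with_errors(molecule, error_rate, rng) for molecule in population]
        population = population + children  # parents and children both persist
    return population


if __name__ == "__main__":
    molecules = amplify("ATGCXYATGCXY", cycles=10, error_rate=0.01)
    print(len(molecules), "molecules,", len(set(molecules)), "distinct sequences")
```

With an error rate of zero, every descendant is identical to the founder, so there is nothing for selection to act on; with a small non-zero rate, variants appear and are themselves copied in later cycles, which is the property the text identifies as necessary for Darwinian evolution.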


Today, DNA containing components of our synthetic genetic system can be copied, the resulting copies can be copied, and those copies can be copied. Sometimes, the copies have mistakes; these mutations are then copied, permitting the system to evolve. These are the elements of a chemical system capable of Darwinian evolution.

Is this synthetic life?

Our theory-definition holds that life is a self-sustaining chemical system capable of Darwinian evolution. The artificial genetic system that we have developed is certainly a chemical system capable of Darwinian evolution. It is not, however, self-sustaining. For each round of evolution, a graduate student must add something, or something must be added automatically by a machine that the graduate student runs. Nor, at the moment, does our synthetic genetic system have a feature that can be directly subjected to Darwinian selection pressures, other than replication itself. This means that our synthetic genetic system does not yet have many attributes that would make some of its sequences "fitter" than others.

Therefore, while the synthetic genetic system demonstrates that simple theory can empower us to understand the molecular basis of genetics, we are not yet at the point where we can use our synthetic genetic system as a "second example" of life and see whether it can generate traits that we recognize from natural biology. For this, we return to the need for "bucks". Not surprisingly (and not inappropriately), funding is easier to find for research on tools that help manage the medical care of patients infected with HIV and other viruses.

To further our effort to obtain a self-sustaining synthetic chemical system capable of Darwinian evolution, my group at the Foundation for Applied Molecular Evolution, the Szostak group at the Massachusetts General Hospital, the Joyce group at The Scripps Research Institute in La Jolla, the Unrau group at Simon Fraser University, and others, supported by the Canadian government, the John Templeton Foundation, and the National Science Foundation, are attempting to meet the "man on the moon" goal. For the next step, it may not be necessary to make an artificial genetic system self-sustaining. It is sufficient to ask: Will this Darwinian chemical system (if sustained by a helpful researcher) be able to generate the properties that we expect from life? Is life that simple? If our system does not, we shall surely mitigate any failure in our search to put this "man" on the "moon" by examining Quine's secondary assumptions. But ultimately, failure will indicate a weakness in our theory-definition of life. Conversely, success will be a major step forward in our effort to understand life as a universal.

Does synthetic biology carry hazards?

A provocative title like "Synthetic Biology" suggests a potential for hazard. Accordingly, the past year has seen a call for a second "Asilomar conference for synthetic biology". This call makes reference to a conference held in Monterey in 1975 that considered the public hazards of the recombinant DNA technology, the synthetic biology that fit the definition of the term that Waclaw Szybalski suggested in 1974. Much of what is called "Synthetic Biology" today is congruent with the recombinant DNA technology discussed at Asilomar thirty years ago. This includes bacteria constructed to express heterologous genes, proteins having amino acid replacements, and cells with altered regulatory pathways.
Placing a new name on an old technology does not create a new hazard, and many of the hazards of modern efforts of this type simply reflect their greater chance of success through improved technology. Those seeking to create artificial chemical systems to support Darwinian processes are, however, creating something new. We must consider the possibility that these artificial systems might create new hazards if (for example) they escaped from the laboratory.

Some general principles are relevant to assessing the potential for such hazards. For example, the more an artificial living system deviates (at a chemical level) from natural biological systems, the less likely it is to survive outside of the laboratory. A living organism survives when it has access to the resources that it needs, and is more fit than competing organisms in recovering these resources. Thus, a completely synthetic life form having synthetic nucleotides in its DNA would have difficulty surviving if it were to escape from the laboratory. What would it eat? Where would it get its synthetic nucleosides?

This applies to less exotic examples of engineered life. Thirty years of experience with genetically altered organisms since Asilomar have shown that engineered organisms are less fit than their natural counterparts. If they survive at all in the environment, they do so either under the nurturing of an attentive human, or by ejecting their engineered features. Thus, the most hazardous type of bioengineering is the type that is not engineering at all, but instead reproduces a known virulent agent in its exact form. The recent synthesis of smallpox virus is perhaps the riskiest example of synthetic biology.

Indeed, suppose one actually wanted to do damage? Would one generate a genetically engineered E. coli? Or place fuel and fertilizer in a rented truck and detonate it outside of the Federal Building in Oklahoma City? We know the answer to this question for one individual. We do not know it for all individuals. Thus, it is difficult to conceive of a state of knowledge where it will not be easier to do harm in non-biotechnological ways than by engineering biohazards.

Any hazard must be juxtaposed against the potential benefits that come from the understanding developed by synthetic biology. History provides a partial guide. In 1975, the City of Cambridge banned the classical form of synthetic biology, recombinant DNA technology, within its six square miles to manage what was perceived as a danger. In retrospect, it is clear that had the ban been worldwide, the result would have been more harmful.
In the same decade that Cambridge banned recombinant DNA research, an ill-defined syndrome noted in patients having "acquired immune deficiency" was emerging around the planet as a major health crisis. Without the technology that the City of Cambridge banned, we would have been hard pressed to learn what the human immunodeficiency virus was, let alone have compounds in hand today that manage the infection. Today, classical synthetic biology and recombinant DNA technology allow us to manage new threats as they emerge, including SARS, bird influenza, and other infectious diseases. Indeed, it is these technologies that distinguish our ability to manage such threats today from how we would have managed them a century ago.

With these thoughts in mind, a Venn diagram can be proposed to assess risk in synthetic biological endeavors. Activities within the red circle use standard terran biochemistry, more or less as Nature has developed it on Earth over the past four billion years. Activities outside that circle concern activities with different biochemistry; the farther outside the circle, the more different the biochemistry is. The green circle contains systems that are capable of evolving. Those outside the circle cannot, and present no more hazard than a toxic chemical; regardless of its hazard, it is what it is, and cannot get any worse. The blue circle contains systems that are self-sustaining. They "live" without continuing human intervention. Those outside the blue circle require continuous feeding. Thus, these represent no more of a hazard than a pathogen that will die once released from the laboratory. The greatest chance for hazard comes from a system that is self-sustaining, uses standard biochemistry, and is capable of evolving. This is, of course, the goal of the Venter-Smith artificial cell, which presents the same hazards as presented by natural non-pathogenic organisms, that it might evolve into an organism that feeds on us. Those hazards, although not absent, are not large compared to those presented by the many natural non-pathogens that co-inhabit Earth with us.

[Figure: Venn diagram of three circles. Labels: standard terran biochemistry; not standard terran biochemistry; capable of evolving; not capable of evolving; self sustaining; not self sustaining (needs to be fed); most engineering synthetic biology; Benner-Sismour synthetic biology; (parasitism); Venter-Smith artificial cell; RISK.]

A representation of the potential hazards of constructing artificial chemical systems, evaluated based on whether they use standard terran biochemistry (inside the red circle) or not (outside the red circle), whether they can evolve (inside the green circle) or not (outside), and whether they are self-sustaining (inside the blue circle) or not (outside). The risk is greatest when all three criteria are met (the intersection of the three circles).

[Sidebar] I encountered the discussion of the potential hazards of modern synthetic biology, which focuses on building artificial Darwinian systems from the atom up, first in 1988, when I proposed the title Redesigning Life for a book that I edited for a conference on synthetic biology that year. Several of my colleagues in Swiss science felt that this title would be too provocative, and asked me to change the title to the one that emphasizes that the molecules are being redesigned.

What have we learned from synthetic biology?

Just a bit about life as a universal. We have shown that alien life with six letters in its genetic alphabet is possible (see Chapter 8). We have developed chemical descriptions of the molecular features that we believe will be universal in genetic and catalytic biomolecules. These are significant (albeit particular) steps as we improve our view of life. But even with research backwards and forwards in time, only pieces of the puzzle have been put together.

We have learned much about method in science, and the power of synthesis as a method. By setting grand challenges that force scientists across uncharted territory where we must encounter and solve unscripted problems, synthesis drives discovery and paradigm change in ways that analysis cannot.

The synthesis of an artificial genetic system capable of Darwinian evolution has made the next grand challenge still more ambitious. We need to reduce the reliance of the current synthetic Darwinian systems on natural biology, including the graduate students who must feed our artificial Darwinian chemical system to keep it alive. We need to make it self-sustaining. This is not simple rearrangement of existing genetic material to get a "minimal cell". Instead, it is a direct challenge to our definition-theory of life as a self-sustaining chemical system capable of Darwinian evolution with chemistry different from what has been delivered to us by natural processes. If life is as simple as our definition-theory suggests, and if this kind of life emerged spontaneously from minerals and simple organic molecules in a prebiotic soup on Earth four billion years ago, surely we should be able to get something analogous in a laboratory, especially given the advantages of the controlled environment in a modern laboratory over what we presume to have been present on early Earth.

This "work in progress" has already encountered some curious results. We have had unexpected problems as we attempt to free our own synthetic genetic systems from reliance on proteins from natural terran biology. Some in the community are confident that with a few more bucks, we can surmount these problems, achieving something that (absent the discovery of a natural alien life form on another world) would expand our knowledge of life as a universal more than anything else. If we had a simple form of designed life in our hands, we could ask key questions. How does it evolve? How does it create complexity? How does it manage the paradoxes associated with the limitations of organic molecules related to Darwinian processes?
We have more than enough reason to expect that producing in the laboratory a simple, self-sustaining chemical system will tell us more about the essence of life than any other event short of encountering an alien life form directly.
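The three-circle assessment of hazard described earlier in this chapter reduces to three yes-or-no questions, and it may help to see it written out that way. The sketch below is an illustration only: the class and function names are invented here, the count of criteria is not a quantitative risk measure, and the placement of the six-letter genetic system is an assumption drawn from the text's description of it as capable of evolving but neither self-sustaining nor built from standard nucleotides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChemicalSystem:
    name: str
    standard_terran_biochemistry: bool  # inside the red circle
    capable_of_evolving: bool           # inside the green circle
    self_sustaining: bool               # inside the blue circle


def criteria_met(system):
    """Count how many of the three circles the system falls inside (illustrative only)."""
    return sum([
        system.standard_terran_biochemistry,
        system.capable_of_evolving,
        system.self_sustaining,
    ])


def in_highest_risk_region(system):
    """True only at the intersection of all three circles, where the text places the greatest risk."""
    return criteria_met(system) == 3


# The stated goal of the Venter-Smith artificial cell meets all three criteria.
venter_smith_cell = ChemicalSystem("Venter-Smith artificial cell", True, True, True)

# Assumed placement: evolves, but must be fed and uses non-standard nucleotides.
six_letter_system = ChemicalSystem("six-letter synthetic genetic system", False, True, False)

if __name__ == "__main__":
    for system in (venter_smith_cell, six_letter_system):
        print(system.name, criteria_met(system), in_highest_risk_region(system))
```

Treating the criteria as independent booleans mirrors the diagram: a system that fails any one test sits outside at least one circle, and the text argues that such a system either cannot get worse, cannot persist without feeding, or cannot find what it needs outside the laboratory.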


Chapter 8
Weird Life. Life as We Do Not Know It

Congratulations. We have survived seven chapters of heavy lifting together [...]

[...] all live on the energy arising from the atomic nuclei that are the remnants of supernovas, all delivered to the Earth in its formation. Of course, the multistep process that converts the mass of radioactive atoms to heat under the equation E = mc², from there into chemistry, and from there into life, is inefficient. A better way for life to use the energy obtained by mass conversion would be for it to evolve catalysts that change mass directly into energy. That kind of life is not known today. [...] copyrighted and available for sale. [...]

[Figure caption, partly illegible: ... life surviving on the surface liquid ... Titan ... After crossing that line ..., the liquid and gas phases are no longer different; CO2 is a supercritical fluid.]

[Figure caption, partly illegible: Supercritical hydrogen fluid is found in ... Saturn if one goes too ...]

[...] zone on each of the gas giants using two radii. The first is the radius measured from the center of the planet to the altitude where dihydrogen ceases to be supercritical. The second radius is measured from the center to the altitude where the temperature drops to a point where organic molecules are stable (let us say 500 K, 227 °C, or 440 °F, a high temperature in your kitchen oven). If the second radius is smaller than the first, then there is a "habitable zone" on the planet where life could survive living in supercritical dihydrogen as a solvent. If the second radius is larger than the first, however, then the planet