Introduction to Molecular Genetics and Genomics



Table of contents:
1 - Intro to Molecular Genetics and Genomics
2 - DNA Structure and DNA Manipulation
3 - Transmission Genetics: The Principle of Segregation

CHAPTER 1

Introduction to Molecular Genetics and Genomics

CHAPTER OUTLINE

1.1 DNA: The Genetic Material
    Experimental Proof of the Genetic Function of DNA
    Genetic Role of DNA in Bacteriophage
1.2 DNA Structure: The Double Helix
1.3 An Overview of DNA Replication
1.4 Genes and Proteins
    Inborn Errors of Metabolism as a Cause of Hereditary Disease
    Mutant Genes and Defective Proteins
1.5 Gene Expression: The Central Dogma
    Transcription
    Translation
    The Genetic Code
1.6 Mutation
    Protein Folding and Stability
1.7 Genes and Environment
1.8 Evolution: From Genes to Genomes, from Proteins to Proteomes
    The Molecular Unity of Life
    Natural Selection and Diversity

PRINCIPLES

  • Inherited traits are affected by genes.
  • Genes are composed of the chemical deoxyribonucleic acid (DNA).
  • DNA replicates to form copies of itself that are identical (except for rare mutations).
  • DNA contains a genetic code specifying what types of enzymes and other proteins are made in cells.
  • DNA occasionally mutates, and the mutant forms specify altered proteins that have reduced activity or stability.
  • A mutant enzyme is an “inborn error of metabolism” that blocks one step in a biochemical pathway for the metabolism of small molecules.
  • Traits are affected by environment as well as by genes.
  • Organisms change genetically through generations in the process of biological evolution.

CONNECTIONS

  • Shear Madness. Alfred D. Hershey and Martha Chase, 1952. Independent Functions of Viral Protein and Nucleic Acid in Growth of Bacteriophage
  • The Black Urine Disease. Archibald E. Garrod, 1908. Inborn Errors of Metabolism

Each species of living organism has a unique set of inherited characteristics that makes it different from other species. Each species has its own developmental plan, often described as a sort of “blueprint” for building the organism, which is encoded in the DNA molecules present in its cells. This developmental plan determines the characteristics that are inherited. Because organisms in the same species share the same developmental plan, organisms that are members of the same species usually resemble one another, although there are notable exceptions, usually differences between males and females. For example, it is easy to distinguish a human being from a chimpanzee or a gorilla. A human being habitually stands upright and has long legs, relatively little body hair, a large brain, and a flat face with a prominent nose, jutting chin, distinct lips, and small teeth. All of these traits are inherited (part of our developmental plan) and help set us apart as Homo sapiens.

But human beings are by no means identical. Many traits, or observable characteristics, differ from one person to another. There is a great deal of variation in hair color, eye color, skin color, height, weight, personality traits, and other characteristics. Some human traits are transmitted biologically, others culturally. The color of our eyes results from biological inheritance, but the native language we learned as a child results from cultural inheritance. Many traits are influenced jointly by biological inheritance and environmental factors. For example, weight is determined in part by inheritance but also in part by environment: how much food we eat, its nutritional content, our exercise regimen, and so forth. Genetics is the study of biologically inherited traits, including traits that are influenced in part by the environment.
The fundamental concept of genetics is that: Inherited traits are determined by the elements of heredity that are transmitted from parents to offspring in reproduction; these elements of heredity are called genes.

The existence of genes and the rules governing their transmission from generation to generation were first articulated by Gregor Mendel in 1866 (Chapter 3). Mendel’s formulation of inheritance was in
terms of the abstract rules by which hereditary elements (he called them “factors”) are transmitted from parents to offspring. His objects of study were garden peas, with variable traits like pea color and plant height. At one time genetics could be studied only through the progeny produced from matings. Genetic differences between species were impossible to define, because organisms of different species usually do not mate, or they produce hybrid progeny that die or are sterile. This approach to the study of genetics is often referred to as classical genetics, or organismic or morphological genetics. Given the advances of molecular, or modern, genetics, it is possible to study differences between species through the comparison and analysis of the DNA itself. There is no fundamental distinction between classical and molecular genetics. They are different and complementary ways of studying the same thing: the function of the genetic material. In this book we include many examples showing how molecular and classical genetics can be used in combination to enhance the power of genetic analysis. The foundation of genetics as a molecular science dates back to 1869, just three years after Mendel reported his experiments. It was in 1869 that Friedrich Miescher discovered a new type of weak acid, abundant in the nuclei of white blood cells. Miescher’s weak acid turned out to be the chemical substance we now call DNA (deoxyribonucleic acid). For many years the biological function of DNA was unknown, and no role in heredity was ascribed to it. This first section shows how DNA was eventually isolated and identified as the material that genes are made of.

1.1 DNA: The Genetic Material

That the cell nucleus plays a key role in inheritance was recognized in the 1870s by the observation that the nuclei of male and female reproductive cells undergo fusion in the process of fertilization. Soon thereafter, chromosomes were first observed inside the nucleus as thread-like objects that become visible in the light microscope when the cell is stained with certain dyes. Chromosomes were found to exhibit a characteristic “splitting” behavior in which each daughter cell formed by cell division receives an identical complement of chromosomes (Chapter 4). Further evidence for the importance of chromosomes was provided by the observation that, whereas the number of chromosomes in each cell may differ among biological species, the number of chromosomes is nearly always constant within the cells of any particular species. These features of chromosomes were well understood by about 1900, and they made it seem likely that chromosomes were the carriers of the genes.

By the 1920s, several lines of indirect evidence began to suggest a close relationship between chromosomes and DNA. Microscopic studies with special stains showed that DNA is present in chromosomes. Chromosomes also contain various types of proteins, but the amount and kinds of chromosomal proteins differ greatly from one cell type to another, whereas the amount of DNA per cell is constant. Furthermore, nearly all of the DNA present in cells of higher organisms is present in the chromosomes. These arguments for DNA as the genetic material were unconvincing, however, because crude chemical analyses had suggested (erroneously, as it turned out) that DNA lacks the chemical diversity needed in a genetic substance. The favored candidate for the genetic material was protein, because proteins were known to be an exceedingly diverse collection of molecules. Proteins therefore became widely accepted as the genetic material, and DNA was assumed to function merely as the structural framework of the chromosomes. The experiments described below finally demonstrated that DNA is the genetic material.

Experimental Proof of the Genetic Function of DNA

An important first step was taken by Frederick Griffith in 1928 when he demonstrated that a physical trait can be passed from one cell to another. He was working with two strains of the bacterium Streptococcus pneumoniae identified as S and R. When a bacterial cell is grown on solid medium, it undergoes repeated cell divisions to form a visible clump of cells called a colony. The S type of S. pneumoniae synthesizes a gelatinous capsule composed of complex carbohydrate (polysaccharide). The enveloping capsule makes each colony large and gives it a glistening or smooth (S) appearance. This capsule also enables the bacterium to cause pneumonia by protecting it from the defense mechanisms of an infected animal. The R strains of S. pneumoniae are unable to synthesize the capsular polysaccharide; they form small colonies that have a rough (R) surface (Figure 1.1). This strain of the bacterium does not cause pneumonia, because without the capsule the bacteria are inactivated by the immune system of the host. Both types of bacteria “breed true” in the sense that the progeny formed by cell division have the capsular type of the parent, either S or R.

Mice injected with living S cells get pneumonia. Mice injected either with living R cells or with heat-killed S cells remain healthy. Here is Griffith’s critical finding: mice injected with a mixture of living R cells and heat-killed S cells contract the disease, and they often die of pneumonia (Figure 1.2). Bacteria isolated from blood samples of these dead mice produce S cultures with a capsule typical of the injected S cells, even though the injected S cells had been killed by heat. Evidently, the injected material from the dead S cells includes a substance that can be transferred to living R cells and confer the ability to resist the immunological system of the mouse and cause pneumonia. In other words, the R bacteria can be changed (or undergo transformation) into S bacteria. Furthermore, the new characteristics are inherited by descendants of the transformed bacteria.

Figure 1.1 Colonies of rough (R, the small colonies) and smooth (S, the large colonies) strains of Streptococcus pneumoniae. The S colonies are larger because of the gelatinous capsule on the S cells. [Photograph from O. T. Avery, C. M. MacLeod, and M. McCarty. Reproduced from the Journal of Experimental Medicine, 1944, vol. 79, p. 137 by copyright permission of The Rockefeller University Press.]

Figure 1.2 Griffith’s experiment demonstrating bacterial transformation. A mouse remains healthy if injected with either the nonvirulent R strain of S. pneumoniae or heat-killed cell fragments of the usually virulent S strain. R cells in the presence of heat-killed S cells are transformed into the virulent S strain, causing pneumonia in the mouse. [Diagram labels: Living S cells: mouse contracts pneumonia; S colonies isolated from tissue of dead mouse. Living R cells: mouse remains healthy; R colonies isolated from tissue. Heat-killed S cells: mouse remains healthy; no colonies isolated from tissue. Living R cells plus heat-killed S cells: mouse contracts pneumonia; R and S colonies isolated from tissue of dead mouse.]

Transformation in Streptococcus was originally discovered in 1928, but it was not until 1944 that the chemical substance responsible for changing the R cells into S cells was identified. In a milestone experiment, Oswald Avery, Colin MacLeod, and Maclyn McCarty showed that the substance causing the transformation of R cells into S cells was DNA. In doing these experiments, they first had to develop chemical procedures for isolating almost pure DNA from cells, which had never been done before. When they added DNA isolated from S cells to growing cultures of R cells, they observed transformation: a few cells of type S were produced. Although the DNA preparations contained traces of protein and RNA (ribonucleic acid, an abundant cellular macromolecule chemically related to DNA), the transforming activity was not altered by treatments that destroyed either protein or RNA. However, treatments that destroyed DNA eliminated the transforming activity (Figure 1.3). These experiments implied that the substance responsible for genetic transformation was the DNA of the cell, and hence that DNA is the genetic material.

Figure 1.3 A diagram of the Avery–MacLeod–McCarty experiment that demonstrated that DNA is the active material in bacterial transformation. (A) Purified DNA extracted from heat-killed S cells can convert some living R cells into S cells, but the material may still contain undetectable traces of protein and/or RNA. (B) The transforming activity is not destroyed by either protease or RNase. (C) The transforming activity is destroyed by DNase and so probably consists of DNA. [Diagram labels: (A) S cell extract (mostly DNA with a little protein and RNA) added to a culture of R cells and plated on agar medium gives R colonies and a few S colonies. (B) Extract treated with protease or RNase gives R colonies and a few S colonies; conclusion: transforming activity is not protein or RNA. (C) Extract treated with DNase gives R colonies only; conclusion: transforming activity is most likely DNA.]

Figure 1.4 (A) Drawing of E. coli phage T2, showing various components. The DNA is confined to the interior of the head. (B) An electron micrograph of phage T4, a closely related phage. [Electron micrograph courtesy of Robley Williams.] [Diagram labels: head (protein and DNA); tail (protein only).]

Genetic Role of DNA in Bacteriophage

A second pivotal finding was reported by Alfred Hershey and Martha Chase in 1952. They studied cells of the intestinal bacterium Escherichia coli after infection by the virus T2. A virus that attacks bacterial cells is called a bacteriophage, a term often shortened to phage. Bacteriophage means “bacteria-eater.” The structure of a bacteriophage T2 particle is illustrated in Figure 1.4. It is exceedingly small, yet it has a complex structure composed of head (which contains the phage DNA), collar, tail, and tail fibers. (The head of a human sperm is about 30–50 times larger in both length and width than the head of T2.) Hershey and Chase were already aware that T2 infection proceeds via the attachment of a phage particle by the tip of its tail to the bacterial cell wall, entry of phage material into the cell, multiplication of this material to form a hundred or more progeny phage, and release of the progeny phage by bursting (lysis) of the bacterial host cell. They also knew that T2 particles were composed of DNA and protein in approximately equal amounts. Because DNA contains phosphorus but no sulfur, whereas most proteins contain sulfur but no phosphorus, it is possible to label DNA and proteins differentially by using
radioactive isotopes of the two elements. Hershey and Chase produced particles containing radioactive DNA by infecting E. coli cells that had been grown for several generations in a medium that included 32P (a radioactive isotope of phosphorus) and then collecting the phage progeny. Other particles containing labeled proteins were obtained in the same way, by using medium that included 35S (a radioactive isotope of sulfur).

In the experiments summarized in Figure 1.5, nonradioactive E. coli cells were infected with phage labeled with either 32P (part A) or 35S (part B) in order to follow the DNA and proteins individually. Infected cells were separated from unattached phage particles by centrifugation, resuspended in fresh medium, and then swirled violently in a kitchen blender to shear attached phage material from the cell surfaces. This treatment was found to have no effect on the subsequent course of the infection, which implies that the phage genetic material must enter the infected cells very soon after phage attachment.

The kitchen blender turned out to be the critical piece of equipment. Other methods had been tried to tear the phage heads from the bacterial cell surface, but nothing had worked reliably. Hershey later explained, “We tried various grinding arrangements, with results that weren’t very encouraging. When Margaret McDonald loaned us her kitchen blender, the experiment promptly succeeded.”

After the phage heads were removed by the blender treatment, the infected bacteria were examined. Most of the radioactivity from 32P-labeled phage was found to be associated with the bacteria, whereas only a small fraction of the 35S radioactivity was present in the infected cells. The retention of most of the labeled DNA, contrasted with the loss of most of the labeled protein, implied that a T2 phage transfers most of its DNA, but very little of its protein, to the cell it infects. The critical finding (Figure 1.5) was that about 50 percent of the transferred 32P-labeled DNA, but less than 1 percent of the transferred 35S-labeled protein, was inherited by the progeny phage particles. Hershey and Chase interpreted this result to mean that the genetic material in T2 phage is DNA.

The experiments of Avery, MacLeod, and McCarty and those of Hershey and Chase are regarded as classics in the demonstration that genes consist of DNA. At the present time, the equivalent of the transformation experiment is carried out daily in many research laboratories throughout the world, usually with bacteria, yeast, or animal or plant cells grown in culture. These experiments indicate that DNA is the genetic material in these organisms as well as in phage T2. Although there are no known exceptions to the generalization that DNA is the genetic material in all cellular organisms and many viruses, in a few types of viruses the genetic material consists of RNA.

Figure 1.5 The Hershey–Chase (“blender”) experiment demonstrating that DNA, not protein, is responsible for directing the reproduction of phage T2 in infected E. coli cells. (A) Radioactive DNA is transmitted to progeny phage in substantial amounts. (B) Radioactive protein is transmitted to progeny phage in negligible amounts. [Diagram labels: (A) E. coli cells grown in 32P-containing medium (labels DNA) are infected with nonradioactive T2 phage; phage reproduction and cell lysis release DNA-labeled progeny phage, which are used to infect nonradioactive cells. After infection, the part of the phage remaining attached to the cells is removed by violent agitation in a kitchen blender. Cell lysis releases progeny phage that contain some 32P-labeled DNA from the parental phage DNA. (B) The same steps with cells grown in 35S-containing medium (labels protein) yield progeny phage that contain almost no 35S-labeled protein. Conclusion: DNA from an infecting parental phage is inherited in the progeny phage.]

Shear Madness
Alfred D. Hershey and Martha Chase 1952
Cold Spring Harbor Laboratories, Cold Spring Harbor, New York
Independent Functions of Viral Protein and Nucleic Acid in Growth of Bacteriophage

Published a full eight years after the paper of Avery, MacLeod, and McCarty, the experiments of Hershey and Chase get equal billing. Why? Some historians of science suggest that the Avery et al. experiments were “ahead of their time.” Others suggest that Hershey had special standing because he was a member of the “in group” of phage molecular geneticists. Max Delbrück was the acknowledged leader of this group, with Salvador Luria close behind. (Delbrück, Luria, and Hershey shared a 1969 Nobel Prize.) Another possible reason is that whereas the experiments of Avery et al. were feats of strength in biochemistry, those of Hershey and Chase were quintessentially genetic. Which macromolecule gets into the hereditary action, and which does not? Buried in the middle of this paper, and retained in the excerpt, is a sentence admitting that an earlier publication by the researchers was a misinterpretation of their preliminary results. This shows that even first-rate scientists, then and now, are sometimes misled by their preliminary data. Hershey later explained, “We tried various grinding arrangements, with results that weren’t very encouraging. When Margaret McDonald loaned us her kitchen blender the experiment promptly succeeded.”

The work [of others] has shown that bacteriophages T2, T3, and T4 multiply in the bacterial cell in a non-infective [immature] form. Little else is known about the vegetative [growth] phase of these viruses. The experiments reported in this paper show that one of the first steps in the growth of T2 is the release from its protein coat of the nucleic acid of the virus particle, after which the bulk of the sulfur-containing protein has no further function. . . . Anderson has obtained electron micrographs indicating that phage T2 attaches to bacteria by its tail. . . . It ought to be a simple matter to break the empty phage coats off the infected bacteria, leaving the phage DNA inside the cells. . . . When a suspension of cells with 35S- or 32P-labeled phage was spun in a blender at 10,000 revolutions per minute, . . . 75 to 80 percent of the phage sulfur can be stripped from the infected cells. . . . These facts show that the bulk of the phage sulfur remains at the cell surface during infection. . . . Little or no 35S is contained in the mature phage progeny. . . . Identical experiments starting with phage labeled with 32P show that phosphorus is transferred from parental to progeny phage at yields of about 30 phage per infected bacterium. . . . [Incomplete separation of phage heads] explains a mistaken preliminary report of the transfer of 35S from parental to progeny phage. . . . The following questions remain unanswered. (1) Does any sulfur-free phage material other than DNA enter the cell? (2) If so, is it transferred to the phage progeny? (3) Is the transfer of phosphorus to progeny direct or indirect? . . . Our experiments show clearly that a physical separation of the phage T2 into genetic and nongenetic parts is possible. The chemical identification of the genetic part must wait until some of the questions above have been answered. . . . The sulfur-containing protein of resting phage particles is confined to a protective coat that is responsible for the adsorption to bacteria, and functions as an instrument for the injection of the phage DNA into the cell. This protein probably has no function in the growth of the intracellular phage. The DNA has some function. Further chemical inferences should not be drawn from the experiments presented.

Source: Journal of General Physiology 36: 39–56

1.2 DNA Structure: The Double Helix

The inference that DNA is the genetic material still left many questions unanswered. How is the DNA in a gene duplicated when a cell divides? How does the DNA in a gene control a hereditary trait? What happens to the DNA when a mutation (a change in the DNA) takes place in a gene? In the early 1950s, a number of researchers began to try to understand the detailed molecular structure of DNA in hopes that the structure alone would suggest answers to these questions. In 1953 James Watson and Francis Crick at Cambridge University proposed the first essentially correct three-dimensional structure of the DNA molecule. The structure was dazzling in its elegance and revolutionary in suggesting how DNA duplicates itself, controls hereditary traits, and undergoes mutation. Even while their tin-and-wire model of the DNA molecule was still incomplete, Crick would visit his favorite pub and exclaim “we have discovered the secret of life.”

In the Watson–Crick structure, DNA consists of two long chains of subunits, each twisted around the other to form a double-stranded helix. The double helix is right-handed, which means that as one looks along the barrel, each chain follows a clockwise path as it progresses. You can visualize the right-handed coiling in part A of Figure 1.6 if you imagine yourself looking up into the structure from the bottom. The dark spheres outline the “backbone” of each individual strand, and they coil in a clockwise direction. The subunits of each strand are nucleotides, each of which contains any one of four chemical constituents called bases attached to a phosphorylated molecule of the 5-carbon sugar deoxyribose. The four bases in DNA are

  • Adenine (A)
  • Thymine (T)
  • Guanine (G)
  • Cytosine (C)

The chemical structures of the nucleotides and bases need not concern us at this time. They are examined in Chapter 2. A key point for our present purposes is that the bases in the double helix are paired as shown in Figure 1.6B. That is: At any position on the paired strands of a DNA molecule, if one strand has an A, then the partner strand has a T; and if one strand has a G, then the partner strand has a C.

The pairing between A and T and between G and C is said to be complementary; the complement of A is T, and the complement of G is C. The complementary pairing means that each base along one strand of the DNA is matched with a base in the opposite position on the other strand. Furthermore: Nothing restricts the sequence of bases in a single strand, so any sequence could be present along one strand.

This principle explains how only four bases in DNA can code for the huge amount of information needed to make an organism. It is the sequence of bases along the DNA that encodes the genetic information, and the sequence is completely unrestricted.

The complementary pairing is also called Watson–Crick pairing. In the three-dimensional structure in Figure 1.6A, the base pairs are represented by the lighter spheres filling the interior of the double helix. The base pairs lie almost flat, stacked on top of one another perpendicular to the long axis of the double helix, like pennies in a roll. When discussing a DNA molecule, biologists frequently refer to the individual strands as single-stranded DNA and to the double helix as double-stranded DNA or duplex DNA.

Figure 1.6 Molecular structure of the DNA double helix in the standard “B form.” (A) A space-filling model, in which each atom is depicted as a sphere. (B) A diagram highlighting the helical strands around the outside of the molecule and the A–T and G–C base pairs inside.

Each DNA strand has a polarity, or directionality, like a chain of circus elephants linked trunk to tail. In this analogy, each elephant corresponds to one nucleotide along the DNA strand. The polarity is determined by the direction in which the nucleotides are pointing. The “trunk” end of the strand is called the 3' end of the strand, and the “tail” end is called the 5' end. In double-stranded DNA, the paired strands are oriented in opposite directions, the 5' end of one strand aligned with the 3' end of the other. The molecular basis of the polarity, and the reason for the opposite orientation of the strands in duplex DNA, are explained in Chapter 2. In illustrating DNA molecules in this book, we use an arrow-like ribbon to represent the backbone, and we use tabs jutting off the ribbon to represent the nucleotides. The polarity of a DNA strand is indicated by the direction of the arrow-like ribbon. The tail of the arrow represents the 5' end of the DNA strand, the head the 3' end.

Beyond the most optimistic hopes, knowledge of the structure of DNA immediately gave clues to its function:

1. The sequence of bases in DNA could be copied by using each of the separate “partner” strands as a pattern for the creation of a new partner strand with a complementary sequence of bases.
2. The DNA could contain genetic information in coded form in the sequence of bases, analogous to letters printed on a strip of paper.
3. Changes in genetic information (mutations) could result from errors in copying in which the base sequence of the DNA became altered.

In the remainder of this chapter, we discuss some of the implications of these clues.
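The pairing and polarity rules of this section can be illustrated with a short Python sketch. It is only an illustration, not part of the original text: the names `PAIRING`, `complement`, `reverse_complement`, and `possible_sequences` are hypothetical, and strands are assumed to be written as plain strings of the letters A, T, G, and C.

```python
# Illustrative sketch only; all names here are hypothetical, not from the text.
# Watson-Crick pairing as stated in the chapter: A pairs with T, G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base paired opposite each position, in the same left-to-right order."""
    return "".join(PAIRING[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


def possible_sequences(length: int) -> int:
    """Count the distinct base sequences of a given length (four choices per position)."""
    return 4 ** length


top = "ACGCTTGC"                  # one strand, written 5' to 3'
print(complement(top))            # partner strand, written 3' to 5'
print(reverse_complement(top))    # the same partner strand, written 5' to 3'
print(possible_sequences(10))     # number of different 10-base sequences
```

Because the paired strands run in opposite directions, the partner strand reads as the reversed complement when both strands are written 5' to 3'. The count from `possible_sequences` grows as a power of four, which is one way to see how four bases can carry a very large amount of information.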

1.3 An Overview of DNA Replication

Watson and Crick noted that the structure of DNA itself suggested a mechanism for its replication. “It has not escaped our notice,” they wrote, “that the specific base pairing we have postulated immediately suggests a copying mechanism.” The copying process in which a single DNA molecule becomes two identical molecules is called replication. The replication mechanism that Watson and Crick had in mind is illustrated in Figure 1.7. As shown in part A of Figure 1.7, the strands of the original (parent) duplex separate, and each individual strand serves as a pattern, or template, for the synthesis of a new strand (replica). The replica strands are synthesized by the addition of successive nucleotides in such a way that each base in the replica is complementary (in the Watson–Crick pairing sense) to the base across the way in the template strand (Figure 1.7B). Although the mechanism in Figure 1.7 is simple in principle, it is a complex process that is fraught with geometrical problems and requires a variety of enzymes and other proteins. The details are examined in Chapter 6. The end result of replication is that a single double-stranded molecule becomes replicated into two copies with identical sequences:

    Parent duplex          Daughter duplexes
    5'-ACGCTTGC-3'         5'-ACGCTTGC-3'          5'-ACGCTTGC-3' (new)
    3'-TGCGAACG-5'         3'-TGCGAACG-5' (new)    3'-TGCGAACG-5'

Here the newly synthesized strands are marked “new.” In the duplex on the left, the top strand is the template from the parental molecule and the bottom strand is newly synthesized; in the duplex on the right, the bottom strand is the template from the parental molecule and the top strand is newly synthesized. Note in Figure 1.7B that in the synthesis of each new strand, new nucleotides are added only to the 3' end of the growing chain: The obligatory elongation of a DNA strand only at the 3' end is an essential feature of DNA replication.
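The template-copying scheme described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a model of the enzymes involved: the function names `synthesize_replica` and `replicate` are hypothetical, and each strand is assumed to be a plain string of the letters A, T, G, and C.

```python
# Illustrative sketch only; function names are hypothetical, not from the text.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_replica(template_3_to_5: str) -> str:
    """Copy a template read 3' to 5', growing the new strand 5' to 3'.

    Each pass through the loop appends one complementary nucleotide
    to the 3' end of the growing replica strand.
    """
    replica = ""
    for base in template_3_to_5:
        replica += PAIRING[base]
    return replica


def replicate(top_5_to_3: str, bottom_3_to_5: str):
    """Return two daughter duplexes as (top 5'->3', bottom 3'->5') pairs."""
    # Left daughter: parental top strand kept as template, new bottom strand.
    # Reversing the top strand gives the order in which it is read 3' to 5'.
    new_bottom_5_to_3 = synthesize_replica(top_5_to_3[::-1])
    left = (top_5_to_3, new_bottom_5_to_3[::-1])  # display bottom strand 3' to 5'
    # Right daughter: parental bottom strand kept as template, new top strand.
    right = (synthesize_replica(bottom_3_to_5), bottom_3_to_5)
    return left, right


left, right = replicate("ACGCTTGC", "TGCGAACG")
print(left)
print(right)
```

Each daughter duplex keeps one parental strand and gains one new strand, and both end up with the same sequence as the parent duplex, matching the example shown in the text.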

Figure 1.7 Replication of DNA. (A) Replication of a DNA duplex as originally envisioned by Watson and Crick. As the parental strands separate, each parental strand serves as a template for the formation of a new daughter strand by means of A–T and G–C base pairing. (B) Greater detail showing how each of the parental strands serves as a template for the production of a complementary daughter strand, which grows in length by the successive addition of single nucleotides to the 3' end.

1.4 Genes and Proteins

Now that we have some basic understanding of the structural makeup of the genetic blueprint, how does this developmental plan become a complex living organism? If the code is thought of as a string of letters on a sheet of paper, then the genes are made up of distinct words that form sentences and paragraphs that give meaning to the pattern of letters. What is created from the complex and diverse DNA codes is protein, a class of macromolecules that carries out most of the activities in the cell. Cells are largely made up of proteins: structural proteins that give the cell rigidity and mobility, proteins that form pores in the cell membrane to control the traffic of small molecules into and out of the cell, and receptor proteins that regulate cellular activities in response to molecular signals from the growth medium or from other cells.

Proteins are also responsible for most of the metabolic activities of cells. They are essential for the synthesis and breakdown of organic molecules and for generating the chemical energy needed for cellular activities. In 1878 the term enzyme was introduced to refer to the biological catalysts that accelerate biochemical reactions in cells. By 1900, thanks largely to the work of the German biochemist Emil Fischer, enzymes were shown to be proteins. As often happens in science, nature’s “mistakes” provide clues as to how things work. Such was the case in establishing a relationship between genes and disease, because a “mistake” in a gene (a mutation) can result in a “mistake” (lack of function) in the corresponding protein. This provided a fruitful avenue of research for the study of genetics.

Inborn Errors of Metabolism as a Cause of Hereditary Disease

It was at the turn of the twentieth century that the British physician Archibald Garrod realized that certain heritable diseases followed the rules of transmission that Mendel had described for his garden peas. In 1908 Garrod gave a series of lectures in which he proposed a fundamental hypothesis about the relationship between heredity, enzymes, and disease:

Any hereditary disease in which cellular metabolism is abnormal results from an inherited defect in an enzyme.

Such diseases became known as inborn errors of metabolism, a term still in use today. Garrod studied a number of inborn errors of metabolism in which the patients excreted abnormal substances in the urine. One of these was alkaptonuria. In this case, the abnormal substance excreted is homogentisic acid.

[Structural formula of homogentisic acid.]

An early name for homogentisic acid was alkapton, hence the name alkaptonuria for the disease. Even though alkaptonuria is rare, with an incidence of about one in 200,000 people, it was well known even before Garrod studied it. The disease itself is relatively mild, but it has one striking symptom: The urine of the patient turns black because of the oxidation of homogentisic acid (Figure 1.8). This is why alkaptonuria is also called black urine disease. An early case was described in the year 1649:

The patient was a boy who passed black urine and who, at the age of fourteen years, was submitted to a drastic course of treatment that had for its aim the subduing of the fiery heat of his viscera, which was supposed to bring about the condition in question by charring and blackening his bile. Among the measures prescribed were bleedings, purgation, baths, a cold and watery diet, and drugs galore. None of these had any obvious effect, and eventually the patient, who tired of the futile and superfluous therapy, resolved to let things take their natural course. None of the predicted evils ensued. He married, begat a large family, and lived a long and healthy life, always passing urine black as ink. (Recounted by Garrod, 1908.)

Figure 1.8 Urine from a person with alkaptonuria turns black because of the oxidation of the homogentisic acid that it contains. [Courtesy of Daniel De Aguiar.]

Garrod was primarily interested in the biochemistry of alkaptonuria, but he took note of family studies that indicated that the disease was inherited as though it were due to a defect in a single gene. As to the biochemistry, he deduced that the problem in alkaptonuria was the patients’ inability to break down the phenyl ring of six carbons that is present in homogentisic acid. Where does this ring come from? Most animals obtain it from foods in their diet. Garrod proposed that homogentisic acid originates as a breakdown product of two amino acids, phenylalanine and tyrosine, which also contain a phenyl ring. An amino acid is one of the “building blocks” from which proteins are made. Phenylalanine and tyrosine are constituents of normal proteins. The scheme that illustrates the relationship between the molecules is shown in Figure 1.9. Any such sequence of biochemical reactions is called a biochemical pathway or a metabolic pathway. Each arrow in the pathway represents a single step depicting the transition from the “input” or substrate molecule, shown at the tail of the arrow, to the “output” or product molecule, shown at the tip. Biochemical pathways are usually oriented either vertically with the arrows pointing down, as in Figure 1.9, or horizontally, with the arrows pointing from left to right. Garrod did not know all of the details of the pathway in Figure 1.9, but he did understand that the key step in the breakdown of homogentisic acid is the breaking open of the phenyl ring and that the phenyl ring in homogentisic acid comes from dietary phenylalanine and tyrosine.

What allows each step in a biochemical pathway to occur? Garrod correctly surmised that each step requires a specific enzyme to catalyze the reaction for the chemical transformation. Persons with an inborn error of metabolism, such as alkaptonuria, have a defect in a single step of a metabolic pathway because they lack a functional enzyme for that step. When an enzyme in a pathway is defective, the pathway is said to have a block at that step. One frequent result of a blocked pathway is that the substrate of the defective enzyme accumulates. Observing the accumulation of homogentisic acid in patients with alkaptonuria, Garrod proposed that there must be an enzyme whose function is to open the phenyl ring of homogentisic acid and that this enzyme is missing in these patients.
Isolation of the enzyme that opens the phenyl ring of homogentisic acid was not actually achieved until 50 years after Garrod’s lectures. In normal people it is found in cells of the liver, and just as Garrod had predicted, the enzyme is defective in patients with alkaptonuria.

[Figure 1.9 pathway, top to bottom: Phenylalanine (a normal amino acid) → step 1 → Tyrosine (a normal amino acid) → step 2 → 4-Hydroxyphenylpyruvic acid → step 3 → Homogentisic acid (formerly known as alkapton) → step 4 → 4-Maleylacetoacetic acid → further breakdown. Each arrow represents one step in the biochemical pathway. In step 4 the benzene ring of homogentisic acid is opened; this is the step that is blocked in alkaptonuria, so homogentisic acid accumulates.]

Figure 1.9 Metabolic pathway for the breakdown of phenylalanine and tyrosine. Each step in the pathway, represented by an arrow, requires a specific enzyme to catalyze the reaction. The key step in the breakdown of homogentisic acid is the breaking open of the phenyl ring.


The Black Urine Disease

Archibald E. Garrod 1908
St. Bartholomew’s Hospital, London, England
Inborn Errors of Metabolism

Although he was a distinguished physician, Garrod’s lectures on the relationship between heredity and congenital defects in metabolism had no impact when they were delivered. The important concept that one gene corresponds to one enzyme (the “one gene–one enzyme hypothesis”) was developed independently in the 1940s by George W. Beadle and Edward L. Tatum, who used the bread mold Neurospora crassa as their experimental organism. When Beadle finally became aware of Inborn Errors of Metabolism, he was generous in praising it. This excerpt shows Garrod at his best, interweaving history, clinical medicine, heredity, and biochemistry in his account of alkaptonuria. The excerpt also illustrates how the severity of a genetic disease depends on its social context. Garrod writes as though alkaptonuria were a harmless curiosity. This is indeed largely true when the life expectancy is short. With today’s longer life span, alkaptonuria patients accumulate the dark pigment in their cartilage and joints and eventually develop severe arthritis.

To students of heredity the inborn errors of metabolism offer a promising field of investigation. . . . It was pointed out [by others] that the mode of incidence of alkaptonuria finds a ready explanation if the anomaly be regarded as a rare recessive character in the Mendelian sense. . . . Of the cases of alkaptonuria a very large proportion have been in the children of first cousin marriages. . . . It is also noteworthy that, if one takes families with five or more children [with both parents normal and at least one child affected with alkaptonuria], the totals work out in strict conformity to Mendel’s law, i.e. 57 [normal children] : 19 [affected children] in the proportions 3 : 1. . . . Of inborn errors of metabolism, alkaptonuria is that of which we know most. In itself it is a trifling matter, inconvenient rather than harmful. . . . Indications of the anomaly may be detected in early medical writings, such as that in 1584 of a schoolboy who, although he enjoyed good health, continuously excreted black urine; and that in 1609 of a monk who exhibited a similar peculiarity and stated that he had done so all his life. . . . There are no sufficient grounds [for doubting that the blackening substance in the urine originally called alkapton] is homogentisic acid, the excretion of which is the essential feature of the alkaptonuric. . . . Homogentisic acid is a product of normal metabolism. . . . The most likely sources of the benzene ring in homogentisic acid are phenylalanine and tyrosine, [because when these amino acids are administered to an alkaptonuric] they cause a very conspicuous increase in the output of homogentisic acid. . . . Where the alkaptonuric differs from the normal individual is in having no power of destroying homogentisic acid when formed, in other words of breaking up the benzene ring of that compound. . . . We may further conceive that the splitting of the benzene ring in normal metabolism is the work of a special enzyme and that in congenital alkaptonuria this enzyme is wanting.

Source: Originally published in London, England, by the Oxford University Press. Excerpts from the reprinted edition in Harry Harris. 1963. Garrod’s Inborn Errors of Metabolism. London, England: Oxford University Press.

The pathway for the breakdown of phenylalanine and tyrosine, as it is understood today, is shown in Figure 1.10. In this figure the emphasis is on the enzymes rather than on the structures of the metabolites, or small molecules, on which the enzymes act. Each step in the pathway requires the presence of a particular enzyme that catalyzes that step. Although Garrod knew only about alkaptonuria, in which the defective enzyme is homogentisic acid 1,2-dioxygenase, we now know the clinical consequences of defects in the other enzymes. Unlike alkaptonuria, which is a relatively benign inherited disease, the others are very serious. The condition known as phenylketonuria (PKU) results from the absence of (or a defect in) the enzyme phenylalanine hydroxylase (PAH). When this step in the pathway is blocked, phenylalanine accumulates. The excess phenylalanine is broken down into harmful metabolites that cause defects in myelin formation that damage a child’s developing nervous system and lead to severe mental retardation. However, if PKU is diagnosed in children soon enough after birth, they can be placed on a specially formulated diet low in phenylalanine. The child is allowed only as much phenylalanine as can be used in the synthesis of proteins, so excess phenylalanine does not accumulate. The special diet is very strict. It excludes meat, poultry, fish, eggs, milk and milk products, legumes, nuts, and bakery goods manufactured with regular flour. These foods are replaced by an expensive synthetic formula. With the special diet, however, the detrimental effects of excess phenylalanine on mental development can largely be avoided, although in adult women with PKU who are pregnant, the fetus is at risk. In many countries, including the United States, all newborn babies have their blood tested for chemical signs of PKU. Routine screening is cost-effective because PKU is relatively common. In the United States, the incidence is about 1 in 8000 among Caucasian births. The disease is less common in other ethnic groups. In the metabolic pathway in Figure 1.10, defects in the breakdown of tyrosine or of 4-hydroxyphenylpyruvic acid lead to types of tyrosinemia. These are also severe diseases. Type II is associated with skin lesions and mental retardation, Type III with severe liver dysfunction.

Mutant Genes and Defective Proteins

It follows from Garrod’s work that a defective enzyme results from a mutant gene, but how? Garrod did not speculate. For all he knew, genes were enzymes. This would have been a logical hypothesis at the time. We now know that the relationship between genes and enzymes is somewhat indirect. With a few exceptions, each enzyme is encoded in a particular sequence of nucleotides present in a region of DNA. The DNA region that codes for the enzyme, as well as adjacent regions that regulate when and in which cells the enzyme is produced, make up the “gene” that encodes the enzyme. The genes for the enzymes in the biochemical pathway in Figure 1.10 have all been identified and the nucleotide sequence of the DNA determined.

[Figure 1.10 pathway, top to bottom. Each step in a metabolic pathway requires a different enzyme, and each enzyme is encoded in a different gene.

Phenylalanine
  Step 1, phenylalanine hydroxylase. A defect in this enzyme leads to accumulation of phenylalanine and to phenylketonuria.
Tyrosine
  Step 2, tyrosine aminotransferase. A defect in this enzyme leads to accumulation of tyrosine and to tyrosinemia type II.
4-Hydroxyphenylpyruvic acid
  Step 3, 4-hydroxyphenylpyruvic acid dioxygenase. A defect in this enzyme leads to accumulation of 4-hydroxyphenylpyruvic acid and to tyrosinemia type III.
Homogentisic acid
  Step 4, homogentisic acid 1,2-dioxygenase. A defect in this enzyme leads to accumulation of homogentisic acid and to alkaptonuria.
4-Maleylacetoacetic acid
  Further breakdown]

Figure 1.10 Inborn errors of metabolism that affect the breakdown of phenylalanine and tyrosine. An inherited disease results when any of the enzymes is missing or defective. Alkaptonuria results from a mutant homogentisic acid 1,2-dioxygenase; phenylketonuria results from a mutant phenylalanine hydroxylase.

In the following list, and throughout this book, we use the standard typographical convention that genes are written in italic type, whereas gene products are not printed in italics. This convention is convenient, because it means that the protein product of a gene can be represented with the same symbol as the gene itself, but whereas the gene symbol is in italics, the protein symbol is not.

• The gene PAH on the long arm of chromosome 12 encodes phenylalanine hydroxylase (PAH).
• The gene TAT on the long arm of chromosome 16 encodes tyrosine aminotransferase (TAT).
• The gene HPD on the long arm of chromosome 12 encodes 4-hydroxyphenylpyruvic acid dioxygenase (HPD).
• The gene HGD on the long arm of chromosome 3 encodes homogentisic acid 1,2-dioxygenase (HGD).
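The idea of a pathway "block" can be made concrete with a small lookup. The sketch below is only an illustration: the list `PATHWAY` and the function `effect_of_block` are names invented for this example, and the model assumes the simplest case, in which a defective enzyme causes only its own substrate to accumulate.

```python
# Steps of the pathway in Figure 1.10, in order:
# (substrate, enzyme, gene, disease when the enzyme is defective)
PATHWAY = [
    ("phenylalanine", "phenylalanine hydroxylase", "PAH",
     "phenylketonuria"),
    ("tyrosine", "tyrosine aminotransferase", "TAT",
     "tyrosinemia type II"),
    ("4-hydroxyphenylpyruvic acid",
     "4-hydroxyphenylpyruvic acid dioxygenase", "HPD",
     "tyrosinemia type III"),
    ("homogentisic acid", "homogentisic acid 1,2-dioxygenase", "HGD",
     "alkaptonuria"),
]

def effect_of_block(mutant_gene):
    """Return (accumulated substrate, disease) for a defective gene."""
    for substrate, enzyme, gene, disease in PATHWAY:
        if gene == mutant_gene:
            return substrate, disease
    raise KeyError(f"{mutant_gene} is not in this pathway")

print(effect_of_block("HGD"))
print(effect_of_block("PAH"))
```

A mutant HGD gene gives accumulation of homogentisic acid and alkaptonuria; a mutant PAH gene gives accumulation of phenylalanine and phenylketonuria.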

Next we turn to the issue of how genes code for enzymes and other proteins.

1.5 Gene Expression: The Central Dogma

Watson and Crick were correct in proposing that the genetic information in DNA is contained in the sequence of bases in a manner analogous to letters printed on a strip of paper. In a region of DNA that directs the synthesis of a protein, the genetic code for the protein is contained in only one strand, and it is decoded in a linear order. A typical protein is made up of one or more polypeptide chains; each polypeptide chain consists of a linear sequence of amino acids connected end to end. For example, the enzyme PAH consists of four identical polypeptide chains, each 452 amino acids in length. In the decoding of DNA, each successive “code word” in the DNA specifies the next amino acid to be added to the polypeptide chain as it is being made. The amount of DNA required to code for the polypeptide chain of PAH is therefore 452 × 3 = 1356 nucleotide pairs. The entire gene is very much longer, about 90,000 nucleotide pairs. Only 1.5 percent of the gene is devoted to coding for the amino acids. The noncoding part includes some sequences that control the activity of the gene, but it is not known how much of the gene is involved in regulation. There are 20 different amino acids. Only four bases code for these 20 amino acids, with each “word” in the genetic code consisting of three adjacent bases. For example, the base sequence ATG specifies the amino acid methionine (Met), TCC specifies serine (Ser), ACT specifies threonine (Thr), and GCG specifies alanine (Ala). There are 64 possible three-base combinations but only 20 amino acids because some combinations code for the same amino acid. For example, TCT, TCC, TCA, TCG, AGT, and AGC all code for serine (Ser), and CTT, CTC, CTA, CTG, TTA, and TTG all code for leucine (Leu). An example of the relationship between the base sequence in a DNA duplex and the amino acid sequence of the corresponding protein is shown in Figure 1.11. This particular DNA duplex is the human sequence that codes for the first seven amino acids in the polypeptide chain of PAH.

[Figure 1.11 diagram: the nucleotide sequence in the DNA molecule,

ATGTCCACTGCGGTCCTGGAA
TACAGGTGACGCCAGGACCTT

is decoded by a two-step process that synthesizes a polypeptide: TRANSCRIPTION (an RNA intermediate plays the role of “messenger”) followed by TRANSLATION. The result is the amino acid sequence in the polypeptide chain, shown with the DNA triplets encoding each amino acid:

ATG TCC ACT GCG GTC CTG GAA
Met Ser Thr Ala Val Leu Glu]

Figure 1.11 DNA sequence coding for the first seven amino acids in a polypeptide chain. The DNA sequence specifies the amino acid sequence through a molecule of RNA that serves as an intermediary “messenger.” Although the decoding process is indirect, the net result is that each amino acid in the polypeptide chain is specified by a group of three adjacent bases in the DNA. In this example, the polypeptide chain is that of phenylalanine hydroxylase (PAH).

The scheme outlined in Figure 1.11 indicates that DNA codes for protein not directly but indirectly through the processes of transcription and translation. The indirect route of information transfer,

DNA → RNA → Protein

is known as the central dogma of molecular genetics. The term dogma means “set of beliefs”; it dates from the time the idea was put forward first as a theory. Since then the “dogma” has been confirmed experimentally, but the term persists. The central dogma is shown in Figure 1.12. The main concept in the central dogma is that DNA does not code for protein directly but rather acts through an intermediary molecule of ribonucleic acid (RNA). The structure of RNA is similar to, but not identical with, that of DNA. There is a difference in the sugar (RNA contains the sugar ribose instead of deoxyribose), RNA is usually single-stranded (not a duplex), and RNA contains the base uracil (U) instead of thymine (T), which is present in DNA. Actually, three types of RNA take part in the synthesis of proteins:

• A molecule of messenger RNA (mRNA), which carries the genetic information from DNA and is used as a template for polypeptide synthesis. In most mRNA molecules, there is a high proportion of nucleotides that actually code for amino acids. For example, the mRNA for PAH is 2400 nucleotides in length and codes for a polypeptide of 452 amino acids; in this case, more than 50 percent of the length of the mRNA codes for amino acids.
• Several types of ribosomal RNA (rRNA), which are major constituents of the cellular particles called ribosomes on which polypeptide synthesis takes place.
• A set of transfer RNA (tRNA) molecules, each of which carries a particular amino acid as well as a three-base recognition region that base-pairs with a group of three adjacent bases in the mRNA. As each tRNA participates in translation, its amino acid becomes the terminal subunit added to the length of the growing polypeptide chain. The tRNA that carries methionine is denoted tRNAMet, that which carries serine is denoted tRNASer, and so forth.
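The proportions quoted above for PAH can be checked with a few lines of arithmetic. The sketch below uses only the rounded figures given in the text (452 amino acids per chain, about 90,000 nucleotide pairs in the gene, 2400 nucleotides in the mRNA); the variable names are chosen for this example.

```python
AA_PER_CHAIN = 452       # amino acids in one PAH polypeptide chain
BASES_PER_CODON = 3      # each "code word" is three adjacent bases
GENE_LENGTH = 90_000     # nucleotide pairs in the PAH gene (approximate)
MRNA_LENGTH = 2_400      # nucleotides in the PAH mRNA (approximate)

coding = AA_PER_CHAIN * BASES_PER_CODON   # nucleotides that code for amino acids
gene_fraction = coding / GENE_LENGTH      # share of the gene that is coding
mrna_fraction = coding / MRNA_LENGTH      # share of the mRNA that is coding

print(coding)                   # 1356
print(f"{gene_fraction:.1%}")   # 1.5%
print(f"{mrna_fraction:.1%}")   # 56.5%
```

The contrast between the two percentages is the point: most of the gene does not code for amino acids, whereas more than half of the mRNA does.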

The central dogma is the fundamental principle of molecular genetics because it summarizes how the genetic information in DNA becomes expressed in the amino acid sequence in a polypeptide chain: The sequence of nucleotides in a gene specifies the sequence of nucleotides in a molecule of messenger RNA; in turn, the sequence of nucleotides in the messenger RNA specifies the sequence of amino acids in the polypeptide chain.

Given a process as conceptually simple as DNA coding for protein, what might account for the additional complexity of RNA intermediaries? One possible reason is that an RNA intermediate gives another level for control, for example, by degrading the mRNA for an unneeded protein. Another possible reason may be historical. RNA structure is unique in having both an informational content present in its sequence of bases and a complex, folded three-dimensional structure that endows some RNA molecules with catalytic activities. Many scientists believe that in the earliest forms of life, RNA served both for genetic information and catalysis. As evolution proceeded, the informational role was transferred to DNA and the catalytic role to protein. However, RNA became locked into its central location as a go-between in the processes of information transfer and protein synthesis. This hypothesis implies that the participation of RNA in protein synthesis is a relic of the earliest stages of evolution, a “molecular fossil.” The hypothesis is supported by a variety of observations. For example, (1) DNA replication requires an RNA molecule in order to get started (Chapter 6), (2) an RNA molecule is essential in the synthesis of the tips of the chromosomes (Chapter 8), and (3) some RNA molecules act to catalyze key reactions in protein synthesis (Chapter 11).

[Figure 1.12 diagram: DNA is copied by TRANSCRIPTION into rRNA (ribosomal), mRNA (messenger), and tRNA (transfer); these act together at the ribosome, where TRANSLATION produces protein.]

Figure 1.12 The “central dogma” of molecular genetics: DNA codes for RNA, and RNA codes for protein. The DNA → RNA step is transcription, and the RNA → protein step is translation.


Transcription

The manner in which genetic information is transferred from DNA to RNA is shown in Figure 1.13. The DNA opens up, and one of the strands is used as a template for the synthesis of a complementary strand of RNA. (How the template strand is chosen is discussed in Chapter 11.) The process of making an RNA strand from a DNA template is transcription, and the RNA molecule that is made is the transcript. The base sequence in the RNA is complementary (in the Watson–Crick pairing sense) to that in the DNA template, except that U (which pairs with A) is present in the RNA in place of T. The rules of base pairing between DNA and RNA are summarized in Figure 1.14. Each RNA strand has a polarity (a 5' end and a 3' end) and, as in the synthesis of DNA, nucleotides are added only to the 3' end of a growing RNA strand. Hence the 5' end of the RNA transcript is synthesized first, and transcription proceeds along the template DNA strand in the 3'-to-5' direction.

Figure 1.13 Transcription is the production of an RNA strand that is complementary in base sequence to a DNA strand. In this example, the DNA strand at the bottom is being transcribed into a strand of RNA. Note that in an RNA molecule, the base U (uracil) plays the role of T (thymine) in that it pairs with A (adenine). Each A–U pair is marked.

Each gene includes nucleotide sequences that initiate and terminate transcription. The RNA transcript made from any gene begins at the initiation site in the template strand, which is located “upstream” from the amino acid–coding region, and ends at the termination site, which is located “downstream” from the amino acid–coding region. For any gene, the length of the RNA transcript is very much smaller than the length of the DNA in the chromosome. For example, the transcript of the PAH gene for phenylalanine hydroxylase is about 90,000 nucleotides in length, but the DNA in chromosome 12 is about 130,000,000 nucleotide pairs. In this case, the length of the PAH transcript is less than 0.1 percent of the length of the DNA in the chromosome. A different gene in chromosome 12 would be transcribed from a different region of the DNA molecule in chromosome 12, and perhaps from the opposite strand, but the transcribed region would again be small in comparison with the total length of the DNA in the chromosome.
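The pairing rules for transcription can be sketched as a simple lookup. In the example below, the function name `transcribe` is invented for illustration, and the template is the bottom strand of the PAH duplex in Figure 1.11, assumed here to be the strand that is transcribed. The template is written 3' to 5', so the RNA comes out 5' to 3' when the bases are read in the same left-to-right order. Initiation and termination sites are ignored.

```python
# Base in the DNA template -> base placed in the RNA transcript.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5):
    """Return the RNA (written 5' to 3') made from a DNA template
    strand that is written 3' to 5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)

template = "TACAGGTGACGCCAGGACCTT"   # bottom strand, written 3' to 5'
mrna = transcribe(template)
print(mrna)   # AUGUCCACUGCGGUCCUGGAA
```

The transcript has the same sequence as the top (nontemplate) DNA strand, with U in place of T.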

Translation

The synthesis of a polypeptide under the direction of an mRNA molecule is known as translation. Although the sequence of bases in the mRNA codes for the sequence of amino acids in a polypeptide, the molecules that actually do the “translating” are the tRNA molecules. The mRNA molecule is translated in nonoverlapping groups of three bases called codons. For each codon in the mRNA that specifies an amino acid, there is one tRNA molecule containing a complementary group of three adjacent bases that can pair with those in that codon. The correct amino acid is attached to the other end of the tRNA, and when the tRNA comes into line, the amino acid to which it is attached becomes the most recent addition to the growing end of the polypeptide chain. The role of tRNA in translation is illustrated in Figure 1.15 and can be described as follows:

The mRNA is read codon by codon. Each codon that specifies an amino acid matches with a complementary group of three adjacent bases in a single tRNA molecule. One end of the tRNA is attached to the correct amino acid, so the correct amino acid is brought into line.

Base in DNA template      Base in RNA transcript
Adenine (A)               Uracil (U)
Thymine (T)               Adenine (A)
Guanine (G)               Cytosine (C)
Cytosine (C)              Guanine (G)

Figure 1.14 Pairing between bases in DNA and in RNA. The DNA bases A, T, G, and C pair with the RNA bases U, A, C, and G, respectively.

The tRNA molecules used in translation do not line up along the mRNA simultaneously as shown in Figure 1.15. The process of translation takes place on a ribosome, which combines with a single mRNA and moves along it from one end to the other in steps, three nucleotides at a time (codon by codon). As each new codon comes into place, the next tRNA binds with the ribosome. Then the growing end of the polypeptide chain becomes attached to the amino acid on the tRNA. In this way, each tRNA in turn serves temporarily to hold the polypeptide chain as it is being synthesized. As the polypeptide chain is transferred from each tRNA to the next in line, the tRNA that previously held the polypeptide is released from the ribosome. The polypeptide chain elongates one amino acid at each step until any one of three particular codons specifying “stop” is encountered. At this point, synthesis of the chain of amino acids is finished, and the polypeptide chain is released from the ribosome. (This brief description of translation glosses over many of the details that are presented in Chapter 11.)
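The stepwise reading of an mRNA can be imitated in a short program. The sketch below is a deliberately simplified model: the name `translate` is invented for this example, the reading frame is assumed to begin at the first base, tRNAs and the ribosome are reduced to a table lookup, and the stop codon UAA appended to the example mRNA is hypothetical (the real PAH message continues for 452 codons). The codon table is built from the standard genetic code, with amino acids given as single letters and "*" marking a stop codon.

```python
from itertools import product

BASES = "UCAG"
# Single-letter amino acids for the codons in the order UUU, UUC, UUA,
# UUG, UCU, ... (third base changing fastest); "*" marks a stop codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(mrna):
    """Read the mRNA three bases at a time, adding one amino acid per
    codon, and stop when a stop codon is reached."""
    chain = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":      # stop: the chain is released
            break
        chain.append(amino_acid)
    return "".join(chain)

# First seven PAH codons followed by a hypothetical stop codon.
print(translate("AUGUCCACUGCGGUCCUGGAAUAA"))   # MSTAVLE
```

The loop advances codon by codon, just as the ribosome moves along the mRNA three nucleotides at a time, and the chain grows by one amino acid per step.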

The Genetic Code Figure 1.15 indicates that the mRNA codon AUG specifies methionine (Met) in the polypeptide chain, UCC specifies Ser (serine), ACU specifies Thr (threonine), and so on. The complete decoding table is

The coding sequence of bases in mRNA specifies the amino acid sequence of a polypeptide chain.

Messenger RNA (mRNA)

Bases in the mRNA

[

AUGUCCACUGCGGUCCUGGAA UAC AGG UGA CGC CAG GAC Transfer CUU RNA Met (tRNA) Ser Thr Ala Val Leu Glu

Each group of three adjacent bases is a codon. The mRNA is translated codon by codon by means of tRNA molecules. Each tRNA has a different base sequence but about the same overall shape. Each tRNA carries an amino acid to be added to the polypeptide chain.

Figure 1.15 The role of messenger RNA in translation is to carry the information contained in a sequence of DNA bases to a ribosome, where it is translated into a polypeptide chain. Translation is mediated by transfer RNA (tRNA) molecules, each of which can base-pair with a group of three adjacent bases in the mRNA. Each tRNA also carries an amino acid. As each tRNA, in turn, is brought to the ribosome, the growing polypeptide chain is elongated. 1.5 Gene Expression: The Central Dogma

19

Table 1.1 The standard genetic code Second nucleotide in codon U

C

A

G

A

UUU

Phe

F

Phenylalanine UCU

Ser S

Serine

UAU

Tyr

Y

Tyrosine

UGU

Cys

C

Cysteine

U

UUC

Phe

F

Phenylalanine UCC

Ser S

Serine

UAC

Tyr

Y

Tyrosine

UGC

Cys

C

Cysteine

C

UUA

Leu

L

Leucine

UCA

Ser S

Serine

UAA

Termination

UGA

UUG

Leu

L

Leucine

UCG

Ser S

Serine

UAG

Termination

UGG

CUU

Leu

L

Leucine

CCU

Pro P

Proline

CAU

His H Histidine

CUC

Leu

L

Leucine

CCC

Pro P

Proline

CAC

CUA

Leu

L

Leucine

CCA

Pro P

Proline

CAA

CUG

Leu

L

Leucine

CCG

Pro P

Proline

AUU

Ile

I

Isoleucine

ACU

Thr T

Termination

A

Trp

W

Tryptophan

G

CGU

Arg

R

Arginine

U

His H Histidine

CGC

Arg

R

Arginine

C

Gln Q Glutamine

CGA

Arg

R

Arginine

A

CAG

Gln Q Glutamine

CGG

Arg

R

Arginine

G

Threonine

AAU

Asn N Asparagine

AGU

Ser

S

Serine

U

AUC

Ile

I

Isoleucine

ACC

Thr T

Threonine

AAC

Asn N Asparagine

AGC

Ser

S

Serine

C

AUA

Ile

I

Isoleucine

ACA

Thr T

Threonine

AAA

Lys K

Lysine

AGA

Arg

R

Arginine

A

AUG

Met

M Methionine

ACG

Thr T

Threonine

AAG

Lys K

Lysine

AGG

Arg

R

Arginine

G

GUU

Val

V

Valine

GCU

Ala

A

Alanine

GAU

Asp D

Aspartic acid

GGU

Gly

G

Glycine

U

GUC

Val

V

Valine

GCC

Ala

A

Alanine

GAC

Asp D

Aspartic acid

GGC

Gly

G

Glycine

C

GUA

Val

V

Valine

GCA

Ala

A

Alanine

GAA

Glu E

Glutamic acid

GGA

Gly

G

Glycine

A

GUG

Val

V

Valine

GCG

Ala

A

Alanine

GAG

Glu E

Glutamic acid

GGG

Gly

G

Glycine

G

Codon

Three-letter and single-letter abbreviations

called the genetic code, and it is shown in Table 1.1. For any codon, the column on the left corresponds to the first nucleotide in the codon (reading from the 5' end), the row across the top corresponds to the second nucleotide, and the column on the right corresponds to the third nucleotide. The complete codon is given in the body of the table, along with the amino acid (or translational “stop”) that the codon specifies. Each amino acid is designated by its full name and by a three-letter abbreviation as well as a single-letter abbreviation. Both types of abbreviations are used in molecular genetics. The code in Table 1.1 is the “standard” genetic code used in translation in the cells of nearly all organisms. In Chapter 11 we examine general features of the standard genetic code and the minor differences found in the genetic codes of certain organisms and cellular organelles. At this point, we are interested mainly in understanding how the genetic code is used to translate the codons in mRNA into the amino acids in a polypeptide chain.


In addition to the 60 codons that code only for amino acids, there are four codons that have specialized functions:

• The codon AUG, which specifies Met (methionine), is also the “start” codon for polypeptide synthesis. The positioning of a tRNA^Met bound to AUG is one of the first steps in the initiation of polypeptide synthesis, so all polypeptide chains begin with Met. (Many polypeptides have the initial Met cleaved off after translation is complete.) In most organisms, the tRNA^Met used for initiation of translation is the same tRNA^Met used to specify methionine at internal positions in a polypeptide chain.

• The codons UAA, UAG, and UGA are each a “stop” that specifies the termination of translation and results in release of the completed polypeptide chain from the ribosome. These codons do not have tRNA molecules that recognize them but are instead recognized by protein factors that terminate translation.

How the genetic code table is used to infer the amino acid sequence of a polypeptide chain can be illustrated by using PAH again, in particular the DNA sequence coding for amino acids 1 through 7. The DNA sequence is

5'-ATGTCCACTGCGGTCCTGGAA-3'
3'-TACAGGTGACGCCAGGACCTT-5'

This region is transcribed into RNA in a left-to-right direction, and because RNA grows by the addition of successive nucleotides to the 3' end (Figure 1.13), it is the bottom strand that is transcribed. The nucleotide sequence of the RNA is that of the top strand of the DNA, except that U replaces T, so the mRNA for amino acids 1 through 7 is

5'-AUGUCCACUGCGGUCCUGGAA-3'

The codons are read from left to right according to the genetic code shown in Table 1.1. Codon AUG codes for Met (methionine), UCC codes for Ser (serine), and so on. Altogether, the amino acid sequence of this region of the polypeptide is

5'-AUG UCC ACU GCG GUC CUG GAA-3'
   Met Ser Thr Ala Val Leu Glu

or, in terms of the single-letter abbreviations,

5'-AUG UCC ACU GCG GUC CUG GAA-3'
    M   S   T   A   V   L   E

The full decoding operation for this region of the PAH gene is shown in Figure 1.16. In this figure, the initiation codon AUG is highlighted because some patients with PKU have a mutation in this particular codon. As might be expected from the fact that AUG, the codon for methionine, is the initiation codon for polypeptide synthesis, cells in patients with this particular mutation fail to produce any of the PAH polypeptide. Mutation and its consequences are considered next.
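The decoding steps just described can be sketched in a few lines of Python. This is only an illustration, not part of the original text: the function names `transcribe` and `translate` are hypothetical, the codon table is generated from the standard genetic code written as a 64-letter string of single-letter abbreviations (with `*` standing for "stop"), and the template strand is assumed to be written 3' to 5' so that the mRNA comes out 5' to 3'.

```python
from itertools import product

# Standard genetic code: codons ordered with U, C, A, G at each position
# (first base varies slowest); "*" marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Base-pairing rule for transcription: DNA template base -> RNA base.
TEMPLATE_TO_RNA = str.maketrans("ATGC", "UACG")


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (5' to 3') made from a template strand written 3' to 5'."""
    return template_3_to_5.upper().translate(TEMPLATE_TO_RNA)


def translate(mrna: str) -> str:
    """Read codons left to right and return single-letter amino acids.

    Translation stops at the first stop codon; an incomplete final codon
    is ignored.
    """
    peptide = []
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Bottom (transcribed) strand of the PAH region shown in the text.
template = "TACAGGTGACGCCAGGACCTT"
mrna = transcribe(template)
print(mrna)             # AUGUCCACUGCGGUCCUGGAA
print(translate(mrna))  # MSTAVLE
```

A dictionary keyed by codon mirrors the way Table 1.1 is used by hand: each triplet is looked up independently, so the reading frame is fixed entirely by where reading starts.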

Figure 1.16 The central dogma in action. The DNA that encodes PAH serves as a template for the production of a messenger RNA, and the mRNA serves to specify the sequence of amino acids in the PAH polypeptide chain through interactions with the ribosome and tRNA molecules. [Diagram: codons 1 through 7 of the PAH gene, their transcription into mRNA, and their translation into Met, Ser, Thr, Ala, Val, Leu, and Glu.]

1.6 Mutation

The term mutation refers to any heritable change in a gene (or, more generally, in the genetic material) or to the process by which such a change takes place. One type of mutation results in a change in the sequence of bases in DNA. The change may be simple, such as the substitution of one pair of bases in a duplex molecule for a different pair of bases. For example, a C-G pair in a duplex molecule may mutate to T-A, A-T, or G-C. The change in base sequence may also be more complex, such as the deletion or addition of base pairs. These and other types of mutations are considered in Chapter 7. Geneticists also use the term mutant, which refers to the result of a mutation. A mutation yields a mutant gene, which in turn produces a mutant mRNA, a mutant protein, and finally a mutant organism that exhibits the effects of the mutation, for example, an inborn error of metabolism.

DNA from patients from all over the world who have phenylketonuria has been studied to determine what types of mutations are responsible for the inborn error. There is a large variety of mutant types. More than 400 different mutations have been described in the gene for PAH. In some cases part of the gene is missing, so the genetic information to make a complete PAH enzyme is absent. In other cases the genetic defect is more subtle, but the result is still either the failure to produce a PAH protein or the production of a PAH protein that is inactive.

In the mutation shown in Figure 1.17, substitution of a G-C base pair for the normal A-T base pair at the very first position in the coding sequence changes the normal codon AUG (Met) used for the initiation of translation into the codon GUG, which normally specifies valine (Val) and cannot be used as a “start” codon. The result is that translation of the PAH mRNA cannot occur, and so no PAH polypeptide is made. This mutant is designated M1V because the codon for M (methionine) at amino acid position 1 in the PAH polypeptide has been changed to a codon for V (valine). Although the M1V mutant is quite rare worldwide, it is common in some localities, such as Québec Province in Canada.

Figure 1.17 The M1V mutant in the PAH gene. The methionine codon needed for initiation mutates into a codon for valine. Translation cannot be initiated, and no PAH polypeptide is produced. [Diagram: mutation of the A-T base pair in codon 1 to G-C gives the mutant initiation codon GUG in the PAH mRNA; no PAH polypeptide is produced because tRNA^Val cannot be used to initiate polypeptide synthesis.]

One PAH mutant that is quite common is designated R408W, which means that codon 408 in the PAH coding sequence has been changed from one coding for arginine (R) to one coding for tryptophan (W). This mutation is one of the four most common among European Caucasians with PKU. The molecular basis of the mutant is shown in Figure 1.18. In this case, the first base pair in codon 408 is changed from a C-G base pair into a T-A base pair. The result is that the PAH mRNA has a mutant codon at position 408; specifically, it has UGG instead of CGG. Translation does occur in this mutant because everything else about the mRNA is normal, but the result is that the mutant PAH carries a tryptophan (Trp) instead of an arginine (Arg) at position 408 in the polypeptide chain. The consequence of the seemingly minor change of one amino acid is very drastic. Although the R408W polypeptide is complete, the enzyme has less than 3 percent of the activity of the normal enzyme.

Figure 1.18 The R408W mutant in the PAH gene. Codon 408 for arginine (R) is mutated into a codon for tryptophan (W). The result is that position 408 in the mutant PAH polypeptide is occupied by tryptophan rather than by arginine. The mutant protein has no PAH enzyme activity. [Diagram: mutation of the C-G base pair in codon 408 to T-A, transcription into an mRNA carrying the mutant codon UGG, and translation into a polypeptide carrying Trp at position 408.]

[Photographs] The women in the wedding photograph are sisters. Both are homozygous for the same mutant PAH gene. The bride is the younger of the two. She was diagnosed just three days after birth and put on the PKU diet soon after. Her older sister, the maid of honor, was diagnosed too late to begin the diet and is mentally retarded. The two-year-old pictured in the photo at the right is the daughter of the married couple. They planned the pregnancy: dietary control was strict from conception to delivery to avoid the hazards of excess phenylalanine harming the fetus. Their daughter has passed all developmental milestones with distinction. [Courtesy of Charles R. Scriver.]
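The naming of the two mutants discussed above can be reproduced with a short Python sketch. It is illustrative only: the helper names `substitute` and `replacement_name` are hypothetical, and the check for initiation simply asks whether the message still begins with AUG, which is a simplification of the actual initiation machinery.

```python
from itertools import product

# Standard genetic code as single-letter abbreviations ("*" = stop),
# with codons ordered U, C, A, G at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip(("".join(c) for c in product(BASES, repeat=3)), AMINO_ACIDS)
)


def substitute(mrna: str, position: int, new_base: str) -> str:
    """Return a copy of the mRNA with the base at a 1-based position replaced."""
    return mrna[:position - 1] + new_base + mrna[position:]


def replacement_name(normal_codon: str, mutant_codon: str, codon_number: int) -> str:
    """Name an amino acid replacement in the style of M1V or R408W."""
    return f"{CODON_TABLE[normal_codon]}{codon_number}{CODON_TABLE[mutant_codon]}"


# Codons 1 through 7 of the normal PAH mRNA, as given in the text.
normal = "AUGUCCACUGCGGUCCUGGAA"

# M1V: the first base of codon 1 changes from A to G.
m1v = substitute(normal, 1, "G")
print(m1v[:3], replacement_name(normal[:3], m1v[:3], 1))  # GUG M1V
print(m1v.startswith("AUG"))  # False: no start codon, so no PAH polypeptide

# R408W: the first base of codon 408 changes from C to U (CGG -> UGG).
print(replacement_name("CGG", "UGG", 408))  # R408W
```

The two mutants have the same kind of name but different consequences: M1V removes the start codon, so no polypeptide is made, whereas R408W leaves translation intact and changes a single amino acid.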

Protein Folding and Stability

More than 400 different mutations in the PAH gene have been identified in patients with PKU throughout the world. Many of the mutations affect the level of expression of the gene or processing of the RNA transcript, and some mutations are deletions in which part of the gene is missing. But more than 240 of the mutations are simple amino acid replacements resulting from single nucleotide substitutions in the DNA. Surprisingly, only a minority of amino acid replacements result in a normal amount of PAH protein with reduced enzyme activity. With most replacements the amount of PAH protein is reduced, sometimes drastically, and with some of them the enzyme activity of the PAH protein that remains is virtually normal. Yet in all these cases the level of expression of the gene, and the amount of mRNA, are within the normal range. The reason why so many amino acid replacements reduce the amount of protein is that they cause problems in protein folding, or in the coming together of the protein subunits, or in the stability of the folded protein.

Protein folding is the complex process by which polypeptide chains attain a stable three-dimensional structure through short-range chemical interactions between nearby amino acids and long-range interactions between amino acids in different parts of the molecule. Folding normally occurs as the polypeptide is being synthesized on the ribosome, and the process is facilitated by a class of proteins known as chaperones. During the folding process, the polypeptide chain twists and bends until it achieves a minimum energy state that maximizes the stability of the resulting structure, which is referred to as the native conformation. For example, one aspect of protein folding is that hydrophobic amino acids, which have low affinity for water molecules, tend to move toward each other and to form a relatively hydrophobic center, or core, in the native conformation. For a polypeptide of realistic length, there are so many short-range and long-range interactions, and so many possible folded conformations, that even the fastest computers cannot calculate and compare all their energy levels. Computer simulation of protein folding has yielded some insights, but the reliable prediction of protein folding is still a major challenge.

Figure 1.19 Some amino acid replacements perturb the ability of a protein to fold properly. (A) Normal folding in phenylalanine hydroxylase forms the active tetramer. (B) Abnormal folding of a mutant polypeptide chain results in the formation of polypeptide aggregates, which are progressively cleaved into the constituent amino acids through a ubiquitin-dependent proteasomal degradation pathway.

(A) Normal folding pathway: unfolded PAH polypeptide; folding intermediates; folded monomer, with an active site for substrate cleavage and oligomerization domains through which monomers interact; binding of oligomerization domains; tetramer (active form in cells).

(B) Aberrant folding pathway (subunits prone to aggregation): abnormal intermediate; aggregates; degradation via the ubiquitin-dependent proteasomal pathway and other routes.

Hypothetical pathways of protein folding in the case of PAH are illustrated in Figure 1.19. The normal pathway is shown in part A, including some of the (typically short-lived) intermediates in the folding process. The native conformation of a single PAH polypeptide constitutes the PAH monomer. Like many other polypeptides, the PAH monomer contains short regions, called oligomerization domains (oligo means “a small number”), through which PAH polypeptides undergo stable binding to one another. The active form of the PAH enzyme is a tetramer, consisting of four identical polypeptide chains held together by interactions between their oligomerization (in this case, tetramerization) domains. Note that the folding and tetramerization processes are reversible, so any amino acid replacement that decreases the stability of the tetramer or any of the intermediates will cause more of the PAH polypeptides to fold according to pathway B.

Pathway B in the figure is a misfolding pathway in which the folded monomers are prone to undergo irreversible aggregation with each other. These aggregates are targeted for enzymatic breakdown into their constituent amino acids, first by becoming covalently bound with a 76-amino-acid polypeptide called ubiquitin, which is attached through the activity of several proteins, including a ubiquitin-conjugating enzyme. The tagged protein is then degraded by the proteasome, which is a large multiprotein complex containing proteins with ubiquitin-binding, protease, and other activities. The ubiquitin/proteasome pathway is required for the degradation of specific proteins during the cell cycle and development, and it is also used to degrade certain proteins that are intrinsically unstable or that become unfolded in response to stress. In the case of PAH, many amino acid replacements decrease the stability of the protein to such an extent that the protein is routed through the misfolding pathway in Figure 1.19B and targeted for degradation. A destabilizing amino acid replacement can be located at virtually any position along the polypeptide chain.

[Illustration] Three-dimensional structure of two of the four subunits in the active tetramer of phenylalanine hydroxylase. The chains of amino acids are represented as a sequence of curls, loops, and flat arrows, which represent different types of local structure described in Chapter 11. The oligomerization (in this case, tetramerization) domain is shown in green, and the catalytic domain containing the active site is shown in gold. Compare this with the simplified diagram in Figure 1.19. [Courtesy of R. C. Stevens, T. Flatmark, and H. Erlandsen. From H. Erlandsen, F. Fusetti, A. Martinez, E. Hough, T. Flatmark, and R. C. Stevens. 1997. Nature Structural Biology 4: 995.]

1.7 Genes and Environment

Inborn errors of metabolism illustrate the general principle that genes code for proteins and that mutant genes code for mutant proteins. In cases such as PKU, mutant proteins cause such a drastic change in metabolism that a severe genetic defect results. But biology is not necessarily destiny. Organisms are also affected by the environment. PKU serves as an example of this principle, because patients who adhere to a diet restricted in the amount of phenylalanine develop mental capacities within the normal range. What is true in this example is true in general. Most traits are determined by the interaction of genes and environment.

It is also true that most traits are affected by multiple genes. No one knows how many genes are involved in the development and maturation of the brain and nervous system, but the number must be in the thousands. This number is in addition to the genes that are required in all cells to carry out metabolism and other basic life functions. It is easy to lose sight of the multiplicity of genes when considering extreme examples, such as PKU, in which a single mutation can have such a drastic effect on mental development. The situation is the same as that with any complex machine. An airplane can function if thousands of parts are working together in harmony, but only one defective part, if that part affects a vital system, can bring it down. Likewise, the development and functioning of every trait require a large number of genes working in harmony, but in some cases a single mutant gene can have catastrophic consequences.

In other words, the relationship between a gene and a trait is not necessarily a simple one. The biochemistry of organisms is a complex branching network in which different enzymes may share substrates, yield the same products, or be responsive to the same regulatory elements. The result is that most visible traits of organisms are the net result of many genes acting together and in combination with environmental factors. PKU affords examples of each of three principles governing these interactions:

1. One gene can affect more than one trait. Children with extreme forms of PKU often have blond hair and reduced body pigment. This is because the absence of PAH is a metabolic block that prevents conversion of phenylalanine into tyrosine, which is the precursor of the pigment melanin. The relationship between severe mental retardation and decreased pigmentation in PKU makes sense only if one knows the metabolic connections among phenylalanine, tyrosine, and melanin. If these connections were not known, the traits would seem completely unrelated. PKU is not unusual in this regard. Many mutant genes affect multiple traits through their secondary or indirect effects. The various, sometimes seemingly unrelated, effects of a mutant gene are called pleiotropic effects, and the phenomenon itself is known as pleiotropy. Figure 1.20 shows a cat with white fur and blue eyes, a pattern of pigmentation that is often (about 40 percent of the time) associated with deafness. Hence deafness can be regarded as a pleiotropic effect of white coat and blue eye color. The developmental basis of this pleiotropy is unknown.

2. Any trait can be affected by more than one gene. We discussed this principle earlier in connection with the large number of genes that are required for the normal development and functioning of the brain and nervous system. Among these are genes that affect the function of the blood–brain barrier, which consists of specialized glial cells wrapped around tight capillary walls in the brain, forming an impediment to the passage of most water-soluble molecules from the blood to the brain. The blood–brain barrier therefore affects the extent to which excess free phenylalanine in the blood can enter the brain itself. Because the effectiveness of the blood–brain barrier differs among individuals, PKU patients with very similar levels of blood phenylalanine can have dramatically different levels of cognitive development. This also explains in part why adherence to a controlled-phenylalanine diet is critically important in children but less so in adults; the blood–brain barrier is less well developed in children and is therefore less effective in blocking the excess phenylalanine. Multiple genes affect even simpler metabolic traits. Phenylalanine breakdown and excretion serve as a convenient example. The metabolic pathway is illustrated in Figure 1.10. Four enzymes in the pathway are indicated, but even more enzymes are involved at the stage labeled “further breakdown.” Because differences in the activity of any of these enzymes can affect the rate at which phenylalanine can be broken down and excreted, all of the enzymes in the pathway are important in determining the amount of excess phenylalanine in the blood of patients with PKU.

3. Most traits are affected by environmental factors as well as by genes. Here we come back to the low-phenylalanine diet. Children with PKU are not doomed to severe mental deficiency. Their capabilities can be brought into the normal range by dietary treatment.

Figure 1.20 Among cats with white fur and blue eyes, about 40 percent are born deaf. We do not know why hearing is defective or why the defect is so often associated with coat and eye color. This form of deafness can be regarded as a pleiotropic effect of white fur and blue eyes.
PKU serves as an example of what motivates geneticists to try to discover the molecular basis of inherited disease. The hope is that knowing the metabolic basis of the disease will eventually make it possible to develop methods for clinical intervention through diet, medication, or other treatments that will ameliorate the severity of the disease.

1.8 Evolution: From Genes to Genomes, from Proteins to Proteomes

The pathway for the breakdown and excretion of phenylalanine is by no means unique to human beings. One of the remarkable generalizations to have emerged from molecular genetics is that organisms that are very distinct (for example, plants and animals) share many features in their genetics and biochemistry. These similarities indicate a fundamental “unity of life”: All creatures on Earth share many features of the genetic apparatus, including genetic information encoded in the sequence of bases in DNA, transcription into RNA, and translation into protein on ribosomes with the use of transfer RNAs. All creatures also share certain characteristics in their biochemistry, including many enzymes and other proteins that are similar in amino acid sequence, three-dimensional structure, and function.

The totality of DNA in a single cell is called the genome of the organism. In sexual organisms, the genome is usually regarded as the DNA present in a reproductive cell. The human genome, which is contained in the chromosomes of a sperm or egg, includes approximately 3 billion nucleotide pairs of DNA. The complete set of proteins encoded in the genome is known as the proteome. The study of genomes constitutes genomics; the study of proteomes constitutes proteomics. The fundamental unity of life can be seen in the similarity of proteins in the proteomes of diverse types of organisms. For example, the proteome of the fruit fly Drosophila melanogaster includes 13,601 proteins; these can be grouped into 8065 different families of proteins that are similar in amino acid sequence. For comparison, the proteome of the nematode worm Caenorhabditis elegans contains 18,424 proteins that can be grouped into 9453 families. These two proteomes share about 5000 proteins that are sufficiently similar to be regarded as having a common function. Among the protein families that are common to flies and worms, about 3000 are also found in the proteome of the yeast Saccharomyces cerevisiae. Based on the data from these three completely sequenced complex genomes, it seems likely that all organisms whose cells have a nucleus and chromosomes will turn out to share several thousand protein families. Furthermore, about a thousand of these protein families are shared with organisms as distantly related as bacteria.

The Molecular Unity of Life

Why do organisms share a common set of similar genes and proteins? Because all creatures share a common origin. The process of evolution takes place when a population of organisms descended from a common ancestor gradually changes in genetic composition through time. From an evolutionary perspective, the unity of fundamental molecular processes is derived by inheritance from a distant common ancestor in which the molecular mechanisms were already in place. Not only the unity of life but also many other features of living organisms become comprehensible from an evolutionary perspective. For example, the interposition of an RNA intermediate in the basic flow of genetic information from DNA to RNA to protein makes sense if the earliest forms of life used RNA for both genetic information and enzyme catalysis. The importance of the evolutionary perspective in understanding aspects of biology that seem pointless or needlessly complex is summed up in a famous aphorism of the evolutionary biologist Theodosius Dobzhansky: “Nothing in biology makes sense except in the light of evolution.”

One indication of the common ancestry among Earth’s creatures is illustrated in Figure 1.21. The tree of relationships was inferred from similarities in nucleotide sequence in an RNA molecule found in the small subunit of the ribosome. Three major kingdoms of organisms are distinguished:

1. The kingdom Bacteria. This group includes most bacteria and cyanobacteria (formerly called blue-green algae). Cells of these organisms lack a membrane-bounded nucleus and mitochondria, are surrounded by a cell wall, and divide by binary fission.

2. The kingdom Archaea. This group was initially discovered among microorganisms that produce methane gas or that live in extreme environments, such as hot springs or high salt concentrations. They are widely distributed in more normal environments as well. Like those of bacteria, the cells of archaeans lack internal membranes. DNA sequence analysis indicates that the machinery for DNA replication and transcription in archaeans resembles that of eukaryans, whereas their metabolism strongly resembles that of bacteria. About half of the genes found in the kingdom Archaea are unique to this group.

3. The kingdom Eukarya. This group includes all organisms whose cells contain an elaborate network of internal membranes, a membrane-bounded nucleus, and mitochondria. Their DNA is organized into true chromosomes, and cell division takes place by means of mitosis (discussed in Chapter 4). The eukaryotes include plants and animals as well as fungi and many single-celled organisms, such as amoebae and ciliated protozoa.

The members of the kingdoms Bacteria and Archaea are often grouped together into a larger assemblage called prokaryotes, which literally means “before [the evolution of] the nucleus.” This terminology is convenient for designating prokaryotes as a group in contrast with eukaryotes, which literally means “good [well-formed] nucleus.”

Natural Selection and Diversity

Although Figure 1.21 illustrates the unity of life, it also illustrates life’s diversity. Frogs are different from fungi, and beetles are different from bacteria. For a human being, it is sobering to consider that complex, multicellular organisms are relatively recent arrivals to the evolutionary scene of life on Earth. Animals came later still and primates very late indeed. What about human evolution? In the time scale of Earth history, human evolution is a matter of a few million years, barely a snap of the fingers.

Figure 1.21 Evolutionary relationships among the major life forms as inferred from similarities in nucleotide sequence in an RNA molecule found in the small subunit of the ribosome. The three major kingdoms (Bacteria, Archaea, and Eukarya) are apparent. Plants, animals, and fungi are more closely related to each other than to members of either of the other kingdoms. Among eukaryotes, the time of branching of the diverse groups of undifferentiated, relatively simple organisms (trichomonads, microsporidians, euglenoids, and so forth) is probably not as ancient as suggested by the diagram. [Courtesy of Mitchell L. Sogin.]
Branches labeled in the tree: EUKARYA: Stramenopiles, Alveolates, Red algae, Plantae, Slime molds, Animalia, Entamoebae, Heterolobosea, Fungi, Physarum, Kinetoplastids, Euglenoids, Microsporidians, Trichomonads, Diplomonads. BACTERIA: Mycoplasma, Plant chloroplasts, Cyanobacteria, Agrobacterium, Plant mitochondria, Enterobacteria. ARCHAEA: Sulfolobus, Thermoplasma, Halobacteria, Methanobacteria.

If common ancestry is the source of the unity of life, what is the source of its diversity? Because differences among species are inherited, the original source of the differences must be mutation. However, mutations alone are not sufficient to explain why organisms are adapted to living in their environments: why ocean mammals have special adaptations for swimming and diving, and why desert mammals have special adaptations that enable them to survive on minimal amounts of water. Mutations are chance events not directed toward any particular adaptive goal, such as longer fur among mammals living in the Arctic. The process that accounts for adaptation was described by Charles Darwin in his 1859 book On the Origin of Species. Darwin proposed that adaptation is the result of natural selection: Individual organisms carrying particular mutations or combinations of mutations that enable them to survive or reproduce more effectively in the prevailing environment leave more offspring than other organisms and so contribute their favorable genes disproportionately to future generations. If this process is repeated throughout the course of many generations, the entire species becomes genetically transformed (that is, it evolves) because a gradually increasing proportion of the population inherits the favorable mutations. Mutation, natural selection, and other features of the field of population genetics (also called evolutionary genetics) are discussed in Chapter 17.
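The idea that a favorable mutation is inherited by “a gradually increasing proportion of the population” can be made concrete with a deliberately simple numerical sketch. The model below is an assumption introduced here for illustration and is not taken from this chapter: each carrier of the favorable mutation is supposed to leave (1 + s) offspring for every one offspring left by a noncarrier, reproduction is asexual, the population is very large, and chance effects are ignored. The names `next_frequency`, `generations_until`, and the parameter `s` are hypothetical.

```python
def next_frequency(p: float, s: float) -> float:
    """Frequency of the favorable mutation after one generation.

    Carriers (frequency p) contribute p * (1 + s) offspring and noncarriers
    contribute (1 - p), so the new frequency is the carriers' share of the total.
    """
    return p * (1 + s) / (1 + p * s)


def generations_until(p0: float, s: float, target: float) -> int:
    """Count generations needed for the frequency to rise from p0 to target."""
    if not (0 < p0 < 1 and 0 < target < 1):
        raise ValueError("frequencies must lie strictly between 0 and 1")
    if s <= 0 and p0 < target:
        raise ValueError("without an advantage (s > 0) the target is never reached")
    p, generations = p0, 0
    while p < target:
        p = next_frequency(p, s)
        generations += 1
    return generations


# A mutation starting at 1 percent with a 5 percent reproductive advantage.
p = 0.01
for generation in range(0, 201, 50):
    print(generation, round(p, 3))
    for _ in range(50):
        p = next_frequency(p, 0.05)

print(generations_until(0.01, 0.05, 0.99))
```

Even a small reproductive advantage, compounded over generations, carries a rare mutation to high frequency; the treatment of selection in Chapter 17 should be consulted for realistic models.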

Chapter Summary

Organisms of the same species have some traits (characteristics) in common but may differ from each other in innumerable other traits. Many of the differences between organisms result from genetic differences, the effects of the environment, or both. Genetics is the study of inherited traits, including those influenced in part by the environment. The elements of heredity consist of genes, which are transmitted from parents to offspring in reproduction. Although the sorting of genes in successive generations was first expressed numerically by Mendel, the chemical basis of genes was discovered by Miescher in the form of a weak acid, deoxyribonucleic acid (DNA). However, experimental proof that DNA is the genetic material did not come until about the middle of the twentieth century.

The first convincing evidence of the role of DNA in heredity came from the experiments of Avery, MacLeod, and McCarty, who showed that genetic characteristics in bacteria could be altered from one type to another by treatment with purified DNA. In studies of Streptococcus pneumoniae, they transformed mutant cells unable to cause pneumonia into cells that could do so by treating them with pure DNA from disease-causing forms. A second important line of evidence was the Hershey–Chase experiment. Hershey and Chase showed that the T2 bacterial virus injects primarily DNA into the host bacterium (Escherichia coli) and that a much higher proportion of parental DNA, compared with parental protein, is found among the progeny phage.

The three-dimensional structure of DNA, proposed in 1953 by Watson and Crick, gave many clues to the manner in which DNA functions as the genetic material. A molecule of DNA consists of two long chains of nucleotide subunits twisted around each other to form a right-handed helix. Each nucleotide subunit contains any one of four bases: A (adenine), T (thymine), G (guanine), or C (cytosine). The bases are paired in the two strands of a DNA molecule; wherever one strand has an A, the partner strand has a T, and wherever one strand has a G, the partner strand has a C. The base pairing means that the two paired strands in a DNA duplex molecule have complementary base sequences along their lengths. The structure of the DNA molecule suggested that genetic information could be coded in DNA in the sequence of bases. Mutations (changes in the genetic material) result from changes in the sequence of bases, such as the substitution of one nucleotide for another or the insertion or deletion of one or more nucleotides. The structure of DNA also suggested a mode of replication: The two strands of the parental DNA molecule separate, and each individual strand serves as a template for the synthesis of a new, complementary strand.

Most genes code for proteins. More precisely stated, most genes specify the sequence of amino acids in a polypeptide chain. The transfer of genetic information from DNA into protein is a multistep process that includes several types of RNA (ribonucleic acid). Structurally, an RNA strand is similar to a DNA strand except that the “backbone” contains a different sugar (ribose instead of deoxyribose) and RNA contains the base uracil (U) instead of thymine (T). Also, RNA is usually present in cells in the form of single, unpaired strands. The initial step in gene expression is transcription, in which a molecule of RNA is synthesized that is complementary in base sequence to whichever DNA strand is being transcribed.

In polypeptide synthesis, which takes place on a ribosome, the base sequence in the RNA transcript is translated in groups of three adjacent bases (codons). The codons are recognized by different types of transfer RNA (tRNA) through base pairing. Each type of tRNA is attached to a particular amino acid, and when a tRNA base-pairs with the proper codon on the ribosome, the growing end of the polypeptide chain is transferred to the amino acid on the tRNA. The table of all codons and the amino acids they specify is called the genetic code. Special codons specify the “start” (AUG, Met) and “stop” (UAA, UAG, and UGA) of polypeptide synthesis. The reason why various types of RNA are an intimate part of transcription and translation is probably that the earliest forms of life used RNA for both genetic information and enzyme catalysis.

A mutation that alters one or more codons in a gene can change the amino acid sequence of the resulting polypeptide chain synthesized in the cell. Often the altered protein is functionally defective, so an inborn error of metabolism results. One of the first inborn errors of metabolism studied was alkaptonuria; it results from the absence of an enzyme for cleaving homogentisic acid, which accumulates and is excreted in the urine, turning black upon oxidation. Phenylketonuria (PKU) is an inborn error of metabolism that affects the same metabolic pathway. The enzyme defect in PKU results in an inability to convert phenylalanine to tyrosine. Phenylalanine accumulation has catastrophic effects on the development of the brain. Children with the disease have severe mental deficits unless they are treated with a special diet low in phenylalanine.

Most visible traits of organisms result from many genes acting together in combination with environmental factors. The relationship between genes and traits is often complex because (1) every gene potentially affects many traits (pleiotropy), (2) every trait is potentially affected by many genes, and (3) many traits are significantly affected by environmental factors as well as by genes.

All living creatures are united by sharing many features of the genetic apparatus (for example, transcription and translation) and many aspects of metabolism. They share many similar genes in the genome and many similar proteins in the proteome. The unity of life results from all life being of common ancestry and provides evidence for evolution. There is also great diversity among living creatures. The three major kingdoms of organisms are the kingdoms Bacteria (whose members lack a membrane-bounded nucleus), Archaea (whose members share features with both bacteria and eukaryans but form a distinct group), and Eukarya (which includes all “higher” organisms whose cells have a membrane-bounded nucleus that contains DNA organized into discrete chromosomes). Members of the kingdoms Bacteria and Archaea are often collectively called prokaryotes. The ultimate source of diversity among organisms is mutation, but natural selection is the process by which mutations that enhance survival and reproduction are retained and mutations that are harmful are eliminated. Natural selection, first proposed by Darwin, is therefore the primary mechanism by which organisms become progressively better adapted to their environments.
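The flow of information from DNA to RNA to polypeptide described in this summary can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original text: the function names and the short template sequence are hypothetical, the template is assumed to be written 3'-to-5' from left to right, and translation is assumed to begin at the first AUG and end at the first stop codon.

```python
from itertools import product

# Standard genetic code, built from the conventional U, C, A, G ordering.
# One-letter amino acid symbols; "*" marks the stop codons UAA, UAG, UGA.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Base pairing used in transcription: template DNA base -> RNA base.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Return the RNA (written 5'-to-3') for a template written 3'-to-5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


def translate(mrna):
    """Translate from the first AUG up to, but not including, a stop codon."""
    start = mrna.find("AUG")
    if start < 0:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = GENETIC_CODE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical template strand, written 3'-to-5'.
rna = transcribe("TACAAACCGATT")
print(rna)             # AUGUUUGGCUAA
print(translate(rna))  # MFG (Met-Phe-Gly)
```

Because the template is read in its 3'-to-5' orientation, no reversal is needed: the RNA comes out already written 5'-to-3'.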

Key Terms

adenine (A)
alkaptonuria
amino acid
Archaea
Bacteria
bacteriophage
base (in DNA or RNA)
biochemical pathway
block (in a biochemical pathway)
central dogma
chaperone
chromosome
codon
colony
complementary base sequence
cytosine (C)
deoxyribose
deoxyribonucleic acid (DNA)
double-stranded DNA
duplex DNA
enzyme
Eukarya
eukaryote
evolution
gene
genetic code
genetics
genome
guanine (G)
homogentisic acid
inborn error of metabolism
messenger RNA (mRNA)
metabolic pathway
metabolite
monomer
mutant
mutation
native conformation
natural selection
nucleotide
oligomerization domain
phage
phenylalanine hydroxylase (PAH)
phenylketonuria (PKU)
pleiotropic effect
pleiotropy
polarity (of DNA or RNA)
polypeptide chain
product molecule
prokaryote
protein folding
proteome
replication
ribonucleic acid (RNA)
ribose
ribosomal RNA (rRNA)
ribosome
single-stranded DNA
substrate molecule
template
tetramer
thymine (T)
transcript
transcription
transfer RNA (tRNA)
transformation
translation
uracil (U)
Watson–Crick base pairing

Review the Basics

• What were the key experiments showing that DNA is the genetic material?
• How did understanding the molecular structure of DNA give clues to its ability to replicate, to code for proteins, and to undergo mutation?
• Why is pairing of complementary bases a key feature of DNA replication?
• What is the process of transcription and in what ways does it differ from DNA replication?
• What three types of RNA participate in protein synthesis, and what is the role of each type of RNA?
• What is the “genetic code,” and how is it relevant to the translation of a polypeptide chain from a molecule of messenger RNA?
• What is an inborn error of metabolism? How did this concept serve as a bridge between genetics and biochemistry?
• How does the “central dogma” explain Garrod’s discovery that nonfunctional enzymes result from mutant genes?
• Explain why many mutant forms of phenylalanine hydroxylase have a simple amino acid replacement, yet the mutant polypeptide chains are absent or present in very small amounts.
• What is a pleiotropic effect of a gene mutation? Give an example.
• What are some of the major differences in cellular organization among Bacteria, Archaea, and Eukarya?
• What process was Charles Darwin describing when he wrote the following statement? “As more individuals are produced than can possibly survive, there must in every case be a struggle for existence, either one individual with another of the same species, or with the individuals of distinct species, or with the physical conditions of life.”

Guide to Problem Solving

Problem 1 In the human gene for the β chain of hemoglobin (the oxygen-carrying protein in the red blood cells), the first 30 nucleotides in the amino-acid–coding region have the sequence
3'-TACCACGTGGACTGAGGACTCCTCTTCAGA-5'
What is the sequence of the partner strand?

Answer The base pairing between the strands is A with T and G with C, but it is equally important that the strands in a DNA duplex have opposite polarity. The partner strand is therefore oriented with its 5' end at the left, and the base sequence is
5'-ATGGTGCACCTGACTCCTGAGGAGAAGTCT-3'
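The reasoning in Problem 1 can be checked with a short Python sketch. The function names here are illustrative choices, not anything defined in the text. The key point is that pairing each base in place yields a partner strand of opposite polarity, so a strand written 3'-to-5' gives a partner that already reads 5'-to-3' from left to right.

```python
# Watson-Crick pairing in DNA: A with T, G with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    """Pair each base in place; the result has the opposite polarity."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand_5_to_3):
    """Partner strand rewritten so that it also reads 5'-to-3'."""
    return complement(strand_5_to_3)[::-1]


# Strand from Problem 1, written 3'-to-5'.
given = "TACCACGTGGACTGAGGACTCCTCTTCAGA"
print("5'-" + complement(given) + "-3'")
# 5'-ATGGTGCACCTGACTCCTGAGGAGAAGTCT-3'
```

When the given strand is instead written 5'-to-3', `reverse_complement` returns the partner in the same left-to-right 5'-to-3' convention.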

Problem 2 If the DNA duplex for the β chain of hemoglobin in Problem 1 were transcribed from left to right, deduce the base sequence of the RNA in this coding region.

Answer To deduce the RNA sequence, we must apply three concepts. First, in the transcription of RNA, the base pairing is such that an A, T, G, or C in the DNA template strand is transcribed as U, A, C, or G, respectively, in the RNA strand. Second, the RNA transcript and the DNA template strand have opposite polarity. Third (and critically for this problem), the RNA transcript is always transcribed in the 5'-to-3' direction, so the 5' end of the RNA is the end synthesized first. This being the case, and considering the opposite polarity, the 3' end of the template strand must be transcribed first. Because we are told that transcription takes place from left to right, we can deduce that the transcribed strand is that given in Problem 1. The RNA transcript therefore has the base sequence
5'-AUGGUGCACCUGACUCCUGAGGAGAAGUCU-3'

Problem 3 Given the RNA sequence coding for part of human β hemoglobin deduced in Problem 2, what is the amino acid sequence in this part of the β polypeptide chain?

Answer The polypeptide chain is translated in successive groups of three nucleotides (each group constituting a codon), starting at the 5' end of the coding sequence and moving in the 5'-to-3' direction. The amino acid corresponding to each codon can be found in the genetic code table. The first ten amino acids in the polypeptide chain are therefore
5'-AUG GUG CAC CUG ACU CCU GAG GAG AAG UCU-3'
   Met Val His Leu Thr Pro Glu Glu Lys Ser

Problem 4 A very important mutation in human hemoglobin occurs in the DNA sequence shown in Problem 1. In this mutation, the T at nucleotide position 20 is replaced with an A, where the numbering is from the left in the strand shown at the top in Problem 1. The mutant hemoglobin is called sickle-cell hemoglobin, and it is associated with a severe anemia known as sickle-cell anemia. Severe as the genetic disease is, the mutant gene is present at relatively high frequency in some human populations because carriers of the gene, who have only a mild anemia, are more resistant to falciparum malaria than are noncarriers. What is the nucleotide sequence of this region of the DNA duplex in sickle-cell hemoglobin (both strands) and that of the messenger RNA, and what is the amino acid replacement that results in sickle-cell hemoglobin?

Answer The mutation is already given as a T → A substitution at position 20. The sequence of the DNA duplex is obtained as in Problem 1, that of the RNA as in Problem 2, and that of the mutant polypeptide chain as in Problem 3, except that at each step there is one nucleotide (or one amino acid) that differs from the nonmutant. The DNA, RNA, and polypeptide in this region of sickle-cell hemoglobin are as follows, where each differs from the nonmutant at a single position in the seventh codon. The amino acid replacement is glutamic acid → valine.

DNA (transcribed strand)      3'-TAC CAC GTG GAC TGA GGA CAC CTC TTC AGA-5'
DNA (nontranscribed strand)   5'-ATG GTG CAC CTG ACT CCT GTG GAG AAG TCT-3'
                                 ↓
RNA coding region             5'-AUG GUG CAC CUG ACU CCU GUG GAG AAG UCU-3'
Polypeptide chain                Met Val His Leu Thr Pro Val Glu Lys Ser

Problem 4 Answer: Nucleotide sequences

GeNETics on the Web will introduce you to some of the most important sites for finding genetic information on the Internet. To explore these sites, visit the Jones and Bartlett home page at
http://www.jbpub.com/genetics
For the book Genetics: Analysis of Genes and Genomes, choose the link that says Enter GeNETics on the Web. You will be presented with a chapter-by-chapter list of highlighted keywords. Select any highlighted keyword and you will be linked to a Web site containing genetic information related to the keyword.

• James D. Watson once said that he and Francis Crick had no doubt that their proposed DNA structure was essentially correct, because the structure was so beautiful it had to be true! At an internet site accessed by the keyword DNA, you can view a large collection of different types of models of DNA structure. Some models highlight the sugar–phosphate backbones, others the A–T and G–C base pairs, still others the helical structure of double-stranded DNA.

• How was PKU discovered? The story is told by the Norwegian physician Ivar Fölling: “The stage is set in 1934. A mother with two severely mentally retarded children came to see my father. . . . She had asked many doctors for help, which none had been able to give. But this woman was unusually persistent and would not accept the situation without explanation. She had also noticed that a peculiar smell always clung to her children. . . . This woman was advised to seek help from my father [who held a professorship of nutrient research at the University Hospital in Norway]. He of course had no real hope of being able to help her. But he did not want to reject her, and he agreed to examine the children. On clinical examination he found nothing of importance, except the [severe mental retardation], which was beyond doubt. [Urine analysis] was part of his thorough routine examination. On adding ferric chloride [to normal urine] the color normally stays brownish, [but in the case of these children] a deep green color developed. He had not seen this reaction before . . . He concluded that two mentally retarded children excreted a substance not found in normal urine. But which substance?” Consult this keyword site for more information on how Asbjörn Fölling finally tracked down the cause of the disease.

• With proper dietary control of blood phenylalanine, patients with PKU can develop normally and lead normal lives. When dietary control is relaxed, however, blood phenylalanine returns to high levels. This situation is extremely dangerous for a developing fetus, resulting in high risk of congenital heart disease, small head size, mental retardation, and slow growth. Affected children are said to have maternal PKU. They are affected, not because of their own inability to metabolize phenylalanine, but because of high levels of phenylalanine in their mothers’ blood. The risk can be reduced, but not entirely eliminated, if PKU mothers plan their pregnancies and adhere to a strict dietary regimen prior to and during pregnancy. To learn more about this unanticipated consequence of dietary treatment of PKU, log onto the phenylalanine hydroxylase knowledge database at the keyword site.

• Perhaps surprisingly, the history of the bacteriophage T2 that figures so prominently in the experiments of Hershey and Chase is shrouded in mystery. Indeed, the time, place, and source material for the original isolation of phages T2, T4, and T6 (known as the “T-even phages”) may never be known with certainty. Use the keyword T2 to learn what is known about the origin of the bacteriophage and what sleuthing was required to discover what is known.

• The Mutable Site changes frequently. Each new update includes a different site that highlights genetics resources available on the World Wide Web. Select the Mutable Site for Chapter 1 and you will be linked automatically.

• The Pic Site showcases some of the most visually appealing genetics sites on the World Wide Web. To visit the genetics Web site pictured below, select the PIC Site for Chapter 1.

Analysis and Applications

1.1 Classify each of the following statements as true or false.
(a) Each gene is responsible for only one visible trait.
(b) Every trait is potentially affected by many genes.
(c) The sequence of nucleotides in a gene specifies the sequence of amino acids in a protein encoded by the gene.
(d) There is one-to-one correspondence between the set of codons in the genetic code and the set of amino acids encoded.

1.2 From their examination of the structure of DNA, what were Watson and Crick able to infer about the probable mechanisms of DNA replication, coding capability, and mutation?

1.3 What does it mean to say that each strand of a duplex DNA molecule has a polarity? What does it mean to say that the paired strands in a duplex molecule have opposite polarity?

1.4 What is the end result of replication of a duplex DNA molecule?

1.5 What is the role of the messenger RNA in translation? What is the role of the ribosome? What is the role of transfer RNA? Is there more than one type of ribosome? Is there more than one type of transfer RNA?

1.6 What important observation about S and R strains of Streptococcus pneumoniae prompted Avery, MacLeod, and McCarty to study this organism?

1.7 In the transformation experiments of Avery, MacLeod, and McCarty, what was the strongest evidence that the substance responsible for the transformation was DNA rather than protein?

1.8 A chemical called phenol (carbolic acid) destroys proteins but not nucleic acids, and a strong alkali such as sodium hydroxide destroys both proteins and nucleic acids. In the transformation experiments with Streptococcus pneumoniae, what result would be expected if the S-strain extract had been treated with phenol? What would be expected if it had been treated with a strong alkali?

1.9 What feature of the physical organization of bacteriophage T2 made it suitable for use in the Hershey–Chase experiments?

1.10 Like DNA, molecules of RNA contain large amounts of phosphorus. When Hershey and Chase grew their T2 phage in bacterial cells that had grown in the presence of radioactive phosphorus, the RNA must also have incorporated the labeled phosphorus, and yet the experimental result was not compromised. Why not?

1.11 Although the Hershey–Chase experiments were widely accepted as proof that DNA is the genetic material, the results were not completely conclusive. Why not?

1.12 The DNA extracted from a bacteriophage contains 28 percent A, 28 percent T, 22 percent G, and 22 percent C. What can you conclude about the structure of this DNA molecule?

1.13 The DNA extracted from a bacteriophage consists of 24 percent A, 30 percent T, 20 percent G, and 26 percent C. What is unusual about this DNA? What can you conclude about its structure?

1.14 A double-stranded DNA molecule is separated into its constituent strands, and the strands are separated in an ultracentrifuge. In one of the strands the base composition is 24 percent A, 28 percent T, 22 percent G, and 26 percent C. What is the base composition of the other strand?

1.15 While studying sewage, you discover a new type of bacteriophage that infects E. coli. Chemical analysis reveals protein and RNA but no DNA. Is this possible?

1.16 One strand of a DNA duplex has the base sequence 5'-ATCGTATGCACTTTACCCGG-3'. What is the base sequence of the complementary strand?

1.17 A region along one strand of a double-stranded DNA molecule consists of tandem repeats of the trinucleotide 5'-TCG-3', so the sequence in this strand is
5'-TCGTCGTCGTCGTCG . . . -3'
What is the sequence in the other strand?

1.18 A duplex DNA molecule contains a random sequence of the four nucleotides with equal proportions of each. What is the average spacing between consecutive occurrences of the sequence 5'-GGCC-3'? Between consecutive occurrences of the sequence 5'-GAATTC-3'?

1.19 A region along a DNA strand that is transcribed contains no A. What base will be missing in the corresponding region of the RNA?

1.20 The duplex nucleic acid molecule shown here consists of a strand of DNA paired with a complementary strand of RNA. Is the RNA the top or the bottom strand? One of the base pairs is mismatched. Which pair is it?
5'-AUCGGUUACAUUCCGACUGA-3'
3'-TAGCCAATGTAAGGGTGACT-5'

1.21 The sequence of an RNA transcript that is initially synthesized is 5'-UAGCUAC-3', and successive nucleotides are added to the 3' end. This transcript is produced from a DNA strand with the sequence
3'-AAGTCGCATATCGATGCTAGCGCAACCT-5'
What is the sequence of the RNA transcript when synthesis is complete?

1.22 An RNA molecule folds back upon itself to form a “hairpin” structure held together by a region of base pairing. One segment of the molecule in the paired region has the base sequence 5'-AUACGAUA-3'. What is the base sequence with which this segment is paired?

1.23 A synthetic mRNA molecule consists of the repeating base sequence
5'-UUUUUUUUUUUU . . . -3'
When this molecule is translated in vitro using ribosomes, transfer RNAs, and other necessary constituents from E. coli, the result is a polypeptide chain consisting of the repeating amino acid Phe–Phe–Phe–Phe . . . . If you assume that the genetic code is a triplet code, what does this result imply about the codon for phenylalanine (Phe)?

1.24 A synthetic mRNA molecule consisting of the repeating base sequence 5'-UUUUUUUUUUUU . . . -3' is terminated by the addition, to the right-hand end, of a single nucleotide bearing A. When translated in vitro, the resulting polypeptide consists of a repeating sequence of phenylalanines terminated by a single leucine. What does this result imply about the codon for leucine?

1.25 With in vitro translation of an RNA into a polypeptide chain, the translation can begin anywhere along the RNA molecule. A synthetic RNA molecule has the sequence
5'-CGCUUACCACAUGUCGCGAACUCG-3'
How many reading frames are possible if this molecule is translated in vitro? How many reading frames are possible if this molecule is translated in vivo, in which translation starts with the codon AUG?

1.26 You have sequenced both strands of a double-stranded DNA molecule. To inspect the potential amino acid coding content of this molecule, you conceptually transcribe it into RNA and then conceptually translate the RNA into a polypeptide chain. How many reading frames will you have to examine?

1.27 A synthetic mRNA molecule consists of the repeating base sequence 5'-UCUCUCUCUCUCUCUC . . . -3'. When this molecule is translated in vitro, the result is a polypeptide chain consisting of the alternating amino acids Ser–Leu–Ser–Leu–Ser–Leu . . . . Why do the amino acids alternate? What does this result imply about the codons for serine (Ser) and leucine (Leu)?

1.28 A synthetic mRNA molecule consists of the repeating base sequence 5'-AUCAUCAUCAUCAUCAUC . . . -3'. When this molecule is translated in vitro, the result is a mixture of three different polypeptide chains. One consists of repeating isoleucines (Ile–Ile–Ile–Ile . . .), another of repeating serines (Ser–Ser–Ser–Ser . . .), and the third of repeating histidines (His–His–His–His . . .). What does this result imply about the manner in which an mRNA is translated?

1.29 How is it possible for a gene with a mutation in the coding region to encode a polypeptide with the same amino acid sequence as the nonmutant gene?

1.30 A polymer is made that has a random sequence consisting of 75 percent G’s and 25 percent U’s. Among the amino acids in the polypeptide chains resulting from in vitro translation, what is the expected frequency of Trp? of Val? of Phe?


Challenge Problems

1.31 The coding sequence in the messenger RNA for amino acids 1 through 10 of human phenylalanine hydroxylase is
5'-AUGUCCACUGCGGUCCUGGAAAACCCAGGC-3'
(a) What are the first 10 amino acids?
(b) What sequence would result from a mutant RNA in which the red A was changed to G?
(c) What sequence would result from a mutant RNA in which the red C was changed to G?
(d) What sequence would result from a mutant RNA in which the red U was changed to C?
(e) What sequence would result from a mutant RNA in which the red G was changed to U?

1.32 A “frameshift” mutation is a mutation in which some number of base pairs, not a multiple of three, is inserted into or deleted from a coding region of DNA. The result is that, at the point of the frameshift mutation, the reading frame of protein translation is shifted with respect to the nonmutant gene. To see the consequence, consider that the coding sequence in the messenger RNA for the first 10 amino acids in human beta hemoglobin (part of the oxygen-carrying protein in the blood) is
5'-AUGGUGCACCUGACUCCUGAGGAGAAGUCU · · · -3'
(a) What is the amino acid sequence in this part of the polypeptide chain?
(b) What would be the consequence of a frameshift mutation resulting in an RNA missing the red U?
(c) What would be the consequence of a frameshift mutation resulting in an RNA with a G inserted immediately in front of the red U?

1.33 With regard to the wildtype and mutant RNA molecules described in the previous problem, deduce the base sequence in both strands of the corresponding double-stranded DNA for:
(a) The wildtype sequence
(b) The single-base deletion
(c) The single-base insertion
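The idea of a shifted reading frame, as defined in the frameshift problem above, can be illustrated with a short Python sketch. The mRNA used here is a made-up toy sequence, deliberately unrelated to the sequences in the problems, and the function names are hypothetical. Translation is assumed to start at the first base and to stop at the first stop codon or at the last complete codon.

```python
from itertools import product

# Standard genetic code in U, C, A, G order; "*" marks stop codons.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna):
    """Read codons from position 0 until a stop codon or the sequence ends."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = GENETIC_CODE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


def delete_base(mrna, index):
    """Return the sequence with the base at the 0-based index removed."""
    return mrna[:index] + mrna[index + 1:]


toy = "AUGGCUGAAUUUAAAUGA"      # hypothetical: AUG GCU GAA UUU AAA UGA
print(translate(toy))           # MAEFK
shifted = delete_base(toy, 4)   # remove the C of the second codon
print(shifted)                  # AUGGUGAAUUUAAAUGA
print(translate(shifted))       # MVNLN
```

Every codon downstream of the deletion is read in a new frame, so the amino acids after the first all change, and the original stop codon is no longer read in frame.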



CHAPTER 2

DNA Structure and DNA Manipulation

CHAPTER OUTLINE

2.1 Genomes and Genetic Differences Among Individuals
    DNA Markers as Landmarks in Chromosomes
2.2 The Molecular Structure of DNA
    Polynucleotide Chains
    Base Pairing and Base Stacking
    Antiparallel Strands
    DNA Structure as Related to Function
2.3 The Separation and Identification of Genomic DNA Fragments
    Restriction Enzymes and Site-Specific DNA Cleavage
    Gel Electrophoresis
    Nucleic Acid Hybridization
    The Southern Blot
2.4 Selective Replication of Genomic DNA Fragments
    Constraints on DNA Replication: Primers and 5'-to-3' Strand Elongation
    The Polymerase Chain Reaction
2.5 The Terminology of Genetic Analysis
2.6 Types of DNA Markers Present in Genomic DNA
    Single Nucleotide Polymorphisms (SNPs)
    Restriction Fragment Length Polymorphisms (RFLPs)
    Random Amplified Polymorphic DNA (RAPD)
    Amplified Fragment Length Polymorphisms (AFLPs)
    Simple Tandem Repeat Polymorphisms (STRPs)
2.7 Applications of DNA Markers
    Genetic Markers, Genetic Mapping, and “Disease Genes”
    Other Uses for DNA Markers

PRINCIPLES

• A DNA strand is a polymer of A, T, G, and C deoxyribonucleotides joined 3'-to-5' by phosphodiester bonds.
• The two DNA strands in a duplex are held together by hydrogen bonding between the A–T and G–C base pairs and by base stacking of the paired bases.
• Each type of restriction endonuclease enzyme cleaves double-stranded DNA at a particular sequence of bases usually four or six nucleotides in length.
• The DNA fragments produced by a restriction enzyme can be separated by electrophoresis, isolated, sequenced, and manipulated in other ways.
• Separated strands of DNA or RNA that are complementary in nucleotide sequence can come together (hybridize) spontaneously to form duplexes.
• DNA replication takes place only by elongation of the growing strand in the 5'-to-3' direction through the addition of successive nucleotides to the 3' end.
• In the polymerase chain reaction, short oligonucleotide primers are used in successive cycles of DNA replication to amplify selectively a particular region of a DNA duplex.
• Genetic markers in DNA provide a large number of easily accessed sites in the genome that can be used to identify the chromosomal locations of disease genes, for DNA typing in individual identification, for the genetic improvement of cultivated plants and domesticated animals, and for many other applications.

CONNECTIONS

The Double Helix
James D. Watson and Francis H. C. Crick 1953
A Structure for Deoxyribose Nucleic Acid

Origin of the Human Genetic Linkage Map
David Botstein, Raymond L. White, Mark Skolnick, and Ronald W. Davis 1980
Construction of a Genetic Linkage Map in Man Using Restriction Fragment Length Polymorphisms

In Chapter 1, we reviewed the experimental evidence demonstrating that the genetic material is DNA. We saw how, through the unique structure of the DNA molecule, genetic information can be transcribed and translated into proteins that affect the inherited characteristics of organisms. When a mutant gene encodes a nonfunctional protein that results in some physical or physiological abnormality (for example, an “inborn error of metabolism”), the expression of that abnormality can be used to trace the transmission of the mutant gene from one generation to the next in a pedigree (family history). As a consequence, until recently, the first step in genetic analysis was the identification of organisms with such abnormal traits, such as peas with wrinkled seeds instead of round seeds and fruit flies with white eyes instead of red. These traits were studied by means of controlled crosses so that the parentage of each individual could be traced. Large-scale genetic studies were typically limited to one of a small number of model organisms especially favorable for isolating and identifying mutant genes, such as the budding yeast Saccharomyces cerevisiae, the nematode worm Caenorhabditis elegans, or the fruit fly Drosophila melanogaster.

Since the mid-1970s, studies in genetics have undergone a revolution based on the use of increasingly sophisticated ways to isolate and identify specific fragments of DNA. The culmination of these techniques was large-scale genomic sequencing, the ability to determine the correct sequence of the base pairs that make up the DNA in an entire genome and to identify the sequences associated with genes. Because many of the model organisms used in genetics have relatively small genomes, these sequences were completed first, in the late 1990s (Figure 2.1). The techniques used to sequence these simpler genomes were then scaled up to sequence the human genome. The initial “rough draft” of the human genome was announced in June 2000; this represents an important milestone in the Human Genome Project, whose goals include determining the sequence, and identifying the function, of all human genes.

Figure 2.1 Timeline of large-scale genomic DNA sequencing.

2.1 Genomes and Genetic Differences Among Individuals

The numbers associated with the genome of even a simple organism can be intimidating. The sequenced genomes of D. melanogaster and C. elegans, both approximately 100 million base pairs in length, encode 13,601 proteins and 18,424 proteins, respectively. The human genome is considerably larger. As found in a human reproductive cell, the human genome consists of 3 billion base pairs organized into 23 distinct chromosomes (each chromosome contains a single molecule of duplex DNA). A typical chromosome can contain several hundred to several thousand genes, arranged in linear order along the DNA molecule present in the chromosome. The sequences that make up the protein-coding part of these genes actually account for only about 4 percent of the entire genome. The other 96 percent of the sequences do not code for proteins. Some noncoding sequences are genetic “chaff” that gets separated from the protein-coding “wheat” when genes are transcribed and the RNA transcript is processed into messenger RNA. Other noncoding sequences are relatively short sequences that are found in hundreds or thousands of copies scattered throughout the genome. Still other noncoding sequences are remnants of genes called pseudogenes. As might be expected, identifying the protein-coding genes from among the large background of noncoding DNA in the human genome is a challenge in itself.

Geneticists often speak of the nucleotide sequence of “the” human genome because 99.9 percent of the DNA sequences in any two individuals are the same. This is our evolutionary legacy; it contains the genetic information that makes us human beings. In reality, however, there are many different human genomes. Geneticists have great interest in the 0.1 percent of the human DNA sequence (3 million base pairs) that differs from one genome to the next, because these differences include the mutations that are responsible for genetic diseases such as phenylketonuria and other inborn errors of metabolism, as well as the mutations that increase the risk of more complex diseases such as heart disease, breast cancer, and diabetes. Fortunately, only a small proportion of all differences in DNA sequence are associated with disease. Some of the others are associated with inherited differences in height, weight, hair color, eye color, facial features, and other traits. Most of the genetic differences between people are completely harmless. Many have no detectable effects on appearance or health. Such differences can be studied only through direct examination of the DNA itself. These harmless differences are nevertheless important, because they serve as genetic markers.
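The proportions quoted above are easier to keep in view as absolute numbers of base pairs. The following is a minimal Python sketch of that arithmetic. The constant and function names are illustrative assumptions, and the inputs are simply the rounded figures given in the text (3 billion base pairs, about 4 percent protein-coding, 0.1 percent differing between individuals).

```python
from fractions import Fraction

# Rounded figures quoted in the text (not precise measurements).
GENOME_BP = 3_000_000_000               # base pairs in a human reproductive cell
CODING_FRACTION = Fraction(4, 100)      # about 4 percent is protein-coding
DIFFERING_FRACTION = Fraction(1, 1000)  # 0.1 percent differs between two people


def base_pairs(fraction, genome_bp=GENOME_BP):
    """Return the number of base pairs that make up `fraction` of the genome."""
    return int(fraction * genome_bp)


coding_bp = base_pairs(CODING_FRACTION)
noncoding_bp = GENOME_BP - coding_bp
differing_bp = base_pairs(DIFFERING_FRACTION)

print(f"protein-coding: {coding_bp:,} bp")
print(f"noncoding:      {noncoding_bp:,} bp")
print(f"differing:      {differing_bp:,} bp")
```

On these rounded inputs, the 0.1 percent that differs between two genomes comes to the 3 million base pairs quoted in the text.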


DNA Markers as Landmarks in Chromosomes

In genetics, a genetic marker is any difference in DNA, no matter how it is detected, whose pattern of transmission from generation to generation can be tracked. Each individual who carries the marker also carries a length of chromosome on either side of it, so it marks a particular region of the genome. A mutant gene, or some portion of a mutant gene, can serve as a genetic marker. In the “classical” approach to genetics, it is the outward expression of a gene (or lack of expression) that forms the basis of genetic analysis. For example, a mutation causing wrinkled peas is a genetic marker, which can be identified through its effects on pea shape. In modern genetic analysis, any difference in DNA sequence between two individuals can serve as a genetic marker. And although these genetic markers are often harmless in themselves, they allow the positions of disease genes to be located and their DNA isolated, identified, and studied.

Genetic markers that are detected by direct analysis of the DNA are often called DNA markers. DNA markers are important in genetics because they serve as landmarks in long DNA molecules, such as those found in chromosomes, which allow genetic differences among individuals to be tracked. They are like signposts along a highway. Using DNA markers as landmarks, the geneticist can identify the positions of normal genes, mutant genes, breaks in chromosomes, and other features important in genetic analysis (Figure 2.2).

The detection of DNA markers usually requires that the genomic DNA (the total DNA extracted from cells of an organism) be fragmented into pieces of manageable size (usually a few thousand nucleotide pairs) that can be manipulated in laboratory experiments. In the following sections, we examine some of the principal ways in which DNA is manipulated to reveal genetic differences among individuals, whether or not these differences find outward expression. Use of these methods broadens the scope of genetics, making it possible to carry out genetic analysis in any organism. This means that detailed genetic analysis is no longer restricted to human beings, domesticated animals, cultivated plants, and the relatively small number of model organisms favorable for genetic studies. Direct study of DNA eliminates the need for prior identification of genetic differences between individuals; it even eliminates the need for controlled crosses. The methods of molecular analysis discussed in this chapter have transformed genetics: The manipulation of DNA is the basic experimental operation in modern genetics. These methods are the principal techniques used in virtually every modern genetics laboratory.

Figure 2.2 DNA markers serve as landmarks that identify physical positions along a DNA molecule, such as DNA from a chromosome. As shown at the right, a DNA marker can also be used to identify bacterial cells into which a particular fragment of DNA has been introduced. The procedure of DNA cloning is not quite as simple as indicated here; it is discussed further in Chapter 13. [Panel labels, in order: Chromosomes are located in the cell nucleus. Each chromosome contains one long molecule of duplex (double-stranded) DNA. DNA markers (A, B, C, and D in the figure) are used to identify particular regions along the DNA in chromosomes. Single DNA fragments can be cleaved from the molecule. The fragments can be transferred into bacterial cells, where they can replicate. (This is the procedure of cloning.) A DNA marker, in this case D, serves to identify bacterial cells containing a particular DNA fragment of interest. Large quantities of the cloned human DNA fragment can be isolated from the bacterial cells.]

2.2

The Molecular Structure of DNA


Modern experimental methods for the manipulation and analysis of DNA grew out of a detailed understanding of its molecular structure and replication. Therefore, to understand these methods, one needs to know something about the molecular structure of DNA. We saw in Chapter 1 that DNA is a helix of two paired, complementary strands, each composed of an ordered string of nucleotides, each bearing one of the bases A (adenine), T (thymine), G (guanine), or C (cytosine). Watson–Crick base pairing between A and T and between G and C in the complementary strands holds the strands together. The complementary strands also hold the key to replication, because each strand can serve as a template for the synthesis of a new complementary strand. We will now take a closer look at DNA structure and at the key features of its replication.

Polynucleotide Chains

In terms of biochemistry, a DNA strand is a polymer, a large molecule built from repeating units. The units in DNA are composed of 2'-deoxyribose (a five-carbon sugar), phosphoric acid, and the four nitrogen-containing bases denoted A, T, G, and C. The chemical structures of the bases are shown in Figure 2.3. Note that two of the bases have a double-ring structure; these are called purines. The other two bases have a single-ring structure; these are called pyrimidines.

• The purine bases are adenine (A) and guanine (G).
• The pyrimidine bases are thymine (T) and cytosine (C).

Figure 2.3 Chemical structures of the four nitrogen-containing bases in DNA: adenine, thymine, guanine, and cytosine. The nitrogen atom linked to the deoxyribose sugar is indicated. The atoms shown in red participate in hydrogen bonding between the DNA base pairs. [Panels: the purines, adenine and guanine; the pyrimidines, cytosine and thymine.]

Figure 2.4 A typical nucleotide, showing the three major components (phosphate, sugar, and base), the difference between DNA and RNA, and the distinction between a nucleoside (no phosphate group) and a nucleotide (with phosphate). Nucleotides are monophosphates (with one phosphate group). Nucleoside diphosphates contain two phosphate groups, and nucleoside triphosphates contain three. [Labels: the base is A, G, T, or C; the sugar carbons are numbered 1 through 5; one hydrogen on the sugar carries the note “This group is OH in RNA.”]

In DNA, each base is chemically linked to one molecule of the sugar deoxyribose, forming a compound called a nucleoside. When a phosphate group is also attached to the sugar, the nucleoside becomes a nucleotide (Figure 2.4). Thus a nucleotide is a nucleoside plus a phosphate. In the conventional numbering of the carbon atoms in the sugar in Figure 2.4, the carbon atom to which the base is attached is the 1' carbon. (The atoms in the sugar are given primed numbers to distinguish them from atoms in the bases.) The nomenclature of the nucleoside and nucleotide derivatives of the DNA bases is summarized in Table 2.1. Most of these terms are not needed in this book; they are included because they are likely to be encountered in further reading.

Table 2.1 DNA nomenclature

Base           Nucleoside        Nucleotide
Adenine (A)    Deoxyadenosine    Deoxyadenosine-5' monophosphate (dAMP), diphosphate (dADP), triphosphate (dATP)
Guanine (G)    Deoxyguanosine    Deoxyguanosine-5' monophosphate (dGMP), diphosphate (dGDP), triphosphate (dGTP)
Thymine (T)    Deoxythymidine    Deoxythymidine-5' monophosphate (dTMP), diphosphate (dTDP), triphosphate (dTTP)
Cytosine (C)   Deoxycytidine     Deoxycytidine-5' monophosphate (dCMP), diphosphate (dCDP), triphosphate (dCTP)

In nucleic acids, such as DNA and RNA, the nucleotides are joined to form a polynucleotide chain, in which the phosphate attached to the 5' carbon of one sugar is linked to the hydroxyl group attached to the 3' carbon of the next sugar in line (Figure 2.5). The chemical bonds by which the sugar components of adjacent nucleotides are linked through the phosphate groups are called phosphodiester bonds. The 5'–3'–5'–3' orientation of these linkages continues throughout the chain, which typically consists of millions of nucleotides. Note that the terminal groups of each polynucleotide chain are a 5'-phosphate (5'-P) group at one end and a 3'-hydroxyl (3'-OH) group at the other. The asymmetry of the ends of a DNA strand implies that each strand has a polarity determined by which end bears the 5' phosphate and which end bears the 3' hydroxyl.

Figure 2.5 Three nucleotides at the 5' end of a single polynucleotide strand. (A) The chemical structure of the sugar–phosphate linkages, showing the 5'-to-3' orientation of the strand (the red numbers are those assigned to the carbon atoms). (B) A common schematic way to depict a polynucleotide strand. [Labels: the 5' end terminates with a phosphate group; each internal phosphate is linked to a 5' carbon and to a 3' carbon (phosphodiester bonds); the 3' end terminates with a hydroxyl (–OH).]

A few years before Watson and Crick proposed their essentially correct three-dimensional structure of DNA as a double helix, Erwin Chargaff developed a chemical technique to measure the amount of each base present in DNA. As we describe this technique, we will let the molar concentration of any base be represented by the symbol for the base in square brackets; for example, [A] denotes the molar concentration of adenine. Chargaff used his technique to measure the [A], [T], [G], and [C] content of the DNA from a variety of sources. He found that the base composition of the DNA, defined as the percent G + C, differs among species but is constant in all cells of an organism and within a species. Data on the base composition of DNA from a variety of organisms are given in Table 2.2.

Chargaff also observed certain regular relationships among the molar concentrations of the different bases. These relationships are now called Chargaff’s rules:

• The amount of adenine equals that of thymine: [A] = [T].
• The amount of guanine equals that of cytosine: [G] = [C].
• The amount of the purine bases equals that of the pyrimidine bases: [A] + [G] = [T] + [C].

Although the chemical basis of these observations was not known at the time, one of the appealing features of the Watson–Crick structure of paired complementary strands was that it explained Chargaff’s rules. Because A is always paired with T in double-stranded DNA, it must follow that [A] = [T]. Similarly, because G is paired with C, we know that [G] = [C]. The third rule follows by addition of the other two: [A] + [G] = [T] + [C]. In the next section, we examine the molecular basis of base pairing in more detail.
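Chargaff's rules and the percent G + C can be checked directly against measured base compositions. The sketch below is a minimal Python illustration, not Chargaff's own procedure: the three rows are copied from Table 2.2, while the function names and the 1.5-percentage-point tolerance for measurement error are assumptions made for this example.

```python
# Percent of total bases as (A, T, G, C), copied from three rows of Table 2.2.
BASE_COMPOSITION = {
    "Bacteriophage T7": (26.0, 26.0, 24.0, 24.0),
    "Escherichia coli": (24.7, 23.6, 26.0, 25.7),
    "Sarcina lutea": (13.4, 12.4, 37.1, 37.1),
}


def percent_gc(a, t, g, c):
    """Base composition, defined in the text as the percent G + C."""
    return 100.0 * (g + c) / (a + t + g + c)


def chargaff_deviations(a, t, g, c):
    """Absolute departures from each of the three rules, in percentage points."""
    return {
        "[A] = [T]": abs(a - t),
        "[G] = [C]": abs(g - c),
        "[A] + [G] = [T] + [C]": abs((a + g) - (t + c)),
    }


def obeys_chargaff(a, t, g, c, tolerance=1.5):
    """True if every rule holds within `tolerance` (an assumed error margin)."""
    return all(d <= tolerance for d in chargaff_deviations(a, t, g, c).values())


for organism, composition in BASE_COMPOSITION.items():
    print(organism, round(percent_gc(*composition), 1), obeys_chargaff(*composition))
```

The three organisms differ widely in percent G + C, yet each satisfies all three rules within the assumed tolerance, which is the pattern the text describes.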

Base Pairing and Base Stacking

In the three-dimensional structure of the DNA molecule proposed in 1953 by Watson and Crick, the molecule consists of two polynucleotide chains twisted around one another to form a double-stranded helix in which adenine and thymine, and guanine and cytosine, are paired in opposite strands (Figure 2.6). In the standard structure, which is called the B form of DNA, each chain makes one complete turn every 34 Å. The helix is right-handed, which means that as one looks down the barrel, each chain follows a clockwise path as it progresses. The bases are spaced at 3.4 Å, so there are ten bases per helical turn in each strand and ten base pairs per turn of the double helix.
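The helical dimensions just given fix the relationship between the number of base pairs in a B-form duplex, its length, and its number of turns. The following minimal Python sketch works through that arithmetic. The function names are illustrative assumptions, and the calculation treats the molecule as an ideal, fully extended B-form helix.

```python
RISE_PER_BASE_PAIR = 3.4   # angstroms between adjacent base pairs (from the text)
HELIX_PITCH = 34.0         # angstroms per complete turn (from the text)


def base_pairs_per_turn():
    """Pitch divided by rise: the text's ten base pairs per turn."""
    return HELIX_PITCH / RISE_PER_BASE_PAIR


def duplex_length_angstrom(n_base_pairs):
    """Length of an ideal B-form duplex containing `n_base_pairs`."""
    return n_base_pairs * RISE_PER_BASE_PAIR


def helical_turns(n_base_pairs):
    """Number of complete turns made by each chain over `n_base_pairs`."""
    return duplex_length_angstrom(n_base_pairs) / HELIX_PITCH


if __name__ == "__main__":
    n = 1000  # an arbitrary example length
    print(f"{n} bp: {duplex_length_angstrom(n):.0f} angstroms, "
          f"{helical_turns(n):.0f} turns")
```

Under these assumptions a 1000-base-pair fragment is about 3400 Å long and makes about 100 turns.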

Table 2.2 Base composition of DNA from different organisms

                               Base (and percentage of total bases)       Base composition
Organism                       Adenine   Thymine   Guanine   Cytosine     (percent G + C)

Bacteriophage T7               26.0      26.0      24.0      24.0         48.0

Bacteria
  Clostridium perfringens      36.9      36.3      14.0      12.8         26.8
  Streptococcus pneumoniae     30.2      29.5      21.6      18.7         40.3
  Escherichia coli             24.7      23.6      26.0      25.7         51.7
  Sarcina lutea                13.4      12.4      37.1      37.1         74.2

Fungi
  Saccharomyces cerevisiae     31.7      32.6      18.3      17.4         35.7
  Neurospora crassa            23.0      22.3      27.1      27.6         54.7

Higher plants
  Wheat                        27.3      27.2      22.7      22.8*        45.5
  Maize                        26.8      27.2      22.8      23.2*        46.0

Animals
  Drosophila melanogaster      30.8      29.4      19.6      20.2         39.8
  Pig                          29.4      29.6      20.5      20.5         41.0
  Salmon                       29.7      29.1      20.8      20.4         41.2
  Human being                  29.8      31.8      20.2      18.2         38.4

* Includes one-fourth 5-methylcytosine, a modified form of cytosine found in most plants more complex than algae and in many animals.

Figure 2.6 Two representations of DNA, illustrating the three-dimensional structure of the double helix. (A) In a ribbon diagram, the sugar–phosphate backbones are depicted as bands, with horizontal lines used to represent the base pairs. (B) A computer model of the B form of a DNA molecule. The stick figures are the sugar–phosphate chains winding around outside the stacked base pairs, forming a major groove and a minor groove. The color coding for the base pairs is as follows: A, red or pink; T, dark green or light green; G, dark brown or beige; C, dark blue or light blue. The bases depicted in dark colors are those attached to the blue sugar–phosphate backbone; the bases depicted in light colors are attached to the beige backbone. [Labels: 34 Å per complete turn (10 base pairs per turn); diameter 20 Å; major groove; minor groove.] [B, courtesy of Antony M. Dean.]

The strands feature base pairing, in which each base is paired to a complementary base in the other strand by hydrogen bonds. (A hydrogen bond is a weak bond in which two participating atoms share a hydrogen atom between them.) The hydrogen bonds provide one type of force holding the strands together. In Watson–Crick base pairing, adenine (A) pairs with thymine (T), and guanine (G) pairs with cytosine (C). The hydrogen bonds that form in the adenine–thymine base pair and in the guanine–cytosine pair are illustrated in Figure 2.7. Note that an A–T pair (Figure 2.7A and B) has two hydrogen bonds and that a G–C pair (Figure 2.7C and D) has three hydrogen bonds. This means that the hydrogen bonding between G and C is stronger in the sense that it requires more energy to break; for example, the amount of heat required to separate the paired strands in a DNA duplex increases with the percent of G + C. Because nothing restricts the sequence of bases in a single strand, any sequence could be present along one strand. This explains Chargaff’s observation that DNA from different organisms may differ in base composition. However, because the strands in duplex DNA are complementary, Chargaff’s rules of [A] = [T] and [G] = [C] are true whatever the base composition.

In the B form of DNA, the paired bases are planar, parallel to one another, and perpendicular to the long axis of the double helix. This feature of double-stranded DNA is known as base stacking. The upper and lower faces of each nitrogenous base are relatively flat and nonpolar (uncharged). These surfaces are said to be hydrophobic because they bind poorly to water molecules, which are very polar. (The polarity refers to the asymmetrical distribution of charge across the V-shaped water molecule; the oxygen at the base of the V tends to be quite negative, whereas the hydrogens at the tips are quite positive.) Owing to their repulsion of water molecules, the paired nitrogenous bases tend to stack on top of one another in such a way as to exclude the maximum amount of water from the interior of the double helix. Hence a double-stranded DNA molecule has a hydrophobic core composed of stacked bases, and it is the energy of base stacking that provides double-stranded DNA with much of its chemical stability.

When discussing a DNA molecule, molecular biologists frequently refer to the individual strands as single strands or as single-stranded DNA; they refer to the double helix as double-stranded DNA or as a duplex molecule. The two grooves spiraling along the outside of the double helix are not symmetrical; one groove, called the major groove, is larger than the other, which is called the minor groove. Proteins that interact with double-stranded DNA often have regions that make contact with the base pairs by fitting into the major groove, into the minor groove, or into both grooves (Figure 2.6B).

Figure 2.7 Normal base pairs in DNA. On the left, the hydrogen bonds (dotted lines) with the joined atoms are shown in red. (A and B) A–T base pairing; two hydrogen bonds attract A and T. (C and D) G–C base pairing; three hydrogen bonds attract G and C. In the space-filling models (B and D), the colors are as follows: C, gray; N, blue; O, red; and H (shown in the bases only), white. Each hydrogen bond is depicted as a white disk squeezed between the atoms sharing the hydrogen. The stick figures on the outside represent the backbones winding around the stacked base pairs. [Space-filling models courtesy of Antony M. Dean.]
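Because an A–T pair contributes two hydrogen bonds and a G–C pair contributes three, the total number of hydrogen bonds in a duplex follows directly from the sequence of one strand. The following is a minimal Python sketch of that count. The function and constant names are illustrative assumptions, and the count is only a rough proxy for stability, since the text notes that base stacking supplies much of the stabilizing energy.

```python
# Hydrogen bonds formed by each base with its Watson-Crick partner:
# two in an A-T pair, three in a G-C pair (as stated in the text).
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def hydrogen_bond_count(strand):
    """Total hydrogen bonds in the duplex formed by `strand` and its complement.

    Raises KeyError if the sequence contains a character other than A, T, G, C.
    """
    return sum(HYDROGEN_BONDS[base] for base in strand.upper())


def percent_gc(strand):
    """Percent G + C of a single strand (equal to that of the duplex)."""
    strand = strand.upper()
    return 100.0 * sum(base in "GC" for base in strand) / len(strand)


if __name__ == "__main__":
    for seq in ("ATATAT", "ATGCAT", "GCGCGC"):  # arbitrary example sequences
        print(seq, percent_gc(seq), hydrogen_bond_count(seq))
```

For duplexes of equal length, the count rises with the percent G + C, in line with the text's remark that more heat is needed to separate G + C-rich strands.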

Antiparallel Strands

Each backbone in a double helix consists of deoxyribose sugars alternating with phosphate groups that link the 3' carbon atom of one sugar to the 5' carbon of the next in line (Figure 2.5). The two polynucleotide strands of the double helix have opposite polarity in the sense that the 5' end of one strand is paired with the 3' end of the other strand. Strands with such an arrangement are said to be antiparallel. One implication of antiparallel strands in duplex DNA is that in each pair of bases, one base is attached to a sugar that lies above the plane of pairing, and the other base is attached to a sugar that lies below the plane of pairing. Another implication is that each terminus of the double helix possesses one 5'-P group (on one strand) and one 3'-OH group (on the other strand), as shown in Figure 2.8.

The diagram of the DNA duplex in Figure 2.6 is static and so somewhat misleading. DNA is a dynamic molecule, constantly in motion. In some regions, the strands can separate briefly and then come together again in the same conformation or in a different one. The right-handed double helix in Figure 2.6 is the standard B form, but depending on conditions, DNA can actually form more than 20 slightly different variants of a right-handed helix, and some regions can even form helices in which the strands twist to the left (called the Z form of DNA). If there are complementary stretches of nucleotides in the same strand, then a single strand, separated from its partner, can fold back upon itself like a hairpin. Even triple helices consisting of three strands can form in regions of DNA that contain suitable base sequences.

Figure 2.8 A segment of a DNA molecule, showing the antiparallel orientation of the complementary strands. The overlying blue arrows indicate the 5'-to-3' direction of each strand. The phosphates (P) join the 3' carbon atom of one deoxyribose (horizontal line) to the 5' carbon atom of the adjacent deoxyribose. [Labels: each strand has a 5' end that terminates in a 5' phosphate and a 3' end that terminates in a 3' hydroxyl (OH); the base pairs shown are T–A, G–C, C–G, A–T, T–A, and G–C.]
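Because the strands are antiparallel, the partner of a strand written 5' to 3' is obtained by complementing each base and then reversing the order, so that the result is also read 5' to 3'. The following is a minimal Python sketch of that operation. The function names are illustrative assumptions, and the example sequence is arbitrary.

```python
# Watson-Crick partners: A pairs with T, and G pairs with C.
_COMPLEMENT = str.maketrans("ATGC", "TACG")


def complement(strand):
    """Base-by-base partner of `strand`, listed in the same left-to-right order.

    If `strand` is written 5'->3', the result therefore reads 3'->5'.
    """
    return strand.upper().translate(_COMPLEMENT)


def reverse_complement(strand):
    """Partner strand rewritten in the conventional 5'->3' direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    top = "TGCATG"  # written 5'->3'
    print("5'-" + top + "-3'")
    print("3'-" + complement(top) + "-5'")
    print("partner read 5'->3':", reverse_complement(top))
```

Applying `reverse_complement` twice returns the original strand, which reflects the symmetry of the pairing rules.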

The Double Helix

James D. Watson and Francis H. C. Crick 1953
Cavendish Laboratory, Cambridge, England
A Structure for Deoxyribose Nucleic Acid

This is one of the watershed papers of twentieth-century biology. After its publication, nothing in genetics was the same. Everything that was known, and everything still to be discovered, would now need to be interpreted in terms of the structure and function of DNA. The importance of the paper was recognized immediately, in no small part because of its lucid and concise description of the structure. Watson and Crick benefited tremendously in knowing that their structure was consistent with the unpublished structural studies of Maurice Wilkins and Rosalind Franklin. The same issue of Nature that included the Watson and Crick paper also included, back to back, a paper from the Wilkins group and one from the Franklin group detailing their data and the consistency of their data with the proposed structure. It has been said that Franklin was poised a mere two half-steps from making the discovery herself, alone. In any event, Watson and Crick and Wilkins were awarded the 1962 Nobel Prize for their discovery of DNA structure. Rosalind Franklin, tragically, died of cancer in 1958 at the age of 37.

We wish to suggest a structure for the salt of deoxyribose nucleic acid (DNA). . . . The structure has two helical chains each coiled round the same axis. . . . Both chains follow right-handed helices, but the two chains run in opposite directions. . . . The bases are on the inside of the helix and the phosphates on the outside. . . . There is a residue on each chain every 3.4 Å and the structure repeats after 10 residues. . . . The novel feature of the structure is the manner in which the two chains are held together by the purine and pyrimidine bases. The planes of the bases are perpendicular to the fiber axis. They are joined together in pairs, a single base from one chain being hydrogen-bonded to a single base from the other chain, so that the two lie side by side. One of the pair must be a purine and the other a pyrimidine for bonding to occur. . . . Only specific pairs of bases can bond together. These pairs are adenine (purine) with thymine (pyrimidine), and guanine (purine) with cytosine (pyrimidine). In other words, if an adenine forms one member of a pair, on either chain, then on these assumptions the other member must be thymine; similarly for guanine and cytosine. The sequence of bases on a single chain does not appear to be restricted in any way. However, if only specific pairs of bases can be formed, it follows that if the sequence of bases on one chain is given, then the sequence on the other chain is automatically determined. . . . It has not escaped our notice that the specific pairing we have postulated immediately suggests a plausible copying mechanism for the genetic material. . . . We are much indebted to Dr. Jerry Donohue for constant advice and criticism, especially on interatomic distances. We have also been stimulated by a knowledge of the general nature of the unpublished experimental results and ideas of Dr. Maurice H. F. Wilkins, Dr. Rosalind Franklin and their co-workers at King’s College, London.

Source: Nature 171: 737–738

DNA Structure as Related to Function

In the structure of the DNA molecule, we can see how three essential requirements of a genetic material are met.

1. Any genetic material must be able to be replicated accurately, so that the information it contains will be precisely replicated and inherited by daughter cells. The basis for exact duplication of a DNA molecule is the pairing of A with T and of G with C in the two polynucleotide chains. Unwinding and separation of the chains, with each free chain being copied, results in the formation of two identical double helices (see Figure 1.6).

2. A genetic material must also have the capacity to carry all of the information needed to direct the organization and metabolic activities of the cell. As we saw in Chapter 1, the product of most genes is a protein molecule, a polymer composed of repeating units of amino acids. The sequence of amino acids in the protein determines its chemical and physical properties. A gene is expressed when its protein product is synthesized, and one requirement of the genetic material is that it direct the order in which amino acid units are added to the end of a growing protein molecule. In DNA, this is done by means of a genetic code in which groups of three bases specify amino acids. Because the four bases in a DNA molecule can be arranged in any sequence, and because the sequence can vary from one part of the molecule to another and from organism to organism, DNA can contain a great many unique regions, each of which can be a distinct gene. A long DNA chain can direct the synthesis of a variety of different protein molecules.

3. A genetic material must also be capable of undergoing occasional mutations in which the information it carries is altered. Furthermore, so that mutations will be heritable, the mutant molecules must be capable of being replicated as faithfully as the parental molecule. This feature is necessary to account for the evolution of diverse organisms through the slow accumulation of favorable mutations. Watson and Crick suggested that heritable mutations might be possible in DNA by rare mispairing of the bases, with the result that an incorrect nucleotide becomes incorporated into a replicating DNA strand.
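The second requirement above rests on two counting facts: a sequence of four possible bases can take a very large number of forms, and a code read in groups of three bases offers 4 x 4 x 4 = 64 distinct triplets. The following minimal Python sketch illustrates both. The function names are illustrative assumptions, and the sketch deliberately stops at splitting a sequence into triplets; it does not include the table that assigns triplets to amino acids.

```python
from itertools import product

BASES = "ATGC"


def count_possible_sequences(length):
    """Number of distinct base sequences of the given length (4 ** length)."""
    return len(BASES) ** length


def triplets(sequence):
    """Split a sequence into consecutive groups of three bases.

    Any one or two leftover bases at the end are ignored.
    """
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


# Every possible group of three bases.
ALL_TRIPLETS = ["".join(p) for p in product(BASES, repeat=3)]

if __name__ == "__main__":
    print(len(ALL_TRIPLETS), "possible triplets")
    print(triplets("ATGGCCTAA"))  # an arbitrary nine-base example
    print(count_possible_sequences(10), "possible 10-base sequences")
```

Even a stretch of only ten bases can take more than a million forms, which gives a sense of how a long DNA chain can hold many distinct genes.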

2.3

The Separation and Identification of Genomic DNA Fragments

Th e followin g section s sh ow h ow an u n derstan din g of DNA stru ctu re an d replication h as been pu t to practical u se in th e developm en t of procedu res for th e separation an d iden tification of particu lar DNA fragm en ts. Th ese m eth ods are u sed prim arily eith er to iden tify DNA m arkers or to aid in th e isolation of particu lar DNA fragm en ts th at are of gen etic in terest. For exam ple, con sider a pedigree of fam ilial breast can cer in wh ich a particu lar DNA fragm en t serves as a m arker for a bit of ch rom osom e th at also in clu des th e m u tan t gen e respon sible for th e in creased risk; th en th e ability to iden tify th e fragm en t is critically im portan t in assessin g th e relative risk for each of th e wom en in th e pedigree

wh o m igh t carry th e m u tan t gen e. To take an oth er exam ple, su ppose th ere is reason to believe th at a m u tation cau sin g a gen etic disease is presen t in a particu lar DNA fragm en t; th en it is im portan t to be able to pin poin t th is fragm en t an d isolate it from affected in dividu als to verify wh eth er th is h ypoth esis is tru e an d, if so, to iden tify th e n atu re of th e m u tation . Most procedu res for th e separation an d iden tification of DNA fragm en ts can be grou ped in to two gen eral categories: 1. Th ose th at iden tify a specific DNA fragm en t presen t in gen om ic DNA by m akin g u se of th e fact th at com plem en tary sin gle-stran ded DNA sequ en ces can , u n der th e proper con dition s, form a du plex m olecu le. Th ese procedu res rely on nucleic acid hybridization. 2. Th ose th at u se prior kn owledge of th e sequ en ce at th e en ds of a DNA fragm en t to specifically an d repeatedly replicate th is on e fragm en t from gen om ic DNA. Th ese procedu res rely on selective DNA replication ( amplification ) by m ean s of th e polymerase chain reaction.

The major difference between these approaches is that the first (relying on nucleic acid hybridization) identifies fragments that are present in the genomic DNA itself, whereas the second (relying on DNA amplification) identifies experimentally manufactured replicas of fragments whose original templates (but not the replicas) were present in the genomic DNA. This difference has practical implications:

• Hybridization methods require a greater amount of genomic DNA for the experimental procedures, but relatively large fragments can be identified, and no prior knowledge of the DNA sequence is necessary.

• Amplification methods require extremely small amounts of genomic DNA for the experimental procedures, but the amplification is usually restricted to relatively small fragments, and some prior knowledge of DNA sequence is necessary.

The following sections discuss both types of approaches and give examples of how they are used. In methods that use nucleic acid hybridization to identify particular fragments present in genomic DNA, the first step is usually cutting the genomic DNA into fragments of experimentally manageable size. This procedure is discussed next.


Restriction Enzymes and Site-Specific DNA Cleavage

Procedures for chemical isolation of DNA, such as those developed by Avery, MacLeod, and McCarty (Chapter 1), usually lead to random breakage of double-stranded molecules into an average length of about 50,000 base pairs. This length is denoted 50 kb, where kb stands for kilobases (1 kb = 1000 base pairs). A length of 50 kb is close to the length of double-stranded DNA present in the bacteriophage λ that infects E. coli. The 50-kb fragments can be made shorter by vigorous shearing forces, such as occur in a kitchen blender, but one of the problems with breaking large DNA molecules into smaller fragments by random shearing is that the fragments containing a particular gene, or part of a gene, will be of different sizes. In other words, with random shearing, it is not possible to isolate and identify a particular DNA fragment on the basis of its size and sequence content, because each randomly sheared molecule that contains the desired sequence somewhere within it differs in size from all other molecules that contain the sequence.

In this section we describe an important enzymatic technique that can be used for cleaving DNA molecules at specific sites. This method ensures that all DNA fragments that contain a particular sequence have the same size; furthermore, each fragment that contains the desired sequence has the sequence located at exactly the same position within the fragment. The cleavage method makes use of an important class of DNA-cleaving enzymes isolated primarily from bacteria. The enzymes are called restriction endonucleases or restriction enzymes, and they are able to cleave DNA molecules at the positions at which particular, short sequences of bases are present. These naturally occurring enzymes serve to protect the bacterial cell by disabling the DNA of bacteriophages that attack it. Their discovery earned Werner Arber of Switzerland a Nobel Prize in 1978. Technically, the enzymes are known as type II restriction endonucleases. The restriction enzyme BamHI is one example; it recognizes the double-stranded sequence

    5'-GGATCC-3'
    3'-CCTAGG-5'

and cleaves each strand between the two adjacent G-bearing nucleotides. Figure 2.9 shows how the regions that make up the active site of BamHI contact the recognition site (blue) just prior to cleavage, and the cleavage reaction is indicated in Figure 2.10. Table 2.3 lists nine of the several hundred restriction enzymes that are known. Most restriction enzymes are named after the species in which they were found. BamHI, for example, was isolated from the bacterium Bacillus amyloliquefaciens strain H, and it is the first (I) restriction enzyme isolated from this organism. Because the first three letters in the name of each restriction enzyme stand for the bacterial species of origin, these letters are printed in italics; the rest of the symbols in the name are not italicized. Most restriction enzymes recognize only one short base sequence, usually four or six nucleotide pairs. The enzyme binds with the DNA at these sites and makes a break in each strand of the DNA molecule, producing 3'-OH and 5'-P groups at each position. The nucleotide sequence recognized for cleavage by a restriction enzyme is called the restriction site of the enzyme.

Figure 2.9 Structure of the part of the restriction enzyme BamHI that comes into contact with its recognition site in the DNA (blue). The pink and green cylinders represent regions of the enzyme in which the amino acid chain is twisted in the form of a right-handed helix. [Courtesy of A. A. Aggarwal. Reprinted with permission from M. Newman, T. Strzelecka, L. F. Dorner, I. Schildkraut, and A. A. Aggarwal, 1995. Science 269: 656. Copyright 2000 American Association for the Advancement of Science.]

The examples in Table 2.3 show that some restriction enzymes cleave their restriction site asymmetrically (at different sites in the

Figure 2.10 Mechanism of DNA cleavage by the restriction enzyme BamHI. Wherever the duplex contains a BamHI restriction site, the enzyme makes a single cut in the backbone of each DNA strand. Each cut creates a new 3' end and a new 5' end, separating the duplex into two fragments. In the case of BamHI the cuts are staggered cuts, so the resulting ends terminate in single-stranded regions, each four nucleotides in length. [Diagram: the BamHI restriction site, 5'-GGATCC-3' paired with 3'-CCTAGG-5'. Cleavage occurs in each strand at the site of the arrowhead and creates a short complementary single-stranded overhang in each cleaved end ("sticky ends"). The new ends created are 5'-G paired with 3'-CCTAG on one restriction fragment, and GATCC-3' paired with G-5' on the other.]

Table 2.3 Some restriction endonucleases, their sources, and their cleavage sites

Enzyme    Microorganism                     Target sequence and cleavage site   Ends
EcoRI     Escherichia coli                  G^AATTC                             sticky
HindIII   Haemophilus influenzae            A^AGCTT                             sticky
BamHI     Bacillus amyloliquefaciens H      G^GATCC                             sticky
PstI      Providencia stuartii              CTGCA^G                             sticky
HaeII     Haemophilus aegyptius             PuGCGC^Py                           sticky
TaqI      Thermus aquaticus                 T^CGA                               sticky
AluI      Arthrobacter luteus               AG^CT                               blunt
RsaI      Rhodopseudomonas sphaeroides      GT^AC                               blunt
PvuII     Proteus vulgaris                  CAG^CTG                             blunt

Note: Each target sequence is written 5' to 3' for one strand; the complementary strand reads the same in its own 5'-to-3' direction and is cut at the corresponding position. The caret (^) indicates the site of cutting. The enzyme TaqI yields cohesive ends consisting of two nucleotides, whereas the cohesive ends produced by the other enzymes with sticky ends contain four nucleotides. Pu and Py refer to any purine and pyrimidine, respectively.

two DNA strands), but other restriction enzymes cleave symmetrically (at the same site in both strands). The former leave sticky ends because each end of the cleaved site has a small, single-stranded overhang that is complementary in base sequence to the other end (Figure 2.10). In contrast, enzymes that have symmetrical cleavage sites yield DNA fragments that have blunt ends. In virtually all cases, the restriction site of a restriction enzyme reads the same on both strands, provided that the opposite polarity of the strands is taken into account; for example, each strand in the restriction site of BamHI reads 5'-GGATCC-3' (Figure 2.10). A DNA sequence with this type of symmetry is called a palindrome. (In ordinary English, a palindrome is a word or phrase that reads the same forwards and backwards, such as "madam.") Restriction enzymes have the following important characteristics:

• Most restriction enzymes recognize a single restriction site.

• The restriction site is recognized without regard to the source of the DNA.

• Because most restriction enzymes recognize a unique restriction site sequence, the number of cuts in the DNA from a particular organism is determined by the number of restriction sites present.
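The palindrome property and the effect of cutting at every restriction site can be illustrated with a short Python sketch. The example sequence, the function names, and the cut_offset parameter are hypothetical choices made for illustration; only the BamHI site GGATCC and the position of its cut between the two adjacent G's come from the discussion above. The sketch follows one strand of a linear molecule and does not model the overhang on the complementary strand.

```python
# Illustrative sketch only: the sequence and helper names are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True if the site reads the same 5' to 3' on both strands."""
    return site == reverse_complement(site)


def digest(dna: str, site: str, cut_offset: int) -> list:
    """Cut one strand of a linear molecule at every occurrence of `site`.

    `cut_offset` is the number of bases of the site left on the
    left-hand fragment (1 for BamHI, which cuts between the two G's).
    """
    fragments, start = [], 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments


BAMHI_SITE = "GGATCC"
example = "ATTAGGATCCTTGACAGGATCCA"  # made-up sequence with two BamHI sites
fragments = digest(example, BAMHI_SITE, cut_offset=1)

print(is_palindromic(BAMHI_SITE))  # True
print(fragments)                   # ['ATTAG', 'GATCCTTGACAG', 'GATCCA']
```

Joining the fragments end to end restores the original sequence, which reflects the fact that cleavage breaks bonds in the backbone without removing any nucleotides. Two sites produce three fragments from a linear molecule, so the number of fragments is fixed by the number of sites present.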

Figure 2.11 Apparatus for gel electrophoresis. Liquid gel is allowed to harden with an appropriately shaped mold in place to form "wells" for the samples (purple). After electrophoresis, the DNA fragments, located at various positions in the gel, are made visible by immersing the gel in a solution containing a reagent that binds to or reacts with DNA. The separated fragments in a sample appear as bands, which may be either visibly colored or fluorescent, depending on the particular reagent used. The region of a gel in which the fragments in one sample can move is called a lane; this gel has seven lanes. [Diagram labels: slots for samples; gel; bands (visible after suitable treatment); electrodes; buffer solution; direction of movement, toward the positive (+) electrode.]


The DNA fragment produced by a pair of adjacent cuts in a DNA molecule is called a restriction fragment. A large DNA molecule will typically be cut into many restriction fragments of different sizes. For example, an E. coli DNA molecule, which contains 4.6 × 10^6 base pairs, is cut into several hundred to several thousand fragments, and mammalian genomic DNA is cut into more than a million fragments. Although these numbers are large, they are actually quite small relative to the number of sugar–phosphate bonds in the DNA of an organism.
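The orders of magnitude quoted above can be approximated with a rough calculation. The Python sketch below assumes that the four bases are equally frequent and occur independently, so that a site of n nucleotide pairs is expected about once every 4^n base pairs, and that fragment lengths are approximately exponentially distributed. Both are simplifying assumptions rather than properties of real genomes. The mammalian genome size of 3 × 10^9 base pairs and all function names are likewise assumptions introduced for illustration.

```python
import math


def mean_fragment_length(site_length: int) -> float:
    """Expected spacing between sites for equally frequent, independent bases."""
    return 4.0 ** site_length


def expected_fragments(genome_bp: float, site_length: int) -> float:
    """Approximate number of fragments produced from a genome."""
    return genome_bp / mean_fragment_length(site_length)


def expected_in_window(genome_bp, site_length, low_bp, high_bp):
    """Approximate number of fragments with length between low_bp and high_bp,
    treating fragment lengths as exponentially distributed (an assumption)."""
    mean = mean_fragment_length(site_length)
    fraction = math.exp(-low_bp / mean) - math.exp(-high_bp / mean)
    return expected_fragments(genome_bp, site_length) * fraction


E_COLI_BP = 4.6e6   # genome size given in the text
MAMMAL_BP = 3.0e9   # assumed round figure, not given in the text

for n in (4, 6):
    print(n, round(expected_fragments(E_COLI_BP, n)),
          round(expected_fragments(MAMMAL_BP, n)))

# Fragments between 2.9 and 3.1 kb from a six-base cutter (illustrative window).
print(round(expected_in_window(MAMMAL_BP, 6, 2900, 3100)))
```

Under these assumptions a six-base cutter yields on the order of a thousand fragments from E. coli DNA and several hundred thousand from a mammalian genome, while a four-base cutter yields more than ten million from a mammalian genome. Even a narrow size window contains many thousands of fragments, which is why size alone rarely identifies a fragment in genomic DNA.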

Gel Electrophoresis

The DNA fragments produced by a restriction enzyme can be separated by size using the fact that DNA is negatively charged and moves in response to an electric field. If the terminals of an electrical power source are connected to the opposite ends of a horizontal tube containing a DNA solution, then the DNA molecules will move toward the positive end of the tube at a rate that depends on the electric field strength and on the shape and size of the molecules. The movement of charged molecules in an electric field is called electrophoresis.

The type of electrophoresis most commonly used in genetics is gel electrophoresis. An experimental arrangement for gel electrophoresis of DNA is shown in Figure 2.11. A thin slab of a gel, usually agarose or acrylamide, is prepared containing small slots (called wells) into which samples are placed. An electric field is applied, and the negatively charged DNA molecules penetrate and move through the gel toward the anode (the positively charged electrode). A gel is a complex molecular network that contains narrow, tortuous passages, so smaller DNA molecules pass through more easily; hence the rate of movement increases as the size of the DNA fragment decreases. Figure 2.12 shows the result of electrophoresis of a set of double-stranded DNA molecules in an agarose gel. Each discrete region containing DNA is called a band. The bands can be visualized under ultraviolet light after soaking the gel in the dye ethidium bromide, the molecules of which intercalate between the stacked bases in duplex DNA and render it fluorescent. In Figure 2.12, each band in the gel results from the fact that all DNA fragments of a given size have migrated to the same position in the gel. To produce a visible band, a minimum of about 5 × 10^-9 grams of DNA is required, which for a fragment of size 3 kb works out to about 10^9 molecules. The point is that a very large number of copies of any particular DNA fragment must be present in order to yield a visible band in an electrophoresis gel.

Figure 2.12 Gel electrophoresis of DNA. Fragments of different sizes were mixed and placed in a well. Electrophoresis was in the downward direction. The DNA has been made visible by the addition of a dye (ethidium bromide) that binds only to DNA and that fluoresces when the gel is illuminated with short-wavelength ultraviolet light. [Labels: band from heaviest fragment (moves least); band from lightest fragment (moves most); direction of movement.]

A linear double-stranded DNA fragment has an electrophoretic mobility that decreases in proportion to the logarithm of its length in base pairs (the longer the fragment, the slower it moves), but the proportionality constant depends on the agarose concentration, the composition of the buffering solution, and the electrophoretic conditions. This means that different concentrations of agarose allow efficient separation of different size ranges of DNA fragments (see Figure 2.13). Less dense gels, such as 0.6 percent agarose, are used to separate larger fragments, whereas more dense gels, such as 2 percent agarose, are used to separate smaller fragments. The inset in Figure 2.13 shows the dependence of electrophoretic mobility on the logarithm of fragment size. It also indicates that the linear relationship breaks down for the largest fragments that can be resolved under a given set of conditions.
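The estimate of about 10^9 molecules in a just-visible band can be checked with a short calculation. The sketch below assumes an average mass of about 650 grams per mole for each base pair of double-stranded DNA, a commonly used approximation that is not stated in the text; the function and constant names are hypothetical.

```python
AVOGADRO = 6.022e23            # molecules per mole
GRAMS_PER_MOLE_PER_BP = 650.0  # assumed average mass of one base pair


def molecules_in_mass(mass_grams: float, length_bp: int) -> float:
    """Number of double-stranded fragments of `length_bp` in `mass_grams` of DNA."""
    molar_mass = length_bp * GRAMS_PER_MOLE_PER_BP
    return mass_grams / molar_mass * AVOGADRO


copies = molecules_in_mass(5e-9, 3000)  # 5 x 10^-9 g of a 3-kb fragment
print(f"{copies:.1e}")
```

With this assumed mass per base pair, the result is between 1 × 10^9 and 2 × 10^9 molecules, in line with the figure given above. Because the number of molecules for a given mass is inversely proportional to fragment length, a shorter fragment needs more copies to reach the same visible mass.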

Figure 2.13 In agarose gels, the concentration of agarose is an important factor in determining the size range of DNA fragments that can be separated. [Plot: size range (kb, axis from 0 to 20) for efficient separation of linear double-stranded DNA fragments, shown for agarose concentrations of 0.6, 0.7, 0.9, 1.2, 1.5, and 2.0 percent. Inset: log10 of size in bp plotted against distance migrated. For any one agarose concentration, except for the largest fragments, the distance migrated decreases as a linear function of the logarithm of fragment size.]
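Because the distance migrated is approximately linear in the logarithm of fragment size, the size of an unknown fragment can be estimated from size markers run in the same gel. The following Python sketch fits a straight line to made-up marker data and inverts it. The marker sizes, the distances, and the function names are hypothetical, and the estimate is meaningful only within the size range in which the linear relationship holds.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical size markers: (length in bp, distance migrated in cm).
markers = [(10000, 2.0), (5000, 3.5), (2500, 5.0), (1250, 6.5)]

slope, intercept = fit_line(
    [math.log10(bp) for bp, _ in markers],
    [cm for _, cm in markers],
)


def estimate_size_bp(distance_cm: float) -> float:
    """Invert distance = slope * log10(size) + intercept."""
    return 10 ** ((distance_cm - intercept) / slope)


# A band midway between the 5000-bp and 2500-bp markers.
print(round(estimate_size_bp(4.25)))
```

The fitted slope is negative, as expected when longer fragments move more slowly. A band halfway between two markers corresponds to the geometric mean of their sizes, not the arithmetic mean, because the scale is logarithmic.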


Figure 2.14 Restriction maps of λ DNA for the restriction enzymes (A) EcoRI and (C) BamHI. The vertical bars indicate the sites of cutting. The numbers within the arrows are the approximate lengths of the fragments in kilobase pairs (kb). (B) An electrophoresis gel of BamHI and EcoRI enzyme digests of λ DNA. Numbers indicate fragments in order from largest (1) to smallest (6); the circled numbers on the maps correspond to the numbers beside the gel. The DNA has not undergone electrophoresis long enough to separate bands 5 and 6 of the BamHI digest. Note: In Problem 2 at the end of this chapter (Guide to Problem Solving), we show how to use the results of a double digest to determine the particular order of fragments for a pair of restriction enzymes. [Approximate fragment lengths shown on the maps, in kb: (A) EcoRI: 22, 5, 5.5, 7.5, 6, 4. (C) BamHI: 5.5, 17.5, 5, 6.5, 7.5, 8.]
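A restriction map and the corresponding gel pattern contain the same fragment lengths arranged in different orders: along the molecule in the map, and by size in the gel. The sketch below uses the approximate lengths labeled in Figure 2.14 and assumes that they are listed in order along the λ molecule; that ordering, and the function names, are assumptions made for illustration.

```python
from itertools import accumulate

# Approximate fragment lengths in kb, assumed to be in map order.
ECORI_MAP = [22, 5, 5.5, 7.5, 6, 4]
BAMHI_MAP = [5.5, 17.5, 5, 6.5, 7.5, 8]


def cut_positions(fragment_lengths):
    """Distances of the cut sites from one end of a linear molecule (kb)."""
    return list(accumulate(fragment_lengths))[:-1]


def gel_order(fragment_lengths):
    """Fragments from largest (moves least) to smallest (moves most)."""
    return sorted(fragment_lengths, reverse=True)


print(sum(ECORI_MAP), sum(BAMHI_MAP))
print(cut_positions(ECORI_MAP))
print(gel_order(BAMHI_MAP))
```

Both sets of lengths sum to about 50 kb, the approximate length of λ DNA, as they must if each digest accounts for the whole molecule. Six fragments from a linear molecule imply five cut sites, and the two smallest BamHI fragments differ by only about 0.5 kb, consistent with their bands not being resolved in the gel.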

Because of the sequence specificity of cleavage, a particular restriction enzyme produces a unique set of fragments for a particular DNA molecule. Another enzyme will produce a different set of fragments from the same DNA molecule. In Figure 2.14, this principle is illustrated for the digestion of E. coli phage λ DNA by either EcoRI or BamHI (see part B). The locations of the cleavage sites for these enzymes in λ DNA are shown in Figures 2.14A and C. A diagram showing sites of cleavage along a DNA molecule is called a restriction map. Particular DNA fragments can be isolated by cutting out the small region of the gel that contains the fragment and removing the DNA from the gel. One important use of isolated restriction fragments employs the enzyme DNA ligase to insert them into self-replicating molecules such as bacteriophage, plasmids, or even small artificial chromosomes (Figure 2.2). These procedures constitute DNA cloning and are the basis of one form of genetic engineering, discussed further in Chapter 13. DNA fragments that have been cloned into organisms such as E. coli are widely used because the fragments can be isolated in large amounts and purified relatively easily. Among the uses of cloned DNA are:

• DNA sequencing. All current methods of DNA sequencing require cloned DNA fragments. These methods are discussed in Chapter 6.

• Nucleic acid hybridization. As we shall see below, an important application of cloned DNA fragments entails incorporating a radioactive or light-emitting "label" into them, after which the labeled material is used to "tag" DNA fragments containing similar sequences.

• Storage and distribution. Cloned DNA can be stored for long periods without risk of change and can easily be distributed to other researchers.

Nucleic Acid Hybridization

Most genomes are sufficiently large and complex that digestion with a restriction enzyme produces many bands that are the same or similar in size. Identifying a particular DNA fragment in a background of many other fragments of similar size presents a needle-in-a-haystack problem. Suppose, for example, that we are interested in a particular 3.0-kb BamHI fragment from the human genome that serves as a marker indicating the presence of a genetic risk factor for breast cancer among the individuals in a particular pedigree. This fragment of 3.0 kb is indistinguishable, on the basis of size alone, from fragments ranging from about 2.9 to 3.1 kb. How many fragments in this size range are expected? When human genomic DNA is cleaved with BamHI, the average length of a restriction fragment is 4^6 = 4096 base pairs, and the expected total number of BamHI fragments is about 730,000; in the size range 2.9–3.1 kb, the expected number of fragments is about 17,000. What this means is that even though we know that the fragment we are interested in is 3 kb in length, it is only one of 17,000 fragments that are so similar in size that ours cannot be distinguished from the others by length alone. This identification task is actually harder than finding a needle in a haystack because haystacks are usually dry. A more accurate analogy would be looking for a needle in a haystack that had been pitched into a swimming pool full of water. This analogy is more relevant because gels, even though they contain a supporting matrix to make them semisolid, are primarily composed of water, and each DNA molecule within a gel is surrounded entirely by water. Clearly, we need some method by which the molecules in a gel can be immobilized and our specific fragment identified.

The DNA fragments in a gel are usually immobilized by transferring them onto a sheet of special filter paper consisting of nitrocellulose, to which DNA can be permanently (covalently) bound. How this is done is described in the next section. In this section we examine how the two strands in a double helix can be "unzipped" to form single strands and how, under the proper conditions, two single strands that are complementary or nearly complementary in sequence can be "zipped" together to form a different double helix. The "unzipping" is called denaturation, the "zipping" renaturation. The practical applications of denaturation and renaturation are many:

• A small part of a DNA fragment can be "zipped" with a much larger DNA fragment. This principle is used in identifying specific DNA fragments in a complex mixture, such as the 3-kb BamHI marker for breast cancer that we have been considering. Applications of this type include the tracking of genetic markers in pedigrees and the isolation of fragments containing a particular mutant gene.

• A DNA fragment from one gene can be "zipped" with similar fragments from other genes in the same genome; this principle is used to identify different members of families of genes that are similar, but not identical, in sequence and that have related functions.

• A DNA fragment from one species can be "zipped" with similar sequences from other species. This allows the isolation of genes that have the same or related functions in multiple species. It is used to study aspects of molecular evolution, such as how differences in sequence are correlated with differences in function, and the patterns and rates of change in gene sequences as they evolve.

As we saw in Section 2.2, the double-stranded helical structure of DNA is maintained by base stacking and by hydrogen bonding between the complementary base pairs. When solutions containing DNA fragments are raised to temperatures in the range 85–100°C, or to the high pH of strong alkaline solutions, the paired strands begin to separate, or "unzip." Unwinding of the helix happens in less than a few minutes (the time depends on the length of the molecule). When the helical structure of DNA is disrupted, and the strands have become completely unzipped, the molecule is said to be denatured. A common way to detect denaturation is by measuring the capacity of DNA in solution to absorb ultraviolet light of wavelength 260 nm, because the absorption at 260 nm (A260) of a solution of single-stranded molecules is 37 percent higher than the absorption of the double-stranded molecules at the same concentration. As shown in Figure 2.15, the progress of denaturation can be followed by slowly heating a solution of double-stranded DNA and recording the value of A260 at various temperatures. The temperature required for denaturation increases with G + C content, not only because G-C base pairs have three hydrogen bonds and A-T base pairs two, but because consecutive G-C base pairs have stronger base stacking.

Figure 2.15 Mechanism of denaturation of DNA by heat. The temperature at which 50 percent of the base pairs are denatured is the melting temperature, symbolized Tm. [Plot: relative light absorbance at 260 nm (1.00 to 1.40) against temperature in °C (30 to 110). The curve begins with double-stranded DNA at 1.00, rises as denaturation begins and proceeds further, and levels off at strand separation into denatured single strands. Tm marks the temperature at which half the base pairs are denatured and half remain intact.]
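A melting curve such as the one in Figure 2.15 can be converted into an estimate of Tm. The sketch below assumes that the rise in A260 is directly proportional to the fraction of base pairs denatured, which is a simplification, and it uses made-up absorbance readings rather than data from the figure. The function names are hypothetical.

```python
HYPERCHROMIC_INCREASE = 0.37  # single strands absorb 37 percent more at 260 nm


def fraction_denatured(relative_a260: float) -> float:
    """Fraction of base pairs denatured, assuming a linear rise in A260."""
    fraction = (relative_a260 - 1.0) / HYPERCHROMIC_INCREASE
    return min(1.0, max(0.0, fraction))


def melting_temperature(readings):
    """Interpolate the temperature at which half the base pairs are denatured.

    `readings` is a list of (temperature_C, relative_A260) pairs sorted by
    increasing temperature.
    """
    points = [(t, fraction_denatured(a)) for t, a in readings]
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 <= 0.5 <= f1 and f1 > f0:
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("readings do not span the half-denatured point")


# Hypothetical melting curve, not data from the text.
curve = [(70, 1.00), (80, 1.02), (85, 1.10), (90, 1.27), (95, 1.37)]
tm = melting_temperature(curve)
print(round(tm, 1))
```

In this invented example the half-denatured point, a relative absorbance of 1.185, falls midway between the readings at 85°C and 90°C. A DNA sample richer in G + C would be expected to shift the whole curve, and therefore Tm, toward higher temperatures.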


Denatured DNA strands can, under certain conditions, form double-stranded DNA with other strands, provided that the strands are sufficiently complementary in sequence. This process of renaturation is called nucleic acid hybridization because the double-stranded molecules are "hybrid" in that each strand comes from a different source. For DNA strands to hybridize, two requirements must be met:

1. The salt concentration must be high (> 0.25 M) to neutralize the negative charges of the phosphate groups, which would otherwise cause the complementary strands to repel one another.

2. The temperature must be high enough to disrupt hydrogen bonds that form at random between short sequences of bases within the same strand, but not so high that stable base pairs between the complementary strands are disrupted.

The initial phase of renaturation is a slow process because the rate is limited by the random chance that a region of two complementary strands will come together to form a short sequence of correct base pairs. This initial pairing step is followed by a rapid pairing of the remaining complementary bases and rewinding of the helix. Rewinding is accomplished in a matter of seconds, and its rate is independent of DNA concentration because the complementary strands have already found each other. The example of nucleic acid hybridization in Figure 2.16 will enable us to understand some of the molecular details and also to see how hybridization is used to "tag" and identify a particular DNA fragment.


Shown in part A is a solution of denatured DNA, called the probe, in which each molecule has been labeled with either radioactive atoms or light-emitting molecules. Probe DNA is typically obtained from a clone, and the labeled probe usually contains denatured forms of both strands present in the original duplex molecule. (This has led to some confusing terminology. Geneticists say that probe DNA hybridizes with DNA fragments containing sequences that are similar to the probe, rather than complementary. What actually occurs is that one strand of the probe undergoes hybridization with a complementary sequence in the fragment. But because the probe usually contains both strands, hybridization takes place with any fragment that contains a similar sequence, each strand in the probe undergoing hybridization with the complementary sequence in the fragment.) Part B in Figure 2.16 is a diagram of genomic DNA fragments that have been immobilized on a nitrocellulose filter. When the probe is mixed with the genomic fragments (part C), random collisions bring short, complementary stretches together. If the region of complementary sequence is short (part D), then random collision cannot initiate renaturation because the flanking sequences cannot pair; in this case the probe falls off almost immediately. If, however, a collision brings short sequences together in the correct register (part E), then this initiates renaturation, because the pairing proceeds zipperlike from the initial contact. The main point is that DNA fragments are able to hybridize only if the length of the region in which they can pair is sufficiently long. Some mismatches in the paired region can be tolerated. How many mismatches are allowed is determined by the conditions of the experiment: The lower the temperature at which the hybridization is carried out, and the higher the salt concentration, the greater the proportion of mismatches that are tolerated.

Figure 2.16 Nucleic acid hybridization. (A) Duplex molecules of probe DNA (obtained from a clone) are denatured and (B) placed in contact with a filter to which are attached denatured strands of genomic DNA. (C) Under the proper conditions of salt concentration and temperature, short complementary stretches come together by random collision. (D) If the sequences flanking the paired region are not complementary, then the pairing is unstable and the strands come apart again. (E) If the sequences flanking the paired region are complementary, then further base pairing stabilizes the renatured duplex. [Diagram labels: (A) Fragments of denatured and labeled probe DNA; the denatured probe usually contains both complementary strands, shown as GTATAATGCGAGCC and CATATTACGCTCGG. (B) Fragments of denatured genomic DNA immobilized on filter; some fragments in the genomic DNA may contain a sequence similar to that in the probe DNA. (C) Probe and filter are mixed in a heat-sealed bag; random collisions bring small regions of complementary sequences together to start the renaturation. (D) Initial pairing with incorrect fragment: base pairing cannot go farther because flanking sequences are not complementary; probe falls away. (E) Initial pairing with correct fragment: base pairing proceeds in a zipper-like fashion because flanking sequences are complementary; probe sticks.]
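The distinction between similar and complementary sequences, and the effect of tolerating mismatches, can be sketched in a few lines of Python. The probe strand is taken from Figure 2.16, but the genomic fragments are invented, and the max_mismatches parameter is only a crude stand-in for the stringency set by temperature and salt concentration. Real hybridization also depends on the length and base composition of the paired region, which this sketch ignores.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def hybridization_sites(probe: str, target: str, max_mismatches: int = 0):
    """Find where a denatured two-stranded probe could pair with one target strand.

    A probe strand pairs with a stretch of the target that is its reverse
    complement. Because the probe contains both strands, it is enough to
    compare each window of the target with the probe sequence and with its
    reverse complement. Returns a list of (position, mismatches) pairs.
    """
    patterns = {probe, reverse_complement(probe)}
    hits = []
    for start in range(len(target) - len(probe) + 1):
        window = target[start:start + len(probe)]
        best = min(sum(a != b for a, b in zip(window, p)) for p in patterns)
        if best <= max_mismatches:
            hits.append((start, best))
    return hits


probe = "GTATAATGCGAGCC"                       # one probe strand from Figure 2.16
perfect = "TTAC" + probe + "GGAT"              # hypothetical fragment, exact copy
diverged = "TTAC" + "GTATAATGGGAGCC" + "GGAT"  # hypothetical, one base changed

print(hybridization_sites(probe, perfect))
print(hybridization_sites(probe, diverged))
print(hybridization_sites(probe, diverged, max_mismatches=1))
```

The fragment that carries an exact copy of the probe sequence is detected even though it is similar, not complementary, to the strand shown, because the other strand of the probe pairs with it. The fragment with one altered base is detected only when a mismatch is allowed, which loosely mirrors hybridization at lower temperature or higher salt concentration.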

The Southern Blot

The ability to renature DNA in the manner outlined in Figure 2.16 means that a solution containing a small fragment of denatured DNA, if it is suitably labeled (for example, with radioactive ³²P), can be combined with a complex mixture of denatured DNA fragments, and upon renaturation the small fragment will "tag" with radioactivity any molecules in the complex mixture with which it can hybridize. The radioactive tag allows these molecules to be identified. The methods of DNA cleavage, electrophoresis, transfer to nitrocellulose, and hybridization with a probe are all combined in the Southern blot, named after its inventor Edward Southern. In this procedure, a gel in which DNA molecules have been separated by electrophoresis is treated with alkali to denature the DNA and render it single-stranded (Figure 2.17). Then the DNA is transferred to a sheet of nitrocellulose filter in such a way that the relative positions of the DNA fragments are maintained. The transfer is accomplished by overlaying the nitrocellulose onto the gel and stacking many layers of absorbent paper on top; the absorbent paper sucks water molecules from the gel and through the nitrocellulose, to which the DNA fragments adhere. (This step is the "blot" component of the Southern blot, parts A and B.) Then the filter is treated so that the single-stranded DNA becomes permanently bound. The treated filter is mixed with a solution containing denatured probe (DNA or RNA) under conditions that allow complementary strands to hybridize to form duplex molecules (part C). Radioactive or other label present in the probe becomes stably bound to the filter, and therefore resistant to removal by washing, only at positions at which base sequences complementary to the probe are already present on the filter, so that the probe can form duplex molecules. The label is located by placing the filter in contact with x-ray film. After development of the film, blackened regions indicate positions of bands containing the radioactive or light-emitting label (part D).

The procedure in Figure 2.17 solves the wet-haystack problem by transferring and immobilizing the genomic DNA fragments to a filter and identifying, by hybridization, the ones that are of interest. Practical applications of Southern blotting center on identifying DNA fragments that contain sequences similar to the probe DNA or RNA, where the proportion of mismatched nucleotides allowed is determined by the conditions of hybridization. The advantages of the Southern blot are convenience and sensitivity. The sensitivity comes from the fact that both hybridization with a labeled probe and the use of photographic film amplify the signal; under typical conditions, a band can be observed on the film with only 5 × 10^-12 grams of DNA, a thousand times less DNA than the amount required to produce a visible band in the gel itself.

Figure 2.17 Southern blot. (A) DNA restriction fragments are separated by electrophoresis, blotted from the gel onto a nitrocellulose or nylon filter, and chemically attached by the use of ultraviolet light. (B) The strands are denatured and mixed with radioactive or light-sensitive probe DNA, which binds with complementary sequences present on the filter. The bound probe remains, whereas unbound probe washes off. (C) Bound probe is revealed by darkening of photographic film placed over the filter. The positions of the bands indicate which restriction fragments contain DNA sequences homologous to those in the probe. [Panel titles in the diagram: (A) DNA is cleaved; electrophoresis is used to separate DNA (gel with size markers and DNA restriction fragments, so many that individual bands run together). (B) DNA fragments are blotted onto nitrocellulose filter (gel, buffer solution, absorbent paper, nitrocellulose filter; the DNA fragments transferred onto the filter are invisible at this stage). (C) Filter is exposed to radioactive probe (filter and probe in solution in a heat-sealed bag; probe hybridizes with homologous DNA fragments, still not visible). (D) Filter is exposed to photographic film; film is developed (bands on x-ray film created by labeled probe).]

2.4 Selective Replication of Genomic DNA Fragments

Although nucleic acid hybridization allows a particular DNA fragment to be identified when present in a complex mixture of fragments, it does not enable the fragment to be separated from the others and purified. Obtaining the fragment in purified form requires cloning, which is straightforward but time-consuming. (Cloning methods are discussed in Chapter 13.) However, if the fragment of interest is not too long, and if the nucleotide sequence at each end is known, then it becomes possible to obtain large quantities of the fragment merely by selective replication. This process is called amplification.

How would one know the sequence of the ends? Let us return to our example of the 3.0-kb BamHI fragment that serves to mark a risk factor for breast cancer in certain pedigrees. Suppose that this fragment is cloned and sequenced from one affected individual, and it is found that, relative to the normal genomic sequence in this region, the BamHI fragment is missing a region of 500 base pairs. At this point the sequences at the ends of the fragment are known, and we can also infer that amplification of genomic DNA from individuals with the risk factor will yield a band of 3.0 kb, whereas amplification from genomic DNA of noncarriers will yield a band of 3.5 kb. This difference allows every person in the pedigree to be diagnosed as a carrier or noncarrier merely by means of DNA amplification. To understand how amplification works, it is first necessary to examine a few key features of DNA replication.

Constraints on DNA Replication: Primers and 5'-to-3' Strand Elongation

As with most metabolic reactions in living cells, nucleic acids are synthesized in chemical reactions controlled by enzymes. An enzyme that forms the sugar–phosphate bond (the phosphodiester bond) between adjacent nucleotides in a nucleic acid chain is called a DNA polymerase. A variety of DNA polymerases have been purified, and for amplification of a DNA fragment, the DNA synthesis is carried out in vitro by combining purified cellular components in a test tube under precisely defined conditions. (In vitro means “without participation of living cells.”)

In order for DNA polymerase to catalyze synthesis of a new DNA strand, preexisting single-stranded DNA must be present. Each single-stranded DNA molecule present in the reaction mix can serve as a template upon which a new partner strand is created by the DNA polymerase. For DNA replication to take place, the 5'-triphosphates of the four deoxynucleosides must also be present. This requirement is rather obvious, because the nucleoside triphosphates are the precursors from which new DNA strands are created. The triphosphates needed are the compounds denoted in Table 2.1 as dATP, dGTP, dTTP, and dCTP, which contain the bases adenine, guanine, thymine, and cytosine, respectively. Details of the structures of dCTP and dGTP are shown in Figure 2.18, in which the phosphate groups cleaved off during DNA synthesis are indicated. DNA synthesis requires all four nucleoside 5'-triphosphates and does not take place if any of them is omitted.

A feature found in all DNA polymerases is that

A DNA polymerase can only elongate a DNA strand. It is not possible for DNA polymerase to initiate synthesis of a new strand, even when a template molecule is present.

One important implication of this principle is that DNA synthesis requires a preexisting segment of nucleic acid that is hydrogen-bonded to the template strand. This segment is called a primer. Because the primer molecule can be very short, it is an oligonucleotide, which literally means “few nucleotides.” As we shall see in Chapter 6, in living cells the primer is a short segment of RNA, but in DNA amplification in vitro, the primer employed is usually DNA.

Figure 2.18 Two deoxynucleoside triphosphates used in DNA synthesis: deoxycytidine 5'-triphosphate (dCTP) and deoxyguanosine 5'-triphosphate (dGTP). The outer two phosphate groups are removed during synthesis, when nucleotides are added to the growing DNA strand.

Figure 2.19 Addition of nucleotides to the 3'-OH terminus of a growing strand. The recognition step is shown as the formation of hydrogen bonds between the A and the T. The chemical reaction is that the 3'-OH group of the 3' end of the growing chain attacks the innermost phosphate group of the incoming nucleoside triphosphate. Steps labeled in the figure: (1) Base pairing specifies the next nucleotide to be added at the 3' end. (2) The 3' hydroxyl group at the 3' end of the growing strand attacks the innermost phosphate group of the incoming nucleoside triphosphate. (3) An O–P bond is formed to attach the new nucleotide, and a P–P (pyrophosphate) group is released.

It is the 3' end of the primer that is essential, because, as emphasized in Chapter 1, DNA synthesis proceeds only by addition of successive nucleotides to the 3' end of the growing strand. In other words, chain elongation always takes place in the 5'-to-3' direction (5' → 3').

The reason for the 5' → 3' direction of chain elongation is illustrated in Figure 2.19. It is a consequence of the fact that the reaction catalyzed by DNA polymerase is the formation of a phosphodiester bond between the free 3'-OH group of the chain being extended and the innermost phosphorus atom of the nucleoside triphosphate being incorporated at the 3' end. Recognition of the appropriate incoming nucleoside triphosphate in replication depends on base pairing with the opposite nucleotide in the template strand. DNA polymerase will usually catalyze the polymerization reaction that incorporates the new nucleotide at the primer terminus only when the correct base pair is present. The same DNA polymerase is used to add each of the four deoxynucleoside phosphates to the 3'-OH terminus of the growing strand.
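The two constraints just described (a primer is required, and nucleotides are added only at the 3' end) can be sketched in a few lines of Python. This is an illustrative toy model, not a description of any laboratory software: the function names `reverse_complement` and `extend_primer` and the example sequences are assumptions made for this sketch, and the model ignores mismatches, reaction conditions, and polymerase errors.

```python
# Toy model of template-directed primer extension (illustrative assumptions only).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def extend_primer(template, primer):
    """Extend a primer along a template; both are written 5'->3'.

    The primer must already be base-paired with the template, so we first
    locate the template site complementary to the primer. New nucleotides
    are then appended only at the primer's 3' end, while the template is
    read in its 3'->5' direction.
    """
    site = reverse_complement(primer)
    position = template.find(site)
    if position == -1:
        # No hydrogen-bonded primer means no synthesis can start.
        raise ValueError("primer does not anneal to the template")
    product = primer
    for base in reversed(template[:position]):
        product += COMPLEMENT[base]  # growth at the 3' end only
    return product


if __name__ == "__main__":
    template = "AAGGCTTACG"   # hypothetical template strand
    primer = "CGTAA"          # pairs with the 3' end of the template
    print(extend_primer(template, primer))
```

In this sketch a primer that pairs with the 3' end of the template is extended into the full complementary strand, whereas a primer with no complementary site raises an error, mirroring the rule that DNA polymerase can elongate but not initiate a strand.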

The Polymerase Chain Reaction

The requirement for an oligonucleotide primer, and the constraint that chain elongation must always occur in the 5' → 3' direction, make it possible to obtain large quantities of a particular DNA sequence by selective amplification in vitro. The method for selective amplification is called the polymerase chain reaction (PCR). For its invention, Californian Kary B. Mullis was awarded a Nobel Prize in 1993. PCR amplification uses DNA polymerase and a pair of short, synthetic oligonucleotide primers, usually 18–22 nucleotides in length, that are complementary in sequence to the ends of the DNA sequence to be amplified. Figure 2.20 gives an example in which the primer oligonucleotides (green) are 9-mers. These are too short for most practical purposes, but they will serve for illustration. The original duplex molecule (part A) is shown in blue. This duplex is mixed with a vast excess of primer molecules, DNA polymerase, and all four nucleoside triphosphates. When the temperature is raised, the strands of the duplex denature and become separated.

Figure 2.20 Role of primer sequences in PCR amplification. (A) Target DNA duplex (blue), showing sequences chosen as the primer-binding sites flanking the region to be amplified. (B) Primer (green) bound to denatured strands of target DNA. (C) First round of amplification. Newly synthesized DNA is shown in pink. Note that each primer is extended beyond the other primer site. (D) Second round of amplification (only one strand shown); in this round, the newly synthesized strand terminates at the opposite primer site. (E) Third round of amplification (only one strand shown); in this round, both strands are truncated at the primer sites. Primer sequences are normally about twice as long as shown here.

When the temperature is lowered again to allow renaturation, the primers, because they are in great excess, become annealed to the separated template strands (part B). Note that the primer sequences are different from each other but complementary to sequences present in opposite strands of the original DNA duplex and flanking the region to be amplified. The primers are oriented with their 3' ends pointing in the direction of the region to be amplified, because each DNA strand elongates only at the 3' end. After the primers have annealed, each is elongated by DNA polymerase using the original strand as a template, and the newly synthesized DNA strands (pink) grow toward each other as synthesis proceeds (part C). Note that:

A region of duplex DNA present in the original reaction mix can be PCR-amplified only if the region is flanked by sequences complementary to the primer oligonucleotides.
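The flanking requirement can be illustrated with a small "in silico PCR" sketch in Python. Everything here is hypothetical: the function `pcr_product`, the template, and the 9-mer primers are made up for illustration and are not taken from Figure 2.20, and the model requires exact matches, which real annealing does not.

```python
# Minimal in-silico PCR sketch (hypothetical names and sequences).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def pcr_product(top_strand, forward_primer, reverse_primer):
    """Return the region of the top strand bounded by the two primers.

    The forward primer matches the top strand itself (it anneals to the
    bottom strand). The reverse primer anneals to the top strand, so its
    reverse complement must appear in the top strand downstream of the
    forward primer. Returns None when either site is missing, because a
    region that is not flanked by both primer sites is not amplified.
    """
    start = top_strand.find(forward_primer)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = top_strand.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return top_strand[start:end + len(reverse_site)]


if __name__ == "__main__":
    template = "CCGG" + "ACATGACGT" + "AAATTTGGG" + "GTACGTATC" + "TTAA"
    product = pcr_product(template, "ACATGACGT", "GATACGTAC")
    print(product, len(product))
```

Comparing the length of the returned product between two templates is the computational analogue of reading band sizes on a gel, as in the 3.0-kb versus 3.5-kb diagnostic example earlier in this section.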

To start a second cycle of PCR amplification, the temperature is raised again to denature the duplex DNA. Upon lowering of the temperature, the original parental strands anneal with the primers and are replicated as shown in Figure 2.20B and C. The daughter strands produced in the first round of amplification also anneal with primers and are replicated, as shown in part D. In this case, although the daughter duplex molecules are identical in sequence to the original parental molecule, they consist entirely of primer oligonucleotides and nonparental DNA that was synthesized in either the first or the second cycle of PCR. As successive cycles of denaturation, primer annealing, and elongation occur, the original parental strands are diluted out by the proliferation of new daughter strands until eventually, virtually every molecule produced in the PCR has the structure shown in part E, in which both strands end at the primer sites.

The power of PCR amplification is that the number of copies of the template strand increases in exponential progression: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, and so forth, doubling with each cycle of replication. Starting with a mixture containing as little as one molecule of the fragment of interest, repeated rounds of DNA replication increase the number of amplified molecules exponentially. For example, starting with a single molecule, 25 rounds of DNA replication will result in 2²⁵ = 3.4 × 10⁷ molecules. This number of molecules of the amplified fragment is so much greater than that of the other unamplified molecules in the original mixture that the amplified DNA can often be used without further purification. For example, a single fragment of 3 kb in E. coli accounts for only 0.06 percent of the DNA in this organism. However, if this single fragment were replicated through 25 rounds of replication, then 99.995 percent of the resulting mixture would consist of the amplified sequence. A 3-kb fragment of human DNA constitutes only 0.0001 percent of the total genome size. Amplification of a 3-kb fragment of human DNA to 99.995 percent purity would require about 34 cycles of PCR.

An overview of the polymerase chain reaction is shown in Figure 2.21. The DNA sequence to be amplified is again shown in blue and the oligonucleotide primers in green. The oligonucleotides anneal to the ends of the sequence to be amplified and become the substrates for chain elongation by DNA polymerase. In the first cycle of PCR amplification, the DNA is denatured to separate the strands. The denaturation temperature is usually around 95°C. Then the temperature is decreased to allow annealing in the presence of a vast excess of the primer oligonucleotides. The annealing temperature is typically in the range of 50°C–60°C, depending largely on the G + C content of the oligonucleotide primers. To complete the cycle, the temperature is raised slightly, to about 70°C, for the elongation of each primer. The steps of denaturation, renaturation, and replication are repeated 20–30 times, and in each cycle the number of molecules of the amplified sequence is doubled.

Implementation of PCR with conventional DNA polymerases is not practical, because at the high temperature necessary for denaturation, the polymerase is itself irreversibly unfolded (denatured) and becomes inactive. However, DNA polymerase isolated from certain organisms is heat stable because the organisms normally live in hot springs at temperatures well above 90°C, such as are found in Yellowstone National Park. Such organisms are said to be thermophiles. The most widely used heat-stable DNA polymerase is called Taq polymerase, because it was originally isolated from the thermophilic bacterium Thermus aquaticus.

PCR amplification is very useful for generating large quantities of a specific DNA sequence. The principal limitation of the technique is that the DNA sequences at the ends of the region to be amplified must be known so that primer oligonucleotides can be synthesized.
In addition, sequences longer than about 5000 base pairs cannot be replicated efficiently by conventional PCR procedures, although there are modifications of PCR that allow longer fragments to be amplified. On the other hand, many applications require amplification of relatively small fragments. The major advantage of PCR amplification is that it requires only trace amounts of template DNA. Theoretically only one template molecule is required, but in practice the amplification of a single molecule may fail because the molecule may, by chance, be broken or damaged. But amplification is usually reliable with as few as 10–100 template molecules, which makes PCR amplification 10,000–100,000 times more sensitive than detection via nucleic acid hybridization.

Figure 2.21 Polymerase chain reaction (PCR) for amplification of particular DNA sequences. Panels: (A) first cycle (denaturation, annealing, DNA replication); (B) second cycle; (C) 20–30 cycles, yielding the amplified DNA sequences. Only the region to be amplified is shown. Oligonucleotide primers (green) that are complementary to the ends of the target sequence (blue) are used in repeated rounds of denaturation, annealing, and DNA replication. Newly replicated DNA is shown in pink. The number of copies of the target sequence doubles in each round of replication, eventually overwhelming any other sequences that may be present.

The exquisite sensitivity of PCR amplification has led to its use in DNA typing for criminal cases in which a minuscule amount of biological material has been left behind by the perpetrator (skin cells on a cigarette butt or hair-root cells on a single hair can yield enough template DNA for amplification). In research, PCR is widely used in the study of independent mutations in a gene whose sequence is known in order to identify the molecular basis of each mutation, to study DNA sequence variation among alternative forms of a gene that may be present in natural populations, or to examine differences among genes with the same function in different species. The PCR procedure has also come into widespread use in clinical laboratories for diagnosis. To take just one very important example, the presence of the human immunodeficiency virus (HIV), which causes acquired immune deficiency syndrome (AIDS), can be detected in trace quantities in blood banks via PCR by using primers complementary to sequences in the viral genetic material. These and other applications of PCR are facilitated by the fact that the procedure lends itself to automation by the use of mechanical robots to set up and run the reactions.
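The doubling arithmetic and the purity figures quoted earlier in this section for a 3-kb fragment can be checked with a short Python sketch. It rests on stated assumptions: every cycle doubles the fragment perfectly, nothing else is amplified, the reaction starts from a single genome copy, and the E. coli genome is taken as roughly 4.6 × 10⁶ bp (a value not given in the text). The function names are hypothetical.

```python
import math


def copies_after(cycles, start_copies=1):
    """Number of fragment copies after perfect doubling in each cycle."""
    return start_copies * 2 ** cycles


def purity(fragment_bp, genome_bp, cycles):
    """Fraction of all DNA (in base pairs) that is amplified fragment.

    Assumes one starting genome that contains one copy of the fragment.
    """
    amplified = fragment_bp * copies_after(cycles)
    background = genome_bp - fragment_bp
    return amplified / (amplified + background)


def cycles_for_purity(fragment_bp, genome_bp, target):
    """Cycles (as a real number) needed to reach the target purity."""
    background = genome_bp - fragment_bp
    copies_needed = target * background / ((1 - target) * fragment_bp)
    return math.log2(copies_needed)


if __name__ == "__main__":
    print(copies_after(25))                            # copies after 25 cycles
    print(purity(3_000, 4_600_000, 25))                # E. coli, assumed 4.6 Mb
    print(cycles_for_purity(3_000, 3_000_000_000, 0.99995))  # human genome
```

Under these assumptions the human calculation comes out a little above 34, consistent with "about 34 cycles"; because only whole cycles can be run, 35 cycles would be needed to actually exceed the 99.995 percent threshold in this idealized model.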

2.5 The Terminology of Genetic Analysis

In order to discuss the types of DNA markers that modern geneticists commonly use in genetic analysis, we must first introduce some key terms that provide the essential vocabulary of genetics. These terms can be understood with reference to Figure 2.22. In Chapter 1 we defined a gene as an element of heredity, transmitted from parents to offspring in reproduction, that influences one or more hereditary traits. Chemically, a gene is a sequence of nucleotides along a DNA molecule. In a population of organisms, not all copies of a gene may have exactly the same nucleotide sequence. For example, whereas one form of a gene has the codon GCA, which specifies alanine in the polypeptide chain that the gene encodes, another form of the same gene may have, at the same position, the codon GCG, which also specifies alanine. Hence the two forms of the gene encode the same sequence of amino acids yet differ in DNA sequence. The alternative forms of a gene are called alleles of the gene. Different alleles may also code for different amino acid sequences, sometimes with drastic effects. Recall the example of the PAH gene for phenylalanine hydroxylase in Chapter 1, in which a change in codon 408 from CGG (arginine) to TGG (tryptophan) results in an inactive enzyme that becomes expressed as the inborn error of metabolism phenylketonuria.

Within a cell, genes are arranged in linear order along microscopic thread-like bodies called chromosomes, which we will examine in detail in Chapters 4 and 8. Each human reproductive cell contains one complete set of 23 chromosomes containing 3 × 10⁹ base pairs of DNA. A typical chromosome contains several hundred to several thousand genes. In humans the average is approximately 3500 genes per chromosome. Each chromosome contains a single molecule of duplex DNA along its length, complexed with proteins and very tightly coiled. The DNA in the average human chromosome, when fully extended, has relative dimensions comparable to those of a wet spaghetti noodle 25 miles long; when the DNA is coiled in the form of a chromosome, its physical compaction is comparable to that of the same noodle coiled and packed into an 18-foot canoe.

The physical position of a gene along a chromosome is called the locus of the gene. In most higher organisms, including human beings, each cell other than a sperm or egg contains two copies of each type of chromosome: one inherited from the mother and one inherited from the father. Each member of such a pair of chromosomes is said to be homologous to the other. (The chromosomes that determine sex are an important exception, discussed in Chapter 4, that we will ignore for now.) At any locus, therefore, each individual carries two alleles, because one allele is present at a corresponding position in each of the homologous maternal and paternal chromosomes (Figure 2.22).

The genetic constitution of an individual is called its genotype. For a particular gene, if the two alleles at the locus in an individual are indistinguishable from each other, then for this gene the genotype of the individual is said to be homozygous for the allele that is present. If the two alleles at the locus are different from each other, then for this gene the genotype of the individual is said to be heterozygous for the alleles that are present. Typographically, genes are indicated in italics, and alleles are typically distinguished by uppercase or lowercase letters (A versus a), subscripts (A₁ versus A₂), superscripts (a⁺ versus a⁻), or sometimes just + and −. Using these symbols, homozygous genes would be portrayed by any of these formulas: AA, aa, A₁A₁, A₂A₂, a⁺a⁺, a⁻a⁻, +/+, or −/−. As in the last two examples, the slash is sometimes used to separate alleles present in homologous chromosomes to avoid ambiguity. Heterozygous genes would be portrayed by any of the formulas Aa, A₁A₂, a⁺a⁻, or +/−. In Figure 2.22, the genotype Bb is heterozygous because the B and b alleles are distinguishable (which is why they are assigned different symbols), whereas the genotype CC is homozygous. These genotypes could also be written as B/b and C/C, respectively.

Whereas the alleles that are present in an individual constitute its genotype, the physical or biochemical expression of the genotype is called the phenotype. To put it as simply as possible, the distinction is that the genotype of an individual is what is on the inside (the alleles in the DNA), whereas the phenotype is what is on the outside (the observable traits, including biochemical traits, behavioral traits, and so forth). The distinction between genotype and phenotype is critically important because there usually is not a one-to-one correspondence between genes and traits. Most complex traits, such as hair color, skin color, height, weight, behavior, life span, and reproductive fitness, are influenced by many genes. Most traits are also influenced more or less strongly by environment. This means that the same genotype can result in different phenotypes, depending on the environment. Compare, for example, two people with a genetic risk for lung cancer; if one smokes and the other does not, the smoker is much more likely to develop the disease. Environmental effects also imply that the same phenotype can result from more than one genotype; smoking again provides an example, because most smokers who are not genetically at risk can also develop lung cancer.

Figure 2.22 Key concepts and terms used in modern genetics. Note that a single gene can have any number of alleles in the population as a whole, but no more than two alleles can be present in any one individual. Figure annotations: One of each pair of homologous chromosomes is maternal in origin, the other paternal. The locus is the physical position of a gene (here gene A) in each homologous chromosome. Many different A alleles (A1, A2, A3, A4, A5, ...) can exist in an entire population of organisms, but only a single allele can be present at the locus of the A gene in any one chromosome. Genes B and C illustrate the heterozygous genotype Bb and the homozygous genotype CC. Genotypes are sometimes written with a slash (for example, B/b and C/C) to distinguish the alleles in homologous chromosomes.

2.6 Types of DNA Markers Present in Genomic DNA

Genetic variation, in the form of multiple alleles of many genes, exists in most natural populations of organisms. We have called such genetic differences between individuals DNA markers; they are also called DNA polymorphisms. (The term polymorphism literally means “multiple forms.”) The methods of DNA manipulation examined in Sections 2.3 and 2.4 can be used in a variety of combinations to detect differences among individuals. Anyone who reads the literature in modern genetics will encounter a bewildering variety of acronyms referring to different ways in which genetic polymorphisms are detected. The different approaches are in use because no single method is ideal for all applications, each method has its own advantages and limitations, and new methods are continually being developed. In this section we examine some of the principal methods for detecting DNA polymorphisms among individuals.

Single-Nucleotide Polymorphisms (SNPs)

A single-nucleotide polymorphism, or SNP (pronounced “snip”), is present at a particular nucleotide site if the DNA molecules in the population frequently differ in the identity of the nucleotide pair that occupies the site. For example, some DNA molecules may have a T–A base pair at a particular nucleotide site, whereas other DNA molecules in the same population may have a C–G base pair at the same site. This difference constitutes a SNP. The SNP defines two “alleles” for which there could be three genotypes among individuals in the population: homozygous with T–A at the corresponding site in both homologous chromosomes, homozygous with C–G at the corresponding site in both homologous chromosomes, or heterozygous with T–A in one chromosome and C–G in the homologous chromosome. The word allele is in quotation marks above because the SNP need not be in a coding sequence, or even in a gene. In the human genome, any two randomly chosen DNA molecules are likely to differ at one SNP site about every 1000–3000 bp in protein-coding DNA and at about one SNP site every 500–1000 bp in noncoding DNA.

Note, in the definition of a SNP, the stipulation that DNA molecules must differ at the nucleotide site “frequently.” This provision excludes rare genetic variation of the sort found in less than 1 percent of the DNA molecules in a population. The reason for the exclusion is that genetic variants that are too rare are not generally as useful in genetic analysis as the more common variants. A catalog of SNPs is regarded as the ultimate compendium of DNA markers, because SNPs are the most common form of genetic differences among people and because they are distributed approximately uniformly along the chromosomes. By the middle of 2001, some 300,000 SNPs are expected to have been identified in human populations and their positions in the chromosomes located.
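The “frequently” stipulation and the three possible genotypes can be made concrete with a small Python sketch. The 1 percent cutoff comes from the text; the function names `is_snp` and `snp_genotypes`, the slash notation for the output, and the sample data are assumptions chosen for illustration.

```python
from collections import Counter


def is_snp(observed_bases, min_frequency=0.01):
    """Decide whether a nucleotide site qualifies as a SNP.

    observed_bases holds the nucleotide seen at one site in each DNA
    molecule sampled from the population. The site counts as a SNP only
    if the second most common nucleotide reaches min_frequency (1 percent
    by default), which excludes rare variants.
    """
    counts = Counter(observed_bases)
    if len(counts) < 2:
        return False  # every sampled molecule carries the same nucleotide
    total = sum(counts.values())
    second_most_common = counts.most_common(2)[1][1]
    return second_most_common / total >= min_frequency


def snp_genotypes(allele_1, allele_2):
    """List the three genotypes possible for a two-allele SNP."""
    return [
        f"{allele_1}/{allele_1}",  # homozygous for the first allele
        f"{allele_1}/{allele_2}",  # heterozygous
        f"{allele_2}/{allele_2}",  # homozygous for the second allele
    ]


if __name__ == "__main__":
    common_variant = "T" * 70 + "C" * 30   # hypothetical sample of 100 molecules
    rare_variant = "T" * 999 + "C"         # variant in 0.1 percent of molecules
    print(is_snp(common_variant), is_snp(rare_variant))
    print(snp_genotypes("T", "C"))
```

Here a site at which 30 percent of sampled molecules carry C passes the test, whereas a variant present in one molecule out of a thousand does not.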

Restriction Fragment Length Polymorphisms (RFLPs) Alth ou gh m ost SNPs requ ire DNA sequ en cin g to be stu died, th ose th at h appen to be located with in a restriction site can be an alyzed with a Sou th ern blot. An exam ple of th is situ ation is sh own in Figure 2.23, wh ere th e SNP con sists of a TҀA n u cleotide pair in som e m olecu les an d a CҀG pair in oth ers. In th is exam ple, th e polym orph ic n u cleotide

(B) GA A T T C CT TAAG

Polym orphic G A A C T C site GA A T T C CT T GA G CT TAAG

5' 3'

Treatm ent of DNA w ith EcoRI

3' 5' Treatm ent of DNA w ith EcoRI No cleavage

Cleavage Result: Tw o fragm ents

Result: One larger fragm ent

Figure 2.23 A m in or differen ce in th e DNA sequ en ce of two m olecu les can be detected if th e differ-

en ce elim in ates a restriction site. (A) Th is m olecu le con tain s th ree restriction sites for EcoRI, in clu din g on e at each en d. It is cleaved in to two fragm en ts by th e en zym e. (B) Th is m olecu le h as an altered EcoRI site in th e m iddle, in wh ich 5 ' -GAATTC-3 ' becom es 5 ' -GAACTC-3 ' . Th e altered site can n ot be cleaved by EcoRI, so treatm en t of th is m olecu le with EcoRI resu lts in on e larger fragm en t. 66

Chapter 2 DNA Structure and DNA Manipulation

site is included in a cleavage site for the restriction enzyme EcoRI (5'-GAATTC-3'). The two nearest flanking EcoRI sites are also shown. In this kind of situation, DNA molecules with T–A at the SNP will be cleaved at both flanking sites and also at the middle site, yielding two EcoRI restriction fragments. Alternatively, DNA molecules with C–G at the SNP will be cleaved at both flanking sites but not at the middle site (because the presence of C–G destroys the EcoRI restriction site) and so will yield only one larger restriction fragment. A SNP that eliminates a restriction site is known as a restriction fragment length polymorphism, or RFLP (pronounced either as “riflip” or by spelling it out). Because RFLPs change the number and size of DNA fragments produced by digestion with a restriction enzyme, they can be detected by the Southern blotting procedure discussed in Section 2.3. An example appears in Figure 2.24. In this case the labeled probe DNA hybridizes near the restriction site at the far left and identifies the position of this restriction fragment in the electrophoresis gel. The duplex molecule labeled “allele A” has a restriction site in the middle and, when cleaved and subjected to electrophoresis, yields a small band that

[Figure 2.24: Top: duplex DNA of “allele” A and “allele” a, showing the positions of cleavage sites, the site of hybridization with probe DNA, the direction of current, and the DNA band from each allele (larger DNA fragments remain nearer the origin; smaller DNA fragments migrate farther). Bottom: duplex DNA in homologous chromosomes and the resulting DNA bands for homozygous AA, heterozygous Aa, and homozygous aa genotypes.]

Figure 2.24 In a restriction fragment length polymorphism (RFLP), alleles may differ in the presence or absence of a cleavage site in the DNA. In this example, the a allele lacks a restriction site that is present in the DNA of the A allele. The difference in fragment length can be detected by Southern blotting. RFLP alleles are codominant, which means (as shown at the bottom) that DNA from the heterozygous Aa genotype yields each of the single bands observed in DNA from homozygous AA and aa genotypes.

2.6 Types of DNA Markers Present in Genomic DNA

Origin of the Human Genetic Linkage Map

David Botstein,1 Raymond L. White,2 Mark Skolnick,3 and Ronald W. Davis4 1980
1 Massachusetts Institute of Technology, Cambridge, Massachusetts
2 University of Massachusetts Medical Center, Worcester, Massachusetts
3 University of Utah, Salt Lake City, Utah
4 Stanford University, Stanford, California

Construction of a Genetic Linkage Map in Man Using Restriction Fragment Length Polymorphisms

This historic paper stimulated a major international effort to establish a genetic linkage map of the human genome based on DNA polymorphisms. Pedigree studies using these genetic markers soon led to the chromosomal localization and identification of mutant genes for hundreds of human diseases. A more ambitious goal, still only partly achieved, is to understand the genetic and environmental interactions involved in complex traits such as heart disease and cancer. The “small set of large pedigrees” called for in the excerpt was soon established by the Centre d'Etude du Polymorphisme Humain (CEPH) in Paris, France, and made available to investigators worldwide for genetic linkage studies. Today the CEPH maintains a database on the individuals in these pedigrees that comprises approximately 12,000 polymorphic DNA markers and 2.5 million genotypes.

No method of systematically mapping human genes has been devised, largely because of the paucity of highly polymorphic marker loci. The advent of recombinant DNA technology has suggested a theoretically possible way to define an arbitrarily large number of arbitrarily polymorphic marker loci. . . . A subset of such polymorphisms can readily be detected as differences in the length of DNA fragments after digestion with DNA sequence-specific restriction endonucleases. These restriction fragment length polymorphisms (RFLPs) can be easily assayed in individuals, facilitating large population studies. . . . [Genetic mapping] of many DNA marker loci should allow the establishment of a set of well-spaced, highly polymorphic genetic markers covering the entire human genome [and enabling] any trait caused wholly or partially by a major locus segregating in a pedigree to be mapped. Such a procedure would not require any knowledge of the biochemical nature of the trait or of the nature of the alterations in the DNA responsible for the trait. . . . The most efficient procedure will be to study a small set of large pedigrees which have been genotyped for all known polymorphic markers. . . . The resolution of genetic and environmental components of disease . . . must involve unraveling the underlying genetic predisposition, understanding the environmental contributions, and understanding the variability of expression of the phenotype. In principle, linked marker loci can allow one to establish, with high certainty, the genotype of an individual and, consequently, assess much more precisely the contribution of modifying factors such as secondary genes, likelihood of expression of the phenotype, and environment.

Source: American Journal of Human Genetics 32: 314–331

contains sequences homologous to the probe DNA. The duplex molecule labeled “allele a” lacks the middle restriction site and yields a larger band. In this situation there can be three genotypes (AA, Aa, or aa, depending on which alleles are present in the homologous chromosomes), and all three genotypes can be distinguished as shown in Figure 2.24. Homozygous AA yields only a small fragment, homozygous aa yields only a large fragment, and heterozygous Aa yields both a small and a large fragment. Because the presence of both the A and a alleles can be detected in heterozygous Aa genotypes, A and a are said to be codominant. In Figure 2.24, the bands from AA and aa have been shown as somewhat thicker than those from Aa, because each AA genotype has two copies of the A allele and each aa genotype has two copies of the a allele, compared with only one copy of each allele in the heterozygous genotype Aa.
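The codominant band patterns can be reproduced with a minimal sketch of an EcoRI digest followed by detection with a probe. Everything except the EcoRI recognition sequence and the GAATTC-to-GAACTC change is an assumption made for illustration: the two allele sequences are invented, the probe is simply taken to anneal to the fragment immediately to the right of the leftmost cut, and the function names are hypothetical.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence; the cut falls after the first G

def cut_positions(seq):
    """Positions in seq (top strand) at which EcoRI would cut."""
    positions = []
    i = seq.find(ECORI_SITE)
    while i != -1:
        positions.append(i + 1)  # cut between G and AATTC
        i = seq.find(ECORI_SITE, i + 1)
    return positions

def probe_band(seq):
    """Length of the fragment lying just right of the leftmost cut,
    which is where the hypothetical probe is assumed to hybridize."""
    cuts = cut_positions(seq)
    if len(cuts) < 2:
        raise ValueError("need at least two EcoRI sites to release a fragment")
    return cuts[1] - cuts[0]

def southern_bands(allele_1, allele_2):
    """Distinct band sizes detected in an individual carrying two alleles."""
    return sorted({probe_band(allele_1), probe_band(allele_2)})

# Invented sequences: flanking site, spacer, middle site, spacer, flanking site
ALLELE_A = "GAATTC" + "ACGT" * 5 + "GAATTC" + "CCGG" * 10 + "GAATTC"
ALLELE_a = "GAATTC" + "ACGT" * 5 + "GAACTC" + "CCGG" * 10 + "GAATTC"

for name, pair in [("AA", (ALLELE_A, ALLELE_A)),
                   ("Aa", (ALLELE_A, ALLELE_a)),
                   ("aa", (ALLELE_a, ALLELE_a))]:
    print(name, southern_bands(*pair))
# AA [26]
# Aa [26, 72]
# aa [72]
```

Each genotype gives a distinct band pattern, which is what allows the heterozygote to be recognized directly. The sketch does not model band intensity, so it does not show the thicker bands expected from the two homozygous genotypes.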

Random Amplified Polymorphic DNA (RAPD)

For studying DNA markers, one limitation of Southern blotting is that it requires material (probes available in the form of cloned DNA) and one limitation of PCR is that it requires sequence information (so primer oligonucleotides can be synthesized). These are not severe handicaps for organisms that are well studied (for example, human beings, domesticated animals and cultivated plants, and model genetic organisms such as yeast, fruit fly, nematode, or mouse), because research materials and sequence information are readily available. But for the vast majority of organisms that biologists study, there are neither research materials nor sequence information. Genetic analysis can still be carried out in these organisms by using an approach called random amplified polymorphic DNA or RAPD (pronounced “rapid”), described in this section.

RAPD analysis makes use of a set of PCR primers of 8–10 nucleotides whose sequence is essentially random. The random primers are tried individually or in pairs in PCR reactions to amplify fragments of genomic DNA from the organism of interest. Because the primers are so short, they often anneal to genomic DNA at multiple sites. Some primers anneal in the proper orientation and at a suitable distance from each other to support amplification of the unknown sequence between them. Among the set of amplified fragments are ones that can be amplified from some genomic DNA samples but not from others, which means that the presence or absence of the amplified fragment is polymorphic in the population of organisms. In most organisms it is usually straightforward to identify a large number of RAPDs that can serve as genetic markers for many different kinds of genetic studies.

An example of RAPD gel analysis is illustrated in Figure 2.25, where three pairs of primers (sets 1–3) are used to amplify genomic DNA from four individuals in a population. The fragments that amplify are then separated on an electrophoresis gel and visualized after staining with ethidium bromide. Many amplified bands are typically observed for each primer set, but only some of these are

[Figure 2.25: RAPD bands on an electrophoresis gel for RAPD primer sets 1, 2, and 3, each used with genomic DNA from individuals A, B, C, and D. Monomorphic bands are present in all individuals. Polymorphic bands are fragments that PCR-amplify with genomic DNA from some individuals but not others, using the same primer set.]

Figure 2.25 Random amplified polymorphic DNA (RAPD) is detected through the use of relatively short primer sequences that, by chance, match genomic DNA at multiple sites that are close enough together to support PCR amplification. Genomic DNA from a single individual typically yields many bands, only some of which are polymorphic in the population. Different sets of primers amplify different fragments of genomic DNA.

polymorphic. These are indicated in Figure 2.25 by the colored dots. The amplified bands that are not polymorphic are said to be monomorphic in the sample, which means that they are the same from one individual to the next. This example shows 17 RAPD polymorphisms. Figure 2.26 shows an actual RAPD gel amplified from genomic DNA obtained from small tissue samples from a population of fish (Campostoma anomalum) in the Great Miami River Basin, Ohio. The fish were collected as part of a water quality assessment program to determine whether fish populations in stressful water environments progressively lose their genetic variation (that is, become increasingly monomorphic). Each pair of samples is flanked by a lane containing DNA size standards producing a “ladder” of fragments at 100-bp increments.

[Figure 2.26: Photograph of a RAPD gel, with size labels at 900 bp and 400 bp.]

Figure 2.26 RAPD polymorphisms in the stoneroller fish (Campostoma anomalum) trapped in tributaries of the Great Miami River in Ohio. Each pair of samples is flanked by a lane containing DNA size standards; in these lanes, the smallest DNA fragment is 100 base pairs (bp), and each successively larger fragment increases in size by 100 bp. Fragments whose sizes are multiples of 500 bp are present in greater concentration and so yield darker bands. [Courtesy of Michael Simonich, Manju Garg, and Ana Braam (Pathology Associates International, Cincinnati, Ohio).]


Figures 2.24 through 2.26 illustrate an important point: In modern genetics, the phenotypes that are studied are very often bands in a gel rather than physical or physiological characteristics.

Figure 2.25 offers a good example. Each position at which a band is observed in one or more samples is a phenotype, whether or not the band is polymorphic. For example, primer set 1 yields a total of 19 bands, of which 5 are polymorphic and 14 are monomorphic in the sample. The phenotypes could be named in any convenient way, such as by indicating the primer set and the fragment length. For example, suppose that the smallest amplified fragment for primer set 1 is 125 bp, which is the polymorphic fragment at the bottom left in Figure 2.25. We could name this fragment unambiguously as 1-125 because it is a fragment of 125 bp amplified by primer set 1. To understand why a DNA band is a phenotype, rather than a gene or a genotype, it is useful to assign different names to the “alleles” that do or do not support amplification. (The word allele is in quotation marks again, because the 1-125 fragment that is amplified need not be part of a gene.) We are talking only about the 1-125 fragment, so we could call the allele capable of supporting amplification the plus (+) allele and the allele not capable of supporting amplification the minus (−) allele. Then there are three possible genotypes with regard to the amplified fragment: +/+, +/−, and −/−. Using genomic DNA from these genotypes, the homozygous +/+ and heterozygous +/− will both support amplification of the 1-125 fragment, whereas the −/− genotype will not support amplification. Hence the presence of the 1-125 fragment is the phenotype observed in both +/+ and +/− genotypes. In other words, with regard to amplification, the + allele is dominant to the − allele, because the phenotype (presence of the 1-125 band) is present in both homozygous +/+ and heterozygous +/− genotypes.
Therefore, on the basis of the phenotype for the 1-125 band in Figure 2.25, we could say that individuals A and D could have either a +/+ or +/− genotype but that individuals B and C must have genotype −/−.
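The relation between genotype and band phenotype for a dominant marker such as the 1-125 fragment can be written out as a small sketch. The function names and the representation of a genotype as a pair of "+" and "-" characters are assumptions chosen for illustration.

```python
def band_present(genotype):
    """A RAPD band appears if at least one allele supports amplification."""
    return "+" in genotype

def possible_genotypes(band_observed):
    """All genotypes consistent with seeing (or not seeing) the band."""
    all_genotypes = [("+", "+"), ("+", "-"), ("-", "-")]
    return [g for g in all_genotypes if band_present(g) == band_observed]

print(possible_genotypes(True))   # [('+', '+'), ('+', '-')]
print(possible_genotypes(False))  # [('-', '-')]
```

A band on the gel therefore leaves two genotypes possible, whereas the absence of the band identifies the genotype uniquely. This is the contrast with the codominant RFLP case, where each of the three genotypes has its own band pattern.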

Amplified Fragment Length Polymorphisms (AFLPs)

Because RAPD primers are small and may not match the template DNA perfectly, the amplified DNA bands often differ a great deal in how dark or light they appear. This variation creates a potential problem, because some exceptionally dark bands may actually result from two amplified DNA fragments of the same size, and some exceptionally light bands may be difficult to visualize consistently. To obtain amplified fragments that yield more uniform band intensities, double-stranded oligonucleotide sequences that match the primer sequences perfectly can be attached to genomic restriction fragments enzymatically prior to amplification. This method, which is outlined in Figure 2.27, yields a class of DNA polymorphisms known as amplified fragment length polymorphisms, or AFLPs (usually pronounced by spelling it out). The first step (part A) is to digest genomic DNA with a restriction enzyme; this example uses the enzyme EcoRI, whose restriction site is 5'-GAATTC-3'. Digestion yields a large number of restriction fragments flanked by what remains of an EcoRI site on each side. In the next step (part B), double-stranded oligonucleotides called primer adapters, with single-stranded overhangs complementary to those on the restriction fragments, are ligated onto the restriction fragments using the enzyme DNA ligase. The resulting fragments (part C) are ready for amplification by means of PCR. Note that the same adapter is ligated onto each end, so a single primer sequence will anneal to both ends and support amplification.

[Figure 2.27, panels:
(A) Genomic DNA with two EcoRI sites: 5'–GAATTC . . . GAATTC–3' over 3'–CTTAAG . . . CTTAAG–5'. Step: cleavage.
(B) Restriction fragment: 5'–AATTC . . . G–3' over 3'–G . . . CTTAA–5'. Primer adapters: 5'–NN…NN–3' over 3'–NN…NNTTAA–5' (left) and 5'–AATTNN…NN–3' over 3'–NN…NN–5' (right). Step: adapter ligation.
(C) Ligated fragment: 5'–NN…NNAATTC . . . GAATTNN…NN–3' over 3'–NN…NNTTAAG . . . CTTAANN…NN–5'. Step: amplification.
(D) Nucleotide extensions reduce the number of amplifications:

Primer sequence (shown annealing at each end)          Fraction of fragments amplified
5'–NN…NNAATTC–3'       3'–CTTAANN…NN–5'       All
5'–NN…NNAATTCA–3'      3'–ACTTAANN…NN–5'      1/16
5'–NN…NNAATTCAC–3'     3'–CACTTAANN…NN–5'     1/256
5'–NN…NNAATTCACT–3'    3'–TCACTTAANN…NN–5'    1/4096
]

Figure 2.27 An amplified fragment length polymorphism (AFLP). (A and B) Genomic DNA is digested with one or more restriction enzymes (in this case, EcoRI). (C) Oligonucleotide adapters are ligated onto the fragments; note that the single-stranded overhang of the adapters matches those of the genomic DNA fragments. (D) The resulting fragments are subjected to PCR using primers complementary to the adapters. The number of amplified fragments can be adjusted by manipulating the number of nucleotides in the adapters that are also present in the primers.


There are nevertheless a number of choices concerning the primer sequence. A primer that matches the adapters perfectly will amplify all fragments, but this often results in so many amplified fragments that they are not well separated in the gel. A PCR primer must match perfectly at its 3' end to be elongated. Thus, additional nucleotides added to the 3' end reduce the number of amplified fragments, because these primers will amplify only those fragments that, by chance, have a complementary nucleotide immediately adjacent to the EcoRI site. For example, the primer sequence in the second row of part D of Figure 2.27 has a single-nucleotide 3' extension; because only 1/4 × 1/4 of the fragments would be expected to have a complementary T immediately adjacent to the EcoRI site on both sides, this primer is expected to amplify 1/16 of all the restriction fragments. Similarly, the primer sequence in the third row has a two-nucleotide 3' extension, so this primer is expected to amplify 1/16 × 1/16 = 1/256 of all the fragments. One application of AFLP analysis is to organisms with large genomes, such as grasshoppers and crickets, for which RAPD analysis would yield an excessive number of amplified bands. How large is a “large” genome? Compared to the human genome, that of the brown mountain grasshopper Podisma pedestris is 7 times larger, and those of North American salamanders in the genus Amphiuma are 70 times larger! (Genome size is discussed in Chapter 8.)
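The fractions quoted for primers with 3' extensions follow a simple pattern that can be checked with a short calculation. The sketch below assumes, as the argument in the text implicitly does, that the four nucleotides are equally likely and independent at each position next to the EcoRI site; the function name `aflp_fraction` is hypothetical.

```python
from fractions import Fraction

def aflp_fraction(extension_length):
    """Expected fraction of restriction fragments amplified by a primer
    carrying extension_length extra nucleotides at its 3' end.

    Assumes each nucleotide next to the EcoRI site matches with
    probability 1/4, independently, and that both ends must match.
    """
    match_one_end = Fraction(1, 4) ** extension_length
    return match_one_end ** 2  # the same primer must anneal at both ends

for k in range(4):
    print(k, aflp_fraction(k))
# 0 1
# 1 1/16
# 2 1/256
# 3 1/4096
```

Each added nucleotide reduces the expected number of amplified fragments sixteenfold, which is why a few extra nucleotides are enough to thin out the bands from a very large genome.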


Simple Tandem Repeat Polymorphisms (STRPs)

One more type of DNA polymorphism warrants consideration because it is useful in DNA typing for individual identification and for assessing the degree of genetic relatedness between individuals. This type of polymorphism is called a simple tandem repeat polymorphism (STRP) because the genetic differences among DNA molecules consist of the number of copies of a short DNA sequence that may be repeated many times in tandem at a particular locus in the genome. STRPs that are present at different loci may differ in the sequence and length of the repeating unit, as well as in the minimum and maximum number of tandem copies that occur in DNA molecules in the population. A STRP with a repeating unit of 2–9 bp is often called a microsatellite or a simple sequence length polymorphism (SSLP), whereas a STRP with a repeating unit of 10–60 bp is often called a minisatellite or a variable number of tandem repeats (VNTR). Figure 2.28 shows an example of a STRP with a copy number ranging from 1 through 10. Because the number of copies determines the size of any restriction fragment that includes the STRP, each DNA molecule yields a single-size restriction fragment depending on the number of copies it contains. The STRP in Figure 2.28 has 10 different “alleles” (again, we use quotation marks because the STRP may not

[Figure 2.28: Ten duplex DNA molecules carrying 1 through 10 tandem repeats of a DNA sequence between flanking cleavage sites, shown beside the position of the corresponding band in a DNA gel. Labels: direction of current; positions of cleavage sites; larger DNA fragments; smaller DNA fragments.]

Figure 2.28 In a simple tandem repeat polymorphism (STRP), the alleles in a population differ in the number of copies of a short sequence (typically 2–60 bp) that is repeated in tandem along the DNA molecule. This example shows alleles in which the repeat number varies from 1 to 10. Cleavage at restriction sites flanking the STRP yields a unique fragment length for each allele. The alleles can also be distinguished by the size of the fragment amplified by PCR using primers that flank the STRP.

be in a gene), which could be distinguished either by Southern blotting using a probe to a unique (nonrepeating) sequence within the restriction fragment or by PCR amplification using primers to a unique sequence on either side of the tandem repeats. In this situation the locus is said to have multiple alleles in the population. Even with multiple alleles, however, any one chromosome can carry only one of the alleles, and any individual genotype can carry at most two different alleles. Nevertheless, a large number of alleles means an even larger number of genotypes, which is the feature that gives STRPs their utility in individual identification. For example, even with only 10 alleles in a population of organisms, there could be 10 different homozygous genotypes and 45 different heterozygous genotypes. More generally, with n alleles there are n homozygous genotypes and n(n − 1)/2 heterozygous genotypes, or n(n + 1)/2 different genotypes altogether. With STRPs, not only are there a relatively large number of alleles, but no one allele is exceptionally common, so each of the many genotypes in the population has a relatively low frequency. If the genotypes at 6–8 STRP loci are considered simultaneously, then each possible multiple-locus genotype is exceedingly rare. Because of their high degree of variation among people, STRPs are widely used in DNA typing (sometimes called DNA fingerprinting) to establish individual identity for use in criminal investigations, parentage determinations, and so forth (Chapter 17).
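The genotype counts for a locus with n alleles can be verified by enumeration. In this sketch the alleles are simply labeled by repeat number, and the function names are hypothetical; the formulas are the ones given in the text.

```python
from itertools import combinations_with_replacement

def genotype_counts(n):
    """Return (homozygous, heterozygous, total) genotype counts for n alleles."""
    homozygous = n
    heterozygous = n * (n - 1) // 2
    return homozygous, heterozygous, homozygous + heterozygous

def enumerate_genotypes(alleles):
    """List every unordered pair of alleles, including pairs of identical alleles."""
    return list(combinations_with_replacement(alleles, 2))

alleles = range(1, 11)  # ten alleles, labeled by repeat number 1 through 10
genotypes = enumerate_genotypes(alleles)

print(genotype_counts(10))                      # (10, 45, 55)
print(len(genotypes))                           # 55
print(sum(1 for a, b in genotypes if a != b))   # 45
```

The enumeration agrees with n(n + 1)/2 = 55 for n = 10. Genotypes are treated as unordered pairs, because which homologous chromosome carries which allele does not change the bands observed.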

2.7 Applications of DNA Markers

Why are geneticists interested in DNA markers (DNA polymorphisms)? Their interest can be justified on any number of grounds. In this section we consider the reasons most often cited.

Genetic Markers, Genetic Mapping, and “Disease Genes”

Perhaps the key goal in studying DNA polymorphisms in human genetics is to identify the chromosomal location of mutant genes associated with hereditary diseases. In the context of disorders caused by the interaction of multiple genetic and environmental factors, such as heart disease, cancer, diabetes, depression, and so forth, it is important to think of a harmful allele as a risk factor for the disease, which increases the probability of occurrence of the disease, rather than as a sole causative agent. This needs to be emphasized, especially because genetic risk factors are often called disease genes. For example, the major “disease gene” for breast cancer in women is the gene BRCA1. For women who carry a mutant allele of BRCA1, the lifetime risk of breast cancer is about 36 percent, and hence most women with this genetic risk factor do not develop breast cancer. On the other hand, among women who are not carriers, the lifetime risk of breast cancer is about 12 percent, and hence many women without the genetic risk factor do develop breast cancer. Indeed, BRCA1 mutations are found in only 16 percent of affected women who have a family history of breast cancer. The importance of a genetic risk factor can be expressed quantitatively as the relative risk, which equals the risk of the disease in persons who carry the risk factor as compared to the risk in persons who do not. The relative risk for BRCA1 equals 3.0 (calculated as 36 percent/12 percent). The utility of DNA polymorphisms in locating and identifying disease genes results from genetic linkage, the tendency for genes that are sufficiently close together in a chromosome to be inherited together. Genetic linkage will be discussed in detail in Chapter 5, but the key concepts are summarized in Figure 2.29, which shows the location of many DNA polymorphisms along a chromosome that also carries a genetic risk factor denoted D (for disease gene). Each DNA polymorphism serves as a genetic marker for its own location in the chromosome. The importance of genetic linkage is that DNA markers that are sufficiently close to the disease gene will tend to be inherited together with the disease gene in pedigrees, and the closer the markers, the stronger this association. Hence, the initial approach to the identification of a disease gene is to find DNA markers that are genetically linked with the disease gene in order to identify its chromosomal location, a procedure known as genetic mapping. Once the chromosomal position is known, other methods can be used to pinpoint the disease gene itself and to study its functions.

If genetic linkage seems a roundabout way to identify disease genes, consider the alternative. The human genome contains approximately 80,000 genes. If genetic linkage did not exist, then we would have to examine 80,000 DNA polymorphisms, one in each gene, in order to identify a disease gene. But the human genome has only 23 pairs of chromosomes, and because of genetic linkage and the power of genetic mapping, it actually requires only a few hundred DNA polymorphisms to identify the chromosome and approximate location of a genetic risk factor.
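The relative risk defined earlier in this section is a simple ratio, and the BRCA1 figure can be reproduced directly. The function name `relative_risk` is hypothetical; the two lifetime risks are the approximate values quoted in the text.

```python
def relative_risk(risk_in_carriers, risk_in_noncarriers):
    """Risk of disease in carriers of a risk factor divided by the
    risk in noncarriers (both given in the same units)."""
    if risk_in_noncarriers <= 0:
        raise ValueError("risk in noncarriers must be positive")
    return risk_in_carriers / risk_in_noncarriers

# Lifetime risks of breast cancer, in percent, as quoted in the text
print(relative_risk(36, 12))  # 3.0
```

A relative risk of 3.0 says that carriers are three times as likely as noncarriers to develop the disease. It does not say that most carriers are affected, because the ratio is silent about the absolute size of either risk.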

[Figure 2.29: DNA polymorphisms (genetic markers) along the chromosomes, with the locus of a “disease gene” (a genetic risk factor), D, marked. Annotations: DNA markers that are too far from the disease gene in the chromosome (or are in a different chromosome) are not linked to the disease gene, and they do not tend to be inherited with the disease gene in pedigrees. DNA markers that are close enough to a disease gene tend to be inherited together (genetically linked) with the disease gene. The closer a marker is to the disease gene, the closer the linkage and the more likely it is that they will be inherited together.]

Figure 2.29 Concepts in genetic localization of genetic risk factors for disease. Polymorphic DNA markers (indicated by the vertical lines) that are close to a genetic risk factor (D) in the chromosome tend to be inherited together with the disease itself. The genomic location of the risk factor is determined by examining the known genomic locations of the DNA polymorphisms that are linked with it.

Other Uses for DNA Markers

DNA polymorphisms are widely used in all aspects of modern genetics because they provide a large number of easily accessed genetic markers for genetic mapping and other purposes. Among the other uses of DNA polymorphisms are the following.

Individual identification. We have already mentioned that DNA polymorphisms have application as a means of DNA typing (DNA fingerprinting) to identify different individuals in a population. DNA typing in other organisms is used to identify individual animals in endangered species and to determine the degree of genetic relatedness among individual organisms that live in packs or herds. For example, DNA typing in wild horses has shown that the wild stallion in charge of a harem of mares actually sires fewer than one-third of the foals.

Epidemiology and food safety science. DNA typing also has important applications in tracking the spread of viral and bacterial epidemic diseases, as well as in identifying the source of contamination in contaminated foods.

Human population history. DNA polymorphisms are widely used in anthropology to reconstruct the evolutionary origin, global expansion, and diversification of the human population.

Improvement of domesticated plants and animals. Plant and animal breeders have turned to DNA polymorphisms as genetic markers in pedigree studies to identify, by genetic mapping, genes that are associated with favorable traits in order to incorporate these genes into currently used varieties of plants and breeds of animals.

History of domestication. Plant and animal breeders also study genetic polymorphisms to identify the wild ancestors of cultivated plants and domesticated animals, as well as to infer the practices of artificial selection that led to genetic changes in these species during domestication.

DNA polymorphisms as ecological indicators. DNA polymorphisms are being evaluated as biological indicators of genetic diversity in key indicator species present in biological communities exposed to chemical, biological, or physical stress. They are also used to monitor genetic diversity in endangered species and species bred in captivity.

Evolutionary genetics. DNA polymorphisms are studied in an effort to describe the patterns in which different types of genetic variation occur throughout the genome, to infer the evolutionary mechanisms by which genetic variation is maintained, and to illuminate the processes by which genetic polymorphisms within species become transformed into genetic differences between species.

Population studies. Population ecologists employ DNA polymorphisms to assess the level of genetic variation in diverse populations of organisms that differ in genetic organization (prokaryotes, eukaryotes, organelles), population size, breeding structure, or life-history characters, and they use genetic polymorphisms within subpopulations of a species as indicators of population history, patterns of migration, and so forth.

Evolutionary relationships among species. Differences in homologous DNA sequences between species are the basis of molecular systematics, in which the sequences are analyzed to determine the ancestral history (phylogeny) of the species and to trace the origin of morphological, behavioral, and other types of adaptations that have arisen in the course of evolution.

Chapter Summary

The sequence of bases in the human genome is 99.9% identical from one person to the next. The remaining 0.1%, comprising 3 million base pairs, differs among individuals. Included in these differences are many mutations that cause or increase the risk of disease, but the majority of the differences are harmless in themselves. Any of these differences between genomes can be used as a genetic marker. Genetic markers are widely employed in genetics to serve as positional landmarks along a chromosome or to identify particular cloned DNA fragments. The manipulation of DNA molecules to identify genetic markers is the basic experimental operation in modern genetics.

A DNA strand is a polymer of deoxyribonucleotides, each composed of a nitrogenous base, a deoxyribose sugar, and a phosphate. Sugars and phosphates alternate in forming a polynucleotide chain with one terminal 3'-OH group and one terminal 5'-P group. In double-stranded (duplex) DNA, the two strands are paired and antiparallel. Each end of the double helix carries a terminal 3'-OH group in one strand and a terminal 5'-P group in the other strand. The four bases found in DNA are the purines, adenine (A) and guanine (G), and the pyrimidines, cytosine (C) and thymine (T). Equal numbers of purines and pyrimidines are found in double-stranded DNA (Chargaff's rules), because the bases are paired as A–T pairs and G–C pairs. The hydrogen-bonded base pairs, along with hydrophobic base stacking of the nucleotide pairs in the core of the double helix, hold the two polynucleotide strands together in a double helix.
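The pairing and antiparallel rules summarized above can be illustrated with a short sketch. It is only an illustration: the function names and the example sequence are hypothetical and not taken from the text. It builds the partner strand of a duplex and checks that purines and pyrimidines come out equal over both strands, as Chargaff's rules require.

```python
# Watson-Crick pairing: A pairs with T, and G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand_5to3: str) -> str:
    """Return the partner strand, also written 5' to 3' (strands are antiparallel)."""
    return "".join(COMPLEMENT[base] for base in reversed(strand_5to3))


def duplex_base_counts(strand_5to3: str) -> dict:
    """Count each base over both strands of the duplex."""
    both = strand_5to3 + reverse_complement(strand_5to3)
    return {base: both.count(base) for base in "ACGT"}


top = "GAATTCAGGC"  # hypothetical sequence, chosen only for illustration
counts = duplex_base_counts(top)
purines = counts["A"] + counts["G"]
pyrimidines = counts["C"] + counts["T"]
print(reverse_complement(top), counts, purines == pyrimidines)
```

Because every A faces a T and every G faces a C, the equality holds for any input sequence, not just this one.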
Duplex DNA can be cleaved into fragments of defined length by restriction enzymes, each of which cleaves DNA at a specific recognition sequence (restriction site) usually four or six nucleotide pairs in length. These fragments can be separated by electrophoresis. The positions of particular restriction fragments in a gel can be visualized by means of nucleic acid hybridization, in which strands of duplex DNA that have been separated (denatured) by heating are mixed and come together (renature) with strands having complementary nucleotide sequences. In a Southern blot, denatured and labeled probe DNA is mixed with denatured DNA made up of restriction fragments that have been transferred to a filter membrane after electrophoresis. The probe DNA anneals and forms stable duplexes with whatever fragments contain sufficiently complementary base sequences, and the positions of these duplexes can be determined by exposing the filter to x-ray film on which radioactive emission (or, in some procedures, light emission) produces an image of the band.

Particular DNA sequences can also be amplified without cloning by means of the polymerase chain reaction (PCR), in which short, synthetic oligonucleotides are used as primers to replicate repeatedly and amplify the sequence between them. The primers must flank, and have their 3' ends oriented toward, the region to be amplified, because DNA polymerase can elongate the primers only by the addition of successive nucleotides to the 3' end of the growing chain. Each round of PCR amplification results in a doubling of the number of amplified fragments.

Most genes are present in pairs in the nonreproductive cells of most animals and higher plants. One member of each gene pair is in the chromosome inherited from the maternal parent, and the other member of the gene pair is at a corresponding location (locus) in the homologous chromosome inherited from the paternal parent. A gene can have different forms that correspond to differences in DNA sequence. The different forms of a gene are called alleles. The particular combination of alleles present in an organism constitutes its genotype. The observable characteristics of the organism constitute its phenotype. In an organism, if the two alleles of a gene pair are the same (for example, AA or aa), then the genotype is homozygous for the A or a allele; if the alleles are different (Aa), then the genotype is heterozygous.
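The statement that each round of PCR doubles the number of amplified fragments can be turned into a small calculation. The sketch below is idealized: it assumes perfect doubling in every cycle and no amplification of the rest of the genome. The function names and the numbers are hypothetical.

```python
def target_copies(initial_copies: int, cycles: int) -> int:
    """Copies of the target after `cycles` rounds, assuming perfect doubling."""
    return initial_copies * 2 ** cycles


def target_fraction(target_bp: int, genome_bp: int, cycles: int) -> float:
    """Fraction of total DNA (in base pairs) made up of the target sequence.

    Idealized: the target doubles every cycle and the remainder of the
    genome is not amplified at all.
    """
    amplified = target_bp * 2 ** cycles
    background = genome_bp - target_bp
    return amplified / (amplified + background)


# Hypothetical numbers: a 1-kb target in a 1-Mb genome.
for n in (0, 10, 20):
    print(n, target_copies(1, n), round(target_fraction(1_000, 1_000_000, n), 4))
```

Real reactions fall short of perfect doubling, especially in late cycles, so these figures are upper bounds.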
Even though each genotype can include at most two alleles, multiple alleles are often encountered among the individuals in natural populations. DNA polymorphisms (DNA markers) are common in natural populations of most organisms. Among the most widely used DNA polymorphisms are single-nucleotide polymorphisms (SNPs), restriction fragment length polymorphisms (detected by Southern blots), and such PCR-based polymorphisms as random amplified polymorphic DNA (RAPD), amplified fragment length polymorphisms (AFLPs), and simple tandem repeat polymorphisms (STRPs). DNA polymorphisms are used in genetic mapping studies to identify DNA markers that are genetically linked to disease genes (genetic risk factors) in the chromosome in order to pinpoint their location. They are also used in DNA typing for identifying individuals, tracking the course of virus and bacterial epidemics, studying human population history, and improving cultivated plants and domesticated animals, as well as for the genetic monitoring of endangered species and for many other purposes.
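The point that a genotype holds at most two alleles even when a population carries many can be made concrete by listing the genotypes. This is a minimal sketch with hypothetical allele names. It treats a genotype as an unordered pair, so n alleles give n(n + 1)/2 genotypes.

```python
from itertools import combinations_with_replacement


def genotypes(alleles):
    """List every unordered pair of alleles, so each diploid genotype appears once."""
    return list(combinations_with_replacement(alleles, 2))


def classify(genotype):
    """Label a two-allele genotype as homozygous or heterozygous."""
    first, second = genotype
    return "homozygous" if first == second else "heterozygous"


alleles = ["A1", "A2", "A3"]  # hypothetical allele names
for g in genotypes(alleles):
    print("".join(g), classify(g))
```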

Key Terms

allele
amplification
amplified fragment length polymorphism (AFLP)
antiparallel
band
base composition
base pairing
base stacking
B form of DNA
blunt end
Chargaff's rules
chromosome
codominant
denaturation
denatured DNA
disease gene
DNA cloning
DNA fingerprinting
DNA marker
DNA polymerase
DNA polymorphism
DNA typing
dominant
5'-P (phosphate) group
gel electrophoresis
gene
genetic linkage
genetic mapping
genetic marker
genomic DNA
genotype
heterozygous
homologous chromosomes
homozygous
hydrogen bond
hydrophobic interaction
kilobase (kb)
locus
major groove
microsatellite
minisatellite
minor groove
monomorphic
multiple alleles
nucleic acid hybridization
nucleoside
nucleotide
oligonucleotide
palindrome
percent G + C
phenotype
phosphodiester bond
polarity
polymerase chain reaction (PCR)
polynucleotide chain
primer
primer adapter
probe
purine
pyrimidine
random amplified polymorphic DNA (RAPD)
relative risk
renaturation
restriction endonuclease
restriction enzyme
restriction fragment
restriction fragment length polymorphism (RFLP)
restriction map
restriction site
risk factor
simple sequence length polymorphism (SSLP)
simple tandem repeat polymorphism (STRP)
single-nucleotide polymorphism (SNP)
Southern blot
sticky end
thermophile
3'-OH (hydroxyl) group
variable number of tandem repeats (VNTR)
Z form of DNA

Review the Basics

• What four bases are commonly found in the nucleotides in DNA? Which form base pairs?

• Which chemical groups are present at the extreme 3' and 5' ends of a single polynucleotide strand?

• What does it mean to say that a single strand of DNA has a polarity? What does it mean to say that the DNA strands in a duplex molecule are antiparallel?

• What are restriction enzymes and why are they important in the study of particular DNA fragments? What does it mean to say that most restriction sites are palindromes?

• Describe how a Southern blot is carried out. Explain what it is used for. What is the role of the probe?

• How does the polymerase chain reaction work? What is it used for? What information about the target sequence must be known in advance? What is the role of the oligonucleotide primers?

• What is a DNA marker? Explain how harmless DNA markers can serve as aids in identifying disease genes through genetic mapping.

• Define and give an example of each of the following key genetic terms: locus, allele, genotype, heterozygous, homozygous, phenotype.

GeNETics on the Web

GeNETics on the Web will introduce you to some of the most important sites for finding genetic information on the Internet. To explore these sites, visit the Jones and Bartlett home page at http://www.jbpub.com/genetics

For the book Genetics: Analysis of Genes and Genomes, choose the link that says Enter GeNETics on the Web. You will be presented with a chapter-by-chapter list of highlighted keywords. Select any highlighted keyword and you will be linked to a Web site containing genetic information related to the keyword.

• DNA is like Coca-Cola? According to this keyword site, it is. It contains sugar, which is highly soluble in water; phosphate groups, which are of moderate solubility; and bases, which have extremely low solubility. (The base in the soft drink is caffeine, which is chemically similar to adenine and can sometimes be incorporated into DNA, causing a mutation.) As this keyword site emphasizes, the most important property contributing to the stability of double-stranded DNA is the stacking of the base pairs on top of one another as a result of hydrophobic interactions. For further discussion of this feature of DNA, and much else of interest regarding the discovery and analysis of this critical biological macromolecule, consult the keyword site.

• The concept of the polymerase chain reaction (PCR) occurred to Kary Mullis one night while cruising on Route 128 from San Francisco to Mendocino. He immediately realized that this approach would be unique in its ability to amplify, at an exponential rate, a specific nucleotide sequence present in a vanishingly small quantity amid a much larger background of total nucleic acid. Once its feasibility was demonstrated, PCR was quickly recognized as a major technical advance in molecular biology. The new technique earned Mullis the 1993 Nobel Prize in chemistry, and today it is the basis of a large number of experimental and diagnostic procedures. At this keyword site you can learn more about the development of PCR from Mullis's original conception, including two major innovations that were necessary to perfect the process.

• Human beings rely on plants for food, shelter, and medicines. Although at least 5000 species are cultivated, modern agricultural research emphasizes a few widely cultivated crops while largely ignoring plants such as Bambara groundnut (Vigna subterranea), breadfruit (Artocarpus altilis), carob (Ceratonia siliqua), coriander (Coriandrum sativum), emmer wheat (Triticum dicoccum), oca (Oxalis tuberosa), and ulluco (Ullucus tuberosus). To learn more about these minor crops and the use of molecular markers for characterizing and preserving their genetic diversity, consult this keyword site.

• The Mutable Site changes frequently. Each new update includes a different site that highlights genetics resources available on the World Wide Web. Select the Mutable Site for Chapter 2 and you will be linked automatically.

• The Pic Site showcases some of the most visually appealing genetics sites on the World Wide Web. To visit the genetics Web site pictured below, select the PIC Site for Chapter 2.

Guide to Problem Solving

Problem 1 Distinguish between base pairing and base stacking in double-stranded DNA.

Answer Base pairing is the hydrogen bonding between corresponding bases in opposite strands of duplex DNA; A (adenine) is paired with T (thymine), and G (guanine) is paired with C (cytosine). Base stacking refers to the hydrophobic (water-hating) interaction between consecutive base pairs along a DNA duplex, which promotes the formation of a "stack" of base pairs with the sugar–phosphate backbones of the strands running along the outside.

Problem 2 The restriction enzyme EcoRI cleaves double-stranded DNA at the sequence 5'-GAATTC-3', and the restriction enzyme HindIII cleaves at 5'-AAGCTT-3'. A 20-kilobase (kb) circular plasmid is digested with each enzyme individually and then in combination, and the resulting fragment sizes are determined by means of electrophoresis. The results are as follows:

EcoRI alone          fragments of 6 kb and 14 kb
HindIII alone        fragments of 7 kb and 13 kb
EcoRI and HindIII    fragments of 2 kb, 4 kb, 5 kb, and 9 kb

How many possible restriction maps are compatible with these data? For each possible restriction map, make a diagram of the circular molecule and indicate the relative positions of the EcoRI and HindIII restriction sites.

Answer Because the single-enzyme digests give two bands each, there must be two restriction sites for each enzyme in the molecule. Furthermore, because digestion with HindIII makes both the 6-kb and the 14-kb restriction fragments disappear, each of these fragments must contain one HindIII site. Considering the sizes of the fragments in the double digest, the 6-kb EcoRI fragment must be cleaved into 2-kb and 4-kb fragments, and the 14-kb EcoRI fragment must be cleaved into 5-kb and 9-kb fragments. Two restriction maps are compatible with the data, depending on which end of the 6-kb EcoRI fragment the HindIII site is nearest. The position of the remaining HindIII site is determined by the fact that the 2-kb and 5-kb fragments in the double digest must be adjacent in the intact molecule in order for a 13-kb fragment to be produced by HindIII digestion alone. The accompanying figure shows the relative positions of the EcoRI sites (part A). Parts B and C are the two possible restriction maps, which differ according to whether the EcoRI site at the top generates the 2-kb or the 4-kb fragment in the double digest.

[Figure: circular restriction maps. (A) The two EcoRI sites, separated by 6 kb and 14 kb. (B) and (C) The two possible maps, each with two HindIII sites dividing the EcoRI fragments into segments of 2, 4, 5, and 9 kb.]

Problem 3 The accompanying diagram shows the positions of restriction sites (tick marks) for a particular restriction enzyme that can be present in the DNA at a locus in a human chromosome. The DNA present in any particular chromosome may be that shown at the top or that shown at the bottom. A probe DNA binds to the fragments at the position shown by the rectangle. With respect to an RFLP based on these fragments, three genotypes are possible. What are they? Use the symbol A1 to refer to the allele that yields the upper DNA fragment, and use A2 to refer to the allele that yields the lower DNA fragment. In the accompanying gel diagram, indicate the genotypes across the top and the phenotype (band position or positions) expected for each genotype. The scale on the right shows the expected positions of fragments ranging in size from 1 to 12 kb.

[Figure: allele A1 carries restriction sites that yield a 3-kb and a 9-kb fragment; allele A2 yields a single 12-kb fragment. Blank gel diagram with a size scale of 12, 9, 6, 3, and 1 kb.]

Answer After cleavage with the restriction enzyme, the A1-type DNA yields a 3-kb fragment that binds with the probe (yielding a 3-kb band) and a 9-kb fragment that does not bind with the probe (yielding no visible band), whereas the A2-type DNA yields a 12-kb fragment that binds with the probe (yielding a 12-kb band). A particular chromosome may carry allele A1 or allele A2. Because individuals have two copies of each chromosome (except for the sex chromosomes), any individual may carry A1A1, A1A2, or A2A2. DNA from homozygous A1A1 genotypes yields a 3-kb band, that from heterozygous A1A2 genotypes yields both a 3-kb and a 12-kb band, and that from homozygous A2A2 genotypes yields a 12-kb band. The expected phenotypes are illustrated here.

[Figure: gel diagram with lanes A1A1, A1A2, and A2A2 and a size scale of 12, 9, 6, 3, and 1 kb.]
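The reasoning in Problem 2 can be checked by simulating the digests. The sketch below is illustrative only. Placing one EcoRI site at position 0 of the 20-kb circle is an arbitrary assumption, and the function, the map names, and the HindIII coordinates are hypothetical choices used to test the two maps described in the answer plus one incorrect alternative.

```python
def circular_digest(length, sites):
    """Fragment sizes (kb) from cutting a circular molecule at the given positions."""
    cuts = sorted(set(p % length for p in sites))
    if not cuts:
        return [length]  # an uncut circle remains a single molecule
    sizes = []
    for i, cut in enumerate(cuts):
        nxt = cuts[(i + 1) % len(cuts)]
        # Distance to the next cut going around the circle; a single cut
        # linearizes the circle into one full-length fragment.
        sizes.append((nxt - cut) % length or length)
    return sorted(sizes)


PLASMID_KB = 20
ECORI = [0, 6]  # assumed coordinates giving the 6-kb and 14-kb fragments
OBSERVED = {"EcoRI": [6, 14], "HindIII": [7, 13], "both": [2, 4, 5, 9]}


def consistent(hindiii_sites):
    """True if a set of HindIII positions reproduces all three digests."""
    return (
        circular_digest(PLASMID_KB, ECORI) == OBSERVED["EcoRI"]
        and circular_digest(PLASMID_KB, hindiii_sites) == OBSERVED["HindIII"]
        and circular_digest(PLASMID_KB, ECORI + hindiii_sites) == OBSERVED["both"]
    )


candidates = {"map 1": [2, 15], "map 2": [4, 11], "wrong map": [2, 11]}
for name, sites in candidates.items():
    print(name, consistent(sites))
```

In both accepted maps the 2-kb and 5-kb segments share an EcoRI site, which is the adjacency condition stated in the answer. The rejected candidate places them apart and fails the HindIII digest.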

Problem 4 A geneticist plans to use the polymerase chain reaction (PCR) to amplify part of the DNA sequence shown below, using oligonucleotide primers that are hexamers matching the regions shown in red. (In practice, hexamers are too short for most purposes.) State the sequence of the primer oligonucleotides that should be used, including the polarity, and give the sequence of the DNA molecule that results from amplification.

5'-GTACGGGCAATGGTAATTTTTCAGGAACCAGGGCCCTTAAGCCGTC-3'
3'-CATGCCCGTTACCATTAAAAACTCCTTGGTCCCGGGAATTCGGCAG-5'

Answer The primers must be able to base-pair with the chosen primer sites and must be oriented with their 3' ends facing one another. Thus the "forward primer" (the one that is elongated in a left-to-right direction) should have the sequence 5'-GCAATG-3' and the "reverse primer" (the one that is elongated in a right-to-left direction) should have the sequence 3'-TTCGGC-5'. The resulting amplified sequence is shown below.

5'-GCAATGGTAATTTTTCAGGAACCAGGGCCCTTAAGCCG-3'
3'-CGTTACCATTAAAAACTCCTTGGTCCCGGGAATTCGGC-5'
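The primer logic of Problem 4 can be sketched in a few lines. This is a simplified illustration, not a primer-design tool. It assumes exact matches, uses the first occurrence of each primer site, and relies on a hypothetical helper named `amplicon`. The reverse primer is given in the problem as 3'-TTCGGC-5', so it is rewritten 5' to 3' before use.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq_5to3):
    """Partner strand of a sequence, also written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq_5to3))


def amplicon(top_5to3, forward_5to3, reverse_5to3):
    """Top strand of the product bounded by the two primers, or None if no product."""
    start = top_5to3.find(forward_5to3)
    if start == -1:
        return None
    # The reverse primer anneals to the top strand, so the top strand
    # contains the reverse complement of the primer.
    site = reverse_complement(reverse_5to3)
    end = top_5to3.find(site, start + len(forward_5to3))
    if end == -1:
        return None
    return top_5to3[start:end + len(site)]


template = "GTACGGGCAATGGTAATTTTTCAGGAACCAGGGCCCTTAAGCCGTC"  # top strand from Problem 4
forward = "GCAATG"
reverse = "TTCGGC"[::-1]  # 3'-TTCGGC-5' rewritten 5' to 3'
print(amplicon(template, forward, reverse))
```

The requirement that the 3' ends face one another appears here as the forward site lying upstream of the reverse primer's binding site on the top strand.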

Analysis and Applications

2.1 Many restriction enzymes produce restriction fragments that have "sticky ends." What does this mean?

2.2 Which of the following sequences are palindromes and which are not? Explain your answer. Symbols such as (A/T) mean that the site may be occupied by (in this case) either A or T, and N stands for any nucleotide.
(a) 5'-AATT-3'
(b) 5'-AAAA-3'
(c) 5'-AANTT-3'
(d) 5'-AA(A/T)AA-3'
(e) 5'-AA(G/C)TT-3'

2.3 The following list gives half of each of a set of palindromic restriction sites. What is the complete sequence of each restriction site? (N stands for any nucleotide.)
(a) 5'-AA??-3'
(b) 5'-ATG???-3'
(c) 5'-GGN??-3'
(d) 5'-ATNN??-3'

2.4 Apart from the base sequence, what is different about the ends of restriction fragments produced by the following restriction enzymes? (The downward arrow represents the site of cleavage in each strand.)
(a) HaeIII (5'-GG↓CC-3')
(b) MaeI (5'-C↓TAG-3')
(c) CfoI (5'-GCG↓C-3')

2.5 A solution contains double-stranded DNA fragments of size 3 kb, 6 kb, 9 kb, and 12 kb. They are separated in an electrophoresis gel. In the diagram of the gel, match the fragment sizes with the correct bands.

[Figure: gel diagram with four bands labeled (a) through (d).]

2.6 The linear DNA fragment shown here has cleavage sites for BamHI (B) and EcoRI (E). In the accompanying diagram of an electrophoresis gel, indicate the positions at which bands would be found after digestion with:
(a) BamHI alone
(b) EcoRI alone
(c) BamHI and EcoRI together
The dashed lines on the right indicate the positions to which bands of 1–12 kb would migrate.

[Figure: linear DNA fragment on a scale from 0 to 12 kb with BamHI (B) and EcoRI (E) cleavage sites marked; gel diagram with lanes BamHI, EcoRI, and BamHI + EcoRI and a size scale from 1 to 12 kb.]

2.7 The circular DNA molecule shown here has cleavage sites for BamHI and EcoRI. In the accompanying diagram of an electrophoresis gel, indicate the positions at which bands would be found after digestion with:
(a) BamHI alone
(b) EcoRI alone
(c) BamHI and EcoRI together
The dashed lines on the right indicate the positions to which bands of 1–12 kb would migrate.

[Figure: circular DNA molecule with a scale marked in kb and BamHI and EcoRI cleavage sites; gel diagram with lanes BamHI, EcoRI, and BamHI + EcoRI and a size scale from 1 to 12 kb.]

2.8 Consider the accompanying diagram of a region of duplex DNA, in which the B's represent bases in Watson–Crick pairs. Specify as precisely as possible the identity of:
(a) B5, assuming that B1 = A
(b) B6, assuming that B2 = C
(c) B7, assuming that B3 = purine
(d) B8, assuming that B4 = A or T

[Figure: region of duplex DNA with base pairs B1–B5, B2–B6, B3–B7, and B4–B8; the phosphate groups (P) of one strand are numbered 1–4, and a terminal OH group is marked.]

2.9 Refer to the DNA molecule diagrammed in Problem 2.8. In the precursor nucleotides of this molecule, which base was each of the phosphate groups 1–4 associated with?

2.10 In a random sequence consisting of equal proportions of all four nucleotides, what is the probability that a particular short sequence of nucleotides matches a restriction site for:
(a) A restriction enzyme with a 4-base cleavage site?
(b) A restriction enzyme with a 6-base cleavage site?
(c) A restriction enzyme with an 8-base cleavage site?

2.11 In a random sequence consisting of equal proportions of all four nucleotides, what is the average distance between restriction sites for:
(a) A restriction enzyme with a 4-base cleavage site?
(b) A restriction enzyme with a 6-base cleavage site?
(c) A restriction enzyme with an 8-base cleavage site?

2.12 If human DNA were essentially a random sequence of 3 × 10^9 bp with equal proportions of all four nucleotides (this is an oversimplification), approximately how many restriction fragments would be expected from cleavage with
(a) A "4-cutter" restriction enzyme?
(b) A "6-cutter" restriction enzyme?
(c) An "8-cutter" restriction enzyme?

2.13 Consider the restriction enzymes BamHI (cleavage site 5'-G↓GATCC-3') and Sau3A (cleavage site 5'-↓GATC-3'), where the downward arrow denotes the site of cleavage in each strand. Is every BamHI site a Sau3A site? Is every Sau3A site a BamHI site? Explain your answer.

2.14 A DNA duplex with the sequence shown here is cleaved with BamHI (cleavage site 5'-G↓GATCC-3'), where the arrow denotes the site of cleavage in each strand. If the resulting fragments were brought together in the right order, and the breaks in the backbones repaired, what possible DNA duplexes would be expected?

5'-ATTGGATCCAAACCCCAAAGGATCCTTA-3'
3'-TAACCTAGGTTTGGGGTTTCCTAGGAAT-5'

2.15 The restriction enzymes PstI, PvuII, and MluI have the following restriction sites, where the arrow indicates the site of cleavage in each strand.

PstI   5'-CTGCA↓G-3'
PvuII  5'-CAG↓CTG-3'
MluI   5'-A↓CGCGT-3'

A DNA duplex with the following sequence is digested.

5'-ATGCCCTGCAGTACCATGACGCGTTACGCAGCTGATCGAAACGCGTATATATGCC-3'
3'-TACGGGACGTCATGGTACTGCGCAATGCGTCGACTAGCTTTGCGCATATATACGG-5'

What fragments would result from cleavage with:
(a) PstI?
(b) PvuII?
(c) MluI?

2.16 With regard to the restriction enzymes and the DNA duplex in Problem 2.15, what fragments would result from digestion with
(a) PstI and MluI?
(b) PvuII and MluI?

2.17 Consider the sequence:

5'-CTGCAGGTG-3'
3'-GACGTCCAC-5'

If this sequence were cleaved with PstI (5'-CTGCA↓G-3'), could it still be cleaved with PvuII (5'-CAG↓CTG-3')? If it were cleaved with PvuII, could it still be cleaved with PstI? Explain your answer.

2.18 A circular DNA molecule is cleaved with BamHI, EcoRI, or the two restriction enzymes together. The accompanying diagram shows the resulting electrophoresis gel, with the band sizes indicated. Draw a diagram of the circular DNA, showing the relative positions of the BamHI and EcoRI sites.

[Figure: electrophoresis gel with lanes BamHI, EcoRI, and BamHI + EcoRI; bands of 10 kb, 7 kb, and 3 kb are labeled against a size scale of 12, 9, 6, 3, and 1 kb.]

2.19 In the diagrams of DNA fragments shown here, the tick marks indicate the positions of restriction sites for a particular restriction enzyme. A mixture of the two types of molecules is digested and analyzed with a Southern blot using either probe A or probe B, which hybridizes to the fragments where shown by the rectangles. In the accompanying gel diagram, indicate the bands that would result from the use of each of these probes. (The scale on the right shows the expected positions of fragments from 1 to 12 kb.)

[Figure: two DNA fragments with restriction sites (tick marks), fragment lengths labeled in kb, and the hybridization sites of probes A and B; gel diagram with lanes Probe A and Probe B and a size scale of 12, 9, 6, 3, and 1 kb.]

2.20 In the accompanying diagram, the tick marks indicate the positions of restriction sites in two alternative DNA fragments that can be present at the A locus in a human chromosome. An RFLP analysis is carried out, using probe DNA that binds to the fragments at the position shown by the rectangle. With respect to this RFLP, how many genotypes are possible? (Use the symbol A1 to refer to the allele that yields the upper DNA fragment, and use A2 to refer to the allele that yields the lower DNA fragment.) In the accompanying gel diagram, indicate the genotypes across the top and the phenotype (band position or positions) expected for each genotype. (The scale on the right shows the expected positions of fragments from 1 to 12 kb.)

[Figure: allele A1 with restriction sites yielding 4-kb and 2-kb fragments; allele A2 yielding a 6-kb fragment; the probe position is marked by a rectangle. Gel diagram with a size scale of 12, 9, 6, 3, and 1 kb.]

2.21 The accompanying diagram shows the DNA fragments associated with an RFLP revealed by a probe that hybridizes where shown by the rectangle. The tick marks are cleavage sites for the restriction enzyme used in the RFLP analysis. How many alleles does this RFLP have? How many genotypes are possible? In the accompanying gel diagram, indicate the phenotype (pattern of bands) expected of each genotype. (The scale on the right shows the expected positions of fragments from 1 to 12 kb.)

[Figure: DNA fragments of the RFLP with restriction sites (tick marks), fragment lengths labeled in kb, and the probe position; gel diagram with a size scale of 12, 9, 6, 3, and 1 kb.]

2.22 The thick horizontal lines shown below represent alternative DNA molecules at a particular locus in a human chromosome. The tick marks indicate the positions of restriction sites for a particular restriction enzyme. Genomic DNA from a sample of people is digested and analyzed by a Southern blot using a probe DNA that hybridizes at the position shown by the rectangle. How many possible RFLP alleles would be observed in the sample? How many genotypes?

[Figure: alternative DNA molecules with restriction sites (tick marks), distances between sites labeled in kb, and the probe position marked by a rectangle.]

2.23 The RFLPs described in Problem 2.22 are analyzed with the same restriction enzyme but a different probe, which hybridizes at the site indicated here by the rectangle. How many RFLP alleles would be found? How many genotypes? (Use the symbols A1, A2, . . . to indicate the alleles.) In the accompanying gel diagram, indicate the genotypes across the top and the phenotype (band position or positions) expected for each genotype. (The scale on the right shows the expected positions of fragments from 1 to 12 kb.)

[Figure: the DNA molecules of Problem 2.22 with the new probe position marked by a rectangle; gel diagram with a size scale of 12, 9, 6, 3, and 1 kb.]

2.24 If hexamers were long enough oligonucleotides to serve as specific primers for PCR (for most purposes they are too short), what DNA fragment would be amplified using the "forward" primer 5'-AATGCC-3' and the "reverse" primer 3'-GCATGT-5' on the double-stranded DNA molecule shown below?

5'-GATTACCGGTAAATGCCGGATTAACCCGGGTTATCAGGCCACGTACAACTGGAGTCC-3'
3'-CTAATGGCCATTTACGGCCTAATTGGGCCCAATAGTCCGGTGCATGTTGACCTCAGG-5'

2.25 Would the primer pairs 3'-AATGCC-5' and 5'-GCATGT-3' amplify the same fragment described in the previous problem? Explain your answer.

2.26 A human DNA fragment of 3 kb is to be amplified by PCR. The total genome size is 3 × 10^9 bp.
(a) Prior to amplification, what fraction of the total DNA does the target sequence constitute?
(b) What fraction does it constitute after 10 cycles of PCR?
(c) After 20 cycles of PCR?
(d) After 30 cycles of PCR?

2.27 RAPD analysis is carried out using genomic DNA from four individuals (A–D) sampled from a natural population of Hawaiian crickets. The gel shown here resulted from PCR with one of the primer pairs tested. Which bands are the RAPD polymorphisms?

[Figure: RAPD gel with lanes A through D and band numbers marked along the side.]

2.28 A cigarette butt found at the scene of a robbery is found to have a sufficient number of epithelial cells stuck to the paper for the DNA to be extracted and typed. Shown below are the results of typing for three probes (locus 1, locus 2, and locus 3) of the evidence (X) and 7 suspects (A through G). Which of the suspects can be excluded? Which cannot be excluded? Can you identify the robber? Explain your reasoning.

[Figure: DNA typing gels for locus 1, locus 2, and locus 3, each with lanes A through G and X.]

2.29 A woman is uncertain which of two men is the father of her child. DNA typing is carried out on blood from the child (C), the mother (M), and each of the two males (A and B), using probes for a highly polymorphic DNA marker on two different chromosomes ("locus 1" and "locus 2"). The result is shown in the accompanying diagram. Can either male be excluded as the possible father? Explain your reasoning.

[Figure: DNA typing gels for locus 1 and locus 2, each with lanes A, B, M, and C.]

2.30 Snake venom phosphodiesterase cleaves the chemical bonds shown in red in the accompanying diagram, leaving mononucleotides that are phosphorylated in the 3' position. If the phosphates numbered 2 and 4 are radioactive, which mononucleotides will be radioactive after cleavage with snake venom phosphodiesterase?

[Figure: the duplex DNA diagram of Problem 2.8, with base pairs B1–B5 through B4–B8, phosphate groups numbered 1–4, and a terminal HO group; the bonds cleaved by the enzyme are shown in red.]

Challenge Problems 2.31 Th e gen om e of Drosophila melanogaster is 180 ⫻ 10 6 bp, an d a fragm en t of size 1.8 kb is to be am plified by PCR. How m an y cycles of PCR are n ecessary for th e am plified target sequ en ce to con stitu te at least 99 percen t of th e total DNA?

A

B

C

D

E

A

B

C

D

E

么乆么乆么乆么乆么乆 X

么乆么乆么乆么乆么乆 X

Locus 1

Locus 2

2.32 A m u rder victim is fou n d in an advan ced state of decom position an d can n ot be iden tified. Police su spect th at th e victim is on e of five person s reported by th eir paren ts as m issin g. DNA typin g is carried ou t on tissu es from th e victim (X) an d on th e five sets of paren ts (A th rou gh E), u sin g probes for a h igh ly polym orph ic DNA m arker on two differen t ch rom osom es (“locu s 1” an d “locu s 2”). Th e resu lt is sh own in th e diagram at th e righ t. How do you in terpret th e fact th at gen om ic DNA from each in dividu al yields two ban ds? Can you iden tify th e paren ts of th e victim ? Explain you r reason in g.

2.33 Th e sn ake ven om ph osph odiesterase en zym e described in Problem 2.30 was origin ally u sed in a procedu re called “n earest n eigh bor” an alysis. In th is procedu re, a DNA stran d is syn th esized in th e presen ce of all fou r trin u cleotides, on e of wh ich carries a radioactive ph osph ate in th e α (in n erm ost) position . Th en th e DNA is digested to com pletion with sn ake ven om ph osph odiesterase, an d th e resu ltin g m on on u cleotides are separated an d assayed for radioactivity. Exam in e th e diagram in Problem 2.30, an d th en an swer th e followin g qu estion s.

84

Problem 2.32

(a) (b)

How does th is procedu re reveal th e “n earest n eigh bors” of th e radioactive n u cleotide? Is th e “n earest n eigh bor” on th e 5 ' or th e 3 ' side of th e labeled n u cleotide?



C H A P T E R

3

Transmission Genetics: The Principle of Segregation

CHAPTER OUTLINE

3.1 Morphological and Molecular Phenotypes

3.2 Segregation of a Single Gene
Phenotypic Ratios in the F2 Generation
The Principle of Segregation
Verification of Segregation
The Testcross and the Backcross

3.3 Segregation of Two or More Genes
The Principle of Independent Assortment
The Testcross with Unlinked Genes
The Big Experiment

3.4 Human Pedigree Analysis
Characteristics of Dominant and Recessive Inheritance
Molecular Markers in Human Pedigrees

3.5 Pedigrees and Probability
Mutually Exclusive Possibilities
Independent Possibilities

3.6 Incomplete Dominance and Epistasis
Multiple Alleles
Human ABO Blood Groups
Epistasis

3.7 Genetic Analysis: Mutant Screens and the Complementation Test
The Complementation Test in Gene Identification
Why Does the Complementation Test Work?

PRINCIPLES
• Inherited traits are determined by the genes present in the reproductive cells united in fertilization.
• Genes are usually inherited in pairs, one from the mother and one from the father.
• The genes in a pair may differ in DNA sequence and in their effect on the expression of a particular inherited trait.
• The maternally and paternally inherited genes are not changed by being together in the same organism.
• In the formation of reproductive cells, the paired genes separate again into different cells.
• Random combinations of reproductive cells containing different genes result in Mendel’s ratios of traits appearing among the progeny.
• The ratios actually observed for any trait are determined by the types of dominance and gene interaction.
• In genetic analysis, the complementation test is used to determine whether two recessive mutations that cause a similar phenotype are alleles of the same gene. The mutant parents are crossed, and the phenotype of the progeny is examined. If the progeny phenotype is nonmutant (complementation occurs), the mutations are in different genes; if the progeny phenotype is mutant (complementation does not occur), the mutations are in the same gene.

CONNECTIONS
What Did Gregor Mendel Think He Discovered?
Gregor Mendel 1866
Experiments on Plant Hybrids

Troubadour
The Huntington’s Disease Collaborative Research Group 1993
A Novel Gene Containing a Trinucleotide Repeat That Is Expanded and Unstable on Huntington’s Disease Chromosomes


In this small garden plot adjacent to the monastery of St. Thomas, Gregor Mendel grew more than 33,500 pea plants in the years 1856–1863, including more than 6400 plants in one year alone. He received some help from two fellow monks who assisted in the experiments.

In Chapter 2 we emphasized that in modern genetics, a typical phenotype consists of a band present at a particular position in a DNA electrophoresis gel. In a method of genetic analysis such as RAPD (random amplified polymorphic DNA), genomic DNA from a single individual can yield 30 or more bands (see Figure 2.25). Each of these bands represents a phenotype and results from the PCR (polymerase chain reaction) amplification of a single region of DNA in the genome of the individual. In this chapter we consider how genes are transmitted from parents to offspring and how this determines the distribution of genotypes and phenotypes among related individuals.

The study of the inheritance of traits constitutes transmission genetics. This subject is also called Mendelian genetics because the underlying principles were first deduced from experiments in garden peas (Pisum sativum) carried out in the years 1856–1863 by Gregor Mendel, a monk at the monastery of St. Thomas in the town of Brno (Brünn), in the Czech Republic. He reported his experiments to a local natural history society, published the results and his interpretation in its scientific journal in 1866, and began exchanging letters with one of the leading botanists of the time. His experiments were careful and exceptionally well documented, and his paper contains the first clear exposition of the statistical rules governing the transmission of genes from generation to generation. Nevertheless, Mendel’s paper was ignored for 34 years until its significance was finally appreciated.

3.1 Morphological and Molecular Phenotypes

Geneticists study traits in which one organism differs from another in order to discover the genetic basis of the difference. Until the advent of molecular genetics, geneticists dealt mainly with morphological traits, in which the differences between organisms can be expressed in terms of color, shape, or size. Mendel studied seven morphological traits contrasting in seed shape, seed color, flower color, pod shape, and so forth (Figure 3.1). Perhaps the most widely known example of a contrasting Mendelian trait is round versus wrinkled seeds. As pea seeds dry, they lose water and shrink. Round seeds are round because they shrink uniformly; wrinkled seeds are wrinkled because they shrink irregularly. The wrinkled phenotype is due to the absence of a branched-chain form of starch known as amylopectin, which is not synthesized in wrinkled seeds owing to a defect in the enzyme starch-branching enzyme I (SBEI).

The nonmutant, or wildtype, allele of the gene for SBEI is designated W and the mutant allele w. Seeds that are heterozygous Ww have only half as much SBEI enzyme as wildtype homozygous WW seeds, but half the normal amount of enzyme produces enough amylopectin for the heterozygous Ww seeds to shrink uniformly and remain phenotypically round. Hence, with respect to seed shape, the wildtype W allele is dominant over the mutant w allele. Because the w allele is recessive, only the homozygous ww seeds become wrinkled.

The molecular basis of the wrinkled mutation is that the SBEI gene has become interrupted by the insertion, into the gene, of a DNA sequence called a transposable element. Such DNA sequences are capable of moving (transposition) from one location to another within a chromosome or between chromosomes. The molecular mechanism of transposition is discussed in Chapter 7, but for our present purposes it is necessary to know only that transposable elements are present in most genomes, especially the large genomes of eukaryotes, and that many spontaneous mutations result from the insertion of transposable elements into a gene.

Figure 3.1 The seven different traits studied in peas by Mendel. The phenotype shown at the far right is the dominant trait, which appears in the hybrid produced by crossing.

Trait | Parental strain 1: Dominant | Parental strain 2: Recessive | Phenotype of progeny of monohybrid cross
Seed shape | Round | Wrinkled | Round
Seed color | Yellow | Green | Yellow
Flower color | Purple | White | Purple
Pod shape | Inflated | Constricted | Inflated
Pod color | Green | Yellow | Green
Flower and pod position | Axial (along stem) | Terminal (at top of stem) | Axial
Stem length | Standard | Dwarf | Standard

[Figure 3.2 panels: (A) SBEI gene (wildtype round, W), a DNA duplex with two EcoRI sites and the site of hybridization with probe DNA; (B) SBEI gene with insertion (mutant wrinkled, w), showing the transposable element insertion into the SBEI gene between the two EcoRI sites; (C) seed phenotype and molecular phenotype for genotypes WW, Ww, and ww.]

Figure 3.2 (A) W (round) is an allele of a gene that specifies the amino acid sequence of starch-branching enzyme I (SBEI). (B) w (wrinkled) is an allele that encodes an inactive form of the enzyme because its DNA sequence is interrupted by the insertion of a transposable element. (C) At the level of the morphological phenotype, W is dominant to w: Genotypes WW and Ww have round seeds, whereas genotype ww has wrinkled seeds. The molecular difference between the alleles can be detected as a restriction fragment length polymorphism (RFLP) using the enzyme EcoRI and a probe that hybridizes at the site shown. At the molecular level, the alleles are codominant: DNA from each genotype yields a different molecular phenotype, namely a single band differing in size for homozygous WW and ww, and both bands for heterozygous Ww.

Figure 3.2 includes a diagram of the DNA structure of the wildtype W and mutant w alleles and shows the DNA insertion that interrupts the w allele. Highlighted are two EcoRI restriction sites, present in both alleles, that flank the site of the insertion. The diagram in part C indicates what pattern of bands would be expected if one were to carry out a Southern blot (Section 2.3) in which genomic DNA was digested to completion with EcoRI and then the resulting fragments were separated by electrophoresis and hybridized with a labeled probe complementary to a region shared between the W and w alleles. The EcoRI fragment from the W allele would be smaller than that of the w allele, because of the inserted DNA in the w allele, and thus it would migrate faster than the corresponding fragment from the w allele and move to a position closer to the bottom of the gel. Genomic DNA from homozygous WW would yield a single, faster-migrating band; that from homozygous ww a single, slower-migrating band; and that from heterozygous Ww two bands with the same electrophoretic mobility as those observed in the homozygous genotypes. In Figure 3.2C, the band in the homozygous genotypes is shown as somewhat thicker than those from the heterozygous genotype, because the single band in each homozygous genotype comes from the two copies of whichever allele is homozygous, and thus it contains more DNA than the corresponding DNA in the heterozygous genotype, in which only one copy of each allele is present.

Hence, as illustrated in Figure 3.2C, the RFLP analysis clearly distinguishes between the genotypes WW, Ww, and ww, because the heterozygous Ww genotype exhibits both of the bands observed in the homozygous genotypes. This situation is described by saying that the W and w alleles are codominant with respect to the molecular phenotype. However, as indicated by the seed shapes in Figure 3.2C, W is dominant over w with respect to morphological phenotype. In the discussion that follows, we use the RFLP analysis of the W and w alleles to emphasize the importance of molecular phenotypes in modern genetics and to demonstrate experimentally the principles of genetic transmission.
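The contrast between the two kinds of phenotype can be made concrete with a short sketch. It is only an illustration under stated assumptions: the fragment sizes (1.0 kb for W, 1.8 kb for w) are hypothetical placeholders chosen so that the w fragment is the larger one, and the function names are invented for this example.

```python
# Hypothetical EcoRI fragment sizes in kb; only their order matters here.
# The w fragment is larger because it carries the transposable element insertion.
FRAGMENT_KB = {"W": 1.0, "w": 1.8}


def rflp_bands(genotype):
    """Return (allele, size_kb, copies) for each band, slowest-migrating first."""
    copies = {}
    for allele in genotype:
        copies[allele] = copies.get(allele, 0) + 1
    return sorted(
        ((allele, FRAGMENT_KB[allele], n) for allele, n in copies.items()),
        key=lambda band: band[1],
        reverse=True,  # larger fragments migrate more slowly (nearer the top of the gel)
    )


def seed_shape(genotype):
    """Morphological phenotype: W is dominant, so one W copy gives round seeds."""
    return "round" if "W" in genotype else "wrinkled"


for genotype in ("WW", "Ww", "ww"):
    print(genotype, seed_shape(genotype), rflp_bands(genotype))
```

In this sketch WW and Ww share a seed shape but differ in band pattern, which is the sense in which the alleles are codominant at the molecular level. The copy count stands in for the thicker band drawn for the homozygous genotypes.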

3.2 Segregation of a Single Gene

Mendel selected peas for his experiments for two primary reasons. First, he had access to varieties that differed in contrasting traits, such as round versus wrinkled seeds and yellow versus green seeds. Second, his earlier studies had indicated that peas usually reproduce by self-pollination, in which pollen produced in a flower is used to fertilize the eggs in the same flower. To produce hybrids by cross-pollination (outcrossing), all he needed to do was open the keel petal (enclosing the reproductive structures), remove the immature anthers (the pollen-producing structures) before they shed pollen, and dust the stigma (the female structure) with pollen taken from a flower on another plant (Figure 3.3). The relatively small space needed to grow each plant, and the relatively large number of progeny that could be obtained, gave him the opportunity, as he says in his paper, to “determine the number of different forms in which hybrid progeny appear” and to “ascertain their numerical interrelationships.”

[Figure 3.3 labels: keel petal enclosing reproductive structures; stigma (female part); anther (male part); ovule (forms pea pod); flower of plant grown from a round seed; flower of plant grown from a wrinkled seed; open flower and discard anthers; open flower and collect pollen; brush pollen onto stigma; all seeds formed from flower are round.]

Figure 3.3 Crossing pea plants requires some minor surgery in which the anthers of a flower are removed before they produce pollen. The stigma, or female part of the flower, is not removed. It is fertilized by brushing with mature pollen grains taken from another plant.

The fact that garden peas are normally self-fertilizing means that in the absence of deliberate outcrossing, plants with contrasting traits are usually homozygous for alternative alleles of a gene affecting the trait; for example, plants with round seeds have genotype WW and those with wrinkled seeds genotype ww. The homozygous genotypes are indicated experimentally by the observation that hereditary traits in each variety are true-breeding, which means that plants produce only progeny like themselves when allowed to self-pollinate normally. Outcrossing between plants that differ in one or more traits creates a hybrid. If the parents differ in one, two, or three traits, the hybrid is a monohybrid, dihybrid, or trihybrid. In keeping track of parents and their hybrid progeny, we say that the parents constitute the P1 generation and their hybrid progeny the F1 generation.

Because garden peas are sexual organisms, each cross can be performed in two ways, depending on which phenotype is present in the female parent and which in the male parent. With round versus wrinkled, for example, the female parent can be round and the male wrinkled, or the reverse. These are called reciprocal crosses. Mendel was the first to demonstrate the following important principle:

The outcome of a genetic cross does not depend on which trait is present in the male and which is present in the female; reciprocal crosses yield the same result.

This principle is illustrated for round and wrinkled seeds in Figure 3.4. The gel icons show the RFLP bands that genomic DNA from each type of seed in these crosses would yield. The genotypes of the crosses and their progeny are as follows:

Cross A: WW ♀ × ww ♂ → Ww
Cross B: ww ♀ × WW ♂ → Ww

In both reciprocal crosses, the progeny have the morphological phenotype of round seeds, but as shown by the RFLP diagrams on the right, all progeny genotypes are actually heterozygous Ww and therefore different from either parent. The genetic equivalence of reciprocal crosses illustrated in Figure 3.4 is a principle that is quite general in its applicability, but there is an important exception, having to do with sex chromosomes, that will be discussed in Chapter 4. In the following section we examine a few of Mendel’s original experiments in the context of RFLP analysis in order to relate the morphological phenotypes and their ratios to the molecular phenotypes that would be expected.

[Figure 3.4 panels: Cross A: ovule from round variety, pollen from wrinkled variety; result: all seeds round. Cross B: ovule from wrinkled variety, pollen from round variety; result: all seeds round.]

Figure 3.4 Morphological and molecular phenotypes showing the equivalence of reciprocal crosses. In this example, the hybrid seeds are round and yield an RFLP pattern with two bands, irrespective of the direction of the cross.

Phenotypic Ratios in the F2 Generation

Although the progeny of the crosses in Figure 3.4 have the dominant phenotype of round seeds, the RFLP analysis shows that they are actually heterozygous. The w allele is hidden with respect to the morphological phenotype because w is recessive to W. Nevertheless, the wrinkled phenotype reappears in the next generation when the hybrid progeny are allowed to undergo self-fertilization. For example, if the round F1 seeds from Cross A in Figure 3.4 were grown into sexually mature plants and allowed to undergo self-fertilization, some of the resulting seeds would be round and others wrinkled. The progeny seeds produced by self-fertilization of the F1 generation constitute the F2 generation. When Mendel carried out this experiment, in the F2 generation he observed the results shown in the following diagram:

Parental lines: Round × Wrinkled
F1 generation: All round seeds
F2 generation: 3 round (5474) : 1 wrinkled (1850)

Note that the ratio 5474 : 1850 is approximately 3 : 1. A 3 : 1 ratio of dominant : recessive forms in the F2 progeny is characteristic of simple Mendelian inheritance. Mendel’s data demonstrating this point are shown in Table 3.1. Note that the first two traits in the table (round versus wrinkled seeds and yellow versus green seeds) have many more observations than any of the others. The reason is that these traits can be classified directly in the seeds, whereas the others can be classified only in the mature plants, and Mendel could analyze many more seeds than he could mature plants. The principal observations from the data in Table 3.1 are

• The F1 hybrids express only the dominant trait (because the F1 progeny are heterozygous; for example, Ww).
• In the F2 generation, plants with either the dominant or the recessive trait are present (which means that some F2 progeny are homozygous; for example, ww).
• In the F2 generation, there are approximately three times as many plants with the dominant phenotype as plants with the recessive phenotype.

The 3 : 1 ratio observed in the F2 generation is the key to understanding the mechanism of genetic transmission. In the next section we use RFLP analysis of the W and w alleles to explain why this ratio is produced.

Table 3.1 Results of Mendel’s monohybrid experiments

Parental traits | F1 trait | Number of F2 progeny | F2 ratio
Round × wrinkled (seeds) | Round | 5474 round, 1850 wrinkled | 2.96 : 1
Yellow × green (seeds) | Yellow | 6022 yellow, 2001 green | 3.01 : 1
Purple × white (flowers) | Purple | 705 purple, 224 white | 3.15 : 1
Inflated × constricted (pods) | Inflated | 882 inflated, 299 constricted | 2.95 : 1
Green × yellow (unripe pods) | Green | 428 green, 152 yellow | 2.82 : 1
Axial × terminal (flower position) | Axial | 651 axial, 207 terminal | 3.14 : 1
Long × short (stems) | Long | 787 long, 277 short | 2.84 : 1

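The F2 ratios in Table 3.1 can be recomputed directly from the progeny counts. The following sketch simply divides the number of dominant progeny by the number of recessive progeny for each cross; the dictionary layout and the function name are choices made for this illustration, and the pooled ratio at the end is an extra summary that is not part of the table.

```python
# F2 counts from Table 3.1 as (dominant, recessive) pairs.
F2_COUNTS = {
    "round x wrinkled (seeds)": (5474, 1850),
    "yellow x green (seeds)": (6022, 2001),
    "purple x white (flowers)": (705, 224),
    "inflated x constricted (pods)": (882, 299),
    "green x yellow (unripe pods)": (428, 152),
    "axial x terminal (flower position)": (651, 207),
    "long x short (stems)": (787, 277),
}


def f2_ratio(dominant, recessive):
    """Number of dominant progeny per recessive progeny."""
    return dominant / recessive


for cross, (dominant, recessive) in F2_COUNTS.items():
    print(f"{cross}: {f2_ratio(dominant, recessive):.2f} : 1")

# Pooling all seven experiments (an additional summary, not in the table).
total_dominant = sum(dominant for dominant, _ in F2_COUNTS.values())
total_recessive = sum(recessive for _, recessive in F2_COUNTS.values())
print(f"pooled: {f2_ratio(total_dominant, total_recessive):.2f} : 1")
```

Each cross gives a value close to 3, and the seed traits, with their much larger samples, lie nearest to it.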

The Principle of Segregation

The 3 : 1 ratio can be explained with reference to Figure 3.5. This is the heart of Mendelian genetics. You should master it and be able to use it to deduce the progeny types produced in crosses. The diagram illustrates these key features of single-gene inheritance:

1. Genes come in pairs, which means that a cell or individual has two copies (alleles) of each gene.
2. For each pair of genes, the alleles may be identical (homozygous WW or homozygous ww), or they may be different (heterozygous Ww).
3. Each reproductive cell (gamete) produced by an individual contains only one allele of each gene (that is, either W or w).
4. In the formation of gametes, any particular gamete is equally likely to include either allele (hence, from a heterozygous Ww genotype, half the gametes contain W and the other half contain w).
5. The union of male and female reproductive cells is a random process that reunites the alleles in pairs.

The essential feature of transmission genetics is the separation, technically called segregation, in unaltered form, of the two alleles in an individual during the formation of its reproductive cells. Segregation corresponds to points 3 and 4 in the foregoing list. The principle of segregation is sometimes called Mendel’s first law.

The Principle of Segregation: In the formation of gametes, the paired hereditary determinants separate (segregate) in such a way that each gamete is equally likely to contain either member of the pair.

[Figure 3.5 diagram]
Parents: WW × ww
Meiosis
Reproductive cells (gametes): W and w. The WW parent can contribute only W gametes; the ww parent can contribute only w gametes.
Fertilization
F1 progeny: Ww. Both W and w present, but seeds are round. The Ww parent produces equal numbers of W and w gametes.
F2 progeny:

Male gametes | Female gametes 1/2 W | Female gametes 1/2 w
1/2 W | 1/4 WW | 1/4 Ww
1/2 w | 1/4 Ww | 1/4 ww

1/4 WW : 1/2 Ww : 1/4 ww
1/3 of round seeds are WW; 2/3 of round seeds are Ww.

Figure 3.5 A diagrammatic explanation of the 3 : 1 ratio of dominant : recessive morphological phenotypes observed in the F2 generation of a monohybrid cross. The 3 : 1 ratio is observed because of dominance. Note that the ratio of WW : Ww : ww genotypes in the F2 generation is 1 : 2 : 1, as can be seen from the restriction fragment phenotypes.

Another key feature of transmission genetics is that the hereditary determinants are present as pairs in both the parental organisms and the progeny organisms but as single copies in the reproductive cells. This feature corresponds to points 1 and 5 in the foregoing list.

Figure 3.5 illustrates the biological mechanism underlying the important Mendelian ratios in the F2 generation of 3 : 1 for phenotypes and 1 : 2 : 1 for genotypes. To understand these ratios, consider first the parental generation in which the original cross is WW × ww. The sex of the parents is not stated because reciprocal crosses yield identical results. (There is, however, a convention in genetics that unless otherwise specified, crosses are given with the female parent listed first.) In the original cross, the WW parent produces only W-containing gametes, whereas the ww parent produces only w-containing gametes. Segregation still takes place in the homozygous genotypes as well as in the heterozygous genotype, even though all of the gametes carry the same type of allele (W from homozygous WW and w from homozygous ww). When the W-bearing and w-bearing gametes come together in fertilization, the hybrid genotype is heterozygous Ww, which is shown by the bands in the gel icon next to the F1 progeny. With regard to seed shape, the hybrid Ww seeds are round because W is dominant over w.

When the heterozygous F1 progeny form gametes, segregation implies that half the gametes will contain the W allele and the other half will contain the w allele. These gametes come together at random when an F1 individual is self-fertilized or when two F1 individuals are crossed. The result of random fertilization can be deduced from the sort of cross-multiplication square shown at the bottom of the figure, in which the female gametes and their frequencies are arrayed across the top margin and the male gametes and their frequencies along the left-hand margin. This calculating device is widely used in genetics and is called a Punnett square after its inventor Reginald C. Punnett (1875–1967). The Punnett square in Figure 3.5 shows that random combinations of the F1 gametes result in an F2 generation with the genotypic composition 1/4 WW, 1/2 Ww, and 1/4 ww. This can be confirmed by the RFLP banding patterns in the gel icons because of the codominance of W and w with respect to the molecular phenotype. But because W is dominant over w with respect to the morphological phenotype, the WW and Ww genotypes have round seeds and the ww genotypes have wrinkled seeds, yielding the phenotypic ratio of round : wrinkled seeds of 3 : 1.

Hence it is a combination of segregation, random union of gametes, and dominance that results in the 3 : 1 ratio. The ratio of F2 genotypes is as important as the ratio of F2 phenotypes. The Punnett square in Figure 3.5 also shows that the ratio of WW : Ww : ww genotypes is 1 : 2 : 1, which can be confirmed directly by the RFLP analysis.
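The Punnett square calculation can be written out as a short sketch. It assumes a single gene with alleles W and w, gametes that each carry either allele of the parent with probability 1/2 (segregation), and random union of gametes; the function names are invented for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def gametes(genotype):
    """Segregation: each of the two alleles is transmitted with probability 1/2."""
    frequencies = Counter()
    for allele in genotype:
        frequencies[allele] += Fraction(1, 2)
    return frequencies


def punnett(female, male):
    """Random union of gametes: multiply the margin frequencies cell by cell."""
    offspring = Counter()
    for (f, pf), (m, pm) in product(gametes(female).items(), gametes(male).items()):
        genotype = "".join(sorted(f + m))  # 'W' sorts before 'w', so wW is written Ww
        offspring[genotype] += pf * pm
    return offspring


def phenotypes(offspring, dominant="W"):
    """Dominance: any genotype carrying the dominant allele has round seeds."""
    result = Counter()
    for genotype, frequency in offspring.items():
        result["round" if dominant in genotype else "wrinkled"] += frequency
    return result


f1 = punnett("WW", "ww")   # all Ww, and the same for the reciprocal cross
f2 = punnett("Ww", "Ww")   # WW 1/4, Ww 1/2, ww 1/4
for genotype, frequency in f2.items():
    print(genotype, frequency)
print(dict(phenotypes(f2)))  # round 3/4, wrinkled 1/4

# Among round F2 seeds, the share that is homozygous WW is 1/3.
print(f2["WW"] / phenotypes(f2)["round"])
```

Using exact fractions rather than floating-point numbers keeps the 1 : 2 : 1 and 3 : 1 ratios exact. Calling punnett("ww", "WW") gives the same result as punnett("WW", "ww"), which mirrors the equivalence of reciprocal crosses.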

Verification of Segregation Th e rou n d seeds in Figu re 3.5 con ceal a gen otypic ratio of 1 WW : 2 Ww . To say th e sam e th in g in an oth er way, am on g th e F2 seeds th at are rou n d (or, m ore gen erally, am on g organ ism s th at sh ow th e dom in an t m orph ological ph en otype), 1 冒3 are h om ozygou s (in th is exam ple, WW ) an d 2 冒3 are h eterozygou s (in th is exam ple, Ww ). Th is con clu sion is obviou s from th e RFLP pattern s in Figu re 3.5, bu t it is n ot at all obviou s from th e m orph ological ph en otypes. Un less you kn ew som eth in g abou t gen etics already, it wou ld be a very bold h ypoth esis, becau se it im plies th at two organ ism s with th e sam e m orph ological ph en otype (in th is case rou n d seeds) m igh t n everth eless differ in m olecu lar ph en otype an d in gen otype. Yet th is is exactly wh at Men del proposed. Bu t h ow cou ld th is h ypoth esis be tested experim en tally? He realized th at it cou ld be tested via self-fertilization of th e F2 plan ts. With self-fertilization , plan ts grown

3.2 Segregation of a Single Gene

95

What Did Gregor M endel Think H e Discovered? Gregor M endel 1866

Monastery of St. Thom as, Brno [then Brünn], Czech Republic Experiments on Plant H ybrids (original in German) M endel’s paper is remarkable for its precision and clarity. It is worth reading in its entirety for this reason alone. Although the most important discovery attributed to M endel is segregation, he never uses this term. H is description of segregation is found in the first passage in italics in the excerpt. (All of the italics are reproduced from the original.) In his description of the process, he takes us carefully through the separation of A and a in gametes and their coming together again at random in fertilization. One flaw in the description is M endel’s occasional confusion between genotype and phenotype, which is illustrated by his writing A instead of AA and a instead of aa in the display toward the end of the passage. M ost early geneticists

made no consistent distinction between genotype and phenotype until 1909, when the terms themselves were coined.

experimentation also justifies the assumption that pea hybrids form germinal and pollen cells that in their composition correspond in equal numbers to all the Artificial fertilization undertaken on orconstant forms resulting from the comnamental plants to obtain bination of traits united Whether the plan by through new color variants initiated fertilization. which the individual The difference of forms the experiments reported here. The striking regularity among the progeny of experiments were with which the same hybrid hybrids, as well as the set up and carried forms always reappeared ratios in which they are out was adequate to observed, find an adewhenever fertilization bethe assigned task tween like species took place quate explanation in suggested further experithe principle [of segreshould be decided ments whose task it was to gation] just deduced. by a benevolent follow that development of The simplest case is judgment. hybrids in their progeny. . . . given by the series for This paper discusses the atone pair of differing tempt at such a detailed experiment. . . . traits. It is shown that this series is deWhether the plan by which the individual scribed by the expression: A ⫹ 2Aa ⫹ a, experiments were set up and carried out in which A and a signify the forms with was adequate to the assigned task constant differing traits, and Aa the form should be decided by a benevolent judghybrid for both. The series contains four ment. . . . [H ere the experimental reindividuals in three different terms. In sults are described in detail.] Thus their production, pollen and germinal

from the homozygous WW genotypes should be true-breeding for round seeds, whereas those from the heterozygous Ww genotypes should yield round and wrinkled seeds in the ratio of 3 : 1. On the other hand, the plants grown from wrinkled seeds should be true-breeding for wrinkled because these plants are homozygous ww.

The results Mendel obtained are summarized in Figure 3.6. As predicted from the genetic hypothesis, plants grown from F2 wrinkled seeds were true-breeding for wrinkled seeds, yielding only wrinkled seeds in the F3 generation. But some of the plants grown from round seeds showed evidence of segregation. Among 565 plants grown from round F2 seeds, 372 plants produced both round and wrinkled seeds in a proportion very close to 3 : 1, whereas the remaining 193 plants produced only round seeds in the F3 generation. The ratio 193 : 372 equals 1 : 1.93, which is very close to the ratio 1 : 2 of WW : Ww genotypes predicted from the genetic hypothesis.

An important feature of the homozygous round and homozygous wrinkled seeds produced in the F2 and F3 generations is that the phenotypes are exactly the same as those observed in the original parents in the P1 generation. This makes sense in terms of DNA, because the DNA of each allele remains unaltered unless a new mutation happens to occur. Mendel described this result in a letter by saying that in the progeny of crosses, “the two parental traits appear, separated and unchanged, and there is nothing to indicate that one of them has either inherited or taken over anything from the other.” From this finding, he concluded that the hereditary determinants for

Chapter 3 Transmission Genetics: The Principle of Segregation

cells of form A and a participate, on the average, equally in fertilization; therefore each form manifests itself twice, since four individuals are produced. Participating in fertilization are thus:

Pollen cells:    A + A + a + a
Germinal cells:  A + A + a + a

It is entirely a matter of chance which of the two kinds of pollen combines with each single germinal cell. However, according to the laws of probability, in an average of many cases it will always happen that every pollen form A and a will unite equally often with every germinal-cell form A and a; therefore, in fertilization, one of the two pollen cells A will meet a germinal cell A, the other a germinal cell a, and equally, one pollen cell a will become associated with a germinal cell A, and the other a.

[Diagram: the pollen cells A, A, a, a are each joined by an arrow to one of the germinal cells A, A, a, a.]

The result of fertilization can be visualized by writing the designations for associated germinal and pollen cells in the form of fractions, pollen cells above the line, germinal cells below. In the case under discussion one obtains

A/A + A/a + a/A + a/a

In the first and fourth terms germinal and pollen cells are alike; therefore the products of their association must be constant, namely A and a; in the second and third, however, a union of the two differing parental traits takes place again, therefore the forms arising from such fertilizations are absolutely identical with the hybrid from which they derive. Thus, repeated hybridization takes place. The striking phenomenon, that hybrids are able to produce, in addition to the two parental types, progeny that resemble themselves is thus explained: Aa and aA both give the same association, Aa, since, as mentioned earlier, it

makes no difference to the consequence of fertilization which of the two traits belongs to the pollen and which to the germinal cell. Therefore

A/A + A/a + a/A + a/a = A + 2Aa + a

This represents the average course of self-fertilization of hybrids when two differing traits are associated in them. In individual flowers and individual plants, however, the ratio in which the members of the series are formed may be subject to not insignificant deviations. . . . Thus it was proven experimentally that, in Pisum, hybrids form different kinds of germinal and pollen cells and that this is the reason for the variability of their offspring.

Source: Verhandlungen des naturforschenden Vereines in Brünn 4: 3–47.

the traits in the parental lines were transmitted as two different elements that retain their purity in the hybrids. In other words, the hereditary determinants do not “mix” or “contaminate” each other. In modern terminology, this means that, with rare but important exceptions, genes are transmitted unchanged from generation to generation.

Figure 3.6 Summary of F2 phenotypes and the progeny produced by self-fertilization.

Of 7324 F2 seeds:
• 3/4 (5474) were round; 565 were planted, and
    • 1/3 (193) gave plants with pods containing only round F3 seeds (= 1/4 of all F2 seeds)
    • 2/3 (372) gave plants with pods containing both round and wrinkled F3 seeds in a 3 : 1 ratio of round : wrinkled (= 1/2 of all F2 seeds)
• 1/4 (1850) were wrinkled; all gave plants producing only wrinkled F3 seeds (= 1/4 of all F2 seeds)
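The progeny-test arithmetic summarized in Figure 3.6 can be checked with a few lines of Python. This is only an illustrative sketch: the plant counts come from the text, while the variable and function names (`only_round`, `segregating`, `as_one_to_x`) are assumptions introduced here.

```python
from fractions import Fraction

# Plant counts reported in the text for the 565 plants grown from round F2 seeds.
only_round = 193   # gave only round F3 seeds (inferred genotype WW)
segregating = 372  # gave round and wrinkled F3 seeds (inferred genotype Ww)


def as_one_to_x(first, second):
    """Express the ratio first : second as 1 : x, with x rounded to two decimals."""
    return round(second / first, 2)


total_planted = only_round + segregating
observed_x = as_one_to_x(only_round, segregating)

# Under segregation, round F2 seeds are expected to be 1/3 WW and 2/3 Ww.
expected_only_round = Fraction(1, 3) * total_planted
expected_segregating = Fraction(2, 3) * total_planted

print(f"observed WW : Ww = 1 : {observed_x}")
print(f"expected plants: {float(expected_only_round):.1f} WW, "
      f"{float(expected_segregating):.1f} Ww")
```

Exact fractions are used for the expected values so that the 1/3 and 2/3 proportions are not blurred by floating-point rounding before they are compared with the observed counts.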

The Testcross and the Backcross

Another straightforward way of testing the genetic hypothesis in Figure 3.5 is by means of a testcross, a cross between an organism that is heterozygous for one or more genes (for example, Ww) and an organism that is homozygous for the recessive alleles (for example, ww). The result of such a testcross is shown in Figure 3.7. Because the heterozygous parent is expected to produce W and w gametes in equal numbers, whereas the homozygous recessive produces only w gametes, the expected progeny are 1/2 with the genotype Ww and 1/2 with the genotype ww. The former have the dominant phenotype because W is dominant over w, and the latter have the recessive phenotype. A testcross is often extremely useful in genetic analysis:

In a testcross, the phenotypes of the progeny reveal the relative frequencies of the different gametes produced by the heterozygous parent, because the recessive parent contributes only recessive alleles.

Figure 3.7 In a testcross of a Ww heterozygous parent with a ww homozygous recessive, the progeny are Ww and ww in the ratio of 1 : 1. A testcross shows the result of segregation.

Heterozygous Ww parent: segregation yields W and w gametes in a ratio of 1 : 1.
Homozygous recessive parent: all gametes are w.
1/2 W gametes combined with w gametes give 1/2 Ww progeny.
1/2 w gametes combined with w gametes give 1/2 ww progeny.
The progeny of a testcross include dominant and recessive phenotypes in a ratio of 1 : 1.

Mendel carried out a series of testcrosses with various traits. The results are shown in Table 3.2. In all cases, the ratio of phenotypes among the testcross progeny is very close to the 1 : 1 ratio expected from segregation of the alleles in the heterozygous parent.

Table 3.2 Mendel’s testcross results

Testcross (F1 heterozygote × homozygous recessive)    Progeny from testcross     Ratio
Round × wrinkled seeds                                193 round, 192 wrinkled    1.01 : 1
Yellow × green seeds                                  196 yellow, 189 green      1.04 : 1
Purple × white flowers                                85 purple, 81 white        1.05 : 1
Long × short stems                                    87 long, 79 short          1.10 : 1

Another valuable type of cross is a backcross, in which hybrid organisms are crossed with one of the parental genotypes. Backcrosses are commonly used by geneticists and by plant and animal breeders, as we will see in later chapters. Note that the testcrosses in Table 3.2 are also backcrosses, because in each case, the F1 heterozygous parent came from a cross between the homozygous dominant and the homozygous recessive.

3.3 Segregation of Two or More Genes

The results of many genetic crosses depend on the segregation of the alleles of two or more genes. The genes may be in different chromosomes or in the same chromosome. Although in this section we consider the case of genes that are in two different chromosomes, the same principles apply to genes that are in the same chromosome but are so far distant from each other that they segregate independently. The case of linkage of genes in the same chromosome is examined in Chapter 5.

To illustrate the principles, we consider again a cross between homozygous genotypes, but in this case homozygous for the alleles of two genes. A specific example is a true-breeding variety of garden peas with seeds that are wrinkled and green (genotype ww gg) versus a variety with seeds that are round and yellow (genotype WW GG). As suggested by the use of uppercase and lowercase symbols for the alleles, the dominant alleles are W and G, the recessive alleles w and g. Crossing these strains yields F1 seeds with the genotype Ww Gg, which are phenotypically round and yellow because of the dominance relations. When the F1 seeds are grown into mature plants and self-fertilized, the F2 progeny show the result of simultaneous segregation of the W, w allele pair and the G, g allele pair. When Mendel performed this cross, he obtained the following numbers of F2 seeds:

Round, yellow       315
Round, green        108
Wrinkled, yellow    101
Wrinkled, green      32
Total               556

In these data, the first thing to be noted is the expected 3 : 1 ratio for each trait considered separately. The ratio of round : wrinkled (pooling across yellow and green) is

(315 + 108) : (101 + 32) = 423 : 133 = 3.18 : 1

And the ratio of yellow : green (pooling across round and wrinkled) is

(315 + 101) : (108 + 32) = 416 : 140 = 2.97 : 1

Both of these ratios are in satisfactory agreement with 3 : 1. (Testing for goodness of fit to a predicted ratio is described in Chapter 4.) Furthermore, in the F2 progeny of the dihybrid cross, the separate 3 : 1 ratios for the two traits were combined at random. With random combinations, as shown in Figure 3.8, among the 3/4 of the progeny that are round, 3/4 will be yellow and 1/4 green; similarly, among the 1/4 of the progeny that are wrinkled, 3/4 will be yellow and 1/4 green. The overall proportions of round yellow to round green to wrinkled yellow to wrinkled green are therefore expected to be

3/4 × 3/4 : 3/4 × 1/4 : 1/4 × 3/4 : 1/4 × 1/4 = 9/16 : 3/16 : 3/16 : 1/16

The observed ratio of 315 : 108 : 101 : 32 equals 9.84 : 3.38 : 3.16 : 1, which is a satisfactory fit to the 9 : 3 : 3 : 1 ratio expected from the Punnett square in Figure 3.8.

Figure 3.8 The 3 : 1 ratio of round : wrinkled, when combined at random with the 3 : 1 ratio of yellow : green, yields the 9 : 3 : 3 : 1 ratio observed in the F2 progeny of the dihybrid cross.

Seed shape       Seed color: 3/4 Yellow     Seed color: 1/4 Green
3/4 Round        9/16 Round, yellow         3/16 Round, green
1/4 Wrinkled     3/16 Wrinkled, yellow      1/16 Wrinkled, green

Ratio of phenotypes in the F2 progeny of a dihybrid cross is 9 : 3 : 3 : 1.
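The pooled 3 : 1 ratios and the expected 9 : 3 : 3 : 1 proportions can be recomputed directly from Mendel's counts. The sketch below assumes nothing beyond the four numbers quoted in the text; the names `observed`, `pooled`, `shape_p`, and `color_p` are illustrative choices, not notation from the chapter.

```python
from fractions import Fraction

# Mendel's F2 seed counts from the dihybrid cross, as listed in the text.
observed = {
    ("round", "yellow"): 315,
    ("round", "green"): 108,
    ("wrinkled", "yellow"): 101,
    ("wrinkled", "green"): 32,
}
total = sum(observed.values())


def pooled(trait_index, value):
    """Sum the counts over the other trait (0 = seed shape, 1 = seed color)."""
    return sum(n for key, n in observed.items() if key[trait_index] == value)


shape_ratio = pooled(0, "round") / pooled(0, "wrinkled")
color_ratio = pooled(1, "yellow") / pooled(1, "green")

# Expected proportions: the two separate 3 : 1 ratios combined at random.
shape_p = {"round": Fraction(3, 4), "wrinkled": Fraction(1, 4)}
color_p = {"yellow": Fraction(3, 4), "green": Fraction(1, 4)}
expected = {(s, c): shape_p[s] * color_p[c] for s in shape_p for c in color_p}

print(f"round : wrinkled = {shape_ratio:.2f} : 1")
print(f"yellow : green   = {color_ratio:.2f} : 1")
for key, p in expected.items():
    print(key, p, f"expected {float(p * total):.1f}, observed {observed[key]}")
```

Multiplying the marginal probabilities is exactly the "combined at random" step in the text; whether the observed counts fit well enough is a goodness-of-fit question, which the chapter defers to Chapter 4.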

The Principle of Independent Assortment

The independent segregation of the W, w and G, g allele pairs is illustrated in Figure 3.9. What independence means is that if a gamete contains W, it is equally likely to contain G or g; and if a gamete contains w, it is equally likely to contain G or g. The implication is that the four gametes are formed in equal frequencies:

1/4 W G    1/4 W g    1/4 w G    1/4 w g

Figure 3.9 Independent segregation of the W, w and G, g allele pairs means that among each of the W and w gametic classes, the ratio of G : g is 1 : 1. Likewise, among each of the G and g gametic classes, the ratio of W : w is 1 : 1.

Genotype: Ww Gg
Segregation of W and w alleles: 1/2 W (and G or g), 1/2 w (and G or g)
Independent segregation of G and g alleles: 1/4 W G, 1/4 W g, 1/4 w G, 1/4 w g
Result: An equal frequency of all four possible types of gametes
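As a minimal sketch of what independent segregation implies for gamete frequencies, the function below enumerates every combination of one allele per gene and gives each combination equal weight. The function name `gametes` and the list-of-pairs genotype encoding are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product


def gametes(genotype):
    """Return gamete -> frequency for a genotype under independent assortment.

    `genotype` is a list of allele pairs, one pair per gene,
    for example [("W", "w"), ("G", "g")] for Ww Gg.
    """
    weight = Fraction(1, 2) ** len(genotype)  # each allele choice has probability 1/2
    freqs = {}
    for combo in product(*genotype):          # pick one allele from each gene
        freqs[combo] = freqs.get(combo, Fraction(0)) + weight
    return freqs


double_heterozygote = gametes([("W", "w"), ("G", "g")])
for gamete, freq in double_heterozygote.items():
    print(" ".join(gamete), freq)
```

For a homozygous gene the two allele choices coincide, so their weights add up; `gametes([("w", "w"), ("g", "g")])` therefore yields the single gamete w g with frequency 1.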

The result of independent assortment when the four types of gametes combine at random to form the zygotes of the next generation is shown in Figure 3.10. Note that the expected ratio of phenotypes among the F2 progeny is

9 : 3 : 3 : 1

However, as the Punnett square also shows, the ratio of genotypes in the F2 generation is more complex; it is

1 : 2 : 1 : 2 : 4 : 2 : 1 : 2 : 1

The reason for this ratio is shown in Figure 3.11. Among seeds that have the WW genotype, the ratio of

GG : Gg : gg equals 1 : 2 : 1

Among seeds that have the Ww genotype, the ratio is

2 : 4 : 2

(which is a 1 : 2 : 1 ratio multiplied by 2 because there are twice as many Ww genotypes as either WW or ww). And among seeds that have the ww genotype, the ratio of

GG : Gg : gg equals 1 : 2 : 1

The phenotypes of the seeds are shown beneath the genotypes, and the combined phenotypic ratio is

9 : 3 : 3 : 1

Figure 3.11 also shows that among seeds that are GG, the ratio of

WW : Ww : ww equals 1 : 2 : 1

among seeds that are Gg, it is

2 : 4 : 2

and among seeds that are gg, it is

1 : 2 : 1

Therefore, independent segregation means that among each of the possible genotypes formed by one pair of alleles, the ratio of homozygous dominant to heterozygous to homozygous recessive is 1 : 2 : 1 for the other independent pair of alleles. The principle of independent segregation of two pairs of alleles in different chromosomes (or located sufficiently far apart in the same chromosome) has come to be known as the principle of independent assortment. It is also sometimes referred to as Mendel’s second law.

The Principle of Independent Assortment: Segregation of the members of any pair of alleles is independent of the segregation of other pairs in the formation of reproductive cells.

Although the principle of independent assortment is fundamental in Mendelian genetics, the phenomenon of linkage, caused by proximity of genes in the same chromosome, is an important exception.

Figure 3.10 Independent segregation is the biological basis for the 9 : 3 : 3 : 1 ratio of F2 phenotypes resulting from a dihybrid cross.

Parents:       Round, yellow (WW GG) × Wrinkled, green (ww gg)
Gametes:       W G and w g
F1 progeny:    Round, yellow; double heterozygote Ww Gg

F2 Punnett square (each female and male gamete type has frequency 1/4):

                    Female gametes
Male gametes    W G        W g        w G        w g
W G             WW GG      WW Gg      Ww GG      Ww Gg
W g             WW Gg      WW gg      Ww Gg      Ww gg
w G             Ww GG      Ww Gg      ww GG      ww Gg
w g             Ww Gg      Ww gg      ww Gg      ww gg

F2 progeny (genotypes and phenotypes):
1/16 WW GG + 2/16 WW Gg + 2/16 Ww GG + 4/16 Ww Gg = 9/16 round, yellow
1/16 ww GG + 2/16 ww Gg = 3/16 wrinkled, yellow
1/16 WW gg + 2/16 Ww gg = 3/16 round, green
1/16 ww gg = 1/16 wrinkled, green

Figure 3.11 Genotypes and phenotypes of the F2 progeny of the dihybrid cross for seed shape and seed color.

Segregation of Gg within WW:    WW GG : WW Gg : WW gg = 1 : 2 : 1
Segregation of Gg within Ww:    Ww GG : Ww Gg : Ww gg = 2 : 4 : 2
Segregation of Gg within ww:    ww GG : ww Gg : ww gg = 1 : 2 : 1
All genotypes combined:         9 round, yellow : 3 round, green : 3 wrinkled, yellow : 1 wrinkled, green
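The genotype and phenotype ratios read off the dihybrid Punnett square can be reproduced by brute-force enumeration. The following sketch assumes the four gamete types are equally frequent, as the text states; the two-letter string encoding of gametes and the helper names `zygote` and `phenotype` are illustrative assumptions.

```python
from collections import Counter
from itertools import product

# The four gamete types of a Ww Gg plant, each with frequency 1/4.
GAMETES = ["WG", "Wg", "wG", "wg"]


def zygote(female, male):
    """Combine two gametes into a genotype string such as 'Ww Gg'."""
    # Sorting puts the uppercase (dominant) allele first, so wW and Ww coincide.
    return " ".join("".join(sorted(a + b)) for a, b in zip(female, male))


def phenotype(genotype):
    """Apply the dominance relations W over w and G over g."""
    shape, color = genotype.split()
    return ("round" if "W" in shape else "wrinkled",
            "yellow" if "G" in color else "green")


# Sixteen equally likely cells of the Punnett square.
genotype_counts = Counter(zygote(f, m) for f, m in product(GAMETES, repeat=2))

phenotype_counts = Counter()
for genotype, n in genotype_counts.items():
    phenotype_counts[phenotype(genotype)] += n

for genotype in sorted(genotype_counts):
    print(genotype, f"{genotype_counts[genotype]}/16")
print(dict(phenotype_counts))
```

Counting cells rather than multiplying probabilities mirrors how the Punnett square itself is read: nine distinct genotypes appear in the proportions 1 : 2 : 1 : 2 : 4 : 2 : 1 : 2 : 1, and they collapse into four phenotypes once dominance is applied.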

The Testcross with Unlinked Genes

Genes that show independent assortment are said to be unlinked. The hypothesis of independent assortment can be tested directly in a testcross with the double homozygous recessive:

Ww Gg × ww gg

The result of the testcross is shown in Figure 3.12. Because plants with doubly heterozygous genotypes produce four types of gametes (W G, W g, w G, and w g) in equal frequencies, whereas the plants with ww gg genotypes produce only w g gametes, the possible progeny genotypes are Ww Gg, Ww gg, ww Gg, and ww gg, and these are expected in equal frequencies. Because of the dominance relations (W over w and G over g), the progeny phenotypes are expected to be round yellow, round green, wrinkled yellow, and wrinkled green in a ratio of

1 : 1 : 1 : 1

As with the one-gene testcross, in a two-gene testcross the ratio of progeny phenotypes is a direct demonstration of the ratio of gametes produced by the doubly heterozygous parent. In the actual cross, Mendel obtained 55 round yellow, 51 round green, 49 wrinkled yellow, and 53 wrinkled green, which is in good agreement with the predicted 1 : 1 : 1 : 1 ratio.

Figure 3.12 Genotypes and phenotypes resulting from a testcross of the Ww Gg double heterozygote.

Parents: Ww Gg × ww gg
Gametes from heterozygous parent show independent assortment. All gametes from homozygous recessive parent are w g.

Gamete from Ww Gg parent    Gamete from ww gg parent    Progeny genotype    Progeny phenotype
1/4 W G                     w g                         Ww Gg               1/4 round, yellow
1/4 W g                     w g                         Ww gg               1/4 round, green
1/4 w G                     w g                         ww Gg               1/4 wrinkled, yellow
1/4 w g                     w g                         ww gg               1/4 wrinkled, green
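To see how close Mendel's two-gene testcross came to the predicted 1 : 1 : 1 : 1 ratio, the observed counts can be set beside the expected count per class. This is a simple sketch that uses only the four counts quoted in the text; the names `observed`, `expected_each`, and `deviations` are illustrative.

```python
# Progeny counts from Mendel's Ww Gg x ww gg testcross, as quoted in the text.
observed = {
    "round, yellow": 55,
    "round, green": 51,
    "wrinkled, yellow": 49,
    "wrinkled, green": 53,
}

total = sum(observed.values())

# With a 1 : 1 : 1 : 1 ratio, each class is expected to hold 1/4 of the progeny.
expected_each = total / len(observed)

# Signed difference between observed and expected for every phenotype class.
deviations = {label: n - expected_each for label, n in observed.items()}

for label, n in observed.items():
    print(f"{label:18s} observed {n:3d}  expected {expected_each:.0f}  "
          f"deviation {deviations[label]:+.0f}")
```

The deviations are only a descriptive summary; a formal test of fit to a predicted ratio is the subject of Chapter 4.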

The Big Experiment

By analogy with the independent segregation of two genes illustrated in Figure 3.8, one might expect that independent segregation of three genes would produce F2 progeny in which combinations of the phenotypes are given by successive terms in the multiplication of [(3/4) + (1/4)]^3, which, when multiplied out, yields 27 : 9 : 9 : 9 : 3 : 3 : 3 : 1. A specific example is shown in Figure 3.13. The allele pairs are W, w for round versus wrinkled seeds; G, g for yellow versus green seeds; and P, p for purple versus white flowers. (The alleles indicated with the uppercase letters are dominant.) With independent segregation, among the F2 progeny the most frequent phenotype (27/64) has the dominant form of all three traits, the next most frequent (9/64) has the dominant form of two of the traits, the next most frequent (3/64) has the dominant form of only one trait, and the least frequent (1/64) is the triple recessive. Observe that if you consider any one of the traits and ignore the other two, then the ratio of phenotypes is 3 : 1; and if you consider any two of the traits, then the ratio of phenotypes is 9 : 3 : 3 : 1. This means that all of the possible one- and two-gene independent segregations are present in the overall three-gene independent segregation.

Mendel tested this hypothesis, too, but by this time he was complaining of the amount of work the experiment entailed, noting that “of all the experiments, it required the most time and effort.” The result of the experiment is shown in Figure 3.14. The three Punnett squares show the segregation of W and w from G and g in the genotypes PP, Pp, and pp. In each box, the expected number of plants of each genotype, assuming independent assortment, is shown together with the observed number of plants of that genotype. The excellent agreement confirmed what Mendel regarded as the main conclusion of all of his experiments:

“Pea hybrids form germinal and pollen cells that in their composition correspond in equal numbers to all the constant forms resulting from the combination of traits united through fertilization.”

In this admittedly somewhat turgid sentence, Mendel incorporated both segregation and independent assortment. In modern terms, what he means is that with independent segregation, the gametes produced by any hybrid plant consist of equal numbers of all possible combinations of the alleles that are heterozygous. For example, the cross WW gg × ww GG produces F1 progeny of genotype Ww Gg, which yields the gametes W G, W g, w G, and w g in equal numbers. Segregation is illustrated by the 1 : 1 ratio of W : w and G : g gametes, and independent assortment is illustrated by the equal numbers of W G, W g, w G, and w g gametes.

Figure 3.13 With independent assortment, the expected ratio of phenotypes in a trihybrid cross is obtained by multiplying the three independent 3 : 1 ratios of the dominant and recessive phenotypes. A dash used in a genotype symbol indicates that either the dominant or the recessive allele is present; for example, W– refers collectively to the genotypes WW and Ww. (The expected numbers total 640 rather than 639 because of round-off error.)

(3/4 W– + 1/4 ww) × (3/4 G– + 1/4 gg) × (3/4 P– + 1/4 pp)

Proportion    Genotype       Phenotype                   Observed number    Expected number
27/64         W– G– P–       Round, yellow, purple       269                270
9/64          W– G– pp       Round, yellow, white         98                 90
9/64          W– gg P–       Round, green, purple         86                 90
9/64          ww G– P–       Wrinkled, yellow, purple     88                 90
3/64          W– gg pp       Round, green, white          27                 30
3/64          ww G– pp       Wrinkled, yellow, white      34                 30
3/64          ww gg P–       Wrinkled, green, purple      30                 30
1/64          ww gg pp       Wrinkled, green, white        7                 10

For any one gene, the ratio of phenotypes is 48 : 16 = 3 : 1.
For any pair of genes, the ratio of phenotypes is 36 : 12 : 12 : 4 = 9 : 3 : 3 : 1.

Figure 3.14 Progeny observed in the F2 generation of a trihybrid cross with the allelic pairs W, w and G, g and P, p. In each box, the first entry is the observed number and the entry in parentheses is the expected number. Note that each gene, by itself, yields a 1 : 2 : 1 ratio of genotypes and that each pair of genes yields a 1 : 2 : 1 : 2 : 4 : 2 : 1 : 2 : 1 ratio of genotypes.

PP        GG         Gg         gg
WW         8 (10)    15 (20)     9 (10)
Ww        14 (20)    49 (40)    20 (20)
ww         8 (10)    19 (20)    10 (10)

Pp        GG         Gg         gg
WW        22 (20)    45 (40)    17 (20)
Ww        38 (40)    78 (80)    40 (40)
ww        25 (20)    36 (40)    20 (20)

pp        GG         Gg         gg
WW        14 (10)    18 (20)    11 (10)
Ww        18 (20)    48 (40)    16 (20)
ww        10 (10)    24 (20)     7 (10)
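The expansion of [(3/4) + (1/4)]^3 and the expected numbers for Mendel's 639 plants can be generated programmatically. In this sketch the total of 639 and the trait names come from the text and Figure 3.13; the function name `phenotype_classes` and the constants `TRAITS` and `TOTAL_PLANTS` are assumptions introduced for the example.

```python
from fractions import Fraction
from itertools import product

# (dominant form, recessive form) for each of the three genes in the text.
TRAITS = [("round", "wrinkled"), ("yellow", "green"), ("purple", "white")]
TOTAL_PLANTS = 639  # number of F2 plants Mendel scored


def phenotype_classes(traits):
    """Return phenotype label -> expected proportion under independent assortment."""
    classes = {}
    # For every gene choose index 0 (dominant phenotype) or 1 (recessive phenotype).
    for choice in product((0, 1), repeat=len(traits)):
        label = ", ".join(trait[i] for trait, i in zip(traits, choice))
        prob = Fraction(1)
        for i in choice:
            prob *= Fraction(3, 4) if i == 0 else Fraction(1, 4)
        classes[label] = prob
    return classes


classes = phenotype_classes(TRAITS)
ratio = [int(p * 64) for p in classes.values()]
expected = {label: round(p * TOTAL_PLANTS) for label, p in classes.items()}

for label, p in classes.items():
    print(f"{label:26s} {p}  expected {expected[label]}")
print("sum of rounded expectations:", sum(expected.values()))
```

Rounding each class separately is what makes the expected numbers add up to one more than the number of plants actually scored, the round-off discrepancy that the caption of Figure 3.13 mentions.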

3.4 Human Pedigree Analysis

Large deviations from expected genetic ratios are often found in individual human families and in domesticated large animals because of the relatively small number of progeny. The effects of segregation are nevertheless evident upon examination of the phenotypes among several generations of related individuals. A diagram of a family tree showing the phenotype of each individual among a group of relatives is a pedigree. In this section we introduce basic concepts in pedigree analysis.

Characteristics of Dominant and Recessive Inheritance

Figure 3.15 defines the standard symbols used in depicting human pedigrees. Females are represented by circles and males by squares. (A diamond is used if the sex is unknown, as, for example, in a miscarriage.) Persons with the phenotype of interest are indicated by colored or shaded symbols. For recessive alleles, heterozygous carriers are depicted with half-filled symbols. A mating between a female and a male is indicated by joining their symbols with a horizontal line, which is connected vertically to a second horizontal line, below, that connects the symbols for their offspring. The offspring within a sibship, called siblings or sibs, are represented from left to right in order of their birth.

A pedigree for the trait Huntington disease, caused by a dominant mutation, is shown in Figure 3.16. The numbers in the pedigree are for convenience in referring to particular persons. The successive generations are designated by Roman numerals. Within any generation, all of the persons are numbered consecutively from left to right. The pedigree starts with the woman I-1 and the man I-2. The man has Huntington disease, which is a progressive nerve degeneration that usually begins about middle age. It results in severe physical and mental disability and then death. The dominant allele, HD, that causes Huntington disease is rare. All affected persons in the pedigree have the heterozygous genotype HD hd, whereas nonaffected persons have the homozygous normal genotype hd hd. The disease has complete penetrance. The penetrance of a genetic disorder is the proportion of individuals with the at-risk genotype who actually express the trait; complete penetrance means the trait is expressed in 100 percent of persons with that genotype. The pedigree demonstrates the following characteristic features of inheritance due to a rare dominant allele with complete penetrance.

1. Females and males are equally likely to be affected.
2. Affected offspring have one affected parent (except for rare new mutations), and the affected parent is equally likely to be the mother or the father.
3. On average, half of the individuals in sibships with an affected parent are affected.

Figure 3.15 Conventional symbols used in depicting human pedigrees.

[Symbols shown: normal female; normal male; sex unknown, normal; female with phenotype of interest; male with phenotype of interest; sex unknown, with phenotype of interest; female heterozygous for recessive allele; male heterozygous for recessive allele; stillbirth or spontaneous abortion; deceased; mating; mating between relatives; parents and offspring (offspring depicted in order of birth, first born to last born, with Roman numerals representing the generation); siblings; two-egg (dizygotic) twins; one-egg (monozygotic) twins.]

Figure 3.16 Pedigree of a human family showing the inheritance of the dominant gene for Huntington disease. Females and males are represented by circles and squares, respectively. Red symbols indicate persons affected with the disease.

[Pedigree with generation I (persons 1–2), generation II (persons 1–7), and generation III (persons 1–15). Affected persons have genotype HD hd because the HD allele is very rare. Nonaffected persons have genotype hd hd because hd is recessive.]

A pedigree for a trait due to a homozygous recessive allele is shown in Figure 3.17. The trait is albinism, absence of pigment in the skin, hair, and iris of the eyes. This pedigree illustrates characteristics of inheritance due to a rare recessive allele with complete penetrance:

1. Females and males are equally likely to be affected.
2. Affected individuals, if they reproduce, usually have unaffected progeny.
3. Most affected individuals have unaffected parents.
4. The parents of affected individuals are often relatives.
5. Among siblings of affected individuals, the proportion affected is approximately 25 percent.

With rare recessive inheritance, the mates of homozygous affected persons are usually homozygous for the normal allele, so all of the offspring will be heterozygous and not affected. Heterozygous carriers of the mutant allele are considerably more common than homozygous affected individuals, because it is more likely that a person will inherit only one copy of a rare mutant allele than two copies. Most homozygous recessive genotypes therefore result from matings between carriers (heterozygous × heterozygous), in which each offspring has a 1/4 chance of being affected.

Another important feature of rare recessive inheritance is that the parents of affected individuals are often related (consanguineous). A mating between relatives is indicated with a double line connecting the partners, as for the first-cousin mating in Figure 3.17. Mating between relatives is important for recessive alleles to become homozygous, because when a recessive allele is rare, it is more likely to become homozygous through inheritance from a common ancestor than from parents who are completely unrelated. The reason is that the carrier of a rare allele may have many descendants who are also carriers. If two of these carriers should mate with each other (for example, in a first-cousin mating), then the hidden recessive allele can become homozygous with a probability of 1/4. Mating between relatives constitutes inbreeding, and the consequences of inbreeding are discussed further in Chapter 17. Because an affected individual indicates that the parents are heterozygous carriers, the expected proportion of affected individuals among the siblings is approximately 25 percent, but the exact value depends on the details of how affected individuals are identified and included in the database.

Figure 3.17 Pedigree of albinism. With recessive inheritance, affected persons (filled symbols) often have unaffected parents. The double horizontal line indicates a mating between relatives, in this case first cousins.

[Pedigree with generation I (persons 1–2), generation II (persons 1–5), generation III (persons 1–8), and generation IV (persons 1–5). Annotations in the figure: "One of these persons is heterozygous" (generation I); "Heterozygous"; "Mating between first cousins"; "Homozygous recessive".]
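The "half affected" expectation for a rare dominant allele and the "1/4 affected" expectation for a mating between carriers both follow from enumerating the four equally likely allele combinations. The sketch below is illustrative only: HD and hd are the allele symbols used in the text, whereas the symbols A and a for the normal and albinism alleles, and the function name `offspring_genotypes`, are hypothetical labels introduced here.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Return genotype -> probability for a mating between two diploid parents.

    Each parent is a pair of alleles, for example ("HD", "hd").
    """
    dist = Counter()
    for a, b in product(parent1, parent2):        # one allele from each parent
        dist[tuple(sorted((a, b)))] += Fraction(1, 4)
    return dict(dist)


# Rare dominant trait: affected heterozygote x unaffected homozygote.
huntington = offspring_genotypes(("HD", "hd"), ("hd", "hd"))
p_affected_dominant = huntington[("HD", "hd")]

# Rare recessive trait: mating between two heterozygous carriers.
# "A" (normal) and "a" (albinism) are hypothetical allele symbols.
carriers = offspring_genotypes(("A", "a"), ("A", "a"))
p_affected_recessive = carriers[("a", "a")]

print("dominant, affected parent x unaffected:", p_affected_dominant)
print("recessive, carrier x carrier:", p_affected_recessive)
```

These are probabilities per child; as the text notes, the proportion actually observed among siblings in real pedigrees also depends on how affected individuals were identified.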

M olecular M arkers in H uman Pedigrees Before th e adven t of m olecu lar m eth ods, th ere were m an y practical obstacles to th e stu dy of h u m an gen etics.

• Most gen es th at cau se gen etic diseases are rare, so th ey are observed in on ly a sm all n u m ber of fam ilies. • Man y gen es of in terest in h u m an gen etics are recessive, so th ey are n ot detected in h eterozygou s gen otypes. • Th e n u m ber of offsprin g per h u m an fam ily is relatively sm all, so segregation can n ot u su ally be detected in sin gle sibsh ips. • Th e h u m an gen eticist can n ot perform testcrosses or backcrosses, becau se h u m an m atin gs are n ot m an ipu lated by an experim en ter.

Because techniques for manipulating DNA allow direct access to the DNA, modern genetic studies of human pedigrees are carried out primarily using genetic markers present in the DNA itself, rather than through the phenotypes produced by mutant genes. The human genome includes one single-nucleotide polymorphism per 500–3000 bp, depending on the region being studied. This means that two randomly chosen human genomes differ at 1–6 million nucleotide positions. Various types of DNA polymorphisms were discussed in Chapter 2, along with the methods by which they are detected and studied. An example of a DNA polymorphism segregating in a three-generation human pedigree is shown in Figure 3.18. The type of polymorphism is a simple tandem repeat polymorphism (STRP), in which each allele differs in size according to the number of copies it contains of a short DNA sequence repeated in tandem. The differences in size are detected by electrophoresis after amplification of the region by PCR. STRP markers usually have as many as 20 codominant alleles, and the majority of individuals are heterozygous for two different alleles. In the example in Figure 3.18, each of the parents is heterozygous, as are all of the children. More than 5000 genetic markers of this type have been identified in the human genome, each heterozygous in an average of 70 percent of individual genotypes.

[Figure 3.18: three-generation pedigree. Generation I: A4A5 × A4A6 and A1A2 × A3A4. Generation II: A4A6 × A1A3. Below the pedigree, a gel shows the DNA bands observed for each individual at positions 1–6 (axis: position of DNA fragment in gel).]

Figure 3.18 Human pedigree showing segregation of STRP alleles. Six alleles (A1–A6) are present in the pedigree, but any one person can have only one allele (if homozygous) or two alleles (if heterozygous).

Six alleles are depicted in Figure 3.18, denoted by A1 through A6. In the gel, the numbers of the bands correspond to the subscripts of the alleles. The mating in generation II is between two heterozygous genotypes: A4A6 × A1A3. Because of segregation in each parent, four genotypes are possible among the offspring (A4A1, A4A3, A6A1, and A6A3); these would conventionally be written with the smaller subscript first, as A1A4, A3A4, A1A6, and A3A6. With random fertilization the offspring genotypes are equally likely, as may be verified from a Punnett square for the mating. Figure 3.18 illustrates some of the principal advantages of multiple, codominant alleles for human pedigree analysis: (1) Heterozygous genotypes can be distinguished from homozygous genotypes. (2) Many individuals in the population are heterozygous, and so many matings are informative in regard to segregation. (3) Each segregating genetic marker yields up to four distinguishable offspring genotypes.
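The Punnett square for the generation II mating can be checked with a short enumeration. The following is only an illustrative sketch: the function name `offspring_distribution` and the string labels for the alleles are assumptions introduced here, and the code assumes Mendelian segregation (each allele of a parent transmitted with probability 1/2) and random fertilization.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_distribution(parent1, parent2):
    """Return {genotype: probability} for one offspring of parent1 x parent2.

    Each parent is a pair of allele labels, e.g. ("A4", "A6").
    Every combination of one gamete from each parent is treated as
    equally likely, which is exactly what a Punnett square tabulates.
    """
    counts = Counter()
    for gamete1, gamete2 in product(parent1, parent2):
        # Write the genotype with the smaller subscript first.
        genotype = tuple(sorted((gamete1, gamete2)))
        counts[genotype] += 1
    total = sum(counts.values())
    return {g: Fraction(c, total) for g, c in counts.items()}


# Generation II mating in Figure 3.18: A4A6 x A1A3
dist = offspring_distribution(("A4", "A6"), ("A1", "A3"))
for genotype, prob in sorted(dist.items()):
    print("".join(genotype), prob)
```

Run on A4A6 × A1A3, the enumeration lists the four genotypes A1A4, A1A6, A3A4, and A3A6, each with probability 1/4, in agreement with the text.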

3.5 Pedigrees and Probability

Genetic analysis benefits from large numbers of progeny because these numbers determine the degree to which observed ratios of genotypes and phenotypes fit their expected values. Greater statistical variation occurs in smaller numbers, and therefore potentially greater deviations from the expected values. For example, among the seeds of individual pea plants in which a 3 : 1 ratio was expected, Mendel observed ratios ranging from 1.85 : 1 to 4.85 : 1. The large variation results from the relatively small number of seeds per plant, which averaged about 34 in these experiments. Taken together, the seeds yielded a phenotypic ratio of 3.08 : 1, which not only fits 3 : 1 but is actually a better fit than that observed in any of the individual plants. Because of statistical variation in small numbers, a working knowledge of probability is basic to understanding genetic transmission. In the first place, each event of fertilization represents a chance combination of alleles present in the parental gametes. In the second place, the proportions of the different types of offspring obtained from a cross are the cumulative result of numerous independent events of fertilization.

In the analysis of genetic crosses, the probability of a particular outcome of a fertilization event may be considered as equivalent to the proportion of times that this outcome is expected to be realized in numerous repeated trials. The reverse is also true: The proportion of times that an outcome is expected to be realized in numerous repeated trials is equivalent to the probability that it is realized in a single trial. To take a specific example using the pedigree in Figure 3.18, the mating at the upper left is A4A5 × A4A6. Among a large number of offspring from such a mating, the expected proportion of A4A6 genotypes is 1/4. Equivalently, we could say that any offspring whose genotype is unknown has a probability of genotype A4A6 equal to 1/4. However, the genotype of the female offspring shown is already given as A4A6, so relative to this individual, the situation has become a certainty: The probability that her genotype is A4A6 equals 1, and the probability that her genotype is anything else equals 0.

To evaluate the probability of a genetic event usually requires an understanding of the mechanism of inheritance and knowledge of the particular cross. For example, in asserting that the mating I-1 × I-2 in Figure 3.18 has a probability of yielding an A4A6 offspring equal to 1/4, we needed to take the parental genotypes into account. If the mating were A4A6 × A4A6, then the probability of an A4A6 offspring would be 1/2 rather than 1/4; if the mating were A4A4 × A6A6, then the probability of an A4A6 offspring would be 1.

In many genetic crosses, the possible outcomes of fertilization are equally likely. Suppose that there are n possible outcomes, each as likely as any other, and that in m of these, a particular outcome of interest is realized; then the probability of the outcome of interest is m/n. In the language of probability, a possible outcome of interest is typically called an event. As an example, consider again the mating A4A5 × A4A6. If we were concerned with the exact specification of each genotype, then we would distinguish among all of the four possible genotypes of progeny (A4A4, A4A6, A4A5, and A5A6) and assert that each is equally likely, with a probability of 1/4. On the other hand, if we were concerned only with homozygosity or heterozygosity, then there would be only two possible outcomes, homozygous (A4A4) and heterozygous (A4A6, A4A5, and A5A6), which have probabilities 1/4 and 3/4, respectively. Or again, if we were interested only in whether or not an offspring carries the A5 allele, then the probabilities would be 1/2 for "yes" and 1/2 for "no." These examples illustrate the point that offspring of a genetic cross can be classified or grouped together in different ways, depending on the particular question of interest, and different types of groupings may have different probabilities associated with them.

Troubadour

The Huntington's Disease Collaborative Research Group 1993
Comprising 58 authors among 9 institutions
A Novel Gene Containing a Trinucleotide Repeat That Is Expanded and Unstable on Huntington's Disease Chromosomes

Modern genetic research is sometimes carried out by large collaborative groups in a number of research institutions scattered across several countries. This approach is exemplified by the search for the gene responsible for Huntington disease. The search was highly publicized because of the severity of the disease, the late age of onset, and the dominant inheritance. Famed folk singer Woody Guthrie, who wrote "This Land Is Your Land" and other well-known tunes, died of the disease in 1967. When the gene was identified, it turned out to encode a protein (now called huntingtin) of unknown function that is expressed in many cell types throughout the body and not, as expected, exclusively in nervous tissue. Within the coding sequence of this gene is a trinucleotide repeat (5'-CAG-3') that is repeated in tandem a number of times according to the general formula (5'-CAG-3')n. Among normal alleles, the number n of repeats ranges from 11 to 34 with an average of 18; among mutant alleles, the number of repeats ranges from 40 to 86. This tandem repeat is genetically unstable in that it can, by some unknown mechanism, increase in copy number ("expand"). In two cases in which a new mutant allele was analyzed, one had increased in repeat number from 36 to 44 and the other from 33 to 49. This is a mutational mechanism that is quite common in some human genetic diseases. The excerpt cites several other examples. The authors also emphasize that their discovery raises important ethical issues, including genetic testing, confidentiality, and informed consent.

Huntington disease (HD) is a progressive neurodegenerative disorder characterized by motor disturbance, cognitive loss, and psychiatric manifestations. It is inherited in an autosomal dominant fashion and affects approximately 1 in 10,000 individuals in most populations of European origin. The hallmark of HD is a distinctive choreic [jerky] movement disorder that typically has a subtle, insidious onset in the fourth to fifth decade of life and gradually worsens over a course of 10 to 20 years until death. . . . The genetic defect causing HD was assigned to chromosome 4 in one of the first successful linkage analyses using DNA markers in humans. Since that time, we have pursued an approach to isolating and characterizing the HD gene based on progressively refining its localization. . . . [We have found that a] 500-kb segment is the most likely site of the genetic defect. [The abbreviation kb stands for kilobase pairs; 1 kb equals 1000 base pairs.] Within this region, we have identified a large gene, spanning approximately 210 kb, that encodes a previously undescribed protein. The reading frame contains a polymorphic (CAG)n trinucleotide repeat with at least 17 alleles in the normal population, varying from 11 to 34 CAG copies. On HD chromosomes, the length of the trinucleotide repeat is substantially increased. . . . Elongation of a trinucleotide repeat sequence has been implicated previously as the cause of three quite different human disorders, the fragile-X syndrome, myotonic dystrophy, and spino-bulbar muscular atrophy. . . . It can be expected that the capacity to monitor directly the size of the trinucleotide repeat in individuals "at risk" for HD will revolutionize testing for the disorder. . . . We consider it of the utmost importance that the current internationally accepted guidelines and counseling protocols for testing people at risk continue to be observed, and that samples from unaffected relatives should not be tested inadvertently or without full consent. . . . With the mystery of the genetic basis of HD apparently solved, [it opens] the next challenges in the effort to understand and to treat this devastating disorder.

Source: Cell 72: 971–983
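The m/n rule for equally likely outcomes, applied to the mating A4A5 × A4A6 discussed in this section, can be sketched in a few lines. This is a minimal illustration rather than a standard method: the helper name `prob` and the tuple encoding of genotypes are assumptions made for the example.

```python
from fractions import Fraction

# The four equally likely offspring genotypes of A4A5 x A4A6,
# each written as a pair of allele labels.
outcomes = [("A4", "A4"), ("A4", "A6"), ("A4", "A5"), ("A5", "A6")]


def prob(event):
    """Probability m/n: m of the n equally likely outcomes satisfy `event`."""
    m = sum(1 for genotype in outcomes if event(genotype))
    return Fraction(m, len(outcomes))


# The same four outcomes grouped according to three different questions.
p_a4a6 = prob(lambda g: set(g) == {"A4", "A6"})    # exact genotype A4A6
p_homozygous = prob(lambda g: g[0] == g[1])        # homozygous
p_heterozygous = prob(lambda g: g[0] != g[1])      # heterozygous
p_carries_a5 = prob(lambda g: "A5" in g)           # carries the A5 allele

print(p_a4a6, p_homozygous, p_heterozygous, p_carries_a5)
```

The same list of outcomes gives 1/4, 1/4, 3/4, and 1/2 for the four questions, which shows how the probability depends on how the offspring are grouped.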

Mutually Exclusive Possibilities

Sometimes an outcome of interest includes two or more different possibilities. The offspring of the mating A4A5 × A4A6 again provides a convenient example. As we have seen, if the offspring are classified as either homozygous or heterozygous, then the outcome "homozygous" consists of just one possibility (A4A4) and the outcome "heterozygous" consists of three possibilities (A4A6, A4A5, and A5A6). Furthermore, if an individual genotype is any one of the three heterozygous genotypes, it cannot at the same time be any of the other heterozygous genotypes. In other words, only one genotype can be realized in any one organism, so the realization of one possibility precludes the realization of another in the same organism. Events that exclude each other in this manner are said to be mutually exclusive. When events are mutually exclusive, their probabilities are combined according to the addition rule.

Addition Rule: The probability of the realization of one or the other of two mutually exclusive events, A or B, is the sum of their separate probabilities.

In symbols, if we use Prob to mean probability, then the addition rule for two possible outcomes is written

Prob{A or B} = Prob{A} + Prob{B}

Application of the addition rule is straightforward in the above examples of homozygous versus heterozygous genotypes, but there are three possible outcomes instead of two. Because the offspring genotypes are equally likely, the probability of a heterozygous offspring equals

Prob{A4A6, A4A5, or A5A6} = Prob{A4A6} + Prob{A4A5} + Prob{A5A6} = 1/4 + 1/4 + 1/4 = 3/4

Because 3/4 is the probability of an individual offspring being heterozygous, it is also the expected proportion of heterozygous genotypes among a large number of progeny.

Independent Possibilities

Events that are not mutually exclusive may be independent, which means that the realization of one outcome has no influence on the possible realization of any others. We have already seen an example in the independent segregation of round versus wrinkled as against yellow versus green peas (Figure 3.9). The principle is that when the possible outcomes of an experiment or observation are independent, the probability that they are realized together is obtained by multiplication. Successive offspring from a cross are also independent events, which means that the genotypes of early progeny have no influence on the probabilities in later progeny (Figure 3.19). The independence of successive offspring contradicts the widespread belief that in each human family, the ratio of girls to boys must "even out" at approximately 1 : 1. According to this reasoning, a family with four girls would be more likely to have a boy the next time around. But this belief is supported neither by theory nor by actual data on the sex ratio in human sibships. The data indicate that human families are equally likely to have a girl or a boy on any birth, irrespective of the sex distribution in previous births. Although statistics guarantees that the sex ratio will balance out when averaged over a very large number of sibships, this does not imply that it will equalize in any individual sibship. To be concrete, among families in which there are five children, sibships consisting of five boys balance those consisting of five girls, yielding an overall sex ratio of 1 : 1; nevertheless, both of these sibships have an unusual sex ratio. When events are independent (such as independent traits or successive offspring from a cross), the probabilities are combined by means of the multiplication rule.

[Figure 3.19: (A) Genotype A1A2 forms gametes A1 and A2, each with probability 1/2; genotype B1B2 forms gametes B1 and B2, each with probability 1/2. Segregation of A1A2 is independent of segregation of B1B2; the probabilities multiply, and so the gametes are A1B1 1/4, A1B2 1/4, A2B1 1/4, A2B2 1/4. (B) Successive offspring in a human sibship (or peas in a pod) are independent, and so the probabilities of genotypes or phenotypes can be multiplied. Each offspring results from an independent event of fertilization.]

Figure 3.19 In genetics, two important types of independence are (A) independent segregation of alleles that show independent assortment and (B) independent fertilizations resulting in successive offspring. In these cases, the probabilities of the individual outcomes of segregation or fertilization are multiplied to obtain the overall probability.

Multiplication Rule: The probability of two independent events, A and B, being realized simultaneously is given by the product of their separate probabilities.

In symbols, the multiplication rule is written

Prob{A and B} = Prob{A} · Prob{B}

The multiplication rule can be used to answer questions such as this: For two offspring from the mating A4A5 × A4A6, what is the probability of one homozygous genotype and one heterozygous genotype? The birth order homozygous–heterozygous has probability 1/4 × 3/4, and the birth order heterozygous–homozygous has probability 3/4 × 1/4. These possibilities are mutually exclusive, so the overall probability is

1/4 × 3/4 + 3/4 × 1/4 = 2(1/4)(3/4) = 3/8

Note that the addition rule was used twice in solving this problem, once to calculate the probability of a heterozygous offspring and again to calculate the probability of both birth orders. The multiplication rule can also be used to calculate the probability of a specific genotype among the progeny of a complex cross. For example, if a quadruple heterozygous genotype Aa Bb Cc Dd is self-fertilized, what is the probability of a quadruple heterozygous offspring, Aa Bb Cc Dd? Assuming independent assortment, the answer is (1/2)(1/2)(1/2)(1/2) = (1/2)^4 = 1/16, because each of the independent genes yields a heterozygous offspring with probability 1/2.
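The two worked examples of the multiplication rule can be reproduced with exact fractions. This is a sketch for checking the arithmetic only; the variable names are illustrative, and the probabilities 1/4 and 3/4 are the homozygous and heterozygous probabilities for the mating A4A5 × A4A6 given in the text.

```python
from fractions import Fraction
from itertools import product

# Probabilities for a single offspring of A4A5 x A4A6.
p_hom = Fraction(1, 4)
p_het = Fraction(3, 4)

# Two offspring, one homozygous and one heterozygous:
# multiply within each birth order (independent events),
# then add the two birth orders (mutually exclusive events).
p_one_of_each = p_hom * p_het + p_het * p_hom

# The same result by enumerating every ordered pair of offspring classes
# and keeping the pairs in which the two classes differ.
classes = {"homozygous": p_hom, "heterozygous": p_het}
p_enumerated = sum(
    classes[first] * classes[second]
    for first, second in product(classes, repeat=2)
    if first != second
)

# Self-fertilization of Aa Bb Cc Dd with independent assortment:
# each gene gives a heterozygous offspring with probability 1/2.
p_quadruple_het = Fraction(1, 2) ** 4

print(p_one_of_each, p_enumerated, p_quadruple_het)
```

Both routes to the first answer give 3/8, and the quadruple heterozygote has probability 1/16, matching the values derived above.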

3.6 Incomplete Dominance and Epistasis

Dominance and codominance are not the only possibilities for pairs of alleles. There are situations of incomplete dominance, in which the phenotype of the heterozygous genotype is intermediate between the phenotypes of the homozygous genotypes. A classic example of incomplete dominance concerns flower color in the snapdragon Antirrhinum majus (Figure 3.20). In wildtype flowers, a red type of anthocyanin pigment is formed by a sequence of enzymatic reactions. A wildtype enzyme, encoded by the I allele, is limiting to the rate of the overall reaction, so the amount of red pigment is determined by the amount of enzyme that the I allele produces. The alternative i allele codes for an inactive enzyme, and ii flowers are ivory in color. Because the amount of the critical enzyme is reduced in Ii heterozygotes, the amount of red pigment in the flowers is reduced also, and the effect of the dilution is to make the flowers pink. A cross between plants differing in flower color therefore gives direct phenotypic evidence of segregation (Figure 3.20). The cross II (red) × ii (ivory) yields F1 plants with genotype Ii and pink flowers. In the F2 progeny obtained by self-pollination of the F1 hybrids, one experiment resulted in 22 plants with red flowers, 52 with pink flowers, and 23 with ivory flowers, which fits the expected ratio of 1 : 2 : 1.

[Figure 3.20: Parents: red (II) × ivory (ii). F1: pink (Ii); incomplete dominance, the heterozygous genotype is intermediate in color. Self-fertilization gives F2: 1/4 red (II), 1/2 pink (Ii), 1/4 ivory (ii). F3 after self-fertilization: II gives all red; Ii gives 1/4 red (II), 1/2 pink (Ii), 1/4 ivory (ii); ii gives all ivory.]

Figure 3.20 Red versus white flower color in snapdragons shows no dominance.
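To see how closely the snapdragon F2 counts follow 1 : 2 : 1, the observed numbers can be compared with the counts expected for the same total. The sketch below uses only the counts quoted in the text (22 red, 52 pink, 23 ivory); the dictionary names are illustrative, and no formal statistical test is attempted here.

```python
from fractions import Fraction

# F2 counts from the snapdragon experiment described in the text.
observed = {"red": 22, "pink": 52, "ivory": 23}
# Expected Mendelian ratio II : Ii : ii under incomplete dominance.
ratio = {"red": 1, "pink": 2, "ivory": 1}

total = sum(observed.values())   # total number of F2 plants scored
parts = sum(ratio.values())      # 1 + 2 + 1 = 4

# Expected count in each class = total x (share of the ratio).
expected = {color: Fraction(total * share, parts) for color, share in ratio.items()}

for color in observed:
    print(f"{color:5s} observed {observed[color]:3d}  expected {float(expected[color]):6.2f}")
```

With 97 plants in total, the expected counts are 24.25 red, 48.5 pink, and 24.25 ivory, so each observed class differs from expectation by only a few plants.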

Multiple Alleles

The occurrence of multiple alleles is exemplified by the alleles A1–A6 of the STRP marker in the human pedigree in Figure 3.18. Multiple alleles are relatively common in natural populations and, as in this example, can be detected most easily by molecular methods. In the DNA of a gene, each nucleotide can be A, T, G, or C, so a gene of n nucleotides can theoretically mutate at any of the positions to any of the three other nucleotides. The number of possible single-nucleotide differences in a gene of length n is therefore 3 × n. If n = 5000, for example, there are potentially 15,000 alleles (not counting any of the possibilities with more than one nucleotide substitution). Most of the potential alleles do not actually exist at any one time. Some are absent because they did not occur, others did occur but were eliminated by chance or because they were harmful, and still others are present but at such a low frequency that they remain undetected. Nevertheless, at the level of DNA sequence, most genes in most natural populations have multiple alleles, all of which can be considered "wildtype." Multiple wildtype alleles are useful in such applications as DNA typing because two unrelated people are unlikely to have the same genotype, especially if several different loci, each with multiple alleles, are examined. Many harmful mutations also exist in multiple forms. Recall from Chapter 1 that more than 400 mutant forms of the phenylalanine hydroxylase gene have been identified in patients with phenylketonuria.

The alleles A1–A6 in Figure 3.18 also illustrate that although a population of organisms may contain any number of alleles, any particular organism or cell may carry no more than two, and any gamete may carry no more than one. In some cases, the multiple alleles of a gene exist merely by chance and reflect the history of mutations that have taken place in the population and the dissemination of these mutations among population subgroups by migration and interbreeding. In other cases, there are biological mechanisms that favor the maintenance of a large number of alleles. For example, genes that control self-sterility in certain flowering plants can have large numbers of allelic types. This type of self-sterility is found in species of red clover that grow wild in many pastures. The self-sterility genes prevent self-fertilization because a pollen grain can undergo pollen tube growth and fertilization only if it contains a self-sterility allele different from either of the alleles present in the flower on which it lands. In other words, a pollen grain containing an allele already present in a flower will not function on that flower. Because all pollen grains produced by a plant must contain one of the self-sterility alleles present in the plant, pollen cannot function on the same plant that produced it, and self-fertilization cannot take place. Under these conditions, any plant with a new allele has a selective advantage, because pollen that contains the new allele can fertilize all flowers except those on the same plant. Through evolution, populations of red clover have accumulated hundreds of alleles of the self-sterility gene, many of which have been isolated and their DNA sequences determined. Many of the alleles differ at multiple nucleotide sites, which implies that the alleles in the population are very old.
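The self-sterility rule described above (a pollen grain functions only if its allele differs from both alleles of the flower) can be written as a small predicate. This is a hypothetical sketch: the allele labels S1, S2, S3 and the function names are invented for illustration and are not taken from the text.

```python
def pollen_functions(pollen_allele, flower_genotype):
    """A pollen grain functions only if its self-sterility allele
    is different from both alleles present in the flower."""
    return pollen_allele not in flower_genotype


def functional_pollen(donor_genotype, flower_genotype):
    """Alleles from the donor plant whose pollen can fertilize the flower."""
    return {a for a in donor_genotype if pollen_functions(a, flower_genotype)}


# A plant's own pollen never functions on its own flowers.
print(functional_pollen(("S1", "S2"), ("S1", "S2")))   # empty set

# Pollen carrying a new allele (S3) functions on any flower lacking S3.
print(functional_pollen(("S1", "S3"), ("S1", "S2")))   # only S3 pollen
```

The first call shows why self-fertilization is blocked, and the second shows the advantage of a new allele: pollen carrying it is rejected only by flowers that already carry the same allele.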

Human ABO Blood Groups

In a multiple allelic series, there may be different dominance relationships between different pairs of alleles. An example is found in the human ABO blood groups, which are determined by three alleles denoted I^A, I^B, and I^O. (Actually, there are two slightly different variants of the I^A allele.) The blood group of any person may be A, B, AB, or O, depending on the type of polysaccharide (polymer of sugars) present on the surface of red blood cells. One of two different polysaccharides, A or B, can be formed from a precursor molecule that is modified by the enzyme product of the I^A or the I^B allele. The gene product is a glycosyl transferase enzyme that attaches a sugar unit to the precursor (Figure 3.21). The I^A and I^B alleles encode different forms of the enzyme with replacements at four amino acid sites; these alter the substrate specificity so that each enzyme attaches a different sugar. People of genotype I^A I^A produce red blood cells having only the A polysaccharide and are said to have blood type A. Those of genotype I^B I^B have red blood cells with only the B polysaccharide and have blood type B. Heterozygous I^A I^B people have red cells with both the A and the B polysaccharides and have blood type AB. The third allele, I^O, encodes an enzymatically inactive protein that leaves the precursor unchanged; neither the A nor the B type of polysaccharide is produced. Homozygous I^O I^O persons therefore lack both the A and the B polysaccharides and are said to have blood type O. In this multiple allelic series, the I^A and I^B alleles are codominant: The heterozygous genotype has the characteristics of both homozygous genotypes, namely the presence of both the A and the B carbohydrate on the red blood cells. On the other hand, the I^O allele is recessive to both I^A and I^B. Hence, heterozygous I^A I^O genotypes produce the A polysaccharide and have blood type A, and heterozygous I^B I^O genotypes produce the B polysaccharide and have blood type B. The genotypes and phenotypes of the ABO blood group system are summarized in the first three columns of Table 3.3.

[Figure 3.21: structure of the precursor carbohydrate and the products of the three transferases. I^O-encoded transferase: precursor unchanged; this is the "H" antigen, which reacts with neither anti-A nor anti-B antibody. I^A-encoded transferase: N-acetylgalactosamine added to precursor; this is the A antigen, which reacts with anti-A antibody. I^B-encoded transferase: galactose added to precursor; this is the B antigen, which reacts with anti-B antibody. Key: N-acetylglucosamine, N-acetylgalactosamine, galactose, fucose.]

Figure 3.21 The ABO antigens on the surface of human red blood cells are carbohydrates. They are formed from a precursor carbohydrate by the action of transferase enzymes encoded by alleles of the I gene. Allele I^O codes for an inactive enzyme and leaves the precursor (called the H substance) unmodified. The I^A allele encodes an enzyme that adds N-acetylgalactosamine (purple) to the precursor. The I^B allele encodes an enzyme that adds galactose (green) to the precursor. The other colored sugar units are N-acetylglucosamine (orange) and fucose (yellow).

Table 3.3 Genetic control of the human ABO blood groups

Genotype | Antigens present on red blood cells | ABO blood group phenotype | Antibodies present in blood fluid | Blood types that can be tolerated in transfusion | Blood types that can accept blood for transfusion
I^A I^A | A | Type A | Anti-B | A & O | A & AB
I^A I^O | A | Type A | Anti-B | A & O | A & AB
I^B I^B | B | Type B | Anti-A | B & O | B & AB
I^B I^O | B | Type B | Anti-A | B & O | B & AB
I^A I^B | A & B | Type AB | Neither anti-A nor anti-B | A, B, AB & O | AB only
I^O I^O | Neither A nor B | Type O | Anti-A & anti-B | O only | A, B, AB & O

ABO blood groups are important in medicine because of the need for blood transfusions. A crucial feature of the ABO system is that most human blood contains antibodies to either the A or the B polysaccharide. An antibody is a protein made by the immune system in response to a stimulating molecule called an antigen and is capable of binding to the antigen. An antibody is usually specific in that it recognizes only one antigen. Some antibodies combine with antigen and form large molecular aggregates that may precipitate. Antibodies act to defend against invading viruses and bacteria. Although antibodies do not normally form without prior stimulation by the antigen, people capable of producing anti-A and anti-B antibodies do produce them. Production of these antibodies may be stimulated by antigens that are similar to polysaccharides A and B and that are present on the surfaces of many common bacteria. However, a mechanism called tolerance prevents an organism from producing antibodies against its own antigens. This mechanism ensures that A antigen or B antigen elicits antibody production only in people whose own red blood cells do not contain A or B, respectively. The end result: People of blood type O make both anti-A and anti-B antibodies, those of blood type A make anti-B antibodies, those of blood type B make anti-A antibodies, and those of blood type AB make neither type of antibody.

The antibodies found in the blood fluid of people with each of the ABO blood types are shown in the fourth column in Table 3.3. The clinical significance of the ABO blood groups is that transfusion of blood containing A or B red-cell antigens into persons who make antibodies against them results in an agglutination reaction in which the donor red blood cells are clumped. In this reaction, the anti-A antibody agglutinates red blood cells of either blood type A or blood type AB, because both carry the A antigen (Figure 3.22). Similarly, anti-B antibody agglutinates red blood cells of either blood type B or blood type AB. When the blood cells agglutinate, many blood vessels are blocked, and the recipient of the transfusion goes into shock and may die. Incompatibility in the other direction, in which the donor blood contains antibodies against the recipient's red blood cells, is usually acceptable because the donor's antibodies are diluted so rapidly that clumping is avoided. The types of compatible blood transfusions are shown in the last two columns of Table 3.3. Note that a person of blood type AB can receive blood from a person of any other ABO type; type AB is called a universal recipient. Conversely, a person of blood type O can donate blood to a person of any ABO type; type O is called a universal donor.

[Figure 3.22: three panels, each showing red blood cells mixed with anti-A antibody. Red blood cells in type-A person (A antigen): antibody causes agglutination of type-A red blood cells. Red blood cells in type-B person (B antigen): no agglutination of type-B red blood cells. Red blood cells in type-AB person: antibody causes agglutination of type-AB red blood cells.]

Figure 3.22 Antibody against type-A antigen agglutinates red blood cells that carry the type-A antigen, whether or not they also carry the type-B antigen. Blood fluid containing anti-A antibody agglutinates red blood cells of type A and type AB, but not red blood cells of type B or type O.
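The logic of the ABO system, from genotype to antigens, antibodies, and transfusion compatibility, can be condensed into a few set operations. The sketch below is a simplification under stated assumptions: alleles are written as the plain strings "A", "B", and "O", the function names are invented for illustration, and only the donor-cell versus recipient-antibody incompatibility described in the text is modeled.

```python
def antigens(genotype):
    """Red-cell antigens made by a genotype such as ("A", "O").
    The O allele encodes an inactive enzyme and adds no antigen."""
    return {allele for allele in genotype if allele in ("A", "B")}


def blood_type(genotype):
    """ABO phenotype: A and B are codominant, O is recessive to both."""
    return "".join(sorted(antigens(genotype))) or "O"


def antibodies(genotype):
    """Tolerance: antibodies are made only against antigens the person lacks."""
    return {"A", "B"} - antigens(genotype)


def can_receive(recipient, donor):
    """Transfusion is tolerated when the donor's red cells carry no antigen
    that the recipient's antibodies recognize."""
    return not (antigens(donor) & antibodies(recipient))


genotypes = [("A", "A"), ("A", "O"), ("B", "B"), ("B", "O"), ("A", "B"), ("O", "O")]
for recipient in genotypes:
    donors = sorted({blood_type(d) for d in genotypes if can_receive(recipient, d)})
    print(blood_type(recipient), "can receive from", donors)
```

The printed rows reproduce the "tolerated in transfusion" column of Table 3.3: type AB accepts every type, and type O accepts only type O.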

Epistasis

In Chapter 1 we saw how the products of several genes may be necessary to carry out all the steps in a biochemical pathway. In genetic crosses in which two mutations that affect different steps in a single pathway are both segregating, the typical F2 ratio of 9 : 3 : 3 : 1 is not observed. Gene interaction that perturbs the normal Mendelian ratios is known as epistasis. One type of epistasis is illustrated by the interaction of the C, c and P, p allele pairs affecting flower coloration in peas. These genes encode enzymes in the biochemical pathway for the synthesis of anthocyanin pigment, and the production of anthocyanin requires the presence of at least one wildtype dominant allele of each gene. The proper way to represent this situation genetically is to write the required genotype as

C- P-

where each dash is a “blank” that may be filled with either allele of the gene. Hence C- comprises the genotypes CC and Cc, and likewise P- comprises the genotypes PP and Pp. All four genotypes included in the symbol C- P-, and only these genotypes, have purple flowers. Figure 3.23 shows a cross between the homozygous recessive genotypes pp and cc. The phenotype of the flowers in the F1 generation is purple because the genotype is Cc Pp. Self-fertilization of the F1 plants (indicated by the encircled cross sign) results in the F2 progeny genotypes shown in the Punnett square. Because only the C- P- progeny have purple flowers, the ratio of purple flowers to white flowers in the F2 generation is 9 : 7. The epistasis does not change the result of independent segregation; it merely conceals the fact that the underlying ratio of the genotypes C- P- : C- pp : cc P- : cc pp is 9 : 3 : 3 : 1.

Figure 3.23 Epistasis in the determination of flower color in peas. Formation of the purple pigment requires the dominant allele of both the C and P genes. With this type of epistasis, the dihybrid F2 ratio is modified to 9 purple : 7 white. [Diagram: parents CC pp (homozygous mutant pp) and cc PP (homozygous mutant cc); F1 double heterozygote Cc Pp (complementation is observed), self-fertilized as shown by the encircled X; F2 Punnett square of male and female gametes CP, Cp, cP, and cp; F2 ratio 9 purple : 7 white.]

For a trait determined by the interaction of two genes, each with a dominant allele, there are only a limited number of ways in which the 9 : 3 : 3 : 1 dihybrid ratio can be modified. The possibilities are illustrated in Figure 3.24. In part A are the genotypes produced in the F2 generation by independent segregation. In the absence of epistasis, the F2 ratio of phenotypes is 9 : 3 : 3 : 1. The possible modified ratios are shown in part B. In each row, the color coding indicates phenotypes that are indistinguishable because of epistasis, and the resulting modified ratio is given. For example, in the modified ratio at the bottom, the phenotypes of the “3 : 3 : 1” classes are indistinguishable, resulting in a 9 : 7 ratio. This is the ratio observed in the segregation of the C, c and P, p alleles in Figure 3.23, with its 9 : 7 ratio of purple to white flowers. Taking all the possible modified ratios in Figure 3.24B together, there are nine possible dihybrid ratios when both genes show complete dominance. Examples of each of the modified ratios are known. The most frequently encountered modified ratios are 9 : 7, 12 : 3 : 1, 13 : 3, 9 : 4 : 3, and 9 : 6 : 1. The types of epistasis that result in these modified ratios are illustrated in the following examples, which are taken from a variety of organisms. Other examples can be found in the problems at the end of the chapter.

Figure 3.24 Modified F2 dihybrid ratios. (A) The F2 genotypes of two independently assorting genes with complete dominance result in a 9 : 3 : 3 : 1 ratio of phenotypes, provided there is no interaction (epistasis) between the genes. (B) If there is epistasis that renders two or more of the phenotypes indistinguishable (indicated by the colors), then the F2 ratio is modified. [Panel A: 1 AA BB, 2 AA Bb, 2 Aa BB, 4 Aa Bb (total 9); 1 AA bb, 2 Aa bb (total 3); 1 aa BB, 2 aa Bb (total 3); 1 aa bb (total 1). Panel B: the unmodified ratio 9 : 3 : 3 : 1, followed by the modified F2 ratios 12 : 3 : 1, 10 : 3 : 3, 9 : 6 : 1, 9 : 4 : 3, 15 : 1, 13 : 3, 12 : 4, 10 : 6, and 9 : 7.]

9 : 7 is observed when a homozygous recessive mutation in either or both of two different genes results in the same mutant phenotype, as in Figure 3.23.

12 : 3 : 1 results when the presence of a dominant allele at one locus masks the genotype at a different locus, such as the A- genotype rendering the B- and bb genotypes indistinguishable. For example, in a genetic study of the color of the hull in oat seeds, a variety with white hulls was crossed with a variety with black hulls. The F1 hybrid seeds had black hulls. Among 560 progeny in the F2 generation, the hull phenotypes observed were 418 black, 106 gray, and 36 white. The ratio of phenotypes is 11.6 : 2.9 : 1, or very nearly 12 : 3 : 1. A genetic hypothesis to explain these results is that the black-hull phenotype is due to the presence of a dominant allele A and the gray-hull phenotype is due to another dominant allele B whose effect is apparent only in aa genotypes. On the basis of this hypothesis, the original varieties had genotypes aa bb (white) and AA BB (black). The F1 has genotype Aa Bb (black). If the A, a allele pair and the B, b allele pair undergo independent assortment, then the F2 generation is expected to have the genotypic and phenotypic composition 9/16 A- B- (black hull), 3/16 A- bb (black hull), 3/16 aa B- (gray hull), 1/16 aa bb (white hull), or 12 : 3 : 1.

13 : 3 is illustrated by the difference between White Leghorn chickens (genotype CC II) and White Wyandotte chickens (genotype cc ii). Both breeds have white feathers, but for different reasons: the C allele is necessary for colored feathers, whereas the I allele in White Leghorns is a dominant inhibitor of feather coloration. The F1 generation of a dihybrid cross between these breeds has the genotype Cc Ii, which is expressed as white feathers because of the inhibitory effects of the I allele. In the F2 generation, only the C- ii genotype has colored feathers, so there is a 13 : 3 ratio of white : colored.

9 : 4 : 3 is observed when homozygosity for a recessive allele with respect to one gene masks the expression of the genotype of a different gene. For example, if the aa genotype has the same phenotype regardless of whether the genotype is B- or bb, then the 9 : 4 : 3 ratio results. In mice, for instance, the grayish “agouti” coat color results from a horizontal band of yellow pigment just beneath the tip of each hair. The agouti pattern is due to a dominant allele A, and in aa animals the coat color is black. A second dominant allele, C, is necessary for the formation of hair pigments of any kind, and cc animals are albino (white). In a cross of AA CC (agouti) × aa cc (albino), the F1 progeny are Aa Cc and phenotypically agouti. Crosses between F1 males and females produce F2 progeny in the proportions 9/16 A- C- (agouti), 3/16 A- cc (albino), 3/16 aa C- (black), 1/16 aa cc (albino), or 9 agouti : 4 albino : 3 black.

9 : 6 : 1 implies that homozygosity for either of two recessive alleles yields the same phenotype but that the phenotype of the double homozygote is different. In Duroc–Jersey pigs, red coat color requires the presence of two dominant alleles R and S. Pigs of genotype R- ss and rr S- have sandy-colored coats, and rr ss pigs are white. The F2 ratio is therefore 9/16 R- S- (red), 3/16 R- ss (sandy), 3/16 rr S- (sandy), 1/16 rr ss (white), or 9 red : 6 sandy : 1 white.
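The modified ratios described in this section can be reproduced by enumerating the sixteen equally likely gamete combinations of a dihybrid cross and collapsing them into phenotypes. The sketch below is only an illustration under the stated hypotheses: the function name `f2_ratio` and the phenotype rules passed to it are assumptions made for this example, and both loci are written generically as A, a and B, b regardless of the gene symbols used for each organism.

```python
from collections import Counter
from itertools import product

def f2_ratio(phenotype):
    """Count F2 phenotypes from Aa Bb x Aa Bb with independent assortment.

    `phenotype(a_dom, b_dom)` maps the presence of a dominant allele at
    each locus to a phenotype label.
    """
    gametes = list(product("Aa", "Bb"))  # AB, Ab, aB, ab in equal proportions
    counts = Counter()
    for (a1, b1), (a2, b2) in product(gametes, repeat=2):
        a_dom = "A" in (a1, a2)  # at least one dominant allele at locus 1
        b_dom = "B" in (b1, b2)  # at least one dominant allele at locus 2
        counts[phenotype(a_dom, b_dom)] += 1
    return dict(counts)

# Peas: purple requires a dominant allele of both genes (9 : 7).
peas = f2_ratio(lambda a, b: "purple" if a and b else "white")
# Oats: A gives black; B gives gray only in aa genotypes (12 : 3 : 1).
oats = f2_ratio(lambda a, b: "black" if a else ("gray" if b else "white"))
# Chickens: colored only when C is present and the inhibitor I is absent (13 : 3).
chickens = f2_ratio(lambda c, i: "colored" if c and not i else "white")
# Mice: second locus recessive homozygote is albino whatever the first locus (9 : 4 : 3).
mice = f2_ratio(lambda a, c: "albino" if not c else ("agouti" if a else "black"))
# Pigs: both dominants red, either one alone sandy, neither white (9 : 6 : 1).
pigs = f2_ratio(lambda r, s: "red" if r and s else ("sandy" if r or s else "white"))

# Observed oat hull counts from the text, scaled to the smallest class.
observed = [418, 106, 36]
observed_ratio = [round(n / observed[-1], 1) for n in observed]
print(oats, observed_ratio)  # {'black': 12, 'gray': 3, 'white': 1} [11.6, 2.9, 1.0]
```

Writing each type of epistasis as a rule on the two "dominant present" flags makes it clear that every modified ratio is a regrouping of the same 9 : 3 : 3 : 1 genotype classes.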

3.7 Genetic Analysis: Mutant Screens and the Complementation Test

When a geneticist makes a statement such as “Flower color in peas is determined by a single gene with dominant allele P and recessive allele p,” this does not mean that only one gene is needed for flower color. Implicit in the statement is the existence of two strains or varieties, one with colored flowers and one with white flowers, in which the difference is caused by one strain having genotype PP and the other pp. Many genes besides P are also necessary for purple flower coloration. Among them are other genes in the biochemical pathway for the synthesis of anthocyanin, as well as different genes needed for expression of the biochemical pathway in developing flowers. A third variety of peas that has white flowers may not have the genotype pp. In this variety, the white flowers could be caused by a mutation in one of the other genes needed for flower color. We have already seen an example in Figure 3.23, in which the white flower at the upper right results from genotype cc. Furthermore, a geneticist interested in understanding the genetic basis of flower color would rarely be satisfied with having identified the P gene alone or both the P gene and the C gene. The ultimate goal of a genetic analysis of flower color would be, by isolating new mutations, to identify every gene necessary for purple coloration and then, through further study of the mutant phenotypes, to determine the normal function of each of the genes that affect the trait.

The Complementation Test in Gene Identification

In a genetic analysis of flower color, a geneticist would begin by isolating many new mutants with white flowers. Although mutations are usually very rare, their frequency can be increased by treatment with radiation or certain chemicals. The isolation of a set of mutants, all of which show the same type of phenotype, is called a mutant screen. Among the mutants that are isolated, some will contain mutations in genes already identified. For example, a genetic analysis of flower color in peas might yield one or more new mutations that changed the wildtype P allele into a recessive allele that blocks the formation of the purple pigment. Each of these mutations might differ in DNA sequence, forming a set of multiple alleles that are all mutant forms of the wildtype P allele. On the other hand, a mutant screen should also yield mutations in genes not previously identified. Each of the new genes might also be represented by multiple mutant alleles. In a mutant screen for flower color, most of the new mutations will be recessive. This is because most of the new mutant alleles will encode an inactive protein of some kind, and most protein-coding genes are expressed at such a level that one copy of the wildtype allele is sufficient to yield a normal morphological phenotype. (The molecular phenotype is often intermediate; the heterozygotes have half as much protein as the wildtype homozygotes.) New recessive alleles are therefore identified in homozygous genotypes, because homozygous recessive plants have white flowers.

Among all the new mutant strains that are isolated, some will have a recessive mutation in one gene, others in a second gene, still others in a third gene, and so on. Complicating the issue are multiple alleles, because two or more independently isolated mutations may be alleles of the same wildtype gene. How can the strains be sorted out? How can the geneticist determine which mutations are alleles of the same gene and which are mutations in different genes? Because the new mutations are recessive, all one has to do is cross the homozygous genotypes. The phenotype of the F1 progeny yields the answer.

This principle is illustrated in Figure 3.25. In part A we suppose that the mutations are recessive alleles of the same gene. Let us call them a1 and a2. Then the parental mutant strains have genotypes a1a1 and a2a2, and the F1 progeny have the genotype a1a2. Because there is no wildtype allele at the locus, the phenotype of the F1 is white (mutant). This result is called noncomplementation, and it means that the parental strains are homozygous for recessive alleles of the same gene. In part B we suppose the parental mutant strains are homozygous for recessive alleles of different genes, say a1a1 and b1b1. With respect to the B locus, the genotype of the a1a1 strain is BB, and with respect to the A locus, the genotype of the b1b1 strain is AA; hence the genotypes of the mutant strains could be written more completely as a1a1 BB and AA b1b1. In this case, the F1 progeny have the genotype Aa1 Bb1. The A allele masks the mutation a1 and the B allele masks the mutation b1; hence the phenotype is purple (wildtype). This result is called complementation, and it means that the parental strains are homozygous for recessive alleles of different genes.

Figure 3.25 The complementation test reveals whether two recessive mutations are alleles of the same gene. In the complementation test, homozygous recessive genotypes are crossed. If the phenotype of the F1 progeny is mutant (A), it means that the mutations in the parental strains are alleles of the same gene. If the phenotype of the F1 progeny is nonmutant (B), it means that the mutations in the parental strains are alleles of different genes. [Diagram: P1 generation crosses between strains homozygous for recessive mutations; F1 generation with either white flowers (the mutant phenotype indicates noncomplementation, so the mutations are defects in the same gene) or purple flowers (the wildtype phenotype indicates complementation, so the mutations are defects in different genes).]

The kind of cross illustrated in Figure 3.25 is a complementation test. As we have seen, it is used to determine whether recessive mutations in each of two different strains are alleles of the same gene. Because the result indicates the presence or absence of allelism, the complementation test is one of the key experimental operations in genetics.

To illustrate the application of the test in practice, suppose a mutant screen were carried out to isolate more mutations for white flowers. Starting with a true-breeding strain with purple flowers, we treat pollen with x rays and use the irradiated pollen to fertilize ovules to obtain seeds. The F1 seeds are grown and the resulting plants allowed to self-fertilize, after which the F2 plants are grown. A few of the F1 seeds may contain a new recessive mutation, but in this generation the genotype is heterozygous. Self-fertilization of such plants results in F2 progeny in a ratio of 3 purple : 1 white. Because mutations resulting in a particular phenotype are quite rare, even when induced by radiation, only a few among many thousands of self-fertilized plants will be found to have a new white-flower mutation. Let us suppose that we were lucky enough to obtain six new mutations. How are we going to name these mutations? We can make no assumptions about the number of genes represented. As far as we know at this stage, they could all be alleles of the same gene, or they could all be alleles of different genes. This is what we need to establish using the complementation test. For the moment, then, let us assign the mutations arbitrary names in series (x1, x2, x3, and so forth) with no implication about which may be alleles of each other.

Figure 3.26 Results of complementation tests among six mutant strains of peas, each homozygous for a recessive allele resulting in white flowers. Each box gives the phenotype of the F1 progeny of a cross between the male parent whose genotype is indicated in the far left column and the female parent whose genotype is indicated in the top row. [Triangular array: male parents x1x1 through x5x5 in rows, female parents x2x2 through x6x6 in columns. A + sign indicates complementation (mutations not allelic); a − sign indicates lack of complementation (mutations allelic).]

The next step is to classify the mutations into groups using the complementation test. Figure 3.26 shows the results, conventionally reported in a triangular array of + and − signs. The crosses that yield F1 progeny with the wildtype phenotype (in this case, purple flowers) are denoted with a + sign in the corresponding box, whereas those that yield F1 progeny with the mutant phenotype (white flowers) are denoted with a − sign. The + signs indicate complementation between the mutant alleles in the parents, and the − signs indicate noncomplementation. (The bottom half of the triangle is unnecessary because reciprocal crosses yield the same results, and the diagonal elements are unnecessary because each strain is true-breeding within itself and so yields mutant progeny.) As we saw in Figure 3.25, complementation in a cross means that the parental strains have mutations in different genes. Lack of complementation means that the parental mutations are in the same gene. The following principle underlies the complementation test.

The Principle of Complementation: If two recessive mutations are alleles of the same gene, then a cross between homozygous strains yields F1 progeny that are mutant (noncomplementation); if they are alleles of different genes, then the F1 progeny are wildtype (complementation).

In interpreting complementation data such as those in Figure 3.26, we actually apply the principle the other way around. We examine the phenotype of the F1 progeny of each possible cross and then infer whether or not the parental strains have mutant alleles of the same gene. In a complementation test, if the combination of two recessive mutations results in a mutant phenotype, then the mutations are regarded as alleles of the same gene; if the combination results in a wildtype phenotype, then the mutations are regarded as alleles of different genes.


A convenient way to analyze the data in Figure 3.26 is to arrange the alleles in a circle as shown in Figure 3.27A. Then, for each possible pair of mutations, connect the pair by a straight line if the mutations fail to complement (Figure 3.27B). According to the principle of complementation, the lines must connect mutations that are alleles of each other, because in a complementation test, lack of complementation means that the mutations are alleles. Each of the groups of noncomplementing mutations is called a complementation group. As we have seen, each complementation group defines a gene, so the complementation test provides the geneticist’s technical definition:

A gene is defined experimentally as a set of mutations that make up a single complementation group. Any pair of mutations within the group fail to complement one another, and the complementation test yields organisms with a mutant phenotype.

The mutations in Figure 3.27 therefore represent three genes, mutation of any one of which results in white flowers. Mutations x1 and x5 are alleles of one gene; x3 is an allele of a different gene; and x2, x4, and x6 are alleles of a third gene.

What about the genes P and C (Figure 3.23) that also affect flower color? The complementation test tells us nothing about these, because we have not yet included them in any of the crosses. To test allelism with P and C, we would have to cross one mutant strain from each complementation group with strains of genotype pp and cc. If the F1 progeny are mutant, the result identifies the complementation group with an already known mutation. For example, suppose we cross each of the strains x1x1, x3x3, and x4x4 with pp and cc. Suppose further that the progeny are all wildtype except in two crosses, x1x1 × pp and x3x3 × cc, in which the progeny are mutant. This result implies that the complementation group consisting of x1 and x5 also includes p and that the complementation group consisting of x3 also includes c.

At this point in the genetic analysis, it is advisable to rename the mutations to indicate which ones are true alleles. (There is an old Chinese saying that the correct naming of things is the beginning of wisdom, and this is certainly true in the case of genes.) Because the p allele already had its name before the mutation screen was carried out, the new alleles of p (that is, x1 and x5) should be renamed to reflect their allelism with p. Old names have priority, and so we must use the symbol p embellished with some sort of identifier. We might, for example, rename the x1 and x5 mutations p2 and p3 to indicate that they were the second and third mutations of P to be discovered and to convey their independent origins. Similarly, we might rename x3 (the new allele of c) as c2. The remaining complementation group consisting of x2, x4, and x6 is a new one, and we can name it arbitrarily. For example, we might call the locus albus (Latin for “white”) with the gene symbol alb and call the alleles alb1, alb2, and alb3. The wildtype dominant allele of alb, which is necessary for purple coloration, would then be represented as Alb or as alb+. The procedure of sorting new mutations into complementation groups and renaming them according to their allelism is an example of how geneticists identify genes and name alleles. Such renaming of alleles is the typical manner in which genetic terminology evolves as knowledge advances.
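The circle-and-lines procedure amounts to finding groups of mutations linked by noncomplementation. The following sketch shows one way to do this; the function name `complementation_groups` is illustrative, and the list of noncomplementing pairs is an assumption chosen to be consistent with the three groups stated in the text rather than data read from Figure 3.26.

```python
mutations = ["x1", "x2", "x3", "x4", "x5", "x6"]

# Pairs whose F1 progeny are mutant (a line in the circle diagram).
# Hypothetical data, consistent with the groups described in the text.
noncomplementing = {("x1", "x5"), ("x2", "x4"), ("x2", "x6"), ("x4", "x6")}

def complementation_groups(mutations, noncomplementing):
    """Group mutations that are connected by lack of complementation."""
    parent = {m: m for m in mutations}

    def find(m):
        # Follow links until reaching the representative of m's group.
        while parent[m] != m:
            m = parent[m]
        return m

    for a, b in noncomplementing:
        parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for m in mutations:
        groups.setdefault(find(m), []).append(m)
    return sorted(groups.values())

groups = complementation_groups(mutations, noncomplementing)
print(groups)  # [['x1', 'x5'], ['x2', 'x4', 'x6'], ['x3']]
```

Each returned list corresponds to one complementation group, that is, one gene in the experimental sense defined above; a mutation with no connecting line, such as x3, forms a group of its own.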

Figure 3.27 A method for interpreting the results of complementation tests. (A) Arrange the mutations in a circle. (B) Connect by a straight line any pair of mutations that fail to complement (that yield a mutant phenotype); any pair of mutations so connected are alleles of the same gene. In this example, there are three complementation groups, each of which represents a single gene needed for purple flower coloration. [Diagram annotations: The line connecting x1 and x5 means that they fail to complement one another when the parents are crossed; they are alleles of the same gene. The lines connecting x2, x4, and x6 mean that they fail to complement one another in all combinations in which the parents are crossed; they are alleles of yet another gene. x3 complements all the other alleles; it represents another gene that affects flower coloration.]

Figure 3.28 Molecular interpretation of a complementation test used in determining whether two mutations are alleles of the same gene (A) or alleles of different genes (B). [Panel A: trans heterozygote for two mutations, x1 and x2, in the same gene; both copies yield a mutant (nonfunctional) gene product. Result: no complementation; no functional gene product, therefore mutant phenotype (white flower color). Panel B: trans heterozygote for two mutations in different genes (x1 x2+ on one homolog, x1+ x2 on the other); each gene yields one mutant (nonfunctional) and one normal (functional) gene product. Result: complementation; functional product from both genes, therefore wildtype phenotype (purple flower color).]

Why Does the Complementation Test Work?

The molecular basis of complementation is illustrated in Figure 3.28. Part A depicts the situation when two recessive mutations, x1 and x2, each resulting in white flowers, are different mutations in the same gene. (The site of each mutation is indicated by a “burst.”) The cross x1x1 × x2x2 produces an F1 hybrid in which x1 and x2 are in homologous DNA molecules; x1 encodes a protein with one type of defect, and x2 encodes a protein with a different type of defect, but both types of protein are nonfunctional. Hence alleles in the same gene yield a mutant phenotype (white flowers), because neither mutation encodes a wildtype form of the protein.


When the mutations are alleles of different genes, the situation is as depicted in Figure 3.28B. Because the mutations are in different genes, the homozygous x1 strain is also homozygous for the wildtype allele x2+ of the second locus; likewise, the homozygous x2 strain is also homozygous for the wildtype allele x1+ of the first locus. Hence, the same cross that yields the genotype x1x2 in the case of allelic mutations (Figure 3.28A) yields x1+x1 x2+x2 in the case of different genes. This is a double heterozygous genotype. Because the mutations are both recessive and in different genes, they do complement each other and yield an organism with a wildtype phenotype (purple flowers). With respect to the protein rendered defective by x1, there is a functional form encoded by the wildtype allele brought in from the x2x2 parent. With respect to the protein rendered defective by x2, there is again a functional form encoded by the wildtype allele brought in from the x1x1 parent. Because a functional form of both proteins is produced, the result is a normal phenotype, or complementation.
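The molecular argument can be condensed into a small model. This is a minimal sketch under the assumptions stated in the text (each strain is homozygous for a recessive, nonfunctional mutation in exactly one gene and wildtype everywhere else); the function name `f1_phenotype` and the gene labels are hypothetical.

```python
def f1_phenotype(parent1_mutant_gene: str, parent2_mutant_gene: str) -> str:
    """Phenotype of the F1 from a cross of two homozygous recessive strains.

    Each parent contributes one functional (wildtype) allele of every gene
    except the single gene in which it is homozygous mutant.
    """
    genes = {parent1_mutant_gene, parent2_mutant_gene}
    # A gene has a functional product if at least one parent supplies
    # a wildtype allele of that gene.
    functional = {
        g: (g != parent1_mutant_gene) or (g != parent2_mutant_gene)
        for g in genes
    }
    if all(functional.values()):
        return "purple (complementation)"
    return "white (noncomplementation)"

print(f1_phenotype("gene1", "gene1"))  # white (noncomplementation)
print(f1_phenotype("gene1", "gene2"))  # purple (complementation)
```

The model ignores dominant mutations and interactions between mutant proteins; it captures only the case of simple recessive loss of function that the complementation test assumes.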

Chapter Summary Men delian gen etics deals with th e h ereditary tran sm ission of gen es from on e gen eration to th e n ext. On e key prin ciple is segregation , in wh ich th e two alleles in an in dividu al separate du rin g th e form ation of gam etes so th at each gam ete is equ ally likely to con tain eith er m em ber of th e pair. In a cross su ch as AA ⫻ aa, in wh ich on ly on e gen e is con sidered, th e gen otype of th e offsprin g (con stitu tin g th e F1 gen eration ) is h eterozygou s Aa. Th e ph en otype of th e F1 progen y depen ds on th e dom in an ce relation sh ips am on g th e alleles. For m an y m orph ological traits, th e wildtype allele, h ere den oted A , is dom in an t, an d th e ph en otype of h eterozygou s Aa is in distin gu ish able from th at of h om ozygou s AA . In con trast, th e alleles of m olecu lar gen etic m arkers are often codom in an t, an d th e ph en otypes of AA, Aa, an d aa are all distin ct. For an RFLP m arker, for exam ple, th e h om ozygou s AA an d aa gen otypes h ave a ph en otype con sistin g of a sin gle ban d differin g in electroph oretic m obility, wh ereas th e h eterozygou s Aa gen otype h as a ph en otype con sistin g of both ban ds. In th e form ation of gam etes, an Aa gen otype produ ces A -bearin g an d a-bearin g gam etes in equ al proportion s. Hen ce in th e F1 ⫻ F1 cross Aa ⫻ Aa, assu m in g ran dom u n ion of gam etes in fertilization , th e progen y (th e F2 gen eration ) are expected to con sist of gen otypes AA : Aa : aa in th e proportion s 1 : 2 : 1. Th e distribu tion of ph en otypes in th e F2 gen eration again depen ds on th e dom in an ce relation sh ips. If A is dom in an t to a, th en th e F2 ratio of dom in an t : recessive ph en otypes is expected to be 3 : 1. With codom in an ce all th ree gen otypes are distin gu ish able, an d th e ratio of F2 ph en otypes is 1 : 2 : 1. For crosses in volvin g two gen es—for exam ple, AA BB ⫻ aa bb—th e F1 gen otype is Aa Bb. 
Segregation of each gene implies that the ratios of A : a and of B : b gametes are both 1 : 1. If the genes are unlinked, they undergo independent segregation (independent assortment), and the gametic types A B : A b : a B : a b are formed in the ratio 1 : 1 : 1 : 1. Hence the F2 generation formed from the cross F1 × F1 is expected to have genotypes given by the product of the expression (1/4 AA + 1/2 Aa + 1/4 aa) × (1/4 BB + 1/2 Bb + 1/4 bb). Using a dash to represent an allele of unspecified type, we can write the F2 genotypes as 9 A- B- : 3 A- bb : 3 aa B- : 1 aa bb, and if both A and B are dominant, the phenotypic ratio in the F2 generation is 9 : 3 : 3 : 1. This ratio can be modified in various ways by interaction between the genes (epistasis). Different types of epistasis may result in dihybrid ratios such as 9 : 7 or 12 : 3 : 1 or 13 : 3 or 9 : 4 : 3.

The rules of probability provide the basis for predicting the outcomes of genetic crosses based on the principles of segregation and independent assortment. Two basic rules for combining probabilities are the addition rule and the multiplication rule. The addition rule applies to mutually exclusive events; it states that the probability of the realization of either one or the other of two events equals the sum of the respective probabilities. The multiplication rule applies to independent events; it states that the probability of the simultaneous realization of both of two events is equal to the product of the respective probabilities.

In some organisms, including human beings, it is not possible to perform controlled crosses, and genetic analysis is accomplished through the study of pedigrees through two or more generations. Pedigree analysis is the determination of the possible genotypes of the family members in a pedigree and of the probability that an individual member has a particular genotype. The goal of pedigree analysis is often to infer the genetic basis of an inherited disease or other condition; for example, to determine whether it may be due to a simple dominant or recessive allele.

Although most morphological traits do not show simple Mendelian patterns of inheritance in pedigrees, molecular markers usually do. Prominent among these are SNPs (single-nucleotide polymorphisms), RFLPs (restriction fragment length polymorphisms), and STRPs (simple tandem repeat polymorphisms).

Multiple alleles are often encountered in natural populations or as a result of mutant screens. Genes with multiple alleles may have as few as three, such as those for the ABO blood groups, to as many as a hundred or more. Examples of large numbers of alleles include the genes used in DNA typing and the self-sterility alleles in some flowering plants. Although there may be multiple alleles in a population, each gamete can carry only one allele of each gene, and each organism can carry at most two different alleles of each gene.

The complementation test is the functional definition of a gene. If a cross between two homozygous recessive genotypes results in nonmutant progeny, the mutations are said to complement one another. Complementation is evidence that the alleles are mutations in different genes. On the other hand, if a cross between two homozygous recessives results in mutant progeny, then the alleles show noncomplementation (they fail to complement). Noncomplementation is evidence that the mutations are alleles of the same gene. For any group of recessive mutations, a complete complementation test entails crossing the homozygous recessives in all pairwise combinations. At the molecular level, lack of complementation implies that allelic mutations impair the function of the same protein molecule.
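The logic of a complete complementation test can be sketched in a few lines of code. The following is a minimal illustration, not a standard tool: the function name `complementation_groups`, the mutant labels, and the list of noncomplementing pairs are all assumptions chosen for the example. Mutants that fail to complement one another are merged into the same group, and each resulting group corresponds to one gene.

```python
def complementation_groups(mutants, noncomplementing_pairs):
    """Group mutants so that any two that fail to complement share a group.

    mutants: list of mutant labels (hypothetical, e.g. "a" through "f").
    noncomplementing_pairs: pairs whose cross yields mutant progeny.
    """
    # Each mutant starts in its own group (simple union-find structure).
    parent = {m: m for m in mutants}

    def find(m):
        # Follow parent links to the representative of m's group.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    # Failure to complement means "same gene", so merge the two groups.
    for x, y in noncomplementing_pairs:
        parent[find(x)] = find(y)

    groups = {}
    for m in mutants:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())


# Illustrative data: every pair not listed is assumed to complement.
mutants = ["a", "b", "c", "d", "e", "f"]
pairs = [("a", "e"), ("a", "f"), ("e", "f"), ("b", "c")]
groups = complementation_groups(mutants, pairs)
print(groups)  # [['a', 'e', 'f'], ['b', 'c'], ['d']]
```

A mutant that complements every other mutant, such as d in this example, ends up in a group of its own.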

Chapter Summary


Key Terms

addition rule
antibody
antigen
backcross
carrier
codominance
complementation
complementation group
complementation test
consanguineous mating
epistasis
F1 generation
F2 generation
gamete
hybrid
incomplete dominance
independent assortment
Mendelian genetics
multiple alleles
multiplication rule
mutant screen
noncomplementation
outcrossing
P1 generation
pedigree
penetrance
Punnett square
reciprocal cross
segregation
sib
sibling
testcross
transmission genetics
transposable element
true-breeding
wildtype
unlinked genes

Review the Basics

• What is the principle of segregation, and how is this principle demonstrated in the results of a single-gene (monohybrid) cross?
• What is the principle of independent assortment, and how is this principle demonstrated in the results of a two-gene (dihybrid) cross?
• Explain why random union of male and female gametes is necessary for Mendelian segregation and independent assortment to be observed in the progeny of a cross.
• What is the difference between mutually exclusive events and independent events? How are the probabilities of these two types of events combined? Give two examples of genetic events that are mutually exclusive and two examples of genetic events that are independent.
• When two pairs of alleles show independent assortment, under what conditions will a 9 : 3 : 3 : 1 ratio of phenotypes in the F2 generation not be observed?
• Explain the following statement: “Among the F2 progeny of a dihybrid cross, the ratio of genotypes is 1 : 2 : 1, but among the progeny that express the dominant phenotype, the ratio of genotypes is 1 : 2.”
• What are the principal features of human pedigrees in which a rare dominant allele is segregating? In which a rare recessive allele is segregating?
• What is a mutant screen and how is it used in genetic analysis?
• Explain the statement: “In genetics, a gene is identified experimentally by a set of mutant alleles that fail to show complementation.” What is complementation? How does a complementation test enable a geneticist to determine whether two different mutations are or are not mutations in the same gene?
• What does it mean to say that epistasis results in a “modified dihybrid F2 ratio?” Give two examples of a modified dihybrid F2 ratio, and explain the gene interactions that result in the modified ratio.

Guide to Problem Solving

Problem 1
Explain what each of the following ratios represents in the F2 progeny of a single-gene (monohybrid) cross. Which are ratios of genotypes and which are ratios of phenotypes?
(a) 3 : 1
(b) 1 : 2 : 1
(c) 2 : 1

Answer
(a) 3 : 1 is the expected ratio of phenotypes when one of the alleles is dominant.
(b) 1 : 2 : 1 is the expected ratio of genotypes AA : Aa : aa.
(c) 2 : 1 is the expected ratio of heterozygous Aa genotypes to homozygous AA genotypes.

Problem 2
Complete the table by inserting 0, 1, 1/2, or 1/4 for the probability of each genotype of progeny from each type of mating. For which mating are the parents identical in genotype but the progeny as variable in genotype as they can be for a single locus? For which mating are the parents as different as they can be for a single locus but the progeny identical to each other and different from either parent?

             Progeny genotypes
Mating       AA     Aa     aa
AA × AA
AA × Aa
AA × aa
Aa × Aa
Aa × aa
aa × aa

This table is to transmission genetics what the multiplication table is to arithmetic. It is fundamental to being able to solve almost any type of quantitative problem in transmission genetics. It needs to be thoroughly understood (memorized, if necessary!).

Answer

             Progeny genotypes
Mating       AA     Aa     aa
AA × AA      1      0      0
AA × Aa      1/2    1/2    0
AA × aa      0      1      0
Aa × Aa      1/4    1/2    1/4
Aa × aa      0      1/2    1/2
aa × aa      0      0      1

The mating for which the parents are identical in genotype, but the progeny are as variable in genotype as they can be for a single locus, is Aa × Aa. The mating for which the parents are as different as they can be for a single locus, but the progeny are identical to each other and different from either parent, is AA × aa.

Problem 3
The data in the accompanying complementation matrix summarize the result of crosses between mutants designated a through f. (a) Configure the allele symbols in the form of a circle, and use straight lines to connect the alleles that are in the same complementation group. (b) State the conclusion in words: How many complementation groups are indicated, and which mutants are in each complementation group?

     a    b    c    d    e    f
a         +    +    +    −    −
b              −    +    +    +
c                   +    +    +
d                        +    +
e                             −
f

Answer
(a) Follow the strategy outlined in the text, in which the alleles are placed in the shape of a circle and pairs of alleles that fail to show complementation are connected by a line. The resulting pattern is a visual representation of the complementation groups.

[Figure: alleles a through f arranged in a circle, with lines connecting the alleles that fail to complement]

(b) The mutants define three complementation groups (“genes”). One complementation group consists of the alleles a, e, and f; another of the alleles b and c; and the other of the allele d only.

Problem 4
The accompanying illustration shows four alternative types of combs in chickens; they are called rose, pea, single, and walnut. The following data summarize the results of crosses. The rose and pea strains used in crosses 1, 2, and 5 are true breeding.
1. rose × single → rose
2. pea × single → pea
3. (rose × single) F1 × (rose × single) F1 → 3 rose : 1 single
4. (pea × single) F1 × (pea × single) F1 → 3 pea : 1 single
5. rose × pea → walnut
6. (rose × pea) F1 × (rose × pea) F1 → 9 walnut : 3 rose : 3 pea : 1 single

[Figure: rose comb, pea comb, single comb, and walnut comb]

(a) What genetic hypothesis can explain these results? (b) What are the genotypes of parents and progeny in each of the crosses? (c) What are the genotypes of true-breeding strains of rose, pea, single, and walnut?

Answer
(a) Cross 6 gives the Mendelian ratios expected when two genes are segregating, so a genetic hypothesis with two genes is necessary. Crosses 1 and 3 give the results expected if rose comb were due to a dominant allele (say, R). Crosses 2 and 4 give the results expected if pea comb were due to a dominant allele (say, P). Cross 5 indicates that walnut comb results from the interaction of R and P. The segregation in cross 6 means that R and P are not alleles of the same gene.
(b)
1. RR pp × rr pp → Rr pp
2. rr PP × rr pp → rr Pp
3. Rr pp × Rr pp → 3/4 R- pp : 1/4 rr pp
4. rr Pp × rr Pp → 3/4 rr P- : 1/4 rr pp
5. RR pp × rr PP → Rr Pp
6. Rr Pp × Rr Pp → 9/16 R- P- : 3/16 R- pp : 3/16 rr P- : 1/16 rr pp
(c) The true-breeding genotypes are RR pp (rose), rr PP (pea), rr pp (single), and RR PP (walnut).
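The proportions in cross 6 (Rr Pp × Rr Pp) follow from the multiplication rule applied to two independently assorting genes, and they can be checked by enumerating gametes. The sketch below is illustrative only: the function names `gametes`, `comb_phenotype`, and `cross`, and the encoding of a genotype as a tuple of allele pairs, are assumptions made for this example. The phenotype rule is the one stated in the answer (R with P gives walnut, R alone rose, P alone pea, neither single).

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def gametes(genotype):
    """Return {gamete: probability} for a genotype such as ("Rr", "Pp").

    One allele is drawn from each gene with probability 1/2, and genes
    are assumed to assort independently (multiplication rule).
    """
    result = defaultdict(Fraction)
    for alleles in product(*genotype):
        result["".join(alleles)] += Fraction(1, 2) ** len(genotype)
    return result


def comb_phenotype(alleles):
    """Comb type from the combined alleles of both gametes."""
    has_R = "R" in alleles
    has_P = "P" in alleles
    if has_R and has_P:
        return "walnut"
    if has_R:
        return "rose"
    if has_P:
        return "pea"
    return "single"


def cross(parent1, parent2):
    """Return {phenotype: probability} among progeny of parent1 x parent2."""
    phenotypes = defaultdict(Fraction)
    for (g1, p1), (g2, p2) in product(gametes(parent1).items(),
                                      gametes(parent2).items()):
        # Random union of gametes: probabilities multiply; different
        # gamete pairs are mutually exclusive, so their products add.
        phenotypes[comb_phenotype(g1 + g2)] += p1 * p2
    return dict(phenotypes)


f2 = cross(("Rr", "Pp"), ("Rr", "Pp"))
for phenotype, probability in f2.items():
    print(phenotype, probability)
```

The same `cross` function reproduces the other crosses in the problem; for example, `cross(("RR", "pp"), ("rr", "PP"))` yields walnut with probability 1, as in cross 5.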


Analysis and Applications

3.1 What gametes can be formed by an individual organism of genotype Aa? Of genotype Bb? Of genotype Aa Bb?

3.2 How many different gametes can be formed by an organism with genotype AA Bb Cc Dd Ee and, in general, by an organism that is heterozygous for m genes and homozygous for n genes?

3.3 Round pea seeds are planted that were obtained from the F2 generation of a cross between a true-breeding strain with round seeds and a true-breeding strain with wrinkled seeds. The pollen was collected and used en masse to fertilize plants from the true-breeding wrinkled strain. What fraction of the progeny is expected to have wrinkled seeds?

3.4 Phenylketonuria is a recessive inborn error of metabolism of the amino acid phenylalanine that results in severe mental retardation of affected children. The female II-3 (red circle) in the pedigree shown here is affected. If persons III-1 and III-2 (they are first cousins) mate, what is the probability that their offspring will be affected? (Assume that persons II-1 and II-5 are homozygous for the normal allele.)

[Figure: pedigree with generation I (individuals 1 and 2), generation II (individuals 1 through 5), and generation III (individuals 1 and 2)]

3.5 Shown here are a pedigree and gel diagram indicating the clinical phenotypes with respect to phenylketonuria and the molecular phenotypes with respect to an RFLP that overlaps the PAH gene for phenylalanine hydroxylase. Individual II-3 is affected. (a) Indicate the expected molecular phenotype of II-3. (b) Indicate the possible molecular phenotypes of II-4.

[Figure: pedigree and gel diagram with lanes I-1, I-2, II-1, II-2, II-3, and II-4 and size markers at 12, 9, 6, 3, and 1 kb]

3.6 Assuming equal numbers of boys and girls, if a mating has already produced a girl, what is the probability that the next child will be a boy? If a mating has already produced two girls, what is the probability that the next child will be a boy? On what type of probability argument do you base your answers?

3.7 Assuming equal sex ratios, what is the probability that a sibship of four children consists entirely of boys? Of all boys or all girls? Of equal numbers of boys and girls?

3.8 In the following questions, you are asked to deduce the genotype of certain parents in a pedigree. The phenotypes are determined by dominant and recessive alleles of a single gene.
(a) A homozygous recessive results from the mating of a heterozygote and a parent with the dominant phenotype. What does this tell you about the genotype of the parent with the dominant phenotype?
(b) Two parents with the dominant phenotype produce nine offspring. Two have the recessive phenotype. What does this tell you about the genotype of the parents?
(c) One parent has a dominant phenotype, and the other has a recessive phenotype. Two offspring result, and both have the dominant phenotype. What genotypes are possible for the parent with the dominant phenotype?


GeNETics on the Web will introduce you to some of the most important sites for finding genetic information on the Internet. To explore these sites, visit the Jones and Bartlett home page at http://www.jbpub.com/genetics

For the book Genetics: Analysis of Genes and Genomes, choose the link that says Enter GeNETics on the Web. You will be presented with a chapter-by-chapter list of highlighted keywords. Select any highlighted keyword and you will be linked to a Web site containing genetic information related to the keyword.

• Mendel’s paper is one of the few nineteenth-century scientific papers that reads almost as clearly as if it had been written today. It is important reading for every aspiring geneticist. You can access a conveniently annotated text using the keyword Mendel. Although modern geneticists make a clear distinction between genotype and phenotype, Mendel made no clear distinction between these concepts. At this keyword site you will find a treasure trove of information about Mendel, including his famous paper, essays, commentary, and a collection of images, all richly linked to additional Internet resources.

• Huntington disease is a devastating degeneration of the brain that begins in middle life. It affects about 30,000 Americans, and, because of its dominant mode of inheritance and complete penetrance, each of their 150,000 siblings and children has a 50–50 chance of developing the disease. Named after George Huntington, a Long Island physician who first described it in 1872, the disease’s principal symptom is an involuntary, jerky motion of the head, trunk, and limbs called chorea, after the Greek word for “dance.” At this keyword site you can learn about genetic testing programs, early symptoms, and the time course of the disease.

• The red and purple colors of flowers, as well as of autumn leaves, result from members of a class of pigments called anthocyanins. The biochemical pathway for anthocyanin synthesis in the snapdragon, Antirrhinum majus, can be found at this keyword site. The enzyme responsible for the first step in the pathway (chalcone synthase) limits the amount of pigment formed, which explains why red and white flowers in Antirrhinum show incomplete dominance.

• The Mutable Site changes frequently. Each new update includes a different site that highlights genetics resources available on the World Wide Web. Select the Mutable Site for Chapter 3 and you will be linked automatically.

• The Pic Site showcases some of the most visually appealing genetics sites on the World Wide Web. To visit the genetics Web site pictured below, select the PIC Site for Chapter 3.

3.9 Pedigree analysis tells you that a particular parent may have the genotype AA BB or AA Bb, each with the same probability. Assuming independent assortment, what is the probability of this parent’s producing an Ab gamete? What is the probability of the parent’s producing an AB gamete?

3.10 Assume that the trihybrid cross AA BB rr × aa bb RR is made in a plant species in which A and B are dominant but there is no dominance between R and r. Consider the F2 progeny from this cross, and assume independent assortment.
(a) How many phenotypic classes are expected?
(b) What is the probability of the parental aa bb RR genotype?
(c) What proportion would be expected to be homozygous for all three genes?

3.11 In the cross Aa Bb Cc Dd × Aa Bb Cc Dd, in which all genes undergo independent assortment, what proportion of offspring are expected to be heterozygous for all four genes?

3.12 The pattern of coat coloration in dogs is determined by the alleles of a single gene, with S (solid) being dominant over s (spotted). Black coat color is determined by the dominant allele A of a second gene, tan by homozygosity for the recessive allele a. A female having a solid tan coat is mated with a male having a solid black coat and produces a litter of six pups. The phenotypes of the pups are 2 solid tan, 2 solid black, 1 spotted tan, and 1 spotted black. What are the genotypes of the parents?

3.13 In the human pedigree shown here, the daughter indicated by the red circle (II-1) has a form of deafness determined by a recessive allele. What is the probability that the phenotypically normal son (II-3) is heterozygous for the gene?

[Figure: pedigree for Problem 3.13]

3.14 Huntington disease is a rare neurodegenerative human disease determined by a dominant allele, HD. The disorder is usually manifested after the age of forty-five. A young man has learned that his father has developed the disease. (a) What is the probability that the young man will later develop the disorder? (b) What is the probability that a child of the young man carries the HD allele?

3.15 Assume that the trait in the accompanying pedigree is due to simple Mendelian inheritance. (a) Is it likely to be due to a dominant allele or a recessive allele? Explain. (b) What is the meaning of the double horizontal line connecting III-1 with III-2? (c) What is the biological relationship between III-1 and III-2? (d) If the allele responsible for the condition is rare, what are the most likely genotypes of all of the persons in the pedigree in generations I, II, and III? (Use A and a for the dominant and recessive alleles, respectively.)

[Figure: pedigree for Problem 3.15, generations I through IV]

3.16 The Hopi, Zuni, and some other Southwest American Indians have a relatively high frequency of albinism (absence of skin pigment) resulting from homozygosity for a recessive allele, a. A normally pigmented man and woman, each of whom has an albino parent, have two children. What is the probability that both children are albino? What is the probability that at least one of the children is albino?

3.17 Say the trait in the accompanying pedigree is due to simple Mendelian inheritance. (a) Is it likely to be due to a dominant allele or a recessive allele? Explain. (b) What are the most likely genotypes of all of the persons in the pedigree? (Use A and a for the dominant and recessive alleles.)

[Figure: pedigree for Problem 3.17]

3.18 The accompanying pedigree and gel diagram show the phenotypes of the parents for an RFLP that has multiple alleles. What are the possible phenotypes of the progeny, and in what proportions are they expected?

[Figure: pedigree and gel diagram with size markers at 12, 9, 6, 3, and 1 kb]

3.19 Red kernel color in wheat results from the presence of at least one dominant allele of each of two independently segregating genes (in other words, R- B- genotypes have red kernels). Kernels on rr bb plants are white, and the genotypes R- bb and rr B- result in brown kernel color. Suppose that plants of a variety that is true breeding for red kernels are crossed with plants true breeding for white kernels. (a) What is the expected phenotype of the F1 plants? (b) What are the expected phenotypic classes in the F2 progeny and their relative proportions?

3.20 Heterozygous Cp cp chickens express a condition called creeper, in which the leg and wing bones are shorter than normal (cp cp). The dominant Cp allele is lethal when homozygous. Two alleles of an independently segregating gene determine white (W-) versus yellow (ww) skin color. From matings between chickens heterozygous for both of these genes, what phenotypic classes will be represented among the viable progeny, and what are their expected relative frequencies?

3.21 White Leghorn chickens are homozygous for a dominant allele, C, of a gene responsible for colored feathers, and also for a dominant allele, I, of an independently segregating gene that prevents the expression of C. The White Wyandotte breed is homozygous recessive for both genes (cc ii). What proportion of the F2 progeny obtained from mating White Leghorn × White Wyandotte F1 hybrids would be expected to have colored feathers?


3.22 The F2 progeny from a particular cross exhibit a modified dihybrid ratio of 9 : 7 (instead of 9 : 3 : 3 : 1). What phenotypic ratio would be expected from a testcross of the F1?

3.23 Black hair in rabbits is determined by a dominant allele, B, and white hair by homozygosity for a recessive allele, b. Two heterozygotes mate and produce a litter of three offspring. (a) What is the probability that the offspring are born in the order white-black-white? What is the probability that the offspring are born in either the order white-black-white or the order black-white-black? (b) What is the probability that exactly two of the three offspring will be white?

3.24 Consider sibships consisting of 6 children, and assume a sex ratio of 1 : 1. (a) What is the proportion with no girls? (b) What is the proportion with exactly 1 girl? (c) What is the proportion with exactly 2 girls? (d) What is the proportion with exactly 3 girls? (e) What is the proportion with 3 or more boys?

3.25 Andalusian fowls are colored black, splashed white (resulting from an uneven sprinkling of black pigment through the feathers), or slate blue. Black and splashed white are true breeding, and slate blue is a hybrid that segregates in the ratio 1 black : 2 slate blue : 1 splashed white. If a pair of blue Andalusians is mated and the hen lays three eggs, what is the probability that the chicks hatched from these eggs will be one black, one blue, and one splashed white?

3.26 In the mating Aa × Aa, what is the smallest number of offspring, n, for which the probability of at least one aa offspring exceeds 95 percent?

3.27 From the F2 generation of a cross between mouse genotypes AA × aa, one male progeny of genotype A- was chosen and mated with an aa female. All of the progeny in the resulting litter were A-. From this result you would like to conclude that the sire’s genotype is AA. How much confidence could you have in this conclusion for each litter size from 1 to 15? (In other words, what is the probability that the sire’s genotype is AA, given that the a priori probability is 1/3 and that a litter of n pups resulted in all A- progeny?)

3.28 The accompanying gel diagram includes the phenotype of two parents (A and B) with respect to two different RAPD polymorphisms. Each parent is homozygous for an allele associated with a band defining a RAPD polymorphism. The two RAPD polymorphisms are at different loci and undergo independent assortment. In the gel diagram, indicate the expected phenotype of the F1 progeny as well as all possible phenotypes of the F2 progeny, along with their expected proportions.

[Figure: gel diagram with lanes for parents A and B, the F1, and the F2 progeny, and size markers at 12, 9, 6, 3, and 1 kb]

3.29 The gel diagram shown below shows the phenotype of two parents (A and B), each homozygous for two RFLPs that undergo independent assortment. Parent A has genotype A1A1 B1B1, where the A1 allele yields a band of 2 kb and the B1 allele yields a band of 8 kb. Parent B has genotype A2A2 B2B2, where the A2 allele yields a band of 3 kb and the B2 allele yields a band of 10 kb. Show the expected phenotype of the F1 progeny as well as all possible phenotypes of the F2 progeny, along with their expected proportions.

[Figure: gel diagram with lanes for P1 parents A and B, the F1, and the F2 progeny, and size markers at 12, 9, 6, 3, and 1 kb]

3.30 Complementation tests of the recessive mutant genes a through f produced the data in the accompanying matrix. The circles represent missing data. Assuming that all of the missing mutant combinations would yield data consistent with the entries that are known, complete the table by filling each circle with a + or − as needed.

[Figure: complementation matrix for mutants a through f, with circles marking the missing entries]


Challenge Problems

3.31 Diagrammed here is DNA from a wildtype gene (top) and a mutant allele (bottom) that has an insertion of a transposable element that inactivates the gene. The transposable element is present in many copies scattered throughout the genome. The symbols B and E represent the positions of restriction sites for BamHI and EcoRI, respectively, and the rectangles show sites of hybridization with each of three probes (A, B, and C) that are available. The dots at the left indicate that the nearest site of either BamHI or EcoRI cleavage is very far to the left of the region shown. Explain which probe and which single restriction enzyme you would use for RFLP analysis to identify both alleles. Also explain why any other choices would be unsuitable.

[Figure: restriction maps of the wildtype gene (scale 0 to 6 kb) and of the mutant allele carrying the transposable element (scale 0 to 12 kb), showing BamHI (B) and EcoRI (E) sites and the hybridization sites of probes A, B, and C]

3.32 Meiotic drive is an unusual phenomenon in which two alleles do not show Mendelian segregation from the heterozygous genotype. Examples are known from mammals, insects, fungi, and other organisms. The usual mechanism is one in which both types of gametes are formed, but one of them fails to function normally. The excess of the driving allele over the other can range from a small amount to nearly 100 percent. Suppose that D is an allele showing meiotic drive against its alternative allele d, and suppose that Dd heterozygotes produce functional D-bearing and d-bearing gametes in the proportions 3/4 : 1/4. In the mating Dd × Dd,
(a) What are the expected proportions of DD, Dd, and dd genotypes?
(b) If D is dominant, what are the expected proportions of D- and dd phenotypes?
(c) Among the D- phenotypes, what is the ratio of DD : Dd?
(d) Answer parts (a) through (c), assuming that the meiotic drive takes place in only one sex.

3.33 The accompanying table summarizes the effect of inherited tissue antigens on the acceptance or rejection of transplanted tissues, such as skin grafts, in mammals. The tissue antigens are determined in a codominant fashion, so that tissue taken from a donor of genotype Aa carries both the A and the a antigen. In the table, the + sign means that a graft of donor tissue is accepted by the recipient, and the − sign means that a graft of donor tissue is rejected by the recipient. The rule is that any graft will be rejected whenever the donor tissue contains an antigen not present in the recipient. In other words, any transplant will be accepted if, and only if, the donor tissue does not contain an antigen different from any already present in the recipient.

The diagram illustrated below shows all possible skin grafts between inbred (homozygous) strains of mice (P1 and P2) and their F1 and F2 progeny. Assume that the inbred lines P1 and P2 differ in only one tissue-compatibility gene. For each of the arrows, what is the probability of acceptance of a graft in which the donor is an animal chosen at random from the population shown at the base of the arrow and the recipient is an animal chosen at random from the population indicated by the arrowhead?
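The acceptance rule stated in Problem 3.33 can be written as a one-line test on sets of antigens. The sketch below only encodes the rule and tabulates it for the three genotypes of a single gene; it does not work out the population probabilities the problem asks for. The function name `graft_accepted` and the representation of a genotype as a two-letter string are assumptions made for illustration.

```python
def graft_accepted(donor, recipient):
    """Apply the stated rule for a codominant tissue-antigen gene.

    donor, recipient: genotype strings such as "AA", "Aa", or "aa".
    A graft is accepted only if every antigen carried by the donor
    is also carried by the recipient.
    """
    return set(donor) <= set(recipient)


genotypes = ["AA", "Aa", "aa"]

# Tabulate acceptance (+) or rejection (-) for every donor/recipient pair.
print("donor -> recipient")
for donor in genotypes:
    for recipient in genotypes:
        sign = "+" if graft_accepted(donor, recipient) else "-"
        print(f"{donor} -> {recipient}: {sign}")
```

Under this rule a heterozygous recipient accepts tissue from any donor, whereas a homozygous recipient accepts tissue only from a donor of its own genotype.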


Judson, H. F. 1996. The Eighth Day of Creation: The Makers of the Revolution in Biology. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press.
Lewis, R. 2001. Human Genetics: Concepts and Applications. 4th ed. Dubuque, IA: McGraw-Hill.
Lewontin, R. C. 2000. It Ain’t Necessarily So: The Dream of the Human Genome and Other Illusions. New York: New York Review of Books.
Maroni, G. 2000. Molecular and Genetic Analysis of Human Traits. Malden, MA: Blackwell.
Mendel, G. 1866. Experiments in plant hybridization. (Translation.) In The Origins of Genetics: A Mendel Source Book, ed. C. Stern and E. Sherwood. 1966. New York: Freeman.
Olby, R. C. 1966. Origins of Mendelism. London: Constable.
Orel, V. 1996. Gregor Mendel: The First Geneticist. Oxford, England: Oxford University Press.
Orel, V., and D. L. Hartl. 1994. Controversies in the interpretation of Mendel’s discovery. History and Philosophy of the Life Sciences 16: 423.
Sandler, I. 2000. Development: Mendel’s legacy to genetics. Genetics 154: 7.
Stern, C., and E. Sherwood, eds. 1966. The Origins of Genetics: A Mendel Source Book. New York: Freeman.
Sturtevant, A. H. 1965. A Short History of Genetics. New York: Harper & Row.
Sykes, B., ed. 1999. The Human Inheritance. New York: Oxford University Press.

Further Reading
