Essentials of Molecular Genetics [1 ed.] 9781783321933, 9781842659229

ESSENTIALS OF MOLECULAR GENETICS is primarily designed as a text book for undergraduate students studying molecular gene


English Pages 772 Year 2015


Essentials of Molecular Genetics


Gurbachan S. Miglani

α

Alpha Science International Ltd. Oxford, U.K.

Essentials of Molecular Genetics 772 pgs. | 402 figs. | 71 tbls.

Gurbachan S. Miglani
Visiting Professor
School of Agricultural Biotechnology
Punjab Agricultural University
Ludhiana, Punjab

Copyright © 2015

ALPHA SCIENCE INTERNATIONAL LTD.
7200 The Quorum, Oxford Business Park North
Garsington Road, Oxford OX4 2JZ, U.K.
www.alphasci.com

Printed from the camera-ready copy provided by the Author.

ISBN 978-1-84265-922-9
E-ISBN 978-1-78332-193-3

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without prior written permission of the publisher.

 

In the sweet memory of my Darshan

Foreword

Advances in molecular biology and the cutting-edge science of genomics are leading to breakthroughs in the life sciences, particularly in agricultural and medical sciences. Genetic control of the biological characteristics of an organism involves complex interactions of DNA, RNA and proteins at molecular and cellular levels. Molecular genetics provides the basic knowledge for understanding the complexity of various biological processes in both prokaryotes and eukaryotes. Genes produce structural and functional molecules that are essential in controlling the synthesis of different biomolecules and in modifying metabolic pathways and gene expression, leading to phenotypes. Thus, it is important to understand the complexities of the structural and functional components of genetic material. In addition to the complexities of gene expression, epigenetic variation also plays an important role in development and evolution.

There is a gap between the knowledge that is required and what is actually being taught in molecular genetics courses at bachelor's, master's and doctoral levels. Thus, there is an urgent need to upgrade the teaching level of molecular genetics courses in various subjects, e.g., basic, agricultural, medical and veterinary sciences. Good textbooks on molecular genetics are therefore essential for this purpose. I am happy to note that Dr. Gurbachan Singh Miglani has produced a textbook entitled "Essentials of Molecular Genetics." This book will fill the above-mentioned need for upgrading the level of teaching of molecular biology. Dr. Miglani, who teaches Genetics and Molecular Biology at the Punjab Agricultural University, has 35 years of teaching experience. He is eminently qualified to prepare this textbook. He has published nine books earlier, including Dictionary of Plant Genetics and Molecular Biology (1998), Advanced Genetics (2002), Genetic Material (2013) and Gene Regulation (2013).

The book under review has 27 chapters. It is very well written and covers contemporary subjects such as the nature, structure and organization of genetic material; packaging and replication of nucleic acids; genetic recombination; gene structure, gene organization and gene function; transcription, RNA processing, genetic code and translation; molecular mechanisms of regulation of gene expression in viruses, bacteria and eukaryotes; and the role of DNA and histone modifications, non-coding RNAs and gene silencing in bacteria, viruses and eukaryotes. The author has given due emphasis to molecular techniques and to the application of new breakthroughs in molecular genetics in understanding the basic biology of complex living systems.

At the end of each chapter, a few problems have been included to test the comprehension level of the readers. There is an extensive glossary of 58 pages covering more than 720 different terms related to molecular genetic techniques. The book is written in simple and easy-to-read language. Illustrations are clear and easy to understand. The subject index is comprehensive. The book is published by Narosa Publishing House Pvt. Ltd., New Delhi, and distributed in the international market by Alpha Science International, Oxford, U.K., as a co-publisher. It will be available to Indian as well as international students. It will also be an excellent source of references for teachers and researchers alike. I would like to congratulate Dr. Miglani for his labor of love in producing this valuable text.

Gurdev S. Khush
Adjunct Professor
University of California, Davis

Preface

The present era belongs to molecular genetics and biotechnology. Therefore, to become proficient in teaching, research, and applications of the subject matter, the underlying concepts, phenomena, hypotheses, theories, laws and principles must be thoroughly understood at the molecular level. The blueprint, or genetic material, responsible for the physical appearance of an organism resides inside the cell. This genetic material stores all the information for development and survival of the individual and for transmission of biological properties from one generation to the next. The first step in studying molecular biology/genetics/biotechnology is to gain an understanding of the nature, structure, molecular forms, location, organization, analysis, sequencing, synthesis, packaging, recombination, damage, repair, and protection of genetic material in viruses, prokaryotes and eukaryotes.

Genetic material performs its function by organizing itself in the form of the gene, the unit of heredity. Knowing the above-mentioned aspects of genetic material sets the stage for knowing about the gene. In order to understand how a gene expresses itself to perform its function, its various aspects, viz., gene structure, gene organization, genome evolution, gene function, transcription, RNA processing, genetic code, translation, and the fate of finished proteins, used ribosomes and messenger RNAs, need to be understood. Finely regulated mechanisms exist in the cell to control the expression of a gene at every level. Every gene requires a large number of proteins to help it perform its assigned function. DNA and protein modifications influence gene expression. Without proper coordination of the gene regulatory mechanisms that "switch on" or "switch off" particular genes, whose products are required by the cell in a spatio-temporal manner, development and evolution cannot be conceived of in the present era of molecular biology. Thus gene regulation is the molecular basis of development and evolution.

Essentials of Molecular Genetics has been written with the objective of providing concise but complete knowledge of the above-mentioned aspects of the chemical basis of life, genetic material (deoxyribonucleic acid and ribonucleic acid), and the expression and regulation of genes in viruses, bacteria and eukaryotes. This book also deals briefly with the role of epigenetic modifications in gene regulation. It is primarily designed as a textbook for undergraduate students studying molecular genetics in any discipline of life sciences, agricultural sciences, medicine, and biotechnology in all the conventional, medical, and agricultural universities. However, postgraduate students, teachers, research workers, and biotechnology professionals working with molecular biotechnology companies/colleges/institutes/schools across the world can very conveniently use it as a reference book.




This book provides a brief historical background in introductory paragraph(s) of every chapter. Recent progress on the topic is discussed. Work of 70 Nobel laureates finds a special mention. Various hypotheses, principles, concepts, phenomena of molecular genetics have been dealt with in a simple and lucid language. The text is supported by a number of important original and recent references, tables, figures and flow diagrams where necessary. Special attention has been paid to precise definitions of various terms in the subject in light of the present day knowledge. For this, there is a separate section on glossary of important terms used in the book. Important literature relevant to the subject matter has been cited in the text and all such references have been listed at the end of every chapter. A few thought-provoking problems are also included in every chapter. Where there are six or more authors, only first three have been mentioned, followed by et al. Throughout efforts have been made to briefly pin-point applications/implications of different discoveries in the area of molecular genetics and biotechnology. Extensive cross-referencing has been done. Essentials of Molecular Genetics should be extremely useful to those who are preparing for national and international level competitive examinations, entrance tests, and interviews for jobs/fellowships. I trust the book will provide enjoyable reading experience. Strenuous efforts have been made to include all important information relevant at undergraduate level; however, if something important is missing, the readers are urged to inform the author so that deficiencies can be corrected in a subsequent printing/edition. Style of presentation and selection of examples are purely my choice. I am responsible for any omissions and commissions. Readers’ feedback will be appreciated. Class-room has been my lab for my teaching experiments. 
Students have always been a guiding force to me in an indirect way in deciding order of different chapters in the book and organization of the contents in a chapter. Thus contribution of my present and past students in my books has been tremendous. During the writing of this book I have consulted a large number of original research papers, review articles, books, monographs, and web sites. My head bows with respect before all these great authors for their landmark work. I thank various copyright holders to grant me permission for use of their published work in this book. I am grateful to Dr. Kuldeep Singh, Director, School of Agricultural Biotechnology, Punjab Agricultural University, Ludhiana, India for his moral support and providing me facilities for this work. Dr. S.S. Gosal, Director of Research, Punjab Agricultural University, Ludhiana, India, who is an experienced author, made very useful suggestions in the preparation of this book. Stewardship provided by Dr. Manjit Singh Kang, former Vice-Chancellor, Punjab Agricultural University, Ludhiana, India, who has edited and authored several books in the area of genetics, is gratefully acknowledged. I enjoyed full support of Dr. Baldev Singh Dhillon, Vice-Chancellor, Punjab Agricultural University, Ludhiana, India in the writing of this book. Dr. Darshan S. Brar, former Head, Plant Breeding, Genetics and Biotechnology Division, International Rice Research Institute, Los Baños, The Philippines, presently Honorary Adjunct Professor, School of Agricultural Biotechnology, Punjab Agricultural University, Ludhiana, India advised me to include some thought-provoking questions/problems at the end of every chapter to test comprehension of the reader; for this and many other useful pieces of advice, I am thankful to Dr. Brar. I am indebted to Dr. Gurdev S. Khush, former Principal Plant Breeder and Head, Division of Plant Breeding Genetics and Biochemistry, International Rice Research Institute, Los Baños, The


Philippines, and presently Adjunct Professor, University of California, Davis, USA, for showering his blessings on me by writing the Foreword of this book.

I must make mention of my immediate family, relatives and friends, without whose involvement and support completion of this book would not have been possible. No one can do any creative work, such as writing a book, without the cooperation of his/her spouse. My wife Harjit showed utmost patience and cooperation during the preparation of this book. My son Jemmy and daughter-in-law Parvi made sure that I was kept away from tedious household duties like depositing bills and making purchases from the market. My daughter Simmi and son-in-law Shally frequently motivated me by checking up on the progress of the book. My grandchildren, Prabhasis, Harmehar and Bhavneet, were more of an encouragement than a disturbance to me. Because of his experience in writing manuals and books, I often consulted my dear nephew, Dr. Sandeep Singh, Assistant Entomologist, Punjab Agricultural University, Ludhiana, India, who also assisted me in compiling the subject index. Mr. Gagandeep Singh typed the manuscript and drew the figures meticulously. Mr. Gurdeep Singh of ALPS Educational Services, Ludhiana, formatted the book for camera-ready printouts. I like the way the book Essentials of Molecular Genetics has come out with respect to format and design. For this, I wholeheartedly thank Mr. N.K. Mehra, Managing Director, Narosa Publishing House Pvt. Ltd., New Delhi.

Gurbachan S. Miglani
Visiting Professor
School of Agricultural Biotechnology
Punjab Agricultural University
Ludhiana 141 004, Punjab, India

Professor of Genetics (Retired)
Department of Plant Breeding, Genetics and Biotechnology
Former Adjunct Professor
School of Agricultural Biotechnology
Punjab Agricultural University
Ludhiana 141 004, Punjab, India

Contents

Foreword
Preface

1. Introduction
  Birth of Molecular Genetics
  Mendelism Down to Molecular Level
  Sperm RNAs are Delivered to Oocytes at Fertilization
  Nature, Structure and Organization of the Genetic Material
  Gene Structure, Function and Expression
  Gene Regulation
  Applying Knowledge about Genetic Material and Genes
  References
  Problems

2. Nature of the Genetic Material
  Characteristics of Genetic Material
  Discovery of DNA as Genetic Material
  Discovery of RNA as Genetic Material
  Viral Genomes
  Latent Viruses
  References
  Problems

3. Structure of the Genetic Material
  Structure of DNA
  Alternate Structural Forms of DNA
  Structure of RNA
  Fundamental Properties of Genetic Material
  References
  Problems

4. Extranuclear Genomes
  Chloroplast Genomes
  Mitochondrial Genomes
  Kinetoplast DNA
  Centriole DNA
  Cyanelle DNA
  Promiscuous DNA
  Coupled Expression of Nuclear and Organelle Genomes
  References
  Problems

5. Organization of the Genetic Material
  DNA Kinetics
  Organization of Genetic Material
  Prokaryotic Genome
  Eukaryotic Genome
  Repeated DNA Sequences and Diseases
  References
  Problems

6. Packaging of Nucleic Acids
  DNA Packaging in Viruses/Bacteriophages
  DNA Packaging in Bacteria
  DNA Packaging in Nucleus of Eukaryotic Cells
  DNA Packaging in Eukaryotic Organelles
  References
  Problems

7. Replication of Nucleic Acids
  Different Modes of DNA Replication
  DNA Replication in Prokaryotes
  Steps of DNA Replication
  Plasmid Replication
  Replication in Viruses
  DNA Replication in Eukaryotic Nuclear Chromosomes
  Replication of Mitochondrial DNA
  Replication of Chloroplast DNA
  Models of DNA Replication
  Duplication of RNA
  Polymerase Chain Reaction
  References
  Problems

8. Genetic Recombination
  DNA Recombination in Bacteriophages
  DNA Recombination in Bacteria
  DNA Recombination in Eukaryotes
  Types of Recombination
  Models of Genetic Recombination
  Crossing Over Event
  Gene Conversion
  Genetics and Enzymology of Recombination
  Mitochondrial DNA Recombination
  Chloroplast DNA Recombination
  Intragenic Recombination
  RNA Recombination
  References
  Problems

9. Mutation
  Classification of Mutations
  Detection of Mutations
  Molecular Basis of Mutation
  Site-directed Mutagenesis
  In Vitro Mutagenesis
  Mutagenesis in Organelle Genomes
  Usefulness of Studies on Mutations
  Environmental Mutagenesis
  References
  Problems

10. DNA Transposition
  General Characteristics of Transposons
  Bacterial Transposable Elements
  Transposable DNA Phages
  Eukaryotic Transposable Elements
  Transposon Mutagenesis
  Illegitimate Recombination
  Mechanism of Transposition
  Role of Transposons
  Uses of Transposons
  Limitations of Transposons
  Mobile Elements and Genome Evolution
  References
  Problems

11. DNA and Non-DNA Repair Mechanisms
  DNA Damage Checkpoints and Response
  DNA Repair Mechanisms in Prokaryotes
  DNA Repair Mechanisms in Eukaryotes
  Diseases due to DNA Repair Defects
  References
  Problems

12. Gene Structure
  Gene Concept
  Types of Genes
  Defining Gene?
  Introns
  Why are the Eukaryotic Genes Split?
  Overlapping Genes
  Pseudogenes
  Colinearity
  References
  Problems

13. Defense Genes
  Immunoglobulin Genes
  Human Leukocyte Antigen Complex
  Oncogenes
  Proto-oncogenes
  Epigenetic Alterations and Neoplasia
  How Immune System Counteracts Oncogenes?
  Antioncogenes
  Antisense Molecules as Anticancer Drugs
  References
  Problems

14. Gene Organization
  Gene Families
  Changes in Genome Size
  Changes in DNA Sequences
  Nucleotide Polymorphism
  References
  Problems

15. Gene Function
  Relationship Between Gene and Enzyme
  Genetic Control of Biochemical Reactions
  One Gene-One Reaction Hypothesis
  Relationship Between Genotype and Phenotype
  One Gene-One Enzyme Hypothesis
  One Gene-One Polypeptide Hypothesis
  One Gene-One Chromomere Hypothesis
  One Gene-One Antigen Hypothesis
  One Cistron-One Polypeptide Hypothesis
  One Gene-One Ribosome-One Protein Hypothesis
  One Gene-One mRNA-One Protein Hypothesis
  One Gene-One Primary Cellular Function Hypothesis
  Gene Discoveries
  Two Genes-One Polypeptide Hypothesis
  Knowing Gene Function by Knocking off Genes
  One Gene-Many Proteins Hypothesis
  One Enzyme-Two Functions Concept
  Recent Thoughts on Gene Function
  References
  Problems

16. Transcription
  Central Dogma and its Modification
  Transcription in Bacteria
  Transcription in Viruses
  Reverse Transcription
  Transcription of Eukaryotic Genes
  Chloroplast Transcription
  Mitochondrial Transcription
  References
  Problems

17. RNA Processing and RNA Editing
  Pre-Messenger RNA Processing
  Pre-Transfer (Soluble) RNA Processing
  Pre-Ribosomal RNA Processing
  Self-Splicing of rRNA in Tetrahymena
  Alternative RNA Splicing
  RNA Editing
  References
  Problems

18. Genetic Code
  Size of Genetic Code
  Glossary of Terms on Genetic Code
  Properties of Genetic Code
  Deciphering the Genetic Code
  Genetic Code at Work
  Genetic Code Specificity
  Expansion of Genetic Alphabet
  Recoding
  The Second Code
  References
  Problems

19. Protein Biosynthesis
  Translation in Prokaryotes
  Ribosome – Site of Protein Synthesis
  Translation in Eukaryotic Cytoplasm
  Protein Synthesis in Chloroplasts and Mitochondria
  Role of MicroRNAs in Protein Synthesis
  Hybrid Arrested Translation
  Hybrid Released Translation
  Protein Engineering
  References
  Problems

20. Fate of Nascent Proteins, Ribosomes, and Messenger RNAs
  Protein Modification
  Protein Secretion in Prokaryotes
  Protein Translocation in Prokaryotes
  Co-Translational Targeting (Secretory Pathways) in Eukaryotes
  Post-Translational Targeting in Eukaryotes
  Protein Folding
  Molecular Chaperone
  Protein Degradation
  Fate of Ribosomes After Translation
  Messenger RNA Decay
  References
  Problems

21. Gene Expression Analysis
  Glossary Related to Gene Expression Analysis
  Global Gene Expression Profiling
  Expression System
  Gene Expression Networks
  mRNA Quantification
  Protein Quantification
  Applications of Gene Expression Analysis
  Limitations of Gene Expression Analysis
  Systems Biology Approach in Gene Expression Analysis
  References
  Problems

22. Gene Regulation in Bacteria
  Gene Regulation at DNA Level
  Gene Regulation at Transcriptional Level
  Antisense RNA in Prokaryotic Gene Regulation
  Alternative Sigma Factors
  Post-Transcriptional Control
  Control at Translational Level
  Post-Translational Control
  Alarmones
  References
  Problems

23. Gene Regulation in Viruses
  SpoI Bacteriophage
  Lambda Bacteriophage
  T4 Bacteriophage
  T7 Bacteriophage
  Human Immunodeficiency Virus
  References
  Problems

24. Gene Regulation in Eukaryotes
  Gene Control Levels in Eukaryotic Cells
  Transcriptional Initiation in Mammalian Genes
  Molecular Zippers in Gene Regulation
  Gene Regulation by Non-Histone Proteins
  Artificial Gene Repressors
  Specificity in Eukaryotic Gene Transcription
  Metal-Regulated Transcription in Eukaryotes
  Regulation of Transposable Elements
  Control of Cell Cycle
  Cohesion Complex Regulates Gene Expression
  Gene Regulation by Hormonal Action
  References
  Problems

25. Epigenetics
  Epigenetic Patterns
  Bacterial Methylase Systems
  Characteristics of DNA Methylation
  Host Restriction and Modification
  Cytosine Methylation in Vertebrates
  DNA Methylation and Gene Expression
  DNA Methylation in Gene Regulation and Differentiation
  Mutations and Epimutations
  DNA Methylation in Genome Stability and Gene Silencing
  Epigenetic Reprogramming
  Chromosome Imprinting
  Dosage Compensation
  Epigenetic Variations
  Paternal X Chromosome Inactivation
  DNA Methylation in Invertebrates
  DNA Methylation in Plants
  Paramutation
  Defense Against Spread of Transposable Elements
  Cyclic Changes in DNA Methylation
  RNA Methylation
  Histone Modification
  General Functions of Histone Modification
  Consequences of Histone Modifications
  References
  Problems

26. Noncoding RNAs and Gene Silencing
  Noncoding RNAs
  Antisense Noncoding RNAs
  Gene Silencing
  Transcriptional Gene Silencing
  Phenomenon Associated with TGS
  Post-Transcriptional Gene Silencing
  Methylation of Non-Coding RNAs
  Translational Gene Silencing
  Exploitation of Gene Silencing
  References
  Problems

27. Molecular Techniques and Tools
  Separation of DNA/RNA
  In situ Hybridization
  Squash Dot Hybridization
  Southern Blotting
  Northern Blotting
  Western Blotting
  Eastern Blotting
  Dot Blots and Slot Blots
  Electrophoresis
  Colony or Plaque Hybridization
  Chromosome Walking
  Chromosome Jumping/Hopping
  Chromosome Landing
  Nick Translation
  RNA Sequencing
  DNA Sequencing
  RNA Synthesis
  DNA Synthesis
  Gene Synthesis Machines
  DNA Fingerprinting
  DNA Markers
  Microarrays
  Restriction Endonucleases
  Recombinant DNA Technology
  Quantitative Trait Loci Mapping
  References
  Problems

Glossary
Subject Index

1 Introduction

Genetics began in the mid-nineteenth century when Gregor Johann Mendel (1866) carefully analyzed the mechanism of inheritance. In classical genetics, we deal with the inheritance of phenotypic differences. To understand how certain differences are inherited, genetic analysis is conducted following the Mendelian approach. Mendel's hybridization experiments were simple, yet they brought forth the most significant principles determining how traits are passed from one generation to the next. His experiments set a stage where the subject could be understood and rules could be clearly formed to detect the presence of genes through hybridization without knowing the nature of genes or gene products. Soon it was discovered that Mendelian principles of heredity are equally applicable to most higher organisms, e.g., wheat, rice, animals, and even human beings. Mendel's experiments were not appreciated immediately because nobody could believe that genes were discrete objects. The birth of genetics took place when Mendelism was rediscovered independently by E. von Tschermak (1900), C. Correns (1900) and Hugo de Vries (1900). Genetic analysis of inherited differences became possible.

Genetic analysis explores whether a particular contrasting trait is governed by a difference in one or more genes and whether the gene in question is located in the nucleus or in a cytoplasmic organelle. In animals, it is possible to find out whether a gene is sex-linked or autosomal. Genetic analysis also includes assigning a gene to a particular chromosome and locating the specific position of the gene in the chromosome in relation to other known genes in the nuclear or extranuclear genome. Several prokaryotes and haploid and diploid eukaryotes have been found to be very useful organisms in genetic analysis.

In the mid-1900s, when biochemistry became highly developed, geneticists began to think about the biochemical nature of genes and the causes of genetic variation. The observations made between the 1920s and 1940s pointed to deoxyribonucleic acid (DNA) as the genetic material in all cellular organisms. A species stores biological information in DNA. Each cell of a given species has a constant amount of DNA. This amount is doubled by replication, and cell division ensures equal distribution of the DNA of the parent cell to the daughter cells. The gametes have half the amount of DNA, while the zygote possesses the amount of DNA characteristic of the species.
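The bookkeeping of DNA content described above can be followed with a short calculation. This is only an illustrative sketch: it assumes a diploid species, expresses DNA content in arbitrary units of the haploid amount (often written C), and the function name `dna_content` is hypothetical.

```python
def dna_content(c=1.0):
    """Relative DNA content per cell, in units of the haploid amount c.

    Assumes a diploid species; the stage names are illustrative labels.
    """
    somatic = 2 * c            # diploid cell before replication
    replicated = 2 * somatic   # replication doubles the DNA
    daughter = replicated / 2  # mitosis: equal shares to two daughter cells
    gamete = replicated / 4    # meiosis: two divisions, four haploid products
    zygote = 2 * gamete        # fertilization fuses two gametes
    return {
        "somatic": somatic,
        "replicated": replicated,
        "daughter": daughter,
        "gamete": gamete,
        "zygote": zygote,
    }


for stage, amount in dna_content().items():
    print(f"{stage:>10}: {amount:g}C")
```

The daughter cell and the zygote both return to the amount characteristic of the species, while the gamete carries half of it.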

BIRTH OF MOLECULAR GENETICS

The birth of molecular genetics took place with the seminal discovery of the double helical structure of DNA by J.D. Watson and F.H.C. Crick (1953). With this discovery, genetics entered the DNA age. This was a very exciting phase in the development of genetics. Replication of DNA is the molecular basis of reproduction and hence of the continuity of life. Replication faithfully keeps the genetic information intact over generations. Continuity of germplasm between all the descendent generations of a species explains many biological similarities that are inherited. By the end of the 1960s it was known how a gene was copied, how a gene was expressed, how a mutation arose, and how genes were turned on and off according to the needs of the cell or organism. It became possible to identify the products of thousands of genes. These developments constituted a part of an important branch of genetics called molecular genetics. Molecular genetics attempts to explain various principles, concepts and phenomena of genetics at the molecular level. The central focus of research for more than half a century has been on deoxyribonucleic acid (DNA) and ribonucleic acid (RNA).

Molecular genetics is the branch of genetics concerned with the structure and activity of genetic material at the molecular level. It deals with issues such as how a gene is copied, how a mutation arises, and how genes are turned on and off. Thus it is the study of the expression, regulation and inheritance of genes at the level of DNA and its transcription products. It is the area of knowledge concerned with the genetic aspects of molecular biology, especially with DNA, RNA and protein molecules. Important contributions of Francis Crick in the area of molecular genetics include those on diffraction by a helix, the structure of DNA, coiled-coils, the adaptor hypothesis, wobble pairing, the three-letter code, the structure of collagen, the prediction of an "RNA world" and selfish DNA (Orgel 2004).

MENDELISM DOWN TO MOLECULAR LEVEL

With the advent of gene cloning and recombinant DNA technology in the 1970s, it became possible not only to manipulate DNA but also to understand Mendelism at the molecular level. In 1990, a group of scientists working at the John Innes Institute, Norwich (U.K.), cloned the pea gene r (rugosus; old name wrinkled, w), which determines whether the seed is round or wrinkled. It was shown that an isoform of starch-branching enzyme (SBEI) is present in round (RR or WW and Rr or Ww) seeds but absent in wrinkled (rr or ww) seeds. The gene for SBEI was found to be located at the r locus. Tracing the molecular basis of a Mendelian trait in this way comes under the purview of a modern branch of genetics called reverse genetics (from DNA to phenotype), as opposed to classical forward genetics (from phenotype to gene).

In peas, wrinkled seeds are due to a defect in starch synthesis, due in turn to the absence of an enzyme, starch-branching enzyme I (SBEI), that normally produces branched-chain amylopectin to supplement the straight-chain amylose (Bhattacharyya et al. 1990). It seemed that lack of SBEI was at the root of the wrinkled syndrome. Complementary DNA from round and wrinkled embryos was isolated. In comparison with RR plants, the SBEI gene transcript in rr plants is unusually large as well as much reduced in quantity, and the corresponding genomic DNA sequence has an insert of 0.8 kb. This size difference was shown to co-segregate with the r locus in crosses, as if the r mutation were due to insertion of an extraneous sequence into the wild-type R gene.

The nature of the insertion was investigated by cloning EcoRI fragments that hybridized to a subclone of the cDNA from both RR and rr plants. A single fragment was cloned from each genotype, of size 3.3 kb from RR plants and 4.1 kb from rr plants. When these clones were used in turn as probes on genomic EcoRI digests, the 3.3-kb sequence hybridized to single 3.3-kb and 4.1-kb fragments from RR and rr, respectively. The 4.1-kb fragment, however, recognized some 30 or more fragments, of similar sizes in the two genomes except for a 4.1-kb fragment that was again seen in rr but not in RR DNA. This showed that the r mutation was probably due to insertion into the R locus of a sequence that is a member of a multicopy sequence family.

The 0.8-kb insert in the recombinant DNA had inverted terminal repeats of 12 bp and was flanked by an 8-bp tandem duplication, evidently generated from a single-copy sequence at the site of insertion. Inverted terminal repeats and host-site tandem duplications are hallmarks of transposable DNA elements that have been characterized from several plant species (Berg and Howe 1989). The 0.8-kb insertion in the r allele is within an exon of the SBEI gene. Excision of this 0.8-kb fragment would leave behind some or all of the 8-bp tandem repeat, with the probable effect of a frameshift mutation. Evidently, it has not excised in this way in the long evolutionary history of the present rr stocks. This means that the mutation is stable.
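The structural features described above (inverted terminal repeats, a flanking target-site duplication, and a frameshift after imperfect excision) can be reproduced in a toy model. The sketch below is not the actual SBEI sequence or the method of Bhattacharyya et al.; every sequence, the insertion site and all function names are invented for illustration, and the element is far shorter than 0.8 kb.

```python
# Toy model of a transposable-element insertion. All sequences are invented.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def has_inverted_terminal_repeats(element, itr_len=12):
    """True if the last itr_len bases are the reverse complement of the first."""
    return element[:itr_len] == reverse_complement(element[-itr_len:])


def insert_element(host, element, site, tsd_len=8):
    """Insert element at site, duplicating tsd_len target bases on both flanks."""
    target = host[site:site + tsd_len]
    return host[:site] + target + element + target + host[site + tsd_len:]


def excise_element(mutant, element, site, tsd_len=8, footprint=8):
    """Remove the element but leave `footprint` bases of the extra target copy."""
    start = site + tsd_len
    end = start + len(element)
    if mutant[start:end] != element:
        raise ValueError("element not found at the expected position")
    extra_copy = mutant[end:end + tsd_len]
    return mutant[:start] + extra_copy[:footprint] + mutant[end + tsd_len:]


left_itr = "TAGGGATGAAAA"                  # hypothetical 12-bp terminal repeat
element = left_itr + "CCGGAATTAACCGG" + reverse_complement(left_itr)
host = "ATGGCTGATCCGTTAGCAGGTACCGATTGA"    # hypothetical exon fragment, 10 codons
site = 9

mutant = insert_element(host, element, site)
print("inverted terminal repeats:", has_inverted_terminal_repeats(element))
print("flanking duplication:", mutant[site:site + 8])

for footprint in (0, 8):
    revertant = excise_element(mutant, element, site, footprint=footprint)
    extra = len(revertant) - len(host)
    print(f"footprint {footprint} bp -> frameshift: {extra % 3 != 0}")

# Fragment sizes quoted in the text, in base pairs: 3.3 kb + 0.8 kb = 4.1 kb.
print("rr EcoRI fragment:", 3300 + 800, "bp")
```

Only a clean excision (footprint of 0 bp) restores the original reading frame in this model. Leaving the full 8-bp duplicate behind adds a number of bases that is not a multiple of three, which is why a frameshift is the probable outcome.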

Is this wrinkled mutant, which brings together the old and new in genetics, really the same one studied by Mendel (Fincham 1990)? Bhattacharyya et al. (1990) believe so, because the present r mutation is widely distributed among European cultivars and there is no evidence that any other mutant of similar type existed in Mendel’s days.

SPERM RNAs ARE DELIVERED TO OOCYTES AT FERTILIZATION

It was long believed that the sperm passes only DNA to the oocyte during fertilization. It has now been reported that human male gametes pass more to the oocyte than just the haploid male genome: paternal messenger RNAs are also delivered to the egg at fertilization. Ostermeier et al. (2004) identified six transcripts (clusterin, AKAP4, protamine-2, HSBP1, FOXG1B and WNT5A) that spermatozoa delivered to the ooplasm at fertilization. Some of these transcripts that encode proteins that bind nucleic acids, such as protamine-2, are likely to be deleterious and are probably degraded following entry. A similar fate may await other RNAs that gain access. Some RNAs, however, may have a role in the developing zygote. For example, clusterin (also known as sulfated glycoprotein-2, or SGP-2) is delivered to the oocyte and has been implicated in cell-substratum interactions, enhancement of fertility rate, lipid transportation, membrane recycling, stabilization of stress proteins and promotion or inhibition of apoptosis. Thus these sperm RNAs could be important in early zygotic and embryonic development.

NATURE, STRUCTURE, ORGANIZATION AND PROPERTIES OF GENETIC MATERIAL

Genetic material is the substance of which the genes are made. It is responsible for transmission of biological properties from one generation to the next. The first step in the study of molecular genetics is to understand the nature, structure, organization, evolution and properties of genetic material. Knowledge from different disciplines of science is utilized to answer the following questions at the molecular level about the genetic material, DNA and RNA: (1) What is the chemical nature, structure, evolution and organization of genetic material? (2) How are the building blocks of nucleic acids synthesized? (3) How is DNA organized in viruses, prokaryotes and in the nucleus, chloroplasts and mitochondria of eukaryotic cells? (4) What are the principles and procedures of DNA analysis? (5) What are the principles, procedures and applications of DNA sequencing? (6) What are the principles, procedures and applications of DNA synthesis? (7) How is the genetic material packed into viral, bacterial and eukaryotic chromosomes, mitochondria and chloroplasts? (8) How is the genetic material copied? (9) How does DNA recombine during transmission from one generation to the next? (10) How does DNA change (mutate) or become post-synthetically modified? (11) How does DNA or RNA move from one place to another within the nuclear genome? (12) How does the cell check and repair damage to its DNA, and what are the various DNA safeguard mechanisms?

GENE STRUCTURE, FUNCTION AND EXPRESSION

The second step in molecular genetics is the understanding of the following aspects of the gene: (1) What is the structure of the gene and what are the different types of genes? (2) What function does a gene perform? (3) How does the gene express itself? (4) How are the transcripts processed into functional molecules? (5) What are the different levels at which gene expression is regulated? (6) What is the language of life? (7) How are proteins synthesized by the cell? (8) What is the fate of the proteins synthesized by cells? (9) What happens to the ribosomes and used messenger RNAs after protein synthesis is over?

GENE REGULATION

Fundamental questions that deal with the regulation of gene expression are: (1) How is the expression of a gene regulated in bacteria, viruses and eukaryotes? (2) What is the role of methylation of DNA in various biological functions? (3) What are the biological functions of histone modifications in relation to the regulation of gene expression in eukaryotic cells? (4) What are the important applications of seminal discoveries in molecular genetics? (5) What is the molecular basis of development? (6) What is the molecular explanation of evolution?

APPLYING KNOWLEDGE ABOUT GENETIC MATERIAL AND GENES

Knowing the answers to the above-mentioned questions about genetic material and the gene enables one to find out how the principles of molecular genetics are utilized to develop molecular tools that apply the properties of genetic material and the gene for the welfare of mankind. Research done in the last couple of decades in different disciplines of the biological sciences (genetics, biochemistry, microbiology) and the physical sciences (nuclear physics, organic chemistry) has led to the development of molecular biotechnology, a multidisciplinary branch of science that utilizes knowledge about genetic material (DNA and RNA) and the gene in medicine, in solving the problem of environmental pollution, and in animal and crop improvement.

REFERENCES

Berg, D.E., and M.M. Howe (eds). 1989. Mobile DNA. Washington, D.C.: American Society for Microbiology.
Bhattacharyya, M.K., A.M. Smith, T.H.N. Ellis, C. Hedley, and C. Martin. 1990. The wrinkled-seed character of pea described by Mendel is caused by a transposon-like insertion in a gene encoding starch-branching enzyme. Cell 60: 115-22.
Correns, C. 1900. G. Mendel's Regel über das Verhalten der Nachkommenschaft der Rassenbastarde. Berichte der Deutschen Botanischen Gesellschaft 18: 158-68.
de Vries, H. 1900. Sur la loi de disjonction des hybrides. C.R. Acad. Sci. Paris 130: 845-7.
Fincham, J.R.S. 1990. Mendel – now down to molecular level. Nature 343: 208-9.
Mendel, G.J. 1866. Experiments in plant hybridization. English translation made by Royal Hort. Soc., London, Harvard University Press, 1916.
Orgel, L.E. 2004. Francis Crick (1916-2004). Science 305: 1118.
Ostermeier, G.C., D. Miller, J.D. Huntriss, M.P. Diamond, and S.A. Krawetz. 2004. Delivering spermatozoan RNA to the oocyte. Nature 429: 154.
Tschermak, E. 1900. Über künstliche Kreuzung bei Pisum sativum. Zeit. Landw. Versuch. Oest. 3: 465-555.
Watson, J.D., and F.H.C. Crick. 1953. A structure for deoxyribose nucleic acid. Nature 171: 737-8.

PROBLEMS

1. Numerous questions in the area of molecular genetics have been addressed in this chapter. Try to write some more such questions and attempt to find their answers. Discuss these questions and answers with your teacher.
2. But for the discovery of the double-helical structure of nucleic acids, the birth of the science of molecular genetics would not have taken place. Comment.

2 Nature of the Genetic Material

Hofmeister (1848) first saw chromosomes in microsporocytes of Tradescantia. The earliest work to determine the chemical nature of the genetic material dates back to 1868, when the Swiss biologist Friedrich Miescher, while working on white blood cells, extracted from the nucleus a material that he termed "nuclein". This substance, which was extracted from pus cells, was very rich in phosphorus. Miescher separated it into a protein and an acid molecule (Miescher 1888). Waldeyer (1888) coined the term chromosome. The German pathologist Altmann (1889) introduced the term "nucleic acid"; the substance is now known as deoxyribonucleic acid (DNA). Mendel (1865) proposed particulate factors that governed the development of characters in peas, but he was not able to determine the nature of these factors, which we now know as genes. While the period from the early 1900s has been considered the "golden age" of genetics, scientists still had not determined the nature and structure of the hereditary material. However, during this time a great many discoveries were made and the link between genetics and evolution was established.

CHARACTERISTICS OF GENETIC MATERIAL

The genetic material of a virus, bacterial cell or organism refers to those materials found in the nucleus, mitochondria and cytoplasm that play a fundamental role in determining the structure and nature of cell substances and are capable of self-propagation and variation. The genetic material of a cell can be a gene, a part of a gene, a group of genes, a DNA molecule, a fragment of DNA, a group of DNA molecules, or the entire genome of an organism. The genetic material is that material in cells which is responsible for the inheritance of characteristics from the mother cell to the daughter cells and, therefore, from an individual organism to its offspring.

To function, the genetic material must possess certain characteristics: (1) It must contain, in a stable form, information encoding the organism's structure, functions, development and reproduction. (2) It should replicate with such accuracy that identical copies of it are produced, and the duplicate copies should be passed on equally to the daughter cells. (3) It must allow errors at a very low frequency, so that genetic variation can originate through mutations.

"Genes lie in the chromosomes" is the thesis of the chromosomal theory of inheritance. Obviously, some component of the chromosome must be the genetic material. The chromosome is made up of proteins and nucleic acids. Proteins are composed of polypeptides, which are made up of amino acids linked together. Twenty standard amino acids are commonly found in proteins. Proteins are quite complex in shape and composition, and they appear to qualify as possible genetic substances. Nucleic acids are of two kinds: deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). Nucleic acids are long molecules (approximately 200,000 nucleotides) made up of four types of nucleotides. A nucleotide is in turn composed of three compounds linked together: (a) a phosphoric acid, (b) a pentose sugar, and (c) a nitrogen-containing aromatic base. In nucleic acids, nucleotides are joined in a sugar-phosphate-sugar-phosphate... linkage to form a polynucleotide chain. The nitrogen-containing aromatic bases are attached to the sugar molecules as side chains. DNA molecules are very complex and, therefore, DNA is also a candidate for the genetic material. There is evidence to show that DNA is the genetic material in all eukaryotes and prokaryotes. Some viruses have RNA as the genetic material. A few lines of evidence are described here.

DISCOVERY OF DNA AS GENETIC MATERIAL

Transformation

Transformation is the process in which genetic material is transferred from one bacterium to another without involving any intermediate organism. A bacterial cell may also incorporate genetic material from the medium; this too is termed transformation.

Griffith effect: Griffith (1928) worked on the bacterium Diplococcus pneumoniae, which is associated with certain types of pneumonia. This bacterium occurs in two forms. The first type has smooth (S) cells, which secrete a covering capsule of polysaccharide material that causes the colonies on agar to be smooth and shiny. This type is virulent; it produces pneumonia in mice. The second type has rough (R) cells, which lack a polysaccharide capsule. Its colonies appear rough and dull. This type is nonvirulent; it does not cause pneumonia in mice. The smooth (S) and rough (R) traits are genetically determined. Griffith performed the experiment shown in Figure 2.1. When laboratory mice were injected with the S strain of bacteria, the mice died. When he injected mice with the R strain, the mice lived. When mice were injected with heat-killed S strain, the mice lived. When mice were injected with the R strain plus heat-killed S strain, the mice died. The heat-killed S strain had somehow transformed the R strain to virulence. This process was called "transformation". The agent responsible for transformation was called the "transforming principle" or "transforming agent".

Figure 2.1 Transformation experiment by Griffith (1928) (Modified from Lents, N.H. DNA I - The Genetic Material. www.visionlearning.com/library/module_viewer....)

Griffith's work led other physicians and bacteriologists to investigate the phenomenon of transformation. Dawson and Sia (1931) at the Rockefeller Institute not only confirmed Griffith's observations but also carried out transformation in vitro by growing R cells in a fluid medium containing anti-R serum and heat-killed encapsulated S cells. Later, Alloway (1933) showed that crude extracts containing the active transforming material in soluble form also effectively induced specific transformation.

Transforming principle: Neufeld et al. (1931) and Alloway (1932) initiated work to understand the chemical nature of the transforming principle. Avery et al. (1944) identified its chemical nature. They separated the extract of the S strain into different fractions, viz., polysaccharides, lipids, proteins, RNA and DNA, and added these fractions separately to culture media containing living R-type cells. Only the fraction containing DNA was effective. The results of the experiment summarized in Figure 2.2 demonstrate that DNA is the transforming principle. It was thus shown that genes, the units of heredity, are made up of DNA.

[Figure 2.2 panels: S-strain extract treated with protease, amylase, lipase or RNase gave virulence; S-strain extract treated with DNase gave no virulence.]

Figure 2.2 Summary of results of experiment by Avery, MacLeod and McCarty (1944) (Modified from Lents, N.H. DNA I - The Genetic Material. www.visionlearning.com/library/module_viewer....)
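The logic of the enzyme treatments summarized in Figure 2.2 can be written as a small lookup. This is a toy model rather than a description of the actual protocol: it assumes that each enzyme destroys exactly one class of macromolecule, and the names ENZYME_TARGET and transforms are hypothetical.

```python
# Toy model of the enzyme-treatment logic summarized in Figure 2.2.
# Assumption: each enzyme destroys exactly one class of macromolecule.
ENZYME_TARGET = {
    "protease": "protein",
    "amylase": "polysaccharide",
    "lipase": "lipid",
    "RNase": "RNA",
    "DNase": "DNA",
}


def transforms(enzyme: str, transforming_principle: str = "DNA") -> bool:
    """Return True if the treated S-strain extract can still transform R cells,
    i.e. if the enzyme did not destroy the transforming principle."""
    return ENZYME_TARGET[enzyme] != transforming_principle


for enzyme in ENZYME_TARGET:
    outcome = "virulence" if transforms(enzyme) else "no virulence"
    print(f"{enzyme:9s} -> {outcome}")
```

Setting transforming_principle to "protein" or "RNA" shows what the results would have looked like had another molecule been the hereditary substance; only the DNA hypothesis matches the observed pattern.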

Transduction

Clear implication of DNA as the genetic material came from the experiments of Zinder and Lederberg (1952) on the mouse typhoid bacterium Salmonella typhimurium. Their experiment involved the process of transduction, in which a bacterium-infecting virus (bacteriophage) serves as the vector that transfers DNA from one bacterial cell to another. They used two strains of bacteria, LA2 (phe+ trp+ met– his–) and LA22 (phe– trp– met+ his+), and placed each strain in one of the two arms of a Davis U-tube separated by a sintered glass filter. This filter prevents the transfer of bacterial cells but permits free passage of the growth medium between the two arms through the application of suction and pressure. Phage P22 is capable of generalized transduction. One arm contained the lysogenic strain LA22, carrying the dormant virus P22, whereas the other arm contained strain LA2, which is susceptible to the virus (Figure 2.3). Some of the lysogenic bacteria lysed and released virus particles, which crossed the filter and infected the susceptible strain. As a result, a "filterable agent" (FA) arose in connection with LA2 that could produce prototrophs in LA22. This FA was phage P22. It was found that the "phe+ trp+ factor" could pass through filters that hold back bacteria but allow viruses to pass through. The "phe+ trp+ factor" was located inside the virus. Production of recombinant viral DNA was the result of transduction. Transduction is defined as the process of transfer of genetic material from one genotype to another through the agency of bacteriophages. Transformation and transduction showed that DNA is the genetic material in bacteria.

Figure 2.3 Transduction experiment of Zinder and Lederberg (1952) in Salmonella typhimurium providing one line of evidence for DNA as genetic material (Modified from http://bio3400.nicerweb.com/Locked/media/ch06/U-tube.html)
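The genotype bookkeeping behind the appearance of prototrophs can be sketched as follows. The two genotypes are those given in the text; the functions transduce and is_prototroph are hypothetical, and the sketch simplifies by treating the "phe+ trp+ factor" as if both markers were delivered together.

```python
# Genotypes as given in the text: "+" = functional allele, "-" = defective allele.
LA2 = {"phe": "+", "trp": "+", "met": "-", "his": "-"}
LA22 = {"phe": "-", "trp": "-", "met": "+", "his": "+"}


def transduce(recipient: dict, donor: dict, markers: list) -> dict:
    """Return a copy of the recipient carrying the donor alleles for the
    listed markers (simplified: markers are assumed to arrive together)."""
    result = dict(recipient)
    for marker in markers:
        result[marker] = donor[marker]
    return result


def is_prototroph(genotype: dict) -> bool:
    """A prototroph has a functional allele for every marker."""
    return all(allele == "+" for allele in genotype.values())


recombinant = transduce(LA22, LA2, ["phe", "trp"])
print(is_prototroph(LA22), is_prototroph(LA2), is_prototroph(recombinant))
```

Neither parental strain is a prototroph, whereas an LA22 cell that receives the phe+ and trp+ alleles carried by the phage is.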

Viral Infectivity

Hershey and Chase (1952) reasoned that a virus's genetic material must show two properties: first, it must pass into the infected cell and, second, it must be passed on to the next generation of viruses. They used two cultures of bacteriophage T2 for infection. This phage is relatively simple in molecular constitution. Most of its structure is protein, with the DNA contained inside the protein sheath of its head. Phosphorus is not found in proteins but is an integral part of DNA. Sulfur is not found in DNA but is present in proteins. In one of the cultures, the DNA was labeled with radioactive 32P, and in the second culture the proteins were labeled with 35S. After infecting bacteria with the viruses and subsequent centrifugation, it was found that in cells infected with protein-labeled bacteriophages the bulk of the radioactivity remained in the empty coats, whereas in cells infected with DNA-labeled bacteriophages the bulk of the radioactivity passed inside the host cell (Figure 2.4).

Figure 2.4 Experiments of Hershey and Chase (1952) which showed that labeled DNA was transmitted whereas no labeled protein was transmitted to T2 during infection of bacteria (Modified from www.accessexcellence.org/ AB/GG/hershey.php)

This protein residue consists of material forming the protective membrane of the resting phage particle and plays no further role in infection after the attachment of the phage to the bacterium. They thus demonstrated that the genetic material of phage T2 is DNA, not protein. A.D. Hershey shared the Nobel Prize in 1969 (with M. Delbrück and S.E. Luria) for discoveries concerning the replication mechanism and genetic structure of viruses. The above experiments conclusively proved DNA to be the genetic material. All known cellular organisms have DNA as their genetic material.
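The inference drawn from the two labeling experiments can be laid out as a small lookup. This is qualitative bookkeeping only: the dictionaries encode what the text reports, and the names LABEL_TARGET, ENTERS_CELL and bulk_label_location are hypothetical.

```python
# Qualitative bookkeeping for the two labeling experiments.
LABEL_TARGET = {"32P": "DNA", "35S": "protein"}   # molecule marked by each isotope
ENTERS_CELL = {"DNA": True, "protein": False}     # observation reported in the text


def bulk_label_location(isotope: str) -> str:
    """Where most of the radioactivity is found after infection."""
    molecule = LABEL_TARGET[isotope]
    return "inside bacteria" if ENTERS_CELL[molecule] else "empty phage coats"


for isotope in LABEL_TARGET:
    print(isotope, LABEL_TARGET[isotope], bulk_label_location(isotope))
```

Because only the 32P label, and therefore only DNA, ends up inside the host, DNA is the molecule that satisfies the first requirement stated by Hershey and Chase.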

DISCOVERY OF RNA AS GENETIC MATERIAL

Some viruses use RNA as genetic material. Tobacco mosaic virus (TMV) is composed of RNA and protein; it contains no DNA. Gierer and Schramm (1956) showed that when purified RNA from TMV is applied directly to tobacco leaves, they develop mosaic disease. Pre-treating the purified RNA with RNase destroys its ability to cause TMV lesions. While working on reconstitution of TMV, Fraenkel-Conrat and Singer (1957) and Fraenkel-Conrat et al. (1957) demonstrated that RNA is the genetic material in TMV (Figure 2.5).

Figure 2.5 Viral reconstitution experiment of Fraenkel-Conrat and Singer (1957) showing that RNA is the genetic material in TMV (Modified from http://biosiva.50webs.org/rep1.htm)

In TMV, identical protein subunits are arranged helically, forming a hollow cylinder in which the RNA, in the form of a spiral, is enclosed. The RNA and protein components of viruses can be separated and can reaggregate. These scientists discovered that the RNA core and the protein coat of wild-type TMV and other viral strains could be isolated separately. In their work, RNA and coat proteins were separated and isolated from TMV and a second viral strain, Holmes Ribgrass (HR). Mixed viruses were then reconstituted from the RNA of one strain and the proteins of the other. When such a hybrid virus was spread on tobacco leaves, the lesions that developed corresponded to the type of RNA in the reconstituted virus, i.e., viruses with wild-type TMV RNA and HR protein coats produced TMV lesions and vice versa. This experiment proved that RNA acts as the genetic material in some viruses.

Thus it was concluded that genetic information is carried and flows through polynucleotides in the form of DNA or RNA.

VIRAL GENOMES

Either DNA or RNA, never both, is the genetic material in viruses. RNA that serves as genetic material is termed genetic RNA.

DNA Viruses

DNA viruses can be further divided into two classes: (a) those that have their genes in a double-stranded DNA molecule (dsDNA) and (b) those that have their genes in a molecule of single-stranded DNA (ssDNA).

Double-stranded DNA viruses. Examples of dsDNA viruses are: smallpox (variola); vaccinia (used to immunize against smallpox until the disease was eliminated from the planet); varicella-zoster (causes chicken pox the first time and shingles the second); adenoviruses; SV40 (a virus that infects primate cells and causes tumors in rodent cells); T2 and T4 (from which much early information about gene structure and expression was learned); lambda (a popular vector); herpes simplex viruses, which are of two types, HSV-1 (usually infects the trigeminal nerves, periodically causing "cold sores" on the lips and face) and HSV-2 (usually infects the genitals); KSHV, also called human herpes virus 8 (HHV-8) (causes Kaposi's sarcoma in AIDS patients and other people with suppressed immune systems); cytomegalovirus (CMV) (most of us have it; it can cause blindness, and even death, in people with suppressed immune systems; it binds to the epidermal growth factor receptor (EGFR) on cells); and Epstein-Barr virus (EBV) (causes mononucleosis and has been implicated in the development of Burkitt's lymphoma (a cancer) and Hodgkin's disease; its genome has been completely sequenced: 172,282 bp of DNA encoding 80 genes).

Single-stranded DNA viruses. Well-known examples of ssDNA viruses are φX174 (a famous bacteriophage that infects Escherichia coli and helped usher in the modern era of molecular genetics; its single strand of DNA has 5,386 nucleotides and encodes 10 genes) and adeno-associated virus (AAV) (can grow only in cells infected with adenovirus; shows great promise as a safe and effective vector for introducing therapeutic genes into human patients). Another example is M13, a filamentous bacteriophage composed of circular ssDNA, 6407 nucleotides long, encapsulated in approximately 2700 copies of the major coat protein P8 and capped with 5 copies each of minor coat proteins (P9, P6, P3) on the ends.

RNA Viruses

RNA viruses may contain single-stranded RNA (ssRNA) or double-stranded RNA (dsRNA) as genetic material.

Single-stranded RNA viruses. ssRNA viruses occur in two distinct groups: negative-stranded RNA viruses and positive-stranded RNA viruses.

Negative-stranded RNA viruses. The genome of these viruses consists of single-stranded antisense RNA, that is, RNA that is the complement of the message sense. This is also called negative-stranded RNA (Lamb and Kolakofsky 1996). Examples of negative-stranded RNA viruses are: measles, mumps, respiratory syncytial virus (RSV), parainfluenza viruses (PIV), human metapneumovirus, rabies, Ebola and influenza. In addition to its antisense RNA genome, the core of the virion contains an RNA replicase, which is an RNA-dependent RNA polymerase. Once released in the host cell, this polymerase makes many complementary copies of the genome, which are "sense" and serve as messenger RNAs. These are translated into the proteins needed to assemble fresh virions, e.g., capsid proteins and RNA polymerase.

Positive-stranded RNA viruses. These viruses have a genome that consists of single-stranded sense RNA, that is, the RNA has message sense (it can act as a messenger RNA). This was also called positive-stranded RNA by Rueckert (1996). Examples of positive-stranded RNA viruses are: polioviruses, rhinoviruses (a frequent cause of the common "cold"), coronaviruses (which include the agent of Severe Acute Respiratory Syndrome, SARS), rubella (causes "German" measles), yellow fever virus, West Nile virus, dengue fever viruses, equine encephalitis viruses, hepatitis A ("infectious hepatitis") and hepatitis C viruses, and tobacco mosaic virus (TMV). In these viruses, the "sense" RNA encodes an RNA replicase (an RNA-dependent RNA polymerase) that is translated by the host machinery (ribosomes, etc.) into the enzyme, which catalyzes the synthesis of a large number of "antisense" replicative intermediates. These intermediates serve as templates for the synthesis of large numbers of sense RNA molecules, which are translated by the host cell machinery into the proteins needed to make fresh virions and are also incorporated into the new virions.

Double-stranded RNA viruses. Reovirus and several plant viruses are examples of viruses whose genome consists of several molecules of dsRNA (Smart et al. 1999). The virus particle contains enzymatic machinery that transcribes each of the dsRNA molecules into mRNA (complete with cap) and exports these into the cytosol of the infected cell.
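The relationship between an antisense (negative-strand) genome and the sense RNA copied from it by the RNA replicase is simply base-pair complementarity. The minimal sketch below illustrates it; the 11-base sequence is made up for the example and the function name complementary_rna is hypothetical.

```python
# Watson-Crick pairing in RNA
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complementary_rna(strand: str) -> str:
    """Return the complementary RNA strand, written 5'->3'
    (complement each base, then reverse the order)."""
    return "".join(RNA_PAIR[base] for base in reversed(strand))


antisense_genome = "CAUUUCGGCAU"   # made-up negative-strand sequence, 5'->3'
sense_copy = complementary_rna(antisense_genome)
print(sense_copy)                      # message-sense copy made by the replicase
print(complementary_rna(sense_copy))   # copying again regenerates the genome
```

The same function applied to a positive-strand genome gives its antisense replicative intermediate, and a second round of copying restores the original sense strand.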

Retroviruses

Retroviral RNA (also single-stranded) is copied by reverse transcriptase into a DNA genome within the host cell. The DNA intermediate may be incorporated into the host genome as DNA, where it replicates by producing RNA copies during transcription. Retroviruses include the human immunodeficiency virus (HIV). HIV was first isolated in 1983, separately, by researchers in Robert Gallo's laboratory at the U.S. National Institutes of Health and Luc Montagnier's laboratory at France's Pasteur Institute. HIV is responsible for acquired immune deficiency syndrome (AIDS). These viruses contain a reverse transcriptase that copies their RNA genome into DNA. Examples of retroviruses are: the Rous sarcoma virus (RSV), HIV-1 and HIV-2, and HTLV-1 and HTLV-2. About 3 per cent of the people infected with HTLV-1 develop leukemia. Retroviruses are RNA tumor viruses; they can cause cancer. These viruses make DNA copies, which are able to integrate into cellular DNA and thus become part of the genetic complement of the cell. Sequencing studies have shown striking similarities between these integrated structures and transposable genetic elements from both prokaryotes and eukaryotes (Flavell 1981).

Plant comoviruses and animal picornaviruses

Nearly 20 per cent of the packaged RNA in bean-pod mottle virus (BPMV) binds to the capsid interior in a symmetric fashion and is clearly visible in the electron density map (Chen et al. 1989). The RNA displaying icosahedral symmetry is single-stranded, with well-defined polarity and stereochemical properties. Its interactions with proteins are dominated by nonbonding forces, with few specific contacts. The tertiary and quaternary structures of the BPMV capsid proteins are similar to those observed in animal picornaviruses (small RNA viruses), supporting the close relationship between plant comoviruses and animal picornaviruses.

LATENT VIRUSES

Most of the infective cycles described for the various viruses end in the death of the host cell. Bacterial cells literally burst, a process called lysis, and such infective cycles are called lytic cycles. In some cases, though, the events of the lytic cycle are not completed. Escherichia coli infected by a DNA bacteriophage may resume its normal existence, including reproducing itself. Where has the virus gone? It is still there and, in fact, is present in the descendants of the bacterium. That these cells still harbor the virus can be demonstrated by irradiating the cells with ultraviolet rays or treating them with certain chemicals. Such treatment restores the normal lytic cycle. The phage is said to have been "rescued". The stable relationship between a bacteriophage and its host is called lysogeny. The viral DNA is replicated when the host's DNA is replicated prior to each cell division. During lysogeny, the phage is called a prophage. A. Lwoff was awarded the Nobel Prize in 1965 for the demonstration of the prophage.

Free bacteriophage found in cultures of Bacillus megaterium arises from the lysis of a limited number of cells of the culture. There can be little doubt that the bacteriophage is passed from cell to cell at division and, in all probability, multiplies along with the cell. Lwoff and Gutmann (1950) followed one cell through 19 divisions, never observing lysis or detecting free bacteriophage. The cells of the 19th generation were all lysogenic. The internal virus, called probacteriophage by Lwoff and Gutmann (1950), must multiply during cell division. In some cases, the prophage DNA becomes inserted into the chromosome of its host. In fact, when the phage is "rescued", the released virions may contain some host genes as well as their own. When these virions infect new hosts, they insert these bacterial genes into them. This process of genetic transfer, a virus-mediated transformation, is called transduction.
What does the prophage do while it is a part of its host genome? It can express certain of its genes. For example, the gene that encodes diphtheria toxin is the property of a prophage in the diphtheria bacillus, not of the bacillus itself.

Some animal viruses can also establish latent infections. Simian virus 40 (SV40) is a DNA virus that produces a lytic infection in the kidney cells of the African green monkey (these cells are used to cultivate viruses in the laboratory) but a latent infection in the cells of humans, mice, rats and hamsters. As in lysogeny in bacteria, the SV40 genome becomes incorporated into the DNA of its host (in chromosome 7 in human cells). Latent infections may also cause the cell to become cancerous. The cell thus becomes transformed. In these cases, the word fulfills both of its biological meanings: (1) "transformed" by the incorporation of new DNA and (2) "transformed" as it becomes cancerous. R. Dulbecco was awarded the Nobel Prize in 1975 for discovering the mechanism by which the animal virus SV40 inserts its DNA into an animal cell and causes transformation of that cell (Dulbecco and Vogt 1954; Dulbecco et al. 1965).

Crowded cultures of mouse kidney cells have a very low rate of DNA synthesis and very low activities of the enzymes involved in DNA synthesis. After infection with polyoma virus, both the enzyme activities and the rate of DNA synthesis increase markedly. It is of special interest that the DNA synthesized in the infected cells is predominantly cellular. The ability of the virus to stimulate the synthesis of cellular DNA may be related to its tumorigenic property. In humans, lytic infections of plasma cells by the Epstein-Barr virus (EBV) occur in mononucleosis. Latent infections of B cells by EBV predispose the person to lymphoma. While lytic infections by human papilloma virus (HPV) cause genital warts, latent infections by some strains of HPV lead to cervical cancer.

REFERENCES

Alloway, J.L. 1932. The transformation in vitro of R pneumococci into S forms of different specific types by the use of filtered pneumococcus extracts. J. Exp. Med. 55: 91-9.
Alloway, J.L. 1933. Further observations on the use of pneumococcus extracts in effecting transformation of type in vitro. J. Exp. Med. 57: 265-78.
Altmann, R. 1889. Über Nucleinsäuren. Arch. Anat. Physiol. Abt. Physiol. 1889. Pp. 534-36.
Avery, O.T., C.M. MacLeod, and M. McCarty. 1944. Studies on the chemical nature of the substance inducing transformation of pneumococcal types. Induction of transformation by a desoxyribonucleic acid fraction isolated from pneumococcus type III. J. Exp. Med. 79: 137-58.
Chen, Z., C. Stauffacher, Y. Li, T. Schmidt, et al. 1989. Protein-RNA interactions in an icosahedral virus at 3Å resolution. Science 245: 154-9.
Dawson, M.H., and R.H.P. Sia. 1931. A technique for inducing transformation of pneumococcal types in vitro. J. Exp. Med. 54: 681-99.
Dulbecco, R., and M. Vogt. 1954. Plaque formation and isolation of pure lines with poliomyelitis viruses. J. Exp. Med. 99: 967-82.
Dulbecco, R., L.H. Hartwell, and M. Vogt. 1965. Induction of cellular DNA synthesis by polyoma virus. Proc. Nat. Acad. Sci. USA 53: 403-10.
Flavell, A.J. 1981. Did retroviruses evolve from transposable elements? Nature 289: 10-1.
Fraenkel-Conrat, H., and B. Singer. 1957. Virus reconstitution. II. Combination of protein and nucleic acid from different strains. Biochimica et Biophysica Acta 24: 540-8.
Fraenkel-Conrat, H., B. Singer, and R.C. Williams. 1957. Infectivity of viral nucleic acid. Biochimica et Biophysica Acta 25: 87-96.
Gierer, A., and G. Schramm. 1956. Infectivity of ribonucleic acid from tobacco mosaic virus. Nature 177: 702-3.
Griffith, F. 1928. The significance of pneumococcal types. Jour. Hygiene 27: 113-59.
Hershey, A.D., and M. Chase. 1952. Independent functions of viral protein and nucleic acid in growth of bacteriophage. Jour. Gen. Physiol. 36: 39-56.
Hofmeister, W. 1848. Über die Entwicklung des Pollens. Bot. Zeit. 6: 424-34, 649-58, 670-4.
Lamb, R.A., and D. Kolakofsky. 1996. Paramyxoviridae: The viruses and their replication. In: Fields Virology. Eds B.N. Fields, D.M. Knipe and P.M. Howley. Pp. 1177-204. New York: Lippincott-Raven.
Lwoff, A., and A. Gutmann. 1950. Recherches sur un Bacillus megatherium lysogène. Ann. Inst. Pasteur 78: 711-39.
Mendel, G.J. 1866. Experiments in plant hybridization. English translation made by Royal Hort. Soc., London, Harvard University Press, 1916.
Miescher, F. 1888. Der Athemschieber. Ein neuer Apparat zur künstlichen Respiration und seine Controlle am lebenden Thiere. Cent. Bl. Physiol. 14: 342.
Neufeld, F., and R. Etinger-Tulczynska. 1931. Nasale Pneumokokkeninfektionen und Pneumokokkenkeimträger im Tierversuch. Z. Hyg. Infektionskr. 112: 492-526.
Rueckert, R.R. 1996. Picornaviridae: The viruses and their replication. Pp. 609-54. In: Fields, B.N. ed. Virology. Philadelphia: Lippincott-Raven.
Smart, C.D., W. Yuan, R. Foglia, D.L. Nuss, D.W. Fulbright, and B.I. Hillman. 1999. Cryphonectria hypovirus 3, a virus species in the family Hypoviridae with a single open reading frame. Virology 265: 66-73.
Waldeyer, W. 1888. Über Karyokinese und ihre Beziehungen zu den Befruchtungsvorgängen. Arch. Mikr. Anat. 32: 1-35.
Zinder, N.D., and J. Lederberg. 1952. Genetic exchange in Salmonella. J. Bacteriol. 64: 679-99.

PROBLEMS

1. If the proteins were the genetic material, what impact would it have on the mechanism of inheritance?
2. Nucleic acids (DNA or RNA) as genetic material have been assigned the role of governing various characteristics of an organism. Think of the actual role of the genetic material in the cell.

3 Structure of the Genetic Material

Deoxyribonucleic acid (DNA) is the genetic material in prokaryotes, eukaryotes and many viruses. Ribonucleic acid (RNA) is the genetic material only in some viruses. We now discuss the chemical structure of these macromolecules.

STRUCTURE OF DNA

The earliest work to determine the structure of nucleic acids was done by Levene and London (1929), who determined the presence of 2′-deoxyribose sugar and proposed the tetranucleotide structure of DNA. Chargaff (1950) concluded that deoxypentose nucleic acids from animal and microbial cells contained varying proportions of the same four nitrogenous constituents. Their composition appeared to be characteristic of the species, but not of the tissue, from which they were derived. Many scientists worked on establishing the structure of DNA. Among them, the most eminent groups were those of Franklin and Gosling (1951), Chargaff (1950), Wilkins et al. (1953) and Watson and Crick (1953a, b, c, d). J.D. Watson, F.H.C. Crick and M.H.F. Wilkins were awarded the Nobel Prize in 1962 for proposing the double helix model of DNA.

There are four types of nucleotides in DNA: deoxyadenosine-5′-phosphate, deoxyguanosine-5′-phosphate, deoxycytidine-5′-phosphate and deoxythymidine-5′-phosphate. Each deoxyribonucleotide contains phosphate, deoxyribose sugar and one of the four nitrogen-containing bases. The four bases are adenine (A), guanine (G), cytosine (C) and thymine (T). Adenine and guanine have a two-ring structure and are known as purines, whereas thymine and cytosine have a one-ring structure and are known as pyrimidines. Bases are hydrophobic. There are two systems of numbering pyrimidines, old and new (Figure 3.1). In this book, the old system of numbering has been used. Structures of the different molecules forming DNA are depicted in Figure 3.2. The structure of the four types of deoxyribonucleotides is given in Figure 3.3.

The total amount of pyrimidine (T+C) nucleotides is always equal to the total amount of purine (A+G) nucleotides in DNA. Further, the amount of T always equals the amount of A, and C always equals G. The width of the DNA molecule was found to be 20 Å and adjacent nucleotide pairs were found to be spaced 3.4 Å apart.
Figure 3.1 Old and new systems of numbering of nitrogenous bases in nucleic acids

Figure 3.2 Structure of molecules forming nucleic acids: 1. ribose, 2. 2′-deoxyribose, 3. phosphoric acid, 4. adenine, 5. guanine, 6. uracil, 7. thymine and 8. cytosine (Numbering has been done following the old system shown in Figure 3.1)

Figure 3.3 The four deoxyribonucleotides of DNA

Keeping the above-mentioned facts in mind, Watson and Crick (1953a) proposed a double helix structure for DNA. The salient features of their model are as follows. The DNA molecule is two-stranded and coiled like a rope. The two DNA strands are complementary, i.e., adenine pairs with thymine through two hydrogen bonds and guanine pairs with cytosine through three hydrogen bonds (Figure 3.4). The two strands run antiparallel: one strand runs 5′→3′ and the other 3′→5′ (Figure 3.5). Each strand is a chain of nucleotides held together by phosphodiester bonds. The two polynucleotide strands are held together by hydrogen bonds between bases. The purines and pyrimidines are joined to deoxyribose by β-glycosidic bonds. The intertwining of the two strands of DNA is like a spiral staircase in which the railings are made up of the sugar-phosphate backbones and the stairs are made up of the bases. The bases stack on top of one another in the twisted structure of the double helix, and hydrophobic interactions keep the nucleotide pairs in position. The double helix of DNA looks like two interlocked strands (Figure 3.6A). Unless the molecule is circular, each strand has a free 5′ end (usually carrying a phosphate group) and a free 3′-OH end. The oxygens of the phosphates are polar and negatively charged.
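The pairing and antiparallel rules of the model can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the sequence `ATGGCTA` and the function names are hypothetical and do not come from the text. Given one strand, the sketch derives the partner strand, checks that A equals T and G equals C across the duplex, and counts hydrogen bonds using two per A-T pair and three per G-C pair.

```python
from collections import Counter

# Watson-Crick pairing: A with T (2 hydrogen bonds), G with C (3 hydrogen bonds).
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
H_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}


def complementary_strand(strand: str) -> str:
    """Return the partner strand, written 5'->3' like the input.

    The strands are antiparallel, so the partner is the complement
    of the input read in the reverse direction.
    """
    return "".join(PAIR[base] for base in reversed(strand.upper()))


def duplex_base_counts(strand: str) -> Counter:
    """Count each base over both strands of the duplex."""
    return Counter(strand.upper()) + Counter(complementary_strand(strand))


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds holding the duplex together."""
    return sum(H_BONDS[base] for base in strand.upper())


top = "ATGGCTA"  # hypothetical example sequence
counts = duplex_base_counts(top)
print(complementary_strand(top))
print(counts["A"] == counts["T"], counts["G"] == counts["C"])
print(hydrogen_bonds(top))
```

Because every base on one strand fixes its partner on the other, the equalities A = T and G = C hold for any input sequence, which is the base-composition regularity described above.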

Electrostatic Interactions in DNA

The negative charges of the phosphates repel one another. Salt bridging shields the phosphate anions. At low ionic strength the helix is destabilized and becomes insoluble. Adding a little salt (especially Mg2+) increases the solubility, because the cations form electrostatic pairs with the backbone phosphates. Binding of polycations (e.g., proteins with many lysine and arginine residues) can affect solubility by neutralizing the phosphate charges. This may cause precipitation if the association is unordered. If the association of DNA with polycations is controlled, it can cause tight packing of the DNA. Most of the DNA in human cells is tightly packed with cationic proteins called histones.


Figure 3.4 Two complementary base pairs found in DNA. This shows that two DNA strands are complementary to each other, i.e., adenine pairs with thymine with two hydrogen bonds, and guanine pairs with cytosine with three hydrogen bonds

Figure 3.5 Glycosidic, phosphodiester, hydrogen and hydrophobic bonds hold DNA together


Figure 3.6 (A) Right-handed DNA double helix as predicted by Watson and Crick (1953). Major and minor grooves of DNA are also shown. (B) Left-handed DNA double helix

The helix axis is most apparent from a view directly down the axis. The secondary structure of DNA is a double helix, with the phosphates out and the bases in. Is this what DNA looks like in the cell? That's a tough question to answer because structural information is usually obtained by hitting crystallized molecules with an X-ray beam, conditions not compatible with most living organisms.

Advantages of Watson-Crick Model

The double helical model of DNA proposed by Watson and Crick (1953a) had important implications for explaining several biological phenomena (Watson and Crick, 1953b): How is biological information stored in DNA? How does DNA replicate? How does genetic recombination take place? How do mutations occur?

Some Facts about DNA Structure

(1) The molecular weight of DNA is expressed in terms of DNA size in base pairs (bp) or kilo base pairs (kbp).
(2) A molecular weight of 1,000,000 corresponds to 1.5 kbp.
(3) One mm of double-stranded DNA comprises 2.9×10^6 nucleotide pairs.
(4) DNA is sensitive to acid conditions. Purine bases are selectively lost (depurination) by hydrolysis of the β-glycosidic bonds.
(5) There are no restrictions on the sequence of bases along a polynucleotide chain; however, the precise sequence of bases carries the genetic information.
(6) Structurally, DNA is a remarkably long, threadlike macromolecule.
(7) DNA varies in length, depending on the organism. Certain viral DNA is 1.7 mm, whereas human DNA is nearly a meter in length.
(8) The DNA in cells is not inert. It is constantly moving and the bases are constantly pairing and unpairing. Sections of double helices will occasionally come apart and 'breathe'. This is normal.
(9) Genetic information present in DNA is expressed through the processes of transcription, in which RNA (ribonucleic acid) is synthesized from a DNA template strand, and translation, in which proteins are synthesized from RNA (specifically, messenger RNA) templates.
(10) DNA is damaged in a variety of ways, and all cells have mechanisms by which to repair the damage. DNA repair is possible because genetic information is stored in both strands of the double helix; therefore, information lost from one strand can be retrieved from the other.
(11) Nucleic acid sequences can have other effects on secondary structure if sequences in the same strand are complementary to one another (palindromes). Single-stranded nucleic acids can form hairpins and pseudoknots; these structures play an important role in transcription termination in prokaryotes.
(12) In DNA, the glycosidic bonds between the sugars and bases of a given base pair are not exactly opposite each other, and two grooves of unequal width are formed around the double helix (Figure 3.6A). The major groove is 22 Å wide and the minor groove is 12 Å wide (Wing et al. 1980). The functional utility of the grooves is that proteins can interact with particular bases without disrupting the double helix.
(13) Depending upon the shape of the molecule, DNA can be classified as (a) circular (as in most viruses, like SV40, and bacteria like Escherichia coli and Salmonella typhimurium), or (b) linear (as in phage T7 and eukaryotic chromosomes). Some DNA is also present in the mitochondria and chloroplasts, and it has the characteristics of bacterial DNA.
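The size conversions in the list above can be checked with a little arithmetic. The sketch below is illustrative only. It uses the 3.4 Å spacing between nucleotide pairs quoted earlier and assumes an average mass of about 660 daltons per nucleotide pair; that mass is a commonly used approximation and is not stated in the text. The function names are hypothetical.

```python
RISE_PER_BP_NM = 0.34        # 3.4 Å between adjacent nucleotide pairs
AVG_MASS_PER_BP_DA = 660.0   # assumed average mass of one nucleotide pair


def base_pairs_per_mm() -> float:
    """Number of nucleotide pairs in 1 mm of double-stranded DNA."""
    nm_per_mm = 1e6
    return nm_per_mm / RISE_PER_BP_NM


def length_mm(n_bp: float) -> float:
    """Contour length in mm of a duplex containing n_bp nucleotide pairs."""
    return n_bp * RISE_PER_BP_NM / 1e6


def kbp_from_molecular_weight(mol_weight: float) -> float:
    """Approximate duplex size in kbp for a given molecular weight."""
    return mol_weight / AVG_MASS_PER_BP_DA / 1000.0


print(f"{base_pairs_per_mm():.2e} bp per mm")
print(f"{kbp_from_molecular_weight(1_000_000):.2f} kbp per 10^6 of molecular weight")
```

Under these assumptions 1 mm corresponds to roughly 2.9×10^6 nucleotide pairs and a molecular weight of 1,000,000 to about 1.5 kbp, in line with items (2) and (3).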

ALTERNATE STRUCTURAL FORMS OF DNA

The present picture of the double helix incorporates additions made since the discovery by Watson and Crick. The form of DNA described by Watson and Crick (1953a, b) is B-DNA. B-DNA is a right-handed structure, which is the most stable structure under the physiological conditions of the cell. A spine of hydration in the minor groove stabilizes B-DNA at high humidity. Models of B-DNA and C-DNA were given by Hopkins (1981). Some forms of DNA (e.g., Z-DNA) have a left-handed structure. Left-handed and right-handed DNA helices are compared in Figure 3.6 (A and B). Depending upon base composition and relative humidity, DNA exists in various forms: A-, Z-, B-, C-, D-, E- and P-DNA. These conformational forms of DNA differ from one another in relative rotation per base pair, repeating helix unit, mean base pairs per turn, inclination of bases, rise per base pair along the helix axis, glycosyl torsion angle and sugar-pucker conformation (Dickerson et al. 1982). The various forms of DNA are compared in Table 3.1 and described here briefly, following Potaman and Sinden (2005).

A-DNA

When the relative humidity is reduced to 75 per cent, B-DNA undergoes a reversible conformational change to the so-called A-form. A-DNA forms a wider and flatter right-handed helix than B-DNA. A-DNA has 11 bp per turn and a pitch of 28 Å, which gives A-DNA an axial hole. In A-DNA, the planes of the base pairs are tilted at 20° with respect to the helix axis. A-DNA has a very deep major groove and a shallow minor groove. The base pairs are pushed towards the minor groove. In A-DNA, there is no water spine. Hybrid double helices, which consist of one strand each of RNA and DNA, also assume an A-DNA-like conformation. A-DNA has so far been observed in only one biological context. Gram-positive bacteria undergoing sporulation contain a high proportion (20%) of small acid-soluble spore proteins (SASPs). Some of these SASPs induce B-DNA to assume the A-form, at least in vitro. The DNA in bacterial spores exhibits a resistance to UV-induced damage that is abolished in mutants lacking these SASPs. This occurs because the B → A conformational change inhibits the UV-induced covalent cross-linking of pyrimidine bases, in part by increasing the distance between successive pyrimidines.

Table 3.1 Characteristics of alternate forms of DNA

Helix type | A form | B form | C form | Z form | D form | E form
Rotation | Right | Right | Right | Left | – | –
Base pairs/turn | 11.0 | 10.5 | 9.33 | 12.0 | 8.0 | 7.5
Vertical rise/base pair (Å) | 2.56 | 3.38 | 3.32 | 3.71 | – | –
Helical diameter (Å) | 23 | 20 | 19 | 18 | – | –
Conditions | 75% RH; Na+, K+, Cs+ ions | 92% RH; low ionic strength | 66% RH; Li+ ions | Very high salt concentration | – | –
Base pair tilt | 20° | –6° | – | –7° | – | –
Pitch (length) of helix (Å) | 28.2 | 34.3 | – | 44.4 | – | –
Rotation per residue | 32.7° | 34.3° | – | –30° | – | –
Configuration of glycosidic bond: dA, dT, dC | anti | anti | – | anti | – | –
Configuration of glycosidic bond: dG | anti | anti | – | syn | – | –
Sugar pucker: dA, dT, dC | C3′ endo | C2′ endo | C2′ endo | C2′ endo | C3′ exo | –
Sugar pucker: dG | C3′ endo | C2′ endo | – | C3′ endo | – | –

D and E forms of DNA lack guanine.
P form: right-handed; 2.62 base pairs/turn; over-stretched form, called P form in honor of Linus Pauling, who erroneously suggested that the phosphate groups are present inside the molecule.
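Several of the helical parameters in Table 3.1 are related by simple geometry: the pitch is the number of base pairs per turn multiplied by the rise per base pair, and the rotation per residue is 360° divided by the number of base pairs per turn (negative for a left-handed helix). The Python sketch below recomputes these quantities from the tabulated base pairs per turn and rise. It is a consistency check under those two relations only, and the dictionary and function names are hypothetical.

```python
# (base pairs per turn, rise per base pair in Å), as listed in Table 3.1
HELIX = {
    "A": (11.0, 2.56),
    "B": (10.5, 3.38),
    "C": (9.33, 3.32),
    "Z": (12.0, 3.71),
}
LEFT_HANDED = {"Z"}


def pitch(form: str) -> float:
    """Length of one helical turn in Å: base pairs per turn x rise."""
    bp_per_turn, rise = HELIX[form]
    return bp_per_turn * rise


def rotation_per_residue(form: str) -> float:
    """Twist per base pair in degrees; negative for left-handed helices."""
    bp_per_turn, _ = HELIX[form]
    angle = 360.0 / bp_per_turn
    return -angle if form in LEFT_HANDED else angle


for form in HELIX:
    print(form, round(pitch(form), 1), round(rotation_per_residue(form), 1))
```

For the A and Z forms the computed pitch agrees with the tabulated values to within about 0.1 Å. For the B form the product of 10.5 bp per turn and 3.38 Å gives about 35.5 Å, somewhat more than the tabulated 34.3 Å, which is closer to a helix of about 10 bp per turn with a 3.4 Å rise.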

Z-DNA

When cytosines are methylated by DNA methylase, the equilibrium switches to the Z-form. In eukaryotes, cytosine is methylated in DNA at position 5. In prokaryotes, adenine is methylated at position 6. In Z-DNA, purine and pyrimidine bases alternate. Under high salt concentrations in particular, DNA with alternating purine-pyrimidine sequences tends to form a left-handed double helix in vitro. No direct evidence is available to prove the form of natural Z-DNA within a cell. The transition of B-DNA to Z-DNA can now be identified with single-nucleotide precision. This has provided new strategies for in vivo probing of Z-DNA formation. Z-DNA may be a transient configuration of B-DNA in living cells. It may play a role in the regulation of gene activity, as suggested by Kmiec and Holloman (1994). Z-DNA is commonly believed to relieve torsional strain (supercoiling) while transcription occurs. The potential to form a Z-DNA structure also correlates with regions of active transcription.


The crystal structure determination of d(CGCGCG) by Wang et al. (1979, 1981) revealed a left-handed double helix. A similar helix is formed by d(CGCATGCG). Complementary polynucleotides with alternating purines and pyrimidines, such as poly d(GC)·poly d(GC) or poly d(AC)·poly d(GT), take up the Z-DNA conformation at high salt concentrations. Sasisekharan and Brahmachari (1981) observed a transformation in the solid state such that the diffraction pattern changes from the B to the Z form. Since this transformation takes place in the solid state, the handedness of the duplex in both these forms of DNA should be the same. Evidently, the Z-DNA conformation is most readily assumed by DNA segments with alternating purine and pyrimidine base sequences. This helix has 12 bp per turn with a pitch of 45 Å. It has a deep minor groove and no discernible major groove. The base pairs are flipped 180° relative to those in B-DNA. The structure of Z-DNA is significantly different from the Watson and Crick model because, apart from other factors, it is characterized by left-handed coiling. In Z-DNA the two types of bases, purines and pyrimidines, alternate along the double helix (Wang et al. 1979). Z-DNA is unstable in natural systems because negatively charged phosphate groups lie close to each other across the minor groove (Wang 1986). This phosphate-phosphate ionic repulsion is one of the features that destabilize Z-DNA. Several mechanisms found to promote the formation of Z-DNA are addition of methyl groups, negative supercoiling of DNA and cation binding (Rich 1984). The most convincing evidence for the existence of Z-DNA, provided by Lafer et al. (1985), comes in the form of antibodies that bind specifically to Z-DNA. The antibody studies point to the existence of many regions with the potential to form Z-DNA, dispersed throughout the DNA of many organisms, e.g., in polytene chromosomes of Drosophila melanogaster, in rye chromosomes and in the macronucleus of Stylonychia.
Function of Z-DNA is not clearly understood. Single crystal analysis has defined three basic types of DNA – A-, B-, and Z-DNA (Dickerson et al. 1982). In solution, all the three different forms of DNA are in equilibrium with each other along any particular segment of helix. The equilibrium from one form to other form can be altered either by changing humidity level or by changing concentration of salt solution around the DNA fibers.
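Since Z-DNA is associated with alternating purine-pyrimidine sequences such as d(CGCGCG), a first computational step is often to locate such tracts in a sequence. The following Python sketch does only that. The minimum tract length of six bases and the example sequence are arbitrary assumptions for illustration, and a tract found this way is merely a candidate: whether it actually adopts the Z conformation depends on salt, methylation and supercoiling, as described above.

```python
PURINES = set("AG")


def alternating_tracts(seq: str, min_len: int = 6):
    """Return (start, end, tract) for maximal runs in which purines and
    pyrimidines strictly alternate and that are at least min_len long."""
    seq = seq.upper()
    tracts = []
    start = 0
    for i in range(1, len(seq) + 1):
        # A run ends at the end of the sequence or where two neighbours
        # belong to the same class (both purines or both pyrimidines).
        run_ends = i == len(seq) or (seq[i] in PURINES) == (seq[i - 1] in PURINES)
        if run_ends:
            if i - start >= min_len:
                tracts.append((start, i, seq[start:i]))
            start = i
    return tracts


# Hypothetical test sequence with a CG-alternating core.
print(alternating_tracts("TTTCGCGCGAAA"))
```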

C-DNA

C-DNA is found under even greater dehydration conditions than those observed during the isolation of A- and B-DNA. It has only 9.3 bp per turn and is thus less compact. It is also right-handed. Its helical diameter is 19 Å. Like A-DNA, C-DNA does not have its base pairs lying flat; rather, they are tilted relative to the axis of the helix.

D-DNA

D-DNA is also a right-handed helix. It occurs in helices lacking guanine in their base composition. Ramaswamy et al. (1983) have shown that both right- and left-handed uniform helical models (RU and LU models) could be built to give satisfactory agreement with the fiber diffraction data of poly[d(I-C)] in the D-form. Atomic coordinates of these two models, as well as of a Hoogsteen base-paired 7-fold helical structure, are given (Drew et al. 1980). Clear 004 and 008 reflections were observed in the diffraction patterns of poly[d(I-C)] and poly[d(A-T)]. The available data strongly suggested an 8-fold helical structure for the D-form of DNA.

E-DNA

E-DNA is a right-handed helix. It occurs in helices lacking guanine in their base composition. It has 7 bp per turn. The crystal structures of the DNA hexamers d(GGCGm5CC)2 and d(GGCGBr5CC)2 were reported by Vargason et al. (2000) to adopt a novel extended and eccentric conformation, which the authors call E-DNA. This new conformation has no connection with Struther Arnott's earlier E-DNA, proposed in 1980 from fiber studies, which is a member of the B family with a 48° twist angle (Leslie et al. 1980). Like the previously reported crystal structure of d(CATGGGCCCATG)2, the 'new' E-DNA has some conformational features intermediate between A-DNA and B-DNA, with base pairs perpendicular to the helical axis and displaced off axis.

P-DNA

The Pauling-like structure with exposed bases, currently termed P-DNA (Allemand et al. 1998), is a highly stretched and overwound form of DNA, with a rise per residue along the helix axis of about 5.85 Å and 2.62 units per turn. It had been proposed as a model for the circular single-stranded DNA in the Pf1 filamentous phage. The basic principles and essential structural elements of protein structure elucidated during the early 1950s, viz., the α-helix and β-structure from Pauling's group (Pauling and Corey 1956) and the coiled-coil triple-helical structure for collagen from Ramachandran's laboratory, have remained virtually unchanged and unchallenged even today; only their permutations, combinations and linking regions vary among the more than 300 unique-sequence protein structures currently known. In contrast, a single DNA molecule seems able to adapt itself to its environment by twisting, turning and stretching into completely different conformations.

Depending upon sequence, DNA can assume specific structural forms, namely cruciform DNA, triple helical DNA, curved and bent DNA, flexible DNA, quadruplex DNA, slipped DNA, single-stranded DNA and supercoiled DNA. These DNA types are explained briefly below.

Cruciform DNA

One structural distortion of DNA is the cruciform. Such structural deviations depend on a specific sequence, the length of that sequence, the temperature and the kind of cations present. Cruciform DNA is produced in the presence of inverted repeats such as CGATCTGG–CAGATCG, mirror repeats such as GGTTGGCC–CCGGTTGG and direct repeats such as GGTTGGCC–GGTTGGCC. When such sequences have a length of 10 or more base pairs with a center of symmetry, double-stranded DNA can assume a cruciform, like a hairpin, and single-stranded DNA can assume a stem-loop structure. Such regions may be present in regulatory segments for recognition by specific protein factors.
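The three repeat arrangements listed above can be told apart by comparing the second half with the first: a direct repeat is identical, a mirror repeat is the first half read backwards, and an inverted repeat is the reverse complement of the first half. The Python sketch below applies these definitions. The direct and mirror examples are the ones printed in the text; for the inverted repeat the seven-base arm CGATCTG is used, which is an assumption made here because the two halves printed in the text differ in length. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base and reverse the order."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def classify_repeat(first: str, second: str) -> str:
    """Classify the relationship between two repeat halves.

    Checks are made in the order direct, mirror, inverted; a pair that
    satisfies more than one definition is reported under the first match.
    """
    first, second = first.upper(), second.upper()
    if second == first:
        return "direct"
    if second == first[::-1]:
        return "mirror"
    if second == reverse_complement(first):
        return "inverted"
    return "none"


print(classify_repeat("GGTTGGCC", "GGTTGGCC"))  # direct repeat from the text
print(classify_repeat("GGTTGGCC", "CCGGTTGG"))  # mirror repeat from the text
print(classify_repeat("CGATCTG", "CAGATCG"))    # assumed 7-base inverted repeat
```

Only the inverted arrangement lets each strand pair with itself, which is why inverted repeats are the ones able to extrude hairpin arms.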

Triple Helical DNA

Three strands of DNA that are complementary to each other have a propensity for triple helix formation. Homo- or hetero-polypurine or polypyrimidine tracts can assume triple-strand conformations. Triplex strands can affect transcription, replication and gene expression, and they can even represent specific protein binding sites. Triplex DNA structures can be of either intermolecular or intramolecular form. Hoogsteen (1963) first demonstrated Hoogsteen pairing in a crystalline complex containing alkylated adenine and thymine derivatives. Similar structures containing alkylated adenine and uracil have also been described. Either the 3′ half or the 5′ half of the polypyrimidine strand can be used as the donated strand. The resulting conformers are termed Hy-3 and Hy-5, respectively. The key event in the formation of H-DNA is the formation of the first base triplet (nucleation) in the interior, which creates the tip of the triplex structure. Subsequently, the triplex is propagated outwards towards the ends of the repeat with a minimal number of unpaired nucleotides at any stage. This is accompanied by turning of the donor and acceptor helices in combination, in the same plane but in opposite directions. The donor region produces an unpaired polypurine and polypyrimidine strand. The polypyrimidines are spooled up by the acceptor helix of the repeat, thereby extending the triplex. Because both of these rotations reduce negative supercoiling, two turns are relaxed for every 10-11 pyrimidines that are transferred from Watson-Crick base pairs in the donor duplex to Hoogsteen base pairs in the triplex. The single-stranded polypurines form a stacked loop on one side of the triplex. The rotations continue until one or the other duplex runs out of TC·AG repeats.

Intramolecular triplex helical DNA

These structures form easily if the DNA segment contains mirror repeat symmetry in addition to Pu·Py sequences. Intramolecular triplex DNA may contain repeats of G, GA, GGAA, GGGA or AAAGXN. Third-strand pairing requires protonation of cytosine for pairing with guanine, and therefore a lower pH. Intramolecular triplex DNA can exist in four different isoforms. Supercoiling of DNA favors the intramolecular triplex form. The third strand can be produced during replication by a slippage process, in which the polymerase, after replicating a segment in certain regions, traverses back and replicates the same strand a second time; this can be repeated several times. Homopurine-homopyrimidine DNA sequences have been shown to form triple-strand structures readily under appropriate conditions. These intermolecular triplex structures may act as antisense inhibitors of gene expression and may have a role in recombination. Sklenar and Feigon (1990) designed and synthesized a 28-base DNA oligomer with a sequence that could potentially fold to form a triplex containing both T·A·T and C+·G·C triplets (Figure 3.7). When such a triple helix structure forms, DNA looping is seen. Base pairing within the T·A·T and C+·G·C triplets is also shown.

Figure 3.7 Stable triplex formed from a synthesized 28-bp oligomer containing TAT and C+GC triplets. Also shows C+GC and TAT base pairing

Curved and Bent DNA

DNA by its nature is a very flexible helix, and the binding of proteins, which organize genomic DNA into a very compact structure, makes it nonflexible. The importance of curved DNA is recognized. During transcriptional initiation or activation the upstream regions of DNA have been found to be bent and curved; by curving or bending the DNA, regulatory proteins bound at distant regions are brought near to each other. Such curved DNA is also used during initiation of replication. Curved segments may be located upstream of, downstream of or within the promoter region of a gene. Such DNAs can be used in site-specific integration or recombination, and during DNA repair. Curving of DNA is one of the most efficient methods of compacting genomic DNA, which is often very large.

Expanded DNA

Slipping an extra benzene ring into the bases creates a broader DNA double helix that is similar to, but different from, natural DNA (Leconte and Romesberg 2006). This type of DNA is known as expanded DNA (xDNA) (Figure 3.8). In expanded DNA, a normal base and an expanded base (with an extra benzene ring) pair up according to hydrogen bonding and size complementarity to form a duplex with four possible base pairs. There are four viable partnerships in the expanded duplex: xG-C, G-xC, xA-T and A-xT.


Importantly, it can encode more genetic information, as it provides an unexplored avenue towards expanding the genetic alphabet. In expanded duplex DNA, the so-called major and minor grooves of the xDNA duplex are markedly wider and shallower, respectively. Such xDNA could have various biological and nanotechnological applications.

Flexible DNA

In contrast to curved DNAs, certain sequences like (CTG)n·(CAG)n and (CGG)n·(CCG)n move 20 per cent faster than normal B-DNA in a gel. This unusual mobility depends upon the length of the repeats, the temperature and the percentage of the acrylamide gel. The faster mobility is attributed to a change in the "h" value, i.e., the rotation per residue of these sequences. Depending upon the length of such repeats, this may induce a greater avidity for writhe (supercoiling) and hence faster movement. Such DNA is called flexible DNA.

Figure 3.8 Expanded DNA. Expanded bases with an extra benzene ring are indicated by dotted lines

Quadruplex DNA

Single-stranded guanine-rich DNA sequences of chromosomal telomeres and elsewhere can associate to form stable parallel four-stranded DNA structures termed G-4 DNA, by a process that depends on the particular alkali metal cation present (Sen and Gilbert 1990). Repeats of CGG in a single strand can produce quadruplex forms. In the presence of potassium, sodium and lithium ions, CGG repeats readily form quadruplex structures. Quadruplex DNA moves faster on a gel than other forms of DNA. Guanines are involved in four-stranded structures with Hoogsteen hydrogen bonding. Segments of 2X GCGC tetrads flanked by two repeats of GGGG (G-quartets) produce quadruplex strands. Many triplet repeats form hairpins, and such multi-repeats can fold into quadruplexes.

Telomeric DNA, which is found at the extreme tips of eukaryotic chromosomes, is an excellent example of quadruplex DNA. Telomeric DNA consists of short sequences made of G's and T's that are characteristic of individual species. Such species-specific sequences are repeated hundreds to thousands of times in telomeric DNA. Eukaryotic chromosomal DNA is double-stranded and linear, and runs from one end of the chromosome to the other. If the ends were open structures they would be expected to be susceptible to exonuclease digestion, but this is not the case. The ends are protected by quadruplex structural organization and, furthermore, they are associated with specific telomeric DNA binding proteins, which provide additional stability, protection and constancy.
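Guanine-rich stretches with the potential to fold into a quadruplex are often screened for with a simple pattern: four runs of at least three guanines separated by short loops. The Python sketch below uses that heuristic. Both the pattern (loops of one to seven bases) and the test sequence (four copies of the human telomeric repeat TTAGGG) are assumptions introduced for illustration and are not taken from the text; a match indicates only a candidate, not a demonstrated quadruplex.

```python
import re

# Heuristic motif: four G-runs (>= 3 G each) separated by loops of 1-7 bases.
G4_PATTERN = re.compile(r"(?:G{3,}[ACGT]{1,7}){3}G{3,}")


def candidate_quadruplexes(seq: str):
    """Return (start, end, motif) for each non-overlapping match."""
    return [(m.start(), m.end(), m.group()) for m in G4_PATTERN.finditer(seq.upper())]


telomeric = "TTAGGG" * 4  # assumed example: four telomeric repeats
print(candidate_quadruplexes(telomeric))
```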

Slipped DNA

In slipped DNA, one of the strands contains sequence repeats. Slipped DNA arises during replication by DNA polymerases, where slippage can create a greater length of DNA in one strand and deletions in the other strand. Slipped strands have been identified and isolated. Nine different loci have been identified in humans and they are found to be very unstable. The repeats found are (CTG)n·(CAG)n and (CGG)n·(CCG)n. Single-stranded oligomers of CCG or CGG repeats can form intermolecular or intramolecular duplexes. Such stable slipped segments can expand and become heritable. They can affect gene expression, protein binding, transcriptional initiation and possibly replication. Slipped DNA is found upstream of important regulatory sites.
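Because slipped structures arise in tracts of tandem repeats, locating such tracts and counting their copies is a natural first step in examining a sequence. The following Python sketch finds runs of a chosen repeat unit. The default unit CTG, the minimum of three copies and the example sequence are assumptions chosen for illustration.

```python
import re


def repeat_tracts(seq: str, unit: str = "CTG", min_copies: int = 3):
    """Return (start, copies) for each run of at least min_copies
    uninterrupted tandem copies of unit."""
    pattern = re.compile(f"(?:{re.escape(unit.upper())}){{{min_copies},}}")
    return [
        (m.start(), len(m.group()) // len(unit))
        for m in pattern.finditer(seq.upper())
    ]


# Hypothetical sequence: five CTG copies flanked by unrelated bases.
example = "AA" + "CTG" * 5 + "TT"
print(repeat_tracts(example))
```

Comparing the copy number at the same locus between parent and offspring sequences would show the expansion or contraction that slippage produces.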


Single-Stranded DNA

Single-stranded DNA (ssDNA) molecules have 5′→3′ polarity. When suspended in aqueous solution, ssDNA molecules show random-coil features, because the bases of each nucleotide are linked to the sugars by glycosidic bonds at the C1′ position in such a way that the sugars show C2′-endo and C2′-exo puckering; hence they show the anti configuration. ssDNA strands also exhibit duplex, hairpin or stem-loop structural features wherever sequences are complementary to each other. Most ssDNA genomes, either linear or circular, are packed into viral proteinaceous coats or capsids. In host cells they exhibit supercoiled states in which associated proteins provide stability. Replication of ssDNA proceeds through a double-stranded (ds) replicative form of DNA. Many viruses (e.g., φX174, S13, f1 and M13) have single-stranded DNA as their genome.

Multicopy Single-Stranded DNA

In 1984, multicopy single-stranded DNA (msDNA) was discovered in the soil bacterium Myxococcus xanthus. Myxobacteria have been shown to contain a large number of branched RNA-linked single-stranded DNA (msDNA) molecules (Dhundale et al. 1988). In addition, these authors found that Myxococcus xanthus contains another, smaller msDNA-like molecule, designated mrDNA, consisting of a 65-base single-stranded DNA covalently linked by a 2′,5′-phosphodiester linkage to a 49-base branched RNA. In spite of their different primary sequences, the RNA-linked mrDNA is remarkably similar in secondary structure to msDNA, sharing similar stem-loop folding as well as the unique 2′,5′-phosphodiester linkage. These novel molecules are synthesized by common molecular mechanisms. Varmus (1989) described two important features of msDNA: (1) the DNA is encoded in chromosomal DNA in one or a few copies and (2) 76 ribonucleotides remain linked to its 5′-end after purification (Figure 3.9).

Figure 3.9 Multicopy single-stranded DNA in soil bacteria Myxococcus xanthus (Redrawn from http://en.wikipedia.org/wiki/File:Stem-loop.svg)

Supercoiled DNA

In bacteria and viruses, DNA is often circular. Supercoiling of DNA is an important feature of all chromosomes, where large loops of DNA are found in such a way that each loop represents a circle. Supercoiling occurs when DNA is either underwound (negative supercoiling) or overwound (positive supercoiling). If we break one strand of a covalently closed circular (CCC) DNA and rotate it through 360° around the complementary strand while keeping the other end fixed, a supercoil is produced in the molecule (Figure 3.10). If the free end is rotated in the same direction as the right-handed double helix, a positive supercoil is produced. This leads to over-winding, which can be created in vitro but does not occur in nature. If the free end is rotated in the opposite direction (left-handed), a negative supercoil is produced. The DNA becomes underwound, or even single-stranded in a region, which relieves the torsion. This helps in unwinding of DNA for replication, etc. Supercoiling is often controlled by enzymes and is described by parameters such as linking number, twisting number and writhing number. One supercoil is introduced every time the duplex thread is twisted about its axis. This type of supercoiling places a DNA molecule under torsion, which can be released by a break in one of the two strands. A DNA molecule without supercoiling is said to be relaxed, or to have zero supercoiling.
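The three parameters named above are connected by the standard relation Lk = Tw + Wr: the linking number of a closed circular duplex equals its twist plus its writhe. The degree of supercoiling is commonly expressed as the superhelical density, σ = (Lk − Lk0)/Lk0, where Lk0 is the linking number of the relaxed molecule. The Python sketch below works through these formulas. It assumes 10.5 bp per turn for relaxed B-DNA, and the 5250-bp plasmid with a linking number of 470 is a hypothetical example; neither the formulas nor the example values are given in the text.

```python
BP_PER_TURN = 10.5  # assumed helical repeat of relaxed B-DNA


def relaxed_linking_number(n_bp: float) -> float:
    """Lk0: linking number of the relaxed closed circle."""
    return n_bp / BP_PER_TURN


def writhe(linking_number: float, twist: float) -> float:
    """Wr from Lk = Tw + Wr."""
    return linking_number - twist


def superhelical_density(linking_number: float, n_bp: float) -> float:
    """Sigma = (Lk - Lk0) / Lk0; negative values mean underwinding."""
    lk0 = relaxed_linking_number(n_bp)
    return (linking_number - lk0) / lk0


n_bp, lk = 5250, 470  # hypothetical plasmid, underwound by 30 turns
print(relaxed_linking_number(n_bp))
print(superhelical_density(lk, n_bp))
# If the twist stays at its relaxed value, the deficit appears as writhe.
print(writhe(lk, relaxed_linking_number(n_bp)))
```

A negative σ corresponds to the underwound (negatively supercoiled) state described above, and a nick in one strand lets Lk return to Lk0, which is the relaxed state.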

Figure 3.10 Supercoiling of DNA – positive and negative supercoiling (Redrawn from http://www.web-books.com/MoBio/Free/Ch7D.htm)

STRUCTURE OF RNA

RNA was first isolated from yeast, but the identity of RNA was not clear until Feulgen and Rossenbeck (1924) clearly showed that the dye which stained RNA was different from the one that stained DNA. RNA can be differentiated into two types: genetic RNA and nongenetic RNA. Genetic RNA is found in most of the plant viruses, where RNA acts as genetic material instead of DNA. Nongenetic RNA is present in abundance but does not play any genetic role. A variety of different types of nongenetic RNA are made in a cell through the process of transcription. There are three major types of nongenetic RNA: messenger RNA (mRNA), transfer RNA (tRNA) and ribosomal RNA (rRNA). A number of other types of RNA are present in smaller quantities as well, including small nuclear RNA (snRNA), small nucleolar RNA (snoRNA) and the 4.5S signal recognition particle (SRP) RNA. Novel species of RNA continue to be identified. Nongenetic RNA is further classified into two types: coding RNA and noncoding RNA. Only mRNA is coding RNA, while all other nongenetic RNAs are noncoding. At this point of the discussion, our emphasis is on genetic RNA.

RNA is a close cousin of DNA. RNA is a polymer of ribonucleoside phosphates. Its backbone comprises alternating ribose and phosphate groups. Phosphodiester bonds join successive ribonucleotides. Ribose is a five-carbon sugar that is found in a furanose, or five-membered ring, form in RNA. The phosphate groups link consecutive ribose groups and each bears one negative charge. Each monomer also has a nitrogenous base as a side chain. The four commonly found side chains in RNA are adenine, cytosine, guanine and uracil (Figure 3.11). Several other bases are occasionally found in RNAs, including thymine, pseudouridine, methylated cytosine and methylated guanine. RNA assumes the A-conformation.
RNA is sensitive to base hydrolysis, whereas DNA is not because it lacks the 2'-OH. The 2'-OH in RNA participates in the hydrolysis by forming a 2',3'-cyclic phosphate intermediate. Chemical forces are important for polynucleotide structure and solubility. RNA riboses have an extra OH group at the 2' position, which has a profound effect on the overall conformation of RNA and on its ability to form complex three-dimensional structures. The main effect of the 2'-OH group is on the conformation of the sugar, which is 2'-endo in B-DNA and 3'-endo in RNA. If the ribose adopted the 2'-endo conformation, the 2'-OH would clash with the C8 atom of the preceding nucleotide in the chain (assuming, of course, that the base is in the anti conformation). A consequence of the 3'-endo conformation is that the A-form helix seen in RNA is more compact, as the distance between adjacent phosphates is 5.9 Å instead of the 7.0 Å found in B-DNA. Molecules of RNA are generally single-stranded, linear polymers of ribonucleotides. Double-stranded RNA is also known to exist. Some RNA molecules assume, in part, a double-helical configuration through complementary base pairing.


Figure 3.11 Structure of ribonucleotides joined together forming RNA. RNA is usually single-stranded, although double-stranded RNA is also known

What does RNA do in cells?

RNA serves a multitude of roles in living cells. These roles include serving as a temporary copy of a gene that is used as a template for protein synthesis (mRNA), functioning as adaptor molecules that decode the genetic code (tRNA) and catalyzing the synthesis of proteins (rRNA). There is much evidence implicating small RNA molecules in biological regulation and catalysis. Interestingly, RNA is the only biological polymer that serves both as a catalyst (like proteins) and as information storage (like DNA). For this reason, it has been postulated that RNA, or an RNA-like molecule, was the basis of life early in evolution. Although RNA molecules are linear polymers, they fold back on themselves to make intricate secondary and tertiary structures that are essential for them to perform their biological roles.

Secondary Structure of RNA

By characterizing fragments isolated from a nuclease digest of MS2 RNA, Min Jou et al. (1972) established the entire nucleotide sequence of the coat protein gene. A “flower-like” model was proposed for its secondary structure (Figure 3.12, A-K). The gene makes use of 49 different codons to specify the sequence of the 129-amino-acid coat polypeptide. The predominant element of RNA secondary structure is the hairpin, formed when an RNA strand folds back on itself (Uhlenbeck 1990). UNCG and GNRA tetraloops aid in predicting the secondary structure of RNA.

Three-Dimensional Structure of RNA

Hairpin loops are important structural elements of RNA, helping to define the three-dimensional structure of large RNAs and providing potential nucleation sites for RNA folding (Cheong et al. 1990). The hairpin 5′GGAC(UUCG)GUCC is very stable and common. The sequence C(UUCG)G occurs very often, both in RNA folding and as a protein-binding site. The loop is stabilized by G·U base pairing. RNAs fold into three-dimensional structures that subsequently undergo large, functionally important conformational transitions in response to a variety of cellular signals. RNA structures are believed to encode spatially tuned flexibility that can direct transitions along specific conformational pathways. This hypothesis was examined by Zhang et al. (2007) by visualizing the dynamics between two RNA helices that are linked by a functionally important trinucleotide bulge.
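The stem of the hairpin 5′GGAC(UUCG)GUCC mentioned above can be checked for complementarity with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the check considers Watson-Crick pairs in the stem only, ignoring the non-canonical interactions within the loop itself.

```python
# Watson-Crick partners in RNA (G·U wobble pairs are deliberately not included).
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


def stem_is_paired(five_prime_arm, three_prime_arm):
    """True if the two arms can form a fully Watson-Crick paired stem."""
    return reverse_complement(five_prime_arm) == three_prime_arm


# Hairpin from Cheong et al. (1990): stem arms GGAC and GUCC, loop UUCG.
five_arm, loop, three_arm = "GGAC", "UUCG", "GUCC"
print(stem_is_paired(five_arm, three_arm))  # True: GGAC pairs with GUCC
```

Because the 3′ arm is the reverse complement of the 5′ arm, the strand can fold back on itself and leave the four loop nucleotides unpaired at the tip of the stem.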

FUNDAMENTAL PROPERTIES OF GENETIC MATERIAL

Storage of Information

Genetic material carries from one generation to the next the information that specifies the characteristics of an organism. This information is stored in DNA/RNA in the form of a specific nucleotide sequence. It is this nucleotide sequence that determines the biological function of a DNA segment.

Continuity of Genetic Information

The specific order of nucleotides in DNA is maintained from generation to generation because the genetic material can undergo a process called replication to produce more copies of itself. As shown by Watson and Crick (1953a), DNA is a double-helical structure. Each of the two original strands is the complement of the other. When duplication occurs, the hydrogen bonds between the complementary bases break and the strands replicate as they unwind. Each strand acts as a template for the formation of a new complementary chain. Two pairs of chains thus appear after one cycle of replication, whereas before replication only one pair existed. The mechanism of replication of DNA proposed by Watson and Crick (1953b) offered the important advantage of explaining how DNA molecules could form exact replicas of the old. Each single strand is a template, or mold, for its complement, and a new helix has one old and one newly synthesized strand.
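The template principle described above can be sketched in Python. The sketch is a simplification under stated assumptions: the two strands are written in register (position i pairs with position i), strand polarity and the replication machinery are not modelled, and the function names are hypothetical.

```python
# Watson-Crick partners in DNA.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def partner(strand):
    """Return the strand that pairs with `strand`, base by base."""
    return "".join(DNA_COMPLEMENT[base] for base in strand)


def replicate(duplex):
    """Semi-conservative replication of one duplex into two daughter duplexes.

    Each parental strand is kept and used as the template for a new partner.
    """
    old_top, old_bottom = duplex
    daughter_1 = (old_top, partner(old_top))        # old top + new bottom
    daughter_2 = (partner(old_bottom), old_bottom)  # new top + old bottom
    return daughter_1, daughter_2


parent = ("TACAAAAGCGTA", partner("TACAAAAGCGTA"))
d1, d2 = replicate(parent)
print(d1 == parent and d2 == parent)  # True: both daughters copy the parent
```

Each daughter duplex contains one parental strand and one newly made strand, yet both carry the same sequence as the parent.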


Figure 3.12 A model for the secondary structure of the coat protein gene (the “Flower” model). Arrows indicate splitting points for T1 ribonuclease. Eleven base pairing regions are termed A-K


Mutation

The genetic material is subject to a low rate of alteration, the changes being transmissible to the next generation. A change in the genetic material at a particular locus in an organism is termed a mutation. How a change in the sequence of nucleotide(s) leads to a mutation is shown in Figure 3.13. The term mutation includes point (gene) mutations, which involve a single base change, as well as chromosomal changes.
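The base change illustrated in Figure 3.13 can be located by comparing the original and the mutant chain position by position. The following Python sketch uses the two sequences from that figure; the function name is hypothetical and only substitutions (not insertions or deletions) are handled.

```python
def point_differences(original, mutant):
    """List (position, original_base, mutant_base) for every mismatch.

    Positions are 1-based. Only substitutions are handled, so both
    sequences must have the same length.
    """
    if len(original) != len(mutant):
        raise ValueError("sequences must be of equal length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(original, mutant), start=1)
        if a != b
    ]


original = "TACAAAAGCGTA"  # original DNA chain (Figure 3.13)
mutant = "TACAAAAGGGTA"    # mutant DNA chain (Figure 3.13)
print(point_differences(original, mutant))  # [(9, 'C', 'G')]
```

The single difference, C to G at the ninth position, is an example of a point mutation.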

T A C A A A A G C G T A   Original DNA chain
        ↓ Mutation (C → G)
T A C A A A A G G G T A   Mutant DNA chain

Figure 3.13 Change in sequence of nucleotide(s) leads to mutation

Recombination

DNA breakage and reunion is responsible for genetic recombination in prokaryotes as well as eukaryotes. An endonuclease produces nicks in single strands of DNA, and a DNA ligase then rejoins the broken DNA strands. Eukaryotes mostly show reciprocal recombination products.

Repair

Genetic material has the ability to repair damage done to it by internal as well as external agents. What we observe as mutation is actually unrepaired DNA damage. Cells of all organisms use complex DNA and non-DNA repair mechanisms to decrease the mutation rate.

REFERENCES

Allemand, J.F., D. Bensimon, R. Lavery, and V. Croquette. 1998. Stretched and overwound DNA forms a Pauling-like structure with exposed bases. Proc. Natl. Acad. Sci. USA 95: 14152-7.
Chargaff, E. 1950. Chemical specificity of nucleic acids and mechanism of their enzymatic degradation. Experientia 6: 201-40.
Cheong, C., G. Varani, and I. Tinoco Jr. 1990. Solution structure of an unusually stable RNA hairpin 5′GGAC(UUCG)GUCC. Nature 346: 680-2.
Dhundale, A., M. Inouye, and S. Inouye. 1988. A new species of multicopy single-stranded DNA from Myxococcus xanthus with conserved structural features. J. Biol. Chem. 263: 9055-8.
Dickerson, R.E., H.R. Drew, N.B. Comer, et al. 1982. The anatomy of A-, B- and Z-DNA. Science 216: 475-85.
Drew, H., T. Takano, S. Tanaka, K. Itakura, and R.E. Dickerson. 1980. High salt d(CpGpCpGp) – a left-handed Z-DNA double helix. Nature 286: 567-73.
Feulgen, R., and H. Rossenbeck. 1924. Der mikroskopisch-chemische Nachweis einer Nukleinsäure vom Typus Thymonukleinsäure und die darauf beruhende elektive Färbung eines Nukleinsäure von Zellkernen in mikroskopischen Präparaten. Hoppe-Seyler's Zeitschrift für Physiologische Chemie 135: 203-48.
Franklin, R.E., and R.G. Gosling. 1953. Molecular configuration in sodium thymonucleate. Nature 171: 740-1.
Hoogsteen, K. 1963. The crystal and molecular structure of a hydrogen bonded complex between 1-methylthymine and 9-methyladenine. Acta Cryst. 16: 907-16.
Hopkins, R.C. 1981. Deoxyribonucleic acid structure: a new model. Science 211: 289-91.
Kmeic, E.B., and W.K. Holloman. 1994. DNA strand exchange in the absence of homologous pairing. J. Biol. Chem. 269: 10163-8.
Lafer, E.M., R. Sousa, and A. Rich. 1985. Anti-Z-DNA antibody binding can stabilize Z-DNA in relaxed and linear plasmids under physiological conditions. EMBO J. 4(13B): 3655-60.
Leconte, A.M., and F.E. Romesberg. 2006. A broader take on DNA. Nature 444: 553-55.
Leslie, A.G., S. Arnott, R. Chandrasekaran, and R.L. Ratliff. 1980. Polymorphism of DNA double helices. J. Mol. Biol. 143(1): 49-72.
Levene, P.A., and E.S. London. 1929. On the structure of thymonucleic acid. J. Biol. Chem. 83: 793-802.
Min Jou, W., G. Haegeman, M. Ysebaert, and W. Fiers. 1972. Nucleotide sequence of the gene coding for the bacteriophage MS2 coat protein. Nature 237: 82-8.
Pauling, L., and R.B. Corey. 1956. Specific hydrogen bond formation between pyrimidines and purines in deoxyribonucleic acids. Arch. Biochem. Biophys. 65: 164-81.
Potaman, V.N., and R.R. Sinden. 2005. DNA: Alternative conformations and biology. In: DNA Conformation and Transcription. Pp. 3-17. Ohyama, T. ed. Texas, USA: Landes Bioscience/Eurekah.com.
Ramaswamy, N., M. Bansal, G. Gupta, and V. Sasisekharan. 1983. Structure of D-DNA: 8-fold or 7-fold helix? EMBO J. 2: 1557-60.
Rich, A. 1984. The chemistry and biology of left-handed Z-DNA. Annu. Rev. Biochem. 53: 791-846.
Sasisekharan, V., and S.K. Brahamchari. 1981. B to Z transition of DNA fibre: The question of handedness. Curr. Sci. 50: 10-3.
Sen, D., and W. Gilbert. 1990. A sodium-potassium switch in the formation of four-stranded G-4 DNA. Nature 344: 410-4.
Skelnar, V., and J. Feigon. 1990. Formation of stable triplex from a single DNA strand. Nature 345: 836-8.
Strickberger, M.W. 1968. Genetics. New York: Macmillan Company.
Uhlenbeck, O.C. 1990. Tetraloops and RNA folding. Nature 346: 613-4.
Van de Dande, J.H., N.B. Ramsing, M.W. Germann, et al. 1988. Parallel stranded DNA. Science 241: 551.
Vargason, J.M., B.F. Eichman, and P.S. Ho. 2000. The extended and eccentric E-DNA structure induced by cytosine methylation or bromination. Nat. Struct. Biol. 7(9): 758-61.
Varmus, H.E. 1989. Reverse transcription in bacteria. Cell 56: 721-24.
Wang, A.H.-J. 1986. Major and minor groove. Nature 319: 183-84.
Wang, A.H.-J., G.J. Quigley, F.J. Kolpak, et al. 1979. Molecular structure of the left handed double helical DNA fragment at atomic resolution. Nature 282: 680-6.
Wang, A.H.-J., G.J. Quigley, F.J. Kolpak, et al. 1981. Left-handed double helical DNA: Variations in the backbone conformation. Science 211: 171.
Watson, J.D., and F.H.C. Crick. 1953a. A structure for deoxyribose nucleic acids. Nature 171: 737-8.
Watson, J.D., and F.H.C. Crick. 1953b. Genetic implications of the structure of deoxyribonucleic acid. Nature 171: 964-9.
Watson, J.D., and F.H.C. Crick. 1953c. Molecular structure of nucleic acids. A structure for deoxyribose nucleic acid. Ann. NY Acad. Sci. 758: 13-4.
Watson, J.D., and F.H.C. Crick. 1953d. The structure of DNA. Cold Sp. Harb. Symp. Quant. Biol. 18: 123-31.
Wilkins, M.H.F., A.R. Stokes, and H.R. Wilson. 1953. Molecular structure of desoxypentose nucleic acids. Nature 171: 738-40.
Wing, R., H. Drew, T. Takano, et al. 1980. Crystal structure analysis of a complete turn of B-DNA. Nature 287: 755-8.
Zhang, Q., C. Stelz, C.K. Fisher, and H. Al-Hashimi. 2007. Visualizing spatially correlated dynamics that directs RNA conformational transitions. Nature 450: 1263-7.

PROBLEMS

1. Assume that a single-stranded DNA fragment is 10 nucleotides long. At a given position any one of the four nucleotides (A, G, T or C) can exist. How many different types of fragments are expected to be formed?
2. Suppose all RNA species required in the cell exist in sufficient quantities. Do we still need DNA in the cell for its survival?
3. Various properties of genetic material are discussed in this chapter. Can you add some more properties of genetic material to the list?

4 Extranuclear Genomes

The first idea of the existence of extranuclear genetic factors came from the non-Mendelian inheritance described by Baur (1908) and Correns (1909). All genetic factors that lie outside the nucleus are called extranuclear genetic factors or extranuclear genomes. Extranuclear genomes are present in mitochondria, chloroplasts, kinetoplasts and centrioles.

CHLOROPLAST GENOMES

The chloroplast DNA (cpDNA), unlike mitochondrial DNA (mtDNA), is always circular. Its size varies from 11 to 63 µm, its molecular weight from 22 to 1,500 MDa and its number of base pairs from 1.3 × 10⁵ to 1.5 × 10⁶. The size of the chloroplast genome of some species, in terms of number of base pairs, is given in Table 4.1.

Table 4.1 Size of chloroplast genomes of some species (Compiled from Complete Chloroplast Genome Sequences, http://www.bch.umontreal.ca/ogmp/projects/other/cp_list.html)

Species                                 Size (bp)   Reference(s)
Arabidopsis thaliana                    154,478     Sato et al. (1999)
Daucus carota                           155,911     Ruhlman et al. (2006)
Eucalyptus globulus subsp. globulus     160,286     Steane (2005)
Euglena gracilis                        143,200     Hallick et al. (1993)
Glycine max                             152,218     Saski et al. (2005)
Gossypium hirsutum                      160,301     Lee et al. (2006)
Lemna minor                             165,955     Mardanov et al. (2008)
Lotus japonicus                         150,519     Kato et al. (2000)
Marchantia polymorpha                   121,024     Ohyama et al. (2003)
Nicotiana tabacum                       155,943     Shinozaki et al. (1986)
Oryza sativa indica 93-11               134,496     Yu et al. (2005)
Oryza sativa japonica PA64S             134,551     Yu et al. (2005)
Phaseolus vulgaris 'Negro Jamapa'       150,285     Guo et al. (2007)
Pinus thunbergii                        119,707     Wakasugi et al. (1998)
Populus alba                            156,505     Okumura et al. (2006)
Solanum tuberosum                       155,298     Chung et al. (2006)
Triticum aestivum cv. Chinese Spring    134,545     Ogihara et al. (2000)
Zea mays                                140,384     Maier et al. (1995)
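Genome sizes are quoted above in three different units: contour length, molecular weight and number of base pairs. As a rough aid, the following Python sketch converts base pairs into the other two units. It assumes the B-DNA axial rise of 0.34 nm per base pair and an average mass of about 650 Da per base pair; the latter is an approximate textbook value used here as an assumption, and the function names are hypothetical.

```python
RISE_NM_PER_BP = 0.34    # axial rise per base pair in B-DNA (nm)
AVG_DALTON_PER_BP = 650  # rough average mass of one base pair (assumption)


def contour_length_um(base_pairs):
    """Contour length in micrometres of a duplex of the given size."""
    return base_pairs * RISE_NM_PER_BP / 1000.0


def molecular_weight_mda(base_pairs):
    """Approximate molecular weight in megadaltons."""
    return base_pairs * AVG_DALTON_PER_BP / 1e6


bp = 154_478  # Arabidopsis thaliana cpDNA (Table 4.1)
print(round(contour_length_um(bp), 1))     # 52.5
print(round(molecular_weight_mda(bp), 1))  # 100.4
```

Under these assumptions a chloroplast genome of about 150 kb corresponds to a circle roughly 50 µm in contour length with a molecular weight near 100 MDa.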


Two Japanese laboratories have completely sequenced the chloroplast DNA of a liverwort (Marchantia polymorpha) and of a higher plant, Nicotiana tabacum. The size, gene content and organization of both these genomes are remarkably similar. The genomes differ in the “inverted repeat” regions only. There is no evidence for any abnormal use of the genetic code, as found in bacteria. Therefore, cpDNA seems to obey the “universal rules”.

Plastid DNA, like bacterial and mitochondrial DNA, is organized into protein-DNA complexes called nucleoids. Plastid nucleoids are believed to be associated with the inner envelope in developing plastids and with the thylakoid membranes in mature chloroplasts, but the mechanism of this reorganization is unknown. The cpDNA differs from the nuclear DNA of the same species in nucleotide composition. Plastid DNA has been shown to replicate semi-conservatively in Chlamydomonas. In higher plant species, cpDNA replicates by the rolling circle method.

The chloroplast genomes of plants exhibit a far greater conservation of structure than plant mitochondrial genomes. These genomes are circular, and the size of higher plant cpDNAs is either about 150 kb (for example, spinach) or about 120 kb (for example, pea). There are no histones associated with the cpDNA. The genome of the chloroplasts found in M. polymorpha (a liverwort) contains 121,024 bp in a closed circle. The difference in size can be accounted for by a deletion from the larger genome to generate the smaller. The gene order among all higher plant chloroplast genomes is essentially conserved. Many chloroplast genes encode proteins that are involved in photosynthesis. In total, the genome appears to encode a complete set of rRNA genes (16S-spacer-23S-spacer-5S), tRNA genes (25-45) and 45 protein products, including the large subunit of RuBP carboxylase, thylakoid membrane protein, ATP synthase, cytochrome b oxidase, cytochrome c oxidase I, II and III, ATPase-6, NADH dehydrogenase, RNA polymerase, ribosomal protein genes, ferredoxin, etc. There are some differences between species, but these differences are primarily between higher plants and algae, which also contain chloroplast DNA.

The cpDNA of tobacco is 155,844 bp in length. The organization of the single-copy genes in the chloroplast genomes of M. polymorpha and tobacco is remarkably similar, considering that these two plants are evolutionarily very distant from each other. The major difference between M. polymorpha and tobacco cpDNA is that the inverted repeat region containing the rRNA genes is considerably larger in tobacco. The best estimates of cpDNA gene number are 136 in Marchantia and 150 in tobacco. The organization of the chloroplast genome of M. polymorpha is shown in Figure 4.1. The known genes and ORFs are shown as boxes, except for the small tRNA genes, which are indicated by single lines. The tRNA genes that contain introns, and are thus larger, are denoted by triangles extending from the circular map.

There are some surprising features of the cpDNA:
(1) All identified proteins encoded by cpDNA are parts of protein complexes that have at least one component encoded by nuclear DNA. This may be a regulatory feature.
(2) There is clustering and co-transcription of functionally related genes, as in Escherichia coli.
(3) There are several examples of overlapping genes, a prokaryotic feature.
(4) The gene encoding ribosomal protein S12 has exons located far apart on different strands of the DNA. This suggests the existence of a trans-splicing mechanism in chloroplasts.
(5) A Pribnow box is present in chloroplast genes.
(6) Upstream transcription termination sequences are also seen in chloroplast genes.

Figure 4.1 Organization of the genome of Marchantia polymorpha chloroplast DNA (Redrawn, with permission, from Gardner, E.J., Simmons, M.J., and Snustad, D.P. 2005. Principles of Genetics. Singapore: John Wiley & Sons)

Chloroplast DNA is Maternally Inherited

To demonstrate that a trait is maternally inherited, specific crosses need to be made to generate the required offspring. A number of crosses were made between cultivated tomato (Lycopersicon esculentum) and a number of wild species. The cpDNA is a circular molecule 150 kb in size. Digestion of this molecule with a restriction enzyme produces relatively few fragments. Within the plant cell, chloroplast DNA represents a significant portion (15%) of the DNA. This is a result of the large number of chloroplasts per cell (50) and the large number of cpDNA molecules per chloroplast (150). Chloroplast DNA was obtained from F1 plants of a number of crosses in which L. esculentum was the female parent (Palmer and Zamir 1982). As can be seen in Figure 4.2, in each case the F1 restriction fragment pattern was identical to that of L. esculentum (sample 8). This report provides conclusive evidence that chloroplast DNA is inherited in a maternal manner.
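The figures quoted above (about 50 chloroplasts per cell, about 150 cpDNA molecules per chloroplast and a genome of about 150 kb) can be combined into a quick estimate of how much chloroplast DNA one cell carries. The short Python calculation below is a back-of-the-envelope sketch using only those three numbers; the function name is hypothetical.

```python
def cpdna_copies_per_cell(chloroplasts_per_cell, genomes_per_chloroplast):
    """Number of cpDNA molecules carried by one cell."""
    return chloroplasts_per_cell * genomes_per_chloroplast


copies = cpdna_copies_per_cell(50, 150)
total_bp = copies * 150_000  # each molecule is about 150 kb

print(copies)    # 7500
print(total_bp)  # 1125000000
```

A small genome present in thousands of copies thus adds up to more than 10⁹ bp per cell, which is why cpDNA forms a sizeable fraction of total cellular DNA.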

Figure 4.2 Evidence that chloroplast DNA is inherited in a maternal manner. (Reproduced, with permission, from Palmer, J.D., and D. Zamir. 1982. Proc. Natl. Acad. Sci. USA 79: 5006-10)

MITOCHONDRIAL GENOMES

Mitochondrial DNA

All living cells except bacteria, blue-green algae and mature erythrocytes contain mitochondria. Mitochondria are the centres of aerobic respiration. Mitochondrial DNA (mtDNA) replicates independently of nuclear DNA. The mtDNA varies in size, shape and molecular weight. The size of the mitochondrial genome, expressed as number of base pairs, in some species is given in Table 4.2. The mtDNA can be linear (as in lower eukaryotes) or circular (as in higher eukaryotes), with size ranging from 0.74 to 31.0 µm and molecular weight from 1.49 to 1,600 MDa. Although mitochondria contain their own DNA, they depend considerably on the nucleus for their structure and function.

Table 4.2 Size of mitochondrial genomes of some species (Compiled from Complete Mitochondrial Genome Sequences at http://www.bch.umontreal.ca/ogmp/projects/other/mt_list.html)

Species                     Size (bp)   Reference
Apis mellifera ligustica    16,343      Crozier, R.H., and Y.C. Crozier (1993)
Arabidopsis thaliana        366,924     Marienfeld, J., et al. (1996)
Aspergillus niger           31,103      Juhasz, A., et al. (2005)
Bombyx mori                 15,643      Lee, J.-S., et al. (1999)
Brassica napus              221,853     Handa, H. (2003)
Bufo gargarizans            17,277      Cao, S.Y., et al. (2006)
Caenorhabditis elegans      13,794      Okimoto, R., et al. (1990)
Chlamydomonas reinhardtii   15,758      Gray, M.W. (1993)
Danio rerio                 16,596      Broughton et al. (2001)
Drosophila melanogaster     19,517      Lewis, D.L., et al. (1995)
Homo sapiens                16,569      Andrews, R.M., et al. (1999)
Mus musculus                16,301      Mathews, C.E., et al. (2004)
Nicotiana tabacum           430,597     Sugiyama, Y., et al. (2005)
Oryza sativa (japonica)     490,520*    Notsu, Y., et al. (2002)
Plasmodium falciparum       5,949**     Joy, D.A., et al. (2003)
Rattus norvegicus           16,300      Gadaleta, G., et al. (1989)
Saccharomyces cerevisiae    86,214      Wei, W., et al. (2007)
Tetrahymena thermophila     47,577      Brunk, C.F., et al. (2001)
Xenopus laevis              17,553      Roe, B.A., et al. (1985)
Zea mays                    569,630     Clifton, S.W., et al. (2004)

*The largest mitochondrial genome sequenced; **The smallest mitochondrial genome sequenced.

Mitochondria of man and yeast have been studied in comparatively greater detail. Anderson et al. (1981) gave the first complete sequence of human mitochondrial DNA. The 16,569 bp of human mtDNA carry almost as many genes as the 70,000 bp of yeast mitochondrial DNA. The genes for the 12S and 16S rRNAs, 22 tRNAs, cytochrome c oxidase subunits I, II and III, ATPase subunit 6 and eight other protein-coding genes in human mitochondria were located by Anderson et al. (1981). The genes have no or only a few noncoding bases between them, and in many cases the termination codons are not coded in the DNA but are created post-transcriptionally by polyadenylation of the mRNAs. Through these elegant experiments it was demonstrated that the genetic code in human mitochondrial DNA is not universal. Subsequent studies showed that exceptions also exist in the mitochondrial DNA of Drosophila melanogaster, Aspergillus, Neurospora and maize.

Mitochondria have their own DNA and ribosomes, and can make many of their own proteins. The DNA is circular and lies in the matrix in structures called “nucleoids”. Each nucleoid may contain 4-5 copies of the mitochondrial DNA. The size of mtDNA ranges from 16.5 to >1,000 kbp. A single cell may contain 1,000 mitochondria and thus more than 5,000 mtDNA molecules. mtDNA is not associated with histones and has no introns or repetitive sequences. In mammals, 99.99 per cent of mtDNA is inherited from the mother. Human mtDNA encodes a number of mitochondrial proteins. Mutations in mammalian mtDNA do cause disease, because the sequence is so short and its information content so dense. Mouse and cattle mitochondrial DNAs are 15,275 and 16,338 nucleotide pairs in length, respectively. Human, mouse and cattle mtDNA exhibit the same basic organization of genetic information. Mouse liver mtDNA was found to be circular (Sinclair and Stevens 1966).

In most multicellular organisms the mtDNA is organized as a circular, covalently closed, double-stranded DNA, but in many unicellular organisms (e.g., the ciliate Tetrahymena or the green alga Chlamydomonas reinhardtii) and, in rare cases, also in multicellular organisms (e.g., some species of Cnidaria) the mtDNA is linear. Most of these linear mtDNAs possess telomerase-independent telomeres (i.e., the ends of the linear DNA) with different modes of replication, which have made them interesting objects of research, as many of these unicellular organisms with linear mtDNA are known pathogens (Nosek et al. 1998).

All the basic functions of the central dogma of molecular biology are found in organelles. These functions include DNA replication, RNA transcription and protein translation. Thus, certain gene products result from the expression of the organelle DNA. The number of protein products from mitochondrial transcription is limited. One set of genes that are expressed are those for the rRNAs and tRNAs required for translation. Five known gene products are produced from the mammalian mitochondrial genome. These include subunits I, II and III of cytochrome oxidase, the apoprotein of cytochrome b and subunit 6 of the mitochondrial ATPase. In addition to these gene products, six open reading frames (ORFs) have been identified. ORFs are sequences bracketed by translation start and stop codons that could encode a protein product, although the actual function of the product is not known. Expression of organelle genes has a unique effect on the inheritance of certain traits.

Human and yeast mitochondria are the best-studied organelles at the molecular level. The organization of the mitochondrial genomes of human and yeast, as given by Borst and Grivell (1981), is shown in Figure 4.3. Animal mitochondria have small genomes, which have been completely sequenced in man, mouse and cow, the size being approximately 16.5 kb. Mitochondrial DNA is slightly larger in Drosophila and frog. The total amount of mtDNA is less than 1 per cent of the nuclear genome. In the yeast Saccharomyces cerevisiae, the mitochondrial DNA is much larger; the exact size varies from strain to strain but is approximately 84 kb. There are approximately 22 mitochondria per cell and 4 genomes per mitochondrion. In a growing cell, mtDNA can account for as much as 18 per cent of the DNA. Plants show a still wider range of mtDNA size, the minimum length being approximately 100 kb. The additional DNA may be noncoding. Within these circular DNA molecules there are short homologous sequences; recombination between them can give rise to smaller circular DNA molecules that co-exist with their master genome, which explains the complexity of the plant mitochondrial genome.

Both mitochondria and chloroplasts are concerned with energy conversion. Although they synthesize some of their own proteins, most of their proteins are imported from the surrounding cytoplasm. In Trypanosoma brucei mitochondrial DNA there are no genes for tRNA. It is likely that the tRNAs are imported.
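The statement earlier in this section that the genetic code of human mitochondrial DNA is not universal can be made concrete with the four codons whose meaning differs between the standard code and the vertebrate mitochondrial code. The Python sketch below lists only those four codons, so it is not a complete translation table, and the names used are hypothetical.

```python
# Meanings, in the standard code, of the four codons that are read
# differently in vertebrate mitochondria.
STANDARD = {"UGA": "Stop", "AUA": "Ile", "AGA": "Arg", "AGG": "Arg"}

# Meanings of the same codons in the vertebrate mitochondrial code.
VERTEBRATE_MITOCHONDRIAL = {"UGA": "Trp", "AUA": "Met", "AGA": "Stop", "AGG": "Stop"}


def decode(codon, mitochondrial=False):
    """Look up one of the four deviant codons in the chosen code."""
    table = VERTEBRATE_MITOCHONDRIAL if mitochondrial else STANDARD
    return table[codon]


for codon in STANDARD:
    print(codon, decode(codon), decode(codon, mitochondrial=True))
```

A message containing UGA would therefore be terminated by cytoplasmic ribosomes but read through, with tryptophan inserted, by mitochondrial ribosomes.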

Figure 4.3 Organization of yeast and human mitochondrial genomes. A. The mitochondrial genome of Saccharomyces cerevisiae has both interrupted and uninterrupted protein-coding genes. B. Human mitochondrial DNA has 22 tRNA genes, 2 rRNA genes, and 13 protein-coding regions. Fourteen of the 15 protein-coding or rRNA-coding regions are transcribed in the same direction. Fourteen of the tRNA genes are expressed in the clockwise direction and 8 are read counterclockwise

Mitochondrial DNA is Maternally Inherited

Experiments designed to show that mitochondrial DNA is maternally inherited were analogous to those which demonstrated that chloroplast DNA is maternally inherited (Gyllenstein et al. 1985). The mouse is the experimental organism used to demonstrate this principle here. Reciprocal crosses were made between Mus domesticus, the common mouse, and the wild species M. spretus. F1 progeny were then backcrossed with one of the two species serving as females. The restriction enzyme HincII is diagnostic for the mitochondrial DNA of the two species. M. domesticus mtDNA contains five restriction sites for the enzyme that generates four fragments whereas M. spretus mtDNA contains eight restriction sites which generate seven fragments. Only two of the fragments are of the same size and presumably contain the same sequence information.

Figure 4.4 depicts the results of the backcross experiments by Gyllenstein et al. (1985). Seven mtDNA samples were fragmented with HincII. Fragments were separated electrophoretically. Size is indicated in kb. Lanes 2-4 represent mtDNA from the spretus maternal lineage after eight generations of crossing to domesticus males. Lanes 6 and 7 represent mtDNA from the domesticus maternal lineage after six generations of crossing to spretus males. The female contributes the mitochondrial genome. These experiments have been repeated in other species, such as human, and the female has again been shown to contribute the mitochondrial genome.
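How a diagnostic restriction pattern arises can be sketched in Python. HincII recognizes the sequence GTYRAC (Y = C or T, R = A or G) and cuts in the middle of the site. The sequence used below is an invented toy circle, not mouse mtDNA, and the function names are hypothetical; sites that span the arbitrary start point of the circle are not searched for in this simplified version.

```python
import re

# HincII recognition site GTYRAC; the enzyme cuts between Y and R (GTY^RAC).
HINCII_SITE = re.compile(r"GT[CT][AG]AC")


def cut_positions(seq):
    """0-based index of the first base after each HincII cut."""
    return [match.start() + 3 for match in HINCII_SITE.finditer(seq)]


def circular_fragments(seq):
    """Fragment lengths after complete digestion of a circular molecule."""
    cuts = cut_positions(seq)
    if not cuts:
        return [len(seq)]  # an uncut circle stays in one piece
    lengths = [cuts[i + 1] - cuts[i] for i in range(len(cuts) - 1)]
    lengths.append(len(seq) - cuts[-1] + cuts[0])  # piece spanning the start
    return sorted(lengths, reverse=True)


# Toy 35-bp circle carrying three HincII sites (GTTAAC, GTCGAC, GTTGAC).
toy_circle = "AAAA" + "GTTAAC" + "CCCCCCCC" + "GTCGAC" + "TT" + "GTTGAC" + "GGG"
print(circular_fragments(toy_circle))  # [14, 13, 8]
```

In this sketch a circle cut at n distinct sites yields n fragments. The number of bands seen on a gel can be lower when fragments co-migrate or are too small to be resolved, and two genomes that differ in the position of even one site give different band patterns, which is what makes the enzyme diagnostic.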

Figure 4.4 Evidence that mitochondrial DNA can be maternally inherited.

Analysis of Mitochondrial Genome

High copy number and random segregation confound genetic analysis of the mitochondrial genome. Xu et al. (2008) developed an efficient selection for heritable mitochondrial genome mutations in Drosophila. Targeting a restriction enzyme to mitochondria in the germline compromised fertility, but escaper progeny carried homoplasmic mtDNA mutations lacking the cleavage site. Thus germline expression of mitochondrial restriction enzymes creates powerful selection and has allowed direct isolation of mitochondrial mutants in a metazoan.

Mitochondrial RNA

RNA plasmids have been discovered in corn mitochondria. Finnegan and Brown (1986) discovered independently replicating single- and double-stranded RNA species in S-type maize mitochondria. Four species of RNA plasmids are known. Two are relatively short (900 and 850 nucleotides), and two longer species (2,850 and 2,150 nucleotides) are present at about one-tenth of the concentration of the short ones. Their origin and function in mitochondria are not yet known.

Mitochondrial Diseases

In the recent past, the study of mtDNA has attracted many researchers, and it has been shown that mtDNA can mutate or be damaged and so cause disease. Some of the diseases due to different types of mutations in the mtDNA, according to Taylor and Turnbull (2005), are given in Table 4.3. Diseases caused by mtDNA are also inherited, but not in the same pattern as diseases caused by nuclear DNA. Mitochondrial DNA mutations are maternally inherited. In a cell that contains both mutated and normal mitochondria, random partitioning at successive divisions will ultimately tend to produce cells with only mutated mitochondria and cells with only normal mitochondria. Much about mitochondrial DNA remains to be uncovered; if enough attention is paid to this area, human suffering could be lessened considerably and, more interestingly, the average and maximum life span could be increased significantly. Now that researchers have linked several diseases to mutations in the mitochondrial genome, the role of mtDNA in other diseases is getting attention (Palca 1990). Some sites of disease-causing mutations, as reported by Wallace (1997), are indicated in Figure 4.5.
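The tendency of a mixed (heteroplasmic) population of mitochondrial genomes to drift towards an all-mutant or all-normal state can be illustrated with a small simulation in Python. The model is a deliberately crude assumption made for illustration: every genome is copied once per division, the daughter cell that is followed receives a random half of the doubled pool, and selection is ignored. The function name and the numbers chosen are hypothetical.

```python
import random


def divisions_to_homoplasmy(n_genomes, n_mutant, rng, max_divisions=100_000):
    """Follow one cell lineage until its mtDNA pool is all mutant or all normal.

    Returns (final number of mutant genomes, number of divisions simulated).
    """
    divisions = 0
    while 0 < n_mutant < n_genomes and divisions < max_divisions:
        # Every genome is copied once, then the daughter cell that is
        # followed receives a random half of the doubled pool.
        n_normal = n_genomes - n_mutant
        doubled_pool = [True] * (2 * n_mutant) + [False] * (2 * n_normal)
        n_mutant = sum(rng.sample(doubled_pool, n_genomes))
        divisions += 1
    return n_mutant, divisions


rng = random.Random(2015)  # fixed seed so that the run is repeatable
final_mutant, divisions = divisions_to_homoplasmy(20, 10, rng)
print(final_mutant, divisions)  # final_mutant ends at 0 or 20 unless the cap is hit
```

Starting from a half-mutant pool, repeated random partitioning alone is enough to drive a lineage to one of the two pure states; which state is reached differs from lineage to lineage.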

Table 4.3 Some of the diseases due to mutations in mitochondrial DNA

Defect in mitochondrial DNA: Deletions; Multiple deletions; Duplications; Point mutations; Depletion of mitochondrial DNA; Undefined damage; Individual mutations; Individual mutations

Disease*: Kearns-Sayre syndrome; Chronic progressive external ophthalmoplegia; Diabetes mellitus; Heteroplasmy; Imbalance of normal products; Leber’s hereditary optic neuropathy; Myoclonic epilepsy and ragged red fibres; Mitochondrial encephalomyopathy, lactic acidosis and stroke-like episodes; Infantile cytochrome oxidase deficiency; Abnormality of chaperone HSP-60; Aging; Alzheimer’s disease; Dystonia; Leigh’s syndrome; Mitochondrial myopathy; Neurogenic muscle weakness, ataxia and retinitis pigmentosa (NARP); Pearson’s syndrome; Sensorineural hearing loss; Exercise intolerance

*Certain of these conditions can also be caused by nuclear mutations or other processes that hinder mitochondrial function.

Figure 4.5 Some sites of disease-causing mutations in human mitochondrial DNA

KINETOPLAST DNA

Kinetoplasts are modified mitochondria located near the base of flagella. Each protozoan contains a single kinetoplast, which may be larger than the nucleus. The kinetoplast DNA consists of a large number of interlocked "minicircles". They are comparable with small bacterial plasmids. The kinetoplast DNA (kDNA) of trypanosomes and other parasitic members of the order Kinetoplastida is organized as a complex network containing thousands of catenated circular DNA molecules. Hajduk et al. (1986) found that the kDNA of a free-living kinetoplastid, Bodo caudatus, exists as a noncatenated structure. The kDNA of B. caudatus represents about 40 per cent of the total cellular DNA, and the major component of this DNA exists as large circles of 10 and 12 kb. These circles are analogous to trypanosome kDNA minicircles despite their large size and noncatenated form. The kDNA of B. caudatus also contains a minor component of 19 kb which is transcribed. The 19-kb molecules are probably analogous to the maxicircles of trypanosomes. The properties of the B. caudatus kDNA suggest that the catenated network structure of trypanosome kDNA is not required for maxicircle segregation during kinetoplast division or for the expression of the maxicircle genome.

CENTRIOLE DNA
The presence of DNA in centrioles is still disputed, with different experiments giving different results. Centrioles, however, seem to contain double-stranded DNA. The temporal relationship between cell cycle events and centriole duplication was investigated by electron microscopy in L cells synchronized by mechanical selection of mitotic cells (Rattner and Phillips 1973). Like DNA, centrioles need to duplicate only once per cell cycle. Rogers et al. (2009) uncovered a long-sought mechanism that limits centriole copying, showing that it depends on the timely destruction of a protein that spurs the organelles' replication. Centrioles start duplicating during G1 or S phase. What prevents the organelles from copying themselves again and again had puzzled researchers for more than a decade. The process could be analogous to the mechanism for controlling DNA replication. There, a licensing factor prepares the DNA for duplication. During DNA synthesis, the factor is tagged with ubiquitin molecules that prompt its destruction, thus preventing another round of copying (Rogers et al. 2009). As a result, the organelles are duplicated only once per cycle. Tumor cells often bypass the limit on centriole duplication, and the work suggests that drugs that restrict the organelles' replication might hold promise as cancer treatments (Leslie 2009).

CYANELLE DNA
Cyanelles are photosynthetically active organelles which resemble cyanobacteria in many morphological and physiological features. Cyanelles are found in a few eukaryotes such as Cyanophora paradoxa. Cyanelle DNA resembles both chloroplast DNA and cyanobacterial DNA. Cyanelles may represent a bridge between cyanobacteria and chloroplasts. The DNA of cyanelles, which are described as endosymbiotic cyanobacteria from C. paradoxa (strain LB555UTEX), is equivalent to 127 kbp (Bohnert and Löffelhardt 1982). It is characterized by two inverted repeat segments, 10 kbp in size, which are separated from each other by long single-copy DNA segments of unequal size. This organization of the chromosome is also found in the chloroplast DNA of most higher plants and some green algae. The cyanelle DNA exists as two forms of circular molecule which differ only in the orientation of the two single-copy DNA segments relative to each other. This is probably due to intramolecular recombination within the inverted repeat segments. Michalowski et al. (1991) reported the nucleotide sequence of gene nadA and its location on the cyanelle (= plastid) DNA of the unicellular alga C. paradoxa. The gene is located on the circular cyanelle chromosome. It is flanked by gene apcD, and it is close to a recently identified S10-spc ribosomal protein gene cluster. The 1,305-nucleotide sequence contained a single open reading frame specifying 329 codons.


PROMISCUOUS DNA
Promiscuous DNA is defined as a nucleotide sequence which occurs in more than one of the three membrane-bound organellar genetic systems of eukaryotic cells and should be distinguished from a wide range of prokaryotic cells (Ellis 1982). The transfer of such DNA is known as intracellular promiscuity, and it diminishes the separate identity of each genome. Various examples clearly demonstrate that movement of DNA between the various genomes has occurred. The two organelles that arose were the mitochondria, found in all eukaryotic cells, and the chloroplasts, found in plants and algae. Some of the genetic information of the prokaryotic cell was transferred to the nucleus of the eukaryotic cell. Chloroplast and mitochondrial sequences have been found in the nucleus of plant cells. Furthermore, chloroplast sequences have been found in the mitochondria of plant cells. This is clear evidence that genetic control of certain biochemical functions was relinquished by (or taken from) the progenitor cells. Figure 4.6 shows the flow of genetic information from organelles in a plant cell. Before invading the eukaryotic cell, chloroplasts and mitochondria were free-living primitive prokaryotes. In due course of time these entrapped organisms lost their independence by losing genetic information to the nuclear genome. The movement of DNA between mitochondria, chloroplasts and nucleus has been observed through homology studies in yeast, mungbean, spinach, watermelon, cucumber, muskmelon, maize and peas.

Figure 4.6 The flow of genetic information from organelles in a plant cell (Redrawn, with permission, from www.ndsu.edu/.../plsc431/maternal/maternal3.htm. Copyright © 1997 Phillip McClean)

Buy and Riley (1967) reported sequence homology of 60-70 per cent between the nuclear DNA and kinetoplast DNA of Leishmania enriettii and 40 per cent between the nuclear and mitochondrial DNA of mouse liver cells by hybridizing labelled nuclear DNA with unlabelled nuclear, kinetoplast and mitochondrial DNA. Stern and Lonsdale (1982) reported that a 12-kb sequence of maize mitochondrial DNA is homologous to part of the inverted repeat of maize chloroplast DNA. This sequence contains genes for 16S rRNA, tRNA(Val), tRNA(Ile), and part of tRNA(Ala). The 12-kb sequence of maize mitochondrial DNA is altered in male sterile lines, and homology to this maize mitochondrial DNA has also been reported for wheat, mungbean and Chlamydomonas chloroplast DNA. In yeast, nuclear DNA shows homology to mitochondrial DNA from the middle of the var1 gene and the 3′-end of the cytochrome b gene (Farrelly and Butow 1983). From sequence divergence estimates, the nuclear genes are supposed to have diverged from the present mitochondrial genes about 25 mya. A mitochondrial DNA sequence termed senDNA is associated with senescence. The senDNA is also reported to be present in the nuclear genome, and the movement of the senDNA sequence from mitochondria to nucleus is supposed to be a normal part of the senescence process (Wright and Cummings 2004). In sea urchin, the nuclear DNA contains sequences homologous to both the cytochrome oxidase subunit 1 gene and a portion of the 16S rRNA gene of mitochondrial DNA. The cytochrome oxidase gene is present twice, as one complete copy and one truncated version. In Locusta migratoria, detectable homology of nuclear DNA to mitochondrial DNA and mitochondrial rRNA has been reported (Fox 1983). Boogart et al. (1982) reported that the gene coding for ATPase subunit 9 is present in the nucleus in Neurospora crassa, whereas this gene is present in mitochondria in yeast. Hybridization studies have shown that in N. crassa a gene for this protein is also present in mitochondria but is silent.


COUPLED EXPRESSION OF NUCLEAR AND ORGANELLE GENOMES
Transfer of genetic information during the evolution of eukaryotic cells has also required the development of cooperative gene expression systems between the organelle and the nucleus. Cytochrome oxidase is one of the mitochondrial enzymes involved in ATP generation. As stated above, subunits I, II, and III are encoded in the mitochondria. The remaining subunits of this seven-subunit protein are encoded in the nucleus. The best studied example of coupled nuclear DNA/chloroplast DNA gene expression is the protein RUBISCO (ribulose bisphosphate carboxylase oxygenase). This is the enzyme that begins the fixation of atmospheric CO2 into sugar molecules. The enzyme has 16 subunits (8 large and 8 small). The large subunits are encoded by the chloroplast DNA whereas the small subunits are encoded by nuclear DNA. Thus, for proteins that are encoded by genes in two different cellular locations, gene expression has to be coordinated between the two locations for proper protein functioning.

REFERENCES
Anderson, S., A.T. Bankier, B.G. Barrell, et al. 1981. Sequence and organization of human mitochondrial genome. Nature 290: 457-65.
Andrews, R.M., I. Kubacka, P.F. Chinnery, et al. 1999. Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat. Genet. 23(2): 147.
Baldauf, S.L., and J.D. Palmer. 1990. Evolutionary transfer of the chloroplast tufA gene to the nucleus. Nature 344: 262-4.
Baur, E. 1908. Untersuchungen über die Erblichkeitsverhältnisse einer nur in Bastardform lebensfähigen Sippe von Antirrhinum majus. Zeits. Ind. Abst. Vereb. 1: 124-50.
Bogorad, L. 1975. Evolution of organelles and eukaryotic genomes. Science 188: 891-8.
Bohnert, H.J., and W. Löffelhardt. 1982. Cyanelle DNA from Cyanophora paradoxa exists in two forms due to intramolecular recombination. FEBS Lett. 150: 403-6.
Boogart, P.U., J. Samallo, and E. Agsteribbe. 1982. Similar genes for a mitochondrial ATPase subunit in the nuclear and mitochondrial genomes of Neurospora crassa. Nature 298: 187-9.
Broughton, R.E., J.E. Milam, and B.A. Roe. 2001. The complete sequence of the zebrafish (Danio rerio) mitochondrial genome and evolutionary patterns in vertebrate mitochondrial DNA. Genome Res. 11: 1958-67.
Brunk, C.F., A.B. Tran, L.C. Lee, and J. Li. 2001. Complete sequence of the mitochondrial genome of Tetrahymena thermophila and comparison with the mitochondrial genome of Tetrahymena pyriformis. GenBank accession no. AF396436.
Buy, H.G., and F.L. Riley. 1967. Hybridization between the nuclear and kinetoplast DNA's of Leishmania enriettii and between nuclear and mitochondrial DNAs of mouse liver. Proc. Natl. Acad. Sci. USA 57: 790-7.
Cao, S.Y., X.B. Wu, P. Yan, Y.L. Hu, X. Su, and Z.G. Jiang. 2006. Complete nucleotide sequences and gene organization of mitochondrial genome of Bufo gargarizans. Mitochondrion 6(4): 186-93.
Chung, H.-J., D.J. Jung, H.-W. Park, et al. 2006. The complete chloroplast genome sequences of Solanum tuberosum and comparative analysis with Solanaceae species identified the presence of a 241-bp deletion in cultivated potato chloroplast DNA sequence. Pl. Cell Rep. 5: 1369-79.
Clifton, S.W., P. Minx, C.M. Fauron, et al. 2004. Sequence and comparative analysis of the maize NB mitochondrial genome. Plant Physiol. 136(3): 3486-503.
Cohen, S.S. 1973. Mitochondria and chloroplasts revisited. Am. Scient. 61: 437-45.
Correns, C. 1909. Vererbungsversuche mit blass (gelb) grünen und buntblättrigen Sippen bei Mirabilis, Urtica und Lunaria. Zeits. Ind. Abst. Vereb. 1: 291-329.
Crotty, W.J., and M.C. Ledbetter. 1962. Membrane continuities involving chloroplasts and other organelles in plant cells. Science 182: 839-41.
Crozier, R.H., and Y.C. Crozier. 1993. The mitochondrial genome of the honeybee Apis mellifera: complete sequence and genome organization. Genetics 133: 97-117.


Dobzhansky, Th., F.J. Ayala, G.L. Stebbins, and J.W. Valentine. 1977. Evolution. San Francisco: W.H. Freeman and Company.
Ellis, J. 1982. Promiscuous DNA – chloroplast genes inside plant mitochondria. Nature 299: 678-9.
Farrelly, F., and R.A. Butow. 1983. Rearranged mitochondrial genes in the yeast nuclear genome. Nature 301: 296-301.
Finnegan, P.M., and G.G. Brown. 1986. Autonomously replicating RNA in mitochondria of maize plant with S-type cytoplasm. Proc. Natl. Acad. Sci. USA 83: 5175-9.
Fox, T.D. 1983. Mitochondrial genes in the nucleus. Nature 301: 371-2.
Gadaleta, G., G. Pepe, G. De Candia, C. Quagliariello, E. Sbisa, and C. Saccone. 1989. The complete nucleotide sequence of the Rattus norvegicus mitochondrial genome: cryptic signals revealed by comparative analysis between vertebrates. J. Mol. Evol. 28(6): 497-516.
Gardner, E.J., M.J. Simmons, and D.P. Snustad. 2005. Principles of Genetics. Singapore: John Wiley & Sons.
Gray, M.W. 1983. The bacterial ancestry of plastids and mitochondria. Bioscience 33(11): 693-8.
Gray, M.W. 1993. Complete sequence of Chlamydomonas reinhardtii mitochondrial DNA (Unpublished). NCBI Genome Project. Direct Submission.
Guo, X., S. Castillo-Ramírez, V. González, et al. 2007. Rapid evolutionary change of common bean (Phaseolus vulgaris L) plastome, and the genomic diversification of legume chloroplasts. BMC Genomics 8: 288. doi:10.1186/1471-2164-8-288.
Gyllensten, U., D. Wharton, and A.C. Wilson. 1985. Maternal inheritance of mitochondrial DNA during backcrossing of two species of mice. J. Hered. 76: 321-4.
Hajduk, S.L., A.M. Siqueira, and K. Vickerman. 1986. Kinetoplast DNA of Bodo caudatus: a noncatenated structure. Mol. Cell Biol. 6(12): 4372-8.
Hallick, R.B., L. Hong, R.G. Drager, et al. 1993. Complete sequence of Euglena gracilis chloroplast DNA. Nucl. Acids Res. 21(15): 3537-44.
Handa, H. 2003. The complete nucleotide sequence and RNA editing content of the mitochondrial genome of rapeseed (Brassica napus L.): comparative analysis of the mitochondrial genomes of rapeseed and Arabidopsis thaliana. Nucl. Acids Res. 31: 5907-16.
Jacobs, H.T., and B. Grimes. 1986. Complete nucleotide sequences of the nuclear pseudogenes for cytochrome oxidase subunit I and the large mitochondrial ribosomal RNA in the sea urchin Strongylocentrotus purpuratus. J. Mol. Biol. 187: 509-27.
Joy, D.A., X. Feng, J. Mu, et al. 2003. Early origin and recent expansion of Plasmodium falciparum. Science 300: 318-21.
Juhasz, A., Z. Hamari, F. Kevei, and I. Pfeiffer. 2005. Aspergillus niger strain N909 complete mitochondrial genome (Unpublished).
Kato, T., T. Kaneko, S. Sato, Y. Nakamura, and S. Tabata. 2000. Complete structure of the chloroplast genome of a legume, Lotus japonicus. DNA Res. 7(6): 323-30.
Koussevitzky, S., A. Nott, T.C. Mockler, et al. 2007. Signals from chloroplasts converge to regulate nuclear gene expression. Science 316: 715-9.
Lee, J.-S., Y.-S. Kim, S.-H. Sung, J.-S. Hwang, D.-S. Lee, and D.-S. Suh. 1999. Bombyx mori mitochondrion, complete genome. Submitted to the EMBL/GenBank/DDBJ databases.
Lee, S.-B., C. Kaittanis, R.K. Jansen, et al. 2006. Complete chloroplast genome sequencing of Gossypium hirsutum: organization and phylogenetic relationships to other angiosperms. BMC Genomics 7: 61. doi:10.1186/1471-2164-7-61.
Leslie, M. 2009. Birth control for centrioles. J. Cell Biol. 184: 186.
Lewis, D.L., C.L. Farr, and L.S. Kaguni. 1995. Drosophila melanogaster mitochondrial DNA: completion of the nucleotide sequence and evolutionary comparisons. Insect Mol. Biol. 4(4): 263-78.
Maier, R.M., K. Neckermann, G.L. Igloi, and H. Kössel. 1995. Complete sequence of the maize chloroplast genome: gene content, hotspots of divergence and fine tuning of genetic information by transcript editing. J. Mol. Biol. 251: 614-28.
Mardanov, A.V., N.V. Ravin, B.B. Kuznetsov, et al. 2008. Complete sequence of the duckweed (Lemna minor) chloroplast genome: structural organization and phylogenetic relationships to other angiosperms. J. Mol. Evol. 66: 555-64.


Marienfeld, J., M. Unseld, P. Brandt, and A. Brennicke. 1996. Genomic recombination of the mitochondrial atp6 gene in Arabidopsis thaliana at the protein processing site creates two different presequences. DNA Res. 3(5): 287-90.
Michalowski, C.B., R. Flachmann, W. Löffelhardt, and H.J. Bohnert. 1991. Gene nadA encoding quinolinate synthetase, is located on the cyanelle DNA from Cyanophora paradoxa. Plant Physiol. 95: 329-30.
Nosek, J., L. Tomáska, H. Fukuhara, Y. Suyama, and L. Kovác. 1998. Linear mitochondrial genomes: 30 years down the line. Trends Genet. 14(5): 184-8.
Notsu, Y., S. Masood, T. Nishikawa, et al. 2002. The complete sequence of the rice (Oryza sativa L.) mitochondrial genome: frequent DNA sequence acquisition and loss during the evolution of flowering plants. Mol. Genet. Genomics 268(4): 434-45.
Ogihara, Y., K. Isono, T. Kojima, et al. 2000. Chinese Spring wheat (Triticum aestivum L.) chloroplast genome: Complete sequence and contig clones. Pl. Mol. Biol. Rep. 18: 243-53.
Ohyama, K., H. Fukuzawa, T. Kohchi, et al. 2003. Chloroplast gene organization deduced from complete sequence of liverwort Marchantia polymorpha chloroplast DNA. Nature 322: 716-21.
Okimoto, R., J.L. Macfarlane, and D.R. Wolstenholme. 1990. Evidence for the frequent use of TTG as the translation initiation codon of mitochondrial protein genes in the nematodes, Ascaris suum and Caenorhabditis elegans. Nucl. Acids Res. 18: 6113-8.
Okumura, S., M. Sawada, Y.W. Park, et al. 2006. Transformation of poplar (Populus alba) plastids and expression of foreign proteins in tree chloroplasts. Transgenic Res. 1: 637-46.
Palca, J. 1990. The other human genome. Science 249: 1104-5.
Palmer, J.D., and D. Zamir. 1982. Chloroplast DNA evolution and phylogenetic relationships in Lycopersicon. Proc. Natl. Acad. Sci. USA 79: 5006-10.
Raff, R.A., and H.R. Mahler. 1972. The nonsymbiotic origin of mitochondria. Science 177: 575-82.
Rattner, J.B., and S.G. Phillips. 1973. Independence of centriole formation and DNA synthesis. J. Cell Biol. 57: 359-72.
Roe, B.A., D.P. Ma, R.K. Wilson, and J.F. Wong. 1985. The complete nucleotide sequence of the Xenopus laevis mitochondrial genome. J. Biol. Chem. 260: 9759-74.
Rogers, G.C., N.M. Rusan, D.M. Roberts, M. Peifer, and S.L. Rogers. 2009. The SCFSlimb ubiquitin ligase regulates Plk4/Sak levels to block centriole reduplication. J. Cell Biol. 184: 225-39.
Ruhlman, T., S.-B. Lee, R.K. Jansen, et al. 2006. Complete plastid genome sequence of Daucus carota: Implications for biotechnology and phylogeny of angiosperms. BMC Genom. 7: 222. doi:10.1186/1471-2164-7-222.
Saski, C., S.B. Lee, H. Daniell, et al. 2005. Complete chloroplast genome sequence of Glycine max and comparative analyses with other legume genomes. Plant Mol. Biol. 59: 309-22.
Sato, S., Y. Nakamura, T. Kaneko, E. Asamizu, and S. Tabata. 1999. Complete structure of the chloroplast genome of Arabidopsis thaliana. DNA Res. 6: 283-90.
Shinozaki, K., M. Ohme, M. Tanaka, et al. 1986. The complete nucleotide sequence of the tobacco chloroplast genome: its gene organization and expression. EMBO J. 5: 2043-9.
Sinclair, J.H., and B.J. Stevens. 1966. Circular DNA filament from mouse mitochondria. Proc. Natl. Acad. Sci. USA 56: 508-14.
Steane, D.A. 2005. Complete nucleotide sequence of the chloroplast genome from the Tasmanian Blue Gum, Eucalyptus globulus (Myrtaceae). DNA Res. 12(3): 215-20.
Stern, D.B., and D.M. Lonsdale. 1982. Mitochondrial and chloroplast genomes of maize have a 12-kilobase DNA sequence in common. Nature 299: 698-702.
Sugiyama, Y., Y. Watase, M. Nagase, et al. 2005. The complete nucleotide sequence and multipartite organization of the tobacco mitochondrial genome: comparative analysis of mitochondrial genomes in higher plants. Mol. Genet. Genom. 272(6): 603-15.
Taylor, R.W., and D.M. Turnbull. 2005. Mitochondrial DNA mutations in human disease. Nat. Rev. Genet. 6(5): 389-402.
Wakasugi, T., J. Tsudzuki, S. Ito, K. Nakashima, T. Tsudzuki, and M. Sugiura. 1998. Loss of all ndh genes as determined by sequencing the entire chloroplast genome of the black pine Pinus thunbergii. Proc. Natl. Acad. Sci. USA 91: 9794-8.
Wallace, D.C. 1997. Mitochondrial DNA and diseases. Scient. Am. 277: 40-7.


Wei, W., J.H. McCusker, R.W. Hyman, et al. 2007. Genome sequencing and comparative analysis of Saccharomyces cerevisiae strain YJM789. Proc. Natl. Acad. Sci. USA 104: 12825-30.
Wright, R.M., and D.J. Cummings. 2004. Transcription of a mitochondrial plasmid during senescence in Podospora anserina. Curr. Genet. 7: 457-64.
Xu, H., S.Z. DeLuca, and P.H. O'Farrell. 2008. DNA events manipulating the metazoan mitochondrial genome with targeted restriction enzymes. Science 321: 575-7.
Yu, J., S. Hu, J. Wang, et al. 2005. The genomes of Oryza sativa: a history of duplications. PLoS Biol. 3(2): e38. Epub Feb 1.
Zhang, D.-P. 2007. Signaling to the nucleus with a loaded GUN. Science 316: 700-1.

PROBLEMS
1. When we talk about genetic material in eukaryotes, do we mean only nuclear DNA or do we also include DNA present in the cell organelles? Give reasons for your answer.
2. What is the difference in the mode of transmission of DNA present in the nucleus, mitochondria and chloroplasts from parents to offspring?
3. Since mitochondria and chloroplasts have their own DNA, why don't these cell organelles perpetuate outside the cell?
4. For genetic improvement of animals and plants we normally work on nuclear DNA. Do you think we can also bring about genetic improvement by working on mitochondrial and chloroplast DNA?

5 Organization of the Genetic Material
Organization of the genetic material refers to the arrangement of the genetic material (deoxyribonucleic acid or ribonucleic acid) in the cell. A genome is defined as the complete set of chromosomes inherited from a single parent, or the complete DNA component of an individual; the definition often excludes organelles. The total genetic information of an organism is organized as double-stranded DNA, except in viruses, which may have single-stranded DNA, double-stranded DNA, single-stranded RNA or double-stranded RNA genomes. In many viruses and prokaryotes, the genome is a single linear or circular molecule. In eukaryotes, the nuclear genome consists of linear chromosomes (usually as a diploid set), and the mitochondrial (in animals and plants) and chloroplast (in plants) genomes are small circular DNA molecules. In general, genome size increases with the organism's complexity. The DNA content, in kilobase pairs (kbp), of some organisms commonly used in molecular genetic studies is given in Table 5.1.

DNA KINETICS
Molecular Weight of DNA
Relative molecular weights of the four deoxyribonucleotides present in DNA are: 2′-deoxyadenosine 5′-monophosphate (C10H14N5O6P) = 331.2213, 2′-deoxythymidine 5′-monophosphate (C10H15N2O8P) = 322.2079, 2′-deoxyguanosine 5′-monophosphate (C10H14N5O7P) = 347.2207, and 2′-deoxycytidine 5′-monophosphate (C9H14N3O7P) = 307.1966 (Doležel et al. 2003). Relative weights of nucleotide pairs are: AT = 653.4 and GC = 654.4. If the ratio of AT to GC pairs is assumed to be 1:1, the mean relative weight of one nucleotide pair is 653.9. The relative molecular weight may be converted to an absolute value by multiplying it by the atomic mass unit (1 u), which equals one-twelfth of the mass of 12C, i.e., 1.660539 × 10⁻²⁷ kg. Consequently, the mean weight of one nucleotide pair would be 1.0855 × 10⁻⁹ pg, and 1 pg of DNA would represent 0.921 × 10⁹ base pairs (bp). The formulas for converting the number of nucleotide pairs (or base pairs) to picograms of DNA and vice versa are:
genome size (bp) = (0.921 × 10⁹) × DNA content (pg)
DNA content (pg) = genome size (bp) / (0.921 × 10⁹)
1 pg = 921 Mb
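The conversion above can be sketched in a few lines of Python. This is only an illustration: the constant 0.921 × 10⁹ bp per pg and the mean pair weight 653.9 are taken from the text as given, and the function names are hypothetical.

```python
# Sketch of the pg <-> bp conversion, using the constants quoted in the text.
AMU_KG = 1.660539e-27      # atomic mass unit in kg
MEAN_PAIR_WEIGHT = 653.9   # mean relative weight of one nucleotide pair (text value)
BP_PER_PG = 0.921e9        # base pairs per picogram (text value)

def pair_mass_pg(relative_weight=MEAN_PAIR_WEIGHT):
    """Absolute mass of one nucleotide pair in picograms (1 kg = 1e15 pg)."""
    return relative_weight * AMU_KG * 1e15

def pg_to_bp(dna_pg):
    """Genome size in bp for a DNA content given in pg."""
    return dna_pg * BP_PER_PG

def bp_to_pg(genome_bp):
    """DNA content in pg for a genome size given in bp."""
    return genome_bp / BP_PER_PG

if __name__ == "__main__":
    print(f"mass of one pair: {pair_mass_pg():.4e} pg")   # about 1.086e-09 pg
    print(f"1 pg corresponds to {pg_to_bp(1.0) / 1e6:.0f} Mb")
```

The reciprocal of `pair_mass_pg()` reproduces the 0.921 × 10⁹ bp per pg figure to three significant digits, which is a quick consistency check on the arithmetic.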

C-Value
In molecular terms, genome size is usually expressed as the amount of DNA per haploid cell (in picograms, pg) or as the number of kilobases (kb) per haploid cell, and is known as the C-value. There is enormous variation in C-value, from as little as 10⁶ bp for a mycoplasma to as much as 10¹¹ bp for some plants and amphibians. The minimum genome size increases as complexity increases across different evolutionary phyla. A dramatic example of the range of C-values can be seen in the plant kingdom, where Arabidopsis (125 Mb/haploid genome) represents the low end and lily (36 Gb/haploid genome) the high end of complexity. In terms of weight, this value is 0.07 pg per haploid Arabidopsis genome and 100 pg per haploid lily genome.

Table 5.1 DNA content in terms of C-value in different organisms (Adapted from www.garlandscience.com/textbooks/0815341385/.../G3%20Ch7.pdf)

Organism                                      C value (DNA content in kilobases per haploid genome)
φX174                                         5.4 × 10⁰
Lambda phage                                  4.5 × 10¹
Escherichia coli                              4.2 × 10³
Saccharomyces cerevisiae                      1.21 × 10⁴
Neurospora                                    2.7 × 10⁴
Aspergillus nidulans                          2.54 × 10⁴
Tetrahymena pyriformis                        1.9 × 10⁵
Caenorhabditis elegans                        9.7 × 10⁴
Drosophila melanogaster                       1.8 × 10⁵
Bombyx mori (silkworm)                        4.9 × 10⁵
Strongylocentrotus purpuratus (sea urchin)    8.45 × 10⁵
Locusta migratoria (locust)                   5.0 × 10⁶
Takifugu rubripes (pufferfish)                4.0 × 10⁵
Homo sapiens                                  3.2 × 10⁶
Mus musculus (mouse)                          3.3 × 10⁶
Arabidopsis thaliana (thale cress)            1.25 × 10⁵
Oryza sativa (rice)                           4.66 × 10⁵
Zea mays (maize)                              2.5 × 10⁶
Pisum sativum (pea)                           4.8 × 10⁶
Triticum aestivum (wheat)                     1.6 × 10⁷
Fritillaria assyriaca (fritillary)            1.2 × 10⁷

C-Value Paradox
One feature of eukaryotic organisms highlights an anomaly that was detected early in molecular research. Even though eukaryotic organisms appear to have 2-10 times as many genes as prokaryotes, they have many orders of magnitude more DNA in the cell. Furthermore, the amount of DNA per genome is not correlated with the presumed evolutionary complexity of a species. The DNA of each human diploid cell measures 174 cm in length. If a polypeptide has 300 amino acids, about 1,000 bp are needed to encode it, which is 0.67 × 10⁶ Da. If the haploid genome content in humans, as estimated by one study, is 2.8 × 10⁹ bp, which is 1.8 × 10¹² Da, then a genome of this size could contain about 3 × 10⁶ genes. However, the estimated number of genes in humans is 32,000 to 35,000. This is stated as the C-value paradox: the amount of DNA in the haploid cell of an organism is not related to its evolutionary complexity. Another important point to keep in mind is that there is no relationship between the number of chromosomes and the presumed evolutionary complexity of an organism. Moore (1984) describes the concept of the C-value paradox in detail. The C-value paradox covers several puzzling features. There may be large variation between species. For example, in amphibians, the smallest genome is 10⁹ bp while the largest is 10¹¹ bp. If the number of genes is roughly similar, most of the DNA in the species with the largest genome cannot be concerned with coding for proteins. What can be its function? There is an apparent excess of DNA compared with the amount that could be expected to code for proteins. This is often referred to as the problem of excess eukaryotic DNA. We know that genes are much larger than the sequences needed to code for the proteins. Two models have been proposed to explain the C-value paradox: the master and slave hypothesis, and the repetitive and unique sequences hypothesis. According to the master and slave hypothesis, a cell has several copies of a gene, one of them being the master gene and the others being slave genes. The repetitive and unique sequences hypothesis envisages that only a small proportion of the DNA is meant for genes carrying the genetic information and that the rest has other functions, such as control of gene activity. We now know that genes in higher organisms are present in both repeated and unique sequences, and that amplification of all the genes is not necessarily found.
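The gene-number arithmetic behind the C-value paradox can be checked with a short sketch. The figures are the round numbers used in the discussion above (about 1,000 bp per 300-residue polypeptide, a haploid genome of 2.8 × 10⁹ bp, 32,000 to 35,000 genes); the variable and function names are hypothetical.

```python
# Rough arithmetic behind the C-value paradox, using the text's round figures.
BP_PER_GENE = 1_000          # ~300 codons x 3 bp, rounded up as in the text
HAPLOID_GENOME_BP = 2.8e9    # human haploid genome estimate quoted in the text

def max_gene_count(genome_bp=HAPLOID_GENOME_BP, bp_per_gene=BP_PER_GENE):
    """Number of genes that would fit if every base pair were coding."""
    return genome_bp / bp_per_gene

def coding_fraction(n_genes, genome_bp=HAPLOID_GENOME_BP, bp_per_gene=BP_PER_GENE):
    """Fraction of the genome needed to encode n_genes polypeptides."""
    return n_genes * bp_per_gene / genome_bp

if __name__ == "__main__":
    print(f"capacity: {max_gene_count():.1e} genes")
    for n in (32_000, 35_000):
        print(f"{n} genes need {coding_fraction(n):.2%} of the genome")
```

Under these assumptions the genome could hold roughly three million such genes, while the estimated gene number accounts for only about one per cent of the DNA, which is the excess the paradox refers to.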

Cot Curve
The denaturation and renaturation properties of DNA are used to determine the sequence complexity of a genome. When DNA is heated or treated with a denaturant, the DNA strands separate (denaturation). When the heated DNA is allowed to cool, or the denaturant is removed, renaturation of the DNA, also called reannealing, occurs. When denatured DNA is mixed with complementary RNA, hybridization occurs. The rate of hybridization depends upon (a) the concentration of DNA molecules (Co), (b) the time given for renaturation (t), and (c) the extent of homology between the molecules to be hybridized. The greater the homology, the faster the rate of hybridization. The renaturation curve obtained is known as a Cot curve. Cot curves for several genomes are shown in Figure 5.1, which plots the fraction of the DNA that has reassociated (1 – C/Co) against the log of Cot. The form of each curve is similar, but the position of the curve is very different in each case. The Y-axis is based on the fraction of the DNA that remains single-stranded, expressed as the ratio of the concentration of single-stranded DNA (C) to the total concentration of the starting DNA (Co); the fraction reassociated is 1 – C/Co. The X-axis is a log scale of the product of the initial concentration of DNA (in moles/liter) and the length of time the reaction has proceeded (in seconds). As can be seen, the curve is rather smooth, which indicates that reannealing occurs gradually over a period of time.

Figure 5.1 Reassociation patterns of double-stranded DNA from various sources. All curves are typically S-shaped, indicating homogeneity, but they are displaced to different values of Cot due to genome size (Redrawn, with permission, from Gupta, P.K. 2009. Genetics, 4th edition. Meerut: Rastogi Publ.)
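The S-shaped form of a Cot curve can be illustrated with the standard ideal second-order rate law, C/Co = 1/(1 + k·Cot), which is not stated in the text and is used here as an assumption. Under it, Cot½ = 1/k. The Cot½ values below are hypothetical and only show how curves of the same shape shift along the log axis.

```python
import math

def fraction_single_stranded(cot, cot_half):
    """C/Co under ideal second-order kinetics: 1 / (1 + Cot / Cot_half)."""
    return 1.0 / (1.0 + cot / cot_half)

def fraction_reassociated(cot, cot_half):
    """1 - C/Co, the quantity plotted against log(Cot)."""
    return 1.0 - fraction_single_stranded(cot, cot_half)

if __name__ == "__main__":
    # Hypothetical Cot_half values (mol * s / L) for a small and a large genome.
    examples = {"small genome": 0.1, "large genome": 10.0}
    for name, cot_half in examples.items():
        for exponent in range(-3, 4):
            cot = 10.0 ** exponent
            frac = fraction_reassociated(cot, cot_half)
            print(f"{name:13s} log10(Cot)={math.log10(cot):+.0f}  reassociated={frac:.3f}")
```

Each curve passes through 0.5 at its own Cot½, so a hundredfold difference in Cot½ appears as a shift of two units on the log scale with no change in shape.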

Cot½
One particularly useful value is Cot½, the Cot value at which half of the DNA has reannealed. The steps involved in DNA denaturation and renaturation experiments are:
(a) Shear the DNA to a size of about 400 bp.
(b) Denature the DNA by heating to 100 °C.
(c) Cool slowly and take samples at different time intervals.
(d) Determine the per cent single-stranded DNA at each time point.
The shape of a Cot curve for a given species is a function of two factors: the size or complexity of the genome, and the amount of repetitive DNA within the genome. If we plot the Cot curves of the genomes of three species such as bacteriophage lambda, Escherichia coli and yeast, we see that they have the same shape, but the Cot½ of yeast is the largest, that of E. coli next and that of lambda the smallest. Physically, the larger the genome, the longer it takes for any one sequence to encounter its complementary sequence in solution, because two complementary sequences must encounter each other before they can pair. The more complex the genome (i.e., the more unique sequences it contains), the longer it takes for any two complementary sequences to encounter each other and pair. Given similar concentrations in solution, it therefore takes a more complex species longer to reach Cot½. Analysis of DNA renaturation shows that large eukaryotic genomes contain repeated DNA sequences that anneal fast and unique (non-repeated) DNA sequences that anneal slowly. For example, calf DNA has ~40 per cent repeated and ~60 per cent non-repeated DNA. Eukaryotic genomes actually have a wide array of sequences that are represented at different levels of repetition. Genomes that contain these different classes of sequences reanneal in a different manner from genomes with only single-copy sequences. Instead of a single smooth Cot curve, three distinct curves can be seen, each representing a different repetition class (Figure 5.2). The first sequences to reanneal are the highly repetitive sequences, because so many copies of them exist in the genome and because they have a low sequence complexity. The second portion of the genome to reanneal is the middle repetitive DNA, and the final portion to reanneal is the single-copy DNA.

Figure 5.2 Reassociation kinetics of eukaryotic DNA showing calculation of complexity and repetition frequency (Redrawn, with permission, from Gupta, P.K. 2009. Genetics. Meerut: Rastogi Publ.)

Repeated DNA sequences have distinctive effects on Cot curves. If a specific sequence is represented twice in the genome, it has two complementary sequences to pair with and as such has a Cot value half as large as that of a sequence represented only once in the genome. The Cot curve of a whole eukaryotic nuclear genome is thus made up of three separate Cot curves, whose Cot½ values are indicated in Figure 5.3. On the basis of the Cot½ values and the proportion of the genome that each segment comprises, the degree of repetitiveness of each segment can be determined. The unique DNA makes up most of the structural genes, and much of it is transcribed. The eukaryotic genome can be classified into various components on the basis of their abundance and function (Twyman 1998) (Table 5.2).

Figure 5.3 Cot curve of mouse DNA. Vertical lines indicate approximate Cot½ values of three apparent segments that make up this single Cot curve

During a renaturation experiment, samples are drawn at regular intervals. The double-stranded DNA formed upon renaturation can be measured by the following techniques:
(i) Single-stranded DNA has a higher optical density (O.D.) at a specific wavelength than double-stranded DNA. The O.D. declines with time, and the change in O.D. may be used to estimate the amount of double-stranded DNA formed (Figure 5.4).
(ii) The DNA sample may be passed through a hydroxyapatite column, which retains only double-stranded DNA while single-stranded DNA passes through. The amount of DNA retained can thus be easily determined.
(iii) If the sample is treated with S1 nuclease, which digests single-stranded DNA, the DNA left in the sample is double-stranded and its quantity can be measured.
In general, the larger the fragment size, the longer the time taken for renaturation. The renaturation curve of human DNA differs from that of E. coli because human DNA contains highly repetitive, moderately repetitive and non-repetitive (unique) DNA sequences.
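A composite Cot curve and the repetition-frequency estimate can be sketched as follows. The sketch assumes that each class reassociates independently with ideal second-order kinetics, and that the repetition frequency of a class is approximately the ratio of the Cot½ of the unique class to the Cot½ of that class, both read from the whole-genome curve; this generalises the statement above that a sequence present twice has half the Cot value. The fractions and Cot½ values are hypothetical, not measurements.

```python
def component_reassociated(cot, cot_half):
    """Fraction of one sequence class reassociated (ideal second-order kinetics)."""
    return 1.0 - 1.0 / (1.0 + cot / cot_half)

def genome_reassociated(cot, components):
    """Sum over classes of (genome fraction) x (fraction of that class reassociated)."""
    return sum(frac * component_reassociated(cot, ch) for frac, ch in components)

def repetition_frequency(cot_half_unique, cot_half_class):
    """Approximate copy number of a class relative to single-copy DNA."""
    return cot_half_unique / cot_half_class

# Hypothetical classes: (fraction of genome, observed Cot_half in mol * s / L).
COMPONENTS = [
    (0.10, 1e-3),   # highly repetitive
    (0.30, 1.0),    # moderately repetitive
    (0.60, 1e3),    # unique
]

if __name__ == "__main__":
    for exponent in range(-5, 7, 2):
        cot = 10.0 ** exponent
        print(f"Cot=1e{exponent:+d}  reassociated={genome_reassociated(cot, COMPONENTS):.3f}")
    unique_cot_half = COMPONENTS[-1][1]
    for frac, ch in COMPONENTS:
        print(f"fraction {frac:.2f}: ~{repetition_frequency(unique_cot_half, ch):.0f} copies")
```

Plotting `genome_reassociated` against log(Cot) for such inputs gives a stepped curve with three transitions, one per repetition class.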

Reassociation Kinetics

Reassociation kinetics of DNA depends on sequence complexity. Sequence complexity in eukaryotic DNA can be studied by using DNA-DNA denaturation, renaturation and hybridization techniques. When DNA is denatured by heating, it unwinds into single strands. Complementary sequences of DNA then reassociate by base pairing in the reverse of denaturation, a process known as renaturation. The rate of renaturation is dependent on the concentration of nucleotide


Table 5.2 Eukaryotic genome components classified by abundance and by function

DNA class: Definition

By abundance

Unique sequence (single-copy, low-copy, non-repetitive DNA): Sequences present as one or a very few copies per genome. Contains most genes and includes introns, regulatory sequences and other DNA of unknown function.

Moderately repetitive DNA: Sequences present in 10-10,000 copies per genome. Generally dispersed repeats corresponding to highly conserved multigene families (functional genes and pseudogenes) and transposable elements. Occasionally clustered.

Highly repetitive DNA: Sequences present in 100,000-1,000,000 copies per genome. Generally found as tandem repeats, although some superabundant (dispersed) transposable elements also fall into this class, e.g., Alu elements.

By function

Genic DNA: Genes, i.e., DNA which is expressed. Genic DNA may be further classified as mDNA (protein encoding), rDNA, tDNA, snDNA, etc., representing the different classes of gene product.

Regulatory DNA: DNA whose role is the regulation of gene expression (e.g., promoters, enhancers) or the regulation of DNA function (e.g., origins of replication, matrix-associated regions).

Intergenic DNA, spacer DNA: Introns and the DNA which separates genes from each other.

Satellite DNA: Highly repetitive DNA found near centromeres, telomeres and at other sites. Some satellite DNA may play a role in chromosome function.

Selfish DNA: DNA whose role appears to be to mediate its own replication and survival within the genome, e.g., some satellite DNA and transposable elements.

Junk DNA: DNA with no assigned function.

Figure 5.4 Estimates of amount of double-stranded DNA

strands and their sequences, given that the temperature of renaturation is kept constant and the sample is broken into small pieces. The kinetics of the reassociation reaction therefore reflects the variety of sequences that are present, so the reaction can be used to quantitate genes and their RNA products. When performed in solution, such reactions are described as liquid hybridization.


Renaturation of DNA depends on random collision of the complementary strands and follows second-order kinetics. The rate of reaction is governed by the equation

$$\frac{dC}{dt} = -KC^{2}$$

where C is the concentration of DNA that is single-stranded at time t, and K is the reassociation rate constant. By integrating this equation between the limits of C₀ at t = 0 and C at a time t½, when the reaction is half complete, we get the following expression:

$$\frac{C}{C_0} = \frac{1}{2} = \frac{1}{1 + K\,C_0\,t_{1/2}}$$

Simplifying the above equation, we get

$$K = \frac{1}{C_0\,t_{1/2}}$$
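The second-order expression above can be tabulated numerically. The following is a minimal Python sketch, assuming the integrated form C/C₀ = 1/(1 + K·C₀t); the function names and the rate constant K = 0.25 are hypothetical values chosen only for illustration.

```python
def fraction_single_stranded(cot, k):
    """Fraction of DNA still single-stranded, C/C0 = 1 / (1 + K * C0t).

    `cot` is the product C0 * t; `k` is the reassociation rate constant.
    """
    return 1.0 / (1.0 + k * cot)


def cot_half(k):
    """Cot value at which the reaction is half complete: C0t(1/2) = 1 / K."""
    return 1.0 / k


# Hypothetical rate constant, chosen only to illustrate the shape of the curve.
K = 0.25

for cot in (0.04, 0.4, 4.0, 40.0, 400.0):
    print(f"Cot = {cot:>6}: fraction single-stranded = "
          f"{fraction_single_stranded(cot, K):.3f}")

print("Cot(1/2) =", cot_half(K))
```

Plotting the fraction against log(Cot) gives the sigmoid Cot curve; most of the reaction takes place within roughly a 100-fold range of Cot centred on Cot½.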

Since Cot½ is the product of the concentration and the time required for the reaction to proceed halfway, a greater Cot½ implies a slower reaction. Cot½ is directly related to the amount of DNA in the genome. This reflects a situation in which, as the genome becomes more complex, there are fewer copies of any particular sequence within a given mass of DNA. For example, if the concentration (C₀) of DNA is 12 pg per unit volume, it will contain 3,000 copies of each sequence of a bacterial genome (i.e., a genome of 0.004 pg). In contrast, if a eukaryotic genome is 4 pg, there will be only 3 copies of each sequence in 12 pg, i.e., the concentration of each sequence is 1,000 times lower than for the bacterial genome, provided there is no repetition of sequence in the genome of 4 pg. Since the rate of reassociation depends on the concentration of complementary sequences, for the eukaryotic sequence to be present at the same relative concentration as the bacterial sequence it is necessary to have 1,000× more DNA. Thus the Cot½ of the eukaryotic reaction is 1,000× the Cot½ of the bacterial reaction.

The Cot½ of a reaction therefore indicates the total length of the different sequences that are present. This is described as complexity, usually given in base pairs. Cot½ indicates the length of all the different sequences (each counted only once) in a genome, which will be less than the length of the total DNA in the genome when there is repetition. This is described as the kinetic complexity of the genome. Renaturation of DNA of any genome should display a Cot½ that is proportional to its complexity. Thus the complexity of any DNA can be determined by comparing its Cot½ with that of a standard DNA of known complexity. Usually E. coli DNA is used as the standard. Its complexity is taken to be identical with the length of its genome (implying that every sequence in the E. coli genome of 4.2 × 10⁶ bp is unique). The following relationship is used to calculate kinetic complexity:

$$\frac{C_0 t_{1/2}\ \text{of DNA of any genome}}{C_0 t_{1/2}\ \text{of E. coli genome}\ (= 9)} = \frac{\text{Kinetic complexity of the genome}}{4.2 \times 10^{6}\ \text{bp}}$$

$$\text{Kinetic complexity of genome} = \frac{4.2 \times 10^{6}\ \text{bp}}{9} \times C_0 t_{1/2}\ \text{of genome}$$
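The comparison with the E. coli standard can be written as a short calculation. This is a minimal Python sketch under the assumptions stated in the text (E. coli complexity of 4.2 × 10⁶ bp and an E. coli Cot½ of 9 under the same conditions); the function name and the sample Cot½ of 900 are hypothetical.

```python
E_COLI_COMPLEXITY_BP = 4.2e6   # length of the E. coli genome, taken as all unique
E_COLI_COT_HALF = 9.0          # Cot(1/2) of the E. coli standard quoted in the text


def kinetic_complexity(cot_half_sample,
                       cot_half_standard=E_COLI_COT_HALF,
                       standard_complexity_bp=E_COLI_COMPLEXITY_BP):
    """Kinetic complexity (bp) of a sample from its Cot(1/2) relative to a standard.

    Both Cot(1/2) values must be measured under identical conditions.
    """
    return standard_complexity_bp * cot_half_sample / cot_half_standard


# Hypothetical sample whose Cot(1/2) is 100 times that of the standard.
print(f"{kinetic_complexity(900.0):.2e} bp")
```

A sample that reassociates 100 times more slowly than the standard is thus assigned a complexity 100 times greater.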

Chemical complexity of a DNA sequence component can be determined if the chemical complexity of the genome is known. For example, if the genome size is 12 × 10⁸ bp and the component represents 25 per cent of the genome, then the chemical complexity of this component is 3 × 10⁸ bp. If kinetic complexity is known from the equation given earlier (4.2 × 10⁶ bp ÷ 9 × Cot½ of genome), then the repetition frequency (f) of a repetitive DNA component can be determined using the following formula:

$$f = \frac{\text{Chemical complexity}}{\text{Kinetic complexity}} = \frac{C_0 t_{1/2}\ \text{of non-repetitive DNA}}{C_0 t_{1/2}\ \text{of repetitive DNA}}$$

The above exercise gives the sizes of the three different DNA components, their Cot½ values, their kinetic complexities and their repetition frequencies.
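The repetition frequency formula can be illustrated as follows. This is a minimal Python sketch; the genome size (12 × 10⁸ bp) and the 25 per cent fraction are the figures used in the text, whereas the kinetic complexity of 6 × 10⁵ bp and the function names are hypothetical.

```python
def chemical_complexity(genome_size_bp, fraction_of_genome):
    """Physical amount of DNA (bp) occupied by a component of the genome."""
    return genome_size_bp * fraction_of_genome


def repetition_frequency(chemical_bp, kinetic_bp):
    """f = chemical complexity / kinetic complexity."""
    return chemical_bp / kinetic_bp


# Figures from the text: a component making up 25 per cent of a 12 x 10^8 bp genome.
chem = chemical_complexity(12e8, 0.25)

# Hypothetical kinetic complexity, for illustration only.
kinetic = 6e5

print(f"chemical complexity = {chem:.1e} bp")
print(f"repetition frequency = {repetition_frequency(chem, kinetic):.0f}")
```

A component whose chemical and kinetic complexities coincide has f = 1, i.e., it is unique.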

ORGANIZATION OF GENETIC MATERIAL

Prokaryotic and eukaryotic chromosomes are similar with respect to two features: unineme structure and semi-conservative DNA replication.

Unineme structure. A single continuous length of DNA double helix forms one chromosome. Genes are arranged in linear order in both prokaryotic and eukaryotic chromosomes. Studies on T4 phage and E. coli confirmed this view for prokaryotes. The unineme structure of rat lampbrush chromosomes was supported by the electron microscope studies of Miller and Beatty (1969a,b), who showed that after digestion the remaining thickness of the lateral loops was about 20 Å.

Semi-conservative DNA replication. In E. coli, the semi-conservative mode of DNA replication was confirmed by Meselson and Stahl (1958). In Vicia faba, it was shown by Taylor et al. (1957).

Several differences in the organization of prokaryotic and eukaryotic genomes are discussed here with respect to split versus non-split nature of gene, coding versus noncoding DNA, application of operon concept, polycistronic versus unicistronic mRNAs, nature of regulatory control, unicellularity versus multicellularity, singularity versus plurality of chromosomes, DNA packaging, haploidy versus diploidy, unique versus repeated sequences, and colinearity between gene and its product.

Split versus non-split nature of gene. In prokaryotes, genes are non-split and no processing is required for a transcript to become mRNA. In eukaryotes, most genes are split, i.e., coding sequences are separated by noncoding sequences, and the transcript must be processed in the nucleus before finished mRNA is obtained. Exceptions to the general rule that prokaryotic genes are non-split and eukaryotic genes are split are known.

Coding versus noncoding DNA. In prokaryotes, DNA codes for proteins, tRNA, or rRNA. The average gene size is 1,000 bp. Noncoding DNA consists mostly of promoters. Generally no introns are present in prokaryotes. In eukaryotes, most DNA is noncoding (introns); however, some eukaryotic genes do not have introns. The average gene size is 27,000 bp.

Application of operon concept. Metabolically related genes are linked together in prokaryotes, but in eukaryotes they are not contiguous and may even be located on different chromosomes. That is why the operon concept applies to prokaryotes but not to eukaryotes. There are many exceptions to this general rule.

Polycistronic versus unicistronic mRNAs. Messenger RNAs are polycistronic in prokaryotes, i.e., one mRNA may code for two or more different polypeptide chains. On the other hand, eukaryotic mRNAs are unicistronic, i.e., each mRNA codes for only one polypeptide chain.

Nature of regulatory control. In prokaryotes, all the genes are always active and regulatory mechanisms are designed to turn off those genes whose product is not required by the cell. In eukaryotes, all genes are normally inactive unless turned on by gene regulatory mechanisms. In prokaryotes, the genes for enzymes involved in a particular pathway are linked in operon fashion and are under the control of the same promoter. In eukaryotes, such genes are scattered amongst different chromosomes, and each eukaryotic gene is under the control of its own promoter.


Unicellularity versus multicellularity. Prokaryotes are unicellular organisms in which the genome is not separated from the cytoplasm. Eukaryotes may be unicellular or multicellular. Nuclear DNA, mitochondrial DNA, and chloroplast DNA are separated from one another by membranes.

Singularity versus plurality of chromosomes. The main genetic element carried by a prokaryotic cell is a single circular, 1 mm long, double-stranded DNA molecule called the nucleoid. In addition to the main chromosome, there may be one or more small circular double-stranded molecules called plasmids. Plasmids are negatively supercoiled DNA molecules that usually carry nonessential (often drug resistance) genes. Plasmids replicate autonomously when lying independently of the bacterial chromosome. Some plasmids have the ability to integrate into the main bacterial chromosome; in this situation, plasmid DNA replicates along with the main bacterial chromosome. Plasmids have been very useful in the construction of recombinant DNA molecules. In addition to the nucleoid and plasmids, some bacteria carry bacteriophages. In eukaryotes, genetic material is organized into a number of large chromosomes and sometimes a few small (supernumerary) chromosomes. In eukaryotes, DNA is coiled around nucleosome particles. In eukaryotes, in addition to the nucleus, genetic material is present in cell organelles like mitochondria, chloroplasts and kinetoplasts.

DNA packaging. Although the DNA molecule of E. coli is 1 mm long, it occupies a space with a diameter of at most only 1 μm. The bacterial chromosome is folded in order to remain within this confined space in the absence of a nuclear membrane. Prokaryotes package their DNA in bacterial chromosomes and plasmids. The bacterial chromosome is localized in the nucleoid region of the cell (there is no nucleus) and is looped into negatively supercoiled domains. The loops are 50-100 kbp in length (similar to those of eukaryotic chromosomes) and are held in place by RNA and small basic (histone-like) proteins. In the eukaryotic chromosome, DNA is complexed with basic (histone) and acidic (non-histone) proteins. The number of chromosomes is characteristic of a species.

Haploidy versus diploidy. Prokaryotic chromosomes exist in single copy, which is why prokaryotes are called haploids. In eukaryotes (except some haploid eukaryotes like fungi), chromosomes exist in pairs and the organisms are, therefore, called diploids. Doubling the chromosome number in eukaryotes is probably responsible for their greater complexity.

Unique versus repeated sequences. In bacteria, most of the genome is unique sequence DNA, representing genes and regulatory elements. Some genes and other sequences are repetitious, but the copy number (or repetition frequency) is generally low, usually less than 10. Examples of such repetitious sequences include the ribosomal RNA genes (there are seven in E. coli) and transposable elements such as IS elements. Occasionally, certain sequence motifs may be moderately repetitive. In the 1.8 Mbp Haemophilus influenzae genome, there are about 1,500 copies of the 30-bp DNA uptake site. Similarly, the E. coli genome contains many copies of two repetitive elements, the enterobacterial repetitive intergenic consensus (ERIC) and the repetitive extragenic palindrome (REP). Together, however, repetitive DNA accounts for less than one per cent of bacterial genomes, and genome size is a direct reflection of complexity. Eukaryotes carry a very large portion of repetitive DNA sequences in their genomes.

Colinearity. In prokaryotes, there is perfect correspondence (called colinearity) between a DNA sequence (gene), its RNA transcript (mRNA), and the polypeptide chain (product). The sequence of the amino acids in a polypeptide chain corresponds to the sequence of the codons coding for those amino acids. Moreover, the distance between any two amino acids is proportional to the distance between the codons coding for them. In eukaryotes, the nuclear codons and the amino acid sequences coded by them are similar to those in prokaryotes. However, the distance between any two amino acids is not always proportional to the distance between the codons coding for them. In many eukaryotic proteins, two specific amino acids are situated close together but the codons coding for them are located far


apart in the gene. The entire eukaryotic gene is transcribed. The product is the primary RNA transcript, which itself does not serve as a functional RNA; rather, it serves as a precursor of a functional mRNA molecule. Some parts of the RNA transcript (introns) are excised and the remaining segments (exons) are joined together to yield mRNA. Translation of the polypeptide chain takes place on the mRNA template. Whereas all intervening sequences (introns) are noncoding, not all exons are coding with respect to a particular gene product. The first exon, either partly or entirely, may form the leader sequence of the mRNA and is, therefore, partly or wholly noncoding. The 3'-part of the mRNA is also noncoding. Amino acids coded by the same exon are separated from one another by a distance corresponding to the distance between the codons. However, amino acids coded by different exons are separated from each other by a distance which is much less than the distance separating the codons coding for them in the gene. In some cases, the three nucleotides of a single codon are split by an intervening sequence. In many cases the part of a eukaryotic gene made up of exons is much smaller than the part contributed by the introns. Eukaryotic genomes also carry a large portion of DNA sequences whose function is not clearly established.

PROKARYOTIC GENOME

Structural organization of the prokaryotic genes within the genome was considered to be simple, but the discovery of overlapping genes in bacteriophages, animal viruses and bacterial insertion sequences, and of split genes in prokaryotic genomes, suggests that the organization of the prokaryotic genome is quite complex. Large-scale rearrangement of bacterial genomes has created enormous genetic variation. Illegitimate recombination can also alter the genetic material and has played a significant role in the organization of the prokaryotic genome. It also regulates gene expression. Illegitimate recombination events are largely mediated by DNA sequences which shift their position within and between chromosomes, namely transposable elements, bacteriophages and retroviruses.

The cell must handle DNA that is many times longer than the cell itself. DNA packaging must be very efficient while still allowing DNA replication and transcription to occur. Most of the well-characterized prokaryotic genomes consist of double-stranded DNA organized as a single circular chromosome 0.6-10 Mb in length and one or more circular plasmid species of 2 kb-1.7 Mb. The past few years, however, have revealed some major variations in genome organization (Campbell 1993). In addition, a recent accumulation of data has shown that the location and orientation of the genes and repeated sequences (including prophages and transposons) on and among these elements is not always random. Some of the non-randomness is probably the result of unique historical events; in other cases it reflects selection for the optimization of function. Most prokaryotic genomes have very few repeat sequences; there are virtually none in the 1.64 Mb genome of Campylobacter jejuni NCTC11168 (Parkhill et al. 2000b). A notable exception is the meningitis bacterium Neisseria meningitidis Z2491, which has over 3,700 copies of 15 different types of repeat sequence, collectively making up almost 11 per cent of the 2.18 Mb genome (Parkhill et al. 2000a).

EUKARYOTIC GENOME

When DNA of a eukaryotic genome is characterized by reassociation kinetics, the reaction usually occurs over a range of Cot values spanning up to eight orders of magnitude. This is much broader than the 100-fold range expected. The reason is that the expectation is based on a single kinetically


pure reassociating component. A genome actually includes several such components, each reassociating with its own characteristic kinetics. Let us once again refer to Figure 5.2, which shows reassociation of a hypothetical eukaryotic genome starting at a Cot value of 10⁻⁴ and terminating at a Cot value of 10⁴. The reaction falls into three distinct phases, shown as three shaded areas I, II, and III. A plateau separates phases I and II, but phases II and III overlap slightly. Each of these phases represents a different kinetic component.

Fast component. The first fraction (I) to reassociate represents 25 per cent of the total DNA. It renatures between Cot values of 10⁻⁴ and ~2 × 10⁻² with a Cot½ value of 0.0013.

Intermediate component. The second fraction (II) represents 30 per cent of the DNA. It renatures between Cot values of ~2 × 10⁻² and 10² with a Cot½ value of 1.9.

Slow component. The last fraction (III) to renature represents 45 per cent of total DNA. It extends over a Cot range from ~10² to 10⁴ with a Cot½ value of 630.

The genes in eukaryotic chromosomes are DNA sequences, which are categorized into unique and repetitive sequences. The repeat sequences are further classified as moderately repetitive sequences (dispersed repeats) and highly repetitive sequences (clustered repeats). The dispersed repeats are known to be mobile, as their location in the genome may vary among individuals of the species. Clustered repeats are present in tandemly repeated arrays, and some clustered repeats are transcriptionally active.

To calculate complexities, each fraction must be treated as an independent kinetic component whose reassociation is compared with that of a standard DNA. The slow component represents 45 per cent of the DNA, so its concentration in the reassociation reaction is 0.45 of the measured concentration (C₀). Thus the Cot½ of the slow fraction alone is 0.45 × 630, i.e., 283. So if the slow component is isolated from the other components, it renatures with a Cot½ value of 283. Suppose that under such conditions E. coli DNA (complexity 4.2 × 10⁶ bp) reassociates with a Cot½ value of 4.0; the complexity of the slow component then comes out to be (283 ÷ 4.0) × 4.2 × 10⁶ bp, i.e., 3.0 × 10⁸ bp. Similarly, the complexities of the intermediate and fast components come out to be 6.0 × 10⁵ bp and 340 bp, respectively. Thus the faster a component reassociates, the lower is its complexity. Using the equations discussed earlier, the repetition frequencies of the intermediate and fast components are found to be about 350 and 500,000, respectively.

One classification of the DNA found in eukaryotic cells relates to the number of copies of a certain sequence that exist in a cell. (This is when that sequence exists at a rate that substantially exceeds what would be expected by random chance.) The DNA is classified as non-repetitive, unique or single-copy DNA if there are 1-10 copies per genome; moderately repetitive DNA if there are 10-100,000 copies; and highly repetitive DNA if there are more than 100,000 copies. Sequence distribution for some selected species is given in Table 5.3.
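The component-by-component calculation for the hypothetical genome can be reproduced with a short script. This is a minimal Python sketch: the fractions, Cot½ values, E. coli Cot½ of 4.0 and E. coli complexity of 4.2 × 10⁶ bp are taken from the text, while the function and variable names are hypothetical. It assumes that the slowest component is unique and estimates repetition frequency from the ratio of Cot½ values measured in the whole-genome reaction.

```python
E_COLI_COMPLEXITY_BP = 4.2e6   # complexity of the standard (from the text)
E_COLI_COT_HALF = 4.0          # Cot(1/2) of the standard under these conditions

# component name -> (fraction of genome, Cot(1/2) observed in the whole-genome curve)
components = {
    "fast": (0.25, 0.0013),
    "intermediate": (0.30, 1.9),
    "slow": (0.45, 630.0),
}


def analyse(components,
            std_cot_half=E_COLI_COT_HALF,
            std_complexity_bp=E_COLI_COMPLEXITY_BP):
    """Return {name: (pure Cot(1/2), kinetic complexity in bp, repetition frequency)}."""
    # Assumption: the slowest component consists of unique sequences (f = 1).
    unique_cot_half = max(cot for _, cot in components.values())
    results = {}
    for name, (fraction, mixture_cot_half) in components.items():
        # Cot(1/2) the component would show if isolated from the others.
        pure_cot_half = fraction * mixture_cot_half
        complexity_bp = std_complexity_bp * pure_cot_half / std_cot_half
        repetition = unique_cot_half / mixture_cot_half
        results[name] = (pure_cot_half, complexity_bp, repetition)
    return results


results = analyse(components)
for name, (pure, complexity, f) in results.items():
    print(f"{name:>12}: pure Cot1/2 = {pure:.3g}, "
          f"complexity = {complexity:.3g} bp, f = {f:.3g}")
```

With these inputs the complexities come out close to 340 bp, 6.0 × 10⁵ bp and 3.0 × 10⁸ bp, and the Cot½ ratios give repetition frequencies of roughly 330 and 485,000 for the intermediate and fast components.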

Non-repetitive DNA

Non-repetitive sequences are found once or a few times in the genome. Many of the sequences which encode functional genes fall into this class. These sequences contain, but are not exclusively composed of, protein-coding sequences. A length of DNA with no repetitive nucleotide sequences is termed unique, non-repetitive, or single-copy DNA. The complexity of the slow component corresponds with its physical size. Suppose that the genome whose reassociation is described in the above example has a haploid DNA content of 7.0 × 10⁸ bp as determined by chemical analysis. Then 45 per cent of this is 3.15 × 10⁸ bp, which is only marginally greater than the value of 3.0 × 10⁸ bp measured by reassociation kinetics. The complexity of the slow component is thus the same whether measured chemically or kinetically; the two values are referred to as chemical complexity and kinetic complexity. The coincidence of these values means that the slow component comprises sequences that are unique in the genome. On denaturation, each single-strand sequence is


Table 5.3 Sequence distribution for selected species (Reproduced, with permission, from http://www.ndsu.edu/pubweb/~mcclean/plsc431/eukarychrom/eukaryo3.htm © 1997 Phillip McClean)

Species        Single-copy   Middle repetitive   Highly repetitive
Bacteria       99.7%
Mouse          60%           25%                 10%
Human          70%           13%                 8%
Cotton         61%           27%                 8%
Corn           30%           40%                 20%
Wheat          10%           83%                 4%
Arabidopsis    55%           27%                 10%

able to renature only with the corresponding complementary sequence. This part of the genome is almost the sole component of prokaryotic DNA and is usually a major component in eukaryotes. We can use the kinetic complexity of non-repetitive DNA to estimate the complexity of the genome. This is done by reversing the earlier calculations. The complexity of non-repetitive DNA is 3.0 × 10⁸ bp. If this fraction is unique and represents 45 per cent of the genome, the whole genome should have a size of 3.0 × 10⁸ ÷ 0.45, i.e., about 6.7 × 10⁸ bp. This provides an independent assessment of genome size that can be compared with the chemical complexity, which in this case is 7.0 × 10⁸ bp. Figure 5.5 shows the relationship between genome size as determined by reassociation kinetics of non-repetitive DNA and the haploid DNA content as determined by chemical analysis. The agreement between these two parameters demonstrates that the non-repetitive component consists of individual sequences present as only one copy per genome. The sole exceptions are some plants that were generated by polyploidization, where there is more than one copy of every sequence (expressed as P). The presence of non-repetitive DNA implies that large genomes are not generated simply by increasing the number of copies of the same sequences. If this were the case, the large genomes would behave as though they were polyploid, and there would be no non-repetitive DNA.

Most structural genes lie in non-repetitive DNA

Mendelian genetics for a simple trait implies that only one copy of each gene is present in the haploid genome. The genes can be mapped to particular loci, and the simple assumption is that each such locus comprises a DNA sequence representing a single protein. This is the classic view of the structural gene, which is a unique component of the genome, the only sequence coding for its protein and, therefore, identifiable by a mutation that impedes its protein function. The sequence of a unique


Figure 5.5 Correlation between kinetic complexity and chemical complexity of eukaryotic genomes (indicated by P) (Redrawn from http://www.scribd.com/doc/7233931/Conditions-Favoring-Renaturation)

structural gene, unrelated to any other sequence in the genome, should form part of the non-repetitive DNA component. Where multiple copies of a gene that can code for a protein exist in the genome, it would not be possible to identify the gene by point mutation. So we must return to direct analysis of the DNA of the genome to find the number and proportions of unique and repeated sequences. Between these two extremes, a structural gene can be unique in the sense that it is the only sequence coding for its exact protein, while other related sequences code for similar but not identical proteins. Usually a family of either sort consists of only a few members and, given the effects of mismatching on reassociation, the relevant genes are likely to appear in non-repetitive DNA. For the purpose of identifying and characterizing a structural gene, messenger RNA is the ideal intermediate because we can move in both directions from this molecule, either to the gene (DNA) or to the polypeptide. A population of messenger RNAs, manifested as the spectrum of the sequences found in the polysomes, defines the entire set of genes expressed in a cell or tissue. Thus the constitution of messenger RNA reveals both the nature and the number of structural genes.

Are non-repetitive or repetitive DNA sequences represented in messenger RNA?

The genome sequence components represented in messenger RNA can be determined by using messenger RNA as a tracer in a reassociation experiment. A very small amount of radioactively labeled RNA is included together with a much larger amount of cellular DNA. The reassociation of the tracer mRNA is followed by the entry of its radioactivity into duplex form. With a population of mRNA, a typical result resembles the one shown in Figure 5.6. A small portion of the RNA, generally 10 per cent or less, hybridizes with a Cot½ corresponding to moderately repetitive sequences. The major component hybridizes with a Cot½ identical with or very close to that of non-repetitive DNA. Usually this represents up to 50 per cent of the RNA. Most of the material that does not hybridize probably represents non-repetitive sequences. What is the relationship between the messenger RNA sequences that hybridize with non-repetitive DNA and those that hybridize with repetitive DNA? They can be


separated into different classes by retrieving the RNA that hybridizes first. This shows that independent molecules are involved; one class represents genes that lie in non-repetitive DNA and the other corresponds to genes that lie in repetitive DNA.

How many non-repetitive genes are expressed?

If RNA is added to denatured DNA, RNA/DNA hybrids will be formed. This is called DNA/RNA hybridization. Renaturation depends on the copy number of the segment present in a unit of DNA solution: the higher the copy number, the greater the chance of collision and the faster the renaturation. The number of non-repetitive DNA sequences represented in RNA can be determined directly in terms of the proportion of DNA able to hybridize with RNA when a small amount of single-stranded DNA is hybridized with a large amount of RNA. All the sequences in the DNA that are complementary to the RNA should react to form RNA-DNA hybrids.

Figure 5.6 Hybridization of an mRNA tracer preparation in a reassociation curve shows that most mRNA sequences are derived from non-repetitive DNA, the remainder from moderately repetitive DNA, and none from highly repetitive DNA (Redrawn from http://202.116.45.236:8080/jybdtk/ppt/Gene7-03.files/frame.htm)

In this type of saturation experiment, the RNA must be in sufficient excess to pair with all the available DNA sequences that are complementary to it. Because this reaction is RNA driven, the controlling parameter is Rot. The per cent of hybridizing DNA is plotted against the Rot value. The curve thus obtained is shown in Figure 5.7. The reaction is complete at a Rot of 300, but it is extended to make sure that the plateau has been reached. At saturation, 1.35 per cent of the available non-repetitive DNA is hybridized. Only one strand of DNA acts as a template in transcription, the other strand being its complement. Thus, 1.35 × 2 = 2.7 per cent of the total sequences of non-repetitive DNA are represented as mRNA. Therefore, a very small proportion of the genome, of the order of 1-2 per cent, is represented in the mRNA form.

Repetitive DNA

In the human genome, 15 per cent of the DNA is repetitive. One-third consists of large repeats (10,000-300,000 bp), often copied from one chromosomal location to another; 3 per cent consists of simple short sequence repeats such as -GTTAC-. Repeat sequences vary in size between 5 and 500 nucleotides. Most repeats are located at telomeres and centromeres.


Figure 5.7 Hybridizing an excess of mRNA with non-repetitive DNA until saturation is reached shows that only a small portion of the DNA is represented in the mRNA

Many putative functions have been suggested for repeated DNA, including involvement in chromosome pairing, control of gene expression, processing of RNA (excision of introns) and participation in DNA replication. So far none has been established. A small gene family gives rise to 7S RNA, a molecule that was discovered to be an essential component of a particle that mediates the secretion of proteins from cells. Some repetitive DNA will undoubtedly be shown to have a function, in the formal sense. Some will likely be shown to exert important effects; and the remainder will have no function or effect at all and can, therefore, be called selfish DNA, junk DNA, nonsense DNA, innocent DNA or parasitic DNA.

Identical and non-identical repetitive DNA sequences. Repetitive DNA sequences may be identical or non-identical. The two types differ as follows.

Identical repeats: The advantage of such repeats is that they can make millions of identical products, such as ribosomes, as are needed for mass production of a protein; the primary transcript is processed. For example, identical repeats of rDNA genes are tandem repeats of DNA coding for rRNA.

Non-identical repeats: Non-identical DNA sequences are also known as nearly identical DNA sequences; similar DNA sequences are repeated in this case. Examples are the non-identical repeats of globin genes, which produce the α and β polypeptides of hemoglobin: chromosome 16 holds the α gene family while chromosome 11 holds the β gene family. The α and β genes are expressed at different developmental stages. Non-identical repeats include pseudogenes.

The repeated DNA is present in two categories: tandemly repeated DNA and interspersed repeated DNA.

Tandemly repeated DNA

Tandemly repeated DNA (10-15 per cent of mammalian genomes) is made up of rows of many copies of the same sequence. The repeated unit ranges from 1-2,000 bp in length. Often the repeat is less than 10 bp and is referred to as simple-sequence repeated (SSR) DNA or satellite DNA (because it forms "satellite" bands on centrifugation). These may provide special physical properties to some stretches of the chromosome. Centromeres and telomeres are rich in SSR DNA. At a given site, the amount of SSR DNA may vary greatly. In DNA minisatellites, the satellites may vary between 100 and 100,000 bp in


length. DNA fingerprinting is used to distinguish individuals by analyzing microsatellites (repeats of 1-4 bp), which often differ by 10-100 bp. A tandem repeat is a segment of DNA containing a collection of similar or identical tandemly repeating units. The gene is the largest unit (structural gene + spacer) that is tandemly repeated. Some sequences of satellite DNAs have revealed tandem repeats (Table 5.4). It is often supposed that highly repetitious DNAs arise only as a result of unusual mechanisms or in response to selection pressure. A pattern of tandem repeats seems to be the natural state of DNA whose sequence is not maintained by selection. Periodicities can develop readily from non-repetitious DNA as a result of random mutations and random homology-dependent unequal crossovers. The lengths of these periodicities, and the patterns of subrepeats within them, would fluctuate in evolution, with the probability of a given pattern being dependent on the unknown exact nature of the crossover mechanism. Qualitatively then, unequal crossover provides a reasonable explanation for the production of highly repeated sequences in DNA and for the patterns of periodicity they evince.

Table 5.4 Sequenced satellite DNAs

Drosophila melanogaster
  bp per repeat: 5, 7, 10
  Tandem location: Arms of Y chromosome and centric heterochromatin of chromosome 2 only; also distal end of 2L. Centric heterochromatin of all chromosomes and tip of 2L
  Sequence of one strand: AGAGG, ATATT, ATATAAT, AATAACATAG, AGAGAAGAAG

D. virilis
  bp per repeat: 7
  Tandem location: Centric heterochromatin
  Sequence of one strand: ACAAACT (I), ATAAACT (II), ACAAATT (III)

Cancer borealis (marine crab)
  bp per repeat: 2
  Tandem location: ?
  Sequence of one strand: AT

Pagurus pollicaris (hermit crab)
  bp per repeat: 4
  Tandem location: ?
  Sequence of one strand: ATCC
  bp per repeat: 3
  Tandem location: ?
  Sequence of one strand: CTG

Cavia porcellus (guinea pig)
  bp per repeat: 6
  Tandem location: Centric heterochromatin
  Sequence of one strand: CCCTAA (α)

Dipodomys ordii (kangaroo rat)
  bp per repeat: 10
  Tandem location: Centric heterochromatin
  Sequence of one strand: ACACAGCGGG (HS-β)

Two mechanisms have been proposed to explain the origin of repetitive DNA: saltatory replication (Britten and Kohne 1968) and unequal crossing-over (Smith 1976). According to the saltatory replication hypothesis, repetitive DNA is formed by a sudden event, i.e., within a short period of time a sequence is multiplied many times in the genome to form identical tandem repeats. Alternatively, satellite DNA-like sequences can be built up by unequal crossovers from a DNA sequence that initially does not possess periodicity. Heterogeneity in heterochromatin may also be explained by the unequal crossing-over mechanism. In addition, chromosomal rearrangements and hybrid-induced changes cause changes in heterochromatin.

Interspersed DNA
Even though the genomes of higher organisms contain single-copy, middle repetitive and highly repetitive DNA sequences, these sequences are not arranged similarly in all species. In some species repeated DNA is interspersed.
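The unequal crossing-over mechanism mentioned above can be illustrated with a small sketch. It is only a toy model under stated assumptions: each chromatid is represented as a list of identical repeat units, and the function name and the parameters `shift` and `break_a` are hypothetical, introduced here for illustration. It shows how pairing out of register yields one expanded and one contracted array while the total number of units is conserved.

```python
def unequal_crossover(array_a, array_b, shift, break_a):
    """Exchange two tandem arrays that have paired out of register.

    array_a, array_b: lists of repeat units on the two chromatids.
    shift: number of repeat units by which the pairing is out of register.
    break_a: number of units of array_a lying to the left of the crossover.
    """
    break_b = break_a - shift  # matching breakpoint on the misaligned partner
    if not (0 <= break_a <= len(array_a) and 0 <= break_b <= len(array_b)):
        raise ValueError("crossover point lies outside the paired region")
    expanded = array_a[:break_a] + array_b[break_b:]
    contracted = array_b[:break_b] + array_a[break_a:]
    return expanded, contracted


# Two arrays of five identical units, mispaired by two units.
longer, shorter = unequal_crossover(["R"] * 5, ["R"] * 5, shift=2, break_a=3)
print(len(longer), len(shorter))
```

Repeated rounds of such exchanges, with different values of `shift`, would let the copy number of a repeat family drift up or down, which is the qualitative behaviour the unequal crossover hypothesis relies on.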


The similarity of sequence within families of satellite DNA (clustered repeats at centromeres and telomeres), and within families of dispersed repeats too, in any one species is striking. There must be mechanisms that homogenize the sequences against the constant tendency to diverge. Examination of DNA repeats in different species shows that there is a sharp distinction in sequence between satellites of related species and, to a somewhat lesser extent, between dispersed sequences too. Thus, when a change does occur in a repeat family, it runs through the family very rapidly. Not only does the sequence of repeat families appear to shift abruptly, but so does the number of members in the family. Some of the repeat families seem to explode in number when new species form. One observation that might be taken as evidence of function in repeated sequences is the frequency of transcription of DNA into RNA. Davidson et al. (1983) have shown that in sea urchin the spectrum of repeat families that are transcribed changes during development, a strong argument for some regulatory function. Alan Weiner of Yale University described results from his work on the U repeat family, which indicated the production of incomplete copies of putative functional genes which are then inserted back into the genome as inactive members of the family (Weiner 1988). It may be that many of the repeat families are multiple pseudogenes.

Middle repetitive DNA
Middle repetitive DNA, also called moderately repetitive DNA, is found at more than 10,000 and up to 100,000 copies per haploid genome. Examples include rRNA and tRNA genes and genes for storage proteins in plants such as corn. Middle repetitive DNA can vary in size from 100-300 bp to 5,000 bp and can be dispersed throughout the genome. Some moderately repetitive DNA sequences correspond to highly conserved multigene families.
Middle repetitive sequences are also found as short and long interspersed nuclear elements, which are interspersed with unique sequences; these elements are known, respectively, as SINEs and LINEs.

Short interspersed nuclear elements (SINEs). In many mammals, a large portion of the intermediately repetitive DNA is composed of many copies of a short sequence dispersed throughout the genome. For example, in human beings there are about 500,000 copies of a 300 bp sequence, which together constitute 5-6 per cent of the total human genome. Because this sequence is cleaved by the restriction endonuclease AluI, it is called the Alu family. The possibility exists that some of the members of this family are transcribed into 7S RNAs. In some cases, large RNAs contain Alu sequences. Alu may be involved in the numerous origins of DNA replication along eukaryotic chromosomes or it may be involved in creating secondary structure in messenger RNAs. In plants, similar types of repeated and dispersed sequences are present in species having large genomes, e.g., maize, rye, wheat.

Long interspersed nuclear elements (LINEs). Each long interspersed sequence is 5,000-7,000 bp long. Each is flanked on either side by short direct repeats of 255-400 bp. Thus, they resemble proretroviruses. It is possible that retroviruses evolved from them. They are also mobile. The mammalian genome has 20,000-50,000 copies of LINEs, which are 6-7 kb long and dispersed throughout the genome.

LINE-1 or L-1 family. Each species has a specific L-1 family on the basis of which it can be characterized. Each LINE contains one or two open reading frames (ORFs). L-1 elements are capable of transposition and generate target site duplications. Some families of moderately repetitive sequences are dispersed throughout the genome while others are clustered in tandem fashion at the centers (centromeres) and ends (telomeres) of chromosomes.
In addition, families of functional genes can be formally placed in this group, such as those for rRNA, tRNA, histones, actin, β-globin and immunoglobulins. Approximately 10³-10⁵ repeats may be found for these sequences. Moderately repetitive sequences may be transcriptionally active or inactive, in the form of clustered or dispersed repeats. Transcriptionally active clustered repeats are of four types – rRNA genes, tRNA genes, histone genes and small nuclear RNA (snRNA) genes. Transcriptionally active dispersed repeats include retroposons and retrotransposons.

Highly repetitive DNA
The most abundant sequences are found in the highly repetitive DNA class. These sequences are found from 100,000 to more than one million times in the genome and can range in size from a few to several hundred bases. These sequences are found in regions of the chromosome such as heterochromatin, centromeres and telomeres and tend to be arranged as tandem repeats. The following is an example of a tandemly repeated sequence: ATTATA ATTATA ATTATA // ATTATA. Highly repetitive DNA sequences are five to a few hundred bp long. Highly repetitive sequences are thought to be structural components of chromosomes, as they reside mainly at centromeric and telomeric positions, but their function is not known. Some of the highly repetitive DNA sequences are dispersed while others may be clustered. Tandemly repetitive sequences exist as satellite DNA. Unequal crossing-over is mainly responsible for the spreading of this DNA. For dispersed repetitive sequences, an integration mechanism (such as in bacterial transposons) is considered to be responsible for the evolution of this DNA. Most of the highly repetitive DNA is located in genetically inactive heterochromatic regions of chromosomes. Postulated functions for highly repetitive DNA include (a) a structural or organizational role in the chromosome, (b) involvement in chromosome pairing during meiosis, (c) involvement in crossing-over or recombination, (d) "protection" of important structural genes, like histone, rRNA, or ribosomal protein genes, (e) a repository of nonessential DNA sequences for use in the future evolution of the species, and (f) no function at all, just "junk" DNA that is carried along by the processes of replication and segregation of chromosomes. Highly repetitive DNA generally consists of very short sequences repeated many times in tandem array.
Because of its short repeating unit, it is sometimes described as simple sequence repeat (SSR) DNA. This is generally present in higher eukaryotic genomes. The tandem repetition of short sequences often creates a fraction with distinctive physical properties that may be used to isolate it. In some cases, the repetitive sequence has a base composition distinct from the genome average, which may allow it to form a separate fraction by virtue of its buoyant density. The buoyant density of a duplex DNA depends on its G-C content.

Satellite DNA
When eukaryotic DNA is centrifuged on a density gradient, two types of bands may be distinguished on the basis of optical absorbance (Figure 5.8). Most of the genome forms a continuum of fragments that appear as a rather broad peak centered on the buoyant density corresponding to the average G-C content of the genome. This is called the main band. Sometimes one or more additional smaller peaks are seen at a different value. This material is called the satellite band and the DNA in this band is termed satellite DNA. Satellite DNA consists of (a) a highly repetitive segment, (b) a segment that is of intermediate repetitiveness, and (c) a segment of unique DNA. Satellites may be either heavier or lighter than the main band. The proportion of DNA found in the form of satellite DNA may vary from 3-5 per cent to as high as 40 per cent in some and 75 per cent, respectively, of total mouse DNA. The highly repetitive DNA is found primarily around the centromeres and the telomeres; it is not known to be transcribed. Satellites are present in cases. The lower limit of 3 per cent is arbitrary, since less than 3 per cent of DNA cannot be easily separated. When the actual base composition of a satellite DNA is determined, it is often different from what had been predicted from its buoyant density.
When a highly repetitive DNA does not separate as a satellite, its properties on isolation often prove to be similar to those of satellite DNA; such a DNA is referred to as a cryptic satellite. The centromeric location of satellite DNA suggests that it may have some structural function in the chromosome, since the centromeres are the regions where the kinetochores are formed at mitosis and meiosis for controlling chromosome movement.
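The dependence of buoyant density on G-C content can be made concrete with a short sketch. The linear relation used here, ρ = 1.660 + 0.098 × (G-C fraction) g/cm³ for duplex DNA in CsCl, is a commonly quoted empirical approximation and is an assumption of this example, not a formula given in this chapter; the function names are likewise illustrative.

```python
def gc_fraction(sequence):
    """Fraction of G and C bases in one strand of a repeat unit."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def buoyant_density(gc):
    """Approximate CsCl buoyant density (g/cm^3) from the G-C fraction.

    Assumes the empirical linear relation 1.660 + 0.098 * gc.
    """
    return 1.660 + 0.098 * gc


# Repeat units taken from the satellite tables in this chapter.
for name, unit in [("D. virilis satellite I", "ACAAACT"),
                   ("Kangaroo rat HS-beta", "ACACAGCGGG")]:
    gc = gc_fraction(unit)
    print(name, round(gc, 3), round(buoyant_density(gc), 3))
```

An A-T rich repeat is predicted to band on the light side of a genome of average composition and a G-C rich repeat on the heavy side. As noted above, the measured base composition of a satellite often departs from such predictions.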


In arthropods, satellite DNA appears to be rather homogeneous. Usually a single short repeating unit accounts for >90 per cent of the satellite. Drosophila virilis has three major satellites and one cryptic satellite (Krebs et al. 2011) (Table 5.5). The three major satellites are closely related sequences. A single base substitution is sufficient to generate either satellite II or III from satellite I. The satellite I sequence is present in other species of Drosophila related to D. virilis. The main feature of these satellites is that the size of the repeating unit is only 7 bp. Some other species of Drosophila also have short repeating units of size 5, 7, 10 or 12 bp. These sequences represent very long stretches of DNA of very low sequence complexity. Evaluation of the complexity of repetitive DNA is made uncertain by the effect of mismatching on the reassociation reaction. When any satellite DNA is digested with an enzyme that has a recognition site in its repeating unit, a fragment is obtained for every repeating unit in which the site occurs. When DNA of a eukaryotic genome is digested with a restriction enzyme, most of it gives a general smear due to the random distribution of cleavage sites. But satellite DNA generates sharp bands because a large number of fragments of identical or almost identical size are created by cleavage at restriction sites that lie a regular distance apart. The crossover fixation model actually predicts that any sequence of DNA that is not under selective pressure will be taken over by a series of identical tandem repeats generated in this way.

Figure 5.8 Density centrifugation of embryonic DNA of Drosophila melanogaster, showing a main band and a satellite band (Redrawn, with permission, from Gupta, P.K. 2009. Genetics. Meerut: Rastogi Publ.)

Table 5.5 Predominant sequences of three major and one cryptic satellite DNA in Drosophila virilis

Satellite | Predominant sequence | Copies per genome | Part of genome (%)
I | ACAAACT / TGTTTGA | 1.1 × 10⁷ | 25
II | ATAAACT / TATTTGA | 3.6 × 10⁶ | 8
III | ACAAATT / TGTTTAA | 3.6 × 10⁶ | 8
Cryptic | AATATAG / TTATATC | - | -
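The statement that a single base substitution converts satellite I of Drosophila virilis into satellite II or III can be checked directly. The following is a minimal sketch that compares the 7 bp repeat units listed in Table 5.5 position by position; the function name is illustrative.

```python
def hamming_distance(unit_a, unit_b):
    """Number of positions at which two equal-length repeat units differ."""
    if len(unit_a) != len(unit_b):
        raise ValueError("repeat units must have the same length")
    return sum(1 for x, y in zip(unit_a, unit_b) if x != y)


# Predominant sequences (one strand) of the three major D. virilis satellites.
satellites = {"I": "ACAAACT", "II": "ATAAACT", "III": "ACAAATT"}

print(hamming_distance(satellites["I"], satellites["II"]))    # I versus II
print(hamming_distance(satellites["I"], satellites["III"]))   # I versus III
print(hamming_distance(satellites["II"], satellites["III"]))  # II versus III
```

Satellites II and III each differ from satellite I at one position, but from each other at two, which is consistent with both being derived from satellite I rather than from one another.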

REPEATED DNA SEQUENCES AND DISEASES
Diseases due to Triplet Repeat Amplification
A number of human diseases are caused by triplet repeat amplification. In Huntington's disease, for example, 11-34 repeats of CAG in the Huntington's disease gene are normal, but about 50 to 100 repeats result in the disease. Many interspersed repeated DNA sequences are transposable elements. Simple repetitive DNA sequences are abundantly interspersed in eukaryote genomes. Simple repeats have also been identified in some prokaryotic genomes and have been examined with respect to their influences on DNA structure, gene expression and genomic (in)stability. Three examples of microsatellites in the human major histocompatibility complex (HLA) investigated by Epplen et al. (1997) are: a (GT)n microsatellite situated 2 kb 5' of the lymphotoxin alpha (LTA) gene, a (GAA)n block in the 5' part of the HLA-F gene and a composite (GT)n(GA)m stretch in the second intron of the HLA-DRB1 gene.

Expanded DNA Repeats and Human Diseases Nearly 30 hereditary disorders in humans result from an increase in the number of simple repeats in genomic DNA (Mirkin 2007). These DNA repeats seem to be predisposed to such expansion because they have unusual structural features, which disrupt the cellular replication, repair and recombination machineries. The presence of expanded DNA repeats alters gene expression in human cells, leading to disease. Many of these diseases are caused by repeat expansions in the noncoding regions of their resident genes. Peculiar structures of repeat-containing transcripts are at the heart of the pathogenesis of these diseases.
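The idea of a repeat-number threshold, as described above for the CAG repeats of the Huntington's disease gene, can be sketched as follows. This is an illustration only: the function names are hypothetical, the test sequence is artificial, and the ranges are simply those quoted in this chapter (11-34 repeats normal, about 50 to 100 associated with disease). Repeat numbers between these ranges are deliberately left unclassified because the text does not assign them.

```python
import re


def longest_cag_run(sequence):
    """Length, in repeat units, of the longest uninterrupted CAG tract."""
    runs = re.findall(r"(?:CAG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_cag_repeats(count):
    """Place a repeat count into the ranges quoted in the text."""
    if 11 <= count <= 34:
        return "normal range (11-34 repeats)"
    if count >= 50:
        return "disease-associated range (about 50 or more repeats)"
    return "outside the ranges quoted in the text"


# Artificial example: a 20-unit CAG tract flanked by unrelated triplets.
example = "ATG" + "CAG" * 20 + "CCG"
count = longest_cag_run(example)
print(count, classify_cag_repeats(count))
```

The sketch counts only perfect, uninterrupted CAG units in a single reading of the sequence; it is not a diagnostic procedure.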

REFERENCES
Britten, R.J., and D.E. Kohne. 1968. Repeated sequences in DNA. Science 161: 529-40.
Campbell, A.M. 1993. Genome organization in prokaryotes. Curr. Opin. Genet. Dev. 3: 837-44.
Davidson, E.H., H.T. Jacobs, and R.J. Britten. 1983. Very short repeats and coordinate induction of genes. Nature 301: 468-70.
Doležel, J., J. Bartoš, H. Voglmayr, and J. Greilhuber. 2003. Nuclear DNA content and genome size of trout and human. Cytometry 51A(2): 127-8.
Epplen, C., E.J. Santos, W. Mäueler, P. van Helden, and J.T. Epplen. 1997. On simple repetitive DNA sequences and complex diseases. Electrophoresis 18: 1577-85.
Gupta, P.K. 2009. Genetics. Meerut: Rastogi Publ.
Meselson, M., and F.W. Stahl. 1958. The replication of DNA in Escherichia coli. Proc. Natl. Acad. Sci. USA 44: 671-82.
Miller, O.L., and B.R. Beatty. 1969a. Visualization of nucleolar genes. Science 164: 955-7.
Miller, O.L., and B.R. Beatty. 1969b. Nucleolar structure and function. In: Handbook of Molecular Cytology. ed. A. Lima-de-Faria, pp. 604-19. Amsterdam and London: North Holland Publishing Co.
Mirkin, S.M. 2007. Expandable DNA repeats and human disease. Nature 447: 932-40.
Moore, G.P. 1984. The C-value paradox. BioScience 34: 424-9.
Parkhill, J., M. Achtman, and K.D. James. 2000a. Complete genome sequence of a serogroup A strain of Neisseria meningitidis Z2491. Nature 404: 502-6.
Parkhill, J., B.W. Wren, K. Mungall, J.M. et al. 2000b. The genome sequence of the food-borne pathogen Campylobacter jejuni reveals hypervariable sequences. Nature 403: 665-8.
Smith, G.P. 1976. Evolution of repeated DNA sequences by unequal crossover. Science 191: 528-35.
Taylor, J.H., P.S. Woods, and W.T. Hughes. 1957. The organization and duplication of chromosomes as revealed by autoradiographic studies using tritium thymidine. Proc. Natl. Acad. Sci. USA 43: 122-8.
Twyman, R.M. 1998. Advanced Molecular Biology. Oxford, UK: Bios Scientific Publishers.
Weiner, A.M. 1988. Eukaryotic nuclear telomere. Molecular fossil of the RNP world. Cell 52: 155-7.

PROBLEMS
1. Which kinetic properties of DNA are useful in understanding organization of genetic material?
2. Why do we need to understand structural organization of the genetic material in eukaryotes at molecular level?
3. In terms of the diverse functions that DNA has the potential to perform, do you think the C-value paradox problem really exists? Does organization of DNA reflect structural complexity of genetic material of a species?

6 Packaging of Nucleic Acids

A general principle is evident in the organization of all cellular genetic material. It exists as a compact mass, occupying a limited volume; and its various activities, such as replication and transcription, must be accomplished within these confines. The organization of this material must accommodate transitions between inactive and active states. The structure of the nucleoprotein complex is determined by the interactions of the proteins with the DNA (or RNA). A common problem is presented by the packaging of DNA into bacteriophages/viruses, bacterial cells, and eukaryotic nuclei, chloroplasts, and mitochondria. The length of the DNA as an extended molecule exceeds the dimensions of the compartment that contains it. The DNA must be compressed exceedingly tightly to fit into the space available. So, in contrast with the customary picture of DNA as an extended double helix, structural deformation of DNA to bend or fold it into a more compact form is the rule rather than the exception. DNA packaging has been studied in viruses/bacteriophages, prokaryotes (nucleoids and plasmids) and eukaryotes (nuclear, mitochondrial and chloroplast chromosomes).

DNA PACKAGING IN VIRUSES/BACTERIOPHAGES
Viruses/bacteriophages have a single hereditary unit, i.e., the viral chromosome, which consists of a single molecule of nucleic acid. This viral chromosome may be linear or circular and rod-like or spherical. The nucleic acid may be DNA or RNA and single-stranded or double-stranded. DNA-containing viruses are known as deoxyviruses and RNA-containing viruses are known as riboviruses. Viral genomes are packaged into capsids, which are either symmetrical or quasisymmetrical structures assembled from only one or a few types of protein monomers. The capsid is either like a rod or a spherical-appearing polyhedron with icosahedral symmetry. There are two general patterns of viral DNA packaging: (1) The protein shell is assembled around the nucleic acid, condensing the DNA or RNA by protein-nucleic acid interactions during this process. (2) The capsid is formed first and the nucleic acid is then inserted into it, being condensed as it enters the capsid. In the case of DNA viruses, the spherical capsids are first assembled from a small set of proteins. Initially the capsid has a protein case; subsequently the capsid expands in size but retains its shape. DNA then enters the capsid. Attached to the capsid, or incorporated into it, are other structures, assembled from distinct proteins and necessary for infection of the host cell. The virus particle is tightly constructed. The internal volume of the capsid is rarely much greater than the volume of the nucleic acid it must hold. From the molecular weight of a double-stranded (duplex) viral DNA we can calculate its contour length (its helix length), given that each nucleotide pair has an average molecular weight of about 650 Da and contributes 3.4 Å to the length of the duplex. Many viral DNAs are circular during at least part of their life cycle. During viral replication within a host cell, specific types of viral DNA called replicative forms may appear.
For example, many linear DNAs become circular and all single-stranded DNAs become double-stranded. The genomes of viruses vary greatly in size, ranging from 3.5 to 267 kb, and contain 2-250 genes. For example, Tobacco mosaic virus (TMV) is a single-stranded 3.5-kb long RNA virus having 4 genes; SV40 is a circular double-stranded DNA virus of the same length and has 5 genes; λ phage (50 kb), T-even phages (165 kb) and smallpox virus (267 kb) have linear double-stranded DNA, which have about 250 genes each. A typical medium-sized DNA virus is bacteriophage λ (lambda), which infects Escherichia coli. In its replicative form inside cells, λ DNA is a circular double helix. This dsDNA contains 48,502 bp and has a contour length of 17.5 µm. Bacteriophage φX174 is a much smaller DNA virus; the DNA in the φX174 viral particle is a single-stranded circle, and its double-stranded replicative form contains 5,386 bp. The contour lengths of viral DNAs are much greater than the long dimensions of the viral particles that contain them. The lengths of the DNA and of the viral particles for some bacteriophages are given in Table 6.1.

Table 6.1 The sizes of DNA and viral particles for some bacteriophages

Virus | Length of viral DNA (nm) | Length of viral particle (nm)
φX174 | 1,939 | 25
T3 | 14,377 | 78
λ (lambda) | 17,460 | 190
T4 | 60,800 | 210
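The contour-length calculation described above, and the degree of compaction implied by Table 6.1, can be worked through with a short sketch. The function names and the default rise per base pair are illustrative choices. With the 3.4 Å (0.34 nm) per base pair quoted in the text, λ DNA comes out at roughly 16.5 µm; the tabulated lengths appear to correspond to a rise of about 0.36 nm per base pair, so the rise is left as a parameter.

```python
def contour_length_nm(base_pairs, rise_nm=0.34):
    """Contour length of duplex DNA in nm (rise_nm = length per base pair)."""
    return base_pairs * rise_nm


def base_pairs_from_mass(molecular_weight_da, da_per_bp=650.0):
    """Estimate the number of base pairs from the molecular weight of duplex DNA."""
    return molecular_weight_da / da_per_bp


def compaction_ratio(dna_length_nm, particle_length_nm):
    """How many times longer the DNA is than the particle that contains it."""
    return dna_length_nm / particle_length_nm


# Lambda DNA, 48,502 bp, with two assumed values for the rise per base pair.
print(contour_length_nm(48502))        # 0.34 nm per bp, as in the text
print(contour_length_nm(48502, 0.36))  # rise that reproduces Table 6.1

# DNA length and particle length (nm) taken from Table 6.1.
table_6_1 = {"phiX174": (1939, 25), "T3": (14377, 78),
             "lambda": (17460, 190), "T4": (60800, 210)}
for virus, (dna_nm, particle_nm) in table_6_1.items():
    print(virus, round(compaction_ratio(dna_nm, particle_nm), 1))
```

Even for the smallest phage in the table the DNA is tens of times longer than the particle, and for T4 it is nearly 300 times longer.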

T4 Phage DNA Packaging
Hsiao and Black (1977) and Black (1981) elucidated the role of capsid structural proteins in the packaging process in T4 phage. DNA translocation, which generally accompanies DNA packaging, results from a major irreversible change in the assembled major capsid protein of the prohead. The pathway of phage T4 head assembly is shown in Figure 6.1(A-B). A number of models proposed that energy stored in the assembled unexpanded shell might drive packaging. However, it has been demonstrated that both T4 and T3 expanded proheads can be packaged in vitro (Rao and Black 1985; Shibata et al. 1987). It is now generally accepted that prohead expansion does not have an essential energetic function in DNA packaging. The proteins with essential roles in DNA translocation apparently can be as few as the terminase proteins, the prohead portal protein, and the expanded major capsid protein. The terminases are ATP-binding proteins that assemble to form multimeric packaging-protein complexes. In addition to cutting DNA at pac or cos sites, these proteins perform necessary activities throughout DNA translocation in vivo and in vitro (Black 1988). The prohead is not a passive DNA receptacle. The assembled T4 portal vertex protein can undergo a structural change that initiates DNA packaging, brought about either by temperature shift activation or by specific terminase mutations (Hsiao and Black 1977). The expansion and transformation of the major capsid protein probably reorganizes this protein for a packaging role (Black 1988). Mutations in the major capsid protein gene can block packaging in morphologically normal proheads (Katsura 1986). T4 DNA is packaged from a concatemer; insertion begins at a random point in the concatemeric precursor. Phage T4 is the most thoroughly studied member of the group characterized by apparently randomly permuted DNAs (Black 1988). Phage T4 packages non-T4 DNAs in vitro and in vivo (Black 1981; Black 1986).
The initiating cut in T4 DNA is in part regulated by interaction of the terminase proteins with the prohead portal protein (Hsiao and Black 1977). Insertion continues until the whole genome, actually a slightly greater amount, has been inserted into the head.
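The consequence of filling each head with slightly more than one genome from a concatemer can be sketched with a toy model. The genome string, the function name and its parameters are hypothetical, and the model assumes, as a simplification, that successive heads are filled one after another from the same concatemer. It shows that each packaged molecule repeats its first few letters at its end (terminal redundancy) and that successive molecules begin at different points of the genome (circular permutation).

```python
def headful_packaging(genome, headful_length, start, n_heads):
    """Cut successive headfuls from a concatemer of `genome`.

    genome: one genome equivalent written as a string of markers.
    headful_length: amount of DNA one head accepts (must exceed one genome).
    start: position in the genome at which the first cut is made.
    n_heads: number of heads filled in succession from the concatemer.
    """
    size = len(genome)
    if headful_length <= size:
        raise ValueError("a headful must be longer than one genome")
    molecules = []
    position = start
    for _ in range(n_heads):
        # Reading past the end of one genome copy continues into the next copy.
        molecule = "".join(genome[(position + i) % size]
                           for i in range(headful_length))
        molecules.append(molecule)
        position += headful_length
    return molecules


# Ten-marker genome, heads that hold twelve markers, first cut at marker D.
for molecule in headful_packaging("ABCDEFGHIJ", 12, start=3, n_heads=3):
    print(molecule)
```

Every molecule carries all ten markers, but each starts at a different marker and ends by repeating its first two markers, which is the pattern expected for the randomly permuted DNAs mentioned above.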


Figure 6.1 Pathway of T4 head assembly. (A) Order of function of phage T4 gene products in bacteriophage T4 assembly. Order of function of gene products is inferred from temperature shift-up and shift-down experiments with cs20-ts double mutants. (B) Structural development of the phage T4 head as inferred from previous studies. Parts A and B are linked by the knowledge that cs20 is blocked after protein cleavage but before DNA packaging (Modified, with permission, from Hsiao, C.L. and L.W. Black. 1977. Proc. Natl. Acad. Sci. USA 74: 3652-6)

DNA PACKAGING IN BACTERIA
Bacterial Nucleoid
In bacteria, the genome is simple and consists of a long DNA molecule. Escherichia coli has around 3,000 genes on a single double-stranded circular molecule. The bacterial chromosome was once thought to be "naked" DNA in comparison with the well-packaged mitotic chromosomes of eukaryotes, but basic proteins are now known to be associated with it. The bacterial chromosome is a circular DNA molecule organized into a structure called a nucleoid; it lacks a membrane enclosure. Supercoiling and associated basic proteins influence the folding of the DNA molecule, which forms looped domains. The bacterial chromosome assumes its shape due to supercoiling of the DNA molecule. DNA gyrase is an enzyme that introduces superhelical turns into bacterial DNA (Gellert et al. 1976). The chromosome of E. coli consists of 4,639,221 bp and has a contour length of about 1.7 mm, some 850 times the length of the E. coli cell. A single E. coli cell contains almost 100 times as much DNA as a bacteriophage λ particle. Bacterial DNA must have an even more tightly compacted tertiary structure than viral DNA. When E. coli cells are lysed, fibers are released in the form of loops attached to the broken envelope of the cell. The DNA of these loops is not found in the extended form of a free duplex, but is compacted by association with proteins. Bacterial chromosomes are in a highly folded and coiled state; such a chromosome is known as a folded genome. The bacterial nucleoid consists of hundreds of DNA loops, shaped in part by non-specific DNA bridging proteins such as histone-like nucleoid structuring protein (H-NS), leucine-responsive regulatory protein (Lrp), and structural maintenance of chromosomes (SMC) proteins. Dame et al. (2006) showed that H-NS protein is dynamically organized between two DNA molecules in register with their helical pitch. They have proposed a structural model of H-NS and H-NS-DNA2 complexes (Figure 6.2). A dimeric structure of H-NS protein is hypothesized, and the H-NS-DNA interactions are shown. H-NS interacts with DNA duplexes as a dimer in either a parallel or antiparallel configuration.

Figure 6.2 Structural models of H-NS and H-NS-DNA2 complexes

Supercoiling in Bacterial Chromosome
A folded genome of E. coli has about 100 loops, folds or domains; these folds are stabilized by RNA molecules (Figure 6.3A-B). Each fold in the chromosome is independently supercoiled. The formation of loops and of supercoils within the loops packs the >1,000 µm long E. coli chromosome into a structure of 1-2 µm diameter. This structure is Feulgen-positive, and is seen as a mass of transversely and tangentially cut thin fibrils in electron micrographs; this fibrillar region of prokaryotes is called the nucleoid. The folded genome is stabilized by RNA and proteins associated with it. Genomes of all prokaryotes, including blue green algae, are packaged in this manner. Supercoils are produced in vitro by the enzyme DNA gyrase, which introduces two negative supercoils at a time into a DNA molecule. In simple words, DNA gyrase cuts both strands of a DNA molecule that is folded upon itself, passes the intact segment of the molecule through this cut and finally seals the cut ends of the two strands in their original position. DNA gyrase holds on to both cut ends of each of the two strands of the DNA molecule in order to reunite them without any error. The folding of the DNA does not interfere with DNA replication and transcription; on the contrary, the DNA packing is such that any bacterial gene is accessible for transcription at any given time. Folded chromosomes of bacteria contain about 30 per cent RNA and about 10 per cent protein by weight (Pettijohn et al. 1973). Chromosomes of all the bacteria studied, of Drosophila and the replicative forms of φX174 contain negatively supercoiled regions. The available evidence indicates that negative supercoiling is involved in recombination, gene expression, regulation of gene action and DNA replication. This is possibly because negative supercoiling may aid in unwinding of the two strands of a DNA molecule. DNA gyrase uses energy from ATP hydrolysis.
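The effect of DNA gyrase introducing two negative supercoils per reaction cycle can be expressed numerically. The following is a minimal sketch under stated assumptions: a closed circular domain that starts out relaxed, a helical repeat of 10.5 bp per turn for relaxed B-form DNA (a standard value not given in this chapter), and hypothetical function names. It computes the superhelical density, i.e., the change in linking number divided by the linking number of the relaxed molecule.

```python
def relaxed_linking_number(length_bp, bp_per_turn=10.5):
    """Linking number of a relaxed closed circle (assumed 10.5 bp per turn)."""
    return length_bp / bp_per_turn


def superhelical_density(length_bp, gyrase_cycles, bp_per_turn=10.5):
    """Superhelical density after a number of gyrase reaction cycles.

    Each cycle is taken to change the linking number by -2, following the
    statement that gyrase introduces two negative supercoils at a time.
    """
    delta_lk = -2 * gyrase_cycles
    return delta_lk / relaxed_linking_number(length_bp, bp_per_turn)


# A hypothetical 4,200 bp closed circular domain after 12 gyrase cycles.
print(relaxed_linking_number(4200))
print(superhelical_density(4200, 12))
```

The negative sign indicates underwinding, which is consistent with the suggestion above that negative supercoiling may aid separation of the two strands.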


Figure 6.3 (A) Supercoiling in a bacterial chromosome (Modified, with permission, from Gardner, E.J., M.J. Simmons, and D.P. Snustad. 2005. Principles of Genetics. New Delhi: Wiley India Ltd.). (B) Plectonemic and toroidal supercoiling (Modified, with permission, from Willenbrock, H., and D.W. Ussery. 2004. Genome Biol. 5: 252. doi:10.1186/gb-2004-5-12-252 © 2004 BioMed Central Ltd.)

Histone-like Proteins of Bacteria
All higher organisms contain small, basic, abundant, DNA-binding proteins called histones. Bacteria contain proteins, termed histone-like, that share some properties with eukaryotic histones. The bacterial histone-like proteins have not been shown to interact as a unit with DNA to form complexes analogous to nucleosomes, and it has been difficult to develop a definition that applies to all of the proteins considered by various authors to be histone-like. The many similarities between the protein called HU and eukaryotic histones first led to the idea that bacteria contain histone-like proteins (Rouviere-Yaniv and Gros 1975). Several DNA-binding proteins have been isolated from E. coli, including HU, H, IHF, H1, HLP1, FirA and P. Many of these proteins resemble the eukaryotic histones. HU is an abundant DNA-binding protein capable of wrapping DNA, and its primary structure is highly conserved among bacterial species. The amino acid composition of HU resembles that of the eukaryotic histone H2B. The model proposed by Drlica and Rouviere-Yaniv (1987) for DNA wrapping by HU is given in Figure 6.4. According to this model, DNA is wrapped around tetramers of HU. Each turn consists of 38-39 bp, and each linker is about 19 bp long. Thus adjacent tetramers are inverted relative to each other. Protein H is the most histone-like; it has an amino acid composition similar to eukaryotic histone H2A. IHF is similar to protein HU, while protein P is similar to the protamines that bind to DNA in the sperm of some animal species. HLP1 is a small basic protein that binds to DNA. RNA is also associated with the bacterial nucleoids. However, the role of these proteins, as well as that of the RNA associated with the nucleoids, in the formation of the nucleoid structure is not clearly understood. Another small, abundant protein appears capable of compacting DNA. This protein, H1, binds very strongly to DNA (Laine et al. 1984) and may be able to compact it (Spassky et al. 1984).

Figure 6.4 Model for DNA wrapping by histone-like protein HU

Condensins in Bacteria
Condensins are conserved proteins containing structural maintenance of chromosome (SMC) molecules that organize and compact chromosomes. Case et al. (2004) reported that MukBEF, the condensin in E. coli, cooperatively compacts a single DNA molecule into a filament with an ordered, repetitive structure in an adenosine triphosphate (ATP) binding-dependent manner. They suggested a model for how MukBEF organizes the bacterial chromosome in vivo. In this model, MukBEF complexes are depicted as dumbbell-shaped molecules with a central hinge, DNA is shown as a thin line, and the two ATPs between two adjacent heads as a circle. The model accounts for formation, extension and recondensation of the condensed filament. In an ATP-dependent manner, free MukBEF molecules with various hinge angles cooperatively polymerize along the length of DNA, trapping and supercoiling variable lengths of DNA. The binding propagates along the length of DNA, creating a compact, microheterogeneous, and repetitive structure. Intermolecular head-head binding provides force-resistant interactions, whereas intramolecular head-head binding traps DNA in a force-sensitive manner. When the filament is extended, the force-sensitive intramolecular interactions are sequentially broken from one end, releasing variable lengths of sequestered DNA. When the force is lowered, recondensation begins with a random nucleation event that cooperatively propagates along the extended filament by reforming the trapped DNA supercoils. Nucleation can occur at different positions along the extended filament, which results in multiple recondensation pathways.

Bacterial Plasmids
In addition to the very large, circular DNA chromosome in the nucleoid, many bacteria contain one or more small, extrachromosomal, covalently closed, circular and independently replicating DNA molecules called plasmids. Plasmids were discovered by Hayes (1952) and so named by Lederberg (1952). Plasmid DNA is also packaged. Plasmids are capable of independent transmission. A few classes of plasmid DNAs are sometimes temporarily incorporated into the chromosomal DNA. Most plasmids are only a few thousand base pairs long, but some contain over 10⁵ bp. In many cases plasmids confer no obvious advantage on their host, and their sole function appears to be self-propagation. However, some plasmids carry genes that make a host bacterium resistant to antibacterial agents. Plasmid DNA is a double-stranded, closed circular, ring-like structure. Supercoiling is present in plasmid DNA. Plasmid DNA has plectonemic coiling (from the Greek for "twisted thread"). This term can be applied to any structure in which strands are intertwined in some simple and regular way, and it describes well the general structure of supercoiled DNA in solution.

DNA PACKAGING IN NUCLEUS OF EUKARYOTIC CELLS

Diploids have two complete sets of chromosomes, except for the sex chromosomes in one of the sexes. Eukaryotes also have many more genes than bacteria, about eleven times as many. The number of base pairs in a eukaryotic chromosome is very large. For example, human chromosome 1 is about 250 million bp long. The genes are packaged onto more than one chromosome, 46 in the case of humans. The uncompacted length of the DNA of an E. coli cell is about 1 mm; humans have 2 meters of DNA in each cell. How do you get 2 meters of DNA into a cell that is typically only 6 μm wide and 2 μm deep? The same way you get a mile of fishing line on a reel: you wind it around something. The chromosomes that you are familiar with from textbook photos and diagrams are actually mitotic or meiotic chromosomes. In eukaryotes, chromosomes are diffused throughout the nucleus during interphase. They appear in the form of darkly stained bodies during cell division. These bodies were observed for the first time by Nägeli (1842).

Chemical Composition of Chromosomes

Chemical components of chromosomes are DNA, RNA, histone proteins, non-histone proteins, lipids, and calcium, magnesium and ferrous ions. The relative proportions of individual proteins and RNA (taking DNA as 1.0) in interphase chromatin of some mammals are given in Table 6.2. The DNA to histone ratio is fairly constant in different mammals as compared to the other ratios. This suggests that histones have a fundamental role in chromosome structure. On average, the ratio of DNA to histone is 1:1, but it varies among species and according to the state of cell development as well as metabolism. The DNA-histone complex may comprise as much as 96 per cent of a chromosome or as little as 60 per cent of it (Bhatia and Dhand 2002). The DNA double helix is very long, yet it lies packed in a chromosome only a few micrometers long. A single human chromosome may contain more than 1 cm of linear DNA. The total length of DNA in the diploid human genome is 1.74 m. With some 10¹⁴ cells present in a human adult, the total DNA present in the human body would be 1.74 × 10¹⁴ m or 1.74 × 10¹¹ km. The average distance between the earth and the sun is only 1.496 × 10⁸ km, so this DNA would span that distance more than a thousand times, a dramatic illustration of the extraordinary degree of DNA compaction in our cells.

Table 6.2 Relative proportions of individual proteins (histones and non-histones) and ribonucleic acid (RNA) in different cells, taking DNA as 1

Material        Calf thymus   Rat liver   HeLa cells
Histones        1.14          1.00        1.02
Non-histones    0.33          0.67        0.71
RNA             0.007         0.003       0.009
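The comparison between the total DNA length in the body and the earth-sun distance can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the text (1.74 m per diploid cell, about 10¹⁴ cells, 1.496 × 10⁸ km); the function and variable names are illustrative, and the calculation assumes, as the text does, that every cell carries a full diploid complement of DNA.

```python
# Figures quoted in the text; nothing here is measured independently.
DNA_PER_DIPLOID_CELL_M = 1.74   # metres of DNA per diploid human cell
CELLS_PER_ADULT = 1e14          # approximate number of cells in an adult
EARTH_SUN_KM = 1.496e8          # mean earth-sun distance in kilometres


def total_body_dna_km(dna_per_cell_m: float, n_cells: float) -> float:
    """Return the end-to-end length of all cellular DNA, in kilometres."""
    return dna_per_cell_m * n_cells / 1000.0


total_km = total_body_dna_km(DNA_PER_DIPLOID_CELL_M, CELLS_PER_ADULT)
earth_sun_multiples = total_km / EARTH_SUN_KM

print(f"Total DNA length: {total_km:.3g} km")
print(f"Earth-sun distances spanned: {earth_sun_multiples:.0f}")
```

Under these assumptions the DNA of one adult would stretch to the sun and back several hundred times, which is the point the comparison is meant to make.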

Histone Proteins

Discovered in 1884, histones are relatively small proteins with a preponderance of the basic, positively charged amino acids lysine and arginine, which makes sense because DNA is an acid. The histones' strong positive charge enables them to bind to and neutralize the negatively charged DNA throughout the chromatin. Histones make up half of all chromatin protein by weight. All five types of histones (H1, H2A, H2B, H3 and H4) appear throughout the chromatin of nearly all diploid eukaryotic cells. H1 is called the linker histone as it is found on the linker DNA, the DNA that connects two adjacent nucleosomes. Two copies of each of the four histones H2A, H2B, H3 and H4 form an octamer, which serves as a core particle in the formation of the nucleosome. H1 is lysine-rich. Recent evidence reveals an unexpected role for the linker histone H1.2 in DNA damage-induced apoptosis. DNA double-strand breaks induce translocation of nuclear H1.2 to the cytoplasm, where it promotes release of cytochrome c from mitochondria by activating the Bcl-2 family protein Bak (Gillespie and Vousden 2003). H2A and H2B histones are slightly lysine-rich. Littau et al. (1965) reported that the dense, inactive chromatin of thymus lymphocyte nuclei is formed by lysine-rich histone cross-linking the chromatin fibrils. Arginine-rich histone is bound to the chromatin fibrils but does not cross-link them; H3 and H4 histones are arginine-rich. Methylation of histone H3 lysine 9 creates a binding site for heterochromatin protein 1 (HP1) (Lachner et al. 2001). It contributes to higher order chromatin organization by propagating heterochromatin-specific affinity in the histone H3 amino-terminus. This histone, in combination with other proteins, is able to compact the 11-nm fiber into higher order structures such as the 30-nm fiber. Bannister et al. (2001) proposed a model in which methylated lysine 9 on H3 is responsible for the stable inheritance of the heterochromatic state. In the genome, thousands of nucleosomes are organized on a continuous DNA helix in linear strings separated by 10-60 bp of linker DNA (Figure 6.5). Thus, the lowest functional unit of chromatin might actually be considered the “nucleosomal array” (Horn and Peterson 2002).

Figure 6.5 Two adjacent nucleosomes connected by linker DNA (Redrawn from http://www.sivabio.50webs.com/nucleus.htm; http://course1.winona.edu/sberg/308s10/Lec-note/DNA&Chromos.htm)

Histone synthesis occurs during S-phase of the cell cycle, when chromosome replication requires more histones for the packaging of newly made DNA. Molecular weight and composition of the five types of histones present in eukaryotic chromosomes are given in Table 6.3. Excess histones are degraded. Rad53 and Mec1 are protein kinases required for DNA replication and recovery from DNA damage in S. cerevisiae. Gunjan and Verreault (2003) showed that rad53, but not mec1, mutants are extremely sensitive to histone overexpression, as Rad53 is required for degradation of excess histones.

Table 6.3 Some molecular properties of histones (Reproduced, with permission, from Kucheria, K. and G. Sanyal, The Structure of Chromatin, in: Talwar, G. P. and L. M. Srivastava, eds., Textbook of Biochemistry and Human Biology, 3rd edition, p. 674, New Delhi: PHI Learning Private Limited (formerly Prentice-Hall of India Private Limited))

Histone   Molecular   Amino acid   N-terminal   C-terminal   Lysine          Arginine        Lysine/Arginine
type      weight      residues     amino acid   amino acid   (mol per cent)  (mol per cent)  ratio
H1        21,500      215          Ac-Ser       Lys          25              3               8.3
H2A       14,000      129          Ac-Ser       Lys          11              9               1.2
H2B       13,800      125          Pro          Lys          16              6               2.7
H3        15,300      135          Ala          Ala          10              13              0.8
H4        11,300      102          Ac-Ser       Gly          11              14              0.8
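The lysine/arginine ratios in Table 6.3 follow directly from the mol per cent columns, and they underlie the description of H1 as lysine-rich, H2A and H2B as slightly lysine-rich, and H3 and H4 as arginine-rich. The sketch below recomputes the ratios from the tabulated values. The cut-off values used in `classify` are illustrative assumptions chosen to reproduce those labels; they are not thresholds given in the text.

```python
# Lysine and arginine content (mol per cent) as listed in Table 6.3.
HISTONE_BASIC_RESIDUES = {
    "H1": (25, 3),
    "H2A": (11, 9),
    "H2B": (16, 6),
    "H3": (10, 13),
    "H4": (11, 14),
}


def lys_arg_ratio(lysine: float, arginine: float) -> float:
    """Return the lysine/arginine ratio for one histone."""
    return lysine / arginine


def classify(ratio: float) -> str:
    """Label a histone by its ratio (cut-offs are illustrative assumptions)."""
    if ratio > 5:
        return "lysine-rich"
    if ratio > 1:
        return "slightly lysine-rich"
    return "arginine-rich"


ratios = {name: lys_arg_ratio(lys, arg)
          for name, (lys, arg) in HISTONE_BASIC_RESIDUES.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f} ({classify(ratio)})")
```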


Consequently, excess histones accumulate in rad53 mutants, resulting in slow growth, DNA damage sensitivity, and chromosome loss phenotypes that are significantly suppressed by reduction in histone gene dosage. Rad53 monitors excess histones by associating with them in a dynamic complex that is modulated by its kinase activity. These results argue that Rad53 contributes to genome stability independently of Mec1 by preventing the damaging effects of excess histones both during normal cell cycle progression and in response to DNA damage. Regulation of histone protein levels by Rad53, as reported by Gunjan and Verreault (2003), is shown in Figure 6.6. During early or mid-S phase, the rates of histone and DNA synthesis are balanced such that histones are not present in excess of their requirement for chromatin assembly. However, the drastic reduction in rate of DNA synthesis that occurs during late S/G2 phase or upon DNA damage during S phase results in saturation of histone chaperones and the appearance of excess histones. Rad53 detects excess histones and targets them for degradation, thereby preventing non-specific binding of these free histones to chromatin, which interferes with many processes that require access to genetic information.

Figure 6.6 Regulation of histone protein levels by Rad53

Replacement of sperm chromosomal proteins by maternally provided proteins. In sexually reproducing animals, a crucial step in zygote formation is the decondensation of the fertilizing sperm nucleus into a DNA replication-competent male pronucleus. Genome-wide nucleosome assembly on paternal DNA implies the replacement of sperm chromosomal proteins, such as protamines, by maternally provided histones. This fundamental process is specifically impaired in sesame (ssm), a unique Drosophila maternal effect mutant that prevents male pronucleus formation. Loppin et al. (2005) showed that ssm is a point mutation in the Hira gene, thus demonstrating that the histone chaperone protein HIRA is required for nucleosome assembly during sperm nucleus decondensation. In vertebrates, HIRA has been shown to be critical for a nucleosome assembly pathway that is independent of DNA synthesis and specifically involves the H3.3 histone variant. They also showed that nucleosomes containing H3.3, and not H3, were specifically assembled in paternal Drosophila chromatin before the first round of DNA replication. The exclusive marking of paternal chromosomes with H3.3 represents a primary epigenetic distinction between parental genomes in the zygote, and underlines an important consequence of the critical and highly specialized function of HIRA at fertilization.

Chaperone of Histone H3/H4. Anti-silencing function 1 (Asf1) is a highly conserved chaperone of histones H3/H4 that assembles or disassembles chromatin during transcription, replication, and repair. The structure of the globular domain of Asf1 bound to the H3/H4 heterodimer, determined by X-ray crystallography to a resolution of 1.7 Å, showed how Asf1 binds the H3/H4 heterodimer, enveloping the C terminus of histone H3 and physically blocking formation of the H3/H4 heterotetramer (English et al. 2006). Unexpectedly, the C terminus of histone H4, which forms a mini-β sheet with histone H2A in the nucleosome, undergoes a major conformational change upon binding to Asf1 and adds a β strand to the Asf1 β sheet sandwich. Interactions with both H3 and H4 were required for Asf1 histone chaperone function in vivo and in vitro. The Asf1-H3/H4 structure suggested a “strand-capture” mechanism of chromatin disassembly/assembly that might be used ubiquitously by histone chaperones.

Prokaryotic origin of histones H2A and H4. Histones have been identified recently in many prokaryotes. These histones, unlike their eukaryotic homologs, are of a single uniform type that is thought to resemble the archetypal ancestor of the eukaryotic histone family. The histone amino acid sequence of the hyperthermophilic prokaryote Methanopyrus kandleri has a novel structure consisting of two tandemly repeated histone fold motifs in a single polypeptide. Sequence analyses conducted by Slesarev et al. (1998) indicated that the N-terminal repeat is most closely related to eukaryotic H2A and H4 histones, whereas the C-terminal repeat resembles that found in prokaryotic histones. These results imply an early divergence within the histone gene family prior to the emergence of eukaryotes and may represent an evolutionary step leading to eukaryotic histones.

Non-histone Proteins

The order of the nucleosomes in the eukaryotic chromosome is thought to be controlled by non-histone DNA-binding proteins. The chromatin of a diploid genome contains 200-2,000,000 molecules of each kind of non-histone protein. Not surprisingly, this large variety of proteins fulfills many different functions, only a few of which have been defined to date. Some non-histone proteins play a purely structural role, helping to package DNA into structures distinct from the histone-containing nucleosomes. The proteins that form the structural backbone, or scaffold, of the chromosome fall in this category. Other non-histone proteins, such as DNA polymerase, are active in replication. Still others are active in chromosome segregation; for example, the motor proteins of kinetochores help move chromosomes along the spindle apparatus and thus expedite the transport of chromosomes from parent to daughter cells during mitosis and meiosis. The largest class of non-histone proteins is the one that regulates transcription and RNA processing during gene expression. Mammals carry 5,000-10,000 different proteins of this kind. Proteins with molecular weights below 30,000, called high mobility group proteins (HMG proteins), occur in large quantities in the nucleus (about 10⁶ molecules per cell). These proteins, which belong to four major species (HMG1, HMG2, HMG14, and HMG17), seem to associate with the nucleosome (Goodwin et al. 1977).


Rules of Chromosome Packaging

Regardless of their origins, functions, and base compositions, all DNAs are scriptures written following the same grammatical rule (Ohno 1990). At the level of syllables (base dimers), two, CG and TA, are seldom used, while three, TG, CT and CA, are utilized in abundance. Accordingly, at the level of three-letter words, two complementary base trimers, CTG and CAG, invariably enjoy frequent usage and thus serve as the equivalent of the word 'the'. Inasmuch as two of the three frequently used syllables, TG and CA, are complementary to each other, while the two seldom used syllables, CG and TA, are both palindromes, the two complementary strands of DNA are inherently symmetrical with each other. Consequently, palindromic sequences, as favorite targets of DNA-binding proteins, occur at unexpectedly high frequencies if they contain TG and CA or CTG and CAG. Nevertheless, there are grammatical rules operating among these high frequency palindromes as well; e.g., the palindromic tetramer TGCA occurs nearly two times more often than its reciprocal, CATG. Thus, DNA-binding proteins are provided with a wealth of abundant targets whose densities are influenced to variable degrees by regional differences in GC/AT ratios. One palindromic heptamer, CAGNCTG, is an ideal target of one DNA-binding protein engaged in chromosome packaging and in generation of banding patterns. This heptamer occurs once every 1,000 bases in moderately GC-rich sequences, while its incidence is reduced to once every 3,000 bases in extremely AT-rich sequences. This must be the very reason that a solitary human X-chromosome DNA coated with mouse DNA-binding proteins in mouse-man somatic hybrids still maintains the original banding pattern, and that the inactive X remains inactive while the active X remains active.
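The notions of complementary syllables and palindromes used above can be made concrete with a short sketch. Here a sequence is called a palindrome when it equals its own reverse complement, which is the sense in which CG, TA, TGCA, CATG and CAGNCTG are palindromic. The helper names are illustrative, and the toy sequence at the end is hypothetical; it is not data from Ohno (1990).

```python
from collections import Counter

# Watson-Crick complements; N (any base) is left as N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindrome(seq: str) -> bool:
    """True if both strands read the same in the 5' to 3' direction."""
    return seq.upper() == reverse_complement(seq)


def kmer_counts(seq: str, k: int) -> Counter:
    """Count all overlapping words of length k in the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# TG and CA, and CTG and CAG, are complementary pairs.
print(reverse_complement("TG"), reverse_complement("CTG"))

# The seldom used dimers and the tetramers and heptamer named in the text.
for word in ("CG", "TA", "TGCA", "CATG", "CAGNCTG"):
    print(word, is_palindrome(word))

# Dimer usage in a hypothetical toy sequence.
print(kmer_counts("TGCATGCA", 2))
```

Applied to a real genomic sequence, `kmer_counts` with k = 2, 3 or 4 would give the dimer, trimer and tetramer frequencies on which rules of this kind are based.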

Three Levels of Packaging

To achieve the overall packing ratio, DNA is not packaged directly into the final structure of chromatin. Instead, it passes through several hierarchies of organization. The three levels of DNA packaging in the eukaryotic chromosome are the nucleosome, the solenoid structure and the scaffolding protein complex. The DNA is wound around histones to form a nucleosome (Figure 6.7 and Figure 6.8), which increases the thickness from the 2 nm of the double helix to 11 nm. H1 acts as a linker; it links together nucleosome beads and then influences them to form a solenoid-like structure (Figure 6.9A-B). This makes a 30-nm thick structure. There is additional structure and packaging that is not yet understood. There is some evidence for a central axis scaffolding protein complex, which is thought to be required for the amount of packaging found in highly compact chromosomes such as mitotic chromosomes.

Figure 6.7 “Beads-on-a-string” model (Redrawn from http://www.biology.ewu.edu/aHerr/Genetics/Bio310/Media/ch9jpegs/9_24.JPG)

Figure 6.8 Nucleosome structure (Redrawn from http://www.biology.ewu.edu/aHerr/Genetics/Bio310/Media/ch9jpegs/9_25T.JPG)

Figure 6.9 Solenoid formation by H1 polymers leads to 300-Å structure. (A) H1 histone links two nucleosomes. (B) One turn of solenoid (Redrawn from www.web-books.com/MoBio/Free/Ch3D4.htm)

Packing ratio is defined as the length of DNA divided by the length into which it is packaged. For example, the shortest human chromosome contains 4.6 × 10⁷ bp of DNA (about 10 times the genome size of E. coli). This is equivalent to 14,000 µm of extended DNA. In its most condensed state during mitosis, the chromosome is about 2 µm long. This gives a packing ratio of 7,000 (14,000/2).
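The packing ratio defined above is a simple quotient, and the worked example for the shortest human chromosome can be reproduced as follows. The function names are illustrative. The rise of 0.34 nm per base pair is the standard value for B-DNA and is an assumption added here; the text itself quotes only the rounded figure of 14,000 µm.

```python
BP_RISE_NM = 0.34  # assumed rise per base pair of B-DNA, in nanometres


def extended_length_um(n_bp: float, rise_nm: float = BP_RISE_NM) -> float:
    """Return the contour length of n_bp of extended DNA, in micrometres."""
    return n_bp * rise_nm / 1000.0


def packing_ratio(extended_length: float, packaged_length: float) -> float:
    """Length of DNA divided by the length into which it is packaged."""
    if packaged_length <= 0:
        raise ValueError("packaged length must be positive")
    return extended_length / packaged_length


# Using the rounded figures given in the text: 14,000 um packed into 2 um.
print(packing_ratio(14_000, 2))

# Using 4.6e7 bp and the assumed rise per base pair instead.
length_um = extended_length_um(4.6e7)
print(round(length_um), round(packing_ratio(length_um, 2)))
```

With the assumed rise per base pair the extended length comes out somewhat above 14,000 µm, so the two estimates of the packing ratio differ slightly but agree in order of magnitude.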


Nucleosome

The first level of packing is achieved by the winding of DNA around a protein core to produce a "bead-like" structure called the nucleosome. This gives a packing ratio of about 6. While studying X-ray diffraction patterns of chromatin, Wilkins and Hamilton (1960) and Luzzati (1963a,b) observed patterns which suggested repeating units in chromatin. Similar repeats were also inferred from X-ray diffraction of a mixture of DNA and histones. Electron microscopy was also used to study chromatin structure. Woodcock (1973) and Olins and Olins (1974) reported results of electron microscopy of chromatin obtained from interphase nuclei lysed in water. Under the electron microscope, the chromatin fiber appeared as arrays of spherical particles, about 10 nm in diameter, connected by filaments about 2 nm in diameter. These particles were called ν (nu) bodies. A purified human protein, chromatin assembly factor 1 (CAF1), has been shown to have the ability to wrap newly synthesized DNA around histones, forming the first level of higher order chromatin structure beyond naked DNA. Recent studies reveal details of the organization of DNA within the nucleosome. The arginine-rich histones are essential to DNA folding (Felsenfeld 1978). Nucleosomes or structures related to them seem to be present at the points of DNA replication and transcription; interactions within and between nucleosomes are likely to play a critical part in these processes. The nucleosome structure is invariant in both the euchromatin and heterochromatin of all chromosomes. The compact protein core consists of two histone tetramers assembled one above the other. The C-terminal ends form a globular structure, whereas the N-terminal ends form finger-like projections. These terminal tails are free and project outward, allowing the basic residues to interact with their own nucleosomal DNA, with linker DNA, and with DNA and protein sites on adjacent nucleosomes.
Two molecules each of H2A, H2B, H3, and H4 are present in octamer form in chromatin. The histones are a group of positively charged (lysine- and arginine-rich) proteins, which stabilize the negatively charged DNA. Chromatin contains equal amounts of histones H2A, H2B, H3 and H4 and approximately one-half the amount of H1, plus numerous non-histone proteins. Histones provide the basis for the nucleosome, the basic unit of chromatin structure, seen as "beads-on-a-string" structures on electron micrographs. The nucleosome core comprises a histone octamer [(H2A-H2B)2(H3-H4)2]. Kornberg and Thomas (1974) suggested that chromatin contains oligomers of the histones, and Kornberg (1974) suggested that chromatin structure is based on a repeating unit of histones and DNA. Photochemical cross-linking of DNA in situ in chromatin, which is blocked over short intervals, revealed protected regions that occur most frequently in tandem and are 160 to 200 bp long (Henson et al. 1976). The length of DNA associated with the nucleosome unit varies from 154 to 241 bp among species (Kornberg 1977). Chromatin structure is based on a repeating unit of eight histone molecules and about 200 bp of DNA. Later, electron microscopy was also carried out on chromatin subjected to nuclease digestion, and the nucleosome structure was confirmed. The helical repeat of DNA upon nucleosome formation has a periodicity closer to 10.0 than to 10.5 bp per turn (Klug and Travers 1989). The nucleosome is the arrangement of DNA and histone proteins forming regular spherical structures in eukaryotic chromatin. Oudet et al. (1975) used the term nucleosome for these particles. Nucleosome constituents have a total mass of about 262,000 Da. Interaction of histones and DNA must involve protein complexes tightly bound over very well defined lengths of DNA (Axel et al. 1974a,b). Only 50 per cent of the DNA in the nucleosome is protected from digestion with Staphylococcus nuclease, regardless of the concentration of nuclease used.
This is called “closed” chromatin. The major effect of chromatin proteins is to restrict transcription by decreasing the number of available sites. Finch and Klug (1976) and Lutter (1978) showed the universality of nucleosomes. They found that nucleosomes are 57 Å in height and 110 Å in diameter. The DNA double helix is wrapped around the histone octamer about 1.7 times. After nuclease digestion, 146 bp of DNA remain tightly associated with the nucleosome, but ~200 bp of DNA in total are associated with it. The difference is the linker DNA. The length of linker DNA can vary from 8 to 114 bp. This variation is species-specific, but variation in linker DNA length has also been associated with the developmental stage of the organism or with specific regions of the genome. Nucleosomes are packed to form chromatin fibers and chromosomes. The nucleosome is only the first level of packaging of nuclear DNA. The beads-on-a-string fibers are 10 nm in diameter and can be compacted further into a “30-nm” chromatin fiber (Richmond and Widom 2000; Horn and Peterson 2002). The core particle possesses nearly dyadic symmetry (Morse and Simpson 1988). The protein core is a cylinder 73 Å in diameter by 40 Å in height; the DNA wraps around this core in a shallow helical path, with one complete turn comprising 80 bp of DNA. DNA is bent and kinked from its regular helical form by its incorporation into the nucleosome. The helical twist of DNA is altered from 10.5 bp per turn in solution to about 10.0 bp per turn in the nucleosome. Nucleosome structure depends on histone-histone interaction (Henson et al. 1976). Nucleosomes are found only when DNA is in the form of chromatin. Each histone molecule has two parts, an uncharged hydrophobic region and a charged region containing basic amino acids. The hydrophobic parts of the eight molecules are directed towards the center of the octamer. The charged ends are directed towards the outside as tails. The basic amino acid residues of the tails carry a positive charge at neutral pH. They form ionic bonds with the negatively charged phosphate groups of DNA. Some workers believe that H1 is associated with the DNA of a nucleosome twice, once at the point of entry and once at the point of exit (Kornberg and Klug 1981). In chromatin containing H1, the DNA enters and leaves the nucleosome on the same side, but in chromatin depleted of H1 the entrance and exit points are much more random and more or less on opposite sides of the nucleosome.
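The packing ratio of about 6 quoted for the nucleosome can be recovered from the numbers given in this section: roughly 200 bp of DNA per repeating unit, packed into a bead about 10-11 nm across. The sketch below is a back-of-the-envelope check, not a structural calculation. The rise of 0.34 nm per base pair is the standard B-DNA value and is an assumption added here, and the function names are illustrative.

```python
BP_RISE_NM = 0.34  # assumed rise per base pair of B-DNA, in nanometres


def nucleosome_packing_ratio(repeat_bp: float = 200,
                             bead_diameter_nm: float = 11.0,
                             rise_nm: float = BP_RISE_NM) -> float:
    """Extended length of one repeat divided by the bead diameter."""
    return repeat_bp * rise_nm / bead_diameter_nm


def superhelical_turns(core_bp: float = 146, bp_per_turn: float = 80) -> float:
    """Number of turns the core DNA makes around the histone octamer."""
    return core_bp / bp_per_turn


# About 68 nm of DNA per 11-nm bead, close to the ratio of about 6.
print(round(nucleosome_packing_ratio(), 2))

# 146 bp at 80 bp per turn gives a little under two turns.
print(round(superhelical_turns(), 2))
```

The second value is close to the figure of about 1.7 turns quoted above; the small difference reflects the rounded inputs.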
H1 stabilizes the nucleosome and is located in the region of the exit and entry points of the DNA (Thoma et al. 1979). Figure 6.10 shows how H1 “seals off” the nucleosome. The 146-bp nucleosome core particle is extended by 10 bp at each end to form a full two-turn particle of 166 bp. Each core histone contains two separate functional domains: a signature “histone-fold” motif sufficient for both histone-histone and histone-DNA contacts within the nucleosome, and NH2-terminal and COOH-terminal “tail” domains that contain sites for posttranslational modifications (Luger et al. 1997).

Figure 6.10 How H1 “seals off” nucleosome (Reproduced, with permission, from Thoma, F. et al. 1979. J. Cell Biol. 83: 403-27 © The Rockefeller University Press)

Compaction of DNA even into 11-nm fibers (beads-on-a-string) is associated with gene repression, as genes are not physically accessible for transcription. In order to allow transcription and replication to occur, chromatin has to unfold to expose the template upon which these processes begin. Of the five types of histones found in chromatin, H3 and H4 are among the most conserved proteins known in evolution; they have nearly identical sequences in species as distant as cow and pea. This suggests that their functions may be identical in all eukaryotes. The histones H2A and H2B show considerable species-specific variation but are recognizable in all eukaryotes. In contrast, H1 consists of a set of rather closely related proteins with overlapping amino acid sequences; they show appreciable variation between species and between different tissues of the same species.

Chromatin of SV40 was used by Griffith (1975) to infect African green monkey cell cultures. Proteins of SV40 were removed by treatment with 1 N HCl. A pool of circular DNA accumulates in the nucleus of infected cells, where histones are present in a 1:1 ratio with viral DNA. This 1:1 ratio, with equal proportions of arginine-rich and slightly lysine-rich histones, is typical of eukaryotic chromosomes. The complex was in effect a small chromosome, called a minichromosome. The packing ratio of its DNA was observed to be 7:1. The minichromosome was segmented into 100-Å units, each containing about 200 bp of DNA. This form of packaging resembles that of the eukaryotic chromosome and is consistent with known chromatin structure.

DNA is severely bent in the nucleosome structure, and DNA flexibility strongly affects intrinsic histone-DNA affinity. For instance, GC-rich sequences are believed to facilitate nucleosome formation by increasing DNA flexibility, whereas relatively rigid poly-AT sequences disfavor nucleosome assembly. In addition, because DNA bends differently in different directions, AA/TT/TA dinucleotides occur preferentially where the minor groove faces the histone octamer, whereas GC/CC/GG dinucleotides tend to occur where the minor groove points away. In vitro, there is clear evidence that DNA sequences can position nucleosomes both translationally and rotationally.

Solenoid

One model of additional compaction beyond nucleosomal winding proposes that the 110-Å nucleosomal chromatin supercoils into a 300-Å superhelix called the solenoid. The solenoid model for superstructure in chromatin was given by Finch and Klug (1976). Nucleosomes, connected by about 20-60 bp of linker DNA, form a 10-nm beads-on-a-string array, which can be compacted further into a “30-nm” chromatin fiber (Horn and Peterson 2002). The 30-nm fiber, a second level of chromatin organization, provides an approximately 100-fold compaction of the DNA. This helical structure has 6 nucleosomes per turn. H1 histone seems to play a role in the formation of this superhelix. The chromatin fiber (300 Å) condenses into a chromatid of the compact metaphase chromosome with the help of non-histone proteins (Figure 6.11). Two chromatids joined together at a centromere form a complete chromosome. Thoma et al. (1979) suggested that H1 stabilizes the nucleosomes and helps in the formation of a solenoid structure 300 Å in diameter by making H1 polymers (Figure 6.9). Bak et al.
(1977) proposed that the human metaphase chromatid has a simple organization based on folding and coiling of a long, regular and hollow cylindrical structure with a diameter of 4,000 Å. This super solenoid is formed by the coiling of the 300-Å solenoid, itself formed by coiling the basic string of nucleosomes.

Scaffold structure

Adolph et al. (1977) and Hadlaczky et al. (1981) proposed a scaffold model of chromosome structure in humans. The scaffold is a chromosome-shaped body composed of non-histone proteins. Thirty non-histone proteins were found in the scaffold. It was suggested that non-histone proteins are also responsible for chromosome structure. This model proposes that certain non-histone proteins, including topoisomerase II, bind to chromatin every 60-100 kb and tether the supercoiled, nucleosome-studded 300-Å fiber into structural loops. Evidence that non-histone proteins fasten these loops comes from chemical manipulations in which the removal of histones does not cause the chromatin to unfold completely. Other non-histone proteins may, in turn, gather the loops into daisy-like rosettes, and additional non-histone proteins may then compress the rosette centers into a compact bundle. A range of non-histone proteins thus forms the condensation scaffold. This proposal of looping and gathering is known as the radial loop-scaffold model of compaction (Figure 6.12). The radial loop-scaffold model of chromosome packaging offers a simple explanation of progressive chromosome compaction from interphase to metaphase chromosomes. Okada et al. (1979) demonstrated the presence of rosettes and proposed that these rosettes might be the sub-bands of major bands, just like the bands of polytene chromosomes in Diptera. The final packaging occurs when the fiber is organized in loops, scaffolds and domains that give a final packing ratio of about 1,000 in interphase chromosomes. DNA and its associated proteins are known as chromatin.
Some of the protein components have been characterized and are well understood.


Figure 6.11 The organization of DNA within the chromatin structure (Redrawn from http://harunyahya.com/en/Guncel_Yorumlar/38634/Inimitable_tasks_of_cells)

Other non-histone chromosomal proteins have been studied; many are yet to be discovered (Figure 6.13). Looped (active) domains (50-100 kb) of the 30-nm fiber are attached to the non-histone chromosomal scaffold. The 30-nm chromatin forms a loop-like structure. Mullinger and Johnston (1980) proposed that the assembly of the complex DNA axis of the metaphase chromosome from its extended interphase counterpart plays a major role in increasing the DNA packaging ratio in the mitotic cell.


Figure 6.12 Model of higher level packaging. (A) Radial loop-scaffold model. (B) Additional non-histone proteins may gather several loops together into daisy-like rosettes

Chromomeres

The chromomeres are bead-shaped structures found at specific regions along the entire length of a pachytene chromosome. They are formed by local condensation or tight folding of the chromatin. Heterochromatic chromomeres stain darker and are larger than euchromatic chromomeres. The number, size and position of the chromomeres on each chromosome are reasonably constant and can be used as reliable morphological markers of chromosomes. Chromomeres are also formed in polytene chromosomes and lampbrush chromosomes. The chromonema (plural chromonemata) is defined as the spirally coiled central filament of a chromatid along which the chromomeres are aligned. During cell division the chromonemata provide for the spiralization of the chromosome. In modern cytology, however, the concept of the chromonema is less clearly defined. Most contemporary researchers believe that the chromonema is an elementary deoxyribonucleoprotein (DNP) thread 100-200 Å in diameter.

Figure 6.13 Some constituents of chromatin (Redrawn from http://www.biology.ewu.edu/aHerr/Genetics/Bio310/Media/ch9jpegs/9_18.JPG)

At interphase, the nucleosome-studded chromatin forms many structural loops, which are anchored together in rosettes in some areas. This initial looping and gathering compresses the genetic material sufficiently to fit into the nucleus, where it appears as a mass of tangled string. As the chromosomes enter prophase of mitosis, looping and gathering increase, and bonding through protein cross-ties begins. By metaphase, looping, gathering, and bundling reach their height and achieve a 250-fold compaction of the roughly 40-fold compacted 300-Å fiber, giving rise to the highly condensed, rod-like shapes referred to as mitotic chromosomes. Eukaryotic genomes are organized into condensed, heterogeneous chromatin fibers throughout much of the cell cycle. Compaction of eukaryotic genomes into condensed chromatin fibers is required to fit over a meter of DNA within the limited volume of the nucleus. Processes ranging from gene expression to chromosome dynamics during cell division are regulated by the folding of DNA into chromatin.
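Because each level of packaging acts on the product of the previous one, the fold-compactions quoted above multiply rather than add. The sketch below combines the roughly 40-fold compaction of the 300-Å fiber with the further 250-fold compaction reached by metaphase. The dictionary keys and the function name are illustrative, and the two factors are simply the figures given in the text.

```python
from math import prod

# Fold-compaction contributed by each stage, as quoted in the text.
COMPACTION_LEVELS = {
    "300-A fiber relative to extended DNA": 40,
    "looping, gathering and bundling by metaphase": 250,
}


def cumulative_compaction(fold_by_level: dict) -> float:
    """Multiply the fold-compaction of successive packaging levels."""
    return prod(fold_by_level.values())


overall = cumulative_compaction(COMPACTION_LEVELS)
print(f"Overall compaction at metaphase: about {overall:,}-fold")
```

The result, about 10,000-fold, is of the same order as the packing ratio of 7,000 calculated earlier for the shortest human chromosome at mitosis.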

Chromatin

Chromatin prepared by a method involving limited nuclease digestion contains the same repeating structure as chromatin in the nucleus, whereas chromatin prepared by conventional methods involving shear does not (Noll et al. 1975). Finch and Klug (1976) reported that chromatin showed a series of maxima or bands. Chromatin folding determines the accessibility of the DNA constituting eukaryotic genomes and consequently is profoundly important in the mechanism of nuclear processes such as gene regulation. Nucleosome arrays compact to form a 30-nm chromatin fiber of hitherto disputed structure. Three models for the DNA path in the chromatin fiber were considered by Dorigo et al. (2004), in which nucleosomes are either arranged linearly in a one-start, higher order helix or zigzag back and forth in a two-start helix (Figure 6.14). They analyzed compact nucleosome arrays stabilized by the introduction of disulfide cross-links and showed that the chromatin fiber comprises two stacks of nucleosomes, in accordance with the two-start model.

Euchromatin and Heterochromatin

Figure 6.14 Models for the DNA path in the chromatin fiber. (A) One-start solenoid model. (B) Two-start supercoiled model. (C) Two-start twisted model

Interphase chromatin exists in two forms: heterochromatin (80%) and euchromatin (20%). About 70 per cent of the eukaryotic genome is unique (single-copy sequences), 10-20 per cent is middle repetitive (rDNA, tDNA and 5S DNA) and 1-10 per cent is highly repetitive (satellite DNA). Repetitive sequences are localized in the heterochromatin. The term heterochromatin, coined by Heitz (1928), was used to define chromosomes or chromosome segments which remain condensed or heteropycnotic throughout interphase. Heterochromatin may occur as a fixed component of the genome (facultative and constitutive) or as a fixed component of the genome (supernumerary segments or chromosomes) or as a fixed but purely germline component of the genome. Its location within the genome may be procentric, interstitial or telomeric. It has long been evident that the amount of heterochromatin is not necessarily constant from individual to individual within a species. Polymorphic variation in heterochromatin content may affect the fitness of the species. Heterochromatin is characterized by hypoacetylation (in all eukaryotes) and by methylation of histone H3 on lysine 9 in higher eukaryotes but not in single-celled eukaryotes such as yeast (Elgin and Grewal 2003). Histone H3 methylated at lysine 9 is bound by heterochromatin protein 1 (HP1), a highly conserved protein that is directly associated with pericentric heterochromatin.

DNA PACKAGING IN EUKARYOTIC ORGANELLES Mitochondria and chloroplasts are eukaryotic organelles known as semi-autonomous organelles. They contain their own circular DNA molecules. Both organelles are thought to have evolved from prokaryotes that were engulfed by proto-eukaryotic cells (endosymbiosis). That is why the packaging of organelle DNA is somewhat similar to that of bacterial DNA, although the amount of DNA is much smaller.

Mitochondrial DNA Packaging Mitochondrial DNA (mtDNA) is packed into highly organized structures called mitochondrial (mt) nucleoids. Sasaki et al. (2003) identified and characterized a novel mtDNA packaging protein, termed Glom (a protein inducing agglomeration of the mitochondrial chromosome), from the highly condensed mt nucleoids of the true slime mold, Physarum polycephalum. This protein binds to the entire mtDNA and packages it into a highly condensed state in vitro. Glom is specifically localized throughout the mt nucleoid. Glom has a lysine-rich region with a proline-rich domain in the N-terminal half and two HMG boxes in the C-terminal half. Deletion analysis of Glom revealed that the lysine-rich region was sufficient for intense mtDNA condensation in vitro. When recombinant Glom proteins containing the lysine-rich region were expressed in E. coli, condensed nucleoid structures were observed in the E. coli cells. Such in vivo condensation did not interfere with transcription or replication of the E. coli chromosome, and the proline-rich domain was essential for maintaining those genetic activities. The expression of Glom also complemented an E. coli mutant lacking the bacterial histone-like protein HU, and the HMG-box region of Glom was important for the complementation. Glom is thus a mitochondrial histone-like protein capable of causing intense DNA condensation without suppressing DNA functions. A set of core nucleoid proteins is found in both native and cross-linked nucleoids, including 13 proteins with known roles in mtDNA transactions (Bogenhagen et al. 2007).

Chloroplast DNA Packaging The structural organization of sugarbeet chloroplast DNA was studied by electron microscopic and biochemical methods (Kiseleva et al. 1989). Several levels of chloroplast DNA packing were demonstrated. It was observed that deoxynucleoprotein (DNP) fibrils released from the chloroplasts after a mild osmotic shock carry 13-14 nm nucleosome-like globules. The fibrils form rosettes with loops of a contour length of 15-20 μm. Minicircular plasmid-like DNA molecules with nucleosome-like globules were also found in the chloroplasts. By means of monoclonal antibodies against the β-subunit of E. coli RNA polymerase and protein A-colloidal gold complexes, the sites on the DNP fibrils to which chloroplast RNA polymerase is attached were localized. The highest numbers of RNA polymerase molecules were found to be associated with the relaxed, smooth DNA fibrils, which are almost devoid of nucleosome-like granules. The compactly packed DNP fibrils with a high number of nucleosome-like granules were free of RNA polymerase molecules. The former regions of the chloroplast nucleoid are presumably the transcriptionally active sites of the genome, while the latter are transcriptionally inactive. RNA polymerase molecules were also found to be attached to the minicircular DNA.

REFERENCES
Adolph, K.W., S.M. Cheng, J.R. Paulson, and U.K. Laemmli. 1977. Isolation of a protein scaffold from mitotic HeLa cell chromosomes. Proc. Natl. Acad. Sci. USA 74: 4937-41.
Alberts, B., D. Bray, J. Lewis, M. Raff, K. Roberts, and J.D. Watson. 1994. Molecular Biology of the Cell. New York: Garland Science.

Altenburger, W., W. Hörz, and H.G. Zachau. 1977. Comparative analysis of three guinea pig satellite DNAs by restriction endonucleases. Eur. J. Biochem. 73: 393-400.
Arrighi, F.E., T.C. Hsu, S. Pathak, and H. Sawada. 1974. The sex chromosomes of Chinese hamster: constitutive heterochromatin deficient in repetitive DNA sequences. Cytogenet. Cell Genet. 13: 268-74.
Axel, R., H. Cedar, and G. Felsenfeld. 1974a. Chromatin template activity and chromatin structure. Cold Sp. Harb. Symp. Quant. Biol. 38: 773-83.
Axel, R., W. Melchior, Jr., B. Sollner-Webb, and G. Felsenfeld. 1974b. Specific sites of interaction between histones and DNA in chromatin. Proc. Natl. Acad. Sci. USA 71: 4101-5.
Bak, A.L., J. Zeuthen, and F.H.C. Crick. 1977. Higher order structure of human mitotic chromosomes. Proc. Natl. Acad. Sci. USA 74: 1595-99.
Balbiani, E.G. 1881. Sur la structure du noyau des cellules salivaires chez les larves de Chironomus. Zool. Anz. 4: 637-41.
Bannister, A.J., P. Zegerman, J.F. Partridge, et al. 2001. Selective recognition of methylated lysine 9 on histone H3 by the HP1 chromo domain. Nature 410: 120-4.
Bhatia, K.N., and N. Dhand. 2002. Cytogenetics, Molecular Biology and Evolution. Delhi: Trueman Book Co.
Black, B.E., D.R. Foltz, S. Chakravarthy, K. Luger, V.L. Woods Jr., and D.W. Cleveland. 2004. Structural determinants for generating centromeric chromatin. Nature 430: 578-82.
Black, L.W. 1981. In vitro packaging of bacteriophage T4 DNA. Virology 113: 336-44.
Black, L.W. 1986. In vitro packaging into phage T4 particles and specific recircularization of phage lambda DNAs. Gene 46: 97-101.
Black, L.W. 1988. DNA packaging in dsDNA bacteriophages. In: The Bacteriophages, ed. R. Calendar, 2: 321-73. New York: Plenum.
Bogenhagen, D.F., D. Rousseau, and S. Burke. 2007. The layered structure of human mtDNA nucleoids. J. Biol. Chem. 283: 3665-75.
Bridges, C.B. 1936. The Bar "gene", a duplication. Science 83: 210-11.
Britten, R.J., and D.E. Kohne. 1968. Repeated sequences in DNA. Science 161: 529-40.
Brown, R. 1833. Observations on the organs and mode of fecundation in Orchideae and Asclepiadeae. Trans. Linnean Soc. Lond. 16: 685-742.
Case, R.B., Y.-P. Chang, S.B. Smith, J. Gore, N.R. Cozzarelli, and C. Bustamante. 2004. The bacterial condensin MukBEF compacts DNA into a repetitive stable structure. Science 305: 222-7.
Chen, Y., Y. Yang, M. van Overbeek, et al. 2008. A shared docking motif in TRF1 and TRF2 used for differential recruitment of telomeric proteins. Science 319: 1092-6.
Dame, R.T., M.C. Noom, and G.J.L. Wuite. 2006. Bacterial chromatin organization by H-NS protein unraveled using DNA manipulation. Nature 444: 387-90.
Dorigo, B., T. Schalch, A. Kulangara, S. Duda, R.R. Schroeder, and T.J. Richmond. 2004. Nucleosome arrays reveal the two-start organization of the chromatin fiber. Science 306: 1571-3.
Drlica, K., and J. Rouviere-Yaniv. 1987. Histonelike proteins of bacteria. Microbiol. Rev. 51: 301-19.
Elgin, S.C., and S.I. Grewal. 2003. On heterochromatin. Curr. Biol. 13: R895.
English, C.M., M.W. Adkins, J.J. Carson, M.E.A. Churchill, and J.K. Tyler. 2006. Structural basis for the histone chaperone activity of Asf1. Cell 127: 495-508.
Felsenfeld, G. 1978. Chromatin. Nature 271: 115-22.
Finch, J.T., and A. Klug. 1976. Solenoidal model for superstructure in chromatin. Proc. Natl. Acad. Sci. USA 73: 1897-901.
Flemming, W. 1882. Zellsubstanz, Kern und Zelltheilung. Leipzig: Vogel.
Gardner, E.J., M.J. Simmons, and D.P. Snustad. 2005. Principles of Genetics. New Delhi: Wiley India Ltd.
Gellert, M., K. Mizuuchi, M.H. O'Dea, and H.A. Nash. 1976. DNA gyrase: An enzyme that introduces superhelical turns into DNA. Proc. Natl. Acad. Sci. USA 73: 3872-6.
Gillespie, D.A.F., and K.H. Vousden. 2003. The secret life of histones. Cell 114: 655-56.
Goodwin, G.H., L. Woodhead, and E.W. Johns. 1977. The presence of high mobility group non-histone chromatin proteins in isolated nucleosomes. FEBS Lett. 73: 85-8.
Griffith, J.D. 1975. Chromatin structure: deduced from a minichromosome. Science 187: 1202-3.
Gunjan, A., and A. Verreault. 2003. A Rad53 kinase-dependent surveillance mechanism that regulates histone protein levels in S. cerevisiae. Cell 115: 537-49.

Hadlaczky, G., A.T. Sumner, and A. Ross. 1981. Protein-depleted chromosomes. II. Experiments concerning the reality of chromosome scaffolds. Chromosoma 81: 557-67.
Hanson, C.V., C.K.J. Shen, and J.E. Hearst. 1976. Crosslinking of DNA in situ as a probe for chromatin structure. Science 193: 62-4.
Hayes, W. 1952. Recombination in bacteria E. coli K12: unidirectional transfer of genetic material. Nature 169: 118-21.
Heitz, E. 1928. Das Heterochromatin der Moose. I. Jahrb. Wiss. Bot. 69: 762-818.
Holden, C. 2004. Long-term stress may chip away at the ends of chromosomes. Science 306: 1666.
Horn, P.J., and C.L. Peterson. 2002. Chromatin higher order folding: wrapping up transcription. Science 297: 1824-7.
Hsiao, C.L., and L.W. Black. 1977. DNA packaging and the pathway of bacteriophage T4 head assembly. Proc. Natl. Acad. Sci. USA 74: 3652-6.
Ishii, K., Y. Ogiyama, Y. Chikashige, S. Soejima, F. Masuda, T. Kakuma, Y. Hiraoka, and K. Takahashi. 2008. Heterochromatin integrity affects chromosome organization after centromere dysfunction. Science 321: 1088-91.
Katsura, I. 1986. Structure and inherent properties of the bacteriophage lambda head shell. V: Amber mutants in gene E. J. Mol. Biol. 190: 577-86.
Kiseleva, E.V., N.A. Dudareva, A.E. Dikalova, et al. 1989. The chloroplast genome of Beta vulgaris L.: Structural organization and transcriptional activity. Plant Sci. 62: 93-103.
Klug, A., and A.A. Travers. 1989. The helical repeat of nucleosome-wrapped DNA. Cell 56: 9-11.
Kornberg, R.D. 1974. Chromatin structure: a repeating unit of histones and DNA. Science 184: 868-71.
Kornberg, R.D. 1977. Structure of chromatin. Annu. Rev. Biochem. 45: 931-54.
Kornberg, R.D., and J.O. Thomas. 1974. Chromatin structure: oligomers of the histones. Science 184: 865-8.
Kucheria, K., and G. Sanyal. 2003. The structure of chromatin. In: Textbook of Biochemistry and Human Biology, eds. G.P. Talwar and L.M. Srivastava, pp. 667-79. New Delhi: Prentice-Hall of India.
Lachner, M., D. O'Carroll, S. Rea, K. Mechtler, and T. Jenuwein. 2001. Methylation of histone H3 lysine 9 creates a binding site for HP1 proteins. Nature 410: 116-20.
Laine, B., P. Sautiere, A. Spassky, and S. Rimsky. 1984. A DNA-binding protein from E. coli. Isolation, characterization and its relationship with proteins H1 and B1. Biochem. Biophys. Res. Commun. 119: 1147-53.
Lederberg, J. 1952. Cell genetics and hereditary symbiosis. Physiol. Rev. 32: 403-27.
Littau, V.C., C.J. Burdick, V.G. Allfrey, and A.E. Mirsky. 1965. The role of histones in the maintenance of chromatin structure. Proc. Natl. Acad. Sci. USA 54: 1204-12.
Loppin, B., E. Bonnefoy, C. Anselme, A. Laurencon, T.L. Karr, and P. Couble. 2005. The histone H3.3 chaperone HIRA is essential for chromatin assembly in the male pronucleus. Nature 437: 1386-90.
Lutter, L.C. 1978. Characterization of DNase-I cleavage sites in the nucleosomes. Cold Sp. Harb. Symp. Quant. Biol. 42: 137-47.
Luzzati, V. 1963a. Structure of nucleohistones and nucleoprotamines. J. Mol. Biol. 7: 142-63.
Luzzati, V. 1963b. The structure of DNA as determined by X-ray scattering technique. Progr. Nucl. Acid. Res. Mol. Biol. 1: 347-68.
McClintock, B. 1951. Chromosome organization and gene expression. Cold Sp. Harb. Symp. Quant. Biol. 16: 13-47.
Miklos, G.L.G., and R.N. Nankivell. 1976. Telomeric satellite DNA functions in regulating recombination. Chromosoma 56: 143-67.
Mohd-Sarip, A., and C.P. Verrijzer. 2004. Chromatin DNA packaging and gene silencing. Science 306: 1484.
Morris, C.A., and D. Moazed. 2007. Centromere assembly and propagation. Cell 128: 647-50.
Morse, R.H., and R.T. Simpson. 1988. DNA in the nucleosome. Cell 54: 285-7.
Mullinger, A.M., and R.T. Johnson. 1980. Packaging DNA into chromosomes. J. Cell Sci. 46: 61-86.
Murray, A. 1990. Telomeres: All's well that ends well. Nature 346: 797-8.
Nacheva, G.A., D.Y. Guschin, O.V. Preobrazhenskaya, V.L. Karpov, K.K. Ebralidse, and A.D. Mirzabekov. 1989. Change in pattern of histone binding to DNA upon transcriptional activation. Cell 58: 27-36.
Nägeli, C. 1842. Zur Entwickelungsgeschichte des Pollens bei den Phanerogamen. Zürich: Orell & Füssli.
Noll, M., J.O. Thomas, and R.D. Kornberg. 1975. Preparation of native chromatin and damage caused by shearing. Science 187: 1203-6.

Ohno, S. 1990. Grammatical analysis of DNA sequences provides a rationale for the regulatory control of an entire chromosome. Genet. Res. 56(2/3): 115-20.
Okada, T.A., and D.E. Comings. 1979. Higher order structure of chromosomes. Chromosoma 72: 1-14.
Olins, D.E., and A.L. Olins. 1974. Spheroid chromatin units (ν bodies). Science 183: 330-2.
Oudet, P., M. Gross-Bellard, and P. Chambon. 1975. Electron microscopic and biochemical evidence that chromatin structure is a repeating unit. Cell 4: 281-300.
Pardon, J.F., M.H.F. Wilkins, and R.M. Richards. 1967. Superhelical model for nucleohistone. Nature 215: 508-9.
Peacock, W.J., A.R. Lohe, W.L. Gerlach, P. Dunsmuir, E.S. Dennis, and R. Appels. 1977. Fine structure and evolution of DNA in heterochromatin. Cold Sp. Harb. Symp. Quant. Biol. 42: 1121-33.
Pettijohn, D.E., R.M. Hecht, O.G. Stonington, and T.D. Stamato. 1973. Factors stabilizing DNA folding in bacterial chromosomes. Steenbock Symposium on DNA Synthesis in vitro, pp. 145-62. Baltimore: University Park Press.
Rao, V.B., and L.W. Black. 1985. DNA packaging of bacteriophage T4 proheads in vitro: evidence that prohead expansion is not coupled to DNA packaging. J. Mol. Biol. 185: 565-78.
Rhoades, M.M. 1968. Studies on the cytological basis of crossing over. In: Replication and Recombination in Genetic Material, eds. W.J. Peacock and R.D. Brock, pp. 229-41. Canberra: Aust. Acad. Sci.
Richmond, T.J., and J. Widom. 2000. In: Chromatin Structure and Gene Expression, eds. S.C.R. Elgin and J.L. Workman, pp. 1-23. Oxford: Oxford Univ. Press.
Rouviere-Yaniv, J., and F. Gros. 1975. Characterization of a novel, low molecular weight DNA-binding protein from Escherichia coli. Proc. Natl. Acad. Sci. USA 72: 3428-32.
Ruckert, J. 1892. Zur Entwicklungsgeschichte des Ovarialeies bei Selachiern. Anat. Anz. 7: 107-58.
Sasaki, N., H. Kuroiwa, H. Nishitani, et al. 2003. Glom is a novel mitochondrial DNA packaging protein in Physarum polycephalum and causes intense chromatin condensation without suppressing DNA functions. Mol. Biol. Cell 14: 4758-69.
Shibata, H., H. Fujisawa, and T. Minagawa. 1987. Characterization of bacteriophage T3 DNA packaging reaction in vitro in a defined system. J. Mol. Biol. 196: 845-51.
Skinner, D.M. 1977. Satellite DNAs. Bioscience 27: 790-6.
Slesarev, A.I., J.A. Lake, K.O. Stetter, M. Gellert, and S.A. Kozyavkin. 1994. Purification and characterization of DNA topoisomerase V. An enzyme from the hyperthermophilic prokaryote Methanopyrus kandleri that resembles eukaryotic topoisomerase I. J. Biol. Chem. 269: 3295-303.
Smith, G.P. 1976. Evolution of repeated DNA sequences by unequal crossover. Science 191: 528-35.
Spassky, A., S. Rimsky, H. Garreau, and H. Buc. 1984. H1a, an E. coli DNA-binding protein which accumulates in stationary phase, strongly compacts DNA in vitro. Nucl. Acids Res. 12: 5321-40.
Sueoka, N. 1961. Variation and heterogeneity of base composition of deoxyribonucleic acids: A comparison of old and new data. J. Mol. Biol. 3: 310-40.
Thoma, F., Th. Koller, and A. Klug. 1979. Involvement of histone H1 in the organization of the nucleosome and of the salt-dependent superstructures of chromatin. J. Cell Biol. 83: 403-27.
Wilkins, H.M.F., and J.D. Hamilton. 1960. The molecular models and Fourier transforms. J. Mol. Biol. 2: 38-64.
Willenbrock, H., and D.W. Ussery. 2004. Chromatin architecture and gene expression in Escherichia coli. Genome Biol. 5: 252. doi:10.1186/gb-2004-5-12-252
Woodcock, C. 1973. Ultrastructure of inactive chromatin. J. Cell Biol. 59: 368a.
Yamamoto, M., and G.L.G. Miklos. 1978. Genetic studies on heterochromatin in Drosophila melanogaster and their implications for the functions of satellite DNA. Chromosoma 66: 71-98.
Yunis, J.J., and W.G. Yasmineh. 1971. Heterochromatin, satellite DNA, and cell function. Science 174: 1200-9.

PROBLEMS
1. Is DNA ever found naked (uncomplexed with proteins) in the nucleoid, nucleus, mitochondria and chloroplasts? If not so, why? Does DNA become naked when performing some function?
2. How does DNA differ in euchromatin and heterochromatin?
3. Are the nucleosomes characteristic of only nuclear DNA packaging, or are such structures also present in mitochondria and chloroplasts?
4. Is the nuclear octamer histone core or the linker histone more important in higher order packaging of the nuclear DNA?

7 Replication of Nucleic Acids

Depending upon the number of strands, DNA can be classified as (a) double helical DNA (as found in many viruses, bacteria and all higher organisms) or (b) single-stranded DNA (as found in the φX174, S13, f1 and M13 viruses). Depending upon the shape of the molecule, DNA can be classified as (a) circular (as in most viruses, like SV40, and in bacteria like Escherichia coli and Salmonella typhimurium) or (b) linear (as in phage T7 and eukaryotic chromosomes). In some viruses (e.g. tobacco mosaic virus), RNA is the genetic material. Questions concerning nucleic acid replication are: (a) Is the mode of replication the same in double-stranded and single-stranded DNA? (b) Is the mode of replication the same in circular and linear DNA? (c) Is the mode of replication the same in prokaryotes and eukaryotes? (d) What is the mode of replication of RNA? We shall attempt to answer these questions here.

Replication is a process in which nucleic acid molecules are made in the cell by copying the complementary sequence of nucleotides from the original template, which is not destroyed. Here we will deal with replication of deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). When Watson and Crick (1953) gave the double helical model of DNA, the feature that excited the world of genetics was the self-complementary nature of DNA, i.e., the relationship between the sequences of the bases on its intertwined polynucleotide chains. It was the self-complementary nature of DNA that led most geneticists to accept Avery's conclusion that DNA, not a form of protein, was the carrier of genetic information. This was further supported by decisive evidence and enzymological proof, and thus gave rise to the study of DNA replication.

DIFFERENT MODES OF DNA REPLICATION Delbruck and Stent (1957) gave three modes of DNA replication: semi-conservative, conservative and dispersive. In order to decide the actual mode of replication of DNA, DNA was chemically labeled and the label was then followed through later duplications. Our expectations in the three modes of replication are shown in Figure 7.1. In the case of conservative replication, both strands of the newly synthesized DNA double helix should be labeled while the old double helices remain unlabeled. In the case of semi-conservative replication, the replicated double helices each carry one old and one new strand, and all are, therefore, labeled. In the dispersive mode of replication also, label is expected in both strands of both helices. The semi-conservative and dispersive modes can be differentiated in the next cycle of DNA replication.

DNA REPLICATION IN PROKARYOTES The first evidence at the molecular level that replication might be semi-conservative was provided by Levinthal (1956) in an autoradiographic study of the distribution of 32P-labeled DNA from phage T2.

Figure 7.1 Three modes of DNA replication: expectations in two cycles of replication. Dark color indicates label

Meselson and Stahl (1958) labeled the DNA of E. coli bacteria with "heavy" nitrogen, 15N, by growing them for many generations on a medium in which a 15N-containing compound was the sole source of nitrogen. The DNA extracts of such cells gave a characteristic UV-absorption pattern showing a dark band near one end of a tube that had been spun at high speed in an ultracentrifuge. When these 15N-labeled cells were grown on non-labeled media (containing 14N), the DNA extracted was shown to consist of a hybrid DNA carrying both 14N and 15N at the same time, i.e., the DNA had not replicated conservatively into separate labeled and unlabeled molecules. Conservative replication seemed excluded. To distinguish between semi-conservative and dispersive replication, they allowed a further generation of growth on unlabeled media. Unlabeled DNA was formed in amounts equal to the partially labeled hybrid DNA. Additional generations of growth on unlabeled media gave a relative increase in the amount of unlabeled DNA. Since unlabeled DNA was always formed despite the presence of labeled hybrid DNA, duplication evidently did not involve random dispersive labeling of all newly formed DNA (Figure 7.2). The dual 15N/14N composition of the hybrid DNA was also demonstrated by heating this band of DNA to 100 °C and then reanalyzing it by ultracentrifugation. Denaturation with heat broke the A=T and G≡C hydrogen bonds between the two complementary DNA strands, and two single strands of DNA were produced. Two separate bands were formed in the ultracentrifuge density gradient, one heavy (15N) and one light (14N). This experiment supported the semi-conservative mode of DNA replication.
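The band patterns expected under the three hypotheses can be worked out with a small simulation. The sketch below is illustrative only and is not taken from the original experiment: each duplex is represented by the 15N fraction of its two strands, and the dispersive mode is modeled, as a simplifying assumption, by every strand passing on exactly half of its old material. The function names are ours.

```python
from collections import Counter
from fractions import Fraction


def replicate(duplexes, mode):
    """Return the daughter duplexes after one round of replication in 14N medium.

    Each duplex is a pair (s1, s2) giving the 15N fraction of each strand.
    """
    light = Fraction(0)
    daughters = []
    for s1, s2 in duplexes:
        if mode == "semi-conservative":
            # Each old strand pairs with a newly made light strand.
            daughters += [(s1, light), (s2, light)]
        elif mode == "conservative":
            # The old duplex stays intact; the copy is entirely new.
            daughters += [(s1, s2), (light, light)]
        elif mode == "dispersive":
            # Assumption: old material is shared equally between daughters.
            daughters += [(s1 / 2, s2 / 2), (s1 / 2, s2 / 2)]
        else:
            raise ValueError(f"unknown mode: {mode}")
    return daughters


def band_pattern(mode, generations):
    """Map each duplex density (mean 15N fraction) to its share of the DNA."""
    duplexes = [(Fraction(1), Fraction(1))]  # fully 15N-labeled parent
    for _ in range(generations):
        duplexes = replicate(duplexes, mode)
    counts = Counter((s1 + s2) / 2 for s1, s2 in duplexes)
    return {density: Fraction(n, len(duplexes)) for density, n in counts.items()}


for mode in ("semi-conservative", "conservative", "dispersive"):
    for generation in (1, 2):
        print(mode, generation, band_pattern(mode, generation))
```

In this model only the semi-conservative mode gives a single hybrid band after one generation followed by equal hybrid and light bands after two, which is the pattern described in the text.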

Problem of Unwinding of DNA An objection was raised concerning the amount of time it would take for such replication to occur in a very long molecule. T2 phage DNA is 200,000 nucleotide pairs long. Since there are 10 bp in each rotation, such a molecule would have 20,000 rotations. The molecule will be 20,000 × 34 Å = 68 microns in length. Such a molecule would have to be folded a number of times to fit into the phage head. This problem of unwinding of DNA during replication appeared profound. The answer to this problem is that replication proceeds before the parental DNA is completely unwound. Cairns (1963) presented evidence for this. The E. coli chromosome is a circular, double-stranded structure. Two component

Figure 7.2 Meselson-Stahl experiment and its interpretation providing proof that mode of DNA replication is semi-conservative. Solid lines indicate labeled strand while dotted lines indicate unlabeled strand

strands separate during replication, with each strand duplicating individually as shown by an autoradiograph (Figure 7.3). The fact that this duplication produces a Y type of joint indicates that unwinding of the two complementary strands of DNA is not completed before replication begins. In the second round of replication, a θ structure appeared in which one of the arcs was twice as heavily labeled as the other. This clearly supported the semi-conservative model of DNA replication.
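The unwinding arithmetic quoted above for T2 DNA can be checked in a few lines. This is a minimal sketch using only the values given in the text (10 bp and 34 Å per helical turn); the function names are ours.

```python
BP_PER_TURN = 10         # base pairs per helical turn (value used in the text)
ANGSTROM_PER_TURN = 34   # length of one helical turn in Å


def helix_turns(base_pairs):
    """Number of helical turns in a duplex of the given length."""
    return base_pairs / BP_PER_TURN


def contour_length_um(base_pairs):
    """Contour length in micrometres (1 μm = 10,000 Å)."""
    return helix_turns(base_pairs) * ANGSTROM_PER_TURN / 1e4


t2_bp = 200_000
print(helix_turns(t2_bp))        # 20000.0
print(contour_length_um(t2_bp))  # 68.0
```

Every one of those 20,000 turns has to be unwound for the two strands to separate, which is why the objection seemed serious.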

Direction of Replication Cairns (1963) also proved that there existed only one origin of replication in prokaryotes. With respect to direction, DNA replication could be of two types, unidirectional or bidirectional. Prescott and Kuempel (1972, 1973) settled this question. When E. coli cells were grown on a medium containing 3H-thymidine at a low level, the label was enough to visualize the track of DNA from the origin. After some time, the cells were exposed to a high level of 3H-thymidine. As a result, two short tracks of dense silver grains were observed, one at each end of the growing chromosome. This proved the bidirectional mode of

Figure 7.3 Cairns' experiment on Escherichia coli

DNA replication. DNA replication proceeds outward from the point of start in both directions. Evidence for bidirectional replication in Drosophila was provided by Kriegstein and Hogness (1974). The trans-configuration of two SSG1 forks in an eye form is shown in Figure 7.4. Bidirectional replication is characteristic of both prokaryotes and eukaryotes (Arai and Kornberg 1981). Unidirectional replication is seen in φX174 and some other single-stranded DNA viruses.

DNA Replication is Almost Continuous in Bacteria

Figure 7.4 Trans-configuration of two SSG1 forks in an eye form. Arrows indicate the single-stranded regions (Redrawn, with permission, from Kriegstein, H.J., and D.S. Hogness. 1974. Proc. Natl. Acad. Sci. USA 71: 135-9)

In prokaryotes, the process of replication has mostly been studied in E. coli, where the length of the chromosome is 1 mm and one round of DNA replication is completed in 30 min. This time is almost equal to the cell doubling time. The rate of replication is approximately 30-40 μm per minute. Division of the cell quickly follows replication of DNA, and replication immediately restarts in the daughter cells. Thus, DNA replication is almost continuous in bacteria. DNA replication begins at a unique site on the chromosome, called the origin of DNA replication, and requires an RNA primer. DNA replication in both prokaryotes and eukaryotes involves a large number of proteins, some of which determine the conformation of the double helix both before and during the replication process. These enzymes twist the double helix into supercoils in either the positive or the negative direction. Since they change the topological form of the molecule without changing its chemical composition, they are called topoisomerases. It has been suggested that supercoiling is necessary to produce the initial kink at the replication origin that allows separation of the two strands and formation of a replicating growing point, called the fork. The topoisomerases are also helpful in forming loops in the circular DNA molecule with the help of an RNA molecule.
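The quoted rate follows directly from the chromosome length and the replication time. The sketch below is a rough check under stated assumptions: the 1 mm and 30 min values come from the text, the 3.4 Å rise per base pair is derived from the 34 Å per 10 bp used earlier in this chapter, and the division by two forks assumes bidirectional replication from a single origin. The function names are ours.

```python
ANGSTROM_PER_BP = 3.4     # 34 Å per 10-bp turn, as used earlier in the chapter
ANGSTROM_PER_UM = 1e4     # 1 μm = 10,000 Å


def rate_um_per_min(chromosome_um, minutes):
    """Overall rate of DNA synthesis in micrometres of duplex per minute."""
    return chromosome_um / minutes


def rate_bp_per_sec(chromosome_um, minutes, forks=1):
    """Approximate rate in base pairs per second, per replication fork."""
    total_bp = chromosome_um * ANGSTROM_PER_UM / ANGSTROM_PER_BP
    return total_bp / (minutes * 60) / forks


# Values from the text: 1 mm (1000 μm) chromosome replicated in 30 min.
print(round(rate_um_per_min(1000, 30), 1))
print(round(rate_bp_per_sec(1000, 30)))
print(round(rate_bp_per_sec(1000, 30, forks=2)))  # assuming two forks
```

With these inputs the overall rate is about 33 μm per minute, within the 30-40 μm per minute range given above, and corresponds to roughly 800 bp per second at each of two forks.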

Replicons A replicon is the unit of DNA replication. It consists of an origin and a terminus. The origin is the sequence of DNA that supports initiation of DNA replication, for example, OriC in E. coli and Oriλ in λ phage. The terminus is the point that stops fork movement, so that DNA replication terminates. It lies opposite the point of origin. The whole of the E. coli chromosome is one replicon. A model for the bacterial replicon was proposed by Jacob et al. (1963). The following enzymes/proteins are required to form a discrete multiprotein structure, called the replisome, for DNA replication: topoisomerases, including gyrases; unwinding proteins; single-strand binding (SSB) proteins; primase; DNA polymerases; polynucleotide ligase; exonucleases; endonuclease; and pilot proteins. DNA polymerases are discussed here.

Prokaryotic DNA polymerases Characteristics of the three major (DNA polymerase I, II, and III) and some other prokaryotic DNA polymerases are given, according to Kornberg and Baker (1992), in Table 7.1. A total of five DNA polymerases have been identified in prokaryotes. These enzymes are described briefly below. DNA polymerase I. DNA polymerase I (PolI) plays a role in the termination of DNA replication. PolI from E. coli was the first DNA polymerase characterized. There are approximately 400 molecules of the

Table 7.1 Characteristics of different prokaryotic DNA polymerases (Adapted, with permission, from Kornberg, A. and D. Baker. 1992. DNA Replication. University Science Books)

Enzyme | Template | Primer | Other activities | Other features
E. coli DNA pol I | DNA | DNA/RNA | 3'→5' exo, 5'→3' exo | Monomeric
E. coli DNA pol I (Klenow fragment) | DNA | DNA/RNA | 3'→5' exo | C-terminal fragment
E. coli DNA pol III | DNA | DNA/RNA | 3'→5' exo (on a separate subunit) | Multimeric structure
Taq pol | DNA | DNA/RNA | Extendase (adds 3'-A overhangs) | Thermostable, used in PCR
Reverse transcriptases | DNA/RNA | DNA/RNA | (Ribonuclease H) | Used to make cDNA
Terminal transferases | None required | DNA | | Will synthesize DNA in nontemplated reaction

enzyme per E. coli cell. PolI is coded by the polA gene. The enzyme is a single large protein with a molecular weight of approximately 103 kDa. It requires a divalent cation (Mg2+) for activity and has three enzymatic activities associated with it: (a) 5'→3' DNA polymerase activity, (b) 3'→5' exonuclease (proofreading) activity and (c) 5'→3' exonuclease (nick translation) activity. Completion of Okazaki fragment synthesis leaves a nick between the Okazaki fragment and the preceding RNA primer. PolI extends the Okazaki fragment while its 5'→3' exonuclease activity removes the RNA primer. This is called nick translation. PolI dissociates after extending the Okazaki fragment by 10-12 nucleotides. The location of the three enzymatic activities within the protein is known, and it is possible to remove the 5'→3' exonuclease activity by using a protease to cut PolI into two protein fragments (Table 7.1): both the polymerization and the 3'→5' exonuclease activities are on the large fragment (the Klenow fragment of DNA PolI), and the 5'→3' exonuclease activity is on the small fragment.
DNA polymerase II. DNA polymerase II (PolII) can add new nucleotides at the 3'-end. There are about 100 molecules of PolII per cell. The enzyme is ~90 kDa in size. It is coded by the polB gene. Synthesis is induced during S-phase. It can reinitiate DNA synthesis downstream of gaps. It has a low error rate but is too slow to be used during normal DNA synthesis. The crystal structure of a large fragment of yeast type II DNA topoisomerase reveals a heart-shaped dimeric protein with a large central hole. It provides a molecular model of the enzyme as an ATP-modulated clamp with two sets of jaws at opposite ends connected by multiple joints (Berger et al. 1996). PolII is involved in the repair of damaged DNA and has 3'→5' exonuclease activity.
DNA polymerase III. DNA polymerase III (PolIII) is the main polymerase for in vivo replication and has 3'→5' exonuclease proofreading ability.
It is coded by the polC gene and is also referred to as replicase, PolIII, PolC, DnaE, or the alpha subunit. This is the major enzyme for DNA replication. There are about 10 molecules of PolIII per cell. PolIII from E. coli is a single protein of molecular weight 130 kDa. Though the molecule has DNA polymerase activity by itself, it replicates DNA in the bacterial cell in conjunction with other proteins. This multi-protein complex is referred to as the PolIII holoenzyme. The proteins (called subunits) that associate with PolIII in the holoenzyme perform several functions. The most interesting subunit is called beta, which forms a donut-shaped ring around the DNA and helps to anchor the holoenzyme to the DNA during replication (Figure 7.5). By acting as a sliding "clamp", beta helps the holoenzyme to replicate long stretches of DNA without "falling off"

the strand (this is called processivity). The PolIII holoenzyme directs both leading and lagging strand synthesis simultaneously by virtue of having two polymerases.

Figure 7.5 β subunit of DNA polymerase III forms a donut-shaped structure around DNA to anchor the enzyme during DNA replication (Redrawn from oregonstate.edu/.../bb451/figslett/FigP.html)

DNA polymerase IV. McKenzie et al. (2001) reported that the SOS-inducible, error-prone DNA polymerase IV (PolIV), encoded by dinB, is required for adaptive point mutation in the E. coli lac operon. A non-polar dinB mutation reduces adaptive mutation frequencies by 85 per cent but does not affect adaptive amplification, growth-dependent mutation, or survival after oxidative or UV damage. They showed that PolIV, together with the major replicase, PolIII, can account for all adaptive point mutations at lac. These results thus identified a role for PolIV in inducible genetic change. DNA polymerase V. DNA polymerase V (PolV) participates in bypassing DNA damage. When E. coli is exposed to high levels of radiation or of a mutagen, major damage to the bacterial DNA can occur. The cell responds by inducing a special "last-resort" repair pathway called the SOS repair pathway. Among the SOS genes that are induced are umuC and umuD. The products of these two genes form PolV. Actually, one copy of UmuC and two copies of a truncated form of UmuD are required to form this polymerase.

STEPS OF DNA REPLICATION

A unified scheme of DNA replication includes unwinding of double-helical DNA, stabilization of single-stranded DNA, synthesis of pre-primer, formation of primer, extension of primer by DNA polymerase III, excision of RNA, gap filling by DNA polymerase I, and zipping. These tasks are accomplished in three steps of DNA replication: initiation, elongation and termination.

Initiation

Initiation includes activation of deoxyribonucleotides, exposure of the two DNA strands by unwinding of the double helix, and synthesis of the RNA primer. The first step, activation of deoxyribonucleotides, is further divided into two substeps: synthesis of deoxyribonucleotides and phosphorylation of deoxyribonucleotides. DnaA is an origin-binding protein. It binds cooperatively to the four 9-bp repeats in OriC. The initial closed complex contains the origin DNA wrapped around an assembly of 10-20 monomers of DnaA complexed with ATP. An open complex forms when the three AT-rich 13-bp repeats in OriC unwind as a consequence of the DNA wrapping around the assembly of DnaA. OriC plasmids must be negatively supercoiled in order to form open complexes (relaxed OriC plasmids can only form closed complexes). DnaA then guides the hexameric DnaB (helicase) protein from a DnaB-DnaC complex in solution to its place around each strand. DnaB unwinds DNA strands using ATP energy and moves processively (i.e., it does not leave the DNA until replication is finished; it encircles the DNA strand) in the 5'→3' direction along DNA. DnaA, together with the use of ATP energy, is required to load DnaB onto DNA in the form of a DnaB-DnaC complex. After loading DnaB onto the replication fork, DnaC is released from the DnaB-DnaC complex and leaves the DNA. Thus DnaC is required for loading DnaB onto DNA. DnaB helicase activity is stimulated more than 10-fold by making contact with PolIII holoenzyme. Helicases and gyrases (topoisomerases) are involved in exposure of the two DNA strands. Strand separation, in case of double-helical DNA, is the first step in DNA replication. Three proteins required


Figure 7.6 Strand separation

in strand separation are helix destabilizing protein (HDP), unwinding protein and gyrase (Alberts and Sternglanz 1977). The action of some proteins involved in strand separation is shown in Figure 7.6. Active and passive unwinding of DNA. Helicases are molecular motors that separate DNA strands for efficient replication of genomes. Johnson et al. (2007) probed the kinetics of individual ring-shaped T7 helicase molecules as they unwound double-stranded DNA (dsDNA) or translocated on single-stranded DNA (ssDNA). A distinctive DNA sequence dependence was observed in the unwinding rate that correlated with the local DNA unzipping energy landscape. The unwinding rate increased ~10-fold (approaching the ssDNA translocation rate) when a destabilizing force on the DNA fork junction was increased from 5 to 11 pN. These observations revealed a fundamental difference between the mechanisms of ring-shaped and non-ring-shaped helicases. The observed force-velocity and sequence dependence was not consistent with a simply passive unwinding model. However, an active unwinding model fully supports the data even though the helicase on its own does not unwind at its optimal rate. Notation for modeling the helicase movement toward a fork junction is given in Figure 7.7(A). Passive and active unwinding mechanisms, as suggested by Johnson et al. (2007), are presented in Figure 7.7(B-C). In passive unwinding mechanism, the DNA fork thermally fluctuates between dsDNA and ssDNA states (step 1). When the amount of ssDNA between the helicase and the junction is greater than or equal to the helicase step size (δ) the helicase may forward translocate (step 2). In active unwinding mechanism, the helicase destabilizes a region (the cloud) of dsDNA near the junction (step 1). This makes the junction more likely to be open so that the helicase is able to step forward (step 2) more frequently. Helicases and gyrases. 
Helicases unwind the double helix, providing the single strands on which DNA polymerase can work. They use the energy of nucleoside triphosphate hydrolysis to break the H-bonds. They are processive in nature and translocate along DNA in the 5′→3′ or 3′→5′ direction. Their activity is affected by the presence of single-strand binding proteins (SSBs) and the concentration of topoisomerases. The unwinding activity of helicase at one position along a DNA molecule causes the "downstream" portion of the same DNA molecule to become overwound. Unless this overwinding is relieved, DNA replication will soon stop. Topoisomerases. Topoisomerases relax the topological stress in the positively supercoiled regions of DNA. They alter the linking number of DNA by cutting and rejoining DNA strands. These enzymes are of two types: topoisomerase I (produces a transient nick in only one strand of DNA) and topoisomerase II (produces transient nicks in both strands of DNA and then revolves one cut end by 360° before rejoining the two strands). Some topoisomerases are also known as gyrases; DNA gyrase is in fact a type II topoisomerase. Topoisomerases relieve the torsional strain in DNA that is built up during replication and transcription. They are


Figure 7.7 (A) Notation for modeling the helicase movement toward a fork junction. (B-C) Cartoon of passive and active unwinding mechanisms

vital for cell proliferation and are a target for poisoning by anti-cancer drugs. Type IB topoisomerase (TopIB) forms a protein clamp around the DNA duplex and creates a transient nick that permits relaxation of supercoils. Using real-time single-molecule observation, Koster et al. (2005) showed that TopIB released supercoils by a swivel mechanism that involves friction between the rotating DNA and the enzyme cavity; i.e., the DNA does not freely rotate. Unlike a nicking enzyme, TopIB does not release all the supercoils at once, but typically does so in multiple steps. The number of supercoils removed per step follows an exponential distribution. The enzyme was found to be torque-sensitive, as the mean number of supercoils per step increases with the torque stored in the DNA. They proposed a model for topoisomerization in which the torque drives the DNA rotation over a rugged periodic energy landscape in which the topoisomerase has a small but quantifiable probability to religate the DNA once per turn. Gyrase uses ATP energy to introduce negative supercoiling into the DNA. The negative supercoiling compensates for the positive supercoiling generated during DNA replication. Gyrase is not limited to the site of DNA replication. It can act at any position on the DNA molecule. Gyrase can be considered as the swivel for replicating molecules. In prokaryotes, gyrase relaxes negatively supercoiled DNA in the absence of ATP. RNA primer synthesis. The RNA primer is homogeneous in length but its sequence is not unique, not even at the RNA-DNA junction. An RNA primer is required for DNA replication of only those DNA templates that do not have free ends. E. coli primer synthesis seems to take place in a similar way. All known DNA polymerases require a pre-existing "primer" from which to begin replication. A primer is an oligonucleotide that is base paired with the template strand and provides a 3′-hydroxyl group upon which to


start polymerizing a new DNA strand. E. coli uses small RNAs as primers, which are synthesized by an enzyme called DnaG (primase), coded by the dnaG gene. DnaG makes RNA primers (about 10 nucleotides long) that are used by PolIII holoenzyme to start DNA synthesis. DnaG acts distributively (it does not remain associated with DNA). It drops off DNA after primer synthesis, then reloads onto DNA again and again by protein-protein interactions with DnaB to synthesize the next primer on the lagging strand. There are many places along the template where RNA primers can be made by primase. Primosome assembly. RNA thus primes the synthesis of DNA. A specialized RNA polymerase (primase) joins the prepriming complex in a multi-subunit assembly called the primosome. Primase synthesizes a short stretch of RNA, which is complementary to one of the DNA strands. This primer RNA is removed at the end of replication by the 5′→3′ exonuclease activity of PolI. An RNA primer would be unnecessary if DNA polymerases could start chains de novo. However, such a property would be incompatible with the very high fidelity of DNA polymerases, which is due in part to their proofreading of nascent DNA. RNA polymerases can start chains de novo because they do not examine the preceding base pair. Consequently, their error rates are orders of magnitude higher than those of DNA polymerases. The ingenious solution is to start DNA synthesis with a low-fidelity stretch of polynucleotide but mark it as temporary by placing ribonucleotides in it. The use of an RNA polymerase to initiate DNA synthesis is also plausible from an evolutionary viewpoint, because RNA was probably present long before DNA emerged as a more reliable and stable store of genetic information (Bollum and Peterson 1983).

Elongation

Elongation comprises formation of leading and lagging strands, chain growth, removal of RNA primers, gap filling and nick sealing. Elongation is considered a nucleotidyl group transfer reaction because the 3′-OH of the pre-existing nucleotide makes a nucleophilic attack on the innermost (α) 5′-phosphate of the incoming nucleotide. The deoxyribonucleoside triphosphates are generated in the phosphorylation step of initiation. Hence, a phosphodiester bond is formed, liberating pyrophosphate (two phosphate groups). Enzymes that replicate DNA using a DNA template are called DNA polymerases. However, there are also enzymes that synthesize DNA using an RNA template (reverse transcriptases) and even enzymes that make DNA without using a template (terminal transferases). Most organisms have more than one type of DNA polymerase. For example, E. coli has five DNA polymerases, but all work by the same basic rules: (1) Polymerization occurs in only the 5'→3' direction. (2) Polymerization requires a template to copy: the complementary strand. (3) Polymerization requires 4 dNTPs: dATP, dGTP, dCTP, dTTP (TTP is sometimes not designated with a 'd' since there is no ribose-containing equivalent). (4) Polymerization requires a pre-existing primer from which to extend. (5) The primer is RNA in most organisms, but it can be DNA in some organisms; very rarely the primer is a protein, in the case of certain viruses only. Processivity of a DNA polymerase. Exonuclease activity has been preferentially observed on incorrectly paired bases. So, if by chance a wrong base is added onto a growing chain, it has a very high probability of being removed before the next base is added. Thus 3′→5′ exonuclease activity provides a proofreading activity which gives DNA replication a very high degree of accuracy. Subunits of PolIII and their functions are given in Table 7.2.

Table 7.2 Subunits of DNA polymerase III and their functions (Reproduced from http://oregonstate.edu/instruction/bb492/lectures/DNAII.html)
Subunit | Function
α | DNA polymerase
ε | 3΄→5΄ exonuclease
θ | Stimulates 3΄→5΄ exonuclease
ψ | Activates DnaB, helicase activity
  | Binds ATP
  | Unknown
β | Sliding clamp
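The basic rules of polymerization and the proofreading step described above can be illustrated with a small sketch. It is only a toy model under stated assumptions: the function name `extend_primer`, the way sequences are passed as strings, and the `offered` argument (used to imitate an occasional wrong insertion) are hypothetical and are not taken from the text.

```python
# Watson-Crick partners used when a DNA polymerase copies a DNA template.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template_3to5, primer_5to3, offered=None):
    """Extend a primer 5'->3' along a template, proofreading each addition.

    template_3to5: template strand written 3'->5' (hypothetical input).
    primer_5to3:   RNA or DNA primer written 5'->3', already paired with
                   the start of the template.
    offered:       optional nucleotides tried first at each position, used
                   here only to imitate an occasional wrong insertion.
    Returns (new strand written 5'->3', number of excised mismatches).
    """
    if not primer_5to3:
        raise ValueError("no primer: a DNA polymerase cannot start a chain de novo")
    # Rule 4: the primer must be base paired with the template (U pairs like T).
    for t, p in zip(template_3to5, primer_5to3.replace("U", "T")):
        if COMPLEMENT[t] != p:
            raise ValueError("primer is not base paired with the template")
    strand = list(primer_5to3)
    tries = list(offered or "")
    excised = 0
    for i, t in enumerate(template_3to5[len(primer_5to3):]):
        correct = COMPLEMENT[t]                  # rule 2: copy the template
        first = tries[i] if i < len(tries) else correct
        strand.append(first)                     # rule 1: grow at the 3'-end
        if first != correct:
            strand.pop()                         # 3'->5' exonuclease removes it
            strand.append(correct)
            excised += 1
    return "".join(strand), excised


print(extend_primer("TACGGATC", "AUG"))                # ('AUGCCTAG', 0)
print(extend_primer("TACGGATC", "AUG", offered="CA"))  # ('AUGCCTAG', 1)
```

In the second call a wrong nucleotide is offered at the second position; it is removed before the next base is added, so the finished strand is the same as in the error-free case.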


Bacterial replicative DNA polymerases, such as PolIII, share no sequence similarity with other polymerases. The crystal structure, determined at 2.3 Å resolution, of a large fragment of PolIII (residues 1-917) reveals a unique chain fold with localized similarity in the catalytic domain to Polβ and related nucleotidyl transferases. The structure of PolIII is strikingly different from those of members of the canonical DNA polymerase families, which include eukaryotic replicative polymerases, suggesting that DNA replication machinery in bacteria arose independently. A structural element near the active site in PolIII that is not present in nucleotidyl transferases but which resembles an element at the active sites of some canonical DNA polymerases suggests that at a more distant level all DNA polymerases may share a common ancestor. The structure also suggests a model for interaction of PolIII with the sliding clamp and DNA. Directionality problem of replisome. In all organisms, the protein machinery responsible for the replication of DNA, the replisome, is faced with a directionality problem. The antiparallel nature of duplex DNA permits the leading-strand polymerase to advance in a continuous fashion, but forces the lagging-strand polymerase to synthesize in the opposite direction.

Figure 7.8 Protein-protein contact between DNA polymerase III and DnaB helicase stimulates helicase activity (Redrawn from oregonstate.edu/.../bb451/figslett/FigBF.html)

Continuous synthesis of leading strand. Single-strand binding (SSB) protein does not itself unwind DNA, but binds to and stabilizes unwound single-stranded DNA. Thus SSB protein holds denatured OriC open. DnaB helicase has been loaded onto replication forks as shown in Figure 7.8(A-F). DnaG primase is recruited to the replication fork by protein-protein interactions with DnaB helicase. Primase synthesizes short (about 10 nucleotides long) RNA molecules that will act as primers for DNA synthesis.
Primase requires contact with SSB protein for tight binding to its primed site and must be displaced so that the beta clamp can be loaded onto DNA. The clamp loader (gamma complex) of PolIII holoenzyme puts a beta clamp around an RNA primer using ATP energy. A primer-terminus (3' hydroxyl of the RNA primer) is also required in order for the clamp loader to place a beta ring onto the RNA-DNA structure. The beta ring is constrained to the primer since it cannot slide over single-stranded DNA. The chi subunit of the clamp loader is also the primase-displacing subunit. The underlying mechanism is a competitive contact between chi and primase for SSB protein. Upon establishing the chi-to-SSB protein contact, the primase-to-SSB protein contact is disrupted and primase diffuses away allowing gamma complex to assemble beta onto the primed site. This contact allows the attachment of the alpha polymerase subunit to the beta clamp and positions the 3'-hydroxyl end of the RNA primer in DNA polymerization active site so that DNA synthesis can begin. Hence, the primase-to-polymerase switch is orchestrated by an ordered set of reactions at a primed site in which SSB protein makes mutually exclusive contacts with primase and the chi-subunit of the gamma complex needed to load beta. Since


beta is needed to tether core polymerase to DNA, the end result is a primase-to-polymerase switch. Tethered PolIII now synthesizes the leading strand. DNA synthesis proceeds at a rate of 1,000 nucleotides per second. Protein-protein contact between PolIII and DnaB helicase stimulates helicase activity. Discontinuous synthesis of lagging strand as Okazaki fragments. By extending RNA primers, the lagging-strand polymerase restarts at short intervals and produces Okazaki fragments (Okazaki et al. 1968). At least in prokaryotic systems, the directionality problem is solved by the formation of a loop in the lagging strand of the replication fork, which reorients the lagging-strand DNA polymerase so that it advances in parallel with the leading-strand polymerase. The replication loop grows and shrinks during each cycle of Okazaki fragment synthesis. Synchronization of polymerization on leading and lagging strands. A hallmark feature of DNA replication is the coordination between the continuous polymerization of nucleotides on the leading strand and the discontinuous synthesis of DNA on the lagging strand. This synchronization requires a precisely timed series of enzymatic steps that control the synthesis of an RNA primer, the recycling of the lagging-strand DNA polymerase, and the production of an Okazaki fragment. Primases synthesize RNA primers at a rate that is orders of magnitude lower than the rate of DNA synthesis by the DNA polymerase at the replication fork. Recycling of the lagging-strand DNA polymerase from a finished Okazaki fragment to a new primer is inherently slower than the rate of nucleotide polymerization.
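The fork speed quoted above can be turned into rough timings with a few lines of arithmetic. The sketch below is a back-of-the-envelope estimate, not a measured result: it assumes the E. coli replicon length of 4,200 kbp listed in Table 7.3, two forks moving in opposite directions from a single origin, and the prokaryotic Okazaki fragment length of 1,000-2,000 nucleotides listed in Table 7.4. The function names are hypothetical.

```python
FORK_RATE_NT_PER_S = 1_000   # fork speed quoted in the text
GENOME_BP = 4_200_000        # assumption: E. coli replicon length from Table 7.3
OKAZAKI_NT = (1_000, 2_000)  # assumption: prokaryotic fragment length from Table 7.4


def replication_time_min(genome_bp=GENOME_BP, rate=FORK_RATE_NT_PER_S, forks=2):
    """Minutes needed when `forks` forks share the genome equally."""
    return genome_bp / (rate * forks) / 60


def seconds_per_fragment(fragment_nt, rate=FORK_RATE_NT_PER_S):
    """Seconds of polymerization spent on one Okazaki fragment."""
    return fragment_nt / rate


def fragments_per_round(fragment_nt, genome_bp=GENOME_BP):
    """Okazaki fragments per round; the lagging strands total one genome length."""
    return genome_bp // fragment_nt


print(replication_time_min())                           # 35.0
print([seconds_per_fragment(n) for n in OKAZAKI_NT])    # [1.0, 2.0]
print([fragments_per_round(n) for n in OKAZAKI_NT])     # [4200, 2100]
```

Under these assumptions a new Okazaki fragment must be started every 1 to 2 seconds at each fork, which shows why slow primer synthesis and slow polymerase recycling have to be tightly coordinated with leading-strand synthesis.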

Termination

Termination is the last step in DNA replication. The termination sites are located between 23 and 29 min (TerA, TerD, and TerE), 33 and 36 min (TerB and TerC), and at 48 min (TerF); thus these sites are spread over a long distance (1 min is approximately equal to 50 kb). The termination site has been hypothesized to be T-shaped. Replication forks meeting at the top of the T are arrested (that is, the clockwise fork will pass through sites TerE, TerD, and TerA, but will stop at TerC, TerB, or TerF). A protein called Tus binds to the Ter sites, and this binding stops DnaB (helicase) action. Major protein and enzyme activities during prokaryotic DNA replication are shown in Figure 7.9.
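To see how long a distance the Ter sites cover, the map positions can be converted with the factor given above (1 min is approximately 50 kb). This is a minimal sketch; the dictionary layout and function names are hypothetical, and the positions are simply those stated in the text.

```python
KB_PER_MIN = 50  # conversion factor given in the text

# Map positions (in min) of the Ter sites as stated in the text.
TER_SITES_MIN = {
    "TerA, TerD, TerE": (23, 29),
    "TerB, TerC": (33, 36),
    "TerF": (48, 48),
}


def min_to_kb(minutes, kb_per_min=KB_PER_MIN):
    """Convert a map distance in minutes to kilobases."""
    return minutes * kb_per_min


def ter_region_span_kb(sites=TER_SITES_MIN):
    """Distance in kb from the first to the last Ter position."""
    starts = [lo for lo, _ in sites.values()]
    ends = [hi for _, hi in sites.values()]
    return min_to_kb(max(ends) - min(starts))


print(ter_region_span_kb())  # 1250
```

With these figures the Ter sites span about 25 min of the map, or roughly 1,250 kb.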

Figure 7.9 Some of the enzyme and protein activities in DNA replication (Modified from http://www.Biologyreference.com/Po-Re/Replication.html)


PLASMID REPLICATION

A plasmid can replicate only within a host cell, but there is enormous variation with respect to enzymology and mechanism of replication. Plasmids depend heavily on host replication proteins; in E. coli, PolIII is a major replication protein. All plasmids examined to date replicate semi-conservatively but there are several replication patterns. Some plasmids replicate unidirectionally while others replicate bidirectionally. Bidirectional replication stops when growing forks collide (Figure 7.10). In most carefully studied plasmids, replication occurs by the butterfly or rabbit ears mode (Figure 7.11). The replicating molecule contains untwisted replicated portions, as in usual θ replication, and a supercoiled unreplicated portion. When the circle is completed, one of the circles must be cleaved in order to separate the daughters. The result after one round of replication is one nicked molecule and one supercoiled molecule. The nicked molecule is sealed later and becomes supercoiled. Whether this is a general mechanism of plasmid replication is not known. A nick converts a butterfly molecule to a θ molecule. Plasmid replication also occurs through the rolling circle mechanism.

Figure 7.10 Bidirectional replication stops when growing forks collide

Figure 7.11 Plasmid replication occurs as explained by the butterfly or rabbit's ear model. A nick converts a butterfly molecule to a θ molecule

DNA REPLICATION IN VIRUSES

Whether or not viruses are technically alive, they certainly exhibit an important property of life: the ability to duplicate, albeit with the help of a host cell (Villarreal 2004). Semi-conservative replication was demonstrated for phage λ (Meselson and Weigle 1961) and phage T7 (Meselson 1960). Steps of DNA replication in viruses are presented in Figure 7.12.

DNA REPLICATION IN EUKARYOTIC NUCLEAR CHROMOSOMES

Eukaryotes have Multiple Origins of DNA Replication

Because of the complexities of eukaryotic chromosomes, the rate of replication of a eukaryotic chromosome is slow (2,600 nucleotides per minute, compared to 50,000 nucleotides per minute for E. coli). At a rate of 2,600 nucleotides per minute, a single origin of replication would require a little over 2 weeks to


The virus attaches itself to its host cell
↓
The virus or its genetic information penetrates the cell
↓
The nucleic acid is uncoated, which frees the DNA or RNA from its capsomeres or lipid envelope and permits the host cell to read out (express) the genetic functions of the virus
↓
At this stage, only a portion of the viral genetic information is expressed, resulting in synthesis of only the subset of viral-encoded proteins collectively called early viral gene functions (proteins). Alternatively, some viruses that can duplicate themselves only in actively dividing host cells produce proteins that stimulate host-cell division
↓
The viral nucleic acid is then synthesized to produce hundreds or thousands of copies of the viral chromosome
↓
At this time, a second subset of the viral genetic information, commonly termed the late proteins, is expressed. These are the structural proteins, including the capsomeres of the virus
↓
The capsomeres are assembled to form a new shell around the nucleic acid of the virus
↓
The mature virus, having duplicated, releases its new copies from the infected cell to attack a new cell and repeat this process

Figure 7.12 Steps in DNA replication in viruses (Developed from text at http://library.thinkquest.org/26802/infection.html)

Table 7.3 Number of replicons, average length of replicon and rate of fork movement (Adapted from http://molbiol4masters.masters.grkraj.org/html/ProkaryoticDNAReplication2-ReplicationOrigins.htm)
Organism | No. of replicons | Average length of replicon (kbp) | Fork movement (bp/minute)
Escherichia coli | 1 | 4,200 | 50,000
Saccharomyces cerevisiae | 500 | 40 | 3,600
Drosophila melanogaster | 3,500 | 40 | 2,600
Musca domestica | 4,000 | 45 | 2,800
Xenopus laevis | 15,000 | 200 | 500
Mus musculus | 25,000 | 150 | 2,200
Homo sapiens | 25,000-30,000 | 200 | 2,800-3,000

replicate one chromosome. Number of replicons is in thousands in a eukaryotic chromosome. It takes only 4 minutes for a Drosophila cell to replicate its DNA, indicating the presence of 6,000-7,000 origins of replication (replicons). Average length and rate of fork movement differ from organism to organism (Table 7.3). Different cell types have different cell division times. The number of active origins of replication governs the replication time. In more slowly growing cells, the number of origins of replication per chromosome is greatly decreased. DNA from Drosophila egg chambers undergoing chorion gene amplification was analyzed by Heck and Spradling (1990). At stage 10, 34 per cent DNA contained replication forms or bubbles. These forms were intermediate in the process of amplification. Multiple origins gave rise to these intermediates. Origin of replication is attached to nuclear cage and is very


specific. In Saccharomyces cerevisiae and Tetrahymena thermophila, origins of replication have been localized to compact regions. But in higher eukaryotes initiation events may occur throughout extended regions rather than at clonable origin-specific sequences (Benbow et al. 1992). The mammalian origin region studied in most detail is the dihydrofolate reductase (DHFR) amplicon in CHO cells. Vaughn et al. (1990) suggest that replication can initiate anywhere within a 28-kb region downstream from the DHFR 3′-end. Four modular elements were found clustered in common in each origin; when arrayed in proper context, they can lead to origins of DNA replication. Superficially, chromosomal origins of DNA replication in higher eukaryotes would seem to be different from those of prokaryotes. These differences arise mainly due to large genome size, which requires a complex regulatory mechanism. Salient differences between prokaryotes and eukaryotes in DNA replication are given in Table 7.4.

Table 7.4 Differences in prokaryotic and eukaryotic DNA replication
DNA replication in prokaryotes | DNA replication in eukaryotes
Takes place in cytoplasm | Takes place in nucleus
Goes on almost continuously | Occurs in S phase of the cycle
Proceeds from single origin mostly in both the directions | Proceeds from many origins in both the directions
Both lagging and leading strands synthesized by DNA polymerase III | Leading and lagging strands synthesized by DNA polymerase I respectively
RNA primers removed by DNA polymerase I | RNA primers removed by DNA polymerase II
No PCNA protein involved | PCNA protein associated with DNA polymerase
No telomerase activity | Telomerase completes the lagging strand in the telomere regions of chromosome
No nucleosomes formed | DNA immediately after replication associates with histones to form nucleosomes
Okazaki fragments 1,000-2,000 nucleotides long | Okazaki fragments less than 200 nucleotides long
Replication fast; 50,000 nucleotides/minute | Replication slow; 2,600 nucleotides/minute
One replicon | As many as 500-60,000 replicons/chromosome
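The figures in Table 7.3 show why many origins are needed. As a rough sketch, the time to copy one replicon can be estimated from its length and fork speed, assuming two forks that move in opposite directions at the tabulated rate. The function names are hypothetical, the human fork rate uses the lower bound of the tabulated range, and the 60-Mb chromosome in the last line is an illustrative assumption, not a value from the text.

```python
# (average replicon length in kbp, fork movement in bp/minute) from Table 7.3
TABLE_7_3 = {
    "Escherichia coli": (4_200, 50_000),
    "Saccharomyces cerevisiae": (40, 3_600),
    "Drosophila melanogaster": (40, 2_600),
    "Musca domestica": (45, 2_800),
    "Xenopus laevis": (200, 500),
    "Mus musculus": (150, 2_200),
    "Homo sapiens": (200, 2_800),  # lower bound of the 2,800-3,000 range
}


def replicon_time_min(length_kbp, fork_bp_per_min, bidirectional=True):
    """Minutes to copy one replicon with one or two forks."""
    forks = 2 if bidirectional else 1
    return length_kbp * 1_000 / (fork_bp_per_min * forks)


def single_origin_days(chromosome_bp, fork_bp_per_min=2_600):
    """Days needed if one fork had to copy a whole chromosome."""
    return chromosome_bp / fork_bp_per_min / (60 * 24)


for organism, (length_kbp, rate) in TABLE_7_3.items():
    print(f"{organism}: {replicon_time_min(length_kbp, rate):.1f} min per replicon")

# Hypothetical 60-Mb chromosome copied from a single origin by one fork.
print(f"{single_origin_days(60_000_000):.1f} days")
```

Under these assumptions an average Drosophila replicon is copied in under 8 minutes, whereas a single fork working alone on a chromosome of tens of megabases would need about two weeks, in line with the estimate given in the text.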

Replicons in Eukaryotes In a linear double-stranded DNA molecule, replication starts internally. Once DNA synthesis starts, it moves in both the directions generating an eye-shaped intermediate which is converted to a Y-shaped molecule when left-hand replicating fork reaches the respective terminal. The daughter polynucleotide strands are synthesized almost as soon as the parental strands separate. RNA primers seem to be universally needed for initiating DNA replication (Watson 1976). RNA primers are ordinarily removed from the new DNA chain. Extreme 3′-ends of linear DNA are left incomplete (Figure 7.13). So 3′-ends have single-stranded ends. Because of the redundant terminal sequences the tails will have complementary sequence at the 3′-tail that can pair to form two-unit length concatemer (Figure 7.14). Gap filling is done by PolI and ligase. This can be replicated to form tetramer and so forth. Specific nucleases then act to produce staggered nicks. These nicks are filled by a DNA polymerase to yield double helices.

Figure 7.13 Removal of RNA primer

Eukaryotic Chromosome Replication is Semi-Conservative

At chromosomal level, DNA replication was shown to be semi-conservative by Taylor et al. (1957). They labeled dividing root-tip cells of the broad bean Vicia faba with 3H-thymidine, which is specifically incorporated into newly synthesized DNA (Figure 7.15). It was found that each new chromosome had replicated semi-conservatively because it consisted of a radioactive, newly formed half and a parental half. When the new (hybrid) chromosomes were allowed to grow and divide on unlabeled media, half of the new chromosomes were labeled and the other half were unlabeled. Thus random labeling of chromosomes did not occur, and the mode of replication was consequently not dispersive. Semi-conservative replication of the eukaryotic chromosome is based essentially on an underlying structure that is a continuous length of DNA double helix. Examination of human, Chinese hamster and grasshopper chromosomes revealed the semi-conservative distribution of tritium-labeled DNA (Taylor 1960a,b). With the help of the technique used by Meselson and Stahl (1958), the semi-conservative mode of DNA replication was also proved in the higher plant Nicotiana tabacum. Semi-conservative replication was also demonstrated for the DNA of the alga Chlamydomonas (Sueoka 1960).
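The labeling pattern reported by Taylor and his associates follows directly from semi-conservative replication, as the following toy model shows. It is a sketch under simplifying assumptions: each chromosome is treated as a single DNA duplex written as a pair of strands, "L" stands for a 3H-labeled strand and "U" for an unlabeled one, and the function names are hypothetical.

```python
def replicate(duplex, labeled_medium):
    """Semi-conservative replication of one duplex.

    Each old strand is kept and paired with a newly made strand, which is
    labeled ("L") only if the medium contains 3H-thymidine.
    """
    new_strand = "L" if labeled_medium else "U"
    old_a, old_b = duplex
    return [(old_a, new_strand), (old_b, new_strand)]


def is_labeled(duplex):
    """A duplex shows up in autoradiography if either strand is labeled."""
    return "L" in duplex


parental = ("U", "U")

# First replication in 3H-thymidine: both daughters are hybrid and labeled.
first = replicate(parental, labeled_medium=True)
print(first)    # [('U', 'L'), ('U', 'L')]

# Second replication in unlabeled medium: each hybrid gives one unlabeled
# and one labeled daughter.
second = [replicate(d, labeled_medium=False) for d in first]
print(second)   # [[('U', 'U'), ('L', 'U')], [('U', 'U'), ('L', 'U')]]
```

A dispersive mode would instead spread label over every daughter in the second round, which is not what was observed.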

Figure 7.14 Gap filling with DNA polymerase I and DNA ligase


Figure 7.15 Experiment of Taylor and his associates on root tips of Vicia faba showing semi-conservative replication of eukaryotic chromosomes (Modified, with permission, from Taylor, J.H., et al. 1957. Proc. Natl. Acad. Sci. USA 43: 122-8)

Although the basic mechanism of DNA replication is the same in prokaryotes and eukaryotes, there are differences due to the complexity of eukaryotes. In ciliates, where trophic and genetic DNA is segregated into macronucleus and micronucleus, respectively, the S phase occurs at a different time in each nucleus. Mitochondrial and chloroplast DNA have a different time of replication from that of nuclear DNA. Every sequence is replicated once. The mass of the cell must double, so that there is sufficient material for the formation of daughter cells. The nuclear membrane plays an important role in DNA replication only if the origin of DNA replication lies on the membrane; otherwise it does not play any role. Before replication of DNA, cells enter G1 phase, during which they interpret a flood of signals. Any mistake at this level may lead to cancer. G2 phase prevents the newly duplicated chromosomes from replicating again before cell division.

Some Features of Eukaryotic DNA Replication

Kinases initiate eukaryotic DNA replication. Cyclin-dependent kinases (CDKs) drive major cell cycle events including the initiation of chromosomal DNA replication. Zagerman and Diffley (2007) identified two S phase CDK (S-CDK) phosphorylation sites in the budding yeast Sld3 protein that, together, are essential for DNA replication. When phosphorylated, these sites bind to the amino-terminal BRCT repeats of Dpb11. An Sld3-Dpb11 fusion construct bypasses the requirement for both Sld3 phosphorylation and the N-terminal BRCT repeats of Dpb11. Co-expression of this fusion with a phospho-mimicking mutant of a second essential CDK substrate, Sld2, promotes DNA replication in the absence of S-CDK. Therefore, Sld2 and Sld3 are the minimal set of S-CDK targets required for DNA replication. DNA replication in cells lacking G1 phase CDK (G1-CDK) required expression of the Cdc7 kinase regulatory subunit, Dbf4, as well as Sld2 and Sld3 bypass. This explains how G1- and S-CDKs promote DNA replication in yeast.


In eukaryotic cells, cyclin-dependent kinases (CDKs) have an important involvement at various points in the cell cycle. At the onset of S phase, active CDK is essential for chromosomal DNA replication. In budding yeast, the replication protein Sld2 is an essential CDK substrate, but its phosphomimetic form (Sld2-11D) alone neither affects cell growth nor promotes DNA replication in the absence of CDK activity, suggesting that other essential CDK substrates promote DNA replication. Tanaka et al. (2007) showed that both an allele of CDC45 (JET1) and high-copy DPB11, in combination with Sld2-11D, separately conferred CDK-independent DNA replication. Although Cdc45 is not an essential CDK substrate, CDK-dependent phosphorylation of Sld3, which associates with Cdc45, is essential and generates a binding site for Dpb11. Both the JET1 mutation and high-copy DPB11 bypass the requirement for Sld3 phosphorylation in DNA replication. Because phosphorylated Sld2 binds to the carboxy-terminal pair of BRCT domains in Dpb11, it was proposed that Dpb11 connects phosphorylated Sld2 and Sld3 to facilitate interactions between replication proteins, such as Cdc45 and GINS. CDKs regulate interactions between BRCT-domain-containing replication proteins and other phosphorylated proteins for the initiation of chromosomal DNA replication. Similar regulation may take place in higher eukaryotes. Eukaryotic DNA replicates only once per cell cycle. It is very important to understand how cells prevent re-initiation of DNA replication within a cell cycle. In several eukaryotes, cyclin-dependent kinases (CDKs) have been implicated in promoting the block to reinitiation. But the function of CDKs in this process is not understood. Nguyen et al.
(2001) showed that B-type CDKs in Saccharomyces cerevisiae prevent re-initiation through multiple overlapping mechanisms, including phosphorylation of the origin recognition complex (ORC), downregulation of Cdc6 activity, and nuclear exclusion of the Mcm2-7 complex. Only when all three inhibitory pathways are disrupted do origins re-initiate DNA replication in G2/M cells. Each of the three independent mechanisms of regulation is functionally important.

Initiation of eukaryotic DNA replication. For DNA replication to start, histones are displaced to one side. Hay and Russel (1989) suggest that initiation requires attachment of tumor antigen (T-ag) to the helix to unwind it, single-strand binding (SSB) proteins to stabilize the single-stranded DNA, Polα-primase to synthesize the RNA primer and to polymerize deoxyribonucleotides during initiation and synthesis of Okazaki fragments, and Polδ-PCNA to polymerize nucleotides on the leading strand.

Removal of RNA primer. DePamphilis and Wassarman (1980) suggested that the RNA primer is removed by RNase H together with another, unidentified enzyme. The gap is filled by a DNA polymerase (Pol) and ligase. Completion of replicons occurs when two oncoming replicons merge.

Okazaki fragments. Kriegstein and Hogness (1974) showed that discontinuous DNA synthesis in Drosophila proceeds in steps of 200 bp. This gives some indication of the unit of packing of histone and DNA. DNA polymerase on the leading strand assembles the molecule in a 5′→3′ direction, but polymerase on the lagging strand must go "against the grain", assembling DNA from a series of RNA-primed fragments (Alberts 1990).

End-joining reaction catalyzed by DNA ligases. The end-joining reaction catalyzed by DNA ligases is required by all organisms and serves as the ultimate step of DNA replication, repair and recombination processes. One of three well-characterized mammalian DNA ligases, DNA ligase I, joins Okazaki fragments during DNA replication. Pascal et al.
(2004) report the crystal structure of human DNA ligase I (residues 233 to 919) in complex with a nicked 5′-adenylated DNA intermediate. The structure shows that the enzyme redirects the path of the double helix to expose the nick termini for the strand-joining reaction. It also reveals a unique feature of mammalian ligases: a DNA-binding domain that allows ligase I to encircle its DNA substrate, stabilizes the DNA in a distorted structure, and positions the catalytic core on the nick. Similarities in the toroidal shape and dimensions of DNA ligase I and the proliferating cell nuclear antigen (PCNA) sliding clamp are suggestive of an extensive protein-protein interface that may coordinate the joining of the Okazaki fragments.

Eukaryotic DNA Polymerases

There are five major DNA polymerases known in eukaryotes. Activities and cellular functions of different DNA polymerases and their role in DNA replication are given in Table 7.5.

Table 7.5 Different eukaryotic DNA polymerases with respect to their activities and cellular functions (Reproduced from http://www.scribd.com/doc/23275282/Ch-12-DNA-Replication-and-Recombination)

DNA Polymerase | 5′→3′ Polymerase Activity | 3′→5′ Exonuclease Activity | Cellular Function
α (alpha) | Yes | No | Initiation of nuclear DNA synthesis and DNA repair
β (beta) | Yes | No | DNA repair and recombination of nuclear DNA
γ (gamma) | Yes | Yes | Replication of mitochondrial DNA
δ (delta) | Yes | Yes | Leading- and lagging-strand synthesis of nuclear DNA, DNA repair, and translesion DNA synthesis
ε (epsilon) | Yes | Yes | Unknown; probably repair and replication of nuclear DNA
ζ (zeta) | Yes | No | Translesion DNA synthesis
η (eta) | Yes | No | Translesion DNA synthesis
θ (theta) | Yes | No | DNA repair
ι (iota) | Yes | No | Translesion DNA synthesis
κ (kappa) | Yes | No | Translesion DNA synthesis
λ (lambda) | Yes | No | DNA repair
μ (mu) | Yes | No | DNA repair
σ (sigma) | Yes | No | Nuclear DNA replication (possibly), DNA repair, and sister-chromatid cohesion

DNA polymerase α. Its synonyms are RNA primase, DNA polymerase, and Polα. It forms a complex with a small catalytic (PriS) and a large non-catalytic (PriL) subunit; the Pri subunits act as a primase (synthesizing an RNA primer), and Polα then elongates that primer with DNA nucleotides. The molecular weight of Polα is >100 kDa. It copies activated double-stranded DNA. It has almost no ability to copy the oligo-homopolymer (A)n·dT15. It is strongly inhibited by reagents that block sulfhydryl groups. It is an acidic protein and is free of nuclease activity. Polα is involved in the replication of cellular DNA and of some mammalian viruses, such as SV40 and polyoma. Polα is believed to carry out lagging strand replication.

DNA polymerase β. Also known as Polβ, this enzyme is implicated in base excision repair and gap-filling synthesis. Molecular weight of Polβ is 100 kDa. It requires sulfhydryl-containing compounds for maximal activity.

Polγ is also seen in eukaryotic cells in resting and differentiated states but is usually localized in mitochondria, and it is believed to be involved exclusively in mitochondrial DNA replication. Polγ


is the sole DNA polymerase in animal mitochondria. Polγ also has proofreading 3′→5′ exonuclease activity. Earlier, DNA repair and recombination were considered to be limited or absent in mitochondria, but both processes have since been demonstrated there (Kaguni 2004). The mitochondrial replicase is apparently also responsible for the relevant DNA synthesis reactions in these processes. Polγ comprises a catalytic core in a heterodimeric complex with an accessory subunit. The two-subunit holoenzyme is an efficient and processive polymerase, which exhibits high fidelity in nucleotide selection and incorporation while proofreading errors with its intrinsic 3′→5′ exonuclease.

DNA polymerase δ. DNA polymerase δ (Polδ) is highly processive and has proofreading 3′→5′ exonuclease activity. It is thought to be the main polymerase involved in lagging strand synthesis. Polδ has been purified from several species. S. cerevisiae Polδ contains three subunits, whereas Polδ of the fission yeast S. pombe has four or five subunits. The mammalian enzyme consists of at least four subunits with apparent molecular weights of 125, 68, 50 and 12 kDa. The message level and enzyme activity of Polδ are upregulated when quiescent cells are induced to proliferate. During the cell cycle, both the mRNA and protein levels of Polδ fluctuate moderately, with a two- to three-fold increase at the G1/S phase. An E2F consensus binding-like element is found in the promoter of the human Polδ gene, but there is no report of a role for E2F in the regulation of the Polδ gene. However, transcription factors Sp1 and Sp3 have well-characterized roles in the stimulation of Polδ. The expression of the Polδ gene is downregulated by DNA damage, and this repression is mediated by p53. Polδ is a phosphoprotein and is hyperphosphorylated in S phase. When co-expressed with different cyclin-CDKs in insect cells, Polδ interacts with and becomes phosphorylated by cyclin D3/CDK4 and cyclin E/CDK2.
Late G1 and G1/S specific CDKs potentiate Polδ for S phase.

DNA polymerase ε. DNA polymerase ε (Polε) is also highly processive and has proofreading 3′→5′ exonuclease activity. It is related to Polδ and is thought to be the main polymerase involved in leading strand synthesis. It also has a role in gap filling of the lagging strand. Pursell et al. (2007) constructed a derivative of yeast Polε which retains high replication activity but has strongly reduced replication fidelity. Yeast strains with this Polε allele have elevated rates of T to A substitution mutations. The position and rate of these substitutions depend on the orientation of the mutational reporter and its location relative to origins of DNA replication, and reveal a pattern indicating that Polε participates in leading strand DNA replication. The budding yeast Polε consists of five polypeptides with molecular weights of 256, 79, 34, 30 and 29 kDa and a stoichiometry of 1:1:4:1:4. The Polε subunits are encoded by the POL2 (256 kDa A subunit), DPB2 (79 kDa B subunit), DPB3 (34 kDa C subunit and its 30 kDa proteolytic product) and DPB4 (29 kDa D subunit) genes. The 256-kDa polypeptide may undergo proteolytic cleavage, and the major product, a 145 kDa polypeptide without associated subunits, retains the catalytic polymerase and exonuclease activities. The POL2 and DPB2 subunits are essential for DNA replication and viability (Morrison et al. 1990; Araki et al. 1991a,b; Ohya et al. 2000).

Human DNA polymerase ι. Almost all DNA polymerases show a strong preference for incorporating the nucleotide that forms the correct Watson-Crick base pair with the template base. The catalytic efficiencies with which any given polymerase forms the four possible correct base pairs are roughly the same. Human DNA polymerase-ι (hPolι) is an exception to these rules. hPolι incorporates the correct nucleotide opposite a template adenine with a several hundred to several thousand fold greater efficiency than it incorporates the correct nucleotide opposite a template thymine, whereas its efficiency for correct nucleotide incorporation opposite a template guanine or cytosine is intermediate between these two extremes. Nair et al. (2004) presented the crystal structure of hPolι bound to a template primer and an incoming nucleotide. The structure revealed a polymerase that is "specialized" for Hoogsteen base-pairing, whereby the template base is driven to the syn conformation. Hoogsteen base-pairing offers a basis for the varied efficiencies and fidelities of hPolι opposite different template bases, and it provides an elegant mechanism for promoting replication through minor-groove purine adducts that interfere with replication.

DNA polymerases η, ι, κ, and Rev1. DNA polymerases Polη, Polι, Polκ, and Rev1 are less well characterized with respect to their structure and function.

DNA polymerase ζ. DNA polymerase ζ (Polζ) is involved in the bypass of DNA damage.

DNA polymerases θ, λ, φ, σ, and μ. DNA polymerases Polθ, Polλ, Polφ, Polσ, and Polμ are some other known eukaryotic polymerases, which are not well characterized.

Eukaryotic Gyrase

The presence of gyrase has not yet been clearly demonstrated in eukaryotes. However, DNA synthesis in CHO cells is blocked by novobiocin, an inhibitor of DNA gyrase. DNA gyrase is an enzyme which is used in replication of prokaryotic as well as eukaryotic DNA. As unwinding of the DNA strands proceeds from the point of origin onwards with the help of helicase (unwinding enzyme), supercoils are induced in the remainder of the molecule by the unwinding process. It becomes necessary to compensate for the induced supercoiling by unwinding the molecule in the opposite direction. DNA gyrase, identified by Gellert et al. (1981), seems to fulfill this function. Some heat-labile and heat-sensitive proteins are thought to extend the period of DNA synthesis and stimulate the joining of Okazaki fragments. In eukaryotes, gyrase relaxes negatively supercoiled DNA in the presence of ATP and Mg2+. A type I enzyme can also catalyze catenation of double-stranded circular DNA, or catenanes can be separated into simple circles, provided at least one circle contains a single-stranded break (Bollum and Peterson 1983) (Figure 7.16). A type II enzyme can also carry out the catenation reaction; in this case, no single-stranded break is required.

DNA topoisomerases are essential cellular enzymes that also function in the segregation of newly replicated chromosome pairs, in chromosome condensation and in altering DNA superhelicity (Berger et al. 1996). These enzymes work by cleaving one molecule of DNA and passing a second molecule through the resulting opening before resealing the break. The enzyme has two sets of jaws at opposite ends. An enzyme with bound DNA can admit a second DNA duplex through one set of jaws, transport it through the cleaved first duplex, and expel it through the other set of jaws.

Eukaryotic Primases

Figure 7.16 Catenation by topoisomerase. Two circular DNAs can be catenated by a type I topoisomerase only if one of the DNAs is nicked. This is not necessary when using a type II topoisomerase

Eukaryotic primases form a complex with Polα and its accessory B subunit. The small eukaryotic primase subunit contains the active site for RNA synthesis, and its activity correlates with DNA replication during the cell cycle (Frick and Richardson 2001). Once the DNA template is exposed, new DNA still cannot be synthesized until a primer is constructed. How is this primer formed? An important clue comes from the observation that RNA synthesis is essential for the initiation of DNA synthesis. This finding, taken together with the fact that RNA polymerases can start chains de novo, suggested that RNA might prime the synthesis of DNA. Kornberg and Baker (1992) then found that nascent DNA is covalently linked to a short stretch of RNA.

Some other Proteins Specially Required in Eukaryotic DNA Replication

Proliferating cell nuclear antigen. Inheritance requires genome duplication, reproduction of chromatin and its epigenetic information, mechanisms to ensure genome integrity, and faithful transmission of the information to progeny. Proliferating cell nuclear antigen (PCNA), a cofactor of DNA polymerases that encircles DNA, orchestrates several of these functions by recruiting crucial players to the replication fork (Moldovan et al. 2007). PCNA is a homotrimer that forms a ring-shaped sliding clamp, functionally analogous to the β subunit of E. coli DNA polymerase III. Many factors that are involved in replication-linked processes interact with a particular face of PCNA and through the same interaction domain, indicating that these interactions do not occur simultaneously during replication.

Replication factor A. Replication factor A (RFA) is a heterotrimeric single-strand binding (SSB) protein which is conserved in all eukaryotes. Since the availability of conditional mutants is an essential step in defining the functions and interactions of RFA in vivo, Longhese et al. (1994) produced and characterized mutations in the RFA1 gene, which encodes the p70 subunit of the complex in S. cerevisiae. This analysis provided the first in vivo evidence that RFA function is critical not only for DNA replication but also for efficient DNA repair and recombination. Moreover, genetic evidence indicated that p70 interacts with both the DNA Polα-primase complex and Polδ.

Replication factor C. Replication factor C (RFC) is structurally, functionally and evolutionarily related to the γ complex. RFC is a five-subunit protein complex that is required for DNA replication. The subunits of this heteropentamer are named Rfc1, Rfc2, Rfc3, Rfc4, and Rfc5 (in S. cerevisiae). RFC is used in eukaryotic replication as a clamp loader, similar to the γ complex in E. coli. Its role as clamp loader involves catalyzing the loading of PCNA on to DNA.
It binds to the 3′-end of the DNA and uses ATP hydrolysis to open the ring of PCNA so that it can encircle the DNA.

Replisome progression complex. The identity of the DNA helicase(s) involved in eukaryotic DNA replication is still a matter of debate, but the minichromosome maintenance (MCM) proteins are the chief candidates (Aparicio et al. 2006). Six conserved MCM proteins, Mcm2–7, are essential for the initiation and elongation stages of DNA replication, contain ATP-binding pockets and can form a hexameric structure resembling that of known prokaryotic and viral helicases. However, biochemical proof of their presumed function long remained elusive. Several recent reports confirm that the MCM complex is part of the cellular machine responsible for the unwinding of DNA during S phase. The helicase activity of Mcm2–7 is revealed when the complex is purified in association with two partners: the initiation factor Cdc45 and a four-subunit complex called GINS. The Cdc45-MCM-GINS complex could constitute the core of a larger macromolecular structure that has been termed the replisome progression complex (RPC).

Helix destabilizing (HD) proteins. Helix destabilizing (HD) proteins, also known as single-strand binding (SSB) proteins, prevent the reannealing of complementary DNA strands by binding specifically to single-stranded DNA, whereas unwinding enzymes are single-strand-dependent ATPases that catalyze unwinding of the DNA duplex. These proteins have been isolated from Vaccinia virus, yeast, rodent, calf and human cells. Mammalian HD proteins bind to single-stranded DNA, keeping the strand in an extended condition and increasing the affinity of DNA polymerase for the template. In adenovirus (Ad) replication, there is a 72-kDa SSB protein which keeps natural single-stranded DNA in an extended condition. This protein forms the major portion of the Ad DNA replication complex. If the replication complexes are exposed to 38 °C, the 72-kDa protein is inactivated.
Perhaps HD and unwinding proteins act synergistically to prevent rewinding of DNA.


RNA Primer Synthesis and its Removal

Since all DNA polymerases require a template and a 3′-OH nucleotide primer to begin DNA synthesis, de novo RNA synthesis was suggested as a mechanism for providing primers for initiation of DNA synthesis. RNA primers are oligoribonucleotides 8-12 residues long which contain a purine ribonucleoside-5′-phosphate at the 5′-end and 1-3 deoxyribonucleotides at the 3′-end. The removal of RNA primers from Okazaki fragments, which allows two adjacent Okazaki fragments to be joined, takes place in two steps. In the first step, all except the last ribonucleotide are removed by RNase H, which acts endonucleolytically. In the second step, the remaining linkage between the last ribonucleotide and the first deoxyribonucleotide is broken by the 5′→3′ exonuclease activity of DNA polymerase. If nascent DNA is isolated at any stage, Okazaki fragments are found in three states: (a) those with only a phosphodiester gap, i.e., requiring only ligase activity; (b) those requiring gap filling as well as closure of a phosphodiester bond gap, thus requiring DNA polymerase as well as ligase activity; and (c) those with intact RNA primers, which need DNA polymerase, RNase and DNA ligase activity. In addition, some cytoplasmic factors (proteins) are also required for gap filling.
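The two-step removal of the RNA primer can be illustrated with a small Python sketch. This is only a toy model under stated assumptions: the fragment is represented as a list of (kind, base) pairs, the primer and DNA sequences are invented for illustration, and the function names (`build_fragment`, `rnase_h_step`, `exonuclease_step`) are hypothetical. The primer is assumed to be a contiguous block of ribonucleotides at the 5′-end.

```python
def build_fragment(rna_primer, dna):
    """Represent an Okazaki fragment as (kind, base) pairs, written 5'->3'.
    kind is "r" for a ribonucleotide and "d" for a deoxyribonucleotide."""
    return [("r", b) for b in rna_primer] + [("d", b) for b in dna]


def rnase_h_step(fragment):
    """Step 1: remove all ribonucleotides except the last one.
    Assumes the ribonucleotides form one contiguous block at the 5'-end."""
    n_ribo = sum(1 for kind, _ in fragment if kind == "r")
    return fragment[max(n_ribo - 1, 0):]


def exonuclease_step(fragment):
    """Step 2: remove the remaining ribonucleotide at the RNA-DNA junction."""
    return [unit for unit in fragment if unit[0] != "r"]


# Illustrative sequences only: a 10-residue primer starting with a purine (G).
fragment = build_fragment("GAUCUGAUCA", "ATGCCGTA")
after_step1 = rnase_h_step(fragment)      # one ribonucleotide plus the DNA
after_step2 = exonuclease_step(after_step1)  # DNA only

print(len(fragment), len(after_step1), len(after_step2))
```

After the second step the fragment contains only deoxyribonucleotides, which corresponds to state (b) or (a) in the text, depending on whether a gap remains to be filled before ligation.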

Histone Synthesis takes Place along with DNA Replication

Along with DNA replication, histones are also synthesized. The mode of distribution of new and old histones is conservative, i.e., older histones go with the older strand and the new histones go with the new strand of the DNA molecule. Proof for the conservative mode of distribution of histones comes from the following experiment. When cells are treated with cycloheximide, an inhibitor of protein synthesis, and the DNA is then observed under the electron microscope, stretches of DNA are seen in which one arm carries nucleosomes while the other arm is free of nucleosomes. Thus, in the absence of new histones, only the older histones exist; they remain on one arm, and the other arm is without histones. However, the normal configuration is regained a few minutes after removal of cycloheximide.

Telomere Replication

Chromosome end replication problem

The ends of linear chromosomes present a problem for the replication machinery. Specifically, DNA polymerase cannot synthesize the extreme 5′-end of the lagging strand. Even if an RNA primer were paired with the 3′-end of the DNA template, it could not be replaced with DNA (recall that DNA polymerase operates only in the 5′→3′ direction, it can only extend an existing primer, and the primer must be bound to its complementary strand). The primary difficulty with telomeres is the replication of the lagging strand. Because DNA synthesis requires an RNA primer (which provides the free 3′-OH group) to prime DNA replication, and this primer is eventually degraded, a short single-stranded region would be left at the end of the chromosome. This region would be susceptible to enzymes that degrade single-stranded DNA. The result would be that the chromosome would be shortened after each division. This would eventually lead to the loss of essential genetic information at the ends of the chromosomes.

Modes of replication of DNA termini

Three modes of replication of DNA termini are: (a) terminal hairpins, (b) recombination-mediated replication, and (c) telomerase-mediated replication. Drosophila chromosomes without telomeres are not particularly susceptible to degradation, end-to-end fusions, or loss. The problem with DNA replication at telomeres is that the 3′-end at the telomere remains single-stranded after removal of the primer. Telomere terminal transferase adds (T2G4)n sequences at the 3′-ends of Tetrahymena telomeres and (TTAGGG)n sequences at the 3′-ends of human telomeres. To compensate for the gap left by removal of the primer at the 5′-end, the 3′-tail could either form a hairpin structure by non-Watson-Crick base pairing, which serves as a primer, or the single-stranded 3′-tails of two telomeres could function as primers for each other. Nucleosomes segregate conservatively, with old histones on leading strands and new histones on lagging strands.

Telomerase

Telomerase, an excellent candidate for a telomere-specific polymerase, is a ribonucleoprotein enzyme with essential RNA and protein components (Blackburn 1990, 1992). Telomerase extends the old 3′-end of the G-rich strand, providing a longer template for the new 5′-end. Telomerase binds to the ends of the chromosome. The RNA component of telomerase serves as the template for adding repeats to the end of the telomere (Figure 7.17). After a new repeat has been added, the telomerase translocates to the newly synthesized end.

Figure 7.17 Replication of chromosome end (Modified from bioserv.fiu.edu/.../dna_replication.htm)


An additional round of telomere replication then occurs. Mutating the CAACCCCAA sequence in the RNA component of telomerase causes in vivo synthesis of new telomere sequences corresponding to the mutated RNA sequence, demonstrating that the telomerase contains the template for telomere synthesis (Yu et al. 1990). The mutations also lead to nuclear and cell division defects and to senescence, establishing an essential role for telomerase in vivo. DNA termini from Tetrahymena and Oxytricha, which bear C4A2 and C4A4 repeats, respectively, can support telomere formation in Saccharomyces cerevisiae by serving as substrates for the addition of yeast telomeric C1-3A repeats (Wang and Zakian 1990). These results provide strong evidence for a novel recombination process involving a gene conversion event that requires little homology at or near the boundary of telomeric and non-telomeric DNA and that resembles the recombination process involved in bacteriophage T4 DNA replication.

The action of the telomerase enzyme ensures that the ends of the lagging strands are replicated correctly. A well-studied system is the protozoan Tetrahymena. The telomeres of this organism end in the sequence 5′-TTGGGG-3′. The telomerase adds a series of 5′-TTGGGG-3′ repeats to the ends of the lagging strand. A hairpin forms when unusual base pairs arise between guanine residues in the repeat. Next, the RNA primer is removed, and the 5′-end of the lagging strand can be used for DNA synthesis. Ligation occurs between the finished lagging strand and the hairpin. Finally, the hairpin is removed at the 5′-TTGGGG-3′ repeat. In humans, the telomere has a repeated sequence of 5′-TTAGGG-3′.
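The relationship between the telomerase RNA template and the telomeric repeat, and the effect of end shortening, can be made concrete with a short Python sketch. It is illustrative only and rests on assumptions that are not stated in the text: the template CAACCCCAA is taken to be written 5′→3′, the function names are hypothetical, and the telomere length and loss per division used in the example are arbitrary numbers, not measured values.

```python
# Base-pairing rule for copying an RNA template into DNA.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def dna_copied_from_rna_template(rna_5_to_3):
    """Return the DNA strand (5'->3') synthesized on an RNA template.
    The template is read 3'->5', so it is reversed before pairing."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna_5_to_3))


def extend_g_strand(g_strand, repeat, n_repeats):
    """Add n_repeats copies of the telomeric repeat to the 3'-end."""
    return g_strand + repeat * n_repeats


def length_after_divisions(length, loss_per_division, divisions):
    """Telomere length after repeated divisions without telomerase
    (simple linear loss; the length cannot drop below zero)."""
    return max(length - loss_per_division * divisions, 0)


copied = dna_copied_from_rna_template("CAACCCCAA")
print(copied)                                   # TTGGGGTTG
print(extend_g_strand("TTGGGG", "TTGGGG", 2))   # three repeats in total
print(length_after_divisions(1000, 50, 5))      # 750 (arbitrary numbers)
```

Under the assumed orientation, the DNA copied from the template contains the Tetrahymena repeat TTGGGG, which is consistent with the observation that mutating the template changes the telomere sequence synthesized in vivo.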

REPLICATION OF MITOCHONDRIAL DNA

Mitochondria in animal cells contain a circular DNA. Mitochondrial DNA is replicated by a displacement loop, or D-loop, mechanism (Figure 7.18A-C). The origin of replication for one strand is at a different location from the origin for the second strand. About 100 minutes are needed for completion of mitochondrial DNA replication. A single D-loop is found as an opening of 500-600 bases in mammalian mitochondria. The short strand that maintains the D-loop is unstable and turns over: it is frequently degraded and resynthesized to maintain the opening of the duplex at this site. Some mitochondrial DNAs, such as that of Xenopus laevis, possess a single but longer D-loop. Others possess several D-loops; there may be as many as six in the linear mitochondrial DNA of Tetrahymena.

Figure 7.18 D-loop model for replication of mitochondrial DNA (Modified, with permission, from Gupta, P.K. 2009. Genetics. Meerut: Rastogi Publi.)

Replication of mitochondrial DNA starts in the same way as replication of nuclear DNA: with the synthesis of a short RNA that is extended by a DNA polymerase. The leading strand (heavy strand or H-strand) origin in mammalian mitochondrial DNA is provided by the promoter for transcription of the H-strand. RNA synthesis starts at this promoter: if it continues around the template it generates a mitochondrial transcript; if it terminates in the region of the D-loop, the RNA is either degraded or used to sponsor DNA replication. The mtDNA replication is initiated by a large RNA primer synthesized by RNA polymerase, which is cleaved by the ribonucleoprotein endonuclease MRP. DNA synthesis is initiated by Polγ. To replicate mammalian mitochondrial DNA, the short strand in the D-loop is extended. The displaced region of the original complementary strand (light strand, L-strand) becomes longer, expanding the D-loop. This expansion continues until it reaches a point about two-thirds of the way around the circle. Replication of this region exposes an origin in the displaced L-strand. Synthesis of an H-strand is initiated at this site, proceeding around the displaced single-stranded L template in the opposite direction from L-strand synthesis. Because of the lag in its start, H-strand synthesis has proceeded only a third of the way around the circle when L-strand synthesis finishes. This releases one completed duplex circle and one gapped circle, which remains partially single-stranded until synthesis of the H-strand is completed. Finally, the new strands are sealed to become covalently intact.
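The "one third of the way around" figure follows from simple arithmetic, sketched below in Python. The sketch assumes that both strands are synthesized at the same constant rate and that the second strand starts exactly when the first has covered two-thirds of the circle; neither assumption is stated quantitatively in the text, and the function name is hypothetical.

```python
from fractions import Fraction


def second_strand_progress(first_strand_progress, trigger=Fraction(2, 3)):
    """Fraction of the circle copied by the second strand, given the fraction
    copied by the first strand. Assumes equal, constant synthesis rates and a
    second origin that fires once the first strand passes `trigger`."""
    return max(first_strand_progress - trigger, Fraction(0))


# When the first strand has just finished (progress = 1):
done = second_strand_progress(Fraction(1))
print(done)        # 1/3 of the circle copied by the second strand
print(1 - done)    # 2/3 of the gapped circle is still single-stranded
```

Before the expanding D-loop reaches the second origin, the function returns zero, which corresponds to the period in which only one strand is being copied.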

REPLICATION OF CHLOROPLAST DNA

Chloroplasts contain multiple copies of a DNA molecule. One DNA molecule in a chloroplast is called the plastome. The plastome encodes many of the gene products required to perform photosynthesis. The plastome is replicated by nuclear-encoded proteins, and its copy number seems to be highly regulated by the cell in a tissue-specific and developmental manner (Heinhorst and Cannon 1993). A partially purified algal protein mixture which supports in vitro DNA replication consists of soluble proteins and proteins extracted from the thylakoid membrane. The membrane extract is essential for the specific initiation of replication at a displacement loop (D-loop) site previously mapped by electron microscopy (Nie et al. 1987). The D-loop site and its flanking sequences have been cloned and sequenced. Some proteins in the membrane extract bind strongly and specifically to a 494-bp restriction fragment which partially overlaps the D-loop site. Analyses of the protein-DNA complex identified three DNA-binding polypeptides with apparent molecular weights of 18, 24 and 26 kDa, respectively. Treatment with chloramphenicol, an inhibitor of chloroplast protein synthesis, for 1 h has no obvious effect on the content of the 24- or 26-kDa polypeptide but significantly reduces the content of the 18-kDa polypeptide in the membrane extract.

Chloroplast DNA replication in C. reinhardtii is initiated by the formation of a displacement loop (D-loop) at a specific site (Madeline et al. 1986). One D-loop site with its flanking sequence was cloned in recombinant plasmids SC3-1 and R-13. The sequence of the chloroplast DNA insert in SC3-1, which includes the 0.42-kb D-loop region, as well as 0.2 kb to the 5′-end and 0.43 kb to the 3′-end of the D-loop region, was determined. The sequence is A+T-rich and contains four large stem-loop structures. An open reading frame potentially coding for a polypeptide of 136 amino acids was detected in the D-loop region.
One stem-loop structure and two back-to-back prokaryotic-type promoters were mapped within the open reading frame. The 5.5-kb EcoRI fragment cloned in R-13 contains the 1.05-kb SC3-1 insert and its flanking regions. A yeast autonomously replicating sequence (ARS) and an ARC sequence, which promotes autonomous replication in Chlamydomonas, have been mapped within the flanking regions (Vallet and Rochaix 1985). Both R-13 and SC3-1 were active as templates in a crude algal preparation that supports DNA synthesis. In this in vitro system, chloroplast DNA synthesis is initiated near the D-loop site.

Chiang and Sueoka (1967) presented unequivocal evidence for a semi-conservative mode of chloroplast DNA replication and a difference in the time of replication of chloroplast DNA and chromosomal DNA during the vegetative growth cycle of C. reinhardtii. These results showed for the first time that replication of chloroplast DNA is semi-conservative and regulated independently from that of chromosomal DNA. Regular replication of chloroplast DNA ensures its physical conservation and regular transmission to succeeding generations. The results also supported the idea that the chloroplast has a genome and a self-duplicating mechanism of its own.

MODELS OF DNA REPLICATION

There are five models of DNA replication: rolling circular model, loop rolling model, butterfly model, D-loop model and fork model. These models are discussed briefly.

Rolling Circular Model

Gilbert and Dressler (1968) proposed this model. Rolling circle is an alternative form of replication for circular DNA molecules in which an origin is nicked to provide a priming end. One strand of DNA is synthesized from this end, displacing the original partner strand, which is extruded as a tail (Figure 7.19). One round of replication, i.e., from origin to origin, leads to formation of one concatemer. As the replication is not terminated, a number of copies of concatemers are formed. Each concatemer can give rise to a copy of the bacteriophage, and thus a number of bacteriophages are produced. The rolling circle model has been validated in a number of systems, including the late stages of growth of phage lambda, transfer of the E. coli sex factor, replication of particular DNA sections in Xenopus that are specifically associated with production of ribosomal RNA, and replication of some single-stranded DNA phages such as ΦX174.
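How continued synthesis around a circle produces a tail of repeated genome-length units can be sketched in a few lines of Python. This is a toy model with a made-up six-base "genome" and hypothetical function names; it tracks only the sequence of one strand and ignores complementarity, priming and the enzymes involved.

```python
def rolling_circle_tail(circle, nick_index, n_rounds):
    """Sequence extruded as a tail after n_rounds around a circular template.
    One round runs from the origin (the nick) back to the origin."""
    unit = circle[nick_index:] + circle[:nick_index]  # linear unit starting at the nick
    return unit * n_rounds


def cut_into_units(tail, unit_length):
    """Cut the tail into genome-length units, as happens when progeny
    genomes are released from the tail."""
    return [tail[i:i + unit_length] for i in range(0, len(tail), unit_length)]


circle = "ATGCGT"  # illustrative circular sequence, not a real genome
tail = rolling_circle_tail(circle, nick_index=2, n_rounds=3)
units = cut_into_units(tail, len(circle))

print(tail)    # GCGTATGCGTATGCGTAT
print(units)   # three identical genome-length units
```

Because replication is not terminated after one round, the number of units in the tail grows with the number of rounds, which is why one circle can give rise to many progeny genomes.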

Loop Rolling Model

Rolling circles are used to replicate some phages. The A protein that nicks the ΦX174 origin has the unusual property of cis-action: it acts only on the DNA from which it was synthesized. It remains attached to the displaced strand until an entire strand has been synthesized, and then nicks the origin again, releasing the displaced strand and starting another cycle of replication (Figure 7.20). Rolling circles are also involved in bacterial conjugation, when an F plasmid is transferred from a donor to a recipient cell following the initiation of contact between the cells by means of the F-pili. A free F plasmid infects new cells by this means; an integrated F factor creates an Hfr strain that may transfer chromosomal DNA. In conjugation, replication is used to synthesize complements to the single strand remaining in the donor and to the single strand transferred to the recipient, but it does not provide the motive power.

Butterfly Model

DNA replication via this model occurs in double-stranded plasmids, e.g., cryptic plasmids. The butterfly mode of replication slightly resembles theta replication. Topoisomerases nick one strand and it unwinds as in theta replication. However, in theta replication, the unreplicated portion of the plasmid DNA remains in its usual circular form, while in the butterfly model it is supercoiled. Therefore, following one round of replication, one of the strands is nicked and the other is supercoiled. The nicked molecule is then sealed by topoisomerase and also supercoiled.

Figure 7.19 Rolling circular (cycle) replication model of circular double helix (Redrawn from http://biosiva.50webs.org/rep3.htm)

Figure 7.20 Loop rolling model of DNA replication (Redrawn from http://biosiva.50webs.org/rep3.htm)

D-Loop Model The origins of replication in both prokaryotic and eukaryotic chromosomes are static structures: they comprise sequences of DNA that are recognized in duplex form and used to initiate replication at the appropriate time. Initiation requires separating the DNA strands and commencing bidirectional DNA synthesis. In some animal cells (vertebrates), D-loops have been observed in replicating mitochondrial DNA. (See Figure 7.18 for D-loop model of DNA replication.)

Fork Model The most extensively utilized mechanism of DNA replication involves formation of replication forks, moving in one direction in unidirectional replication and in both directions in bidirectional replication. This model can apply to both linear as well as circular DNA molecules. Accordingly, the fork model is of two types: linear fork model and circular fork model. The linear fork model is unidirectional, i.e., DNA replication proceeds in one direction with respect to the origin. This gives rise to a Y-shaped fork structure, e.g., DNA replication in ColE1, pUC plasmids, pBR322, etc. Circular fork model has only been studied


in prokaryotes. This model is bidirectional, i.e., replication proceeds in both directions with respect to the origin. This gives rise to a θ structure, which rotates around its axis to give rise to an 8-shaped structure, which further differentiates to form two daughter molecules (Figure 7.21).

DUPLICATION OF RNA The similarity in general structure between RNA and DNA, when first noted, also indicated a similarity in the duplication mechanism. Actual knowledge about RNA duplication was gained through work on some RNA viruses. The genetic RNA of viruses is self-replicating, i.e., it can reproduce its own replica by itself. So its mode of replication is called RNA-dependent RNA synthesis. The viral RNA functions directly as a messenger RNA, which in association with ribosomal apparatus of the host directs the synthesis of both the RNA polymerase (required for RNA replication) and the proteins of the viral coat. With the mediation of RNA polymerase and on the standard base-pairing principles, the viral RNA serves as a template in the synthesis of a complementary RNA chain, and thus a double-stranded structure is produced.

General Strategies in RNA Virus Replication

Figure 7.21 Model for replication of circular double helical DNA based on formation of double forks due to bidirectional DNA replication (Modified, with permission, from Gupta, P.K. 2009. Genetics. Meerut: Rastogi Publ.)

Replication of phage RNA has two unique features. Firstly, RNA serves as a template for the synthesis of an RNA strand. Secondly, there is direct generation of single strands without the formation of duplex structures. Both the template and the product are single-stranded. In viruses whose virion (genomic) RNA is double-stranded, the RNA cannot function as mRNA; these viruses therefore also need to package an RNA polymerase to make their mRNA after infection of the host cell. RNA viruses without a DNA phase Viruses that replicate via RNA intermediates need an RNA-dependent RNA polymerase to replicate their RNA, but animal cells do not seem to possess a suitable enzyme. Therefore, this type of animal RNA virus needs to code for an RNA-dependent RNA polymerase. No viral proteins can be made until viral messenger RNA is available. Thus, the nature of the RNA in the virion affects the strategy of the virus.


Viral RNA replicase Replication of phage RNA takes place by means of viral RNA replicase (RNA-dependent RNA polymerase). The enzyme consists of four subunits; three of them (α, γ, δ) are host polypeptides while one is a virus-specified polypeptide. The γ and the δ subunits are the protein synthesis elongation factors EF-Tu and EF-Ts, respectively. The α subunit is a component of the 30S ribosomal subunit. For plus-strand replication, a host-specific factor, HFI, is required. The replicases of the different phages show high specificity and recognize only the RNAs of phages of the same group. Their recognition site is a proper steric arrangement of two CCC sequences. One of these is present at the 5' end of all phage RNAs, while the other is present at slightly different distances in the different groups of viruses. The secondary structure of the RNA fixes the relative positions of the two CCC sequences. RNA viruses with a DNA phase These are the retroviruses. In this case, the virion RNA, although plus-sense, does not function as mRNA immediately on infection since it is not released from the capsid into the cytoplasm. Instead, it serves as a template for reverse transcriptase and is copied into DNA. Reverse transcriptase is not available in the cell, and so these viruses need to code for this enzyme and package it in virions.

POLYMERASE CHAIN REACTION In 1983, Kary B. Mullis, a biochemist working for the Cetus Corporation, devised the technique known as the polymerase chain reaction (PCR) (Saiki et al. 1985). He was awarded the Nobel Prize in Chemistry in 1993 for the invention of PCR (Mullis 1993). The polymerase chain reaction is also known as “molecular photocopying”. The technique is based on knowledge, gained over a long period of time, of in vitro systems of DNA replication. The only requirement is that the sequence of nucleotides on either side of the sequence of interest should be known. Different steps of PCR are shown in Figure 7.22. Since the sequence of nucleotides on either side of the DNA sequence to be amplified is known, primers are constructed on either side of the sequence of interest. Once that is done, the sequence between the primers can be amplified.

Figure 7.22 Different steps of polymerase chain reaction (PCR) (Redrawn from http://academic.brooklyn.cuny.edu/biology/bio4fv/page/genetic-engin/pcr.html)

The primers and the ingredients for DNA replication are added to the sample. Then, the mixture is heated to denature the DNA (e.g., 95 °C for 20 seconds). The temperature is then lowered so that the primers can anneal to their complementary sequences (e.g., 55 °C for 20 seconds). The temperature is then raised for elongation/extension of the DNA (e.g., 72 °C for 20 seconds). Then, a new cycle of replication is initiated. The various stages in the cycle are controlled by changes in temperature since the temperatures for denaturation, primer annealing, and DNA replication are different. The three steps (denaturation, annealing and elongation/extension) are carried out in an automated machine called a thermocycler, in which temperature is controlled automatically. Because the number of copies doubles in each cycle, about a million copies of the DNA are made in 20 cycles and about a billion copies in 30 cycles. The technique is aided by using the DNA polymerase from a hot-springs bacterium, Thermus aquaticus, known as Taq polymerase, which can withstand the denaturing temperatures (Saiki et al. 1988). Thus, fresh polymerase does not have to be added to the reaction mixture after each cycle of replication. PCR has applications in forensics. In its most discriminating form, genetic fingerprinting can uniquely discriminate any one person from the entire population of the world. Minute samples of DNA can be isolated from a crime scene and compared to that from suspects, or to a DNA database of earlier evidence or convicts. Simpler versions of these tests are often used to rapidly rule out suspects during a criminal investigation. Another common application of PCR is the study of patterns of gene expression. Tissues (or even individual cells) can be analyzed at different stages to see which genes have become active, or which have been switched off. Standard PCR suffers from a few drawbacks. The sequence at the 5′-end of each strand of the target DNA must be known so that primers can be designed, and synthesis from each primer proceeds only towards the target.
Contamination of the reaction mixture with extraneous DNA can also occur, which may create problems in both research and diagnostic applications. There are many variants of the polymerase chain reaction.
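The two central ideas of PCR, namely that the primers define the amplified region and that the copy number doubles in each cycle, can be sketched in a few lines of Python. This is a minimal illustration, not a primer-design tool: the template and primer sequences are invented, the function names are hypothetical, and a perfect doubling per cycle (`efficiency=1.0`) is an idealizing assumption.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def amplicon(template: str, fwd_primer: str, rev_primer: str):
    """Return the stretch of `template` bounded by the two primers, or None.

    `template` is the top strand written 5'->3'. The forward primer matches
    the top strand; the reverse primer (also written 5'->3') anneals to the
    top strand, so its reverse complement is searched for downstream.
    """
    start = template.find(fwd_primer)
    if start == -1:
        return None
    rev_site = reverse_complement(rev_primer)
    end = template.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


def copies_after(cycles: int, starting_copies: int = 1, efficiency: float = 1.0) -> float:
    """Copies after `cycles` rounds; efficiency 1.0 means perfect doubling."""
    return starting_copies * (1 + efficiency) ** cycles


# Invented example sequences (assumptions for illustration only)
template = "TTGACCGTAAGCTTGGCATCGATCCGGAATTCAGG"
product = amplicon(template, "GACCGTAAG", "GAATTCCGG")
print(product)
print(copies_after(20), copies_after(30))
```

With perfect doubling, 20 cycles give 2^20 (about a million) copies and 30 cycles give 2^30 (about a billion), in line with the figures quoted in the text; real reactions fall short of this as reagents become limiting.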

REFERENCES
Alberts, B.M. 1990. Recipes for replication. Nature 346: 514-5.
Alberts, B.M., and R. Sternglanz. 1977. Recent excitement in the DNA replication problem. Nature 269: 655-61.
Allfrey, V.G., and A.E. Mirsky. 1962. Evidence for the complete DNA-dependence of RNA synthesis in isolated thymus nuclei. Proc. Natl. Acad. Sci. USA 48: 1590-6.
Aparicio, T., A. Ibarra, and J. Méndez. 2006. Cdc45-MCM-GINS, a new power player for DNA replication. Cell Div. 1: 1-18.
Arai, K., and A. Kornberg. 1981. Unique primed start of phage φX174 DNA replication and mobility of primosome in the direction opposite to chain synthesis. Proc. Natl. Acad. Sci. USA 78: 69-73.
Araki, H., R.K. Hamatake, L.H. Johnston, and A. Sugino. 1991a. DPB2, the gene encoding DNA polymerase II subunit B, is required for chromosome replication in Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA 88: 4601-5.
Araki, H., R.K. Hamatake, A. Morrison, A.L. Johnson, L.H. Johnston, and A. Sugino. 1991b. Cloning DPB3, the gene encoding the third subunit of DNA polymerase II of Saccharomyces cerevisiae. Nucl. Acids Res. 19: 4867-72.
Baker, T.A., and S.H. Wickner. 1992. Genetics and enzymology of DNA replication in Escherichia coli. Annu. Rev. Genet. 26: 447-77.
Benbow, R.M., J. Zhao, and D.D. Larson. 1992. On the nature of origins of DNA replication in eukaryotes. BioEssays 14(10): 661-70.
Berger, J.M., S.J. Gamblin, S.C. Harrison, and J.C. Wang. 1996. Structure and mechanism of DNA topoisomerase II. Nature 379: 225-32.
Blackburn, E.H. 1990. Telomere structure and synthesis. J. Biol. Chem. 265: 5919-21.
Blackburn, E.H. 1992. Telomerases. Annu. Rev. Biochem. 61: 113-29.
Bollum, F.J., and R.C. Peterson. 1983. DNA metabolism. Chapter 20. In: Zubay, G. (ed.). Biochemistry. pp. 737-85. California: Addison-Wesley Publishing Company.
Chiang, K.S., and N. Sueoka. 1967. Replication of chloroplast DNA in Chlamydomonas reinhardi during vegetative cell cycle: its mode and regulation. Proc. Natl. Acad. Sci. USA 57: 1506-13.
Delbruck, M., and G.S. Stent. 1957. On the mechanism of DNA replication. In: The Chemical Basis of Heredity. pp. 699-736. Eds. McElroy, W.D., and B. Glass. Baltimore: Johns Hopkins Press.
DePamphilis, M.L., and P.M. Wassarman. 1980. Replication of eukaryotic chromosomes: a close-up of the replication fork. Annu. Rev. Biochem. 49: 627-66.
Frick, D.N., and C.C. Richardson. 2001. DNA primases. Annu. Rev. Biochem. 70: 39-80.
Gellert, M., L.M. Fisher, H. Ohmori, M.H. O'Dea, and K. Mizuuchi. 1981. DNA gyrase: site-specific interactions and transient double-strand breakage of DNA. Cold Sp. Harb. Symp. Quant. Biol. 45: 391-8.
Gilbert, W., and D. Dressler. 1968. DNA replication: the rolling circle model. Cold Sp. Harb. Symp. Quant. Biol. 33: 473-84.
Hay, R.T., and W.C. Russell. 1982. Recognition mechanisms in the synthesis of animal virus DNA replication. Biochem. J. 258: 3-16.
Heck, M.M.S., and A.C. Spradling. 1990. Multiple replication origins are used during Drosophila chorion gene amplification. J. Cell Biol. 110: 903-14.
Heinhorst, S., and G. Cannon. 1993. DNA replication in chloroplasts. J. Cell Sci. 104: 1-9.
Jacob, F., S. Brenner, and F. Cuzin. 1963. On the regulation of DNA replication in bacteria. Cold Sp. Harb. Symp. Quant. Biol. 28: 329-48.
Johnson, D.S., L. Bai, B.Y. Smith, S.S. Patel, and M.D. Wang. 2007. Single-molecule studies reveal dynamics of DNA unwinding by the ring-shaped T7 helicase. Cell 129: 1299-309.
Kornberg, A., and T.A. Baker. 1992. DNA Replication. University Science Books.
Koster, D.A., V. Croquette, C. Dekker, S. Shuman, and N.H. Dekker. 2005. Friction and torque govern the relaxation of DNA supercoils by eukaryotic topoisomerase IB. Nature 434: 671-4.
Kriegstein, H.J., and D.S. Hogness. 1974. Mechanism of DNA replication in Drosophila melanogaster: structure of replication forks and evidence for bidirectionality. Proc. Natl. Acad. Sci. USA 71: 135-9.
Levinthal, C. 1956. The mechanism of DNA replication and genetic recombination in phage. Proc. Natl. Acad. Sci. USA 42: 394-404.
Longhese, M.P., P. Plevani, and G. Lucchini. 1994. Replication factor A is required in vivo for DNA replication, repair, and recombination. Mol. Cell Biol. 14: 7884-90.
Madeline W., J.K. Lou, D.Y. Chang, C.H. Chang, and Z.Q. Nie. 1986. Structure and function of a chloroplast DNA replication origin of Chlamydomonas reinhardtii. Proc. Natl. Acad. Sci. USA 83: 6761-5.
McKenzie, G.J., P.L. Lee, M.J. Lombardo, P.J. Hastings, and S.M. Rosenberg. 2001. SOS mutator DNA polymerase IV functions in adaptive mutation and not adaptive amplification. Mol. Cell 7: 571-9.
Meselson, M.S. 1960. The Cell Nucleus. Mitchell, J.S. ed. pp. 240-245. New York: Academic Press.
Meselson, M.S., and F.W. Stahl. 1958. The replication of DNA in Escherichia coli. Proc. Natl. Acad. Sci. USA 44: 671-82.
Meselson, M.S., and J.J. Weigle. 1961. Chromosome breakage accompanying genetic recombination in bacteriophage. Proc. Natl. Acad. Sci. USA 47: 857-68.
Moldovan, G.-L., B. Pfander, and S. Jentsch. 2007. PCNA, the maestro of the replication fork. Cell 129: 665-79.
Morrison, A., H. Araki, A.B. Clark, R.K. Hamatake, and A. Sugino. 1990. A third essential DNA polymerase in S. cerevisiae. Cell 62: 1143-51.
Mullis, K. 1993. The Polymerase Chain Reaction. Nobel Lecture.
Nair, D.T., R.E. Johnson, S. Prakash, L. Prakash, and A.K. Aggarwal. 2004. Replication by human DNA polymerase-ι occurs by Hoogsteen base-pairing. Nature 430: 377-80.
Nguyen, V.Q., C. Co, and J.J. Li. 2001. Cyclin-dependent kinases prevent DNA re-replication through multiple mechanisms. Nature 411: 1068-72.
Nie, Z.Q., D.Y. Chang, and M. Wu. 1987. Protein-DNA interaction within one cloned chloroplast DNA replication origin of Chlamydomonas. Mol. Gen. Genet. 209: 265-9.
Ohya, T., S. Maki, Y. Kawasaki, and A. Sugino. 2000. Structure and function of the fourth subunit (Dpb4p) of DNA polymerase epsilon in Saccharomyces cerevisiae. Nucl. Acids Res. 28: 3846-52.
Okazaki, R., T. Okazaki, K. Sakabe, K. Sugimoto, and A. Sugino. 1968. Mechanism of DNA chain growth. I. Possible discontinuity and unusual secondary structure of newly synthesized chains. Proc. Natl. Acad. Sci. USA 59: 598-605.
Pascal, J.M., P.J. O'Brien, A.E. Tomkinson, and T. Ellenberger. 2004. Human DNA ligase I completely encircles and partially unwinds nicked DNA. Nature 432: 473-8.
Prescott, D.M., and P.L. Kuempel. 1972. Bidirectional replication of the chromosome in Escherichia coli. Proc. Natl. Acad. Sci. USA 69: 2842-5.
Prescott, D.M., and P.L. Kuempel. 1973. Autoradiography of individual DNA molecules. pp. 147-56. In: Methods in Cell Biology. Diacumakos, E.G. Ed. New York: Academic Press Inc.
Pursell, Z.F., I. Isoz, E.-B. Lundstrom, E. Johansson, and T.A. Kunkel. 2007. Yeast DNA polymerase ε participates in leading-strand DNA replication. Science 317: 127-30.
Saiki, R., S. Scharf, F. Faloona, K. Mullis, et al. 1985. Enzymatic amplification of beta-globin genomic sequences and restriction site analysis for diagnosis of sickle cell anemia. Science 230: 1350-4.
Saiki, R., D. Gelfand, S. Stoffel, et al. 1988. Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239: 487-91.
Sueoka, N. 1960. Mitotic replication of deoxyribonucleic acid in Chlamydomonas reinhardi. Proc. Natl. Acad. Sci. USA 46: 83-91.
Tanaka, S., T. Umemori, K. Hirai, S. Muramatsu, and Y. Kamimura. 2007. CDK-dependent phosphorylation of Sld2 and Sld3 initiates replication in budding yeast. Nature 445: 328-32.
Taylor, J.H., P.S. Woods, and W.L. Hughes. 1957. The organization and duplication of chromosomes as revealed by autoradiographic studies using tritium-labeled thymidine. Proc. Natl. Acad. Sci. USA 43: 122-8.
Vallet, J.M., and J.D. Rochaix. 1985. Chloroplast origins of DNA replication are distinct from chloroplast ARS sequences in two green algae. Curr. Genet. 9: 321-4.
Wang, S.-S., and V.A. Zakian. 1990. Telomere-telomere recombination provides an express pathway for telomere acquisition. Nature 345: 456-8.
Watson, J.D. 1976. Molecular Biology of the Gene. Menlo Park, CA: Benjamin.
Watson, J.D., and F.H.C. Crick. 1953. A structure for deoxyribose nucleic acids. Nature 171: 737-8.
Yu, G.-L., J.D. Bradley, L.D. Attardi, and E.H. Blackburn. 1990. In vivo alteration of telomere sequences and senescence caused by mutated Tetrahymena telomerase RNAs. Nature 344: 126-32.
Zegerman, P., and J.F.X. Diffley. 2007. Phosphorylation of Sld2 and Sld3 by cyclin-dependent kinases promotes DNA replication in budding yeast. Nature 445: 281-5.

PROBLEMS
1. Why is replication of genetic material necessary for the cell?
2. What would have been the consequences for the evolution of life if DNA replication had not evolved?
3. DNA replication machinery is more complex in eukaryotes compared to that in prokaryotes. Comment.
4. What is the advantage to organisms of having a semi-conservative mode of DNA replication?
5. Is DNA replication involved in various biological functions of DNA such as genetic recombination, mutation and repair?
6. What strategy do eukaryotes adopt to replicate a large amount of their DNA in less time?
7. Why are we interested in understanding plasmid replication? Do plasmids have some practical utility?
8. DNA cannot replicate without help from RNA. Comment.
9. Without telomerase, how perfectly would the eukaryotic chromosomes have replicated?
10. Why do different types of viruses have different strategies for replication of their genetic material?

8 Genetic Recombination

At the molecular level, genetic recombination, also known as DNA recombination, may be defined as the physical exchange of DNA sequences between chromatids of the homologous chromosomes. This exchange results in a duplex deoxyribonucleic acid (DNA) in which two regions of opposite parental origins are connected by a stretch of hybrid (heteroduplex) DNA in which one strand is derived from each parent. Recombination is also the separation of linked genes and formation of new gene combinations. The exchange of DNA segments occurs both in prokaryotes (parasexual) and eukaryotes (mitosis and meiosis). One of the important consequences of meiosis is genetic recombination. Various bacteriophages, bacteria, lower eukaryotes and higher eukaryotes have been used as model systems to understand the mechanism of recombination. RNA recombination has now been observed in RNA viruses and is included in genetic recombination.

DNA RECOMBINATION IN BACTERIOPHAGES Genetic recombination was found to occur between different strains of bacteriophages by Delbruck and Bailey (1946) and Hershey (1946). Viral recombination is not confined to simple pairing between two homologous chromosomes and to only one set of recombinational events in the "zygotes". Since many copies of viral chromosomes are present in any one cell, genetic exchange may occur between members of numerous populations of chromosomes. Viral chromosome multiplication is geometric or clonal, i.e., one viral chromosome replicates to produce two progeny, each of which replicates to produce two, and so on. The same viral chromosomes may pair and exchange genetic material more than once and may possibly mate with more than one chromosome at a time. Viruses must exchange genetic material in the host cytoplasm. Thus, viral recombination is a population phenomenon whose events have to be described statistically. Streisinger and Bruce (1960) demonstrated that three presumed linkage groups within each species actually form single linkage maps. The method used by these workers utilized selection of a particular class of mutants. By use of this method the maps of T2 and T4 bacteriophages were shown to be connected end-to-end in the form of a circle.

DNA RECOMBINATION IN BACTERIA In bacteria recombination occurs through parasexual means. The different forms of parasexual recombination in bacteria are transformation, transduction and conjugation. The first evidence of genetic recombination or exchange of hereditary material in bacteria was noted by Griffith in the transformation of harmless pneumococci into virulent ones. Avery et al. (1944) demonstrated the transforming agent as pure DNA. Transformation seems to arise from some form of recombination


mechanism which produces gene exchange similar to that produced by sexual reproduction. Some complementary base pairing between single-stranded DNA sections appears necessary in transformation. A transforming molecule carrying gene A may also carry gene B. If these genes are closely linked, there is a good likelihood that transformation at locus A is accompanied by transformation at locus B (double transformation). If genes A and B are not linked within one transforming molecule, double transformation is caused by two independent events and is correspondingly rare.
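The distinction between linked and unlinked genes in double transformation can be put in numbers. If A and B are transformed by independent events, the expected frequency of double transformants is the product of the two single frequencies; a much higher observed frequency suggests that both genes travel on one transforming molecule. The sketch below uses invented frequencies, and the tenfold cut-off in `looks_linked` is an arbitrary assumption chosen only for illustration.

```python
def expected_double_if_independent(freq_a: float, freq_b: float) -> float:
    """Expected double-transformant frequency for two independent events."""
    return freq_a * freq_b


def looks_linked(freq_a: float, freq_b: float, observed_double: float,
                 fold: float = 10.0) -> bool:
    """Flag linkage when doubles exceed the independent expectation `fold`-fold.

    The threshold `fold` is an illustrative assumption, not a standard value.
    """
    return observed_double > fold * expected_double_if_independent(freq_a, freq_b)


# Invented single-transformation frequencies
freq_a, freq_b = 0.01, 0.02
expected = expected_double_if_independent(freq_a, freq_b)
print(expected)
print(looks_linked(freq_a, freq_b, observed_double=0.005))    # far above expectation
print(looks_linked(freq_a, freq_b, observed_double=0.00025))  # close to expectation
```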

The F Factor The transmission of the F factor in bacteria seemed to be independent of the transmission of chromosomal genes. The Hfr strain has proven to be very useful for mapping genes of the bacterial chromosome. Among recombinants of a cross Hfr (A+ B+ C+ D+....) × F– (A– B– C– D–....), different wild-type genes appeared in different frequencies. A+, for example, appeared most frequently, D+ less frequently, J+ still less frequently and Z+ least frequently of all. If we consider genes A to Z in one linkage group, it appears as though the Hfr donor gene A consistently entered the recipient first and, therefore, had the greatest opportunity for recombination. Genes B to Z entered sequentially in that order and their recombination frequencies were determined by the order of entry. In some cases, the F– strain becomes Hfr because of entry of Hfr into the recipient cell. The Hfr factor appeared to be directly connected to the terminal end of the chromosome. The situation is not the same in all cases. Hfr can break the bacterial chromosome and orient its transfer at different points. This means that in one cross one gene (say A+) can show the highest frequency while in another cross some other gene (D+, H+ or K+) can show the highest frequency. In all these cases, if both ends of the chromosome are connected to form a circle, the sequence is identical for each strain. This shows that the bacterial chromosome is circular and that Hfr can break this chromosome at different points. The F factor is composed of double-stranded DNA having about 10^5 base pairs. The bacterial chromosome is 40 times longer than the F factor. In an F+ strain, the F factor is circular and one copy is present in each bacterial cell. This factor replicates midway through each bacterial chromosome replication cycle, thereby maintaining its frequency. Effective contact of different types of cells is followed by cytoplasmic connection.
The F factor may remain outside the bacterial chromosome as an independent cytoplasmic inclusion, or it may recombine with a section of the bacterial chromosome so that the chromosome now has a directional orientation. In the first case, only the F factor will enter the recipient cell, and the F– cell becomes F+, which may now act as a donor. In the second case, the chromosome itself enters the recipient cell. When the F factor is integrated into the bacterial chromosome, the cell becomes an Hfr strain. Occasionally, Hfr becomes F+. The F factor acts as a replicating unit that begins replication of DNA upon sexual contact and transfers only one of the replicating donor DNA strands to the recipient, which it enters 5' end first. This strand then acquires a complementary strand through DNA synthesis in the recipient before undergoing recombination with the recipient chromosome. The length of the donor chromosome transferred to the bacterial recipient cell depends upon the time for which the conjugating cells remain in contact with each other.
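The argument for a circular chromosome rests on the observation that the gene orders transferred by different Hfr strains are all readings of the same circle, started at different points and in either direction. A minimal sketch of that consistency check follows; the gene names and transfer orders are invented, and `is_reading_of_circle` is a hypothetical helper.

```python
def is_reading_of_circle(order: list[str], reference: list[str]) -> bool:
    """True if `order` reads the circular `reference` map from some start point,
    in either direction."""
    if len(order) != len(reference) or set(order) != set(reference):
        return False
    n = len(reference)
    # Doubling a sequence exposes every rotation as a window of length n
    clockwise = reference + reference
    counter = reference[::-1] + reference[::-1]
    return any(clockwise[i:i + n] == order or counter[i:i + n] == order
               for i in range(n))


circular_map = list("ABCDEFGH")   # invented reference map
hfr_orders = {
    "Hfr-1": list("ABCDEFGH"),    # starts at A, clockwise
    "Hfr-2": list("DEFGHABC"),    # starts at D, clockwise
    "Hfr-3": list("CBAHGFED"),    # starts at C, counter-clockwise
    "Hfr-4": list("ACBDEFGH"),    # inconsistent with the circle
}
for strain, order in hfr_orders.items():
    print(strain, is_reading_of_circle(order, circular_map))
```

Each Hfr strain differs only in where the F factor opened the circle and in which direction transfer proceeds, which is why the first three orders pass the check and the fourth does not.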

Transduction

Generalized transduction J. Lederberg and his graduate student Norton Zinder showed in 1952 that bacteriophages could transfer genetic information between bacteria in Salmonella (Zinder and Lederberg 1952). This process, called transduction, explained how bacteria of different species could gain resistance to the same antibiotic very quickly. J. Lederberg was awarded the Nobel Prize in 1958 for discovering transduction and conjugation in bacteria. Transduction is one of the most widely used bacterial mapping approaches. It is an effective way to map closely linked bacterial genes as well as those that are more widely separated. Generalized transduction occurs when a bacteriophage picks up a random piece of host DNA during the lytic phase in one bacterium and carries this DNA into a second bacterium. Because the entrant and host DNAs are homologous, genetic exchange (recombination) can occur.

Sexduction F-duction and F-mediated sexduction are other terms for sexduction. The F-element of E. coli is capable of independent existence and can also integrate into the bacterial chromosome. Therefore, the F factor qualifies as an episome. When the F-element is present in the bacterial cell independent of the chromosome, the cell is termed F+, and when the F-element is integrated into the bacterial chromosome, the cell is termed Hfr. The F-element as a plasmid is represented diagrammatically in the accompanying figure.

[Figure: an F+ cell becomes Hfr by integration of the F element; an Hfr cell becomes F+ by excision of the F element]

An error may occur during excision, so that genes from the bacterial chromosome become included in the F-element, which after excision is called an F'-element. Thus, an F-element that carries a few bacterial chromosomal genes becomes an F'-element. When an F'-element is infectiously transmitted to an F– cell of differing genetic constitution, the recipient cell and its descendants become partially diploid, also known as merozygote or merodiploid, for the bacterial genes introduced by the F'-element. Bacterial genes in the F'-element may frequently take part in exchange with the recipient's chromosomal genes to produce true (reciprocal) recombinants. Various F' elements that arise independently within a single Hfr strain can be characterized, and the frequencies with which different genes are transmitted together can be calculated. These frequencies are similar to co-transduction frequencies and are used in constructing genetic maps in the same way that co-transduction frequencies are used.

Conjugation Conjugation involves cell-to-cell contact and formation of a conjugation tube to transfer DNA from one bacterial cell to another. Davis (1950) concluded from his U-tube experiment that cell-to-cell contact is necessary for transfer of DNA from one bacterial cell to the other. Hayes (1952) later showed that this transfer of DNA is always from the F+ to the F– cell. The bacterial cell that contains a fertility factor F is termed F+ and the cell without the fertility factor is termed F–. Transfer of bacterial DNA from an F+ strain to an F– strain is shown in Figure 8.1(A), while transfer of DNA from an Hfr to an F– strain is shown in Figure 8.1(B). In the F+ × F– cross, the fertility factor F is transferred into the recipient bacterium, rendering it F+; in the Hfr × F– cross, the recipient usually remains F– unless the entire donor chromosome, including the integrated F factor, is transferred. The recovery of recombinants is quite high in the case of transfers involving Hfr strains. The frequency of recombinants resulting from F+ transfers is 1 in 1,000,000, while recovery of recombinants involving transfers from Hfr strains is of the order of 1 in 1,000. The interrupted mating technique relies on the fact that conjugation can be interrupted by violent agitation in a Waring Blender. Thus, the length of donor chromosome that enters the recipient cell is controlled. A 5- to 8-minute interval after conjugation is necessary for chromosome transfer to begin. The relationship between genes and their positions on the chromosome can be mapped in terms of time units, in which one time unit is equal to 1 minute. The bacterial chromosome can be mapped as a complete length of 90 time units (Bachmann et al. 1976) (Figure 8.2). This method is suitable for genes that are located three or more time units apart. For a detailed linkage map of E. coli K-12 and definition of genes, the reader is referred to Bachmann (1990) and Berlyn (1998). For detecting linkage order between genes separated by distances smaller than 3 time units, recombination mapping can be used. It has been found that 20 per cent recombination = 1 time unit. This method is suitable for mapping only those genes that are separated by less than 3 time units. Why?
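The bookkeeping behind interrupted-mating maps is simple enough to sketch. In the example below the entry times are invented, the gene labels are placeholders, and the function names are hypothetical; the only figure taken from the text is the conversion of 20 per cent recombination to 1 time unit.

```python
RECOMB_PERCENT_PER_MINUTE = 20.0  # from the text: 20 per cent recombination = 1 time unit


def order_by_entry(entry_times: dict[str, float]) -> list[tuple[str, float]]:
    """Sort genes by the time (minutes) at which they first appear in recipients."""
    return sorted(entry_times.items(), key=lambda item: item[1])


def intervals(entry_times: dict[str, float]) -> list[tuple[str, str, float]]:
    """Distances in time units between neighbouring genes on the map."""
    ordered = order_by_entry(entry_times)
    return [(a, b, tb - ta) for (a, ta), (b, tb) in zip(ordered, ordered[1:])]


def recombination_to_minutes(percent: float) -> float:
    """Convert a recombination percentage between close genes to time units."""
    return percent / RECOMB_PERCENT_PER_MINUTE


# Invented times of entry from an interrupted-mating experiment
entry = {"A": 8.0, "C": 25.0, "B": 17.0, "D": 26.5}
for left, right, gap in intervals(entry):
    method = "interrupted mating" if gap >= 3 else "recombination mapping"
    print(left, right, gap, method)

print(recombination_to_minutes(30.0))
```

Genes whose entry times differ by three or more minutes are ordered directly from the time of entry; for a closer pair such as C and D in this invented data set, the distance would instead be estimated from the recombination percentage.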






Figure 8.1 Conjugation between (A) F+ and F– cells and (B) Hfr and F– cells in bacteria (Reprinted, with permission, from Thind, B.S. 2012. Procaryotic Pathogens and Plant Diseases. Jodhpur: Scientific Publ.)

DNA RECOMBINATION IN EUKARYOTES DNA recombination in eukaryotes includes both mitotic and meiotic recombination. Mitotic recombination occurs in somatic cells while meiotic recombination takes place in germ cells. Both mitotic and meiotic recombination involve exchange of segments. The function of mitotic recombination is the repair of damaged chromosomes and stabilization of the genome, while meiotic recombination leads to creation of new gene combinations and new linkages. So, meiotic recombination is more important in sexually reproducing organisms. The meiotic prophase-I is sub-divided into leptotene (where chromosomes are threadlike structures), zygotene (formation of the synaptonemal complex, also known as the synaptinemal complex, takes place and synapsis occurs), pachytene (exchange of segments, i.e., crossing-over between non-sister chromatids of two homologous chromosomes), diplotene (sites of exchange appear as chiasmata) and diakinesis (terminalization of chiasmata). So, in the light of pairing of homologous chromosomes and DNA recombination, the zygotene and pachytene stages are more important. Studies at the zygotene stage of prophase-I, especially in plants, have revealed the presence of a stage called the bouquet stage. At this stage, clustering of telomeres at a small region of the nuclear envelope gives a bouquet-like appearance. Bouquet formation is an active process in plants, and this clustering of telomeres precedes the alignment of non-telomeric loci, i.e., it is one of the first steps in the pairing process.

Figure 8.2 Selected loci on map of E. coli. Units on the map are in minutes. Arrows within the circle refer to Hfr strain transfer starting points, with directions. The two thin regions are the only areas not covered by P1 transduction phages

Synaptonemal Complex The axial elements holding the two sister chromatids of a homolog are responsible for initiating the synthesis of lateral, transverse and central elements after they receive a stimulus from a DNA break. First the lateral elements are developed along the lateral surface of both the chromosomes (Westergaard and von Wettstein 1970), which later give rise to central and transverse elements. The relative orientation of these three elements is given in Figure 8.3.


Figure 8.3 Synaptonemal complex in Neottiella (Redrawn from picture in Westergaard, M., and D. von Wettstein (1970) C. R. Trav. Lab. Carlsberg. 37(11): 239-68)

Recombination Nodules Recombination nodules are dense cytologically detectable structures distributed along the chromosome in meiotic prophase. They are of two types: early recombination nodules, which appear during zygotene coinciding with synapsis, and late recombination nodules, which occur during pachytene when synaptonemal complex formation is complete. The early and late recombination nodules differ in their morphology. Carpenter (1979) suggested that recombination nodules perform an active role in the recombination process. Various studies have found that recombination nodules may lead to gene conversion in the absence of exchanges (Zickler and Kleckner 1999). The orientation of recombination nodules with two homologous chromosomes at synapsis is shown in Figure 8.4.

Figure 8.4 The orientation of recombination nodules with two homologous chromosomes at synapsis (Redrawn from http://accessscience.com/content/Crossing-over-(genetics)/168850)

Chiasma (plural Chiasmata)

Chiasmata are the cytological manifestation of crossing-over. They were originally identified by Janssens (1909) as cytologically observable “crosses or nodes between the arms of the chromosome pairs during late prophase-I”. He thought that chiasmata were formed at the sites of genetic exchange; this idea was later developed into the chiasma type theory. Morgan and Cattell (1912) coined the term crossing-over. Crossing-over is a reciprocal exchange of genetic material between homologous chromosomes leading to recombination of linked genes. Tease and Jones (1978) studied the correlation between chiasma frequency and genetic exchange. These exchanges were found to enhance the fidelity of meiotic chromosome segregation. When heat shocks were given during the early and late phases of prophase I, heat shocks during early prophase I reduced the recombination frequency, whereas heat shocks during late prophase I had little influence on it. Such studies provided evidence that exchanges occur earlier and that chiasmata appear later at the sites of exchange (Holger and Barbara 1996). The first evidence that crossing-over causes recombination was provided by Stern (1931) in Drosophila and by Creighton and McClintock (1931) in maize. Stern (1931) worked on the X-chromosome of Drosophila, using the Bar and carnation eye mutants, while Creighton and McClintock (1931) worked on chromosome 9 of maize, using kernel color and kernel texture (waxy endosperm). These studies provided the first evidence that exchanges occurred at the four-strand stage of chromosomes.


Chromosome Breakage during Crossing-Over

At the leptotene stage of prophase I, chromosomes are thread-like structures. As meiosis proceeds, the chromosomes condense. After entering the zygotene stage, the chromatin fibers therefore experience torsional stress, which is one possible cause of their breakage. Another possible cause of chromosome breakage (to allow exchange) during prophase I is the action of enzymes (endonucleases) on DNA. This enzymatic breakage of chromosomes is well accepted.

Synapsis and Crossing-Over

Stern and Hotta (1974) studied biochemical events associated with recombination during meiosis. They detected two important components of meiosis: zygotene DNA (Z-DNA) and pachytene DNA (P-DNA). The Z-DNA is associated with synapsis, while P-DNA is concerned exclusively with repair replication and thereby leads to crossing-over.

Is synapsis essential for recombination? There are two views, a traditional view and a recent view. According to the traditional view, synapsis is essential for recombination. The reasons for this belief were: (1) In achiasmate organisms like male Drosophila, crossing-over is absent. (2) Mutants defective in synapsis are also defective in crossing-over. (3) Mean synaptonemal complex (SC) length at pachytene and mean chiasma frequency at diplotene of meiosis I are correlated. According to the recent view, synapsis is not required for recombination. The reasons for saying so are: (1) Mutants defective in double-strand break repair (DSBR) display defects in synapsis. (2) Some mutants (red1, mer1, hop1, Zip1) allow normal levels of meiotic recombination but do not form SC. (3) The occurrence of ectopic recombination events supports the notion that recombination is not dependent on formation of SC.

In almost all organisms, genetic information is stored in the form of DNA sequences. DNA recombination involves the exchange of segments between two corresponding DNA molecules. This exchange can take place between two homologous chromosomes or among homologous sequences of non-homologous chromosomes, resulting in a duplex DNA in which two regions of opposite parental origin are connected by a stretch of hybrid (heteroduplex) DNA.

Meiotic Segregation Pattern

During meiosis, cohesins, the protein complexes that hold sister chromatids together, are lost from the chromosomes in a stepwise manner. Loss of cohesins from chromosome arms is necessary for homologous chromosomes to segregate during meiosis I. Retention of cohesins around centromeres until meiosis II is required for the accurate segregation of sister chromatids. Brar et al. (2006) showed that phosphorylation of the cohesin subunit Rec8 contributes to stepwise cohesin removal. Two other key regulators of meiotic chromosome segregation implicated in this process are the cohesin protector Sgo1 and meiotic recombination. Together, these establish the meiotic segregation pattern.

Requirements of Recombination

The requirements of recombination in eukaryotes such as yeasts are similar to those in bacteria. Analogous proteins for recombination are present in eukaryotes and prokaryotes. For example, Dmc1 proteins, present in yeast and humans, are analogous to RecA and can form a helical nucleoprotein filament. In plants, meiotic recombination is explained by the double-strand break repair model (Puchta and Hohn 1991). The frequency of meiotic recombination varies from one plant species to another (1 centimorgan, cM, corresponds to about 150 kbp in Arabidopsis and 2.3 to 4.7 Mbp in maize) as well as across chromosome regions within the same species (1.15 to 22 Mbp in wheat). Meiotic recombination is largely confined to structural genes (Schnable et al. 1998). Genes exhibit 10 to 100 times more recombination than the genome average. Intragenic recombination is an important mechanism of novel allele formation. Transgenic approaches based on incorporation of bacterial genes like recA or on site-specific recombination have been employed for manipulating or enhancing recombination in plants (Page and Hawley 2003).
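The relation between genetic and physical distance quoted above can be made concrete with a short calculation. The following Python sketch is only an illustration: the dictionary and function names are hypothetical, and the only data used are the kbp-per-cM figures given in the text.

```python
# Physical DNA per centimorgan (kbp per cM), taken from the ranges quoted in the text.
# Each entry is (low, high); a single reported value is stored twice.
KBP_PER_CM = {
    "Arabidopsis": (150.0, 150.0),
    "maize": (2300.0, 4700.0),    # 2.3 to 4.7 Mbp
    "wheat": (1150.0, 22000.0),   # 1.15 to 22 Mbp, across chromosome regions
}


def physical_span_kbp(species, genetic_distance_cm):
    """Return the (low, high) physical span in kbp for a genetic distance in cM."""
    low, high = KBP_PER_CM[species]
    return (low * genetic_distance_cm, high * genetic_distance_cm)


if __name__ == "__main__":
    for species in KBP_PER_CM:
        low, high = physical_span_kbp(species, 5.0)
        print(f"{species}: 5 cM spans roughly {low:.0f} to {high:.0f} kbp")
```

The wide range for wheat reflects the point made in the text that the same genetic distance can correspond to very different physical distances in different chromosome regions.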

TYPES OF RECOMBINATION

For the sake of discussion, DNA recombination is categorized into five types: homologous (or general) recombination, DNA transposition, illegitimate or non-homologous recombination, artificial recombination, and site-specific recombination. Of these, the first three are discussed here.

Homologous Recombination

Homologous recombination requires homology between the recombining partners. It involves exchange of parts between non-sister chromatids of two homologous chromosomes. Homologous recombination usually involves the production of heteroduplex DNA, in which the DNA strands are contributed by two different duplexes. The proteins mediating this process are not sequence-specific but are homology-dependent. Long regions of homology are often involved. RecA protein of E. coli can promote the formation of heteroduplex DNA in vitro by exchange of DNA strands between two helical structures: duplex DNA and a helical RecA nucleoprotein filament containing a single strand of DNA (Honigberg and Radding 1988). Complete unwinding of the parental duplex and rewinding of one strand with a new complement requires rotation of the helical structures about one another, or about their respective longitudinal axes. Strand exchange is therefore associated with torsional stress, and it is accompanied principally by concomitant rotation of the duplex DNA and the RecA nucleoprotein filament, each about its longitudinal axis.

Homologous recombination involves the exchange of covalent linkages between DNA molecules in regions of highly similar or identical sequence. It serves opposing roles at different stages in the life of a plant: somatic recombination contributes to repairing damaged chromosomes and thus stabilizes the genome, whereas meiotic recombination enables evolution through the creation of new genes and new linkages. Evidence shows that the frequency of homologous recombination is not constant, as differences always exist at the sub-genomic, intergenic and intragenic levels. Genes controlling the homologous pairing of chromosomes have also been identified in other crops like maize (Wojciech et al. 2004). Homologous recombination has a crucial function in the repair of DNA double-strand breaks and in faithful chromosome segregation.

DNA Transposition

The geneticist Barbara McClintock, on the basis of her study of color variations in maize kernels, discovered transposable genetic elements, also known as mobile genetic elements, jumping genes, or insertion sequences (McClintock 1948). She found a locus on chromosome 9 of maize that was responsible for kernel color. She concluded that insertion of certain elements at this locus renders the gene inactive. The first transposable elements discovered and named by her were the activator and dissociation (Ac-Ds) elements. Her discovery of these unusual elements was not readily accepted, for two main reasons. First, such elements were not known to be a general phenomenon at that time, so it was asked why they occurred only in maize and not in other organisms. Second, the prevailing concept was that genes have a fixed position on the chromosome, so how could they be mobile? The remarkable work of this geneticist was recognized after similar elements were discovered in bacteria, plants and animals. In recognition of her work, Barbara McClintock was awarded the Nobel Prize in 1983. We now know that transposons are present in almost all prokaryotes and eukaryotes.

Transposition does not require homology between recombining partners. The proteins mediating this process (transposases, integrases) recognize short, specific DNA sequences in only one of the recombining partners, the transposable element. The recipient site is usually relatively non-specific in sequence, and recombination integrates the transposable element into the host DNA. In crops like wheat, such exchanges are possible only when the locus (ph-5B) controlling homologous pairing of chromosomes is mutated.

Illegitimate Recombination

Illegitimate recombination requires little or no homology between recombining partners and results from aberrant cellular processes. It involves exchange between homologous sequences of non-homologous chromosomes. Unequal crossing-over is often described as illegitimate, although the mechanism is normal (albeit misaligned) homologous recombination. There are six mechanisms of illegitimate recombination, which are discussed here very briefly.

Illegitimate end-joining. This reaction occurs frequently in eukaryotes, which repair double-strand chromosome breaks by direct religation of the ends. When this repair mechanism acts on illegitimate ends, it can cause translocations and other rearrangements. It also allows transfected linear DNA to be integrated into the genome. End-joining is less common in prokaryotes, which cannot ligate DNA without sticky ends.

Illegitimate replication. This process occurs in repetitive DNA and in DNA which forms stabilizing secondary structures like hairpins and cruciforms. The primer strand slips out of register with the template and generates insertions and deletions. Repeats and self-complementary motifs that stimulate illegitimate replication are often mutation hot spots.

Illegitimate strand exchange. Normal recombination processes generate single strands as reaction intermediates. Illegitimate strand exchange occurs when these free strands become joined to the wrong partner, for example, a single strand which just happens to be in the vicinity of the reaction. This can also occur with systems not usually involving recombination, such as topoisomerases and the nickases which initiate replication in plasmids.

Unequal crossing-over. This occurs if homologous duplexes synapse out of register due to the presence of repetitive DNA. Following resolution, it generates an insertion in one duplex and a deletion in the other. Since homology is required for this process, the mechanism is normal homologous recombination rather than illegitimate recombination.

Aberrant site recognition. In this case, recombination relies on the recognition of a specific sequence by a protein. The chance presence of a similar sequence, a cryptic recognition site, in nearby DNA sometimes results in an aberrant reaction involving extra DNA. This type of recombination is responsible for most aberrant excision events involving transposons and site-specific episomes.

Illegitimate V-(D)-J joining. Illegitimate recombination between recombination signal sequences (RSS) in the immunoglobulin and T-cell receptor (TCR) gene loci and cryptic elements located elsewhere in the genome is responsible for a spectrum of chromosome aberrations often associated with lymphoid tumors.
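The outcome of unequal crossing-over described above, an insertion in one duplex and a deletion in the other, can be illustrated with a toy model. In the following Python sketch each chromosome is simply a list of labelled blocks; the labels, the function names and the breakpoints are all hypothetical and serve only to show the bookkeeping.

```python
def crossover(chrom_a, chrom_b, break_a, break_b):
    """Exchange everything to the right of the two breakpoints.

    break_a and break_b are list indices. Equal values mean the homologs
    were paired in register; unequal values mean they were misaligned.
    """
    product_1 = chrom_a[:break_a] + chrom_b[break_b:]
    product_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return product_1, product_2


def repeat_count(chrom):
    """Number of copies of the tandem repeat on a chromosome."""
    return chrom.count("rep")


# Two homologs, each with three copies of a tandem repeat between unique flanks.
HOMOLOG_A = ["left", "rep", "rep", "rep", "right"]
HOMOLOG_B = ["left", "rep", "rep", "rep", "right"]

# Pairing in register: both products keep three repeats.
in_register = crossover(HOMOLOG_A, HOMOLOG_B, 3, 3)

# Pairing out of register by one repeat: one product gains a copy, the other loses one.
out_of_register = crossover(HOMOLOG_A, HOMOLOG_B, 3, 2)

if __name__ == "__main__":
    print([repeat_count(c) for c in in_register])
    print([repeat_count(c) for c in out_of_register])
```

Because the exchange itself is the same operation in both cases, the sketch also reflects the remark in the text that the mechanism is ordinary homologous recombination; only the alignment differs.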

MODELS OF GENETIC RECOMBINATION

The links between mechanisms of recombination and cell repair converted the study of the mechanism of crossing-over from a purely internal problem of genetics into a general problem of molecular biology. A number of models were put forward after 1960. The hybrid DNA model, as proposed by Whitehouse (1963), Holliday (1964) and Whitehouse and Hastings (1965), became the focal point for experimental approaches. The molecular model of genetic recombination presented by Holliday (1964) has the largest amount of experimental support. This model has, however, undergone various modifications. The common features of the models proposed by these scientists are: (a) recombination occurs by breakage and reunion of homologous DNA molecules, (b) heteroduplex or hybrid DNA structures are an essential intermediate in recombination, (c) mismatched bases within hybrid DNA are corrected to give normal base-pairing, and (d) both reciprocal and non-reciprocal recombination are explained by one overall mechanism.

Different models have been proposed to explain the meiotic and somatic recombination processes. A model of recombination should account for the following steps: (a) alignment of two homologous chromosomes, (b) strand cutting, (c) strand exchange and ligation, (d) branch migration, and (e) resolution. Four classical models which satisfy all these conditions have been proposed for different systems: (a) the Holliday double-strand invasion (DSI) model (Holliday 1964), (b) the single-strand invasion (SSI) or one-sided invasion (OSI) model (Meselson and Radding 1975), (c) the double-strand break repair (DSBR) model (Szostak et al. 1983), and (d) the single-strand annealing (SSA) model (Lin et al. 1984).

Holliday Model

The first classical model for recombination was proposed by Holliday (1964). This model involves breakage, reunion and repair of DNA. According to this model, two homologous chromosomes first align. Recombination is initiated by the introduction of nicks in each DNA molecule at exactly the same location. The free ends thus created invade the other homologous chromosome and are ligated to form the Holliday intermediate, which is followed by branch migration. A heteroduplex region is thus formed, which is resolved by specific enzymes into either crossover or non-crossover products depending upon the cut sites. All other models use the same principle, differing only in the nature of the breaks in the homologous chromosomes.

Chi-sites. Recombination in bacteria starts with the nicking of the DNA and binding of RecBCD at free double-stranded ends (Handa et al. 2009) (Figure 8.5). RecBCD, through its exonuclease activity, widens the gap created by the strand break. As soon as RecBCD comes in contact with a chi-site, it loses its D subunit. The resultant RecBC has a helicase activity that unwinds the DNA using two ATP molecules per base pair. As RecBC continues to unwind the DNA, the resulting single-stranded DNA is coated with single-strand binding (SSB) proteins. Later, RecA protein binds to this single-stranded DNA, stabilizes it and helps it to find its complementary sequence in the homologue, with which it pairs to generate a heteroduplex molecule. This four-stranded heteroduplex molecule is called a Holliday junction, which is ultimately resolved to generate recombinant products. Alberts (2003) describes the mode of action of RecA protein and its importance in DNA recombination. The RuvABC enzyme complex is involved in resolving the Holliday intermediate into the products of recombination (Connolly et al. 1991). Although the genetic lengths of the genomes of different eukaryotic organisms are constant, the physical size of the genome increases with the evolutionary complexity of the organism. The activity of RecBCD is controlled by specific DNA sequence elements known as chi-sites (Dixon and Kowalczykowski 1993). RecA protein helps in synapsis and strand exchange between homologous DNA molecules (Cox and Lehman 1987). Chi-sites are hotspots for recombination in E. coli; their sequence is 5′-GCTGGTGG-3′. The chi-site is where RecBCD loses its D subunit and, with it, its exonuclease activity.

Figure 8.5 Role of some proteins and enzymes in genetic recombination (figure label: RecF pathway)

Heteroduplexes. These are the regions of recombinant DNA molecules where the two strands are not exactly complementary. They arise when the primary pairing region, together with the adjacent region of branch migration, encompasses the site(s) of genetic difference between the two parental chromosomes. Heteroduplexes thousands of nucleotides long have been observed in phages T4 and λ. Branch migration transfers strands from one double helix to another. Heteroduplex generation during crossing-over provides proof that (a) the fundamental recombination event involves base pairing between regions of single-stranded DNA and (b) strand exchange takes place during crossing-over.

Strand resolution. Three main proteins have been found to take part in the resolution process in E. coli: RuvA, RuvB and RuvC. RuvA protein searches for the heteroduplex region and binds to it. RuvB protein is involved in branch migration, while RuvC protein nicks two opposite strands. The direction of nicking of the Holliday junction decides the products of the exchange. If nicking occurs in the vertical (V) direction, heteroduplexes are formed and recombinants are recovered for the flanking f-f' and F-F' regions (Figure 8.6). If nicking occurs in the horizontal (H) direction, heteroduplexes are also formed but parental combinations are recovered; this can nevertheless cause gene conversion.
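Since the chi-site is defined by the short sequence 5′-GCTGGTGG-3′ given above, candidate chi-sites can be located by a simple text search. The following Python sketch is an illustration only: the function names and the example sequence are hypothetical, and the search is performed on both strands by also looking for the reverse complement of the motif.

```python
CHI = "GCTGGTGG"  # chi-site sequence of E. coli as given in the text (5' to 3')

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_chi_sites(seq):
    """Return sorted (position, strand) pairs for every chi motif in seq.

    Positions are 0-based indices on the given strand; strand is "+" when
    the motif reads 5' to 3' on that strand and "-" when it lies on the
    complementary strand.
    """
    seq = seq.upper()
    hits = []
    for strand, motif in (("+", CHI), ("-", reverse_complement(CHI))):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    example = "AAGCTGGTGGTTCCACCAGC"  # made-up sequence for illustration
    print(find_chi_sites(example))
```

The strand matters biologically because a chi-site is recognized only when RecBCD approaches it from one direction; the sketch reports the strand but does not model that directionality.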

Figure 8.6 Holliday model in view of modern discoveries (Redrawn from www.web-books.com/MoBio/Free/Ch8D2.htm)
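The dependence of the products on the direction of nicking can be summarized in a few lines of code. The following Python sketch is a deliberately simplified model: each parental duplex is reduced to its pair of flanking markers, the function name is hypothetical, and the heteroduplex region that is present after either kind of cut is not represented.

```python
def resolve_holliday(parent_1, parent_2, cut):
    """Return the flanking-marker combinations after resolution.

    parent_1 and parent_2 are (left_marker, right_marker) tuples.
    cut is "V" (vertical) or "H" (horizontal), following the labels
    used in the text for the two directions of nicking.
    """
    left_1, right_1 = parent_1
    left_2, right_2 = parent_2
    if cut == "V":
        # Flanking markers are exchanged: recombinant (crossover) products.
        return [(left_1, right_2), (left_2, right_1)]
    if cut == "H":
        # Flanking markers stay together: parental (non-crossover) products.
        return [(left_1, right_1), (left_2, right_2)]
    raise ValueError("cut must be 'V' or 'H'")


if __name__ == "__main__":
    print(resolve_holliday(("A", "B"), ("a", "b"), "V"))
    print(resolve_holliday(("A", "B"), ("a", "b"), "H"))
```

In both cases a stretch of heteroduplex DNA remains between the flanking markers, which is why even the parental combinations can show gene conversion.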

Junction-resolving enzyme. The Holliday junction is the central intermediate in homologous recombination, a ubiquitous process that is important in DNA repair and in the generation of genetic diversity. The penultimate stage of recombination requires resolution of this DNA junction into nicked-duplex species by the action of a junction-resolving enzyme, examples of which have been identified in a wide variety of organisms. These enzymes are nucleases that are highly selective for the structure of branched DNA. Hadden et al. (2007) presented the crystal structure of the junction-resolving enzyme phage T7 endonuclease I in complex with a synthetic four-way DNA junction (Figure 8.7). The junction comprises four arms labeled B, H, R and X. Arms B and R are 5-bp loops.


Figure 8.7 Schematic representation of the Holliday junction in the crystal structure

Branch migration and resolution of Holliday junctions complete the recombination process. During genetic recombination and the recombinational repair of chromosome breaks, DNA molecules become linked at points of strand exchange. Branch migration and resolution of these crossovers, or Holliday junctions, complete the recombination process. Liu et al. (2004) showed that extracts from cells carrying mutations in the recombination/repair genes RAD51C or XRCC3 had reduced levels of Holliday junction resolvase activity. Moreover, depletion of RAD51C from fractionated human extracts caused a loss of branch migration and resolution activity, but these functions were restored by complementation with a variety of RAD51 paralog complexes containing RAD51C. They concluded that the RAD51 paralogs were involved in Holliday junction processing in human cells.

Holliday junction resolution is necessary for chromosome segregation. Holliday junctions are formed during homologous recombination and DNA repair, and their resolution is necessary for chromosome segregation. Ip et al. (2008) identified nucleases from budding yeast and human cells that promoted Holliday junction resolution in a manner analogous to that shown by the E. coli Holliday junction resolvase RuvC. The human Holliday junction resolvase GEN1 and its yeast ortholog, Yen1, were independently identified.

One-Sided Invasion (OSI) Model

This model, also known as the single-strand invasion model, was proposed by Meselson and Radding (1975) and has been applied to explain gene targeting in somatic cells of plants. The model, as presented by Puchta and Hohn (1996), is shown in Figure 8.8. This is a non-conservative type of DNA recombination, as the original parental sequences are lost during recombination. Thus the OSI model involves an illegitimate type of recombination.

Figure 8.8 One-sided invasion model of recombination by Meselson and Radding (1975)

Double-Strand Break Repair (DSBR) Model

This model was proposed by Szostak et al. (1983) and explains meiotic recombination in plants, intrachromosomal recombination in somatic cells (i.e., with inverted repeats) and gene conversion. It describes a conservative mode of DNA recombination (Puchta and Hohn 1996) (Figure 8.9).

Single-Strand Annealing (SSA) Model

This model was proposed by Lin et al. (1984) to describe extrachromosomal recombination in somatic plant cells. This is a non-conservative type of DNA recombination (Puchta and Hohn 1996) (Figure 8.10).
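Why single-strand annealing is non-conservative can be seen from a small sequence example. The following Python sketch rests on the usual description of SSA, which is an assumption here because the text does not spell out the steps: a break between two direct repeats is resected, the repeats anneal, and the intervening DNA together with one repeat copy is lost. The function name and the sequences are hypothetical.

```python
def single_strand_annealing(seq, repeat, break_pos):
    """Return the repair product of a break between two direct repeats.

    seq: DNA sequence containing at least two copies of `repeat`.
    break_pos: index of the double-strand break; it must lie between the
    first two copies of the repeat.
    The product keeps one repeat copy and loses the sequence in between.
    """
    first = seq.find(repeat)
    second = seq.find(repeat, first + len(repeat)) if first != -1 else -1
    if first == -1 or second == -1:
        raise ValueError("sequence must contain two direct repeats")
    if not first + len(repeat) <= break_pos <= second:
        raise ValueError("break must lie between the two repeats")
    return seq[: first + len(repeat)] + seq[second + len(repeat):]


if __name__ == "__main__":
    substrate = "AAA" + "GATTACA" + "CCCCC" + "GATTACA" + "TTT"
    product = single_strand_annealing(substrate, "GATTACA", 12)
    print(substrate)
    print(product)  # shorter than the substrate: sequence has been lost
```

The product is always shorter than the substrate, which is the sense in which the original parental sequence is not conserved.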


Figure 8.9 Double-strand break repair (DSBR) model of genetic recombination by Szostak et al. (1983)

Figure 8.10 The single-strand annealing (SSA) model of recombination by Lin et al. (1984)


Among prokaryotes, more than 25 different types of proteins are involved in DNA recombination in the bacterium E. coli, while among eukaryotes about 49 different types of proteins are involved in recombination in yeast. Some important proteins known in E. coli, bacteriophage T4 and the yeast S. cerevisiae, with their general and biochemical functions in DNA recombination, are listed in Table 8.1, following Bianco et al. (1998).

Table 8.1 General and biochemical functions of proteins involved in genetic recombination

General function | Protein | Biochemical functions

Escherichia coli
Initiating protein(s) | RecBCD | ATP-dependent dsDNA and ssDNA exonuclease, ATP-stimulated ssDNA endonuclease, DNA helicase, recombination hotspot χ-recognition
DNA strand exchange | RecA | DNA-dependent ATPase, DNA- and ATP-dependent coprotease, DNA renaturation, DNA strand exchange
ssDNA-binding protein | SSB | ssDNA binding, stimulates DNA strand exchange
Accessory protein(s) | RecF | ssDNA, dsDNA binding, weak ATPase, interacts with RecR protein
 | RecO | ssDNA, dsDNA binding; interacts with RecR; RecOR prevents end-dependent disassembly of RecA filaments; interacts with SSB
 | RecR | Interacts with RecF; RecFR complex attenuates RecA filament extension into dsDNA regions
Branch migration | RecG | DNA helicase, branch migration of Holliday junctions
 | RuvA | Binds to Holliday-, cruciform-, and four-way junctions, interacts with RuvB protein
 | RuvB | DNA helicase, branch migration of Holliday junctions, interacts with RuvA protein
Holliday junction cleavage | RuvC | Binds to four-way junctions, cleaves Holliday junctions
Other proteins | DNA gyrase | Type II topoisomerase
 | DNA topoisomerase I | Type I topoisomerase, ω protein
 | DNA ligase | DNA ligase
 | DNA polymerase I | DNA polymerase, 5′→3′ exonuclease, 3′→5′ exonuclease

Bacteriophage T4
Initiating protein(s) | gp46 | Interacts with gp47; endo- and exonuclease
 | gp47 | Interacts with gp46; endo- and exonuclease; stimulates gp46 action
 | gp41 | DNA-dependent NTPase, ssDNA binding, ATP- or GTP-dependent DNA helicase
 | gp59 | ssDNA binding, stimulates ATPase and helicase activities of gp41, interacts with gp32 and gp41
DNA strand exchange | UvsX | DNA-dependent ATPase, DNA renaturation, DNA strand exchange
ssDNA-binding protein | gp32 | ssDNA binding, stimulates DNA strand exchange, interacts with UvsY and UvsX
Accessory protein(s) | UvsY | Stimulates DNA strand exchange, interacts with UvsX and gp32
Branch migration | Dda | DNA helicase, stimulates branch migration by UvsX protein
 | UvsW | DNA helicase, branch migration of Holliday junctions, functional analog of RecG
Holliday junction cleavage | gp49 | Binds to and cleaves Y-junctions and Holliday junctions

S. cerevisiae
Initiating protein(s) | Mre11 | Forms complex with Rad50 and Xrs2 which is possibly responsible for resection of double-strand DNA breaks; with Rad50, ssDNA endo- and 3′ to 5′ dsDNA exonuclease
 | Rad50 | Forms complex with Mre11 and Xrs2 which is possibly responsible for resection of double-strand DNA breaks, ATP-dependent binding to dsDNA, contains ATP-binding motif
 | Xrs2 | Forms complex with Mre11 and Rad50 which is possibly responsible for resection of double-strand DNA breaks
 | Spo11 | Binds DNA, likely catalytic subunit responsible for double-strand break formation
DNA strand exchange | Rad51 | DNA-dependent ATPase, DNA strand exchange, interacts with Rad52, Rad54, and Rad55 proteins
ssDNA-binding protein | RPA | ssDNA binding, stimulates DNA strand exchange, interacts with Rad52 protein
Accessory protein(s) | Rad52 | Stimulates DNA strand exchange, interacts with Rad51 and RPA proteins
 | Rad54 | Contains both ATP-binding and DNA helicase motifs, interacts with Rad51, hydrolyzes ATP, stimulates DNA strand exchange
 | Rad55 | Stimulates DNA strand exchange, contains Walker ATP-binding motif, interacts with Rad51 protein, forms stable heterodimer with Rad57 protein; shows homology to Rad51
 | Rad57 | Stimulates DNA strand exchange, contains Walker ATP-binding motif, forms stable heterodimer with Rad55 protein; shows homology to Rad51
Other proteins | Rad59 | Shows homology to Rad52 protein, function is unknown

CROSSING-OVER EVENT

Base Pairing in Crossing-Over

Current experiments support the hypothesis that crossing-over starts with pairing between complementary single-stranded tails growing out from double-helical DNA molecules. The single-stranded regions in DNA may arise from cuts by an endonuclease. These cuts create free ends at which DNA polymerases can add new nucleotides, thus displacing the pre-existing strands to form a number of single-stranded tails. Random collisions of tails with complementary sequences lead to the formation of double-helical junctions. An endonuclease then nicks the other strands, producing one recombinant molecule and two molecular fragments with overlapping terminal sequences.


Specific Enzymes in Crossing-Over

When bacteriophages T4 and λ multiply, much more crossing-over occurs than is observed in corresponding lengths of E. coli DNA. Simultaneously, various viral-specific enzymes appear. Each enzyme is coded by a specific gene on the viral chromosome. In T4 infection, both a viral-specific endonuclease and a DNA polymerase-like enzyme have been discovered. λ infection is marked by the appearance of new exonuclease and endonuclease activities. Mutations which block synthesis of λ exonuclease lead to greatly reduced levels of recombination. In E. coli also, nucleases have been linked to recombination. Two closely linked genes (recB and recC) code for two polypeptide subunits of a powerful nuclease which attacks both single-stranded and double-stranded DNA. Mutations in the recB and recC genes result in reduced crossing-over rates.

Figure 8.11 The mutual interchange of parallel strands between double helices (branch migration) as a consequence of right-handed axial rotation

Parallel Strand Switches in Crossing-Over

Parallel strand switches provide an alternate hypothesis for crossing-over. This mode of recombination involves direct exchange between parallel-aligned double helices, following cuts in two identically oriented strands (Figure 8.11). Direct proof that strand exchange can occur comes from electron microscopic visualization of the crossing-over process. The λ system has shown clear-cut results. Denaturation mapping shows that the cross bridges always link together homologous regions of the pairing partners (Figure 8.12). It can be shown how the box-shaped crossing-over diagram is generated by 180° rotation of the conventional criss-cross diagram.

Figure 8.12 Schematic representation of how the crossing-over diagram is generated by 180° rotation of the conventional criss-cross diagram

GENE CONVERSION

The term gene conversion was first used by Winkler (1930) to indicate an apparent conversion of one allele into another; it is also known as non-reciprocal recombination. Gene conversion is an event in genetic recombination that occurs at high frequencies during meiotic division but also occurs in somatic cells. It is a process by which DNA sequence information is transferred from one DNA helix (which remains unchanged) to another DNA helix, whose sequence is altered. It is one of the ways in which the sequence of a gene may be changed. Gene conversion may lead to non-Mendelian inheritance and has often been recorded in fungal crosses. Gene conversion is not, however, a mutation in the usual sense, as its frequency is much higher than the spontaneous mutation rate. If it were a mutation, the rate of reversion would not be expected to be as high as the rate at which a “wild-type” allele changes to a “mutant” allele. Moreover, in gene conversion an allele is converted to the allele carried by the homolog, not to a new form. C.C. Lindegren’s concept of gene conversion (Lindegren 1953) was confirmed by Mortimer and Vonborstal (1963). This phenomenon was later discovered in other ascomycetes also, i.e., Ascobolus, Sordaria and Podospora, and in Drosophila. Gene conversion is a non-reciprocal transfer of genetic information. Figure 8.13(A) compares gene conversion with DNA crossover. An origin of gene conversion is explained in Figure 8.13(B). Gene conversion is nothing but the DNA repair synthesis associated with recombination.

Figure 8.13 (A) Comparison between gene conversion and DNA crossover: (a) Two DNA molecules. (b) Gene conversion. (c) DNA crossover: the two DNAs exchange part of their genetic information. (B) An origin of gene conversion: (a) Heteroduplexes formed by the resolution of a Holliday structure or by other mechanisms. (b) The blue DNA uses the invaded segment (e') as template to "correct" the mismatch, resulting in gene conversion. (c) Both DNA molecules use their original sequences as template to correct the mismatch. Gene conversion does not occur (Redrawn from www.web-books.com/MoBio/Free/Ch8D4.htm)

The phenomenon of gene conversion has been well studied in the fungus Neurospora, based on the order of ascospores in an ascus produced by this model fungus. In a cross where ascospores are expected in the ratio of 2:2, a ratio of 1:3 or 3:1 is obtained. This deviation from the expected ratio can be explained on the basis of gene conversion. It differs from reciprocal DNA recombination, as no reciprocal exchange of sequences occurs here. Paquette and Rossignol (1978) reported that all conversions occur via hybrid DNA formation, whether it is symmetrical or asymmetrical. Fogel et al. (1978) found that gene conversion in yeast is mediated mainly by the formation of heteroduplex DNA, which is primarily of an asymmetrical nature. This conversion of one allele to the other is due to base mismatch repair during recombination: if one of the four strands during meiosis pairs with one of the four strands of a different chromosome, as can occur if there is sequence homology, mismatch repair can alter the sequence of one of the chromosomes so that it becomes identical to the other.

Gene conversion can result from the repair of damaged DNA as described by the double-strand break repair model. Here a break in both strands of DNA is repaired from an intact homologous region of DNA. Resection (degradation) of the DNA strands near the break site leads to stretches of single-stranded DNA that can invade the homologous DNA duplex. The intact DNA can then function as a template to copy the lost DNA. During this repair process a structure called a double Holliday junction is formed. Depending on how this structure is resolved, either crossover or gene conversion products result.

Effect of Gene Conversion

In baker's yeast, S. cerevisiae, the two alleles of a heterozygote are normally recovered among the meiotic products in a 2:2 ratio, and in Neurospora crassa, in the absence of crossing-over, a ratio of 4:4 is obtained. Lindegren (1953) reported a 3:1 ratio in yeast and Mitchell (1955) found a 6:2 ratio in Neurospora. Later, 5:3 and 7:1 ratios were also reported in Neurospora. Normally, a diploid organism that has inherited different copies of a gene from its two parents is called heterozygous. This is represented genotypically as Aa (i.e., one copy of allele A and one copy of allele a). When a heterozygote produces gametes by meiosis, the alleles normally segregate and end up in a 1:1 ratio in the resulting cells. With gene conversion, however, a ratio other than the expected 1A:1a is observed. Examples are 3A:1a, 1A:3a, 5A:3a and 3A:5a. In a 3A:1a tetrad, for example, there are three times as many A alleles as a alleles among the meiotic products.
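The segregation ratios quoted above can be tallied mechanically. The following is a minimal sketch, assuming that each ascus is written as a string of spore genotypes ("A" or "a"); the function names and the labels "normal" and "non-Mendelian" are illustrative choices, not terms taken from the studies cited.

```python
from collections import Counter


def segregation_ratio(spores):
    """Return (number of 'A' spores, number of 'a' spores) for one ascus."""
    counts = Counter(spores)
    if counts["A"] + counts["a"] != len(spores):
        raise ValueError("spores may contain only 'A' or 'a'")
    return counts["A"], counts["a"]


def classify_ascus(spores):
    """Label an ascus as normal (equal segregation) or non-Mendelian."""
    n_A, n_a = segregation_ratio(spores)
    label = "normal" if n_A == n_a else "non-Mendelian"
    return "{} ({}:{})".format(label, n_A, n_a)


# Yeast tetrads (4 spores) and Neurospora octads (8 spores)
for ascus in ("AAaa", "AAAa", "AAAAaaaa", "AAAAAAaa", "AAAAAaaa"):
    print(ascus, "->", classify_ascus(ascus))
```

An ascus scored as 3:1, 6:2, 5:3 or 7:1 by this tally is a candidate for gene conversion; the tally alone says nothing about the underlying mechanism.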

Biased Gene Conversion

Classical genetic studies show that gene conversion can favor some alleles over others (Marais 2003). Molecular experiments suggest that gene conversion favors GC over AT base pairs, leading to the concept of biased gene conversion towards GC (BGC(GC)). The expected consequence of such a process is the GC-enrichment of DNA sequences undergoing gene conversion. Recent genomic work suggests that BGC(GC) affects the base composition of yeast, invertebrate and mammalian genomes. Hypotheses for the mechanisms and evolutionary origin of this unusual phenomenon have been proposed. Most BGC(GC) events probably occur during meiosis, which has implications for our understanding of the evolution of sex and recombination.

GENETICS AND ENZYMOLOGY OF RECOMBINATION

Recombination is a biochemical process involving different enzymes. More than 25 gene products are involved in recombination. Some well-identified gene-enzyme systems are described here.


Rec System

Genes that mediate recombination in E. coli are designated rec. The recA gene appears to be involved in general recombination. recA– mutants are incapable of recombining during conjugation, transduction or transformation. The recA gene may specify a regulatory protein that influences the synthesis or effectiveness of several proteins required for DNA breakage and rejoining. Alternatively, the recA gene product may interact directly with chromosomes to mediate recognition and alignment. recA– mutants are also defective in the repair of UV-induced DNA damage. Mutations in the recB and recC genes reduce conjugational and transductional recombination rates. These genes code for two subunits of the ATP-dependent exonuclease V of E. coli. This protein also exhibits endonuclease and unwinding activity. It attacks double- and single-stranded DNA, degrading them without regard to their chemical direction. It represents the type of enzyme one would expect to participate in breakage and rejoining of chromosomes. The recB– recC– mutants are also deficient in repairing UV-induced damage to DNA.

RecA is central to homologous recombination, as it promotes synapsis and strand exchange between homologous DNA molecules (Cox and Lehman 1987). The first enzyme to act in recombination is RecBCD, which processes broken DNA molecules to generate single-stranded DNA (ssDNA). It also helps to load the RecA strand-exchange protein onto ssDNA ends. The activity of RecBCD is controlled by specific DNA sequence elements known as Chi sites (Dixon and Kowalczykowski 1993). RecBCD enzyme is a heterotrimeric helicase/nuclease that initiates homologous recombination at double-stranded DNA breaks. Several of its activities are regulated by the DNA sequence χ (5'-GCTGGTGG-3'), which is recognized in cis by the translocating enzyme.
When RecBCD enzyme encounters χ, the intensity and polarity of its nuclease activity change, and the enzyme gains the ability to load RecA protein onto the χ-containing, unwound single-stranded DNA. Spies et al. (2003) showed that interaction with χ also affects translocation by RecBCD enzyme. By observing the translocation of individual enzymes along single molecules of DNA, they saw RecBCD enzyme pause precisely at χ. After pausing at χ, the enzyme continues translocating, but at approximately one-half the initial rate. They proposed that interaction with χ results in an enzyme in which one of the two motor subunits, likely the RecD motor, is uncoupled from the holoenzyme to produce the slower translocase. In the model given by Spies et al. (2003) for the control of DNA translocation by χ, RecBCD enzyme is a bipolar helicase whose two motor subunits translocate on opposite strands of the DNA substrate molecule.

Proteins that participate in DNA replication and repair are also required for general recombination. The low level of recombination in recB– recC– mutants is restored to a more or less normal level (rec+) by mutation in either of two genes, sbcA and sbcB (suppressor of recBC). Mutations in genes recF, recJ, recK and recL result in recombination deficiency in the recB recC strain.

In phage T4, gene 32 codes for a phage-specific unwinding protein that participates in general recombination, perhaps by holding single strands open and protecting them from intracellular nucleases during the recognition-alignment phase. T4 phage carrying an amber mutation in gene 32 fails to form recombination intermediates when it infects a non-suppressing host. An amber mutation in gene 30 of T4 phage makes it ligase-defective. In this case, recombination intermediates form in non-suppressing hosts but cannot mature. These recombination intermediates accumulate as gap-containing joint molecules (Goodenough 1994).
If these joint molecules are isolated and presented with DNA ligase and polymerase in vitro, they are repaired to form covalently joined recombinant chromosomes.

If ExoV (which has endonuclease as well as exonuclease activity) is defective, its function can be replaced in two ways: (1) inactivation of exonuclease I in sbcB mutants and (2) activation of exonuclease VIII in sbcA mutants. Clark (1974) suggested that, in addition to the usual recB-recC pathway, there is a recF pathway opened by the absence of exonuclease I and a recE pathway opened by the presence of exonuclease VIII. The two supplementary pathways (E and F) are detected only if exonuclease V is inactive. All three mechanisms appear to produce DNA with gaps so as to allow new associations during the process of DNA repair. All the enzymes required for recombination are the same as, or similar to, DNA replication enzymes, which are known to be present. The DNA synthesis that occurs during recombination, however, involves only small portions of the DNA molecules. This is called DNA repair synthesis.
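Because χ is a defined octamer, its occurrences in a DNA sequence can be located by a plain string search. The sketch below is only an illustration: the function names are hypothetical, the test sequence is invented, and a hit on the reverse-complement strand is reported with the label "-". It does not model RecBCD recognition itself, which depends on the direction from which the enzyme approaches the site.

```python
CHI = "GCTGGTGG"  # Chi sequence written 5'->3', as quoted in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_chi_sites(seq):
    """Return sorted (position, strand) pairs for every Chi occurrence.

    position is the 0-based index, on the given strand, of the leftmost
    base of the match; strand is '+' for the given strand and '-' when
    Chi lies on the complementary strand.
    """
    seq = seq.upper()
    hits = []
    for motif, strand in ((CHI, "+"), (reverse_complement(CHI), "-")):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)
    return sorted(hits)


# Invented test sequence with one Chi site on each strand
print(find_chi_sites("AAGCTGGTGGTTCCACCAGCAA"))  # [(2, '+'), (12, '-')]
```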

RuvABC System

The enzyme complex RuvABC helps to resolve the Holliday intermediate into recombination products (Connolly et al. 1991). Binding of the RuvABC complex causes the junction to adopt an open square-planar configuration. Within such a structure, DNA isomerization can have little role in determining the orientation of resolution. To determine the role of junction-specific protein assembly in resolution bias, a defined in vitro system was developed in which it was possible to direct the assembly of the RuvABC resolvasome. Van Gool et al. (1999) found that the bias toward resolution in one orientation or the other was determined simply by the way in which the Ruv proteins were positioned on the junction. They also provided evidence supporting current models of RuvABC action in which Holliday junction resolution occurs as the resolvasome promotes branch migration.

RAD System

The isolation of yeast mutants deficient in DNA damage repair and blocked during meiosis has been useful for the identification of recombination genes. RAD50, a yeast DNA repair gene required for meiotic interchromosomal exchanges between homologues, is also required for meiotic intrachromosomal recombination (Gottlieb et al. 1989). However, only intrachromosomal events in non-ribosomal DNA are dependent on RAD50; those in rDNA occur in the absence of this gene. Non-ribosomal DNA sequences retain their RAD50 dependence even when inserted into the ribosomal DNA array. This suggests that there are at least two pathways of meiotic intrachromosomal recombination. RAD50 is not required for either interchromosomal or intrachromosomal spontaneous mitotic recombination. RAD50 encodes a DNA-binding protein that functions in the repair of double-strand breaks in mitotic cells.

The RAD52 epistasis group of genes is involved in recombinational repair. Predominant among these is RAD51, which is homologous to E. coli recA. RAD51 is inducible by DNA damage. There appear to be several homologs of RAD51 in each species. In yeast, for example, the DST1 gene encodes a RAD51-like product whose activity is restricted to meiotic cells. The RAD52 gene is essential for both mitotic and meiotic recombination, the repair of double-stranded DNA breaks and mating-type switching. The RAD54 gene encodes a helicase required for recombination and other forms of DNA repair.

phs1 Gene

Pairing, synapsis, and recombination are prerequisites for accurate chromosome segregation in meiosis. Pawlowski et al. (2004) found that homologous pairing in maize is governed by the phs1 gene. Mutation of this locus leads to abnormal separation of chromosomes at anaphase of meiosis and thereby to loss of pollen viability. In the phs1 mutant, homologous chromosome synapsis is completely replaced by synapsis between non-homologous partners. The phs1 gene is also required for installation of the meiotic recombination machinery on chromosomes, as the mutant almost completely lacks chromosomal foci of the recombination protein RAD51 (Pawlowski et al. 2004). Thus, in the phs1 mutant, synapsis is uncoupled from recombination and pairing. The protein encoded by the phs1 gene likely acts in a multistep process to coordinate pairing, recombination, and synapsis.

DNA RECOMBINATION IN MITOCHONDRIA

In fungi, mitochondrial DNA (mtDNA) recombination has long been documented, but only in laboratory experiments and only under conditions in which heteroplasmy is ensured. Despite this experimental evidence, mtDNA recombination had not been documented in a natural population. Evidence from natural populations is a prerequisite for understanding the evolutionary impact of mtDNA recombination. Saville et al. (1998) investigated the possibility of mtDNA recombination in an organism with a demonstrated potential for heteroplasmy in laboratory matings. Using nucleotide sequence data, they reported that the genotypic structure of mtDNA in a natural population of the basidiomycete fungus Armillaria gallica was inconsistent with purely clonal mtDNA evolution and fully consistent with mtDNA recombination.

Human mitochondrial DNA is a 16.5-kb circular genome that is essential for the maintenance of mitochondrial function and is present in multiple copies in most cell types. mtDNA recombination occurs in yeast. Some evidence for the occurrence of recombination in human mtDNA has also been obtained (Kraytsberg et al. 2004).

DNA RECOMBINATION IN CHLOROPLASTS

Differences in the restriction endonuclease fragmentation patterns of chloroplast DNA (cpDNA) from Chlamydomonas eugametos and C. moewusii were used to study the inheritance of these DNAs in interspecific hybrids (Lemieux et al. 1981). Analysis of the cpDNAs from ten randomly selected F1 hybrids revealed in each case that the cpDNA was recombinant for AvaI and BstEII restriction sites, although fragments characteristic of C. eugametos, the mt+ parent, were typically found in excess of those of C. moewusii, the mt– parent. In backcrosses between an F1 mt+ hybrid and C. moewusii mt–, seven randomly selected B1 hybrids showed cpDNA restriction patterns either identical or highly similar to that of the mt+ parent. The authors proposed that cpDNA molecules are predominantly transmitted by the mt+ parent in both the F1 and B1 generations, but that selection favors the survival of F1 progeny with recombinant chloroplast genomes that avoid interspecific incompatibilities. On the surface, the inheritance of recombinant cpDNA contrasts with the simultaneous uniparental inheritance of two putative chloroplast markers (sr-2 and er-nM1+). These two markers might by chance be associated with cpDNA sequences of the mt+ parent that were selected in all F1 hybrids.

INTRAGENIC RECOMBINATION

Though mutation is the primary source of genetic variability, it is not sufficient for evolution. After the induction of mutation, genetic variability is amplified at different levels: one type of amplification is at the gene level and the other is at the genomic level. A mutation produces a new allele (m1) at a certain site of the wild-type (+) gene. Another mutation, at a different site, produces another allele (m2). Suppose a heterozygote carries the two mutant alleles m1 and m2 and crossing-over takes place between the two mutant sites: one resulting chromatid is non-mutant (wild-type) and the other is mutant at both sites. Thus a new mutant allele, m1.2, has been added to the population by intragenic recombination, without a new mutation. Intragenic recombination was first described by Oliver (1940).
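The exchange just described can be written out explicitly. In this minimal sketch each allele is represented as a tuple of site states, "+" for wild type or a mutation name, and a single crossover is assumed to fall between the two sites; the representation and the function name are illustrative assumptions.

```python
def intragenic_crossover(allele_1, allele_2, breakpoint):
    """Return the two reciprocal products of one crossover.

    Each allele is a tuple of site states ordered along the gene;
    breakpoint is the index at which the strands are exchanged.
    """
    product_1 = allele_1[:breakpoint] + allele_2[breakpoint:]
    product_2 = allele_2[:breakpoint] + allele_1[breakpoint:]
    return product_1, product_2


m1 = ("m1", "+")  # mutant at the first site only
m2 = ("+", "m2")  # mutant at the second site only

double_mutant, wild_type = intragenic_crossover(m1, m2, breakpoint=1)
print(double_mutant)  # ('m1', 'm2'), i.e. the new allele m1.2
print(wild_type)      # ('+', '+')
```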


The role of intragenic recombination in generating genetic variability was emphasized much later by Ohno (1970), who called it a mutation-like event, since the third allele (m1.2) originates from existing mutations rather than from a new one. If there is a third unisite mutation (m3), the allele m1.2 will, upon intragenic recombination, give rise to a new allele m1.2.3. Table 8.2 gives the formulas used to calculate the extent of amplification of genetic variation due to intragenic recombination.

Some generalizations can be drawn about intragenic recombination. It adds new alleles only when there are at least two mutant alleles affecting different amino acid residues. Alleles produced by intragenic recombination are multisite mutant alleles. When there are at least three mutant alleles, affecting three different amino acid residues, the number of alleles contributed by intragenic recombination is greater than the number of unisite mutant alleles. Corresponding to an arithmetic increase in the number of unisite mutant alleles, there is a geometric increase in the number of multisite mutant alleles.

Intragenic recombination occurs at a frequency of 1.4 × 10⁻³ at the Esterase-2 locus in Zaprionus paravittiger (Kumar 1978); 2 × 10⁻³ for the locus controlling the blood group B system in cattle (Stormont 1965); 2.6-2.9 × 10⁻² for one of the two loci controlling LDH in brook trout (Wright and Atherton 1968); and 0.6 × 10⁻² for the 6-PGD locus in Japanese quail (Ohno et al. 1969).

Table 8.2. Amplification of genetic variation

No. of unisite    Expected number of alleles mutant at    No. of multisite    Total no. of    Total no. of
mutant alleles    2 sites   3 sites   4 sites   5 sites   alleles             alleles (N)     genotypes
0                 0         0         0         0         0                   1               1
1                 0         0         0         0         0                   2               3
2                 1         0         0         0         1                   4               10
3                 3         1         0         0         4                   8               36
4                 6         4         1         0         11                  16              136
5                 10        10        5         1         26                  32              528
n                 nC2       nC3       nC4       nC5       2^n - (n + 1)       2^n             N + N(N - 1)/2
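The rows of Table 8.2 follow directly from the formulas in its last line. The sketch below recomputes them; the function name and the returned layout are illustrative, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def amplification_row(n):
    """Recompute one row of Table 8.2 for n unisite mutant alleles.

    Returns (alleles mutant at k sites for k = 2..5, number of multisite
    alleles, total number of alleles N, total number of genotypes).
    """
    k_site = {k: comb(n, k) for k in range(2, 6)}  # nCk; 0 when k > n
    multisite = 2 ** n - (n + 1)                   # all alleles minus wild type and unisite
    total_alleles = 2 ** n                         # N
    genotypes = total_alleles + total_alleles * (total_alleles - 1) // 2
    return k_site, multisite, total_alleles, genotypes


for n in range(6):
    print(n, amplification_row(n))
```

For n = 5 this gives 10, 10, 5 and 1 alleles mutant at two to five sites, 26 multisite alleles, 32 alleles in all and 528 genotypes, in agreement with the table.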

Coyne (1976) uncovered 23 alleles of the xanthine dehydrogenase (xdh) gene in D. persimilis. Singh et al. (1976) uncovered 37 alleles for the same enzyme in D. pseudoobscura. Johansson and Rendel (1968) uncovered more than 300 alleles for the blood group B locus in cattle. If two mutations are very close together, the recombination frequency between them is very low, so the frequency of intragenic recombination is naturally expected to be quite low. But it is not really that low. With 12 alleles of the est locus, 44 genotypes of females heterozygous for a pair of them were constructed, and 4 × 10⁴ F1 progeny per genotype were examined electrophoretically. In all, 83 variants different from both parental alleles were obtained from a total of 2 × 10⁶ progeny (Tsuno 1988). These observations suggest that a very high level of variation at the est locus is generated by recombination in female heterozygotes. Intragenic recombination events were monitored between two physically separated rosy alleles, ry301 and ry2, using DNA restriction site polymorphisms as genetic markers (Clark et al. 1988). It was observed that recombination could initiate at a large number of sites within the rosy locus of D. melanogaster.

Intragenic recombination results in multisite mutants. Clear-cut evidence that multisite mutants are present comes from human hemoglobin. Out of 200 mutations analyzed, 182 were single-site mutations. If Table 8.2 is correct, a large number of multisite-mutation alleles should appear, but only three two-site mutant alleles have been observed in human hemoglobin. One is hemoglobin J-Singapore, in which asparagine is replaced by aspartic acid at position 68 and alanine by glutamic acid at position 79 of the β-polypeptide chain. In hemoglobin C-Harlem, glutamic acid is replaced by valine at position 6 and aspartic acid by asparagine at position 73 of the β-polypeptide chain. In the third, hemoglobin Arlington Park, glutamic acid is replaced by lysine at position 6 and lysine by glutamic acid at position 95 of the β-polypeptide chain. If the frequency of mutations is low, the probability of obtaining heterozygotes for two mutant alleles is also low. For intragenic recombination to make a contribution, the alleles may have to be present at considerable frequency.

Electrophoretic bands also provide evidence for the existence of multiple alleles in populations. When an acidic amino acid is replaced by a neutral one, a neutral amino acid by an acidic one, a basic amino acid by a neutral one, or a neutral amino acid by a basic one, the difference in electric charge between the wild-type and mutant bands is one unit. When an acidic amino acid is replaced by a basic amino acid or vice versa, the difference is two units. Single-site mutations, together with the ancestral allele, can thus explain at most five electrophoretic bands (charge differences of -2, -1, 0, +1 and +2 units). In the butterfly Colias eurytheme, 13 different bands were observed (Johnson 1976). They cannot all be explained by single-site substitutions: a difference of three units, for example, requires at least two amino acid substitutions, because a single substitution changes the charge by at most two units. Electrophoretic studies thus provide clear-cut evidence for the existence of multisite mutations in a population, but they say nothing about their origin. The most likely mechanism is intragenic recombination. Successive mutational events are another possibility. Although intragenic recombination is the more probable, there is no way of excluding the second.
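The charge argument above can be checked with a few lines of arithmetic. This is a sketch under the simplifying assumption made in the text, namely that bands are distinguished only by net charge and that one substitution shifts the charge by at most two units; the names used are illustrative.

```python
from math import ceil

MAX_SHIFT_PER_SUBSTITUTION = 2  # acidic <-> basic replacement, as stated in the text


def min_substitutions(charge_difference):
    """Fewest amino acid substitutions that can give this charge difference."""
    return ceil(abs(charge_difference) / MAX_SHIFT_PER_SUBSTITUTION)


def charge_classes(n_substitutions):
    """All charge differences reachable with at most n substitutions."""
    limit = MAX_SHIFT_PER_SUBSTITUTION * n_substitutions
    return list(range(-limit, limit + 1))


print(charge_classes(1))     # [-2, -1, 0, 1, 2]: the five bands explained by single sites
print(min_substitutions(3))  # 2: a three-unit difference needs two substitutions
```

Under this model, any allele whose band differs from the ancestral band by more than two charge units must carry more than one substitution.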

RNA RECOMBINATION

RNA recombination is a mechanism used by animal and plant RNA viruses and, besides error-prone replication, is one of the major factors responsible for the emergence of new, often dangerous viral strains or species. Genetic RNA recombination is the exchange of genetic information between RNA molecules, and it differs slightly from typical genetic DNA recombination. RNA recombination was first discovered in poliovirus. At present, it seems that every RNA-based virus is capable of recombining. RNA recombination has been observed in human, animal, plant and bacterial viruses. The exchange of genetic material most frequently takes place within a viral population, although it may also occur between different viral strains or between different viruses. Moreover, it has been shown that viral RNA can recombine with host RNAs (Kurzynska-Kokorniak 2011).

Molecular Mechanisms of RNA Recombination

Despite extensive studies, the molecular mechanism of RNA recombination is still not well understood. Initially, two completely different models of genetic RNA recombination were proposed: the breakage and rejoining model and the copy-choice mechanism. The breakage and rejoining model assumed that viral recombinants arise through breakage and rejoining of nucleic acid molecules. The copy-choice mechanism assumed that recombination takes place during viral genome replication, when the polymerase engaged in this process switches from one RNA template to another. RNA recombination based on the copy-choice hypothesis is classified as Class I, Class II and Class III.

Class I copy-choice mechanism. Base-pairing-dependent recombination (similarity-essential recombination): substantial sequence similarity between parental RNAs is required and is the major determinant of recombination.

Class II copy-choice mechanism. Base-pairing-independent recombination (similarity-non-essential recombination): sequence similarity is not required, although some homology may be present. Recombination may be determined by other RNA features, such as RNA polymerase binding sites (highly structured RNA), secondary structures, and heteroduplex formation between parental RNAs.

Class III copy-choice mechanism. Base-pairing-assisted recombination (similarity-assisted recombination): combines features of class I and class II recombination. Sequence similarity influences the sites and frequency of recombination; however, additional RNA features are also required.

Shaping Genome of RNA Viruses

RNA-RNA recombination is one of the strongest forces shaping the genomes of plant RNA viruses. The detection of RNA recombination is a challenging task that prompted the development of both in vitro and in vivo experimental systems. In the divided genome of the Brome mosaic virus system, both inter- and intrasegmental crossovers have been described. Other systems utilize satellite or defective interfering RNAs (DI-RNAs) of Turnip crinkle virus, Tomato bushy stunt virus, Cucumber necrosis virus, and Potato virus X. These assays identified the mechanistic details of the recombination process, revealing the role of RNA structure and proteins in the replicase-mediated copy-choice mechanism (Sztuba-Solińska et al. 2011). In copy choice, the polymerase, together with the nascent RNA chain it is synthesizing, switches from one RNA template to another (Figure 8.14) (Sztuba-Solińska et al. 2011). RNA recombination has been found to mediate the rearrangement of viral genes, the repair of deleterious mutations, and the acquisition of non-self sequences, influencing the phylogenetics of viral taxa. The evidence for RNA recombination, not only between related viruses but also among distantly related viruses and even with host RNAs, suggests that plant viruses readily test recombination with any genetic material at hand.

Classification of RNA Recombination based on RNA Structure and Function

Based on the structure and function of RNA molecules, the following types of genetic RNA recombination occurring during template switching by RNA polymerases have been distinguished.

Figure 8.14 RNA-RNA recombination in viruses

Homologous RNA recombination. It involves two identical or similar molecules (or two molecules that, although different, possess sufficiently long regions of homology). It is called precise if the recombinant junction sites are located exactly at the corresponding nucleotides, and imprecise when the junction sites occupy different positions within the recombining molecules. Precise crossovers regenerate the parental molecules, whereas imprecise recombination produces molecules in which some sequences are duplicated or deleted. Poliovirus RNA has been shown to undergo homologous RNA recombination at a high frequency in infected human cells. It has become possible to mimic the entire intracellular replicative cycle of poliovirus in cytoplasmic extracts prepared from HeLa cells, resulting in the generation of infectious poliovirions. The mechanism of poliovirus RNA recombination had been shown previously to be coupled to RNA replication, presumably through template switching during the replication of parental RNAs. Experiments were designed to test whether recombinant poliovirus RNA molecules are produced in a cell-free environment. The recombinant molecules generated bear marker sequences that can be detected physically by reverse transcription and PCR. Tang et al. (1997) reported successful detection of poliovirus RNA recombination in a cell-free replication system.

Non-homologous RNA recombination. It occurs between two different RNA molecules and generates products that differ distinctly from the parental molecules. As a result, non-homologous recombinants are frequently dysfunctional and rarely accumulate in the host organism. Sometimes, however, non-homologous recombination can produce a new viral strain or species possessing advantageous features that enable it to compete successfully with other pathogens. Non-homologous recombination may therefore play an especially important role in virus evolution. The appearance of thousands of variants enables the selection and replication of the most adaptable ones, allowing the virus to survive under unfavorable conditions.
A survey of viral RNA structures and sequences suggested that many RNA viruses were derived from homologous or non-homologous recombination between viruses, or between viruses and cellular genes, during natural viral evolution (Lai 1992).

Replicase-driven template-switching RNA recombination. Molecular mechanisms of RNA recombination were studied in turnip crinkle carmovirus (TCV), which has a uniquely high recombination frequency and a non-random crossover site distribution among the recombining TCV-associated satellite RNAs. To test the previously proposed replicase-driven template-switching mechanism for recombination, a partially purified TCV replicase preparation (RdRp) was programmed with RNAs resembling the putative in vivo recombination intermediates (Nagy et al. 1998). Analysis of the in vitro RdRp products revealed efficient generation of 3'-terminal extension products. Initiation of 3'-terminal extension occurred at or close to the base of a hairpin that was a recombination hotspot in vivo. Efficient generation of the 3'-terminal extension products depended on two factors: (i) a hairpin structure in the acceptor RNA region and (ii) a short base-paired region formed between the acceptor RNA and the nascent RNA synthesized from the donor RNA template. The hairpin structure is bound by the RdRp and is probably involved in its recruitment. The probable role of the base-paired region is to hold the 3' terminus near the RdRp bound to the hairpin structure, thereby facilitating 3'-terminal extension. These regions were also required for in vivo RNA recombination between TCV-associated sat-RNA C and sat-RNA D, giving crucial and direct support for a replicase-driven template-switching mechanism of RNA recombination.

Transesterification reactions in RNA recombination. Recent experiments in both in vitro and in vivo systems indicate that RNA recombination may also result from various transesterification reactions, which are either performed by the RNA molecules themselves or promoted by proteins (Chetverin 1999).
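The distinction between precise and imprecise homologous crossovers in the copy-choice model can be illustrated with a toy template switch. In this sketch the two parental RNAs are assumed to be aligned strings written in the same sense, the polymerase is assumed to leave the donor at one index and resume on the acceptor at another, and complementary-strand synthesis is ignored; all names and sequences are hypothetical.

```python
def copy_choice(donor, acceptor, leave_at, resume_at):
    """Copy donor[:leave_at], then continue copying acceptor[resume_at:]."""
    return donor[:leave_at] + acceptor[resume_at:]


def classify_junction(leave_at, resume_at):
    """Name the crossover type from the two junction positions."""
    if leave_at == resume_at:
        return "precise"
    if resume_at < leave_at:
        return "imprecise (duplication)"
    return "imprecise (deletion)"


donor = "AAAACCCCGGGG"     # upper case marks donor-derived sequence
acceptor = "aaaaccccgggg"  # lower case marks acceptor-derived sequence

for leave_at, resume_at in ((6, 6), (6, 4), (6, 8)):
    product = copy_choice(donor, acceptor, leave_at, resume_at)
    print(classify_junction(leave_at, resume_at), product, len(product))
```

A precise switch keeps the parental length (12 here), whereas resuming upstream duplicates the overlap (length 14) and resuming downstream deletes it (length 10).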


REFERENCES Alberts, B. 2003. DNA replication and recombination. Nature 421: 431-5. Avery, O.T., C.M. MacLeod, and M. McCarty. 1944. Studies on the chemical nature of substance inducing transforming of pneumococcal types. Induction of transformation by a desoxyribonucleic acid fraction isolated from pneumococcus type iii. J. Exptl. Med. 79: 137-58. Bachmann, B.J. 1990. Linkage Map of Escherichia coli K-12. Microbiol. Rev. 54: 130-97. Bachmann, B.J., K.B. Low, and A.L. Taylor. 1976. Recalibrated linkage map of Escherichia coli K-12. Bacteriol. Rev. 40: 46-167. Berlyn, M.K.B. 1998. Linkage Map of Escherichia coli K-12, Edition 10. Microbiol. Mol. Biol. Rev. 62: 814-984. Bianco, P.R., Tracy, R.B., and Kowalczykowski, S.C. 1998. DNA strand exchange proteins: a biochemical and physical comparison. Front. Biosci. 3: d570-603. Brar, G.A., B.M. Kiburz, Y. Zhang, J.-E. Kim, F. White, and A. Amon. 2006. Rec8 phosphorylation and recombination promote the step-wise loss of cohesions in meiosis. Nature 441: 532-6. Bugreev, D.V., O.M. Mazina, and A.V. Mazin. 2006. Rad54 protein promotes branch migration of Holliday junctions. Nature 442: 590-3. Carpenter, A.T. 1979. Recombination nodules and synaptinemal complex in recombination-defective females of Drosophila melanogaster. Chromosoma 75: 259-92. Chetverin, A. 1999. The puzzle of RNA recombination. FEBS Lett.460: 1-5. Clark, A.J. 1974. Progress toward a metabolic interpretation of genetic recombination of Escheriehia coli and bacteriophage lambda. Genetics 78: 259-71. Clark, S.H., A.J. Hilliker, and A. Chovnick, 1988. Recombination can initiate and terminate at a large number of sites within the rosy locus of Drosophila melanogaster. Genetics 118: 261-6. Connolly, B., C.A. Parsons, F.E. Benson, H.J. Dunderdale, G.J. Sharples, R.G. Llold, and S.C. West. 1991. Resolution of Holliday junction in vitro requires the E. coli ruvC gene product. Proc. Natl. Acad. Sci. USA 88: 6063-7. Coyne, J.A. 1976. 
Lack of genic similarity between two sibling apecies of Drosophila as revealed by varied technique. Genetics 83: 593-607. Creighton, H.B., and B. McClintock. 1931. A correlation of cytological and genetical crossing-over in Zea mays. Proc. Nat. Acad. Sci. USA 17: 492-7. Delbruck, M. and Bailey, W.T. Jr. 1946. Induced mutations in bacterial viruses. Cold Sp. Harb. Symp; Quant. Biol. 11: 33-7. Dixon, D.A., and S.C. Kowalcykowski. 1993. The recombination hotspot Chi is a regulatory sequence that acts by attenuating the nuclease activity of the E. coli RecBCD enzyme. Cell 73: 87-96. Fogel, S., R. Mortimer, R.K. Lusnak, and F. Tavares. 1978. Meiotic gene conversion: a signal of the basic recombination event in yeast. Cold Sp. Harb. Symp. Quant. Biol. 43: 1325-41. Goodenough, U. 1994. Genetics. Saunders (W.B.) Co Ltd. Gottlieb, S., J. Wagstaff, and R.E. Esposite. 1989. Evidence for two pathways of meiotic intrachromosomal recombination in yeast. Proc. Natl. Acad. Sci. USA 86: 7072-6. Hadden, J.M., A.-C. Declais, S.B. Carr, D.M.J. Lilley, and E.V. Phillips. 2007. The structure of Holliday junction resolution by T7 endonuclease I. Nature 449: 621-4. Handa, N., K. Morimatsu, S.T. Lovett, and S.C. Kowalczykowski. 2009. Reconstitution of initial steps of dsDNA break repair by the RecF pathway of E. coli. Genes Dev. 15:1234-45. Hayes, W. 1952. Recombination in Bacteria E. coli K-12: unidirectional transfer of genetic material. Nature 169: 118-9. Hershey, A.D. 1946. Spontaneous mutations in bacterial viruses. Cold Sp. Harb. Symp. Quant. Biol. 11: 67-76. Holger, P., and H. Barbara. 1996. From centiMorgans to base pairs: homologous recombination in plants. Trends Plant Sci 1(10): 340-8 Holliday, R. 1964. A mechanism for gene conversion of fungi. Genet. Res. 5: 282-304. Honigberg, S.M., and C.M. Radding. 1988. The mechanics of winding and unwinding helices in recombination. torsional stress associated with strand transfer promoted by RecA protein. Cell 54: 525-32. Ip, S.C.Y., U. 
Rass, M.G. Blanco, H.R. Flynn, J.M. Skehel, and S.C. West. 2008. Identification of Holliday junction resolvases from humans and yeast. Nature 456: 357-61. Janssens, F.A. 1909. Spermatogenese dans les Batraciens. V. La. Theorie de la Chiasmatypie. Nouvelles interpretation des cinèses de maturatuion. Cellule 25: 387-411.

Genetic Recombination

8.29




PROBLEMS

1. How is genetic recombination fundamental to the generation of genetic variation at the molecular level?
2. Is there any organism in which crossing-over is known to be absent? Is its absence an advantage or a disadvantage to such an organism?
3. The Holliday junction is an intermediate in crossing-over. How has the Holliday junction been utilized in DNA nanotechnology?
4. What role have transposons played in evolution?
5. Describe the role of intragenic recombination in the generation of genetic variation.
6. Does recombination take place in RNA? If so, in what types of systems does it occur? Does RNA recombination play a role in shaping the genome of the system in which it occurs?

9 Mutation

In the living cell, DNA undergoes frequent changes, especially while it is being replicated. Most of these changes are quickly repaired; those that are not repaired persist as DNA damage. DNA damage can be broadly classified as cytotoxic, carcinogenic, clastogenic or mutagenic. Cytotoxic damage interferes with essential cellular processes such as DNA replication and transcription and is therefore lethal. Carcinogenic damage is likely to cause cancer. Clastogenic damage is likely to cause chromosomal rearrangements. Mutagenic damage is likely to induce mutations, often by altering the base-pairing potential of the damaged base. Here we shall confine our discussion to mutagenic changes.

CLASSIFICATION OF MUTATIONS

Mutations can be classified on the basis of their size, cause, direction, type of tissue affected, expression or effect, location of genes, effect on the sense of the genetic code, type of amino acid substituted in the protein, effect on function, effect on survival, relevance to the process of evolution, degree of expression, etc.

Based on Size of Mutation

Microlesions. These mutations are also known as point mutations or gene mutations. They are mutations in which only one nucleotide pair is replaced. Being very small, these changes are not observable even under a microscope; they can be detected by comparing the nucleotide sequences of wild-type and mutant DNA. Such a substitution does not change the number of nucleotide pairs in a gene. These mutations are of two types, transitions and transversions. A transition is a change from purine to purine (A → G or G → A) or from pyrimidine to pyrimidine (T → C or C → T), whereas a transversion is a change from a purine to a pyrimidine (A → T, A → C, G → T, G → C) or vice versa (T → A, C → A, T → G, C → G).

Intermediate lesions. In this case, there is either addition (insertion) or deletion (removal) of one or a few nucleotide pairs in a gene. This changes the number of nucleotide pairs in the gene. Such changes may shift the translational reading frame and are then known as frameshift mutations. Consider the following messenger RNA sequence of fifteen nucleotides:

Position:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
Base:      A  U  G  C  C  C  A  A  A  G  G  G  U  U  U ....


Work out the effects on the translational reading frame when (a) base number 6 is deleted, (b) two bases, numbers 6 and 7, are deleted, (c) three bases, numbers 6, 7 and 8, are deleted, (d) one base, adenine (A), is added after base number 8, (e) two bases, adenine (A) and uracil (U), are added after base number 8, and (f) three bases, adenine (A), cytosine (C) and guanine (G), are added after base number 8. Use the genetic code dictionary (Table 9.1) for this exercise. Unless it occurs close to the end of the gene that encodes the C-terminus, a frameshift mutation usually abolishes the function of the gene, although this is not an absolute rule.

Table 9.1 Genetic code dictionary (Reproduced, with permission, from Khorana, H.G. 1968. Nobel Lecture at: http://nobelprize.org/nobel_prizes/medicine/laureates/1968/khorana-lecture.html © The Nobel Foundation)

First Base   Second Base                                  Third Base
             U       C       A             G
U            Phe     Ser     Tyr           Cys           U
             Phe     Ser     Tyr           Cys           C
             Leu     Ser     Nonsense**    Nonsense**    A
             Leu     Ser     Nonsense**    Trp           G
C            Leu     Pro     His           Arg           U
             Leu     Pro     His           Arg           C
             Leu     Pro     Gln           Arg           A
             Leu     Pro     Gln           Arg           G
A            Ileu    Thr     Asn           Ser           U
             Ileu    Thr     Asn           Ser           C
             Ileu    Thr     Lys           Arg           A
             Met*    Thr     Lys           Arg           G
G            Val     Ala     Asp           Gly           U
             Val     Ala     Asp           Gly           C
             Val     Ala     Glu           Gly           A
             Val     Ala     Glu           Gly           G

*, Initiation codon; **, Termination codons
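The reading-frame exercise above can be checked with a short script. The following is a minimal sketch, not part of the original text: the function names `translate`, `delete_bases` and `insert_after` are hypothetical helpers introduced for illustration, translation is assumed to start at position 1 and to stop at the first nonsense codon, and an incomplete trailing codon is simply ignored.

```python
# Genetic code built in the same order as Table 9.1 (first, second, third base = U, C, A, G).
BASES = "UCAG"
AMINO = ("Phe Phe Leu Leu Ser Ser Ser Ser Tyr Tyr Stop Stop Cys Cys Stop Trp "
         "Leu Leu Leu Leu Pro Pro Pro Pro His His Gln Gln Arg Arg Arg Arg "
         "Ileu Ileu Ileu Met Thr Thr Thr Thr Asn Asn Lys Lys Ser Ser Arg Arg "
         "Val Val Val Val Ala Ala Ala Ala Asp Asp Glu Glu Gly Gly Gly Gly").split()
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def translate(mrna):
    """Translate complete triplets from position 1 until a nonsense (Stop) codon."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        peptide.append(amino_acid)
    return peptide


def delete_bases(mrna, start, count):
    """Remove `count` bases beginning at 1-based position `start`."""
    return mrna[:start - 1] + mrna[start - 1 + count:]


def insert_after(mrna, position, bases):
    """Insert `bases` immediately after 1-based `position`."""
    return mrna[:position] + bases + mrna[position:]


WILD_TYPE = "AUGCCCAAAGGGUUU"
cases = {
    "wild-type": WILD_TYPE,
    "(a) delete base 6": delete_bases(WILD_TYPE, 6, 1),
    "(b) delete bases 6-7": delete_bases(WILD_TYPE, 6, 2),
    "(c) delete bases 6-8": delete_bases(WILD_TYPE, 6, 3),
    "(d) add A after base 8": insert_after(WILD_TYPE, 8, "A"),
    "(f) add ACG after base 8": insert_after(WILD_TYPE, 8, "ACG"),
}
for label, sequence in cases.items():
    print(f"{label:26s} {sequence:20s} {'-'.join(translate(sequence))}")
```

Under these assumptions, removing or adding one or two bases changes every codon downstream of the lesion, whereas removing or adding three bases loses or gains one amino acid and restores the original frame further downstream.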

Macrolesions. Macrolesions are also known as chromosome mutations which include changes in chromosome structure and changes in chromosome number. These changes at high resolution can be observed under a light microscope.
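Of the size classes described in this section, single-base substitutions are the simplest to classify programmatically. The sketch below, offered only as an illustration, applies the purine/pyrimidine rule for transitions and transversions given under microlesions; the function name `classify_substitution` is a hypothetical helper and DNA bases (A, G, C, T) are assumed.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(original, replacement):
    """Classify a single-base DNA substitution as a transition or a transversion."""
    original, replacement = original.upper(), replacement.upper()
    valid = PURINES | PYRIMIDINES
    if original not in valid or replacement not in valid:
        raise ValueError("bases must be one of A, G, C, T")
    if original == replacement:
        return "no change"
    # Same chemical class (purine to purine, or pyrimidine to pyrimidine) is a transition.
    same_class = {original, replacement} <= PURINES or {original, replacement} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


for pair in [("A", "G"), ("T", "C"), ("A", "T"), ("G", "C")]:
    print(pair[0], "->", pair[1], classify_substitution(*pair))
```

Enumerating all twelve possible substitutions with this rule gives four transitions and eight transversions, matching the lists in the text.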

Based on Changes in Chromosome Structure

Such mutations include deletions, duplications, inversions and translocations. Deletions, duplications and inversions are intrachromosomal modifications, whereas a translocation may be an intrachromosomal or an interchromosomal modification.

Deletion. Deletion is the loss of a chromosome segment. A deletion may include a nucleotide, a few nucleotides, part of a gene, a whole gene, a group of genes, a complete chromosome, more than one chromosome or even a complete set of chromosomes. Cytologically, deletions can be detected by the failure of a segment of a chromosome to pair properly (Figure 9.1A). The segment present in the normal chromosome but absent in the mutant homolog loops out during pachytene synapsis. This is called a deletion loop. A chromosome with a deletion can never revert to the wild-type condition. Deletions may show pseudodominance because a recessive gene present in a single dose expresses itself. Some deletions have phenotypic effects. For example, deletion of a part of the short arm of human chromosome 5 leads to Cri-du-chat syndrome, in which case the individual survives only up to the age of 30. The effects of a deficiency are severe. Homozygous deficiency is usually lethal, as some genes are completely lost from the genotype. Gametes bearing a deficient chromosome may become functionless or sterile if some vital gene is lost. In heterozygous deficiency, genetic imbalances creep in and deformities consequently appear. Some other examples of syndromes caused by deletion are Williams syndrome (26 genes from the long arm of chromosome 7 deleted), Jacobsen syndrome (chromosome 11q24 deleted), Prader-Willi syndrome (chromosome 15q11-13 deleted or unexpressed), Miller-Dieker lissencephaly syndrome (chromosome 17p13.3 deleted), Smith-Magenis syndrome (17p11.2 deleted), De Grouchy syndrome (deletion from distal section of 13q to the long arm of chromosome 18), and DiGeorge syndrome (chromosome 22q11.2 deletion).

Figure 9.1 (A) Deletion and (B) duplication loop. Each line represents a chromosome

Duplication. Duplication is the presence of an extra chromosome segment, similar to one already present in the normal chromosome complement, so that one or more genes are present in more than the normal dose. Like a deletion, a duplication may include a nucleotide, a few nucleotides, part of a gene, a whole gene, a group of genes, a complete chromosome, more than one chromosome or even a complete set of chromosomes. There may be a tandem duplication of a part of a gene or of a whole gene. A duplication is the reciprocal of a deletion and, at pachytene, also gives a loop-like structure (Figure 9.1B): when a chromosome carrying a duplication pairs with its normal homolog, the extra segment loops out. This is called a duplication loop. Duplications increase the number of genes in the genome and thus increase genetic redundancy. They are seldom lethal, even in the homozygous state, and so have a high degree of survival. Visible evidence of different DNA duplications in phylogeny has been obtained from the study of giant (polytene) chromosomes of various dipteran species. Homology of the repeat segments is manifested not only by the similarity of the bands but also by their ability to pair with each other in polytene nuclei.
Duplications may be found contiguous to each other in different parts of the chromosomes. One effect of a duplication may be a visible change in phenotype. For example, duplication of the 16A segment of the X chromosome of D. melanogaster reduces eye size. Flies homozygous for the wild-type allele (B+/B+) have 779 facets in one compound eye, those heterozygous for the Bar allele (B+/B) have 358, and those homozygous for the Bar allele (B/B) have 68 facets. Ultrabar females have three 16A segments on one X chromosome and only one 16A segment on the other (BU/B+), but the Bar effect is more pronounced: the eye is further reduced and has only 45 facets. Homozygous Bar and Ultrabar females have the same number of 16A segments (four) but differ in the position of these segments. This difference in position produces a phenotypic effect, which is known as position effect (Lewis 1950). The genotypes of wild-type and various Bar variant females, with the number of 16A segments on each X chromosome, are represented in Figure 9.2. Duplications thus provided the first conclusive proof of position effect. Some duplications may not have any visible effect on the phenotype. For example, Clarkia unguiculata may have a whole chromosome present three times, rather than the normal two, without exhibiting a detectable change in its phenotype or vigor.

Phenotype         Genotype   Number of 16A segments (first X + second X)
Wild-type         B+/B+      1 + 1
Heterobar         B+/B       1 + 2
Homozygous Bar    B/B        2 + 2
Ultrabar          BU/B+      3 + 1

Figure 9.2 Genotypes of wild-type and various Bar variant females of Drosophila melanogaster

Duplications are very important changes from the evolutionary point of view. In the evolution of new genes, the first step is duplication, followed by knocking down of the existing function through a nonsense mutation (giving a pseudogene) and accumulation of more mutations until the gene acquires a new function. Thus duplication provides additional genetic material potentially capable of giving rise to new genes during the process of evolution. Fragile X is the most common inherited form of mental retardation. The X chromosome of some people is unusually fragile at one tip and is seen "hanging by a thread" under a microscope. Most people have 29 repeats at this end of their X chromosome; those with Fragile X have over 700 repeats due to duplications.

Inversion. An inversion may involve a portion of a gene or a group of genes that has been cut out, turned through an angle of 180° and reattached in reverse order (Figure 9.3A-B). In an inversion there is no net loss or gain of genetic material, and thus the heterozygote is perfectly viable. A heterozygous inversion can be detected as an inversion loop. For the genes present in the inverted chromosome, the linkage relationship is changed. An inversion may involve only one chromosomal arm and not include the centromere (paracentric inversion) (A), or it may involve both chromosomal arms and thus include the centromere (pericentric inversion) (B). An intragenic inversion will affect the amino acid sequence of the polypeptide encoded by that gene; the phenotype may be altered or the function may be lost. Inversions have some genetic effects. In a heterozygote for a paracentric inversion, crossing-over within the inversion loop produces dicentric and acentric chromosomes. Gametes carrying the crossover chromatids generally do not survive, so recombinant progeny are not produced and crossovers are not detected. In a heterozygote for a pericentric inversion also, crossover products are not recovered. That is why inversions are frequently used as suppressors of crossing-over in classical genetics experiments.

Translocation.
Translocations may involve transfer of a part of a chromosome to another position on the same chromosome (intercalary translocation) (Figure 9.4A) or to a non-homologous chromosome (simple translocation) (Figure 9.4B), or two non-homologous chromosomes may mutually exchange their parts, in which case two simple translocations are, in reality, achieved simultaneously (reciprocal translocation) (Figure 9.4C). Simple translocation is a one-sided translocation in which a separated chromosome segment gets attached to the end of another chromosome. It is rare because a chromosome end does not commonly fuse with any other chromosome segment. Reciprocal translocation involves mutual exchange of segments between two non-homologous chromosomes. If a break is within a gene, the base sequence of the gene will be changed and its function may be lost. Translocations in homozygous form change linkage relationships. They may drastically alter the size of a chromosome as well as the position of its centromere. Translocation is used as an important tool in gene transfers and in the production of duplications and deficiencies. The Philadelphia chromosome, responsible for chronic myeloid leukemia in man, is due to translocation of a part of the long arm of chromosome 22 to chromosome 9. The same translocation is also found in some cases of acute leukemia.

Figure 9.3 (A) Paracentric and (B) pericentric inversion loops. Pairing between only one normal and one inverted chromosome is shown

Figure 9.4 Different types of translocations: (A) intercalary translocation, (B) simple translocation, and (C) reciprocal translocation

Interactions between ends from different double-strand breaks (DSBs) can produce tumorigenic chromosome translocations. Two theories for the juxtaposition of DSBs in translocations, the static "contact-first" theory and the dynamic "breakage-first" theory, differ fundamentally in their requirement for DSB mobility. To determine whether or not DSB-containing chromosome domains are mobile and can interact, Aten et al. (2004) introduced linear tracks of DSBs in nuclei. They observed changes in track morphology within minutes after DSB introduction, indicating movement of the domains. Juxtaposition of different DSB-containing chromosome domains through clustering, which was most extensive in G1 phase cells, suggested an adhesion process in which they implicated Mre11. These results supported the "breakage-first" theory of the origin of chromosomal translocations.

Translocations, inversions and deletions produce semi-sterility by generating unbalanced meiotic products that may themselves be lethal or that may result in lethal zygotes. Various genetic diseases in humans have been identified as being due to chromosomal rearrangements. Structural chromosomal rearrangements can alter chromosome organization and thereby affect gene function. They can activate gene expression. They can create novel fusion genes. They can also affect chromosome segregation during meiosis (e.g., through non-disjunction) and cause semi-sterility. Some types of rearrangements in meiosis may be of evolutionary significance.
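The four classes of structural change described in this section can be illustrated on toy chromosomes written as strings of gene letters. This is only a sketch under stated assumptions: a "." marks the centromere, positions are 0-based string indices with the end index excluded, and all function names are hypothetical helpers rather than an established tool.

```python
def deletion(chromosome, start, end):
    """Lose the segment chromosome[start:end]."""
    return chromosome[:start] + chromosome[end:]


def tandem_duplication(chromosome, start, end):
    """Repeat the segment chromosome[start:end] immediately after itself."""
    return chromosome[:end] + chromosome[start:end] + chromosome[end:]


def inversion(chromosome, start, end):
    """Reverse the order of the segment chromosome[start:end]."""
    return chromosome[:start] + chromosome[start:end][::-1] + chromosome[end:]


def inversion_type(chromosome, start, end, centromere="."):
    """Pericentric if the inverted segment includes the centromere, else paracentric."""
    return "pericentric" if centromere in chromosome[start:end] else "paracentric"


def reciprocal_translocation(first, second, break_first, break_second):
    """Exchange the distal segments of two non-homologous chromosomes."""
    return (first[:break_first] + second[break_second:],
            second[:break_second] + first[break_first:])


normal = "AB.CDEFG"
other = "RS.TUVW"
print("deletion of DE:       ", deletion(normal, 4, 6))
print("duplication of DE:    ", tandem_duplication(normal, 4, 6))
print("inversion of DEF:     ", inversion(normal, 4, 7), inversion_type(normal, 4, 7))
print("inversion of B.CD:    ", inversion(normal, 1, 5), inversion_type(normal, 1, 5))
print("reciprocal exchange:  ", reciprocal_translocation(normal, other, 5, 4))
```

In this toy representation an inversion leaves the set of genes unchanged and alters only their order, whereas a reciprocal translocation moves genes into a different linkage group without loss or gain of material.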

Based on Changes in Chromosome Number

Monoploid. A monoploid is defined as a haploid individual produced from a diploid species and containing only the basic (x) number of chromosomes, such that each kind of chromosome is represented only once in the nucleus. Monoploids usually result from mutations. In wasps, bees and ants, males are monoploids. Monoploids may be derived from the products of meiosis. Monoploids are generally sterile, but by doubling the chromosomes, homozygous diploid fertile plants are produced. Monoploids can be produced by interspecific hybridization followed by chromosome elimination. There are two routes by which monoploids can be induced artificially. One route is based on the male gamete (microspore) and the other on the female gamete (megaspore). Potentially, the first route, via anther or microspore culture, has the advantage because there are far more potential monoploids per spike in the form of male gametophytes than female gametophytes. Both methods are based on embryogenesis and the development of plants from monoploid embryos, followed by chromosome doubling to obtain homozygous diploids. The advantage of monoploids as tools in plant breeding or genetics becomes more apparent when their direct applications are considered. They provide the quickest possible way to complete homozygosis. They may serve to recover recessives. Linkage data can be obtained directly by sampling gametes as monoploids. Doubled monoploids give an immediate product of stable recombinants from species crosses. Monoploids can be used to determine homology within a genome and between genomes. They are ideal for the study of mutation frequencies and spectra. They provide an ideal system for fundamental cell biological problems (e.g., biosynthesis) in cell and protoplast culture. Monoploid cells as protoplasts provide unique material for gene transfer and for the study of host-pathogen reactions and cytoplasmic and/or chromosomal incompatibility.
For breeding purposes, one of the main advantages of using monoploids is that completely homozygous lines are produced directly from gametes of F1 hybrids or from later (advanced) selections. This allows direct fixation of quantitative characters. For practical plant breeders, this saves time and the desired product is readily recognizable. Monoploid protoplasts are a powerful tool in plant modification and somatic hybridization.

Haploid. A haploid is a sporophyte with the gametophytic chromosome number (n). Haploids exist with a single set of chromosomes during almost the entire part of their life, and thus monoploids are distinguishable from true haploids. Although haploids can be produced following delayed pollination, irradiation of pollen, temperature shocks, colchicine treatment and distant hybridization, the most important methods currently utilized in biotechnology programs are anther or pollen culture, ovule culture, and chromosome elimination following interspecific hybridization (bulbosum technique). Haplodiploids are produced from haploids. Haplodiploidy produces individuals that are homozygous at all loci. The methods used for utilizing haploids in plant breeding depend on the inheritance pattern observed in a crop, particularly when the haploids are derived from a polyploid crop. For instance, allopolyploids, particularly those that are self-pollinated (e.g., tobacco), exhibit disomic inheritance, while autopolyploids, particularly those that are cross-pollinated (e.g., potato), exhibit polysomic inheritance.

Haploid number (n) and monoploid number (x). The haploid number (n) is the number of chromosomes in a gamete. A somatic cell has twice that many chromosomes (2n). Humans are diploid (2n = 46). Human germ cells (sperm and egg) have one complete set of chromosomes from the male or female parent. Germ cells, also called gametes, combine to produce somatic cells. Somatic cells therefore have twice as many chromosomes. Many organisms, such as bread wheat, have more than two sets of homologous chromosomes and are called polyploid. The number of chromosomes in a single (non-homologous) set is called the monoploid number (x), and it is different from the haploid number (n). The symbols 'n' and 'x' apply to every cell of a given organism. For humans, x = n = 23, which is also written as 2n = 2x = 46. Bread wheat is an organism in which x and n differ. It has six sets of chromosomes, two sets from each of three different diploid species that are its distant ancestors. The somatic cells are hexaploid, with six sets of chromosomes, 2n = 6x = 42. The gametes are both haploid and triploid, with three sets of chromosomes. The monoploid number is x = 7 and the haploid number is n = 21.

Changes in chromosome number may involve an incomplete set of chromosomes (aneuploids) or a complete set of chromosomes (euploids). Aneuploids and euploids are discussed here.

Aneuploids. In a disomic (2n), aneuploidy may involve deletion of one chromosome (2n-1, monosomic), deletion of two non-homologous chromosomes (2n-1-1, double monosomic), or deletion of a pair of homologous chromosomes (2n-2, nullisomic). Aneuploids may also arise through the addition of one chromosome (2n+1, trisomic), addition of two non-homologous chromosomes (2n+1+1, double trisomic), or addition of a pair of homologous chromosomes (2n+2, tetrasomic). Non-disjunction (failure of homologous chromosomes to separate during meiosis) is the cause of origin of aneuploids. Trisomics play an important role in gene location programs, and nullisomics and monosomics are useful in locating newly found genes on specific chromosomes in plants. Aneuploidy is the cause of various genetic disorders in humans (Table 9.2).
Table 9.2 Human disorders due to aneuploidy (Extracted from http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/G/GenomeSizes.html)

Name of the syndrome       Chromosome complement constitution
Down's syndrome            21 extra
Klinefelter's syndrome     X extra
Klinefelter's syndrome     XY extra
Turner's syndrome          Only one X present
Patau's syndrome           13 extra
Edward's syndrome          18 extra
Born criminal              Y extra in male
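The chromosome-number arithmetic and the aneuploid notation described above can be worked through in a few lines. The following is a minimal sketch: the function names `monoploid_and_haploid` and `aneuploid_count` and the dictionary `ANEUPLOID_CHANGE` are hypothetical, and the only inputs assumed are the somatic chromosome number (2n) and the number of chromosome sets.

```python
def monoploid_and_haploid(somatic_number, sets):
    """Return (x, n) given the somatic chromosome number 2n and the number of sets."""
    if somatic_number % sets or somatic_number % 2:
        raise ValueError("somatic number must be divisible by 2 and by the number of sets")
    x = somatic_number // sets   # chromosomes in one basic set
    n = somatic_number // 2      # chromosomes in a gamete
    return x, n


# Net change in chromosome count relative to the disomic (2n) condition.
ANEUPLOID_CHANGE = {
    "monosomic": -1,          # 2n-1
    "double monosomic": -2,   # 2n-1-1
    "nullisomic": -2,         # 2n-2
    "trisomic": +1,           # 2n+1
    "double trisomic": +2,    # 2n+1+1
    "tetrasomic": +2,         # 2n+2
}


def aneuploid_count(somatic_number, kind):
    """Chromosome count of an aneuploid of the given kind."""
    return somatic_number + ANEUPLOID_CHANGE[kind]


print("human (2n = 2x = 46):       x, n =", monoploid_and_haploid(46, 2))
print("bread wheat (2n = 6x = 42): x, n =", monoploid_and_haploid(42, 6))
print("human trisomic count:", aneuploid_count(46, "trisomic"))
print("human monosomic count:", aneuploid_count(46, "monosomic"))
```

Note that nullisomics and double monosomics have the same chromosome count (2n-2) but differ in whether the two missing chromosomes are homologous, which a simple count cannot capture.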

Production of aneuploids. Although mutations in cell cycle regulators or spindle proteins can perturb chromosome segregation, it is generally assumed that non-disjunction of a chromosome during mitosis will yield two aneuploid daughter cells. Shi and King (2005) showed that chromosome non-disjunction is tightly coupled to the regulation of cytokinesis in human cell lines, such that non-disjunction results in the formation of tetraploid rather than aneuploid cells. They observed that spontaneously arising binucleated cells exhibited chromosome mis-segregation rates up to 166-fold higher than the overall mitotic population. Most binucleated cells arose through a bipolar mitosis followed by regression of the cleavage furrow hours later. Non-disjunction occurred with high frequency in the cells that became binucleated by furrow regression, but not in cells that completed cytokinesis to form two mononucleated cells. These findings indicated that non-disjunction did not directly yield aneuploid cells, but rather produced tetraploid cells that might subsequently become aneuploid through further division. The coupling of spontaneous segregation errors to furrow regression provides a potential explanation for the prevalence of hyperdiploid chromosome number and centrosome amplification observed in many cancers. The model of Shi and King (2005), summarizing the relationship of chromosome mis-segregation to the regulation of cytokinesis and the subsequent possible fates of the resulting binucleated cells, is given in Figure 9.5. Normal segregation (left) is associated with completion of cytokinesis, producing two diploid mononucleated daughter cells. Chromosome non-disjunction (right) is associated with furrow regression, producing a binucleated tetraploid cell. If this cell divides, it can proceed through bipolar mitosis to produce two mononucleated tetraploid cells with equivalent genomes. However, if a multipolar spindle is formed, aneuploid progeny are likely to be produced because of the high rates of chromosome mis-segregation resulting from multipolar mitosis.

Figure 9.5 Model summarizing the relationship of chromosome mis-segregation in the regulation of cytokinesis, and subsequent possible fates of resulting binucleated cells

Euploids. Euploids may contain more than two complete sets of chromosomes. Individuals having three, four, five, six, ... sets of chromosomes are called triploids, tetraploids, pentaploids, hexaploids, and so on. In animals, polyploids are restricted to groups that reproduce asexually or are hermaphroditic (an individual with both male and female reproductive organs) or parthenogenetic (an individual developed from an unfertilized egg). Examples are earthworms, some types of shrimps, and parthenogenetic species of insects and lizards. Polyploids that are formed from sets of chromosomes of one single species are known as autopolyploids, and those that are formed from sets of chromosomes of different species are called allopolyploids or amphidiploids. Alfalfa, coffee, potato and peanut are autotetraploid plants. Examples of allopolyploids are cabbage and wheat (Triticum aestivum). Triploids are characteristically sterile. A practical application of the sterility associated with triploidy is the production of seedless varieties of watermelons, bananas, etc. These varieties are more palatable to consumers. Allopolyploidy seems to have been an important force in the speciation of plants.
An alkaloid, colchicine, is used to produce polyploids, as it acts by dissolving the spindle apparatus, thereby doubling the number of chromosomes in the cell.

Based on Cause of Mutations

Spontaneous mutations. These are mutations whose cause is not known. They are found in natural or laboratory populations under normal growth conditions. They may originate from errors during DNA replication, environmental mutagens, tautomerism or transposable genetic elements.

Induced mutations. These are mutations whose cause is known. They can be obtained by treatment with physical or chemical mutagens. Induced mutations are produced under changed growth conditions.

Based on Type of Damage

Based on the type of damage, mutations can be divided into the following types:


Incorrect base. An incorrect base in one strand that cannot form hydrogen bonds with the corresponding base in the other strand can result from a replication error that by chance is not corrected by the editing/proofreading function of DNA polymerases.

Missing bases. The glycosidic bond of a purine nucleotide is spontaneously broken at physiological temperatures, though at a very low rate. This process is called depurination because the purine is lost from the DNA.

Altered bases. Bases can be changed into strikingly different compounds by a variety of chemical and physical agents. For instance, ionizing radiation can break purine and pyrimidine rings and can cause several types of chemical alterations. The most frequent substitutions are made in thymine. Free radicals produced in many metabolic reactions can cause a variety of significant changes. The best-studied altered base is the intrastrand dimer formed by two adjacent pyrimidines as a result of ultraviolet radiation.

Single-strand breaks. A variety of agents can break phosphodiester bonds. Among the more common chemicals are peroxides, sulfhydryl-containing compounds and metal ions such as Fe2+ and Cu2+. Ionizing radiation also produces strand breaks.

Double-strand breaks. Double-strand breaks can also lead to DNA damage. If a DNA molecule receives a sufficiently large number of randomly located single-strand breaks, two breaks may be situated opposite each other, resulting in breakage of the double helix. Double-strand breaks can also result from a single event, such as exposure to highly ionizing radiation. Reactive oxygen species, ionizing radiation and chemicals that generate reactive oxygen species produce double-strand breaks. Double-strand breaks are a normal result of V-D-J recombination in mammalian immunoglobulin genes but occur unnaturally as a result of replication fork arrest and collapse.

Covalent cross-linking of DNA bases. One way in which DNA becomes damaged is by the covalent cross-linking of DNA bases; cross-links can be formed between bases on the same strand of the helix (intrastrand cross-linking) or on opposite strands of the helix (interstrand cross-linking). Some antibiotics (for example, mitomycin C) and some reagents can form covalent linkages between a base in one strand and an opposite base in the complementary DNA strand. This prevents strand separation during DNA replication and also causes a local distortion of the helix. Covalent intrastrand cross-links can be formed by chemical agents, such as cisplatin, or by ultraviolet light, which leads to the formation of a pyrimidine dimer. Such a pyrimidine dimer cannot fit into the double helix, so the normal functions of the cell (such as replication and transcription) are blocked until the dimer is removed.

Blocked DNA replication. A mistake in DNA replication, in which an incorrect nucleotide is incorporated, will lead to a mutation in the next round of replication of the strand carrying the incorrect nucleotide. If DNA replication is not completed, it can lead to aberrant chromosome rearrangements or cell death. DNA polymerase is prone to make mistakes, particularly in homopolymeric runs, where slippage over sequence repeats during DNA replication causes spontaneous insertion and deletion mutations. Slippage of the newly synthesized strand is more likely, and slippage of the template strand is less likely. The frequency at which DNA polymerase makes mistakes influences the spontaneous mutation frequency, and different polymerases have been observed to vary in their accuracy.

Translesion replication. Translesion replication refers to DNA replication past the point of a lesion. Translesion replication can be mutagenic; if it occurs in an important region of the genome, it can alter gene function or genome stability.

Misreading by RNA polymerase. Misreading by RNA polymerase can introduce mutations in mRNA. It can block transcription and thus cause damage.

Based on Direction of Mutation

Considering the direction of mutation as a criterion, mutations may be classified as forward or backward (reverse) mutations.

Forward mutation. Any mutation away from the standard or wild-type is defined as a forward mutation. For example, y+ → y or R → r.

Reverse or back mutation. Any mutation towards the standard or wild-type is known as a reverse or back mutation. For example, y → y+ or r → R. A back mutation or reversion is a point mutation that restores the original sequence and hence the original phenotype (Ellis et al. 2001). Reverse mutations are classified further as exact reversions and equivalent reversions.

Exact reversion. In this case the same codon is reconstituted. This is shown by the two examples given below:

DNA    RNA    Amino acid    Phenotype
AAA    UUU    Phe           Wild-type
 ↓ forward mutation
GAA    CUU    Leu           Mutant
 ↓ reverse mutation
AAA    UUU    Phe           Wild-type

DNA    RNA    Amino acid    Phenotype
TTT    AAA    Lys           Wild-type
 ↓ forward mutation
CTT    GAA    Glu           Mutant
 ↓ reverse mutation
TTT    AAA    Lys           Wild-type

Equivalent reversion. Equivalent reversions are of two types. In one type of equivalent reversion, a different codon coding for the same amino acid is reconstituted. In the example given here, the reverse mutation produces a codon synonymous to the one that codes for the amino acid in the wild-type.

DNA    RNA    Amino acid    Phenotype
AGG    UCC    Ser           Wild-type
 ↓ forward mutation
ACG    UGC    Cys           Mutant
 ↓ reverse mutation
TCG    AGC    Ser           Wild-type

In the second type of equivalent reversion, a different codon coding for a similar amino acid is reconstituted. In the example given below, the reverse mutation produces a missense codon, but the amino acid substituted has a similar charge (e.g., Arginine → Histidine; both amino acids are positively charged) and this may not considerably affect the wild-type activity of the protein.

DNA    RNA    Amino acid    Phenotype
GCG    CGC    Arg           Wild-type
 ↓ forward mutation
GGG    CCC    Pro           Mutant
 ↓ reverse mutation
GTG    CAC    His           Wild-type
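The distinction between exact and equivalent reversions can be checked mechanically. The following is a minimal sketch, not a method from the text: it assumes, as the tables above do, that each DNA triplet is read position by position into its complementary RNA codon, and it treats "similar amino acid" narrowly as "same charge class". The function names and the charge sets are illustrative choices.

```python
from itertools import product

# Standard genetic code; codons ordered U, C, A, G at each position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# DNA base -> complementary RNA base, applied position by position
# (the convention used in the tables above; an assumption of this sketch).
PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Illustrative charge classes (one-letter amino acid codes).
POSITIVE = {"K", "R", "H"}
NEGATIVE = {"D", "E"}


def transcribe(dna_triplet):
    """Return the RNA codon complementary to a DNA triplet."""
    return "".join(PAIR[base] for base in dna_triplet)


def amino_acid(dna_triplet):
    """Return the one-letter amino acid encoded by a DNA triplet."""
    return CODON_TABLE[transcribe(dna_triplet)]


def same_charge(aa1, aa2):
    """True if both amino acids are positive or both are negative."""
    return ({aa1, aa2} <= POSITIVE) or ({aa1, aa2} <= NEGATIVE)


def classify_reversion(wild_dna, revertant_dna):
    """Classify a revertant triplet relative to the wild-type triplet."""
    if revertant_dna == wild_dna:
        return "exact reversion"
    wild_aa, rev_aa = amino_acid(wild_dna), amino_acid(revertant_dna)
    if wild_aa == rev_aa:
        return "equivalent reversion (synonymous codon)"
    if same_charge(wild_aa, rev_aa):
        return "equivalent reversion (similar amino acid)"
    return "not a reversion"
```

With the triplets from the tables, `classify_reversion("AAA", "AAA")` is an exact reversion, `classify_reversion("AGG", "TCG")` is the synonymous-codon case, and `classify_reversion("GCG", "GTG")` is the similar-amino-acid case.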

Based on Change in the Sense of Codon

Samesense mutation. A mutation which changes a codon into another codon that codes for the same amino acid as in the wild-type is known as a samesense mutation. For example,

DNA                 RNA                 Amino acid
AGA                 UCU                 Ser
 ↓ Mutation
AGG, AGC or AGT     UCC, UCG or UCA     Ser

In some cases, base pair substitutions generate a different codon for the same amino acid, with no biological effect whatsoever. This is most likely to happen in the third position (wobble base) of redundant codons for the same amino acid. Such changes are considered to be mutations because they alter the DNA sequence. However, because they have no phenotypic effect, even at the level of protein amino acid sequence, they are called silent or neutral mutations. The new codon codes for the same amino acid as the original codon. These are also known as samesense mutations.

Missense mutation. This is a point mutation that changes the meaning of a codon such that the new codon codes for a different amino acid. An example is given below.

DNA           RNA           Amino acid
TCA           AGU           Ser
 ↓ Mutation
TCT or TCC    AGA or AGG    Arg

In a missense mutation, a base pair change in the DNA causes a change in an mRNA codon, with the result that a different amino acid is inserted into the polypeptide in place of the one specified by the wild-type codon. Depending on the nature of the amino acid substitution and its location within the protein, missense mutations may have a variety of effects, ranging from complete loss of biological activity to reduced activity, temperature-sensitive activity or no functional effect at all. For example, sickle-cell anemia was identified by Ingram (1957) as being caused by a missense mutation resulting in a single amino acid substitution in the β-globin subunit of the hemoglobin tetramer (2α + 2β subunits). A transversion causes the codon GAG to be changed to GUG in messenger RNA (GTG in the DNA) (Figure 9.6). This replaces a glutamic acid with a valine as the sixth amino acid (counting from the N-terminus) in the mature β-globin molecule. That substitution causes the hemoglobin to precipitate into fibrous aggregates that distort the shapes of red blood cells under low-oxygen conditions, resulting in blockage of capillary circulation and breakage of the red blood cells.

Codon No.    Normal betaA gene    Normal betaA chain    Mutant betaA gene    Mutant betaA chain
4            ACT                  Thr                   ACT                  Thr
5            CCT                  Pro                   CCT                  Pro
6            GAG                  Glu                   GTG                  Val
7            GAG                  Glu                   GAG                  Glu

Figure 9.6 Gene mutation in hemoglobin betaA chain (http://users.rcn.com/jkimball.ma.ultranet/Biology Pages/ M/Mutations.html)

There is another class of silent mutations also. The new codon codes for a different amino acid that is chemically equivalent to the original. Although it is a missense mutation, it does not affect the function of the protein.

Nonsense mutation. When a point mutation creates one of the three nonsense codons (UAA, UGA or UAG), no amino acid is incorporated in the growing polypeptide chain against that mutant codon, and synthesis of the polypeptide chain is stopped. A nonsense mutation is also called a termination mutation.

DNA           RNA           Amino acid
ATG           UAC           Tyr
 ↓ Mutation
ATT or ATC    UAA or UAG    Ochre or Amber (termination codon)
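The three classes described in this section (samesense, missense and nonsense) follow directly from comparing the amino acids specified by the wild-type and mutant mRNA codons. The sketch below illustrates this with the standard genetic code; the function name `classify_point_mutation` is hypothetical and the sketch ignores mutations that start from a stop codon.

```python
from itertools import product

# Standard genetic code; "*" marks the three termination codons.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_point_mutation(wild_codon, mutant_codon):
    """Classify a codon change by its effect on the sense of the codon."""
    wild_aa = CODON_TABLE[wild_codon]
    mutant_aa = CODON_TABLE[mutant_codon]
    if mutant_aa == "*":
        return "nonsense"      # a termination codon has been created
    if mutant_aa == wild_aa:
        return "samesense"     # synonymous codon, same amino acid
    return "missense"          # a different amino acid is specified
```

Applied to the examples in the text, UCU → UCC is samesense, AGU → AGA is missense (Ser → Arg), UAC → UAA is nonsense, and the sickle-cell change GAG → GUG is missense (Glu → Val).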

In some cases, the effects of nonsense mutations can be suppressed by modified tRNA molecules that insert an amino acid with a low efficiency when a stop codon is encountered.

A small deletion can also disrupt protein function. For example, cystic fibrosis is caused by a faulty protein that in wild-type form controls chloride ion transfer across cell membranes in secretory cells. The mutation results in deletion of phenylalanine from a chloride-permeable channel membrane protein. Deletion of the phenylalanine disrupts normal chloride channel function. The resulting imbalance of chloride and sodium ions results in abnormally dehydrated mucus, which causes a multitude of physical problems, from sterility to respiratory failure.

Splice-site mutations. Most eukaryotic genes are split genes. The transcript of such genes is a precursor of messenger RNA (pre-mRNA) in which noncoding sequences (introns) are present. During processing of pre-mRNA these introns are removed to form mRNA. The removal of intron sequences must be done by the cellular machinery with great precision. Nucleotide signals at the splice sites guide the enzymatic machinery. If a mutation alters one of these signals, then the intron is not removed and remains as part of the final mRNA molecule. The translation of this mRNA sequence alters the amino acid sequence of the protein product.

Indels. Extra base pairs may be added (insertions) or removed (deletions) from the DNA of a gene. The number can range from one to thousands. Collectively, these mutations are called indels. Indels involving one or two base pairs (or any number that is not a multiple of three) can have devastating consequences to the gene because translation of the gene is "frameshifted". Figure 9.7 shows how, by shifting the reading frame one nucleotide to the right, the same sequence of nucleotides encodes a different sequence of amino acids. The mRNA is translated in new groups of three nucleotides, and the protein specified by these new codons will probably not be useful to the cell. Frame shifts often create new stop codons and thus generate nonsense mutations. Indels of three nucleotides or multiples of three may be less serious because they preserve the reading frame. However, a number of inherited human disorders are caused by the insertion of many copies of the same triplet of nucleotides. Fragile X syndrome and Huntington's disease are examples of such trinucleotide repeat diseases.

Figure 9.7 Shifting the reading frame one nucleotide to the right, the same sequence of nucleotides encodes a different sequence of amino acids (Redrawn from http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/M/Mutations.html)
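The effect of an indel on the reading frame can be illustrated with a short translation routine. This is only a sketch: the mRNA sequence is invented for the illustration, and `translate` is a hypothetical helper that reads from the first base, stops at the first termination codon and ignores an incomplete final codon.

```python
from itertools import product

# Standard genetic code; "*" marks termination codons.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna):
    """Translate an mRNA from its first base until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Invented example sequence: AUG CAU CAU CAU CAU UAA
wild = "AUGCAUCAUCAUCAUUAA"
plus_one = wild[:3] + "G" + wild[3:]      # insertion of one base: frameshift
plus_three = wild[:3] + "GGG" + wild[3:]  # insertion of three bases: frame kept

wild_protein = translate(wild)              # "MHHHH"
frameshift_protein = translate(plus_one)    # "MASSSL"
in_frame_protein = translate(plus_three)    # "MGHHHH"
```

The single-base insertion changes every amino acid downstream of the insertion and removes the original stop codon from the frame, whereas the three-base insertion adds one amino acid and leaves the rest of the polypeptide unchanged.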

Based on Types of Suppressor Mutations

Suppressor mutations suppress the expression of a mutant gene, i.e., they wholly or partly restore the wild-type phenotype altered by an earlier mutation. Suppressor mutations may be intragenic or extragenic.

Intragenic suppressor mutations. Intragenic suppressors may be due to frameshift mutations of opposite type at a second site or to a second-site missense mutation.

Frameshift of opposite type at a second site. Consider the following example, where the symbol "+" indicates an addition, "–" indicates a deletion, "○" indicates no change and "×" indicates a changed codon.

Wild-type         CAT CAT CAT CAT CAT CAT
                     (X)↓(+)  (T)↓(–)
Double mutant     CAT XCA TCA CAT CAT CAT
Codon status       ○   ×   ×   ○   ○   ○
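The CAT example can be reproduced step by step. In the sketch below the helper names and the exact positions of the insertion and deletion are assumptions chosen so that the result matches the sequence shown above; the symbols ○ and × are used as in the text.

```python
def codons(seq):
    """Split a sequence into complete triplets, dropping any leftover bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def insert_base(seq, pos, base):
    """Return seq with base inserted before index pos (a + frameshift)."""
    return seq[:pos] + base + seq[pos:]


def delete_base(seq, pos):
    """Return seq with the base at index pos removed (a - frameshift)."""
    return seq[:pos] + seq[pos + 1:]


def codon_status(wild, mutant):
    """Mark each codon as unchanged (○) or changed (×) relative to wild-type."""
    return ["○" if w == m else "×" for w, m in zip(codons(wild), codons(mutant))]


wild = "CAT" * 6
plus = insert_base(wild, 3, "X")   # addition after the first codon
double = delete_base(plus, 9)      # deletion of a T at a second site

status_single = codon_status(wild, plus)     # ○ × × × × ×
status_double = codon_status(wild, double)   # ○ × × ○ ○ ○
```

The single addition alters every codon downstream of it, while the second-site deletion restores the reading frame so that only the two codons between the sites remain changed.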

Second site missense mutations. A second missense mutation within the same gene can act as a suppressor of the mutant phenotype. Study the example given below:

                                       Site a1    Site a2
Single-site missense mutant a1            ×
Single-site missense mutant a2                       ×
Double-site missense mutant a1-a2         ×          ×

This type of suppressor mutation is very common, but it is difficult to explain how these mutations restore normal function of the protein.

Extragenic suppressors. In this case, mutations in a tRNA gene cause suppression of a mutation in a protein-coding gene. These may be further classified as nonsense, missense or frameshift suppressors.

Nonsense suppressor. A tRNA gene undergoes a mutation in its anticodon region that enables the tRNA to recognize and align with a mutant nonsense codon and permit completion of translation. For example, a mutant tyrosine tRNA reads UAG (a termination codon) as if it were UAU (which codes for tyrosine).

Missense suppressor. In this case also, a tRNA gene undergoes a mutation in its anticodon region such that the anticodon becomes complementary to some other codon in mRNA. For example, UCC (serine) → UGC (cysteine): the suppressor tRNA has an anticodon for the cysteine codon but carries serine.

Frameshift suppressor. Due to addition of a base in the anticodon region, a four-nucleotide anticodon is generated in a tRNA, which suppresses the effect of a frameshift mutation caused by addition of one base pair in a gene.

Based on Type of Tissue in which Mutation is Induced Somatic mutations. These mutations may lead to a sector or a clone of mutant cells that can be recognized in the background of normal tissue. The timing of a somatic mutation during development will determine the size of the sector. There is no chance of transmission of somatic mutations to the next generation, unless vegetative means of propagation are available or vegetative cells are destined to become sex cells. Germinal mutations. These mutations occur in those tissues that ultimately form sex cells. Mutations induced in the cells of these tissues may be passed on to the next generation.

Based on Expression or Effect on Function

Morphological mutations. These mutations affect the visible characters or traits of an organism, e.g., size, shape, color, etc. of plant or plant part, bacterial or yeast colony.

Conditional mutations. In this case, a mutant allele expresses the mutant phenotype under a certain condition, but expresses normal phenotype under another condition, e.g., some of the male-sterile mutants in maize are male-sterile in one season (Summer) and are fertile in the other (Winter). Some leaf rust resistance (Lr) genes in wheat express at 15°C, some at 25°C while others do so at both the temperatures. Choi et al. (2009) present an excellent review on conditional mutations in Drosophila.

Biochemical mutations. These mutations are identified by a loss or change of some biochemical function of the cell or by change of isoenzyme pattern in plants. Certain induced auxotrophic mutants
of Neurospora crassa or E. coli cannot grow on minimal medium unless supplemented with some nutrient. Resistance mutations. These mutations enable a cell or an organism to grow in the presence of some antibiotic, herbicides, insecticides or pesticides or a pathogen to which wild-types are susceptible. Polar mutations. In E. coli, genes of an operon are not expressed beyond the point of this mutation. Insertion of a transposon within first gene of the operon leads to polar effect. Loss-of-function mutations. Loss-of-function mutations result into gene products having less or no function. When the allele has a complete loss of function (null allele) it is often called an amorphic mutation. Phenotypes associated with such mutations are most often recessive. Exceptions are when the organism is haploid, or when the reduced dosage of a normal gene product is not enough for a normal phenotype (this is called haploinsufficiency). Gain-of-function mutations. Gain-of-function mutations change the gene product such that it gains a new function. These mutations usually have dominant phenotypes and are often called neomorphic mutations. Dominant negative mutations. Dominant negative mutations (also called antimorphic mutations) have an altered gene product that acts antagonistically to the wild-type allele. These mutations usually result in an altered molecular function (often inactive) and are characterized by a dominant or semidominant phenotype. In humans, Marfan syndrome is an example of a dominant negative mutation occurring in case of an autosomal dominant disease. In this condition, the defective glycoprotein product of the fibrillin (FBN1) gene antagonizes the product of the normal allele.

Based on Survival of the Organism

Many mutations affect some vital function of the organism, so they have a drastic effect on survival. Depending upon the degree of effect, these are classified as lethal, sublethal, vital and supervital.

Lethal mutations. Lethal mutations cause death of the organism before it reaches the adult stage. These mutations can be recessive, dominant, conditional, balanced or gametic lethals.

Recessive lethals. Most lethal mutations are recessive, as their lethal effect is expressed only when they are in the homozygous or hemizygous state. Survival of heterozygotes is unaffected. Many recessive lethals affect some other character in addition to their effect on survival, e.g., yellow coat color in mice, where yLyL individuals have grey coat color, YLyL individuals have yellow coat color and YLYL individuals die. The Y gene is dominant for color but recessive for lethality.

Dominant lethal. Some mutations reduce viability in the heterozygous state as well. The individuals having such a mutation die even before the adult stage. These mutations are not maintained in the population, e.g., the epiloia mutation in human beings causes abnormal skin growth, severe mental effects and multiple tumors in the heterozygous condition.

Conditional lethal. The lethal mutations that require a specific condition for their expression are termed conditional lethals. Many mutations in Drosophila, Neurospora, barley, maize, etc. are temperature-sensitive since they require a higher temperature to show their lethal effect.

Balanced lethal. In this case, two non-allelic recessive lethal mutations are linked in the repulsion phase, i.e., the recessive allele of one gene and the dominant allele of the other are present on the same chromosome. All the surviving progeny are heterozygous for the lethal genes and homozygotes are not obtained. Balanced lethal mutations are known in Drosophila, mice, Oenothera and many other organisms.

Gametic lethal. Some mutations lead to the inviability of the gametes carrying them.
This disturbs the ratio expected in a segregating generation. This phenomenon is called segregation distortion, e.g., males of Drosophila pseudoobscura having a sex-ratio mutation mostly produce inviable or nonfunctional Y-chromosome-containing sperm. When these males are mated to normal females, almost all the progeny are females. In this case, mostly X-chromosome-containing sperm are functional.

Sub-lethal or semi-lethal mutations. These mutations do not lead to the death of all the individuals that carry them. They cause death of a variable percentage of individuals. More than 90 per cent of the individuals in certain xantha and chlorina mutants in crop plants are semi-lethal. An example of a semi-lethal mutation in man is sickle cell anemia, which is associated with weakness, pain, heart failure and rheumatism.

Vital mutations. These mutations do not affect the survival of the individuals in which they are present. Some mutations, relating to height, shape and size of leaves, fall in this category.

Supervital mutations. Mutations that enhance the survival of the individuals carrying them in the appropriate environment or stress are known as supervital mutations. For example, mutations inducing resistance to diseases and tolerance to salinity, alkalinity, high temperature, low temperature, or drought in crop plants are supervital because they enhance the fitness of the plants in the presence of a particular stress. Similarly, antibiotic-resistant, pest-resistant and high-temperature-resistant mutants in bacteria carry stress-resistance mutations.

Based on Location of Genes

Nuclear mutations. These mutations refer to mutations in the nuclear genes. In organisms having two (a homogametic and a heterogametic) sexes, nuclear mutations may be autosomal, sex (X)-linked, Y-linked (holandric) or X- and Y-linked (pseudoautosomal). Autosomal mutations are located in the chromosomes that are present as a homologous pair in both the sexes. Sex-linked mutations are located in the differential region of a chromosome which is present twice in one sex but only once in the other. For example, the X chromosome in human and Drosophila is present twice in the female but only as one copy in the male. Holandric mutations are present in a gene located in the differential region of a chromosome which is present in the heterogametic sex but not in the homogametic sex. For example, the Y chromosome is present in human and Drosophila males but not in the females. X- and Y-linked mutations are present in the pairing regions of the X and Y chromosomes. Differential and pairing regions of human X and Y chromosomes are shown in Figure 9.8. Single autosomal recessive gene mutations are responsible for certain human disorders such as albinism, Tay-Sachs disease, cystic fibrosis, phenylketonuria, thalassemia and sickle cell anemia. Sex-linked recessive mutations lead to human disorders such as hemophilia, Duchenne muscular dystrophy and Lesch-Nyhan syndrome.

Figure 9.8 Different regions of human X and Y chromosomes (Redrawn from http://www.ucl.ac.uk/~ucbhjow/b250/sex_determination.html)

Cytoplasmic mutations. These mutations refer to mutations in the cytoplasmic DNA. All the F1s will show the characters of the female parent. Chloroplast and mitochondrial mutations fall in this class. These mutations show no linkage with those in nuclear DNA. Results from reciprocal crosses between individuals from wild-type and mutant strains are used to infer location of mutations (Table 9.3).

Table 9.3 Genetic explanation inferred from some reciprocal crosses between wild-type and mutant strains in animal and plant systems

Phenotype of F1 progeny of cross                                                    Genetic explanation of mutant phenotype
Mutant × Wild-type                         Wild-type × Mutant

Animal and Plant Systems
All wild-type                              All wild-type                            Nuclear recessive gene
All mutant                                 All mutant                               Nuclear dominant gene
All mutant                                 All wild-type                            Cytoplasmic gene

Animals (having XY system)
All females and males wild-type            All females and males wild-type          Autosomal recessive gene
All females and males mutant               All females and males mutant             Autosomal dominant gene
All females wild-type, all males mutant    All females and males wild-type          Sex-linked recessive gene
All females and males mutant               All females mutant, all males wild-type  Sex-linked dominant gene
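The inference summarized in Table 9.3 amounts to a lookup on the pair of F1 results from the two reciprocal crosses. The sketch below encodes the table as two dictionaries. It assumes that the female parent is written first in each cross, which the table does not state explicitly, and the names `GENERAL`, `XY_ANIMALS` and `infer_location` are illustrative.

```python
# Keys: (F1 of Mutant x Wild-type, F1 of Wild-type x Mutant).
# Assumption of this sketch: the female parent is written first in each cross.
GENERAL = {
    ("all wild-type", "all wild-type"): "nuclear recessive gene",
    ("all mutant", "all mutant"): "nuclear dominant gene",
    ("all mutant", "all wild-type"): "cytoplasmic gene",
}

# For animals with an XY system each F1 is given as (females, males).
XY_ANIMALS = {
    (("wild-type", "wild-type"), ("wild-type", "wild-type")): "autosomal recessive gene",
    (("mutant", "mutant"), ("mutant", "mutant")): "autosomal dominant gene",
    (("wild-type", "mutant"), ("wild-type", "wild-type")): "sex-linked recessive gene",
    (("mutant", "mutant"), ("mutant", "wild-type")): "sex-linked dominant gene",
}


def infer_location(table, mutant_x_wild, wild_x_mutant):
    """Return the genetic explanation for a pair of reciprocal-cross results."""
    return table.get((mutant_x_wild, wild_x_mutant), "not covered by Table 9.3")
```

For instance, `infer_location(GENERAL, "all mutant", "all wild-type")` gives a cytoplasmic gene, because the F1 resembles the female parent in both crosses. Note that the first cross alone cannot separate an autosomal dominant from a sex-linked dominant gene; the reciprocal cross is needed.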

Based on Type of Amino Acid Substituted

Depending upon the type of amino acid substituted, missense mutations may be classified as conservative or drastic.

Conservative substitutions. In this type of substitution, one amino acid is replaced by another amino acid possessing the same charge (Table 9.4). In all these cases, the net effect on the charge of the protein molecule is nil. Conservative mutations may or may not affect the function of the protein, depending upon the site of replacement. Certain biochemical or kinetic properties, viz., activity, Km, Ki, Vmax, may be affected by conservative mutations. Such substitutions cannot be detected electrophoretically.

Table 9.4 Various conservative mutations (Reproduced, with permission, from http://www.ndsu.edu/pubweb/~mcclean/plsc431/eukarychrom/eukaryo3.htm © Phillip McClean)

Change in amino acid (AA)
From            To              Examples
Acidic AA       Acidic AA       Aspartic acid, Glutamic acid
Basic AA        Basic AA        Lysine, Arginine, Histidine
Neutral AA      Neutral AA      Glycine, Serine, Threonine, Tyrosine, Cysteine, Glutamine
Non-polar AA    Non-polar AA    Alanine, Valine, Leucine, Tryptophan, Methionine, Proline

Drastic substitutions. Drastic mutations are also called radical mutations. In this case, amino acids having dissimilar charge replace each other; chemically reactive amino acids may be replaced by nonreactive amino acids, or amino acids having high molecular weight may be replaced by amino acids having low molecular weight, or vice versa. These mutations result in differences in the mass/charge ratio of the protein and hence can be detected electrophoretically. The effect of such a mutation depends upon the type of replacement or site of substitution, and may vary from 'no' effect to 'full' effect.
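The conservative/drastic distinction can be expressed as a comparison of the groups to which the two amino acids belong. The sketch below uses only the groups and examples listed in Table 9.4, so amino acids not named there are not covered. It also considers group membership alone and ignores the reactivity and molecular-weight criteria mentioned for drastic substitutions. The function names are hypothetical.

```python
# Amino acid groups as listed in Table 9.4 (not a complete list of 20).
GROUPS = {
    "acidic": {"Aspartic acid", "Glutamic acid"},
    "basic": {"Lysine", "Arginine", "Histidine"},
    "neutral": {"Glycine", "Serine", "Threonine", "Tyrosine", "Cysteine", "Glutamine"},
    "non-polar": {"Alanine", "Valine", "Leucine", "Tryptophan", "Methionine", "Proline"},
}


def group_of(amino_acid):
    """Return the Table 9.4 group to which an amino acid belongs."""
    for name, members in GROUPS.items():
        if amino_acid in members:
            return name
    raise KeyError(f"{amino_acid} is not listed in Table 9.4")


def classify_substitution(original, replacement):
    """Conservative if both amino acids fall in the same group, else drastic."""
    if group_of(original) == group_of(replacement):
        return "conservative"
    return "drastic"
```

On this basis Lysine → Arginine is conservative, whereas the sickle-cell substitution Glutamic acid → Valine is drastic, which is consistent with the altered electrophoretic mobility of the mutant hemoglobin.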

Based on Relevance to Evolution

Not all mutations are relevant to the process of evolution, i.e., not all mutations accompany the process of evolution. Accordingly, mutations are classified as forbidden and favored.

Forbidden mutations. These are mutations that affect the function of a gene so drastically that the individual cannot tolerate them, e.g., frameshift mutations. Such mutations do not accompany the process of evolution.

Favored mutations. These are those missense mutations that affect less critical sites of a polypeptide chain such that the basic functional property of the polypeptide does not change. Such mutations may, however, affect some kinetic properties (like Km, Ki, Vmax) of the protein/enzyme, which may be more efficient under one environment or the other. Such mutations may accompany the process of evolution.

Based on Degree of Character Expression The visible mutations are classified as amorphic, hypomorphic, isoallelic and hypermorphic, depending upon degree of expression. Amorphic mutations. These mutations lead to almost a total loss of the expression of a trait as they produce non-functional proteins or enzymes, e.g., white eye in Drosophila. Hypomorphic mutations. These mutations lead to a partial loss in the expression of a trait because the enzyme produced by the mutant allele is partially functional, e.g., chlorina mutants in crop plants. Isoallelic mutations. These mutations do not affect the intensity of expression of the concerned trait. The enzymes encoded by such alleles are comparable in activity to those produced by their wild-type alleles. The enzymes can be differentiated electrophoretically, as they have different mobility under specific conditions, e.g., alleles of rosy locus of Drosophila. Hypermorphic mutations. These mutations lead to an increased expression than that of the wild-type, e.g. rII mutants of bacteriophage T4 produce relatively large plaques and are called rapid lysis mutants.

DETECTION OF MUTATIONS Mutation detection is important in all areas of genetics. Various test systems used for detection of mutagenicity include viruses, bacteria, Neurospora crassa, plants, Drosophila, small mammals, and even man. Classical methods of mutation detection have helped a great deal in understanding various fundamental concepts of genetics. Nowadays molecular methods are also used to detect mutations. Various methods commonly used for detection of mutations are described here only very briefly.

Viruses Viruses infect large range of organisms and effect of viruses is studied from kinds of plaques produced by viral destruction of bacteria. The plaques can also be seen on animal cell tissues. In plants, for example, tobacco mosaic virus (TMV) expresses its effect through size, shape, and color of lesions produced. When there are mutations in viruses, the properties of phenotypic effects change. Host range mutations, i.e., the changed ability to infect formerly immune hosts, are another product of viral mutations.

Bacteria Samples taken from bacterial clones are first grown on complete medium to enable all the prototrophic (wild-type) and auxotrophic (mutant) cells to grow. The bacterial colonies are then grown on minimal medium. If growth occurs, this means that the bacterial strain is prototroph. If growth fails to occur, the strain is considered an auxotroph. If growth can occur by supplementing the medium with specific vitamins, amino acids, or compounds, specific auxotrophs are identified. Replica-plating technique Lederberg and Lederberg (1952) introduced replica-plating technique to indirectly select bacterial mutants (Figure 9.9). Master plate shows diffused bacterial growth of phage-sensitive Escherichia coli on a non-phage medium. Replicas are made by pressing a velvet-covered wooden block against the master plate, and then pressing this, in the same oriented direction, to the surface of Petri dishes containing culture medium mixed with phage. One master plate possesses sufficient bacteria to start colonies on a number of replica plates. Replica plates show occurrence of resistant colonies in identical locations, indicating that resistance to a phage must have been present at each of these positions in the master plate.

Figure 9.9 Replica-plating technique (Adapted, with permission, from Sager, R.; Ryan, F. Cell Heredity. New York: John Wiley & Sons)

Ames test

To detect the mutagenic effects of various commercial and pharmaceutical products, B.N. Ames devised a simple inexpensive test that relies on the reverse mutation of histidine-requiring (his–) auxotrophs in Salmonella typhimurium to wild-type (his+) prototrophs (Ames et al. 1975). Ordinarily, such reversions are quite rare, no more than about once in 100 million (10^8) cells, meaning that very few, if any, prototrophic colonies will be found when 10^8 his– cells are plated on minimal agar. However, should a tested substance be mutagenic, increased densities of his+ revertants will occur in those areas of his– plates to which the substance is applied. To improve the efficiency of the test, strains of bacteria are used that are defective in DNA repair (improving the ability to detect mutagenicity) and that also have increased permeability (allowing substances under test to enter the cell more easily). In addition, the substances to be tested are pre-incubated with liver extracts in order to simulate mammalian metabolic activity, and thus to discover whether such substances, as metabolized in the body, produce mutagenic activity. The Ames test and its various derivatives are, therefore, being extensively used to obtain information on both mutagenicity and carcinogenicity of a large number of chemicals. Two types of his– auxotroph strain are used: strains TA100 and TA1535 are highly sensitive to reversions by base pair substitutions, whereas strain TA1538 is sensitive to reversions by frameshift mutations.
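The arithmetic behind the test can be sketched as follows. The spontaneous reversion rate of about one in 10^8 cells is taken from the text; the comparison of treated and control plates as a simple ratio is an assumption made here for illustration, not a scoring rule given in the text, and the function names are hypothetical.

```python
def expected_spontaneous_revertants(cells_plated, cells_per_reversion=10**8):
    """Expected number of his+ colonies arising without any mutagen."""
    return cells_plated / cells_per_reversion


def fold_increase(treated_colonies, control_colonies):
    """Ratio of revertant colonies on treated plates to control plates."""
    if control_colonies == 0:
        raise ValueError("control plate has no revertants; ratio is undefined")
    return treated_colonies / control_colonies
```

Plating 10^8 his– cells is thus expected to give about one spontaneous revertant colony, so a plate carrying many revertant colonies around the test substance points to mutagenic activity.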

Neurospora crassa

Beadle and Tatum (1941) developed a technique to induce and isolate nutritional mutants of N. crassa. Conidia of a particular strain are exposed to agents causing mutation (X-rays or ultraviolet) and crossed to wild-type of the opposite mating type. Haploid spores of this cross are then isolated and grown on a complete medium. Inability of such an isolated strain to grow on a minimal medium indicates a growth defect (Beadle 1946). Attempts are then made to discover the source of this defect by growing the aberrant strain on a minimal medium supplemented with various additives. If pantothenic acid added to a minimal medium enables the strain to grow, it indicates that the mutant strain is pantothenicless. Observation of the expected 4 wild-type : 4 pantothenicless segregation ratio in a cross with wild-type indicates that the mutation is of nuclear origin.

Plants

Cytological effects

For detection of cytological and clastogenic effects, developing germ cells or root tip cells are examined cytologically.

Auxotrophic mutants

For detection of auxotrophic mutants, a method was developed by Carlson (1970). Haploid plants are produced by culturing tetrad stage anthers. Longitudinally spliced stem pieces of the haploid plants are inoculated on callusing medium. Haploid calli are transferred to liquid medium to get cell suspensions. Large populations of haploid cells (at a density of 25 × 10^3 cells/ml) are treated with a mutagen. Cells are washed twice in liquid medium and resuspended in fresh medium. Cells are incubated for 4 days, with fresh medium added after 2 days. After 4 days, the medium is supplemented with BUdR (to a final concentration of 10^-5 M) and kept in the dark for 36 h. At the end of the dark incubation period, the cells are washed twice with liquid medium and plated on solid medium. Cells are illuminated with cool white fluorescent lamps. Growing cells, which have incorporated BUdR, are killed at this step and only non-growing mutant cells survive. Calli appearing on this medium are isolated and transferred to unsupplemented medium. Calli not showing normal growth are selected. Selected calli are tested on a number of different nutrient supplements to determine their auxotrophic requirement. Out of 119 calli, only 6 were auxotrophic (for hypoxanthine, biotin, arginine, lysine, proline and p-aminobenzoic acid). Plants were regenerated from four mutant cells.

Disease-resistant mutants

For detection of disease-resistant mutants, another method was given by Carlson (1973). For selection of disease-resistant mutants of any plant species, either the toxin released by the pathogen or its structural analog should be available. Methionine sulfoximine is a structural analog of the toxin produced by
Pseudomonas tabaci, causing wild fire disease of tobacco. Haploid plants are raised by anther culture method from haploid stem pieces. Haploid cell suspensions are prepared. Haploid protoplasts or cells are treated with some mutagen. The treated protoplasts are washed twice with liquid medium and plated on solid medium. Cultures are grown for two weeks and are overlaid with an equal volume of medium containing 10 mM of pathotoxin or structural analog. These cultures are incubated for three months. Surviving calli are placed on medium lacking pathotoxin. Each callus is grown for several months and tested for resistance to pathotoxin. Calli retaining resistance are diploidized and regenerated into whole plants. Regenerated plants are tested for their resistance to the disease. Recessive mutations Stadler (1928) induced mutations artificially. For this purpose, irradiated pollen from dominant stock is used for pollinating recessive stock. Progeny showing recessive phenotype were classified as mutants, even though, at least in some cases, these were later found to be due to deficiencies. Detection of recessive mutations is done by using a method given by Singleton (1951). Singleton also studied induced mutations. The pollen from plants dominant for several genes and growing in a field with radiation source were used to pollinate a recessive stock growing in a field without radiation source. The pollen carrying mutations will give seeds, which will show recessive character in phenotype. Mutations for endosperm characters For detection of mutations for endosperm characters, a method was given by Stadler (1930). He studied frequencies of spontaneous mutations in maize for endosperm characters. The following steps are involved. A genetic stock dominant for several genes is grown as female parent and detasseled. Seed of multiple recessive stock is sown on every fifth row to supply pollen. Seed set on female plants is examined for endosperm characters, e.g., shrunken. 
Most of the seeds show the dominant phenotype; the seeds showing the recessive character represent mutations in female gametes.

Mutations at unspecified loci
Detection of mutations at unspecified loci is done when no marker stocks are available for the plant used and the purpose is to study visible mutations in the genome as a whole. The mutations are studied in the segregating (M2) generation. It involves three steps: (a) irradiation of seeds, (b) obtaining the M1 generation and selfing the individual plants, and (c) growing M2 single-plant progenies and studying segregation in M2 families. Each family represents one mutation in an irradiated seed.

Drosophila Drosophila tests autosomal and sex-linked recessive lethals, translocation induction, and dominant lethals through genetic crosses. All different types of chromosomal aberrations can be easily detected because of the presence of polytene chromosomes. Techniques for the detection of various types of mutations in D. melanogaster have been described by Muller and Oster (1963). Detection of sex-linked recessive lethals ClB technique. This method involves use of a ClB stock which carries an inversion in heterozygous state to work as crossover suppressor (C), a recessive lethal (l) on X chromosome in heterozygous state and a sex-linked semi-dominant marker, Bar (B) (narrow slit-like eye). One of the two X chromosomes in a female fly carried all these three features and the other X-chromosome was normal. Male


flies irradiated for induction of mutations were crossed to ClB females. Male progeny receiving the ClB X chromosome will die. The ClB female flies obtained in the progeny can be detected by their Bar phenotype. These are crossed to normal males. In the next generation, the 50 per cent of males that receive the ClB X chromosome will die. The other 50 per cent of males will receive the X chromosome derived from the irradiated male, which may or may not carry an induced mutation. If a lethal mutation was induced, no males will be observed. On the other hand, if no lethal mutation was induced, these 50 per cent of males will survive. Thus, the ClB method was the most efficient method for detecting sex-linked lethal mutations (Muller 1927). H.J. Muller was awarded the Nobel Prize in 1946 for inducing and detecting mutations in Drosophila by use of X-rays.

Muller-5 technique. The ClB technique has now been replaced by the Muller-5 method (Muller and Oster 1963), which is utilized to detect sex-linked recessive lethals induced in males of D. melanogaster. The Muller-5 stock carries an X chromosome that contains Bar (B), white-apricot (wa) and two scute inversions. The precise genetic description of the stock is "In(1)scS1L sc8R + S, scS1 sc8 wa B". If a lethal is induced in the wild-type male being tested, the F1 heterozygous female will carry the lethal on the X chromosome derived from the male parent. If this is so, wild-type males will not appear in F2. Frequency of lethal mutations can be accurately scored in large samples. If a recessive lethal is not present in the X chromosome of the wild-type male being tested, the F1 heterozygous female will carry a normal X chromosome derived from the male parent. In the F2 progeny, four phenotypic classes, namely, Muller-5 females, heterobar females, Bar-eyed males and normal males, are then expected in a 1:1:1:1 ratio.
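Whether scored F2 counts are statistically compatible with the 1:1:1:1 expectation can be checked with a chi-square goodness-of-fit test. The following is a minimal sketch, not part of the original protocol: the function names and the example counts are hypothetical, and the 5 per cent significance level (critical value 7.815 for 3 degrees of freedom) is an assumed choice.

```python
# Critical chi-square value for 3 degrees of freedom at the 5% level
# (assumed significance level for this illustration).
CHI2_CRITICAL_DF3_P05 = 7.815


def chi_square_1_1_1_1(counts):
    """Chi-square statistic for four F2 classes against a 1:1:1:1 expectation."""
    if len(counts) != 4:
        raise ValueError("expected exactly four phenotypic classes")
    total = sum(counts)
    if total == 0:
        raise ValueError("no flies scored")
    expected = total / 4  # equal expectation for every class
    return sum((observed - expected) ** 2 / expected for observed in counts)


def consistent_with_1_1_1_1(counts):
    """True if the counts do not deviate significantly from 1:1:1:1."""
    return chi_square_1_1_1_1(counts) < CHI2_CRITICAL_DF3_P05


# Hypothetical counts, in the order: Muller-5 females, heterobar females,
# Bar-eyed males, normal (wild-type) males.
no_lethal_counts = [102, 98, 95, 105]
lethal_counts = [120, 115, 118, 2]

print(consistent_with_1_1_1_1(no_lethal_counts))
print(consistent_with_1_1_1_1(lethal_counts))
```

A significant deviation only shows that the classes are unequal; which class is deficient still has to be read from the counts themselves.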
If a recessive lethal is present in the X chromosome of the wild-type male being tested, the above-mentioned four phenotypic classes will be present in a 1:1:1:0 ratio, and such mutations are termed complete lethals. When the ratio of these phenotypes is 2:2:2:1, the mutations are called semi-lethal, and when the ratio is 4:4:4:1, they are termed quasi-lethal. To decide among complete, semi- and quasi-lethal mutations, a large F2 progeny should be scored. In order to confirm induction of a sex-linked recessive lethal (SLRL), a heterobar female from the F2 progeny is mated with a Muller-5 male. If the ratio observed in the F3 generation is the same as that observed in F2, induction of the SLRL is confirmed.

The yellow-Bar test. In the yellow-Bar test, males of genotype yB/y+Y are treated and then mated to Oster females of genotype Inscy; bw; st pp. The genetic constitution of the Inscy chromosome is "In(1)scS1L sc8R + dl-49, y scS1 sc8". Since inversion delta-49 (dl-49) is longer than inversion S, it provides even better inhibition of crossing-over in the F1 females. The yellow-Bar test has the advantage that, if required, loss of the X or Y chromosome can be scored in the F1 progeny and that F1 males can be used to test for induced translocations. Absence of Bar-eyed males in F2 indicates the induction of recessive lethals.

Detection of autosomal recessive lethals
Detection of mutations on autosomes (chromosomes other than sex chromosomes) of D. melanogaster makes use of a balanced lethal stock. A balanced lethal stock is one which carries a different recessive lethal allele on each of two specific homologous chromosomes, the normal allele of each being present on the other homolog; consequently, homozygous individuals will not survive. The concept of balanced lethals is better understood using a specific example.
For detection of visible mutations in Drosophila, a stock is used that carries the dominant genes Cy (Curly wing) and L (Lobed eye) on one chromosome and Pm (Plum, i.e., brownish eye) on the other homolog, so that the organism can be designated Cy L/Pm. The phenotype of this stock, which is heterozygous for the Cy, L and Pm genes, is Curly, Lobe and Plum. If such individuals are crossed among themselves, the surviving progeny is always heterozygous, because each of the two homologous chromosomes carries a recessive lethal gene. Because of the presence of inversions, crossing-over is also suppressed, so that Cy and/or L cannot be transferred to the chromosome carrying Pm and vice versa. For detection of an autosomal mutation, the Cy L/Pm stock is crossed to an irradiated fly. From the F1 generation, Cy L flies are again backcrossed to Cy L/Pm in order to obtain Curly, Lobe


(heterozygotes) male and female flies carrying the same autosome to be tested. Such Cy L heterozygotes when crossed among themselves would give genotypic ratio 1 Cy L/Cy L : 2 Cy L/++ : 1 ++/++. Since Cy L homozygotes are lethal, phenotypic ratio will be 2 Curly : 1 wild-type. However, if lethal mutation was induced, only Curly flies will appear. If a recessive visible mutation was induced this will appear in homozygous state in one-third of the progeny. Thus balanced lethal system also detects visible mutations. Detection of sex-linked visible mutations Attached X method which utilizes Muller-5 and attached X-chromosomes is used for this purpose. The attached X females (X^XY) have a special advantage. When these females are crossed to a treated male, X chromosome of irradiated male goes either to superfemale daughters or to the sons. Since in the sons there is single X chromosome any visible mutation will immediately express itself and can be easily scored. Dominant lethal test Although it constitutes a less precise estimate, it gives a quick estimation of any chromosome breakage effects. First of all, only embryonic lethality is measured. For this, white lethal eggs and brown lethal eggs are counted separately. White lethal eggs represent unfertilized eggs, eggs fertilized by nonfunctional sperms, or embryos killed during first few hours of development (gastrulation). An increase of only white eggs strongly suggests that the test substance interferes with gametogenesis, fertilization or physiological functions of the gametes. On the other hand, an increase in the number of brown lethal eggs represents the induction of damage leading to late death. In most of the cases, this damage results from chromosomal alterations which do not interfere with mitosis. An increase of brown lethal embryos strongly suggests that the test compound damages the chromosomes. 
To further test whether genetic or non-genetic damage contributes to this lethality, another set of experiments is done. Its basis is that the extent of genetic damage induced in developing mature gametes is proportional to the amount of genetic material treated within the cell. If the treated flies produce at least two types of gametes differing in the amount of chromosomal material included in their nuclei, the number of true dominant lethals should be higher in the type with more chromosome material treated. If true dominant lethal mutations are induced, fewer females than males will be found in the progeny. The lethality for males and females is calculated separately. Suppose that, in the control group, E eggs were placed into vials and F females and M males survived, and that, in the treated group, e eggs were placed in vials and f females and m males survived. Expected females = F × e/E, while expected males = M × e/E. Lethality is calculated by using the formula: observed number/expected number. Thus lethality of female progeny = f/(F × e/E) and lethality of male progeny = m/(M × e/E).

Chromosomal aberration detection
Salivary gland chromosomes in the third instar larvae of Drosophila have a giant size and, when stained, show a large number of well-defined landmarks in the form of characteristic bands, puffs, swellings, bends, etc. Different types of chromosomal rearrangements can easily be detected under a light microscope by comparison with a standard cytological map and photomicrographed for record.

Test for translocation between second and third chromosomes
The presence of markers on the 2nd and 3rd chromosomes in the Oster stock makes it possible to test genetically for the induction of heritable reciprocal translocations. In the absence of a II-III


chromosome translocation, the F2 progeny will contain four classes of phenotypes: red, white, orange and brown eyes. In case of translocation, instead of four Mendelian classes, only two types of flies, with red and white eyes, are observed (due to deficiencies and duplications). Translocation thus suppresses independent Mendelian reassortment; instead, genes on the 2nd and 3rd chromosomes of a II-III translocation heterozygote segregate as if they were linked together.

Toxicity test
First of all, the possible toxicity of the test compound is worked out. For this, groups of males are fed with food medium containing different concentrations of the test compound. After an overnight feeding, the males are placed in vials containing standard food medium. After 24 h, the flies still alive are counted and compared with the control. A significant reduction in the number of flies indicates toxicity of the test substance. Such a short-term test provides sufficient information on toxicity for further experiments to be adequately planned.

Sterility test
Groups of one- to two-day-old males are fed with food containing selected concentrations of the compound to be tested. After an overnight feeding, each group of males is placed with about twice the number of females in a separate culture bottle. After 2-3 days, the males are given a new set of virgin females. This is continued for 10-12 days to raise 4-5 broods. The number of progeny in the different broods is compared with the control. If a reduction in progeny (number of flies) is observed in some broods, a dominant lethal test is carried out.
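For the dominant lethal test, the observed/expected calculation described earlier in this section (expected survivors = control survivors × e/E; lethality = observed/expected) can be sketched as follows. This is only an illustration: the function names and the egg and survivor counts are hypothetical. Under this definition, a value below 1 means that fewer flies survived than expected from the control.

```python
def expected_survivors(control_survivors, control_eggs, treated_eggs):
    """Survivors expected in the treated group if treatment had no effect.

    Implements F x e/E (or M x e/E) from the text.
    """
    if control_eggs <= 0:
        raise ValueError("control egg count must be positive")
    return control_survivors * treated_eggs / control_eggs


def lethality(observed, control_survivors, control_eggs, treated_eggs):
    """Observed/expected ratio, e.g. f/(F x e/E) for female progeny."""
    expected = expected_survivors(control_survivors, control_eggs, treated_eggs)
    if expected == 0:
        raise ValueError("no survivors expected; ratio is undefined")
    return observed / expected


# Hypothetical data. Control: E eggs, F females and M males survived.
E, F, M = 1000, 400, 400
# Treated: e eggs, f females and m males survived.
e, f, m = 500, 100, 150

female_ratio = lethality(f, F, E, e)  # f / (F x e/E)
male_ratio = lethality(m, M, E, e)    # m / (M x e/E)
print(female_ratio, male_ratio)
```

Computing the two sexes separately, as the text requires, makes it possible to compare female and male recovery directly.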

Small Mammals

Micronucleus test
This test, devised by Schmid (1977), utilizes small mammals and is based on the following principles and observations. During anaphase, acentric chromatids and chromosome fragments lag behind and are not carried to either of the spindle poles. Nuclear membranes are formed at the two poles around the chromosomes assembled there, so the lagging elements are left out of the daughter nuclei. Chromosome elements that are not included in either of the regular nuclei are transformed into one or several secondary nuclei, which are much smaller than the principal nucleus. These are called micronuclei. The micronucleus test is, therefore, a method devised primarily for screening chemicals which have a chromosome-breaking effect. Intraperitoneal injections of the test substances are given to 7- to 12-week-old rats or mice. The treatment is allowed for 30 h so that the cells are exposed for two cell cycles. The animals are killed and bone marrow cells are extracted from the femur. Micronuclei are found in many cell types: myeloblasts, myelocytes and erythroblasts. Young erythrocytes are most suitable for this test. A few hours after completion of the last mitosis, erythroblasts expel their nucleus; micronuclei remain in the cytoplasm of the young erythrocytes, where they are easily recognizable. Formation of micronuclei is an indication of the mutagenic nature of the test compound. There are many advantages of the micronucleus test. Spontaneous chromosomal aberration rate in direct preparations of bone marrow cells from many mammalian species is very low (

i+; i+ > i–. Since i mutants acted in both cis- and trans-arrangement, this implied that the i gene specified a product which could diffuse from the site of its synthesis to an altogether different region of the genome and thereby influence the function of the wild-type operator.

Table 22.3 Genotypes and phenotypes of haploids and merodiploids for the lac region in E. coli strains

Genotype                            Constitutively          Inducibly               Interpreted i allele
                                    synthesized proteins    synthesized proteins    behavior
Haploid
  i+ o+ z+ y+ a+                    None                    All three               Wild type
  is o+ z+ y+ a+                    None                    None                    Superrepressor mutant
  i– o+ z+ y+ a+                    All three               None                    Constitutive mutant
Merodiploid
  is o+ z+ y+ a+ / i+ o+ z– y– a–   None                    None                    is dominant over i+
  is o+ z– y– a– / i+ o+ z+ y+ a+   None                    None                    is dominant over i+
  i– o+ z+ y+ a+ / i+ o+ z– y– a–   None                    All three               i+ dominant over i–
  i– o+ z– y– a– / i+ o+ z+ y+ a+   None                    All three               i+ dominant over i–

Figure 22.6 The i– mutation leads to constitutive synthesis of the three enzymes of lac operon. (Diagram labels: i– gene; control region with promoter, comprising CAP site and RNA polymerase attachment site, and operator; z-y-a structural genes. The i– gene is transcribed and translated into a defective repressor polypeptide (monomer); an active repressor that binds to the lac operator is not formed, so the structural genes are transcribed and the lac enzymes are synthesized.)

The action of i– and is mutants is shown in Figure 22.6 and Figure 22.7, respectively. Interpretations that can be drawn from the studies on i and o mutants are: repressor and operator genes interact in the control of structural gene activity. Since operator controlled lac z-lac y-lac a genes in cis-arrangement, Jacob and Monod proposed that operator gene controlled coordinated transcription of the three structural genes in the same strand. Operator determined whether or not the three structural genes would be transcribed. Repressor physically interacted with the operator site. When inducer was present, it combined with the repressor and the altered repressor could no longer bind with the operator site. Thus operator was available for RNA polymerase to start the transcription of polycistronic operon.
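The interpretations above (a diffusible repressor acting in trans, an operator acting only in cis) can be expressed as a small model and compared with the phenotypes in Table 22.3. This is a simplified sketch under stated assumptions: the `LacCopy` class, the function names and the single-character allele codes ("+", "-", "s" for i; "+", "c" for o) are hypothetical, expression is treated as all-or-none, and catabolite control is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LacCopy:
    """One copy of the lac region (chromosome or F' episome)."""
    i: str = "+"    # repressor allele: "+", "-" (no repressor) or "s" (superrepressor)
    o: str = "+"    # operator allele: "+" or "c" (does not bind repressor)
    z: bool = True  # True if the structural gene is functional
    y: bool = True
    a: bool = True


def repressor_can_bind(i_alleles, inducer):
    """Is operator-binding repressor present anywhere in the cell (acts in trans)?"""
    if "s" in i_alleles:      # superrepressor ignores the inducer
        return True
    if "+" in i_alleles:      # wild-type repressor is released by the inducer
        return not inducer
    return False              # only i-: no functional repressor


def enzymes_made(copies, inducer):
    """Set of functional structural genes expressed under the given condition."""
    bound = repressor_can_bind({c.i for c in copies}, inducer)
    made = set()
    for c in copies:
        blocked = bound and c.o == "+"   # operator acts only in cis
        if not blocked:
            made |= {g for g in "zya" if getattr(c, g)}
    return made


def phenotype(copies):
    """Split expressed genes into constitutive and inducible classes."""
    without = enzymes_made(copies, inducer=False)
    with_inducer = enzymes_made(copies, inducer=True)
    return {"constitutive": without, "inducible": with_inducer - without}


# Third merodiploid of Table 22.3: i- o+ z+ y+ a+ / i+ o+ z- y- a-
merodiploid = [LacCopy(i="-"), LacCopy(i="+", z=False, y=False, a=False)]
print(phenotype(merodiploid))
```

In this sketch the dominance of i+ over i– and of is over i+ is not coded explicitly; it follows from letting the repressor pool of the whole cell act on every operator.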

Figure 22.7 Working of lac operon in presence of is repressor and inducer. Transcription of lac operon is prevented even in presence of inducer. (Diagram labels: is gene; control region with promoter, comprising CAP site and RNA polymerase attachment site, and operator; z-y-a structural genes. The is gene is transcribed and translated into a repressor polypeptide (monomer); the is repressor (tetramer) binds to the operator but not to the inducer, so transcription of the structural genes is prevented.)

In the absence of inducer, the repressor prevents transcription. Thus, in the absence of inducer, the three enzymes are not synthesized. In the 1960s, W. Gilbert isolated the purified i gene product, the repressor, which was present in i+ but not in i– cells (Gilbert and Muller-Hill 1966, 1967). The repressor protein binds to o+ but not to oc DNA. This confirmed the model of Jacob and Monod.

Mutations in promoter. The third kind of regulatory mutation in the lac region of E. coli was found at another site, called the promoter. This site was found to be present to the left of the operator. Promoter mutants coordinately reduced the proteins of the lac z-lac y-lac a cluster. These mutants responded to i gene (repressor) control. That is, the three proteins are made in low amounts in a promoter mutant cell when lactose is present. This means that repressor control has not changed. The promoter is the specific DNA site to which RNA polymerase binds. The polymerase can move along the DNA if it is not blocked by a repressor bound to the operator, which lies between the promoter site and the lac z-lac y-lac a cluster. In a promoter mutant cell, RNA polymerase does not bind tightly to the promoter.

Polar mutations. The direction of transcription was determined by studies on polar mutants. A polar mutation reduces all wild-type activity distal to it, so that a polar mutation in lac z influences lac z, lac y and lac a; a polar mutation in lac y influences lac y and lac a only; and a polar mutation in lac a influences lac a only. These studies showed that transcription proceeds from the promoter through the operator and on to the lac z, lac y and lac a genes, in that order.

Controlling region of lac operon. The detailed base sequence of the controlling region of the lac operon of E. coli is given in Figure 22.8. The sites of the various components of the controlling region have been marked. Nucleotide lengths of the different components of the lac operon of E. coli are: lac i promoter, 40 nucleotides; i gene, 111 nucleotides; lac i operator, 26 nucleotides; lac z gene, 3063 nucleotides; lac y gene, 800 nucleotides; lac a gene, 800 nucleotides. The E. coli lac promoter has three components: cgs, catabolic gene activator (CAP) site; ibs, initial binding site; and op, operator. cAMP is essential for activation of the cga protein. The cAMP-cga protein system is influenced by the level of glucose. This exerts a positive control on the lac operon. The E. coli catabolite activator protein (CAP) is a sequence-specific DNA-binding protein with a helix-turn-helix motif. CAP was converted into a site-specific cleavage agent by incorporation of the

Figure 22.8 Control region of lac operon of E. coli. (Diagram labels, from left to right: end of the lac i regulator gene, encoding glu, ser, gly, gln and stop; promoter, numbered from about -84 to -1, comprising the CAP site, with its axis of symmetry, and the RNA polymerase interaction site, with regions of high G-C, high A-T and high G-C content; operator and start of the mRNA transcript at +1, numbered to +40; start of the lac z gene, encoding f-met, thr and met. Positions of up-promoter and down-promoter mutations, shown as A-T and T-A base-pair changes, are marked. The three sequence segments shown in the figure are given below in this order.)

5'--GGAAAGCGGGCAGTGA GCGCAACGCAATTAATGTGAGTTAGCTCACTCATTAGGCACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGA AATTGTGAGCGGATAACAATTTCACACAGGAAACAGCTATGACCATG--3'
3'--CCTTTCGCCCGTCACT CGCGTTGCGTTAATTACACTCAATCGAGTGAGTAATCCGTGGGGTCCGAAATGTGAAATACGAAGGCCGAGCATACAACACACCT TTAACACTCGCCTATTGTTAAAGTGTGTCCTTTGTCGATACTGGTAC--5'

chelator 1,10-phenanthroline at amino acid 10 of the helix-turn-helix motif (Ebright et al. 1990). Polypeptide sizes of the repressor and the three enzymes produced by the lac operon are: repressor (monomer), 360 amino acids; β-galactosidase, 1021 amino acids; permease, 275 amino acids; transacetylase, 275 amino acids.

Positive control of lac operon. There also exists a positive control of gene action in the lac operon. Positive control in the lac operon of E. coli was studied through the glucose effect. Cyclic adenosine monophosphate (cAMP) is the key metabolite which influences the glucose effect. It is synthesized from adenosine triphosphate (ATP) by adenylate cyclase (Figure 22.9). cAMP is effective only when bound to catabolite activator protein (CAP). CAP consists of 209 amino acids. The carboxyl domain is involved in DNA binding. cAMP is bound to its amino-terminal domain. Each CAP molecule has three α helices (αD, αE and αF), and the distance between the αF helices in a dimer is 34 Å, making it possible for them to fit into the grooves of B DNA, which may take the form of Z DNA during DNA-protein interaction. When the cAMP-CAP complex binds to a specific site on the operon, it increases the rate of transcription of the lac z-lac y-lac a cluster. This system enables the cells to utilize a large number of different sugars as a source of carbon. CAP-deficient strains and adenylate cyclase-deficient strains do not influence glucose effect. Adenylate cyclase-deficient strains and CAP-deficient strains make low amounts of inducible enzymes. The positive and negative controls act together, but each is a separate regulation system for operon transcription. The binding site of the cAMP-CAP complex is located proximally to the lac promoter. When the cAMP-CAP complex is bound at the promoter, binding of RNA polymerase to the operon is greatly enhanced. DNA bending of up to 180° is observed during binding of CAP protein (Figure 22.10). DNA bending is very important in activation of transcription.
Local destabilization of double-helical DNA is induced at the RNA polymerase interaction site of the promoter by cAMP-CAP binding. RNA polymerase binds to the destabilized area. RNA polymerase then moves to the operator site and begins transcription. When E. coli is grown on a medium containing both glucose and lactose, only glucose is utilized and the lac operon is inactive. The same holds good for glucose and galactose: the galactose (gal) operon remains inactive. Glucose blocks the activity of other operons through cAMP. When glucose is available as an energy source, the cAMP level decreases, which prevents the functioning of the operons involved in the metabolism of other sugars, such as lactose, arabinose, galactose and maltose. When the glucose level decreases, the cAMP level increases and other sugars can be utilized.

Figure 22.9 Conversion of ATP into cAMP and to AMP. (Structural formulae: adenosine triphosphate (ATP) is converted by adenyl cyclase, which is inhibited by glucose, into cyclic adenosine monophosphate (cAMP); cAMP is converted by phosphodiesterase into 5' adenosine monophosphate (AMP).)

Figure 22.10 DNA bending due to binding of CAP protein (Reprinted, with permission, from Gupta, P.K. 2009. Genetics. 4th edition. Meerut: Rastogi Publications). (Diagram labels: lac DNA CAP site, 5' GCAACGCAATTAATGTGAGTTAGCTCACTCATTAGGCACCC 3'; CAP; RNA polymerase; -10.)
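The combined action of the negative (repressor) and positive (cAMP-CAP) controls described in this section can be summarized as a small decision function. This is a qualitative sketch only: the function names, the three output labels and the all-or-none treatment of cAMP level are illustrative assumptions, not measured quantities.

```python
def camp_is_high(glucose_present, adenylate_cyclase_ok=True):
    """cAMP is high only when glucose is absent and adenylate cyclase works."""
    return adenylate_cyclase_ok and not glucose_present


def lac_transcription(glucose_present, lactose_present,
                      cap_ok=True, adenylate_cyclase_ok=True):
    """Qualitative transcription level of the lac operon in a wild-type i+ o+ cell.

    Returns "off"  when no inducer is present (repressor bound to operator),
            "low"  when induced but not activated by cAMP-CAP,
            "high" when induced and activated by cAMP-CAP.
    """
    if not lactose_present:
        return "off"   # negative control: repressor blocks the operator
    activated = cap_ok and camp_is_high(glucose_present, adenylate_cyclase_ok)
    return "high" if activated else "low"   # positive control by cAMP-CAP


# Truth table for the four growth conditions
for glucose in (True, False):
    for lactose in (True, False):
        print(glucose, lactose, lac_transcription(glucose, lactose))
```

The "low" state with both sugars present corresponds to the situation in which only glucose is utilized; the same label is returned for CAP-deficient or adenylate cyclase-deficient strains, which make low amounts of inducible enzymes.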

Binding of a lac repressor to a lac operator

Transcription factors regulate gene expression through their binding to DNA. In a living E. coli cell, Elf et al. (2007) directly observed specific binding of lac repressor, labeled with a fluorescent protein, to a chromosomal lac operator. Using single-molecule detection techniques, they measured the kinetics of binding and dissociation of the repressor in response to metabolic signals. Furthermore, they characterized, at the single-molecule level, the non-specific binding to DNA, one-dimensional diffusion along DNA segments, and translocation among segments through the cytoplasm. In searching for the operator, lac repressor spends ~90 per cent of time non-specifically bound to and diffusing along DNA with a residence time of

1 h) between sequential injections of sense and antisense RNA resulted in a dramatic decrease in interfering activity. This suggests that injected single strands may be degraded or otherwise rendered inaccessible in the absence of the opposite strand. The phenotype produced by interference using unc-22 dsRNA was extremely specific. Progeny of injected animals exhibited behavior that precisely mimicked loss-of-function mutations in unc-22. Target specificity of dsRNA was assessed using three additional genes with well-characterized phenotypes. unc-54 encodes a body-wall-muscle heavy-chain isoform of myosin that is required for full muscle contraction; fem-1 encodes an ankyrin-repeat-containing protein that is required in hermaphrodites for sperm production; and hlh-1 encodes a C. elegans homolog of myoD-family proteins that is required for proper body shape and motility. For each of these genes, injection of the related dsRNA produced progeny broods exhibiting the known null-mutant phenotype, whereas the purified single RNA strands produced no significant interference (Fire et al. 1998). The results were interpreted as follows:
(a) Silencing was triggered efficiently by injected dsRNA but weakly or not at all by sense or antisense single-stranded RNAs.
(b) The targeted mRNA disappeared, suggesting that it was degraded.
(c) Only a few molecules per cell were sufficient to accomplish full silencing. This indicated that the dsRNA signal was amplified.
(d) The dsRNA effect could spread between tissues and even to the progeny, suggesting transmission of the effect among cells.

On the whole, RNAi can be divided into four stages: double-stranded RNA cleavage, silencing complex formation, silencing complex activation, and mRNA degradation. RNAi can be brought about by two processing pathways, but the core components of the RNAi machinery remain the same. The core enzymes involved in RNAi are Drosha, Dicer, and Argonautes. The pathway of RNA interference starts when Dicer cuts the dsRNA into small interfering RNAs (siRNAs) that subsequently target homologous mRNAs for destruction. microRNA processing from stem-loop precursors similarly requires Dicer activity. Dicer is also essential for the execution phases of RNAi, and the siRNA and miRNA pathways have distinct requirements for Dicer (Tijsterman and Plasterk 2004). A model for RNA silencing in Drosophila is given in Figure 26.1. In an ordered biochemical pathway, miRNAs (left panel) and siRNAs (right panel) are processed from double-stranded precursor molecules by Dcr-1 and Dcr-2, respectively, and stay attached to Dicer-containing complexes, which assemble into the RNA-induced silencing complex (RISC). The degree of complementarity between the RNA silencing molecule and its cognate target determines the fate of the mRNA: blocked translation or immediate destruction.

Drosha was first implicated in RNAi through its biochemical activity, when it became apparent that many primary miRNA (pri-miRNA) transcripts contain multiple miRNAs that are first trimmed in the nucleus into separate pre-miRNA species before further processing. Drosha is responsible for carrying out pri-miRNA cropping in human nuclear extracts.
Drosha is contained within a large nuclear complex, dubbed the microprocessor, where it interacts with a co-factor Pasha, which may initiate binding to the pri-miRNA and is essential for its activity. The cleavage of pri-miRNA is determined by the ssRNA-dsRNA junction at the base of the miRNA hairpin, where Drosha cuts ≈11 bp, or one

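Figure 26.1 Model for RNA silencing in Drosophila. (Diagram labels: left panel, pre-miRNA is processed by Dcr-1 into miRNA, leading to RISC-mediated translation blockage; right panel, dsRNA is processed by Dcr-2/R2D2 into siRNA, leading to RISC-mediated mRNA destruction; the labels "R2D2" and "Dcr-1?" appear at the RISC steps.)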

dsRNA helical turn, from the base (Han et al. 2006). The resulting pre-miRNA is a hairpin containing a 3′ two-nt overhang. After Drosha cleavage, the pre-miRNA exits the nucleus via transport by exportin-5, which binds the 3′ two-nt overhang of the pre-miRNA (Lund et al. 2004). Drosha is conserved only in animals. Although plants express a wide variety of miRNAs, they do not possess a Drosha homolog.

Dicer. The RNase III enzyme Dicer processes RNA into siRNAs and miRNAs, which direct RISC to cleave mRNA or block its translation (RNAi). Lee et al. (2004) characterized mutations in the Drosophila dicer-1 and dicer-2 genes. Mutation in dicer-1 blocks processing of miRNA precursors, whereas dicer-2 mutants are defective for processing siRNA precursors. Drosophila Dicer-1 and Dicer-2 are also components of siRNA-dependent RISC (siRISC). Lee et al. (2004) found that Dicer-1 and Dicer-2 are required for siRNA-directed mRNA cleavage, though the RNase III activity of Dicer-2 is not required, and that Dicer-1 and Dicer-2 facilitate distinct steps in the assembly of siRISC. However, Dicer-1 but not Dicer-2 is essential for miRISC-directed translation repression. Thus, siRISCs and miRISCs are different with respect to Dicers in Drosophila. Dicer leaves a characteristic dsRNA terminus consisting of a 5′ phosphate and a 2-base overhang at the 3′-end. The four domains of Dicer are: dsRBD, RNA helicase domain, RNase III domain, and PAZ domain. The dsRBD contains a single dsRNA-binding domain. The RNA helicase domain causes unwinding of the dsRNA molecule. The RNase III domain has two independent catalytic sites that form the catalytic centre. Each site is capable of cutting one RNA strand of the duplex to generate products with a two-nucleotide 3′ overhang. The PAZ domain (Piwi/Argonaute/Zwille domain) is unique to the RNAi machinery. It recognizes the 3′ two-nucleotide overhang at the siRNA end, and Dicer therefore cleaves double-stranded RNA and pre-miRNA substrates containing 3′ overhanging nucleotides more efficiently than those containing blunt ends (Jaskiewicz and Filipowicz 2008).

Argonautes. Argonaute genes are found in bacteria, archaea and eukaryotes, and their number varies among different species. Argonaute proteins were discovered in plants more than a decade ago. Argonautes are the major players in RNA-based gene silencing pathways. Through their involvement in many RNAi-based silencing mechanisms, Argonautes contribute to maintaining the genome, producing small noncoding RNAs, forming heterochromatin and controlling RNA stability and protein synthesis. Argonaute proteins are classified into three groups based on their phylogenetic relationships and ability to bind small RNAs. Group 1 members are referred to as AGO proteins, and they are capable of binding miRNAs and siRNAs. Group 2 members are referred to as PIWI, that bind


PIWI-interacting RNAs (piRNAs), and group 3 members are found only in worms and bind to secondary siRNAs. The Arabidopsis and rice genomes contain 10 and 18 group 1 argonaute-like genes, respectively. Argonaute proteins are similar to RNase H endonucleases, but they use RNA instead of DNA to target the RNA molecules (Gendrel et al. 2002). Argonautes contain four distinct functional domains, namely, the N-terminal, PAZ, MID and PIWI domains (Hamilton et al. 2002). The PAZ, MID and PIWI domains have important functions in small RNA pathways. The PAZ domain recognizes the 3′-end of small RNAs and the MID domain binds to the 5′ phosphate. The PIWI domain exhibits an endonuclease activity that is similar to that of RNase H enzymes. Three of the ten AGO proteins in Arabidopsis have been implicated in RNA-directed DNA methylation (RdDM). AGO4 binds siRNAs from a number of transposons and repeats and interacts with the CTD of NRPD1b, which is consistent with the involvement of AGO4 in de novo methylation. AGO6 is required for DNA methylation and siRNA accumulation at some loci, where it may function redundantly with AGO4. Argonautes are also referred to as Ago proteins. These basic proteins function as catalytic engines. They contain two highly conserved domains, viz., PAZ and PIWI. The PAZ domain is constituted by about 130 amino acids. It enables binding of the 3′-end of the guide RNA. It is dispensable for target cleavage. The PIWI domain is constituted by about 300 amino acids. It binds the 5′ phosphate group of the guide strand of siRNA. This domain shows a high degree of similarity to the catalytic core of the RNase H enzyme. It contains a DDH motif. DDH refers to the presence of three amino acids: two aspartates (D) and one histidine (H). This motif determines the SLICER activity of Ago members (Boisvert and Simard 2008).

siRNP assembly. Pham et al. (2004) characterized complexes that mediate RNA interference (RNAi) in Drosophila. Three distinct complexes (R1, R2, and R3) assemble on short interfering RNAs (siRNAs) in vitro.
To form, all three complexes require Dicer-2 (Dcr-2), which directly contacts siRNAs in the ATP-independent R1 complex. R1 serves as a precursor to both the R2 and R3 complexes. R3 is a large (80S), ATP-enhanced complex that contains unwound siRNAs, cofractionates with known RNAi factors, and binds the cleaved targeted mRNAs in a cognate-siRNA-dependent manner. These results establish an ordered biochemical pathway for RISC assembly and indicate that siRNAs must first interact with Dcr-2 to reach the R3 "holo-RISC" complex. Dcr-2 does not simply transfer siRNAs to a distinct effector complex, but rather assembles into RISC along with the siRNAs, indicating that its role extends beyond the initiation phase of RNAi (Pham et al. 2004). The dsRNA or siRNA silencing trigger is shown. The R1 complex can arise either from Dcr-2 activity on long dsRNA triggers or from exogenous pre-cleaved siRNAs. R1 has already been shown to contain duplex siRNAs, and the ATP independence of R2 suggests that its formation also precedes siRNA unwinding. Both the R2 and R3 complexes form from R1. One cannot formally exclude the possibility that R2 is an off-pathway dead-end complex; accordingly, two possible pathways are shown that differ in the placement of R2, either on (A) or off (B) the productive RNAi pathway. Additional complexes may be involved. The siRNA is unwound in the R3 complex.

RNAi processing pathways
Two processing pathways, one mediated by siRNAs and the other by miRNAs, are described here.

siRNA processing pathway. In the siRNA processing pathway, the first step is the ATP-dependent, processive cleavage of double-stranded RNA into double-stranded fragments 21-25 nucleotides long. These contain 5′ phosphate and 3′ hydroxyl termini and two additional overhanging nucleotides at their 3′ ends. The fragments thus generated are called siRNA molecules. Experiments have revealed that this step is carried out by an RNase III-like nuclease named Dicer.
In the second step, siRNAs are incorporated into a protein complex (RISC) which is inactive in this form to conduct RNAi. The third step involves unwinding of the siRNA duplex and remodeling of the complex to generate an active


form of RISC. The final step includes the recognition and cleavage of mRNA complementary to the siRNA strand present in RISC. In some organisms, such as C. elegans, plants and fungi, an additional step in the RNAi pathway has been described involving a population of secondary siRNAs derived from the action of RdRp (RNA-dependent RNA polymerase). Secondary siRNA molecules are generated during cyclic amplification in which RdRp is primed on the target mRNA template by existing siRNA. Various types of siRNAs are known. Heterochromatic siRNAs (hc-siRNAs), natural antisense transcript-derived siRNAs (nat-siRNAs), trans-acting small interfering RNAs (tasiRNAs), and short hairpin RNA (shRNA) are described here briefly. The hc-siRNAs are 24-nt long small RNAs that are implicated in transcriptional gene silencing, which silences endogenous genes or transgenes through the inactivation of promoter sequences. The nat-siRNAs are small RNAs produced from partially overlapping transcripts of antisense gene pairs encompassing an inducible and a constitutive gene. The tasiRNAs are a class of endogenous small RNAs produced from noncoding transcripts of TAS loci. They are negative regulators of gene expression. Short hairpin RNA, also known as small hairpin RNA (shRNA), is a sequence of RNA that makes a tight hairpin turn and can be used to silence target gene expression via RNA interference (RNAi). Symmetric RNA structures are small RNA molecules containing a 19-bp duplex with 2-nt overhangs at each 3′-end. This is the standard siRNA structure. Asymmetric RNA duplexes consist of 19 to 21-nt antisense and 16-nt sense strands that generate a 16-bp duplex region with a 3 to 5-nt long 3′ antisense overhang. These asymmetric RNAs mediate cleavage of target mRNAs. One critical factor in gene silencing is the ability to deliver intact siRNAs into target cells/organs in vivo. 
Oligonucleotide delivery systems and polyethylenimines are commonly used to deliver siRNAs. Polymeric nanoparticles prepared from polycationic polymers are efficient siRNA delivery systems that can protect siRNAs and transport them to target cells. Cell-penetrating peptides are able to traverse cell membranes and deliver biological macromolecules to living cells. miRNA processing pathway. MicroRNAs (miRNAs) are short (~22 nt) noncoding RNAs that control gene expression by base pairing with the 3′-untranslated regions (3′UTRs) of their regulated transcripts. miRNA biogenesis occurs in several steps that involve Drosha and Dicer, the two main RNase III endonucleases (Masotti 2012). Precursor miRNAs (pre-miRNAs) are ~70-nucleotide-long RNA molecules with a characteristic hairpin structure. They originate from longer primary transcripts (pri-miRNAs) that are cleaved in animals by the Drosha endonuclease in the nucleus (Lee et al. 2003). Following the export of pre-miRNAs to the cytoplasm by Exportin-5, the loop region of the hairpin is removed by the Dicer endonuclease to produce a short, double-stranded RNA (dsRNA) (Cullen 2004). Based on the thermodynamic stability of each end of this duplex (O'Toole et al. 2006), one of the strands is preferentially incorporated in the RISC, producing a biologically active mature miRNA (generally the -5p miR) (Bartel 2004), while the inactive strand (the -3p miR) is degraded (Kim 2005). The pairing of the active miRNA with the 3′UTR of its target gene facilitates mRNA degradation or translation inhibition (Djuranovic et al. 2012). As a direct consequence, miRNAs regulate many biological processes and have critical roles in cell proliferation, differentiation and death (Shivdasani 2006; Gomase and Parundekar 2009). The degree of complementarity between the small RNA and the target mRNA determines whether the target is silenced by the siRNA processing pathway or the miRNA processing pathway. 
If a high degree of complementarity exists between the small RNA and the target mRNA, the siRNA pathway is followed: the target mRNA is degraded and therefore not translated. If only partial complementarity exists between the small RNA and the target mRNA, the target is silenced by the miRNA processing pathway. Hence it is the degree of complementarity, and not the source of the small double-stranded RNA molecules, that determines the type of pathway to be followed. Aberrant miRNA expression leads to developmental abnormalities and diseases, such as cardiovascular disorders and cancer; however, the stimuli and processes regulating miRNA biogenesis


are largely unknown. The transforming growth factor β (TGF-β) and bone morphogenetic protein (BMP) family of growth factors orchestrates fundamental biological processes in development and in homeostasis of adult tissues, including the vasculature (Davis et al. 2008). Guide RNA in RNA interference  RNA interference (RNAi) is a conserved sequence-specific gene regulatory mechanism mediated by the RISC, which is composed of a single-stranded guide RNA and an Argonaute protein. The Piwi domain, a highly conserved motif within Argonaute, has been shown to adopt an RNase H fold critical for the endonuclease activity of RISC. Ma et al. (2005) reported the crystal structure of Archaeoglobus fulgidus Piwi protein bound to double-stranded RNA, thereby identifying the binding pocket for guide-strand 5′-end recognition and providing insight into guide-strand-mediated messenger RNA target recognition. The phosphorylated 5′-end of the guide RNA is anchored within a highly conserved basic pocket, supplemented by the carboxy-terminal carboxylate and a bound divalent cation. The first nucleotide from the 5′-end of the guide RNA is unpaired and stacks over a conserved tyrosine residue, whereas successive nucleotides form a four-bp RNA duplex. Mutation of the corresponding amino acids that contact the 5′ phosphate in human Ago2 resulted in attenuated mRNA cleavage activity. The structure of the Piwi-RNA complex provides direct support for the 5′ region of the guide RNA serving as a nucleation site for pairing with target mRNA and for a fixed distance separating the RISC-mediated mRNA cleavage site from the anchored 5′-end of the guide RNA. The structure of A. fulgidus Piwi protein and the sequence and pairing alignment of the 21-mer RNA are shown in Figure 26.2. Mechanism of target mRNA recognition and cleavage 

RNA interference and related RNA silencing phenomena use short antisense guide RNA molecules to repress the expression of target genes. Argonaute proteins, containing amino-terminal PAZ (for PIWI/Argonaute/Zwille) domains and carboxy-terminal PIWI domains, are core components of these mechanisms. Parker et al. (2005) analyzed the crystal structure of a Piwi protein from A. fulgidus (AfPiwi) in complex with a small interfering RNA (siRNA)-like duplex, which mimics the 5′-end of a guide RNA strand bound to an overhanging target messenger RNA. The structure contains a highly conserved metal-binding site that anchors the 5′ nucleotide of the guide RNA. The first base pair of the duplex is unwound, separating the 5′ nucleotide of the guide from the complementary nucleotide on the target strand, which exits with the 3′ overhang through a short channel. The remaining base-paired nucleotides assume an A-form helix, accommodated within a channel that can be extended to place the scissile phosphate of the target strand adjacent to the putative slicer catalytic site. This study provides insights into mechanisms of target mRNA recognition and cleavage by an Argonaute-siRNA guide complex. The RNA duplex co-crystallized with AfPiwi, with the nomenclature for guide and target strands, is shown in Figure 26.3.

Figure 26.2 (A) Structure of A. fulgidus Piwi protein (domain map labels: N-domain, Domain A, Domain B; residue positions 1, 38, 168, 427) and (B) sequence and pairing alignment of the 21-mer RNA:
5′-P-AGACAGCAUUAUGCUGUCUUU-3′
3′-UUUCUGUCGUAUUACGACAGA-P-5′

Figure 26.3 RNA duplex co-crystallized with AfPiwi (labels: G1, G16, T14, T(-2), T1, anchored 5′ end):
Guide  5′-pUUCGACGCGUCGAAUU
Target 3′-UUAAGCUGCGCAGCUUp
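The idea of a fixed distance between the anchored 5′-end of the guide and the cleavage site can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names and the flanking mRNA sequence are hypothetical, G:U wobble pairs are ignored, and the placement of the cut opposite guide positions 10 and 11 is taken from the general RNAi literature rather than from the text above.

```python
# Watson-Crick partners for RNA; G:U wobble pairs are ignored in this sketch.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(rna))


def find_slicer_site(guide, mrna, paired_length=19):
    """Find a fully complementary target site and the predicted cut.

    Guide positions 2..paired_length must pair with the mRNA. Position 1 is
    left out because the anchored 5' nucleotide of the guide is unpaired.
    Returns (site_start, cut_index) as 0-based mRNA indices, with the cut
    between mrna[cut_index - 1] and mrna[cut_index], or None if no site.
    """
    site = reverse_complement(guide[1:paired_length])
    site_start = mrna.find(site)
    if site_start == -1:
        return None
    # Guide position k (1-based from the 5' end) pairs with mRNA index
    # site_start + paired_length - k.
    # ASSUMPTION: the scissile phosphate lies between the target
    # nucleotides paired with guide positions 10 and 11.
    cut_index = site_start + paired_length - 10
    return site_start, cut_index


guide = "AGACAGCAUUAUGCUGUCUUU"  # 21-mer strand shown in Figure 26.2
target_site = reverse_complement(guide[:19])
mrna = "GGGAAA" + target_site + "CCCUUU"  # hypothetical mRNA with one site
print(find_slicer_site(guide, mrna))
```

Because the cut position is computed from the guide 5′-end rather than from the mRNA, moving the site within the mRNA shifts both returned indices by the same amount.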


Quelling  HDGS phenomena were also observed independently in fungal systems. These events were called quelling. Quelling came to light during attempts to boost the production of an orange pigment made by the gene al1 of the fungus Neurospora crassa. One N. crassa strain containing a wild-type al1 gene (orange phenotype) was transformed with a plasmid containing a 1,500-bp fragment of the coding sequence of the al1 gene. A few transformants were stably quelled and showed albino phenotypes. In the al1-quelled strain, the level of unspliced al1 mRNA was similar to that of the wild-type strain, whereas the native al1 mRNA was highly reduced, indicating that quelling and not the rate of transcription affected the level of mature mRNA in a homology-dependent manner.

Co‐suppression  In plants, the RNA silencing story unfolded serendipitously during a search for transgenic petunia flowers that were expected to be more purple. In 1990, R. Jorgensen’s laboratory wanted to upregulate the activity of a gene for chalcone synthase (chsA), an enzyme involved in the production of anthocyanin pigments (Napoli et al. 1990). Surprisingly, some of the transgenic petunia plants harboring the chsA coding region under the control of a 35S promoter lost both endogene and transgene chalcone synthase activity, and thus many of the flowers were variegated or developed white sectors. The loss of cytosolic chsA mRNA was not associated with reduced transcription. Jorgensen coined the term co-suppression to describe the loss of mRNAs of both the endogene and the transgene. Around the same time, two other laboratories also reported that the introduction of transcribed sense transgenes could downregulate the expression of homologous endogenous genes. Subsequently, many similar events of co-suppression were reported. All cases of co-suppression resulted in the degradation of endogene and transgene RNAs after nuclear transcription had occurred. Co-suppression, also known as sense-suppression, arises when extra copies of an endogenous gene are introduced to boost its expression but the result is instead the coordinate silencing of both the introduced transgenes and the endogenous genes. This coordinate, reversible suppression is called co-suppression. The mechanisms of suppression of sense genes may involve interference of RNA strands with the transcription process itself. Catalanotto et al. (2000) reported that two distantly related organisms, the nematode C. elegans and the fungus N. crassa, have quite different mechanisms of gene silencing but use a similar protein to control the process. Studies in C. elegans have shown that there is a genetic link between co-suppression and RNA interference (Ketting and Plasterk 2000). The observed alterations in the PTGS-related phenotypes were attributed to multiple-site integrations, aberrant RNA formations, repeat structures of the transgenes, etc. Later on, it became clear that the expression of the transgene led to the formation of dsRNA, which, in turn, initiated this phenomenon. For example, in the case of co-suppressed petunia plants, chsA mRNA formed a partial duplex, since there are regions of self-complementarity located between the chsA 3′ coding region and its 3′ untranslated region. This was revealed by DNA sequence analysis.

Nonsense‐mediated mRNA Decay  Most eukaryotic genes are interrupted by noncoding introns that must be accurately removed from premessenger RNAs to produce translatable mRNAs. Splicing is guided locally by short conserved sequences, but genes typically contain many potential splice sites. In most organisms, short introns recognized by the intron definition mechanism cannot be efficiently predicted on the basis of sequence


motifs. In multicellular eukaryotes, long introns are recognized through exon definition and most genes produce multiple mRNA variants through alternative splicing. The nonsense-mediated mRNA decay (NMD) pathway may further shape the observed sets of variants by selectively degrading those containing premature termination codons (PTCs), which are frequently produced in mammals. In eukaryotes, NMD functions in mRNA quality control by recognizing and degrading mRNAs with aberrant termination codons. Sheth and Parker (2006) demonstrated that NMD in yeast targets PTC-containing mRNA to P bodies. Upf1p is sufficient for targeting mRNAs to P bodies, whereas Upf2p and Upf3p act, at least in part, downstream of P body targeting to trigger decapping. The ATPase activity of Upf1p is required for NMD after the targeting of mRNA to P bodies. Moreover, Upf1p can target normal mRNAs to P bodies but does not promote their degradation. These observations led the authors to propose a new model for NMD wherein two successive steps are used to distinguish normal and aberrant mRNAs. A fundamental aspect of the biogenesis and function of eukaryotic messenger RNA is the quality control systems that recognize and degrade non-functional mRNAs. Eukaryotic mRNAs in which translation termination occurs too soon (nonsense-mediated decay) or fails to occur (non-stop decay) are rapidly degraded. Doma and Parker (2006) showed that yeast mRNAs with stalls in translation elongation are recognized and targeted for endonucleolytic cleavage, referred to as no-go decay. The cleavage triggered by no-go decay is dependent on translation and involves Dom34p and Hbs1p. Dom34p and Hbs1p are similar to the translation termination factors eRF1 and eRF3, indicating that these proteins might function in recognizing the stalled ribosome and triggering endonucleolytic cleavage. 
No-go decay provides a mechanism for clearing the cell of stalled translation elongation complexes, which could occur as a result of damaged mRNAs or ribosomes, or a mechanism of posttranscriptional control.
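The distinction between termination that occurs too soon and termination that fails to occur can be made concrete by scanning a reading frame for its first in-frame stop codon. The following is a minimal sketch, not a model of how the cell recognizes such transcripts: the function names, the toy sequences and the idea of comparing against an annotated stop position are all illustrative assumptions. No-go decay depends on ribosome stalling and cannot be read off the sequence in this way.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def first_in_frame_stop(mrna, start):
    """Return the 0-based index of the first in-frame stop codon, or None."""
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None


def classify_termination(mrna, start, annotated_stop):
    """Label a transcript by where (or whether) translation would stop.

    annotated_stop is the index of the stop codon of the normal transcript;
    it is a hypothetical input used only for comparison.
    """
    stop = first_in_frame_stop(mrna, start)
    if stop is None:
        return "non-stop"   # candidate for non-stop decay
    if stop < annotated_stop:
        return "premature"  # candidate for nonsense-mediated decay
    return "normal"


# Toy transcripts (hypothetical): AUG GCU GAA UGG UAA and two variants.
normal = "AUGGCUGAAUGGUAA"
nonsense = "AUGGCUGAAUGAUAA"   # UGG -> UGA creates a premature stop
nonstop = "AUGGCUGAAUGGCAA"    # UAA -> CAA removes the stop codon
for name, seq in [("normal", normal), ("nonsense", nonsense), ("nonstop", nonstop)]:
    print(name, classify_termination(seq, 0, 12))
```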

Virus‐induced Gene Silencing  Virus-induced gene silencing (VIGS) is a plant RNA-silencing technique that uses viral vectors carrying a fragment of a gene of interest to generate double-stranded RNA, which initiates the silencing of the target gene. Several viral vectors have been developed for VIGS and they have been successfully used in reverse genetics studies of a variety of processes occurring in plants. VIGS was not widely adopted for the model dicotyledonous species Arabidopsis (Arabidopsis thaliana), possibly because there was no easy protocol for effective VIGS in this species. Burch-Smith et al. (2006) showed that a widely used tobacco rattle virus-based VIGS vector can be used for silencing genes in Arabidopsis ecotype Columbia-0. The protocol involves agroinfiltration of VIGS vectors carrying fragments of genes of interest into seedlings at the two- to three-leaf stage and requires minimal modification of existing protocols for VIGS with tobacco rattle virus vectors in other species such as Nicotiana benthamiana and tomato (Lycopersicon esculentum). Their method gives efficient silencing in Arabidopsis ecotype Columbia-0. They showed that VIGS can be used to silence genes involved in general metabolism and defense, and that it is also effective at knocking down expression of highly expressed transgenes. Gene expression in plants can be suppressed in a sequence-specific manner by infection with virus vectors carrying fragments of host genes (Baulcombe 1999). The mechanism of this gene silencing is based on an RNA-mediated defense against viruses. It has also emerged that a related mechanism is involved in the post-transcriptional silencing that accounts for between-line variation in transgene expression and for co-suppression of transgenes and endogenous genes. The technology of virus-induced gene silencing is being refined and adapted as a high-throughput procedure for functional genomics in plants.


Riboswitches  Riboswitches are metabolite-sensing RNAs, typically located in the noncoding portions of messenger RNAs, that control the synthesis of metabolite-related proteins. In bacteria, the intracellular concentration of several amino acids is controlled by riboswitches. One of the important regulatory circuits involves lysine-specific riboswitches, which direct the biosynthesis and transport of lysine and of precursors common to lysine and other amino acids. Serganov et al. (2008) presented crystal structures of the 174-nucleotide sensing domain of the Thermotoga maritima lysine riboswitch in the lysine-bound (1.9 Å) and free (3.1 Å) states. The riboswitch features an unusual and intricate architecture, involving three-helical and two-helical bundles connected by a compact five-helical junction and stabilized by various long-range tertiary interactions. Lysine interacts with the junctional core of the riboswitch and is specifically recognized through shape-complementarity with the elongated binding pocket and through several direct and K+-mediated hydrogen bonds to its charged ends. Structural and biochemical studies indicate preformation of the riboswitch scaffold and identify conformational changes associated with the formation of a stable lysine-bound state, which prevents alternative folding of the riboswitch and facilitates formation of downstream regulatory elements. Riboswitches have potential pharmaceutical and biotechnological applications. Proteins are not the only regulators of metabolic synthesis; some RNA molecules do it too (Reichow and Varani 2006).

Gene regulation by riboswitches is shown in Figure 26.4. The RNA of a riboswitch contains two functional domains: a metabolite-sensing domain and a gene-expression signal. These domains adopt inter-dependent conformations in response to the presence or absence of a particular metabolite. In this example, the gene-expression signal is required for the initiation of protein synthesis. When the metabolite is absent, the metabolite-sensing domain adopts a conformation that reveals the gene-expression signal and allows protein synthesis to occur (indicated by the dark star). When the metabolite binds, the ensuing structural reorganization leads to the sequestration of the gene-expression signal, shutting off protein production (indicated by the grey star).

Figure 26.4 Gene regulation by riboswitches (panel labels: Metabolite; Unbound, Active; Bound, Inactive)

Riboswitches are cis-acting genetic regulatory elements found in the 5′-untranslated regions of messenger RNAs that control gene expression through their ability to bind small molecule metabolites directly. Regulation occurs through the interplay of two domains of the RNA: an aptamer domain that responds to intracellular metabolite concentrations and an expression platform that uses two mutually exclusive secondary structures to direct a decision-making process. In Gram-positive bacteria such as Bacillus species, riboswitches control the expression of more than 2 per cent of all the genes through their ability to respond to a diverse set of metabolites including amino acids, nucleobases and protein cofactors.
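The two mutually exclusive states of an "off" riboswitch can be pictured with a toy equilibrium model. This is a deliberately simplified sketch and not a description of any particular riboswitch: it assumes single-site equilibrium binding, a hypothetical dissociation constant `kd`, and expression proportional to the unbound fraction. Real riboswitches may instead be governed by the kinetics of folding during transcription.

```python
def fraction_bound(metabolite_conc, kd):
    """Fraction of aptamer domains occupied at equilibrium.

    ASSUMPTION: simple one-site binding, bound = [M] / ([M] + Kd), with the
    metabolite concentration and Kd given in the same units.
    """
    return metabolite_conc / (metabolite_conc + kd)


def relative_expression(metabolite_conc, kd):
    """Relative protein output of an 'off' switch.

    ASSUMPTION: only unbound RNA exposes the gene-expression signal, so
    output is proportional to the unbound fraction.
    """
    return 1.0 - fraction_bound(metabolite_conc, kd)


kd = 1.0  # hypothetical dissociation constant, arbitrary units
for conc in (0.0, 0.1, 1.0, 10.0):
    print(conc, round(relative_expression(conc, kd), 3))
```

In this picture expression is maximal without metabolite, falls to half when the concentration equals `kd`, and approaches zero as the metabolite accumulates.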

Transgene Silencing  In genetically modified plants, the introduced transgenes are sometimes not expressed; they can be silenced. Transgenes can also cause the silencing of endogenous plant genes if they are sufficiently homologous (Stam et al. 1997). Silencing occurs both transcriptionally and post-transcriptionally, but silencing of endogenous genes seems to be predominantly post-transcriptional. If viral transgenes are introduced and silenced, the post-transcriptional process also prevents homologous RNA viruses from


accumulating; this is a means of generating virus-resistant plants. Various factors seem to play a role, including DNA methylation, transgene copy number and the repetitiveness of the transgene insert, transgene expression level, possible production of aberrant RNAs, and ectopic DNA–DNA interactions. The causal relationship between these factors and the link between transcriptional and post-transcriptional silencing is not always clear.

METHYLATION OF NONCODING RNAs  The functions of methyl groups in RNA include biophysical, biochemical and metabolic stabilization of RNA, quality control, resistance to antibiotics, mRNA reading frame maintenance, deciphering of normal and altered genetic code, selenocysteine incorporation, tRNA aminoacylation, ribotoxins, splicing, intracellular trafficking, immune response, and others (Motorin and Helm 2011). Connections to other fields including gene regulation, DNA repair, stress response, and possibly histone acetylation and exocytosis are being suggested. Methylation on the base or the ribose is prevalent in eukaryotic ribosomal RNAs (rRNAs) and is thought to be crucial for ribosome biogenesis and function. Artificially introduced 2′-O-methyl groups in small interfering RNAs (siRNAs) can stabilize siRNAs in serum without affecting their activities in RNA interference in mammalian cells. Yu et al. (2005) showed that plant microRNAs (miRNAs) have a naturally occurring methyl group on the ribose of the last nucleotide. Whereas methylation of rRNAs depends on guide RNAs, the methyltransferase protein HEN1 is sufficient to methylate miRNA/miRNA* duplexes. These studies uncovered a new and crucial step in plant miRNA biogenesis and have profound implications for the function of miRNAs.

Cytosine Methylation of Noncoding RNAs  The sequence and the structure of DNA methyltransferase-2 (Dnmt2) bear close affinity to authentic DNA cytosine methyltransferases. Human DNMT2 does not methylate DNA but instead methylates a small RNA; this RNA is aspartic acid transfer RNA (tRNAAsp) (Goll et al. 2006). DNMT2 specifically methylates cytosine 38 in the anticodon loop (Figure 26.5). The function of DNMT2 is highly conserved, and human DNMT2 protein restored methylation in vitro to tRNAAsp from Dnmt2-deficient strains of mouse, A. thaliana, and D. melanogaster in a manner that was dependent on the pre-existing pattern of modified nucleosides. Post-transcriptional RNA modification is a characteristic feature of noncoding RNAs, and has been described for rRNAs, tRNAs and miRNAs. (Cytosine-5) RNA methylation has been detected in stable and long-lived RNA molecules, but its function is still unclear, mainly due to technical limitations. In order to facilitate the analysis of RNA methylation patterns, Schaefer et al. (2009) established a protocol for the chemical deamination of cytosines in RNA, followed by PCR-based amplification of cDNA and DNA sequencing. Using tRNAs and rRNAs as examples, they showed that cytosine methylation can be reproducibly and quantitatively detected by bisulfite sequencing. The combination of this method with deep sequencing allowed the analysis of a large number of RNA molecules. These results establish a versatile method for the identification and characterization of RNA methylation patterns, which will be useful for defining the biological function of RNA methylation.
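The logic of bisulfite sequencing can be sketched in a few lines: bisulfite deaminates unmethylated cytosines to uracil while 5-methylcytosine is protected, so any cytosine that survives treatment marks a methylated position. The sketch below is illustrative only. The function names and the toy sequence are hypothetical, conversion is assumed to be complete, and the sequence is kept in the RNA alphabet although a real read of the amplified cDNA would show T in place of U.

```python
def bisulfite_convert(rna, methylated):
    """Simulate bisulfite treatment of an RNA written 5'->3'.

    methylated is a set of 0-based positions carrying m5C. Unmethylated C
    becomes U; methylated C is left unchanged.
    ASSUMPTION: conversion is 100 per cent efficient.
    """
    return "".join(
        "U" if base == "C" and i not in methylated else base
        for i, base in enumerate(rna)
    )


def call_m5c(reference, converted):
    """Return 0-based positions where a reference C survived conversion."""
    return [
        i
        for i, (ref_base, conv_base) in enumerate(zip(reference, converted))
        if ref_base == "C" and conv_base == "C"
    ]


reference = "GCCACGUCA"  # hypothetical RNA fragment
converted = bisulfite_convert(reference, methylated={4})
print(converted)
print(call_m5c(reference, converted))
```

Counting, over many reads, how often a given cytosine survives gives the quantitative estimate of methylation that the text refers to.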

TRANSLATIONAL GENE SILENCING  Post-transcriptional gene silencing is the mechanism whereby mRNA is prevented from being translated into protein. In this case, the mRNA is either degraded or blocked. When gene silencing takes place through suppression of translation, the phenomenon is also known as “translational gene silencing”.

Figure 26.5 Dnmt2 methylates cytosine 38 in tRNAAsp (schematic: the tRNAAsp cloverleaf before and after Dnmt2 converts C38 to m5C38, with AdoMet as the methyl donor and AdoHcy as the by-product)

 

Sampath et al. (2004) discovered gene-specific translational silencing as a novel function of the fused glutamyl- and prolyl-tRNA synthetase (GluProRS). GluProRS is released from a multisynthetase translation complex in response to γ-interferon and forms a four-protein GAIT (γ-activated inhibitor of translation) complex that silences translation of ceruloplasmin (Cp), a protein linked to the inflammatory response (Schimmel and Ewalt 2004). In bacteria, some snmRNA species such as MicF, DicF or DsrA RNAs (Lee et al. 1993; Wightman et al. 1993) have been shown to interact with the translation initiation region of target mRNAs via an antisense element. By this mechanism, these snmRNAs regulate translation of their target mRNAs. In animals, miRNAs typically bind to the 3′ untranslated region (3′UTR) of target mRNAs with imperfect sequence complementarity and repress translation (Wang et al. 2006). In vitro reactions for miRNA-directed translational gene silencing are now known. These reactions faithfully recapitulate known in vivo hallmarks of mammalian miRNA function, including a requirement for a 5′ phosphate and perfect complementarity to the mRNA target in the 5′ seed region. Translational gene silencing by miRNAs in vitro requires target mRNAs to possess a 7-methylguanine (m7G) cap and a poly(A) tail, whereas increasing poly(A) tail length alone can increase miRNA silencing activity.
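The requirement for perfect complementarity in the 5′ seed region suggests a simple way to look for candidate miRNA sites in a 3′UTR. The sketch below is only an illustration under stated assumptions: the seed is taken as miRNA nucleotides 2 to 8, a common convention that the text does not specify; the miRNA is the let-7a sequence; and the UTR and function names are hypothetical. A seed match alone does not show that a site is functional.

```python
# Watson-Crick partners for RNA; G:U wobble pairs are ignored in this sketch.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(rna))


def find_seed_sites(mirna, utr):
    """Return 0-based start positions of perfect seed matches in a 3'UTR.

    ASSUMPTION: the seed is miRNA nucleotides 2-8 (slice [1:8]).
    """
    site = reverse_complement(mirna[1:8])
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)  # step by one to allow overlaps
    return positions


let7a = "UGAGGUAGUAGGUUGUAUAGUU"       # let-7a miRNA, 5'->3'
utr = "AAACUACCUCAGGGCUACCUCAUU"       # hypothetical 3'UTR fragment
print(reverse_complement(let7a[1:8]))
print(find_seed_sites(let7a, utr))
```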

EXPLOITATION OF GENE SILENCING  Gene silencing has mostly been reported in plants. The host plant has mechanisms, which have evolved to defend against the deleterious effects of expressing genes encoded by foreign DNAs or RNAs. Gene silencing is also used as a mechanism to prevent other genetic events such as transposition of mobile elements, which have a potential to disturb the normal structure and function of the host genome. Transgenes that are derived from viral cDNA and are able to induce gene silencing


may also suppress the accumulation of viruses that are similar in nucleotide sequence. Mohanpuria et al. (2010) discuss the potential use of gene silencing techniques in crop improvement. A viral sequence initially identified as a mediator of synergistic viral disease acts to suppress the establishment of both transgene-induced and virus-induced post-transcriptional gene silencing (Anandalakshmi et al. 1998). Gene silencing serves as a natural antiviral defense system in plants and offers different approaches to elucidate the molecular basis of gene silencing. VIGS exploits an RNA-mediated antiviral defense mechanism. In plants infected with unmodified viruses, the mechanism is specifically targeted against the viral genome. However, with virus vectors carrying inserts derived from host genes, the process can be additionally targeted against the corresponding mRNAs. VIGS has been used widely in plants for analysis of gene function and has been adapted for high-throughput functional genomics (Lu et al. 2003). VIGS has been applied to the identification of genes required for disease resistance in plants. These methods and the underlying general principles also apply when VIGS is used in the analysis of other aspects of plant biology. With the development of suitable viral silencing systems, monocot plants can also serve as silencing hosts in addition to dicotyledonous plants. For instance, barley stripe mosaic virus (BSMV)-mediated VIGS allows silencing of barley and wheat genes. VIGS is a recently developed gene transcript suppression technique for characterizing the function of plant genes (Burch-Smith et al. 2004). Mangeot et al. (2004) describe a simple and powerful RNA interference-based method that can silence the expression of any transgene. This method is more versatile than existing methods of conditional inactivation of gene expression, such as transcriptional switches or site-specific recombination. 
It is applicable to a wide variety of models including primary cells, terminally differentiated cells and transgenic animals. RNAi is an emerging technology and its immense potential is being mined for deciphering function of many genes. Probably of most commercial interest is the use of RNAi as a therapeutic agent. RNAi also has applications in understanding regulation of gene expression in crop improvement.

REFERENCES
Anandalakshmi, R., G.J. Pruss, X. Ge, et al. 1998. A viral suppressor of gene silencing in plants. Proc. Natl. Acad. Sci. USA 95: 13079-84.
Bartel, D.P. 2004. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116: 281-97.
Baulcombe, D.C. 1999. Fast forward genetics based on virus-induced gene silencing. Curr. Opin. Plant Biol. 2: 109-13.
Bhattacharjee, K., S. Banerjee, and S.R. Joshi. 2012. Diversity of Streptomyces spp. in Eastern Himalayan region – computational RNomics approach to phylogeny. Bioinformation 8: 548-54.
Boisvert, M.E.L., and M.J. Simard. 2008. RNAi pathway in C. elegans: The argonautes and collaborators. Curr. Topics Microbiol. Immunol. 320: 21-33.
Brennecke, J., C.D. Malone, A.A. Aravin, et al. 2008. An epigenetic role for maternally inherited piRNAs in transposon silencing. Science 322: 1387-92.
Burch-Smith, T.M., J.C. Anderson, G.B. Martin, and S.P. Dinesh-Kumar. 2004. Applications and advantages of virus-induced gene silencing for gene function studies in plants. Plant J. 39: 734-46.
Burch-Smith, T.M., M. Schiff, Y. Liu, and S.P. Dinesh-Kumar. 2006. Efficient virus-induced gene silencing in Arabidopsis. Plant Physiol. 142: 21-27.
Camblong, J., N. Iglesias, C. Fickentscher, G. Dieppois, and F. Stutz. 2007. Antisense RNA stabilization induces transcriptional gene silencing via histone deacetylation in S. cerevisiae. Cell 131: 706-17.
Cao, X., W. Aufsatz, D. Zilberman, et al. 2003. Role of the DRM and CMT3 methyltransferases in RNA-directed DNA methylation. Curr. Biol. 13: 2212-17.
Catalanotto, C., G. Azzalin, G. Macino, and C. Cogoni. 2000. Gene silencing in worms and fungi. Nature 404: 245-6.
Chan, S.W.L., I.R. Henderson, and S.E. Jacobsen. 2005. Gardening the genome: DNA methylation in Arabidopsis thaliana. Nat. Rev. Genet. 6: 351-60.
Chen, L., and J. Widom. 2005. Mechanism of transcriptional silencing in yeast. Cell 120: 37-48.
Cullen, B.R. 2004. Transcription and processing of human microRNA precursors. Mol. Cell 16: 861-5.
Davis, B.N., A.C. Hilyard, G. Lagna, and A. Hata. 2008. SMAD proteins control DROSHA-mediated microRNA maturation. Nature 454: 56-61.
Djuranovic, S., A. Nahvi, and R. Green. 2012. miRNA-mediated gene silencing by translational repression followed by mRNA deadenylation and decay. Science 336: 237-40.
Doma, M.K., and R. Parker. 2006. Endonucleolytic cleavage of eukaryotic mRNAs with stalls in translation elongation. Nature 440: 561-4.
El-Shami, M., D. Pontier, S. Lahmy, et al. 2007. Reiterated WG/GW motifs form functionally and evolutionarily conserved ARGONAUTE-binding platforms in RNAi-related components. Genes Dev. 21: 2539-44.
Fire, A., S. Xu, M. Montgomery, et al. 1998. Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature 391: 806-11.
Gendrel, A.V., Z. Lippman, C. Yordan, V. Colot, and R.A. Martienssen. 2002. Dependence of heterochromatic histone H3 methylation patterns on the Arabidopsis gene DDM1. Science 297: 1871-3.
Goll, M.G., F. Kirpekar, K.A. Maggert, et al. 2006. Methylation of tRNAAsp by the DNA methyltransferase homolog Dnmt2. Science 311: 395-7.
Gomase, V.S., and A.N. Parundekar. 2009. microRNA: human disease and development. Int. J. Bioinform. Res. Appl. 5: 479-500.
Gullerova, M., and N.J. Proudfoot. 2012. Convergent transcription induces transcriptional gene silencing in fission yeast and mammalian cells. Nat. Struct. Mol. Biol. 19: 1193-201.
Han, J., Y. Lee, K.H. Yeom, et al. 2006. Molecular basis for the recognition of primary microRNAs by the Drosha-DGCR8 complex. Cell 125: 887-91.
Huettel, B., T. Kanno, L. Daxinger, E. Bucher, J. van der Winden, A.J.M. Matzke, and M. Matzke. 2007. RNA-directed DNA methylation mediated by DRD1 and Pol IVb: a versatile pathway for transcriptional gene silencing in plants. Biochim. Biophys. Acta 1769: 358-74.
Huttenhofer, A., J. Brosius, and J.P. Bachellerie. 2002. RNomics: identification and function of small, non-messenger RNAs. Curr. Opin. Chem. Biol. 6: 835-43.
Jacob, F., and J. Monod. 1961. Genetic regulatory mechanisms in the synthesis of proteins. J. Mol. Biol. 3: 318-56.
Jaskiewicz, L., and W. Filipowicz. 2008. Role of Dicer in post-transcriptional RNA silencing. Curr. Topics Microbiol. Immunol. 320: 77-97.
Jensen, S., M.-P. Gassama, and T. Heidmann. 1999. Taming of transposable elements by homology-dependent gene silencing. Nature Genet. 21: 209-12.
Ketting, R.F., and R.H.A. Plasterk. 2000. A genetic link between co-suppression and RNA interference in C. elegans. Nature 404: 296-8.
Kim, D.H., P. Sætrom, O. Snøve, Jr., and J.J. Rossi. 2008. MicroRNA-directed transcriptional gene silencing in mammalian cells. Proc. Natl. Acad. Sci. USA 105: 16230-5.
Kim, V.N. 2005. MicroRNA biogenesis: coordinated cropping and dicing. Nat. Rev. Mol. Cell Biol. 6: 376-85.
Kinoshita, T., A. Miura, Y. Choi, Y. Kinoshita, X. Cao, S.E. Jacobsen, R.L. Fischer, and T. Kakutani. 2004. One-way control of FWA imprinting in Arabidopsis endosperm by DNA methylation. Science 303: 521-23.
Lee, R.C., R.L. Feinbaum, and V. Ambros. 1993. The C. elegans heterochronic gene lin-4 encodes small RNAs with antisense complementarity to lin-14. Cell 75: 843-54.
Lee, Y., C. Ahn, J. Han, H. Choi, J. Kim, J. Yim, et al. 2003. The nuclear RNase III Drosha initiates microRNA processing. Nature 425: 415-9.
Lee, Y.S., K. Nakahara, J.W. Pham, et al. 2004. Distinct roles for Drosophila Dicer-1 and Dicer-2 in the siRNA/miRNA silencing pathways. Cell 117: 69-81.
Liu, J., Y. He, R. Amasino, and X. Chen. 2004. siRNAs targeting an intronic transposon in the regulation of natural flowering behavior in Arabidopsis. Genes Dev. 18: 2873-78.
Lu, R., A.M. Martin-Hernandez, J.R. Peart, I. Malcuit, and D.C. Baulcombe. 2003. Virus-induced gene silencing in plants. Methods 30(4): 296-303.
Luff, B., L. Pawlowski, and J. Bender. 1999. An inverted repeat triggers cytosine methylation of identical sequences in Arabidopsis. Mol. Cell 3: 505-11.
Lund, E., S. Güttinger, A. Calado, J.E. Dahlberg, and U. Kutay. 2004. Nuclear export of microRNA precursors. Science 303: 95-8.


Ma, J.-B., Y.-R. Yuan, G. Meister, Y. Pei, T. Tuschi, and D.J. Patel. 2005. Structural basis for 5′-end-specific recognition of guide RNA by the A. flugidus PIWI protein. Nature 434: 666-70. Mahfouz, M.M. 2010. RNA-directed DNA methylation: Mechanisms and functions. Pl. Signal Behav. 5(7): 1-11. Mangeot, P.-E., F.-L. Cosset, P. Colas, and I. Mikaelian. 2004. A universal transgene silencing method based on RNA interference. Nucl. Acids Res. 32(12): e102. doi: 10.1093/nar/gnh105 Masotti, A. 2012. Interplays between gut microbiota and gene expression regulation by miRNAs. Front. Cell. Inf. Microbio. doi: 10.3389/fcimb.2012.00137. Mattick, J.S. 2009. The Genetic signatures of noncoding RNAs. PLoS Genet 5(4): e1000459. doi:10.1371/ journal.pgen.1000459. Melquist, S., and J. Bender. 2004. An internal rearrangement in an Arabidopsis inverted repeat locus impairs DNA methylation triggered by the locus. Genetics 166: 437-48. Mohanpuria, P., V. Kumar, M. Mahajan, H. Mohammad, and S.K. Yadav. 2010. Gene Silencing: Theory, Techniques and Applications. Pp. 321-34. In: Gene Silencing: Theory, Techniques and Applications. Editor: Catalano, A.J. New York: Nova Science Publ., Inc. Mondal, T.K., P.K. Kundu, and P.S. Ahuja. 1997. Gene silencing: A problem in transgenic research. Curr. Sci. 72: 699-700. Montgomery, M.K., S. Xu, and A. Fire. 1998. RNA as a target of double-stranded RNA-mediated genetic interference in Caenorhabditis elegans. Proc. Natl. Acad. Sci. USA 95: 15502-7. Morris, K.V., S.W.-L. Chan, and S.E. Jacobsen, and D.J. Looney. 2004. Small interfering RNA-induced transcriptional gene silencing in human cells. Science 305: 1289-92. Motorin, Y., and Helm, M. 2011. RNA nucleotide methylation. Advanced Review. Published Online: March 23 DOI: 10.1002/wrna.79. Napoli, C., C. Lemieux, and R. Jorgensen. 1990. Introduction of a chimeric chalcone synthase gene into petunia results in reversible co-suppression of homologous genes in trans. The Plant Cell 2(4): 279-89. 
Narikawa, K., K. Nishi, Y. Naito, M. Mazda, and K. Ui-Tei. 2010. Genome-wide identification and analysis of miRNAs complementary to upstream sequences of mRNA transcription start sites. Pp. 287-319. In: Gene Silencing: Theory, Techniques and Applications. Editor: Catalano, A.J. New York: Nova Science Publ., Inc. Nishimura, T., G. Molinard, T.J. Petty, et al. 2012. Structural basis of transcriptional gene silencing mediated by Arabidopsis MOM1. PLoS Genet 8(2): e1002484. doi:10.1371/journal.pgen.1002484. Onodera, Y., J.R. Haag, T. Ream, P.C. Nunes, O. Pontes, and C.S. Pikaard. 2005. Plant nuclear RNA polymerase IV mediates siRNA and DNA methylation-dependent heterochromatin formation. Cell 120: 613-22. O'Toole, A.S., S. Miller, N. Haines, M.C. Zink, and M.J. Serra. 2006. Comprehensive thermodynamic analysis of 3′ double-nucleotide overhangs neighboring Watson-Crick terminal base pairs. Nucl. Acids Res. 34: 3338-44. Parker, J.S., S.M. Roe, and D. Barford. 2005. Structural insights into mRNA recognition from a PIWI domainsiRNA guide complex. Nature 434: 663-6. Pelissier, T., and M. Wassenegger. 2000. A DNA target of 30 bp is sufficient for RNA-directed DNA methylation. RNA 6: 55–65. Pham, J.W., J.L. Pellino, Y.S. Lee, R.W Carthew, and E.J. Sontheimer. 2004. A Dicer-2-dependent 80S complex cleaves targeted mRNAs during RNAi in Drosophila. Cell 117: 83-94. Pontier, D., G. Yahubyan, D. Vega, A. Bulski, J. Saez-Vasquez, and M.A. Hakimi. 2005. Reinforcement of silencing at transposons and highly repeated sequences requires the concerted action of two distinct RNA polymerases IV in Arabidopsis. Genes Dev. 19: 2030-40. Qi, Y., X. He, X.-J. Wang, O. Kohani, J. Jurka, and G.J. Hannon. 2006. Distinct catalytic and noncatalytic roles of ARGONAUTE4 in RNA-directed DNA methylation. Nature 443: 1008-12. Que, Q., N.Y. Wand, J.J. English, and R.A. Jorgensen. 1997. 
The frequency and degree of co-suppression by sense Chalcone synthase transgenes are dependent on transgene promoter strength and are reduced by premature nonsense codons in the transgene coding sequences. Pl. Cell 9: 1357-68. Reiche, K. 2012. Bioinformatics for RNomics. Methods Mol. Biol. 719: 299-33. Reichow, S., and G. Varani. 2006. RNA switches function. Nature 441: 1054-5. Sampath, P., B. Mazumder, V. Seshadri, et al. 2004. Non-canonical function of glutamyl-prolyl tRNA synthetase; Gene-specific silencing of translation. Cell 119: 195-208. Sana, J., P. Faltejskova, M. Svoboda, and O. Slaby. 2012. Novel classes of non-coding RNAs and cancer. J. Transl. Medi. 10:103 doi:10.1186/1479-5876-10-103. Schaefer, M., T. Pollex, K. Hanna, and F. Lyko. 2009. RNA cytosine methylation analysis by bisulfite sequencing. Nucl. Acids Res. 37(2): e12. doi: 10.1093/nar/gkn954.


Schimmel, P., and K. Ewalt.2004. Translation silenced by fused pair of tRNA synthetases. Cell 119: 147-8. Schuettengruber, B., D. Chourrout, M. Vervoort, B. Leblanc, and G. Cavalli. 2007. Genome regulation by Polycomb and Trithorax proteins p735. Cell 128: 735-45. Serganov, A., L. Huang, and D.J. Patel. 2008. Structural insights into amino acid binding and gene control by a lysine riboswitch. Nature 455: 1263-7. Sheth, U., and R. Parker. 2006. Targeting of aberrant mRNAs to cytoplasmic processing bodies. Cell 125: 10950109. Shibuya, K., S. Fukushima, and H. Takatsuji. 2009. RNA-directed DNA methylation induces transcriptional activation in plants. Proc. Natl. Acad. Sci. USA 106: 1660-5. Shivdasani, R.A. 2006. MicroRNAs: regulators of gene expression and cell differentiation. Blood 108: 3646-53. Sijen, T., I. Vijn, A. Rebocho, Dick et al. 2001. Transcriptional and posttranscriptional gene silencing are mechanistically related. Curr. Biol. 11: 436-40. Sleutels, F., R. Zwart, and D.P. Barlow. 2002. The non-coding Air RNA is required for silencing autosomal imprinted genes. Nature 415: 810-3. Stam, M. 2009. Paramutation: A heritable change in gene expression by allelic interactions in trans. Mol. Plant 2: 578-88. Stam, M., J.N.M. Mol, and J.M. Kooter. 1997. The silence of genes in transgenic plants. Ann. Bot. 79(1): 3-12. Tijsterman, M., and R.H.A. Plasterk. 2004. Dicers at RISC: The mechanism of RNAi. Cell 117: 1-3. Vaucheret H., and M. Fagard. 2001. Transcriptional gene silencing in plants: targets, inducers and regulators. Trends Genet. 17: 29-36. Wang, B., T.M. Love, M.E. Call, J.G. Doench, and C.D. Novina .2006. Recapitulation of short RNA-directed translational gene silencing in vitro. Mol. Cell 22: 553-60. Wightman, B., I. Ha, and G. Ruvkun. 1993. Posttranscriptional regulation of the heterochronic gene lin-14 by lin4 mediates temporal pattern formation in C. elegans. Cell 75: 855-62. Xie, Z., L.K. Johansen, A.M. Gustafson, et al. 2004. 
Genetic and functional diversification of small RNA pathways in plants. PLoS Biol. 2: e104. Yang, P.K., and M.I. Kuroda, 2007. Noncoding RNAs and intranuclear positioning on monoallelic gene expression. Cell 128: 777-86. Yu, B., Z. Yang, J. Li, et al. 2005. Methylation as a crucial step in plant microRNA biogenesis. Science 307: 9325. Zaratiegui, M., D.V. Irvine, and R.A. Martienssen. 2007. Noncoding RNAs and gene silencing. Cell 128: 763-76. Zhou, S., T.G. Campbell, E.A. Stone, T.F.C. Mackay, R.R.H. Anholt. 2012. Phenotypic plasticity of the Drosophila transcriptome. PLoS Genet 8(3): e1002593. doi:10.1371/journal.pgen.

PROBLEMS

1. What is the basis of classifying RNA as (a) genetic RNA and nongenetic RNA, (b) coding RNA and noncoding RNA, and (c) sense noncoding RNA and antisense noncoding RNA?
2. What does the term gene silencing mean? What are the different levels of gene expression at which this phenomenon operates? At which of these levels is the operation of gene silencing most economical to the cell?
3. What do we study under RNomics? What are the different approaches used in RNomics?
4. What is RNA-directed DNA methylation? What are the various functions of this type of DNA methylation?
5. What is RNA interference? What are the different pathways in RNA interference?
6. RNA interference uses guide RNA. Is this guide RNA different from the one used in RNA editing?
7. Describe the various phenomena associated with transcriptional gene silencing.
8. Describe the various phenomena associated with post-transcriptional gene silencing.
9. Describe the various phenomena associated with translational gene silencing.
10. Explain monoallelic gene expression implicating RNA interference as a mechanism.
11. What are riboswitches? What types of RNA are involved in this process?
12. How has gene silencing been exploited in genetic improvement of plants and animals?

27 Molecular Techniques and Tools

A large number of sensitive techniques are available for molecular genetic analysis. Some of these techniques have found tremendous applications in different disciplines of biology and are predominantly used, directly or indirectly, in the study of genetic material, genome/gene structure, gene function, gene expression and its regulation. These techniques are also instrumental in applying knowledge of molecular genetics to the improvement of microorganisms, plants and animals, and hence to the welfare of mankind. It is not possible to discuss all the molecular techniques and tools in one chapter, so only some important ones are discussed here, and only briefly: DNA/RNA separation, in situ hybridization, squash dot hybridization, Southern blotting, northern blotting, western blotting, eastern blotting, dot blot, slot blot, electrophoresis, colony hybridization, plaque hybridization, chromosome walking, chromosome jumping/hopping, chromosome landing, nick translation, RNA sequencing, DNA sequencing, RNA synthesis, DNA synthesis, DNA synthesis machines, DNA fingerprinting, DNA markers, microarrays, restriction endonucleases, recombinant DNA technology, and quantitative trait loci mapping. The purpose here is only to introduce these molecular techniques to the readers.

SEPARATION OF DNA/RNA

The molecular weight of DNA is very high (10⁶ to 10¹⁰ Da). DNA can be separated by the following techniques:

Cesium Chloride Density Gradient Centrifugation Method

Density gradient centrifugation is a technique used to separate molecules on the basis of density. The cesium chloride density gradient centrifugation method was used by Meselson and Stahl (1958) to separate DNA containing ¹⁵N from that containing ¹⁴N. This method can also be used to separate DNAs that differ in GC:AT ratio, since GC-rich DNA has a higher buoyant density than AT-rich DNA.
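As a rough illustration of why base composition affects the banding position, the sketch below computes the G+C fraction of a sequence and converts it to an approximate buoyant density. The linear relation used (density ≈ 1.660 + 0.098 × GC fraction, in g/cm³, for double-stranded DNA in cesium chloride) is a commonly quoted empirical approximation that is not given in this chapter; the function names and the example sequences are hypothetical.

```python
def gc_fraction(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def approx_buoyant_density(seq, intercept=1.660, slope=0.098):
    """Approximate buoyant density (g/cm^3) from GC content.

    Assumption: a linear empirical relation for duplex DNA in CsCl;
    the constants are not taken from this chapter.
    """
    return intercept + slope * gc_fraction(seq)


at_rich = "ATATTAATGCATATTA"  # hypothetical AT-rich fragment
gc_rich = "GCGGCCGCATGCGCGC"  # hypothetical GC-rich fragment

for name, seq in (("AT-rich", at_rich), ("GC-rich", gc_rich)):
    print(name, round(gc_fraction(seq), 3), round(approx_buoyant_density(seq), 4))
```

The GC-rich fragment comes out denser than the AT-rich one, which is the property that lets DNAs of different base composition band at different positions in the gradient.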

Ethidium Bromide Method

The most commonly used stain for detecting DNA/RNA is ethidium bromide. Ethidium inserts (intercalates) between the stacked bases of DNA when the DNA is free to unwind. This happens readily with linear DNA but only to a limited extent with covalently closed circular DNA. Intercalation lowers the buoyant density of DNA, so linear DNA, which binds more ethidium, can be separated in a density gradient from closed circular DNA, which binds less. Ethidium bromide possesses UV absorbance maxima at 300 and 360 nm. This technique is used to separate circular DNA such as that of mitochondria, chloroplasts and plasmids. Bacterial chromosomal DNA, although circular, is so long that it breaks during isolation and hence behaves as if it were linear DNA. Staining of denatured DNA, ssDNA or RNA is relatively insensitive.

Hydroxyapatite Columns and Nitrocellulose Filters

This method is used for separating double-stranded DNA from single-stranded DNA. Double-stranded DNA binds to hydroxyapatite whereas single-stranded DNA does not, so the two are separated by passing them through a column containing hydroxyapatite. When denatured DNA is annealed, the repetitive sequences become double-stranded and can hence be separated from the DNA that remains single-stranded. The technique is shown in Figure 27.1. A mixture of double-stranded and single-stranded DNAs is passed through a column of hydroxyapatite crystals, which retains the double-stranded DNA and allows single-stranded DNA to pass through. The double-stranded DNA is then eluted by raising the temperature and its quantity is measured. Nitrocellulose filters can be used for the same purpose. With nitrocellulose, however, it is the single-stranded DNA that is retained on the filter, while double-stranded DNA passes through.

Figure 27.1 Technique used for measuring the extent of reassociation by separating reassociated double-stranded DNA from remaining single-stranded DNA (Redrawn, with permission, from Gupta, P.K. 2009. Genetics. Meerut: Rastogi Publ.)

Silica Adsorption Method

Silica adsorption is an important method of DNA separation used in novel technologies based on microchannels (Cady et al. 2003). The principle behind this type of separation is that DNA molecules bind to silica surfaces in the presence of certain salts and under certain pH conditions. DNA extraction on microchips provides a fast, cost-effective and efficient method for high-throughput screening, and it has a very small footprint. This new method has useful applications in biosensors, "lab-on-a-chip" devices, and other new technologies that require rapid preparation of high-quality DNA at minimal cost. There are four basic steps in this purification method: (1) The sample is run through a microchannel. (2) DNA binds to the channel, and all other molecules remain in the buffer solution. (3) The channel is washed free of impurities. (4) An elution buffer removes the DNA from the channel walls, and the DNA is collected at the end of the channel. A sample (anything from purified cells to a tissue specimen) is placed onto the chip and lysed. The resultant mix of proteins, DNA, phospholipids, etc., is then run through the channel, where the DNA is adsorbed by the silica surface in the presence of solutions of high ionic strength. The highest DNA adsorption efficiencies occur in buffer solutions with a pH at or below the pKa of the surface silanol groups. Methods using silica beads and silica resins have successfully isolated DNA molecules that can then be amplified by PCR.


Nanospheres Method

Tabuchi et al. (2004) reported a technology to carry out separation of a wide range of DNA fragments with high speed and high resolution. The approach uses a nanoparticle medium, core-shell type nanospheres, in conjunction with a pressurization technique during microchip electrophoresis. DNA fragments up to 15 kbp were successfully analyzed within 100 seconds without observing any saturation in migration rates. DNA fragments migrate in the medium while maintaining their characteristic molecular structure. To guarantee effective DNA loading and electrofocusing in the nanosphere solution, they developed a double-pressurization technique. Optimal pressure conditions and concentrations of packed nanospheres are critical to achieve improved DNA separations.

DNA Separation from Proteins, Lipids, RNA and Carbohydrates

Phage DNA. An aqueous suspension of phage is shaken with phenol. Phenol breaks down the protein coat, which precipitates at the interface between the aqueous and phenol layers. The DNA is then precipitated with ethanol and can be separated.

Bacterial DNA. The bacterial cell wall and inner cell membrane can be disrupted by phenol; they are also degraded by the enzyme lysozyme and by detergents such as sodium dodecyl sulfate (SDS). The cell contents thus released are freed of RNA by RNase and of proteins by phenol. The DNA is precipitated with ethanol.

IN SITU HYBRIDIZATION

In situ hybridization (ISH) combines molecular biological techniques with histological and cytological analysis of gene expression (Jin and Lloyd 1997). RNA and DNA can be readily localized in specific cells with this method. ISH has been useful as a research tool, and recent studies have used this technique in the diagnostic pathology laboratory and in microbiology for the tissue localization of infectious agents.

In situ or cytological hybridization is used to establish the location of satellite DNA in the chromosome. For this purpose, radioactive copies of satellite DNA or of its complementary RNA (made using satellite DNA as a template) are prepared. Chromosomes in squash preparations are pre-treated to expose and denature their DNA without affecting their structural integrity. The chromosomes are then incubated with the radioactive single-stranded satellite DNA or its complementary RNA. After a set time the squash preparations are washed to remove the non-hybridized radioactive DNA/RNA probe, and the location of radioactivity in the chromosome is determined by autoradiography, which consists of covering the squash with a special type of photographic film or emulsion. The slides are stored in the dark for a definite time, after which the film is developed and viewed under a light microscope. The areas having radioactivity show silver grains on the photographic film (due to radioactive decay). Using this technique, satellite DNA has been shown to be located mainly in telomeric and centromeric regions.

ISH is also used to locate specific genes in the giant chromosomes of Drosophila. For this purpose, a radioactive clone representing a gene, mostly a labeled cDNA copy of mRNA, is used as a probe. This reveals the specific sequence in particular bands of the salivary gland chromosomes. Colony hybridization is also a kind of in situ hybridization. ISH is further used to diagnose a disease or to detect a virus in a tissue or a cell, since it permits the location of the pathogen concerned in a cell or tissue. Other recent developments in the applications of ISH involve in situ polymerase chain reaction (PCR) and in situ reverse transcription (RT)-PCR, which can be used to detect very low levels of nucleic acids in tissues by taking advantage of the powerful amplification capacity of PCR.


Fluorescent in situ Hybridization

The fluorescent in situ hybridization (FISH) technique is used for the detection of target molecules with a system of coupled antibodies and fluorochromes. Nucleotide sequences on a combed DNA molecule are detected indirectly, by first hybridizing the sought nucleotide sequences (the probes) with the combed DNA (also called the matrix DNA or target). If the probes are synthesized with incorporated fluorescent molecules, or with antigenic sites that can be recognized by fluorescent antibodies, the relative positions of the probes can be visualized directly. This is the goal of physical mapping.

The probes are synthesized separately using different existing protocols. Basically, one strand of a double-stranded DNA molecule is resynthesized while incorporating modified nucleotides. The random priming technique consists of polymerization of part of a complementary single strand between two single-stranded hexamers (short single-stranded DNA made of 6 nucleotides). This is explained, according to Feinberg and Vogelstein (1983), in Figure 27.2A, which gives three examples of hexamers from the mixture of all possible hexamers in random primers. These three particular primers could bind to three overlapping portions of this mRNA to prime the production of cDNA. The primer that arrives first will bind, and the other two will have to find another segment (either another copy of the same mRNA or one from a different locus) to bind. Hexamers were used instead of octamers to minimize clutter. The polymerase enzyme (P) synthesizes the complementary strand N between hexamer 1 and hexamer 2, previously hybridized onto the single-stranded DNA molecule M (Figure 27.2B). During the synthesis, random incorporation of labeled nucleotides takes place, which leads to a labeled single-stranded DNA molecule after denaturation.

Figure 27.2 (A) Three examples of hexamers from the mixture of all possible hexamers in random primers

The hybridization step consists simply of mixing the single-stranded probes with the denatured target DNA (the combed molecules). The DNA is denatured by heating, which separates the two strands and allows access of the single-stranded probes. The detection of the probes is the final step of fluorescent in situ hybridization. It consists of recognizing the probes with fluorescent antibodies corresponding to the antigens incorporated in the probes. In the biotin-avidin system, one uses modified fluorescent avidin molecules, which can themselves be recognized by another layer, as shown in Figure 27.3. "M" represents the denatured DNA matrix strand, and "H" the hybridized probe. The probe has modified nucleotide sites, which carry a biotin molecule, to which fluorescent streptavidin molecules spontaneously bind. These molecules are then recognized by anti-avidin antibodies with a fluorescent site and a biotin arm. The same construction can be used several times, leading to a sandwich of detection layers.

The hybridized sequences are observed with an epifluorescence microscope. The white light of the source lamp is filtered so that only the wavelengths relevant for excitation of the fluorescent molecules reach the sample. The fluorochromes generally emit at longer wavelengths, which allows excitation and emission light to be distinguished by means of another optical filter. One thus sees bright colored signals on a dark background. With a more sophisticated filter set, it is possible to distinguish between several excitation and emission bands, and thus between several fluorochromes, which allows observation of many different probes on the same strand.
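The idea of a "mixture of all possible hexamers" in random priming can be made concrete with a short sketch. It enumerates every 6-nucleotide sequence and lists the positions on a single-stranded template where each one could anneal. The template sequence and function names are hypothetical, and annealing is simplified to perfect complementarity, which real priming reactions do not strictly require.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def all_hexamers():
    """Generate every possible 6-nucleotide DNA sequence (4**6 in total)."""
    return ["".join(p) for p in product("ACGT", repeat=6)]


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def priming_sites(template, primer):
    """Return 0-based positions on the template where the primer can anneal.

    A primer anneals where the template contains the primer's reverse
    complement (perfect matches only; a deliberate simplification).
    """
    site = reverse_complement(primer)
    positions = []
    start = template.find(site)
    while start != -1:
        positions.append(start)
        start = template.find(site, start + 1)
    return positions


template = "ATGGCGTACGTTAGCCGATAGGCTAACGT"  # hypothetical single-stranded template
hexamers = all_hexamers()
usable = {h: priming_sites(template, h) for h in hexamers}
usable = {h: pos for h, pos in usable.items() if pos}
print(len(hexamers), "possible hexamers;", len(usable), "can anneal to this template")
```

Only a small subset of the hexamer pool matches any given short template, but because the pool contains every sequence, some primer is available for every stretch of the template.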


Figure 27.3 Detection of the probes in fluorescent in situ hybridization (FISH)

SQUASH AND DOT HYBRIDIZATION

The in situ hybridization technique is often used for locating the positions of repeat sequences on a specific chromosome. In some cases, repeated DNA from an alien species is used for detecting the presence of chromatin introduced from that species into a crop plant by introgression. The squash dot hybridization technique is designed for this purpose. In this technique, one root tip from a germinated seed is squashed onto a nitrocellulose filter, and the filters are hybridized with a radioactively labeled repeated DNA probe derived from the alien species in question. After being washed in appropriate buffers, the filters are used to expose X-ray film, which on developing gives the desired information about the presence of alien genetic material. The technique has been used successfully to detect introgressed rye chromatin in several wheat cultivars, through the use of highly repeated DNA probes derived from rye.

SOUTHERN BLOTTING

The name of this technique is derived from its inventor, E.M. Southern, and DNA-DNA hybridization forms its basis (Southern 1975). The steps in Southern hybridization are electrophoresis, denaturation, blotting, baking, pre-treatment, hybridization and autoradiography. The procedure is outlined in Figure 27.4.

A sample of DNA containing fragments of different sizes, produced by mechanical shearing or by digestion with restriction endonucleases, is subjected to electrophoresis using polyacrylamide gel (for small fragments, up to about 1 kb) or agarose gel (for larger fragments, up to about 20 kb). Very large fragments, up to 1,000-2,000 kb, are separated in agarose gel by pulsed field gel electrophoresis (PFGE) or field inversion gel electrophoresis. The gel resolves the fragments according to their size. Since DNA is negatively charged, it moves towards the positive pole. Marker fragments of known sizes are run in a separate lane so that the sizes of the unknown fragments can be interpolated. Gels are stained with the intercalating dye ethidium bromide, which gives visible fluorescence when the gel is illuminated with ultraviolet light. A band containing as little as 0.05 µg of DNA can be detected.

The restriction fragments of DNA present in the agarose gel after electrophoresis are denatured into single-stranded DNA by alkali treatment. They are then transferred to a nitrocellulose filter membrane, in which the DNA becomes trapped. This step is known as blotting. It takes several hours, but the bands retain their positions on the filter and there is minimal loss of resolution. The nitrocellulose membrane is removed and the DNA is permanently immobilized on the membrane by baking it at 80 °C in vacuo. Single-stranded DNA has a high affinity for nitrocellulose (RNA lacks it). The baked membrane is treated with a solution containing 0.2 per cent each of Ficoll (a polymer of sucrose), polyvinylpyrrolidone and bovine serum albumin. This is called pre-treatment, and it prevents non-specific binding of radioactive probes. The pre-treated membrane is placed in a solution of radioactive single-stranded DNA or oligodeoxynucleotide, called the probe. The probe hybridizes with complementary DNA; this step is known as the hybridization reaction. After hybridization, the membrane is washed to remove the unbound probe. The membrane is then placed in close contact with an X-ray film and incubated so that the radioactive probe exposes the film. The film is developed to reveal distinct bands, which correspond to the fragments that are complementary to the probe.

Southern blotting is a commonly used method for the identification of DNA fragments that are complementary to a known DNA sequence. The technique has numerous applications. It allows a comparison between the genome of a particular organism and an available gene or gene fragment (the probe). It can tell us whether an organism contains a particular gene, and it provides information about the organization and restriction map of that gene. Particular DNA fragments can be analyzed or detected. It has been applied to detect restriction fragment length polymorphism (RFLP) and variable-number-of-tandem-repeat (VNTR) polymorphism. The latter is the basis of DNA fingerprinting. It is also used in the detection and identification of transferred genes in transgenic individuals.

Nylon membranes have now been developed. These membranes are physically robust. Both DNA and RNA become cross-linked to them by a brief exposure to UV light, which saves baking time. The same membrane blot is reusable and can be probed with more than one probe.

Figure 27.4 Steps in Southern blotting technique (Modified, with permission, from Gupta, P.K. 1994. Genetics. Meerut: Rastogi Publ.)
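The interpolation of unknown fragment sizes from a marker lane can be sketched as follows. The sketch assumes that, between neighbouring marker bands, the logarithm of fragment size varies linearly with migration distance, which is a common working approximation and not a statement from this chapter. The marker distances and sizes are hypothetical values chosen for illustration.

```python
import math


def estimate_size(distance, markers):
    """Estimate fragment size (bp) from migration distance (mm).

    markers: list of (distance_mm, size_bp) pairs read from the marker lane.
    Assumption: log10(size) is linear in distance between adjacent markers.
    """
    pts = sorted(markers)
    if len(pts) < 2:
        raise ValueError("at least two marker bands are needed")
    if not pts[0][0] <= distance <= pts[-1][0]:
        raise ValueError("distance lies outside the marker range")
    for (d1, s1), (d2, s2) in zip(pts, pts[1:]):
        if d1 <= distance <= d2:
            frac = (distance - d1) / (d2 - d1)
            log_size = math.log10(s1) + frac * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size


# Hypothetical marker lane: larger fragments migrate shorter distances.
markers = [(10, 10000), (20, 5000), (30, 2000), (40, 1000), (50, 500)]

for d in (15, 33, 45):
    print(d, "mm ->", round(estimate_size(d, markers)), "bp")
```

Piecewise interpolation is used here, rather than a single fitted line, because the relation between size and mobility is only approximately log-linear over a wide size range.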

NORTHERN BLOTTING

This technique was developed by Alwine et al. (1977). It is simply an extension of Southern blotting. A somewhat different filter is used to blot RNA from agarose gel. The probe can be DNA or RNA, provided that it is labeled. In this case, DNA/RNA or RNA/RNA hybrids are formed.


Northern blotting separates RNA samples by size and detects them with a hybridization probe complementary to part of, or the entire, target sequence. In this technique, RNAs are separated by gel electrophoresis and the RNA bands are transferred to a suitable membrane, e.g., diazobenzyloxymethyl (DBM) paper or nylon membrane, for immobilization (baking) and hybridized with a radioactive single-stranded DNA probe. DBM paper, which is prepared from Whatman filter paper No. 540 after a series of reactions, is diazotized by introducing the diazo group into a chemical compound through the treatment of an amine with nitrous acid. This specially prepared filter is used to blot RNA from agarose gel. The bands are detected by autoradiography. DBM paper is equally effective in binding denatured DNA and is more efficient than nitrocellulose filters for small DNA fragments. Later, Thomas (1980) showed that mRNA bands could be blotted directly onto nitrocellulose membrane, which can then be hybridized with a labeled DNA or RNA probe. The single-stranded regions of the probe are removed by nuclease so that the hybridized mRNA can also be estimated quantitatively.

Northern blotting is used for determining RNA size and for observing alternative splice products. Probes with partial homology can be used. The quality and quantity of RNA can be measured on the gel prior to blotting. The membranes can be stored and reprobed for years after blotting.

Southern and northern blots differ from each other in certain ways. In Southern blotting, DNA is separated, and it has to be denatured before blotting. A nitrocellulose membrane filter is normally used, and hybridization with a DNA probe produces DNA/DNA hybrids. In northern blotting, RNA is separated. Since RNA is single-stranded, the denaturation step is not needed. A nitrocellulose membrane filter is not used, as RNA does not bind to it; instead, diazobenzyloxymethyl (DBM) paper or nylon membrane is used. Hybridization with a DNA probe produces RNA/DNA hybrids.

WESTERN BLOTTING

This technique was developed by Towbin et al. (1979). It is similar to Southern blotting but is used to identify the different polypeptide chain products of genes, that is, to detect a particular protein in a mixture. The probe used is therefore not DNA or RNA, but an antibody. The technique is also called "immunoblotting". Proteins are electrophoresed in polyacrylamide gel and transferred onto nitrocellulose or nylon membrane. Earlier, capillary blotting was used, but nowadays electrophoretic blotting is preferred as it is much faster. Protein bands are detected by their interaction with antibodies, lectins or some other compounds, which are used as probes. Lectins are used as probes for the identification of glycoproteins. The probes may be radioactive, or a radioactive molecule may be tagged to them. The detection process is often known as a "sandwich reaction".

Western blotting is used to identify polypeptide chains newly formed by a transformed cell. The extracted proteins are subjected to polyacrylamide gel electrophoresis (PAGE) and are then transferred onto nitrocellulose, to which they bind. The nitrocellulose membrane is then probed with a specific labeled antibody. The antibody binds to the protein; it does not hybridize with it. The antibody may be labeled with ¹²⁵I, and the signal is again detected by autoradiography.

The western blotting technique is very useful in studying the final product of a protein-coding gene. It has been widely used in analyzing and identifying target proteins. The assay can also be used for semi-quantitative estimation of protein concentration by measuring the signal intensity of each protein band, if a standard curve of the same protein, or of a protein of similar molecular weight, can be established. It has also been used in clinical laboratories to assist in the identification of certain antigen proteins (pathogen or biomarker).

EASTERN BLOTTING Most proteins that are translated from mRNA undergo modifications before becoming functional in cells. These modifications are collectively known as post-translational modifications (PTMs). The nascent or folded proteins, which are stable under physiological conditions, are then subjected to a battery of specific enzyme-catalyzed modifications on the side chains or backbones. Post-translational protein modifications can include: acetylation, acylation (myristoylation, palmitoylation), alkylation, arginylation, biotinylation, formylation, geranylation, glutamylation, glycosylation, glycylation, hydroxylation, isoprenylation, lipoylation, methylation, nitroalkylation, phosphopantetheinylation, phosphorylation, prenylation, selenation, S-nitrosylation, sulfation, transglutamination, and ubiquitination (sumoylation) (Mann and Jensen 2003; Walsh et al. 2003). Eastern blotting has been described by Thomas et al. (2009) as a technique which probes proteins blotted to PVDF membrane with lectins, cholera toxin and chemical stains to detect glycosylated, lipoylated or phosphorylated proteins. Thus, eastern blotting can be considered an extension of the biochemical technique of western blotting. Multiple techniques have been described by the term eastern blotting. It is most often used to detect carbohydrate epitopes. One application of the technique includes detection of protein modifications in two bacterial species. Cholera toxin B subunit (which binds to gangliosides), Concanavalin A (which detects mannose-containing glycans) and nitrophospho molybdate-methyl green (which detects phosphoproteins) were used to detect protein modifications (Thomas et al. 2009). Post-translational modifications occurring at the N-terminus of the amino acid chain play an important role in translocation across biological membranes. 
The proteins so modified include secretory proteins in prokaryotes and eukaryotes, as well as proteins destined for incorporation into various cellular and organelle membranes, such as those of lysosomes, chloroplasts and mitochondria, and the plasma membrane. The expression of post-translationally modified proteins is important in several diseases.

DOT BLOTS AND SLOT BLOTS These are techniques for detecting, analyzing and identifying proteins and nucleic acids (DNA and RNA). They are similar to the western blot technique, but differ in that the samples are not separated electrophoretically; instead they are spotted through circular templates directly onto the membrane or paper substrate. The drawbacks of the other blotting techniques have led to the development of the dot blotting technique, which is less time consuming, accurate and applicable to a wide variety of genes and sources simultaneously. Dot or slot blotting is among the most widely used techniques for analyzing proteins and nucleic acids. None of these blot methods requires electrophoresis prior to blotting and hybridization. Hybridization of cloned DNA without electrophoretic separation is called dot blotting. The overall scheme of the dot blot technique is presented in Figure 27.5. The slot blot procedure comprises the following steps: (1) Add equal amounts of DNA from each sample to the wells of a 96-well plate, keeping the volume as small as possible (~10 μl). (2) Add 100 μl of 0.1 N NaOH to each well. Incubate the plate at 65 °C for 30 min. (3) Add 100 μl of 2 M ammonium acetate to each well. (4) Prepare the nylon membrane(s): cut to the appropriate size and label. Place the membrane in double-distilled water (ddH2O). Pour out the ddH2O and add 20× SSC buffer. Set up the slot blot apparatus as follows: wet two sheets of blotting paper in 20× SSC and place both sheets on the bottom of the slot blot apparatus. Then place the nylon membrane on top of the blotting paper and remove any air bubbles. Close the slot blot apparatus. (5) Transfer the samples to the wells of the slot blot apparatus. Draw the samples into the wells by applying house vacuum to the apparatus. (6) Remove the filter and bake it in a vacuum oven for 2 h. (7) Hybridize with the appropriate probe using the Southern hybridization procedure.
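
The reagent additions in steps (2) and (3) amount to simple dilutions, which can be checked with a short calculation. The sketch below is illustrative only: it assumes a 10 μl sample, treats volumes as additive, and ignores any reaction between the NaOH and the ammonium acetate; the function name is hypothetical.

```python
def diluted_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """Concentration of a reagent after it is diluted into the total well volume."""
    if total_volume_ul <= 0 or not 0 <= stock_volume_ul <= total_volume_ul:
        raise ValueError("volumes must be positive and stock <= total")
    return stock_conc * stock_volume_ul / total_volume_ul


# Assumed volumes (the text gives ~10 ul sample, 100 ul of each reagent).
sample_ul = 10.0
naoh_ul = 100.0
acetate_ul = 100.0

# After step (2): 0.1 N NaOH diluted into sample + NaOH.
naoh_after_step2 = diluted_concentration(0.1, naoh_ul, sample_ul + naoh_ul)

# After step (3): 2 M ammonium acetate diluted into the full well volume.
acetate_after_step3 = diluted_concentration(
    2.0, acetate_ul, sample_ul + naoh_ul + acetate_ul
)

print(f"NaOH after step 2: {naoh_after_step2:.3f} N")
print(f"Ammonium acetate after step 3: {acetate_after_step3:.3f} M")
```

Keeping the sample volume small, as step (1) advises, keeps the denaturing NaOH concentration close to its nominal 0.1 N.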

Figure 27.5 Dot blots

ELECTROPHORESIS Gel electrophoresis is used for the separation of deoxyribonucleic acid (DNA), ribonucleic acid (RNA), or protein molecules using an electric field applied to a gel matrix. Gel electrophoresis is usually performed for analytical purposes, often after amplification of DNA via PCR, but may also be used as a preparative technique prior to use of other methods such as mass spectrometry, RFLP, PCR, cloning, DNA sequencing. The term "gel" in this instance refers to the matrix used to contain and then separate the target molecules. In most cases, the gel is a cross-linked polymer whose composition and porosity is chosen based on the specific weight and composition of the target to be analyzed. When separating proteins or small nucleic acids (DNA, RNA, or oligonucleotides) the gel is usually composed of different concentrations of acrylamide and a cross-linker, producing different sized mesh networks of polyacrylamide (Smithies 1955; Bier 1959). When separating larger nucleic acids (greater than a few hundred bases), the preferred matrix is purified agarose. In both cases, the gel forms a solid, yet porous matrix. Acrylamide in contrast to polyacrylamide is a neurotoxin and must be handled using appropriate safety precautions to avoid poisoning. Agarose is composed of long unbranched chains of uncharged carbohydrate without cross links resulting in a gel with large pores allowing for the separation of macromolecules and macromolecular complexes. Electrophoresis refers to the electromotive force (EMF) that is used to move the molecules through the gel matrix. By placing the molecules in wells in the gel and applying an electric field, the molecules will move through the matrix at different rates, determined largely by their mass when the charge to mass ratio (Z) of all species is uniform, toward the anode if negatively charged or toward the cathode if positively charged. 
After the electrophoresis is complete, the molecules in the gel can be stained to make them visible. Ethidium bromide, silver, or coomassie blue dye may be used for this process. Other methods may also be used to visualize the separation of the mixture's components on the gel. If the analyte molecules fluoresce under ultraviolet light, a photograph can be taken of the gel under ultraviolet lighting conditions. If the molecules to be separated contain radioactivity added for visibility, an autoradiogram can be recorded of the gel. If several mixtures have initially been injected next to each other, they will run parallel in individual lanes. Depending on the number of different

molecules, each lane shows separation of the components of the original mixture as one or more distinct bands, one band per component (Figure 27.6). Bands in different lanes that end up at the same distance from the top contain molecules that passed through the gel at the same speed, which usually means that they are approximately of the same size. Molecular weight size markers are available that contain a mixture of molecules of known sizes. If such a marker is run in one lane of the gel parallel to the unknown samples, the bands of the marker can be compared with those of the unknown samples in order to determine the sizes of the latter.

Figure 27.6 A gel with electrophoresed DNA in several lanes

In the case of nucleic acids, the direction of migration, from the negative to the positive electrode, is due to the naturally occurring negative charge carried by their sugar-phosphate backbone. Double-stranded DNA fragments naturally behave as long rods, so their migration through the gel is relative to their size or, for cyclic fragments, their radius of gyration. Single-stranded RNA or DNA tends to fold up into molecules with complex shapes and to migrate through the gel in a complicated manner based on their tertiary structure. Therefore, agents that disrupt hydrogen bonds, such as sodium hydroxide or formamide, are used to denature the nucleic acids and cause them to behave as long rods again. For separation of DNA fragments differing by a few base pairs, polyacrylamide gels are used; they are commonly used for DNA sequencing experiments. DNA fragments of large size cannot be handled by this technique. Gel electrophoresis of large DNA or RNA is usually done by modified agarose gel electrophoresis. Separation of large DNA molecules or whole chromosomes is done using pulsed field gel electrophoresis (PFGE). 
In this method, short pulses of electricity are used in two different directions and DNA is embedded and used in the form of agarose plugs to avoid fragmentation of large DNA molecules. Using technique of PFGE and a more refined technique contour clamped homogeneous electric field electrophoresis (CHEFE), genomes of several fungi could be resolved into chromosomal bands and used for mapping of DNA sequences in specific chromosomes. Proteins are analyzed for studying gene expression patterns. Proteins, unlike nucleic acids, can have varying charges and complex shapes. Therefore, they may not migrate into the polyacrylamide gel at similar rates, or at all, when placing a negative to positive EMF on the sample. Proteins therefore, are usually denatured in the presence of a detergent such as sodium dodecyl sulfate/sodium dodecyl phosphate (SDS/SDP) that coats the proteins with a negative charge. Generally the amount of SDS bound is relative to the size of the protein (usually 1.4g SDS per gram of protein) so that the resulting denatured proteins have an overall negative charge, and all the proteins have a similar charge to mass ratio. Since denatured proteins act like long rods instead of having a complex tertiary shape, the rate at which the resulting SDS coated proteins migrate in the gel is relative only to its size and not its charge or shape.
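
The comparison of unknown bands with a size marker can be sketched numerically. The example below assumes the common approximation that, over the useful range of a gel, migration distance is roughly linear in the logarithm of fragment size; the ladder values and function names are hypothetical and chosen only for illustration.

```python
import math


def fit_ladder(ladder):
    """Least-squares fit of log10(size) = intercept + slope * distance.

    ladder: list of (migration_distance_mm, size_bp) pairs from the marker lane.
    """
    if len(ladder) < 2:
        raise ValueError("need at least two marker bands")
    xs = [distance for distance, _ in ladder]
    ys = [math.log10(size) for _, size in ladder]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("marker bands must differ in migration distance")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate_size(distance_mm, intercept, slope):
    """Estimated size (bp) of a band that migrated distance_mm."""
    return 10 ** (intercept + slope * distance_mm)


# Hypothetical marker lane: (distance in mm, size in bp).
marker = [(10.0, 10000), (20.0, 1000), (30.0, 100)]
intercept, slope = fit_ladder(marker)
unknown_size = estimate_size(25.0, intercept, slope)
print(f"Estimated size of unknown band: {unknown_size:.0f} bp")
```

Real gels deviate from this relationship for very large and very small fragments, so estimates are most reliable for bands that lie between two marker bands.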

COLONY OR PLAQUE HYBRIDIZATION Colony hybridization is the screening of a library with a labeled probe (radioactive, bioluminescent, etc.) to identify a specific sequence of DNA, RNA, enzyme, protein, or antibody. Colonies carrying this sequence are identified by dark spots after autoradiography, so that the original chimeric vector carrying the desired gene sequence can be recovered from one or more colonies in the original master plate and used for further experiments.

This technique is used to identify bacterial colonies on a plate that contain a specific DNA sequence (usually a cloned cDNA for screening of a genomic library) (Grunstein and Hogness 1975). The technique comprises the following steps: (1) Bacterial cells are transformed (i.e., by introducing a vector carrying a given sequence) and plated. This is the master plate. (2) Colonies from the master plate are replica-plated onto a nitrocellulose filter placed on an agar plate, and a reference mark is made. (3) After the colonies appear on the filter, it is removed from the agar plate and treated with alkali to lyse the bacterial cells. This also denatures the DNA. (4) The filter is treated with proteinase K to digest and remove proteins. The denatured DNA remains bound to the filter. (5) The filter is then baked at 80 °C to fix the DNA. This yields DNA prints of the bacterial colonies in the same positions as in the master plate. (6) The filter is hybridized with a radioactive RNA probe that represents the DNA sequence used in transformation. Repeated washing removes unbound probe. (7) Hybridized colonies are detected by autoradiography. The colonies with the desired sequence can then be picked for further studies. Riley and Caffrey (1990) identified enterotoxigenic Escherichia coli by colony hybridization with non-radioactive digoxigenin-labeled DNA probes.

CHROMOSOME WALKING Chromosome walking is a method in molecular genetics for identifying and sequencing long parts of a DNA strand, e.g., a chromosome (Bender et al. 1983). Using conventional techniques, one may be able to characterize a chromosome of about 1,000 kb. By using pulsed-field gel electrophoresis (PFGE), DNA fragments of more than 100 kb may be used for chromosome walking. Chromosome walking is used to identify chromosome fragments with overlapping sequences for the reconstruction or characterization of large chromosome regions. By using overlapping probes, the technique is thus useful for studying segments of DNA larger than can be individually cloned.

Figure 27.7 Chromosome walking with RFLP markers

Chromosome walking involves the following steps (Figure 27.7): (1) From the genomic library a clone of interest is selected and a small fragment is subcloned from one end of the clone. (2) The subcloned fragment of the selected clone is hybridized with the other clones in the library, and a second clone that hybridizes with the subclone of the first clone is identified through the presence of an overlapping region. (3) The end of the second clone is then subcloned and used for hybridization with other clones to identify a third clone that has a region overlapping the subcloned end of the second clone. (4) The third clone identified as above is

also subcloned and hybridized with the clones in the same manner, and the procedure is continued. (5) Restriction map of each selected clone may be prepared and compared to know the regions of overlap so that identification of new overlapping restriction sites will amount to walking along the chromosome or along a long chromosome segment. Chromosome walking is also used in gene isolation.
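
The iterative logic of a walk can be sketched with a toy clone library. In the example below, each clone is represented by its sequence, the "subcloned end" is simply the last few bases of the current clone, and "hybridization" is modeled as an exact substring match; the clone names, sequences and function names are hypothetical, and the sketch ignores complications such as repetitive DNA or clones that overlap without extending the walk.

```python
def end_probe(clone_sequence, probe_length):
    """Model the subcloned end fragment as the last probe_length bases."""
    return clone_sequence[-probe_length:]


def walk(library, start_clone, probe_length=3):
    """Return the ordered list of clone names visited during the walk.

    library: dict mapping clone name -> sequence (toy stand-in for a
    genomic library). A clone 'hybridizes' if it contains the probe.
    """
    path = [start_clone]
    current = start_clone
    while True:
        probe = end_probe(library[current], probe_length)
        hits = [
            name
            for name, sequence in library.items()
            if name not in path and probe in sequence
        ]
        if not hits:
            return path  # no further overlapping clone: the walk ends
        current = hits[0]
        path.append(current)


# Hypothetical overlapping clones from one chromosomal region.
library = {
    "clone_A": "ATGCCGTA",
    "clone_B": "GTAAGCTT",
    "clone_C": "CTTGACCTGA",
}
print(walk(library, "clone_A"))
```

Each pass through the loop corresponds to one round of subcloning and library screening in steps (1) to (4).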

CHROMOSOME JUMPING/HOPPING Chromosome hopping (or jumping) is a technique of isolating clones from a genomic library that are not contiguous but skip a region between known points on the chromosome. This is usually done to bypass regions that are difficult or impossible to "walk" through, or regions known not to be of interest. The technique is based on the following steps (Figure 27.8): (1) Depending upon the distance between the gene and the marker, the distance of the 'jumps', or 'hopsize', is decided (e.g., 100 kb or 200 kb). (2) Genomic DNA molecules in the selected size range (say, 80-130 kb for a hopsize of 100 kb, or 160-240 kb for a hopsize of 200 kb) are selected through pulsed-field gel electrophoresis. (3) To circularize the DNA segments, the two ends of each long linear DNA molecule are ligated using T4 ligase in the presence of supF, a suppressor tRNA gene that allows selection of clones representing junction fragments by cloning them in an amber-mutated phage (supF–). (4) The DNA circles obtained in step 3 are digested with EcoRI. (5) The vector Ch3A lac, an amber-mutated phage vector (supF–), is also cut with EcoRI and used for cloning the small DNA fragments that represent the junctions of the circularized genomic DNA molecules and carry supF. (6) The cloned DNA fragments obtained in step 5 represent the jumping library, which can be plated on a bacterial host and screened by plaque hybridization. Chromosome jumping helps in narrowing the gap between the gene and the molecular markers, so that after several cycles of jumping, each followed by cloning of the regions that lie closer to the gene, it becomes possible to approach very close to the desired gene and clone it.

Figure 27.8 Steps involved in chromosome jumping (Redrawn, with permission, from Gupta, P.K. 1994. Genetics. Meerut: Rastogi Publ.)

CHROMOSOME LANDING In plants, the abundance of repetitive DNA makes chromosome walking impossible, but the availability of high-density physical maps in many species based on markers such as RAPDs means

that markers are often found close enough to genes to be included on the same genomic clone. Since this procedure allows a suitable marker to be used directly as a probe to screen a genomic library, the procedure has been termed chromosome landing which is analogous to transposon tagging. The genetic technique of chromosome landing is a method of cloning of a gene of interest from a clone library. It is based on the principle that the expected average between-marker distances can be smaller than the average insert length of a clone library containing the gene of interest. The strategy of chromosome walking is based on the assumption that it is difficult and time consuming to find DNA markers that are physically close to a gene of interest. Recent technological developments invalidate this assumption for many species. As a result, the mapping paradigm has now changed such that one first isolates one or more DNA marker(s) at a physical distance from the targeted gene that is less than the average insert size of the genomic library being used for clone isolation. The DNA marker is then used to screen the library and isolate (or 'land' on) the clone containing the gene, without any need for chromosome walking and its associated problems. Chromosome landing, together with the technology that has made it possible, is likely to become the main strategy by which map-based cloning is applied to isolate both major genes and genes underlying quantitative traits in plant species. Chromosome landing is a paradigm for map-based gene cloning in plants with large genomes (Tanksley et al. 1995).
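
The principle that a marker should lie closer to the gene than the average insert length can be made quantitative under a simple model. The sketch below assumes that all inserts have the same length L and that clone start positions are uniformly distributed along the chromosome; under these assumptions, a clone that contains the marker also contains a gene lying d away with probability max(0, 1 - d/L). The function name and the example numbers are hypothetical.

```python
def landing_probability(marker_gene_distance_kb, insert_size_kb):
    """Probability that a clone containing the marker also contains the gene.

    Assumes fixed insert size and uniformly random clone start positions.
    """
    if insert_size_kb <= 0 or marker_gene_distance_kb < 0:
        raise ValueError("insert size must be positive, distance non-negative")
    return max(0.0, 1.0 - marker_gene_distance_kb / insert_size_kb)


# Hypothetical example: marker 20 kb from the gene, 100 kb inserts.
print(landing_probability(20.0, 100.0))

# A marker farther away than one insert length can never land on the gene.
print(landing_probability(150.0, 100.0))
```

The closer the marker is to the gene relative to the insert size, the more likely it is that a single screening step recovers the gene, which is why high-density marker maps and large-insert libraries both favor this strategy.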

NICK TRANSLATION Circular (e.g., Simian Virus 40) and linear (e.g., λ phage) DNAs have been labeled to high specific radioactivities (>10^8 cts/min per μg) in vitro using deoxynucleoside [α-32P] triphosphates (100 to 250 Ci/mmol) as substrates and the nick translation activity of E. coli DNA polymerase I (Rigby et al. 1977). Following denaturation, the reaction product yields single-stranded fragments about 400 nucleotides long. Because restriction fragments derived from different regions of the nick-translated DNA have nearly the same specific radioactivity (cts/min per 10^3 bases), it was inferred that nicks are introduced, and nick translation is initiated, with equal probability within all internal regions of the DNA. Such labeled DNAs (and restriction endonuclease fragments derived from them) are useful probes for detecting rare homologous sequences by in situ hybridization and reassociation kinetic analysis.

RNA SEQUENCING RNA is less stable in the cell than DNA and is also more prone to nuclease attack during experiments. As RNA is generated by transcription from DNA, the information is already present in the cell's DNA. However, it is sometimes desirable to sequence RNA molecules. In particular, eukaryotic RNA molecules are not necessarily colinear with their DNA template, because introns are excised. Historically, the first nucleic acids to be sequenced were RNAs, so the early history of nucleic acid sequencing technology is largely the history of RNA sequencing. All current methods allow the direct use of small DNA/RNA fragments without requiring their insertion into a plasmid or other vector, thereby removing a costly and time-consuming step of traditional methods.

Generating DNA through Reverse Transcription To sequence RNA, the usual method is first to reverse transcribe the sample to generate copy DNA fragments. This copy DNA is then sequenced by employing one of many DNA sequencing methods.

Ion Exchange Chromatography The first complete nucleotide sequence of an RNA was published by R.W. Holley and coworkers (Holley 1964; Holley et al. 1965) for yeast alanine tRNA (tRNA^Ala). Their method involved digesting the RNA with sequence-specific RNases, fractionating the resulting oligoribonucleotides by ion exchange chromatography or two-dimensional homochromatography, and establishing the order of bases within each fragment by exonuclease digestion. By using two different methods of fragmentation, they were able to establish overlaps between fragments and hence to assemble the entire 77-nucleotide sequence. To determine the sequence, they cleaved the alanine tRNA polynucleotide chain into 16 fragments, identified the small fragments, and then reconstructed the original nucleotide sequence by determining the order in which the small fragments occurred in the RNA molecule. They first used pancreatic ribonuclease to cleave the RNA chain next to pyrimidine nucleotides and then used takadiastase ribonuclease T1 to cleave the RNA chain at guanylic acid residues. They isolated the resulting fragments by ion-exchange chromatography. The components of the dinucleotide fragments were then identified by their chromatographic and electrophoretic properties and their spectra. Larger fragments were digested with snake venom phosphodiesterase and sequenced. Determining the structures of all the fragments took approximately 2½ years. R. Holley was awarded the Nobel Prize in 1968 for determining the base sequence of yeast alanine transfer RNA.
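
The value of using two different fragmentation methods can be illustrated with a toy digestion. The sketch below assumes that pancreatic ribonuclease cuts on the 3′ side of every pyrimidine (C or U) and that ribonuclease T1 cuts on the 3′ side of every G, and that digestion goes to completion; the short RNA sequence and the function name are hypothetical.

```python
def digest(rna, cut_after):
    """Cut an RNA string on the 3' side of every base listed in cut_after."""
    fragments = []
    current = ""
    for base in rna:
        current += base
        if base in cut_after:
            fragments.append(current)
            current = ""
    if current:  # remaining 3' terminal fragment, if any
        fragments.append(current)
    return fragments


# Hypothetical short RNA, written 5' -> 3'.
rna = "GGAUCGAAC"

pancreatic_fragments = digest(rna, "CU")  # cuts after pyrimidines
t1_fragments = digest(rna, "G")           # cuts after G

print(pancreatic_fragments)
print(t1_fragments)
```

In this toy case the T1 fragment AUCG spans the junctions between the pancreatic ribonuclease fragments GGAU, C and GAAC, which is the kind of overlap that allows the fragments to be placed in order.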

RNA Sequencing with Radioactive Chain-Terminating Ribonucleotides The DNA sequencing approaches developed by Maxam and Gilbert (1977) and Sanger et al. (1977) were also adapted for sequencing of RNA. The Sanger-type approach employs the 3′-deoxy analogs of the ribonucleoside triphosphates as specific chain terminators during RNA synthesis. The molecules are radioactively labeled at the 5′-end. This method was used to sequence MDV-1(-) RNA, a molecule that is synthesized in vitro by Qβ replicase (Kramer and Mills 1978). Maxam and Gilbert devised a chemical method to sequence DNA, and a similar chemical approach was used to sequence yeast 5S and 5.8S RNA. Four different base-specific chemical reactions provide a means of directly sequencing RNA terminally labeled with 32P (Peattie 1979). After a partial, specific modification of each kind of RNA base, an amine-catalyzed strand scission generates labeled fragments whose lengths determine the position of each nucleotide in the sequence. Dimethyl sulfate modifies guanosine. Diethyl pyrocarbonate attacks primarily adenosine. Hydrazine attacks uridine and cytidine, but salt suppresses the reaction with uridine. In all cases, aniline induces a subsequent strand scission. Electrophoretic fractionation of the labeled fragments on a polyacrylamide gel, followed by autoradiography, determines the RNA sequence. RNA labeled at the 3′-end yields clean cleavage patterns for each purine and pyrimidine and allows the RNA sequence to be determined out to 100-200 bases from the labeled terminus.

Next Generation Sequencing Technologies for RNA Sequencing The arrival of next generation sequencing (NGS) technologies has brought a revolution in sequencing. RNA sequencing (RNA-Seq) is perhaps one of the most complex next-generation applications. Any high-throughput sequencing technology can be used for RNA-Seq. The Illumina, Applied Biosystems SOLiD and Roche 454 Life Sciences systems have already been applied for this

purpose. The steps involved in RNA-Seq include sample isolation and library preparation, sequencing, and genome alignment and read assembly, which are discussed here briefly. The first step in an RNA-Seq experiment is the isolation of RNA samples. The transcriptome consists of both mRNA and non-mRNA species, and a large proportion (90–95%) of it is rRNA. To perform a whole transcriptome analysis that is not limited to annotated mRNAs, the selective depletion of abundant rRNA molecules (5S, 5.8S, 18S and 28S) is a key step. The main procedure for selectively depleting large rRNA molecules from total isolated RNA is hybridization with rRNA sequence-specific 5′-biotin-labeled oligonucleotide probes, followed by removal with streptavidin-coated magnetic beads. rRNA is also characterized by the presence of a 5′-phosphate, which offers another approach to selective ribodepletion, based on an exonuclease that specifically degrades RNA molecules bearing a 5′-phosphate. Compared to the polyadenylated (polyA) mRNA fraction, the ribodepleted RNA is enriched in non-polyA mRNA, preprocessed RNA, tRNA, regulatory molecules such as miRNA, siRNA and small ncRNA, and other RNA transcripts of as yet unknown function. A double-stranded cDNA library can usually be prepared by using either (1) fragmented double-stranded (ds) cDNA or (2) hydrolyzed or fragmented RNA. The cDNA fragments are ligated to specific adaptor sequences at one or both ends. The resulting cDNA is size selected by gel electrophoresis, and the cDNAs are PCR amplified (Costa et al. 2010). The size distribution is evaluated before sequencing.

Sequencing Each molecule, with or without amplification, is then sequenced in a high-throughput manner to obtain short sequences from one end (single-end sequencing) or both ends (paired-end sequencing) using different NGS platforms (454 Roche, Illumina, SOLiD) (Metzker 2010). The reads are typically 30–400 bp long, depending on the DNA-sequencing technology used. The terabytes of data generated are stored in supercomputers.

Genome alignment and reads assembly The first step of NGS data analysis consists of mapping the sequence reads to a reference genome (and/or to known annotated transcribed sequences), if available, or of de novo assembly to produce a genome-scale transcriptional map. Analyzing the transcriptome of an organism without a specific reference genome requires de novo assembly (Zerbino and Birney 2008). A reasonable strategy for improving the quality of the assembly is to increase the read coverage and to mix different read types.
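
The idea of mapping reads to a reference and counting coverage can be shown with a deliberately simplified sketch. The example below places each read at its first exact match in the reference and counts how many reads cover each position; it is not how production aligners work (they tolerate mismatches, handle spliced reads and use indexed search), and the reference, reads and function name are hypothetical.

```python
def map_reads(reference, reads):
    """Map reads by exact match and return (per-base coverage, unmapped reads)."""
    coverage = [0] * len(reference)
    unmapped = []
    for read in reads:
        position = reference.find(read)  # first exact occurrence, or -1
        if position == -1:
            unmapped.append(read)
            continue
        for index in range(position, position + len(read)):
            coverage[index] += 1
    return coverage, unmapped


# Hypothetical reference transcript and short reads.
reference = "ATGGCGTACCTGA"
reads = ["ATGGC", "GCGTA", "CCTGA", "TTTTT"]

coverage, unmapped = map_reads(reference, reads)
print(coverage)
print(unmapped)
```

Positions covered by more reads correspond to more highly represented sequence, which is the basis for using read counts as a measure of expression level.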

Direct RNA Sequencing Tanaka et al. (1980) developed a direct read-off sequencing procedure, based on the method of Stanley and Vassilenko (1978), using E. coli 5S ribosomal RNA as a model compound. Radioactive bands were transferred from an acrylamide gel fractionation in the first dimension onto a DEAE-cellulose thin layer plate. After in situ enzymatic digestion with RNase T2, mononucleoside 3′,5′-diphosphates were separated in the second dimension by electrophoresis at pH 2.3. Using this two-dimensional procedure, the entire sequence of 163 residues of the previously unknown Vicia faba (broad bean) 5.8S ribosomal RNA was deduced. In direct RNA sequencing using the Helicos approach, an RNA that is polyadenylated and 3′ deoxy-blocked with poly(A) polymerase is captured on poly(dT)-coated surfaces. A 'fill-and-lock' step is performed, in which the 'fill' step is performed with natural thymidine and polymerase, and the 'lock' step is performed with fluorescently labeled A, C and G Virtual Terminator (VT) nucleotides and

polymerase. This step corrects for any misalignments that may be present in poly(A) and poly(T) duplexes, and ensures that sequencing starts in the RNA template rather than in the polyadenylated tail. Imaging is performed to locate the positions of the templates. Then, chemical cleavage of the dye-nucleotide linker is performed to release the dye and prepare the templates for nucleotide incorporation. The surface is incubated with one labeled nucleotide (C-VT is shown as an example) and a polymerase mixture. After this step, imaging is performed to locate the templates that have incorporated the nucleotide. Chemical cleavage of the dye allows the surface and DNA templates to be ready for the next nucleotide-addition cycle. Nucleotides are added in the C, T, A, G order for 120 total cycles (30 additions of each nucleotide) (Ozsolak et al. 2009).

One advantage of RNA-Seq is that it is not limited to detecting transcripts that correspond to existing genomic sequence. For example, 454-based RNA-Seq has been used to sequence the transcriptome of the Glanville fritillary butterfly (Vera et al. 2008). A second advantage of RNA-Seq relative to DNA microarrays is that RNA-Seq has very low background signal, because DNA sequences can be unambiguously mapped to unique regions of the genome. RNA-Seq does not have an upper limit for quantification, which correlates with the number of sequences obtained. It has a large dynamic range of expression levels over which transcripts can be detected: a greater than 9,000-fold range was estimated in a study that analyzed 16 million mapped reads in Saccharomyces cerevisiae (Nagalakshmi et al. 2008), and a range spanning five orders of magnitude was estimated for 40 million mouse sequence reads. The results of RNA-Seq also show high levels of reproducibility, for both technical and biological replicates. Because there are no cloning steps, and with the Helicos technology no amplification step, RNA-Seq requires less RNA sample. This method offers both single-base resolution for annotation and 'digital' gene expression levels at the genome scale, often at a much lower cost than either tiling arrays or large-scale Sanger EST sequencing. Whole Transcriptome Shotgun Sequencing (WTSS) refers to the use of high-throughput sequencing technologies to sequence cDNA in order to get information about a sample's RNA content, a technique that is quickly becoming invaluable in the study of diseases like cancer. Thanks to the deep coverage and base-level resolution provided by next-generation sequencing instruments, RNA-Seq provides researchers with efficient ways to measure transcriptome data experimentally, allowing them to determine how different alleles of a gene are expressed, to detect post-transcriptional changes, or to identify gene fusions. A method called mRNA-Seq has been used to sequence, at various levels of detail, the transcriptomes of four organisms: the fission yeast Schizosaccharomyces pombe, the budding yeast S. cerevisiae, the plant Arabidopsis, and the laboratory mouse (Graveley 2008) (Figure 27.9).
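
The cyclic nucleotide-addition scheme described for the Helicos approach (C, T, A, G order, 120 additions in total) can be sketched as a simple simulation. The example below assumes an idealized reaction in which the terminator allows exactly one base to be incorporated per addition and no errors occur; the function name and the example strands are hypothetical.

```python
def sequence_by_synthesis(strand_to_build, order="CTAG", total_additions=120):
    """Simulate one-nucleotide-at-a-time additions and return the read.

    strand_to_build: the strand the polymerase would synthesize (5' -> 3'),
    i.e. the complement of the captured template.
    """
    read = []
    position = 0
    for addition in range(total_additions):
        offered = order[addition % len(order)]
        # Incorporation happens only if the offered base is the next one
        # required; the terminator limits it to a single base per addition.
        if position < len(strand_to_build) and strand_to_build[position] == offered:
            read.append(offered)
            position += 1
    return "".join(read)


# A strand that matches the addition order is read in four additions.
print(sequence_by_synthesis("CTAG", total_additions=4))

# Repeated bases need a full extra round of additions each.
print(sequence_by_synthesis("GGA", total_additions=8))
print(sequence_by_synthesis("GGA", total_additions=11))
```

Because at most one base is added per addition and many additions offer a base that is not the next one required, the read length obtained from 120 additions depends on the sequence and is always shorter than 120 bases unless the strand happens to follow the addition order exactly.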

Figure 27.9 mRNA-Seq analyses isolated mRNA using one of the three methods a, b or c. In all the three cases the resulting DNA is analyzed by next-generation sequencing technologies

DNA SEQUENCING

Chemical-Cleavage Method DNA can be sequenced by a chemical procedure that breaks a terminally labeled DNA molecule partially at each repetition of a base. The lengths of the labeled fragments then identify the positions of that base. Maxam and Gilbert (1977) described reactions that cleave DNA preferentially at guanines, at adenines, at cytosines and thymines equally, and at cytosines alone. When the products of these reactions are resolved by size, by electrophoresis on a polyacrylamide gel, the DNA sequence can be read from the pattern of radioactive bands. The technique permits sequencing of at least 100 bases from the point of labeling. In this method, many DNA fragments of uniform length are isolated from restriction endonuclease digests. Copies of the DNA fragments are labeled at their 3′- or 5′-end with radioactive 32P. These fragments are then broken at various points by four separate chemical treatments, each treatment removing on average either one purine or one pyrimidine from any particular chain. The four treatments are shown in Table 27.1.

Table 27.1 Breakage of different DNA nucleotides by giving chemical treatments

Group   Treatment                Break point
I       Methylation              Breaks DNA at G
II      Acid (pH 2.0)            Breaks DNA at A and G
III     Hydrazine                Breaks DNA at T and C
IV      Hydrazine in high salt   Breaks DNA at C

The result of each such break in the DNA is a 32P-labeled fragment of a specific length that bands at a specific position on a gel subjected to an electric gradient. Gel electrophoresis can separate molecules that differ in length by a single nucleotide. For the analysis, the radioactive DNA molecules are divided into four groups, one for each treatment. Figure 27.10 illustrates the sequencing of an 11-nucleotide long DNA fragment that has been labeled at its 5′-end with 32P. The treatments are adjusted in such a way that on average only one nucleotide is removed from each DNA strand. A particular treatment therefore generates fragments of varying lengths, depending upon which nucleotide was removed. The products of the four treatments are placed in parallel lanes on a polyacrylamide gel at the negative pole of an electrophoretic apparatus. The pore size of the gel governs the mobility of the DNA molecules: the smaller the fragment, the faster it migrates towards the positive pole. After electrophoresis, the gel is covered with a sensitized film and an autoradiograph is taken. From the banding pattern, the nucleotide sequence of the DNA is then read in ascending order of fragment length, starting from the 5′ terminal end at the positive pole.
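
Reading a sequence from the four lanes of Table 27.1 follows a simple decision rule, sketched below. The example assumes a 5′-end-labeled strand and idealized, complete base specificity: a band in the G lane is called G, a band only in the A+G lane is called A, a band in the C lane is called C, and a band only in the T+C lane is called T. Cleavage at the first base leaves no labeled oligonucleotide, so the read starts at the second base. The sequence and function names are hypothetical.

```python
# Bases attacked in each lane (Groups I-IV of Table 27.1).
LANES = {"G": "G", "A+G": "AG", "T+C": "TC", "C": "C"}


def cleavage_bands(sequence):
    """Labeled fragment lengths seen in each lane for a 5'-labeled strand.

    Cleavage at the base with 0-based index i leaves a labeled fragment of
    length i; index 0 is skipped because no labeled fragment remains.
    """
    return {
        lane: {i for i, base in enumerate(sequence) if i > 0 and base in bases}
        for lane, bases in LANES.items()
    }


def read_gel(bands):
    """Call bases from shortest to longest fragment (5' -> 3')."""
    lengths = sorted(set().union(*bands.values()))
    calls = []
    for length in lengths:
        if length in bands["G"]:
            calls.append("G")
        elif length in bands["A+G"]:
            calls.append("A")
        elif length in bands["C"]:
            calls.append("C")
        elif length in bands["T+C"]:
            calls.append("T")
    return "".join(calls)


sequence = "GATCCAGT"  # hypothetical strand, written 5' -> 3'
bands = cleavage_bands(sequence)
print(read_gel(bands))
```

The shortest fragments migrate farthest, so reading the bands from the positive pole upwards gives the sequence in the 5′ to 3′ direction.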

Figure 27.10 Maxam and Gilbert DNA nucleotide sequencing technique for an 11-nucleotide long sequence

Enzymatic Method

Sanger et al. (1977) developed a different sequencing method. This method is similar to the "plus or minus" method of Sanger and Coulson (1975), but the newer method makes use of 2′,3′-dideoxy and arabinonucleoside analogs of the normal deoxynucleoside triphosphates, which act as specific inhibitors of DNA polymerase. The scheme comprises the following steps. DNA polymerase-generated single-stranded oligonucleotides are electrophoresed on polyacrylamide gels. Conditions are arranged such that the nucleotide sequence can be read directly from an autoradiograph of the electrophoresed polyacrylamide gel. The primer is elongated at its 3' end, towards the region of bases ATGCTG, using one nucleotide radioactively labeled with 32P. This comprises the plus series of four experiments. Synthesis of the new strand is halted at various points by insertion of dideoxy analogs, which lack the 3'-OH group of the deoxyribose sugar (Figure 27.11).

In the minus series of experiments, the oligonucleotide primer is elongated using only three of the deoxynucleoside triphosphates (e.g., dGTP, dCTP and dTTP are added, but dATP is missing). Each primer is elongated until the missing nucleotide is specified by the template; in the experiment shown, the termination point is just before dA. A minus experiment is performed leaving out each of the four nucleotides in turn. The DNA products of each of the four minus experiments are denatured and electrophoresed on a polyacrylamide gel, and autoradiography is then used to locate the radioactive DNA bands. The shortest fragments move the fastest in this analysis. The sequence 3'-TACGAC-5' can be read directly from the four minus lanes in the autoradiograph.

The sequence is simultaneously confirmed using the plus series of experiments. T4 DNA polymerase degrades double-stranded DNA from its 3' end. If one type of deoxynucleoside triphosphate (for example, dATP) is added during this process, the exonuclease action of the T4 DNA polymerase stops at nucleotides containing the same base as the free nucleotide (here, at nucleotides with the base A). The minus series thus stops just before an A, whereas the plus series degrades until an A is reached.

Figure 27.11 Dideoxy method of DNA sequencing. Structure of one dideoxynucleotide is shown. In the figure these nucleotides are indicated by asterisks

F. Sanger won the Nobel Prize twice: in 1958 for determining the amino acid sequence of a protein (insulin), and again in 1980, when F. Sanger and W. Gilbert were honored for their methods of DNA base sequencing.
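
Chain termination by dideoxy analogs lends itself to a short worked example. The sketch below is an idealized model under stated assumptions: the template (written 3' to 5'), the function names, and the premise that each lane contains every possible terminated chain are hypothetical choices for illustration, not details taken from the experiment described above.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def new_strand(template_3to5):
    """Strand synthesized 5'->3' on a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3to5)

def dideoxy_lanes(synthesized):
    """For each ddNTP lane, the lengths of chains ended by that analog.

    Lengths are counted in nucleotides added to the primer.
    """
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        lanes[base].append(length)
    return lanes

def read_lanes(lanes):
    """Read the gel from the shortest (fastest) fragment to the longest."""
    by_length = {n: base for base, lengths in lanes.items() for n in lengths}
    return "".join(by_length[n] for n in sorted(by_length))

template = "TACGAC"  # illustrative template, written 3'->5'
lanes = dideoxy_lanes(new_strand(template))
print(lanes)
print(read_lanes(lanes))  # sequence of the newly synthesized strand
```

Reading the lanes gives the sequence of the newly made strand; the template sequence is then obtained as its complement.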

Automated DNA Sequencing

Traditional methods of manual DNA sequencing utilize radioactive isotopes to label the DNA. Automated DNA sequencing utilizes fluorescent tracers instead of radioisotopes, thereby eliminating or significantly reducing the use of radioactive materials in some research laboratories (Caruthers 1985). Smith et al. (1986) developed a method for the partial automation of DNA sequence analysis. Fluorescence detection of the DNA fragments was accomplished by means of a fluorophore covalently attached to the oligonucleotide primer used in enzymatic DNA sequence analysis. A differently colored fluorophore was used for each of the reactions specific for the bases A, C, G and T. The reaction mixtures were combined and co-electrophoresed down a single polyacrylamide gel tube, the separated fluorescent bands of DNA were detected near the bottom of the tube, and the sequence information was acquired directly by computer. Automated DNA sequencers can read up to 96 DNA sequences in a 2-h period, which is extremely fast compared with manual DNA sequencing.

Automated DNA sequencing has the following advantages over manual DNA sequencing: (1) Radioactivity is not used. (2) Gel processing after electrophoresis and autoradiography are not needed. (3) The tedious manual reading of gels is not required, as data are processed in a computer. (4) The sequence data are directly fed into and stored in a computer. (5) The separation of the same reaction products can be repeated to recheck the results in cases of doubt, since the products can be stored for a long period of time. (6) It is extremely fast.

Connell et al. (1986) invented an automated DNA sequencer. A DNA sequencer is a scientific instrument used to automate the DNA sequencing process (Figure 27.12). It can also be considered an optical instrument, as it generally analyzes light signals originating from fluorochromes attached to nucleotides.
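
The four-color scheme can be illustrated with a toy base caller. This is a sketch under assumptions: the dye-to-base assignment, the peak list and the function name are hypothetical, and real instruments must also correct for dye mobility shifts and overlapping peaks, which are ignored here.

```python
# Hypothetical dye assignment; actual instruments use their own dye sets.
DYE_TO_BASE = {"blue": "C", "green": "A", "yellow": "G", "red": "T"}

def call_bases(peaks):
    """Turn (detection_time, dye) peaks into a base sequence.

    Shorter fragments reach the detector first, so sorting by time
    orders the peaks by fragment length.
    """
    return "".join(DYE_TO_BASE[dye] for _, dye in sorted(peaks))

# Peaks as they might be logged by the detector, in arbitrary order.
peaks = [(12.4, "red"), (10.1, "green"), (11.3, "yellow"), (13.0, "blue")]
print(call_bases(peaks))
```

Because all four reactions run in one lane, the color alone identifies the terminal base and the arrival time alone identifies its position.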
DNA sequencers have become more important with the growth of large genomics projects and the need to increase productivity. Modern automated DNA sequencers are able to sequence multiple samples in a batch (run) and perform as many as 24 runs a day. These instruments perform only the size separation and peak reading; the actual sequencing reaction(s), cleanup and resuspension in a suitable buffer must be performed separately. The magnitude of the fluorescent signal is related to the number of strands of DNA that are in the reaction. If the initial amount of DNA is small, the signals will be weak. However, the properties of PCR allow one to increase the signal by increasing the number of cycles in the PCR program.

Figure 27.12 Automated DNA Sequencer

A simple DNA sequencer has one or more lasers that emit at a wavelength absorbed by the fluorescent dye attached to the DNA strand of interest. It also has one or more optical detectors that detect at the wavelength at which the dye fluoresces. The presence or absence of a strand of DNA is then detected by monitoring the output of the detector. Since shorter strands of DNA move through the gel matrix faster, they are detected sooner, and there is thus a direct correlation between the length of a DNA strand and its time of arrival at the detector. This relationship is then used to determine the actual DNA sequence. The output of these machines is not perfect: it may contain reading errors and needs to be processed. Until a few years ago, this task was done manually by an operator. The assembly process for a contig made of 10 samples took about 5-15 minutes, depending on the quality of the samples. Today, however, modern software can process the output automatically in seconds.

Further improvements in automated DNA sequencers have been made using various systems such as slab-gel sequencing systems, capillary-gel sequencing systems, DNA sequencing using PCR, DNA sequencing through transcription, large-scale DNA sequencing using microarrays on DNA chips, pyrosequencing, DNA sequencing through transcriptional motion of RNA polymerase, amplification-free DNA sequencing, and third-generation genome sequencing. Eid et al. (2009) presented single-molecule, real-time sequencing data obtained from a DNA polymerase performing uninterrupted template-directed synthesis using four distinguishable fluorescently labeled deoxyribonucleoside triphosphates (dNTPs). They detected the temporal order of enzymatic incorporation of the dNTPs into a growing DNA strand with zero-mode waveguide nanostructure arrays, which provided optical observation volume confinement and enabled parallel, simultaneous detection of thousands of single-molecule sequencing reactions. Conjugation of fluorophores to the terminal phosphate moiety of the dNTPs allows continuous observation of DNA synthesis over thousands of bases without steric hindrance. The data report directly on polymerase dynamics, revealing distinct polymerization states and pause sites corresponding to DNA secondary structure. Sequence data were aligned with the known reference sequence to assay biophysical parameters of polymerization for each template position. Consensus sequences were generated from the single-molecule reads at 15-fold coverage, showing a median accuracy of 99.3 per cent, with no systematic error beyond fluorophore-dependent error rates.

In genome sequencing there is a quest for both accuracy and speed (Hayden 2009). The third-generation machine described at that time read 3 bases per second. The stated aims were to produce an entire human genome sequence in less than 3 minutes, with an accuracy greater than 99.9999 per cent.
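
Why coverage raises accuracy can be seen from a simple majority vote over aligned reads. The sketch below is only an illustration of that idea, not the consensus procedure used in the work cited above; the reads, the assumption that they are already aligned and of equal length, and the function name are hypothetical.

```python
from collections import Counter

def consensus(aligned_reads):
    """Majority base at each column of equal-length, pre-aligned reads."""
    columns = zip(*aligned_reads)
    return "".join(Counter(column).most_common(1)[0][0] for column in columns)

# Five hypothetical reads of the same region, two of them with one error each.
reads = [
    "ACGTAC",
    "ACGTAC",
    "ACCTAC",  # error at position 3
    "ACGTTC",  # error at position 5
    "ACGTAC",
]
print(consensus(reads))
```

Errors that fall at different positions in different reads are outvoted, which is why random errors matter less at high coverage than systematic ones.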

Strategies of Genome Sequencing

The shotgun approach

The shotgun approach involves breaking the genome into pieces, sequencing each of them, and then using a powerful computer to order the pieces by their sequence overlaps. Fleischmann et al. (1995) used this method to sequence the genome of Haemophilus influenzae. The advantage of this method is that it eliminates the need for time-consuming mapping. A problem arises, however, from repetitive nucleotide sequences, which make up a large part of vertebrate genomes: only a small fraction of such a genome (about 5 per cent or less) codes for protein, and much of the rest is repetitive. When a DNA segment containing repeats is randomly fragmented and then reassembled, the fragments may be assembled incorrectly, and sometimes the intervening information is lost. If the affected part contains an important gene, that gene will be missing from, or misplaced in, the assembled sequence.

Clone-by-clone approach

This approach involves breaking the genome into relatively large chunks, called clones. Each clone is then cut into smaller overlapping pieces, the pieces are sequenced, and the overlaps are used to reconstruct the sequence of the whole clone. Researchers in the human genome sequencing consortium combined shotgun sequencing with a BAC-based clone-by-clone approach. The procedure is as follows: (1) Start by breaking the genome into overlapping fragments, 100-200 kbp in size. (2) Insert these into BACs and clone them in E. coli. (3) Map each of these to its correct position in the genome, using a restriction enzyme-generated "fingerprint" of the clone to match a spot on a restriction map of the entire genome. (4) Take each BAC, shear it into pieces, and determine the sequence using the shotgun approach. Eventually, the BAC sequences are pieced together to make the whole genome. As an analogy, first separate the copies of a book into individual pages, and then shred each page into its own separate pile; then work on one page at a time. The clone-by-clone approach produced the first draft of the human genome, which was announced in June 2000 and published in February 2001, followed by the finished sequence in April 2003; reports appeared in the journals Nature and Science.

In the strict sense, the term synthesis of nucleic acids refers to in vitro or organo-chemical production of nucleic acids (deoxyribonucleic acid and ribonucleic acid) with or without a template. However, some authors also include in vitro synthesis of nucleic acids through the use of a template in this category. Here we discuss both RNA and DNA synthesis.
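
Returning briefly to the shotgun strategy described above, ordering fragments by their sequence overlaps can be sketched as a greedy merge. This is a toy model under assumptions: the fragments, the minimum overlap of three bases and the function names are hypothetical, reads are taken to be error-free and from one strand, and real assemblers use far more elaborate methods.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no overlaps left: the remaining pieces stay as separate contigs
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags

fragments = ["CTGAAGTC", "ATGCGTAC", "GTACCTGA"]  # hypothetical reads
print(greedy_assemble(fragments))
```

If the same short sequence occurs at several places in the genome, more than one merge becomes equally plausible, which is the repeat problem noted in the text.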

RNA SYNTHESIS

In vivo RNA synthesis using DNA as a template is termed transcription; this type of RNA synthesis is not discussed here. Only RNA synthesized in vitro is discussed.

RNA Phosphorylase Method

Grunberg-Manago and Ochoa (1955) isolated an enzyme specifically involved in RNA metabolism. This enzyme, RNA phosphorylase (polynucleotide phosphorylase), could link ribonucleotides together into a long RNA chain or break such a chain into smaller sections. In contrast to DNA polymerase, this enzyme could carry out polymerization in the absence of a primer, and RNA synthesis proceeds easily no matter what proportions of the four ribonucleotides are used, as long as they are diphosphates rather than mono- or triphosphates. If only one kind of ribonucleotide is provided, RNA chains consisting wholly of a single repeating base are obtained. For example, adenine ribonucleotide produces poly(A) chains. The random incorporation of ribonucleotides into the chain, even in the presence of DNA, indicates that this enzyme is not responsible for DNA-directed RNA synthesis. In vivo, therefore, the function of RNA phosphorylase is probably not that of synthesizing RNA molecules, but of breaking down RNA molecules that are no longer necessary.

RNA Polymerase Method

An enzyme that is probably more important for the synthesis of RNA in the cell was found by Weiss (1960), Hurwitz et al. (1960) and Stevens (1960). This enzyme, RNA polymerase, functioned only in the presence of DNA as a primer and joined together ribonucleotides that had been added in the form of triphosphates. The DNA primer was not affected by the reaction, but remained fully biologically active. Thus, the primer DNA served as a template for in vitro RNA synthesis.

Homopolymer RNA Synthesis Method

Nirenberg and Matthaei (1961) synthesized RNA to crack the genetic code. They synthesized homopolymers using only uracil, only adenine, only cytosine or only guanine to obtain poly-U, poly-A, poly-C or poly-G, and used them in cell-free synthesis of polypeptides. The first three of these homopolymers produced polyphenylalanine, polylysine and polyproline, respectively. Experiments with poly-G were not successful. After this, Nirenberg and co-workers synthesized RNA using two or more bases in order to assign more codons to amino acids.
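
The reasoning behind the homopolymer experiments can be shown with a very small translation sketch. Only the four homopolymer codons of the standard genetic code are included; the function name and the chosen chain length are illustrative assumptions.

```python
# The four homopolymer codons of the standard genetic code.
HOMOPOLYMER_CODONS = {"UUU": "Phe", "AAA": "Lys", "CCC": "Pro", "GGG": "Gly"}

def translate_homopolymer(base, n_codons):
    """Translate a homopolymer RNA (e.g. poly-U) codon by codon."""
    rna = base * (3 * n_codons)
    return [HOMOPOLYMER_CODONS[rna[i:i + 3]] for i in range(0, len(rna), 3)]

for base in "UAC":
    print(f"poly-{base}:", "-".join(translate_homopolymer(base, 4)))
```

Because a homopolymer contains only one possible triplet whatever the reading frame, the amino acid it incorporates identifies that codon unambiguously.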

Single-Stranded DNA Virus Method

According to Chamberlin and Berg (1963), single-stranded φX174 DNA forms a complementary structure with its RNA product, which can be isolated as a hybrid DNA/RNA molecule. RNA production in this case occurs through the expected base-pairing mechanism. Double-stranded φX174 DNA, on the other hand, does not produce hybrid DNA/RNA molecules. Nevertheless, production of RNA is presumed to occur through splitting of the hydrogen bonds between the two DNA strands, and consequent pairing between RNA nucleotides and the exposed DNA nucleotides in the split region.

Phage RNA Polymerase Method

SP6, T7 and T3 phage RNA polymerases have high specificity for their respective 23-base promoters. The development of cloning vectors containing promoters for these polymerases has made the in vitro synthesis of single-stranded RNA molecules a routine laboratory procedure. Modern multipurpose cloning vectors contain multiple cloning sites (MCSs) flanked on each side by promoters for different polymerases. This allows the synthesis of either sense or antisense RNAs from sequences cloned into the multiple cloning sites. Plasmid templates are generally linearized with a restriction enzyme so that run-off RNA transcripts with a defined end can be synthesized. In addition to plasmid DNA, PCR products and synthetic oligonucleotides can be used as templates for transcription reactions. For PCR products, one of the primers needs to include the promoter sequence of a phage polymerase so that the PCR product contains a phage promoter. Synthetic oligonucleotides need to contain the phage promoter, which must be double-stranded; the remainder of the template, however, need only be single-stranded.
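
Adding a phage promoter to a PCR primer and predicting the run-off transcript can be sketched as follows. This is an illustration under assumptions: the 18-nucleotide T7 core promoter used here (with transcription starting at its final G) is the commonly quoted minimal sequence rather than the full 23-base promoter mentioned above, and the gene-specific sequence and function names are hypothetical. Sequences should be checked against the supplier's documentation before use.

```python
# Commonly quoted minimal T7 promoter; transcription starts at the final G.
T7_CORE_PROMOTER = "TAATACGACTCACTATAG"

def add_promoter(gene_specific_primer):
    """Forward PCR primer carrying the promoter at its 5' end."""
    return T7_CORE_PROMOTER + gene_specific_primer

def runoff_transcript(top_strand):
    """RNA made from a linear template, from the start site to the template end."""
    start = top_strand.find(T7_CORE_PROMOTER)
    if start < 0:
        raise ValueError("no T7 promoter found in template")
    first_base = start + len(T7_CORE_PROMOTER) - 1  # final G of the promoter is +1
    return top_strand[first_base:].replace("T", "U")

template = add_promoter("GGAGACCATG")  # hypothetical insert sequence
print(template)
print(runoff_transcript(template))
```

The transcript ends where the linear template ends, which is why plasmids are cut downstream of the insert before transcription.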

DNA SYNTHESIS

DNA synthesis commonly refers to DNA replication or DNA biosynthesis (in vivo DNA amplification), polymerase chain reaction or enzymatic DNA synthesis (in vitro DNA amplification), oligonucleotide synthesis or chemical synthesis of nucleic acids, and gene synthesis (physically creating artificial gene sequences). In the narrower sense used here, DNA synthesis is a process by which strands of nucleic acid are created in vitro. Some authors treat DNA replication and DNA synthesis as synonyms. In vivo synthesis of DNA is called DNA replication and is not dealt with here; the methods of in vitro DNA synthesis are discussed instead. One approach is an organo-chemical process of DNA formation that utilizes neither a DNA template nor DNA polymerase. A second approach is in vitro DNA replication of a whole genome. In vitro synthesis of a particular gene can also be carried out from isolated messenger RNA through reverse transcription, or on the basis of a true precursor of transfer RNA. Peptide nucleic acid is a useful tool in DNA synthesis. DNA synthesis is discussed here from these different points of view.

Organo-Chemical DNA Synthesis

DNA of defined structure that is synthesized organo-chemically from the natural components of DNA is known as synthetic DNA. The key step in making synthetic DNA is formation of the internucleotide phosphate linkage. The approaches used in the synthesis of DNA include (a) the phosphoryl chloridate method (Michelson and Todd 1955; Hogrefe 2008), (b) the phosphodiester method (Agarwal et al. 1970), (c) solid-phase synthesis, following solid-phase peptide synthesis (SPPS) (Merrifield 1963), (d) the phosphotriester method (Letsinger and Mahadevan 1965; Letsinger et al. 1969; Letsinger et al. 1975; Letsinger and Lunsford 1976), and (e) the phosphoramidite approach (Beaucage and Caruthers 1981).

H.G. Khorana and his team (Agarwal et al. 1970; Khorana et al. 1972; Agarwal et al. 1976; Khorana 1979) developed a method for total synthesis of a given gene. Advances were made in this area of research after the discovery of (a) polynucleotide ligase and (b) polynucleotide kinase. This method of DNA synthesis does not use any template or polymerase, but it requires that the nucleotide sequence of the DNA fragment (gene) to be synthesized be known. The sequence of the 77-nucleotide alanine tRNA (tRNAAla) of yeast was first determined by Holley et al. (1965). The procedure involves blocking those reactive groups of the nucleotides that are not required to participate in the reaction, and removing the blocks from the groups that are required to participate. Table 27.2 lists the treatments used for blocking the reactive groups of the deoxyribonucleotides and for removing the blocks.

Table 27.2 Blocking and reactivating of different molecules for construction of short oligonucleotides, as shown in Figure 27.13

Molecule                               Blocking agent            Reactivating agent
5'-OH group of deoxyribonucleotide     Monomethoxytrityl (mmt)   Mild acid
3'-OH group of deoxyribonucleotide     Acetyl group (ac)         NaOH
Amino group of heterocyclic bases:
  Adenine                              Benzoyl (bz)              Aqueous ammonia
  Cytosine                             Anisoyl (an)              Aqueous ammonia
  Guanine                              Isobutyryl (ib)           Aqueous ammonia

Let us assume that the sequence of the desired polynucleotide is 5'-GGAAGCTTAAC-3'. Synthesis starts at the 5'-end of the chain, with a nucleotide carrying a 5'-monomethoxytrityl group. The second nucleotide carries an acetyl group at its 3'-end. The nucleotides are joined together by a tri-isopropylbenzenesulphonyl chloride (TPS)-mediated condensation reaction (Figure 27.13). In this way, different di-, tri- and tetra-nucleotide blocks are formed. These di-, tri- and tetra-nucleotides are then joined by the same method to prepare longer fragments (Figure 27.14). In this stepwise manner, proceeding from the 5' to the 3' end, the desired polydeoxynucleotide sequence is synthesized. A DNA fragment synthesized by the above method will be in the form 5'-G(ib).G(ib).A(bz).A(bz).G(ib).C(an).T.T.A(bz).A(bz).C(an)-OH-3', where the base-protecting groups are shown in parentheses. The 5'-OH group is then phosphorylated with polynucleotide kinase, giving the fragment 5'-pG(ib).G(ib).A(bz).A(bz).G(ib).C(an).T.T.A(bz).A(bz).C(an)-OH-3'.
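
The protection scheme of Table 27.2 can be written out mechanically for any short block. The sketch below is only a notation helper: the function name and the output format (protecting groups in parentheses, mmt at the 5'-end and acetyl at the 3'-end) are assumptions chosen for this illustration.

```python
# Base-protecting groups from Table 27.2; thymine has no amino group to block.
BASE_PROTECTION = {"A": "bz", "C": "an", "G": "ib", "T": None}

def protected_block(seq):
    """Write a fully protected oligonucleotide block in shorthand notation."""
    units = []
    for base in seq:
        group = BASE_PROTECTION[base]
        units.append(base if group is None else f"{base}({group})")
    return "5'-mmt-" + ".".join(units) + "-acetyl-3'"

print(protected_block("GGA"))
print(protected_block("GGAAGCTT"))
```

Written this way, it is easy to see which blocking group each later treatment (mild acid, NaOH or aqueous ammonia) has to remove.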

Sequence of polynucleotide segment to be synthesized: 5' G-G-A-A-G-C-T-T 3'
(base-protecting groups are shown in parentheses)

5' mmt-G(ib)-OH + HO-PO3-G(ib)-acetyl 3'
      |  TPS-mediated condensation
5' mmt-G(ib).G(ib)-acetyl 3'
      |  NaOH (acetyl group removed)
5' mmt-G(ib).G(ib)-OH + HO-PO3-A(bz)-acetyl 3'
      |  TPS-mediated condensation
5' mmt-G(ib).G(ib).A(bz)-acetyl 3'
      |  Other blocks can be synthesized likewise
5' mmt-G(ib).G(ib).A(bz).A(bz).G(ib).C(an).T.T-acetyl 3'
      |  Removal of protective groups
5' G-G-A-A-G-C-T-T 3'

Figure 27.13 Procedure for chemical synthesis of oligonucleotide fragments. Di- and tri-nucleotide fragments synthesized in this manner are joined together as shown in Figure 27.14

Figure 27.14 Stages in the synthesis of a piece of double helix


Small sections of DNA are synthesized chemically, nucleotide by nucleotide, so that some nucleotides in one section are complementary to those in another section, thereby forming "overlapping fragments". Fifteen oligonucleotides, ranging in length from 5 to 20 nucleotides, were synthesized. From these 15 single-stranded oligonucleotides, three double-stranded fragments, A, B and C, were assembled. Each of these three fragments had one or two single-stranded termini, through which the fragments associate in a reaction mixture by complementary base pairing. The joining enzyme, polynucleotide ligase, then covalently joins adjacent segments. Other pairs are assembled in a similar way, and all are then joined together to form a DNA double helix 77 nucleotides long, one strand of which is complementary to the nucleotide sequence of the tRNA molecule. Following this procedure, synthesis of the double-stranded DNA corresponding to the alanine tRNA of yeast was completed by Khorana's group. Finally, the protective groups are removed. Machines are now available that automate the synthesis of oligonucleotides at a rate of one nucleotide unit per half an hour. Some examples of organo-chemical synthesis of genes are alanyl tRNA in yeast (Agarwal et al. 1970); human hormone somatostatin (Itakura 1977); proinsulin (Gilbert 1978); human insulin A and B (Goeddel et al. 1979); tyrosine tRNA precursor in Escherichia coli (Khorana 1979); N-deacetylthymosin (Wetzel et al. 1980); an interferon (IFN) gene (Edge et al. 1981); human leukocyte interferon (Houghton et al. 1980); and pre-proinsulin (Brousseau et al. 1983; Narang 1984).

In Vitro DNA Synthesis using DNA Template

Arthur Kornberg focused his efforts on in vitro DNA synthesis using a template. To find the crucial enzyme in broken-cell extracts of E. coli, he added ATP plus the appropriate nucleotides, tagged with radioactive isotopes so that their incorporation into the nucleic acid chain could be traced, and then added DNA as a primer for the chain. It took many months to achieve a reliable trace of the synthesis with radioactive thymidine, so that the enzyme's activity could be followed (Kornberg et al. 1956). Next, Kornberg had to isolate and purify the DNA-assembling enzyme, which he named DNA polymerase, from the bacterial cell extract, separating it from all the other proteins (including many enzymes that interfere with the synthesis) using a wide range of procedures. Within a year, Kornberg was able to synthesize DNA from a variety of sources with this polymerase. Thus, Kornberg (1957) isolated a DNA polymerase from E. coli that could be used for the in vitro synthesis of DNA. Arthur Kornberg was awarded the Nobel Prize in 1959 for in vitro synthesis of polydeoxyribonucleotides. The reaction system used in the synthesis of nucleic acids is given in Figure 27.15. If even one of the four nucleotides is absent, no DNA synthesis will occur.

n dA-(P)~(P)~(P) + n dT-(P)~(P)~(P) + n dG-(P)~(P)~(P) + n dC-(P)~(P)~(P)
      |  Template DNA, Kornberg enzyme, Mg++
      v
Template DNA-[dA-P, dT-P, dG-P, dC-P~P~P]n + (4n-1) PP

Figure 27.15 Reaction system used in synthesis of nucleic acids


The Kornberg enzyme acts by joining the free added nucleotide units into a DNA strand. Using different phosphodiesterase enzymes (which break the phosphate-sugar ester bonds of the DNA chain), it was shown that a new DNA molecule grows by addition of nucleotides to the hydroxyl group at the 3' carbon of the deoxyribose sugar at one end of the chain. The complementary strands of the DNA helix were found to differ in polarity, the sugars of one strand being oriented in a direction opposite to those of the other. The DNA template is considered a master copy upon which synthesis of DNA occurs. The necessity of template DNA for in vitro DNA synthesis is not absolute: in the absence of a template, and after a long lag period, the polymerase will nevertheless synthesize a polynucleotide chain that appears to be a normal double helix. In this fashion synthetic DNA polymers such as poly(dA.T) and poly(A.BU) (BU = bromouracil) have been manufactured. The synthetic DNA polymers were used to confirm the semi-conservative mode of DNA replication.

In vitro synthesis of biologically active DNA is also possible. Goulian et al. (1967) were able to replicate φX174 DNA in a test tube. The scheme used by these workers is given in Figure 27.16. They took a tritium-labeled (3H) φX174 plus (+) strand (parental) and used it as a template for synthesis of a minus (–) strand. In the minus (–) strand, the usual T was replaced by its heavier analog 5-bromouracil, and the strand was labeled with 32P. This was done to differentiate minus (–) strands from parental plus (+) strands. The minus (–) strand was closed into a circular DNA molecule with DNA ligase. The double helices were briefly exposed to DNase and then denatured by heat to allow isolation, through centrifugation, of the heavier circular minus (–) strands. These strands were then used as templates in a fresh reaction mixture with DNA polymerase and ligase; the product was a completely synthetic DNA molecule just like the natural replicative form. Treatment with DNase permitted separation of the newly synthesized (+) strands, which were later shown to be fully infective when exposed to bacterial protoplasts. Thus the progeny of such infections were normal φX174 viruses.

In Vitro Synthesis of DNA from mRNA

Temin and Mizutani (1970) and Baltimore (1970) independently discovered an RNA-directed DNA polymerase (reverse transcriptase), an enzyme able to synthesize DNA on an RNA template. This enabled molecular biologists to synthesize complementary DNA using an mRNA template. If the mRNA transcribed from a specific gene is available in purified form, the complementary DNA (cDNA) synthesized with its help will represent the synthesized gene. By copying purified eukaryotic mRNA, several genes have been artificially synthesized. The most important of these are the genes for sea urchin histone proteins, the ovalbumin gene of chicken and the globin gene of mammals. The gene synthesized as cDNA from a β-globin mRNA was inserted into a plasmid in order to study its behavior. The extent to which such a synthesized gene is a faithful copy of the native gene depends on the fidelity of copying the mRNA and on the stability of the DNA thus synthesized. Moreover, since the mRNA of a gene does not carry the complete transcript of the gene in vivo, the synthesized gene will be smaller than the gene in vivo: it lacks the regulatory sequences that are absent from the mRNA but present in the globin gene in vivo. This type of synthesized gene will also lack the intron sequences found in eukaryotic split genes. Such genes have already become a very important tool in molecular biology experiments.
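
The relationship between an mRNA and its first-strand cDNA can be shown with a short sketch. The mRNA sequence and the function name are hypothetical; the code models only base pairing and strand polarity, not the priming or second-strand steps of an actual cDNA synthesis.

```python
# DNA base that pairs with each RNA base.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna_5to3):
    """First-strand cDNA, written 5'->3', for an mRNA written 5'->3'.

    The cDNA is antiparallel to the mRNA, so the mRNA is read backwards
    while each base is replaced by its DNA complement.
    """
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna_5to3))

mrna = "AUGGCCUAA"  # hypothetical spliced mRNA
print(first_strand_cdna(mrna))
```

Since the input is the spliced mRNA, the cDNA carries neither introns nor upstream regulatory sequences, which is the limitation described in the text.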

Synthesis of DNA (Gene) from a True Precursor tRNA

Before Khorana could complete the synthesis of the gene for yeast alanine tRNA in 1970, it became obvious that tRNA was not the first direct product of transcription. Instead, a precursor molecule (pre-tRNAAla) is synthesized first and subsequently, after losing segments of RNA by cleavage, gives rise to the tRNA. The actual gene for yeast alanine tRNA was therefore longer than the DNA duplex synthesized by H.G. Khorana and his team. In view of this, Khorana subsequently initiated synthesis of a gene for the E. coli tyrosine suppressor tRNA precursor. The DNA duplex that gives rise to this tRNA precursor was synthesized in the form of 26 small (oligonucleotide) segments. These segments were then arranged into six DNA duplex fragments having single-stranded ends, and the six fragments gave rise to the presumed gene for the E. coli tyrosine suppressor tRNA precursor. This gene, however, still lacked the promoter region and other sequences essential for processing. Later, in 1979, Khorana and his team reported completion of the total synthesis of a biologically functional tyrosine suppressor tRNA gene carrying all the regulatory sequences as well. The gene was 207 bp long, having a 51-bp promoter region, a 126-bp region corresponding to the precursor tRNA, and a 25-bp duplex region containing a 16-bp natural sequence that includes an EcoRI endonuclease-specific cleavage site. The complete gene was cloned in a vector. On transformation of E. coli, the phage carrying the cloned gene could multiply.

Figure 27.16 Procedure for synthesis of biologically active DNA (Redrawn, with permission, from Goulian, M., A. et al. 1967. Proc. Natl. Acad. Sci. USA 58: 2321-8)

GENE SYNTHESIS MACHINES

The development of gene synthesis machines revolutionized this field (Caruthers 1985). Genes can now be synthesized rapidly. For example, synthesis of the 77-bp gene for tRNAAla of yeast, which took H. G. Khorana and his co-workers more than 25 years, can now be accomplished within a matter of days with the help of gene synthesis machines. The key innovations that made gene synthesis machines possible include (a) the development of silica-based supports, which are insoluble and provide support for the solid-phase synthesis of DNA chains, and (b) the development of deoxyribonucleoside phosphoramidites as synthons, which are remarkably stable towards oxidation and hydrolysis and are thus ideal for DNA synthesis. Several versions of gene machines are now available. These machines, under the control of a microprocessor, automatically synthesize specific short sequences of single-stranded DNA. The desired sequence is entered on the keyboard, and the microprocessor automatically opens the valves of the containers of the successive nucleotides, reagents and solvents needed at each step, delivering them into a synthesizer column packed with tiny silica beads. These beads provide the support on which the DNA molecules are assembled (Figure 27.17).

Synthesis of Gene Using PCR

The DNA/RNA synthesizer provides a complete and automated procedure for the synthesis of DNA sequences. Each base unit is added in a 30-minute cycle, permitting a tetradecamer to be constructed in 6½ h (Alvarado-Urbina et al. 1981).

Figure 27.17 Gene synthesis machine (Redrawn, with permission, from Gupta, P.K. 2009. Genetics. Meerut: Rastogi Publi.)

The methods for gene synthesis utilized earlier, whether manual or through automatic synthesizers, involved the following steps: (1) synthesis of oligos; (2) annealing to give duplexes with single-stranded cohesive or sticky ends; and (3) ligation of duplexes to obtain the complete gene. This strategy was initially used by Khorana and was subsequently followed by many others. Even with automatic synthesizers, only the oligos are synthesized on the machine; the gene is produced from these oligos later, following the usual procedure. Crude synthetic oligonucleotides were sometimes also used for rapid generation of DNA fragments through the polymerase chain reaction (PCR). In all these procedures, the oligos are first phosphorylated at their 5′ ends, the overlapping ends are annealed, gaps are filled by enzymatic extension at the 3′ ends, and finally the nicks are joined with DNA ligase. The full-length double-stranded DNA can be cloned in a plasmid or phage vector and multiplied in E. coli. Alternatively, it can be amplified by PCR, separated on a gel, purified from the gel and cloned. In some of the recent methods of gene synthesis, the above steps of phosphorylation, annealing, ligation and cloning can be dispensed with. For instance, overlapping oligonucleotides can be extended through several rounds of PCR, and the full-length double-stranded DNA can then be amplified using short primers corresponding to the 5′ ends of the two strands.
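
The synthesis time quoted above for the DNA/RNA synthesizer follows from simple arithmetic, sketched below. The assumption that the first nucleoside is already attached to the support, so that a chain of n units needs n - 1 addition cycles, is an interpretation that reproduces the quoted figure; the function name and parameters are hypothetical.

```python
def synthesis_hours(length, minutes_per_cycle=30, first_base_on_support=True):
    """Time needed to build an oligonucleotide of the given length.

    If the first nucleoside is pre-attached to the support, only
    (length - 1) addition cycles are required.
    """
    cycles = length - 1 if first_base_on_support else length
    return cycles * minutes_per_cycle / 60

print(synthesis_hours(14))  # tetradecamer at 30 minutes per cycle
print(synthesis_hours(77))  # a 77-unit single strand at the same rate
```

Under this reading, 13 cycles of 30 minutes give the 6½ h stated for a tetradecamer.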

Automated Solid-Phase Technique for DNA Synthesis

DNA can be synthesized by an automated solid-phase technique, just like polypeptides. A monomer is attached to an insoluble support in a column, and activated monomers are then added in the order of the desired sequence to form an oligonucleotide. The activated monomers are deoxyribonucleoside 3′-phosphoramidites. In each activated monomer, the reactive hydroxyl group on the fifth carbon (5′) is blocked with a dimethoxytrityl (DMT) group, which keeps the 5′-end of the deoxyribonucleoside 3′-phosphoramidite from reacting further; DMT is therefore called a protecting group. The 3′-phosphoryl group is protected by a β-cyanoethyl (βCE) group. Since it is the 3′-end that is attached to the insoluble support, the chain is synthesized in the 3′ to 5′ direction: the 5'-hydroxyl group of the growing chain performs a nucleophilic attack on the 3′-phosphoramidite of the incoming monomer. This differs from the regular 5' to 3' synthesis of natural DNA and RNA, where the 3′-hydroxyl group attacks the innermost phosphorus atom of the 5′-triphosphate. The synthesis of an oligonucleotide consists of the following three basic steps.

Step one. An incoming deoxyribonucleoside 3′-phosphoramidite (carrying DMT to protect the 5′-hydroxyl group and βCE to protect the 3'-phosphoryl group) reacts with the growing chain to form a phosphite triester group through coupling. This coupling reaction is driven by the nucleophilic 5′-hydroxyl group of the growing chain attacking the electrophilic phosphorus atom of the incoming monomer, with the -NR2 group, often diisopropylamine (a relatively stable secondary amine), as the leaving group. The reaction is carried out in an anhydrous environment, because water would hydrolyze the 3′-phosphoramidite and prevent coupling.
Once water has attacked the 3′-phosphoramidite, the monomer can no longer react with the nucleophilic 5′-OH of the growing chain.

Step two. Using iodine (I2), the phosphite triester group is oxidized to a phosphotriester group.

Step three. The DMT group at the 5′ end of the newly added monomer is removed by adding dichloroacetic acid (DCA, CHCl2COOH), while the rest of the molecule remains unchanged. The oligonucleotide is now extended by one monomer unit and is ready to react with another incoming activated monomer.

When an oligonucleotide of the desired length has been synthesized, the final product is obtained by adding ammonia (NH3), which removes all the protecting groups and releases the product from the insoluble support. Because no synthesis is perfect, not all of the growing oligonucleotides react with the added monomer in every cycle. Some oligonucleotides will therefore be shorter than others because nucleotides are missing. These are considered impurities, as they do not have the exact sequence wanted. The desired product is the longest one, and it can be separated from the mixture of newly synthesized oligonucleotides by gel electrophoresis. In practice, after all the synthesis steps, the mixture of products is placed in concentrated ammonium hydroxide (NH4OH) for an hour at room temperature. Next, the mixture with ammonium hydroxide is placed in an ice-bath and then transferred to a screw-cap vial. In order to remove the heterocyclic base protecting groups, the solution in the vial is heated at 55 °C overnight. The next day, the solution is cooled in an ice-bath and evaporated to dryness.

DNA synthesizers have been used to synthesize millions of high-quality oligonucleotides across a broad range of applications. Oligonucleotide synthesizers are flexible enough to fit a wide variety of DNA synthesis applications, including gene synthesis, PCR, dual-labeled probes, microarrays, sequencing and antisense experiments.
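
The statement that not every chain reacts in every cycle can be made quantitative with a simple sketch. It assumes that each coupling cycle succeeds independently with the same efficiency and that failed chains are not extended further; the efficiency of 0.99 used below is an illustrative assumption, not a value taken from the text.

```python
def full_length_fraction(length, coupling_efficiency):
    """Fraction of chains that received every monomer.

    An oligonucleotide of `length` nucleotides needs length - 1 coupling
    cycles, because the first monomer is already attached to the support.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    return coupling_efficiency ** (length - 1)


# Assumed coupling efficiency of 99% per cycle (hypothetical value).
for n in (20, 50, 100):
    print(n, round(full_length_fraction(n, 0.99), 3))
```

Under these assumptions the share of full-length product falls steadily as the oligonucleotide gets longer, which is why the final mixture has to be purified.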

DNA FINGERPRINTING

DNA fingerprinting, also known as DNA profiling, is a method of establishing the identity of an unidentified body by tracing and matching "signatures" peculiar to an individual of a species. DNA is a highly stable biochemical molecule that can retain its characteristics over millennia. Climate and the immediate environment can lead to bacterial contamination of a cell, causing a certain extent of degradation. The chances of positive identification, however, are not entirely lost. This was shown when DNA was recovered from mummies found in Egyptian tombs; reports of DNA from far older remains, such as dinosaurs, are contested. While the time elapsed since a person's death is not the most important factor, the quantity of sample collected is of vital importance. However, through the polymerase chain reaction (PCR), even a minute quantity of DNA can be amplified.

The DNA fingerprinting technique is useful in establishing the near-certain identity of an unidentified body. It is based on the fact that every organism carries DNA variations in the form of variable-number-of-tandem-repeats (VNTRs). The DNA profile of an individual is also known as its genetic signature. The reported accuracy of this method, an error probability of 1 in 75 billion, is encouraging the scientific community and forensic experts to promote it as an infallible test. DNA fingerprinting using microsatellites, also called simple-sequence repeats (SSRs), has also been utilized in both plants and animals (Jeffreys et al. 1985; Burke et al. 1991; Jeffreys 2005). Except in the case of identical twins, no two human beings share the same DNA sequence. Any cellular material such as hair, blood or semen that is left at the scene of a crime can potentially serve as a source for DNA bar-code analysis. The analysis allows positive identification of an individual's bar-code pattern and comparison of an unknown specimen with a known specimen in an effort to determine whether they have a common origin.
If this process receives continued judicial acceptance, it could be the single greatest advancement for criminal law courts towards convicting the guilty and acquitting the innocent. With a few exceptions, the composition of a person's DNA does not vary from cell to cell. DNA bar-code analysis focuses on the polymorphisms, or the junk DNA. Not all parts of the DNA vary among individuals. Variation in the segments that do vary can be detected as restriction fragment length polymorphisms (RFLPs); scientists measure DNA size differences through the RFLP process. The polymerase chain reaction (PCR) can be thought of as molecular photocopying. PCR itself does not analyze DNA; rather, it makes possible the application of other techniques where only minute biological samples are available. PCR allows a scientist to take a sample that would ordinarily contain too little DNA to assess and to produce enough DNA copies for examination by a number of technologies, including RFLP.

It is possible to identify a person by his or her genetic makeup. The ideal sample should be at least half the size of a sugar cube. A good tissue sample comes from the spleen, bone marrow or muscle of the thigh or upper arm. Where the body is in a more advanced stage of decomposition, the best samples are from spongy bones, and the best choices are the rib bones. A few hair roots or a small sample of blood, buccal smear, semen spots or skin tissue left behind by a criminal is sufficient to obtain genetic fingerprints. These can be compared with those of the suspect to help confirm a charge such as rape. Vaginal swabs are particularly suited for DNA analysis because they are generally collected and stored frozen. Blood stains stored under favorable conditions have been successfully analyzed up to three years, and semen stains up to four years, after collection. It is possible to separate the DNA of male origin when semen is mixed with vaginal secretions. A vaginal swab taken up to 20 h after intercourse can be used to isolate DNA from sperm. Experts maintain that blood transfusion does not cause a change in one's DNA. The white blood corpuscles, the source of donor DNA, are present in very low quantity in most of the commonly transfused blood components. In the case of vasectomized individuals, experts feel that it is still possible in some circumstances to get a DNA pattern from a semen sample, but the chances of getting a DNA fingerprint are considerably reduced.

Applications of DNA Fingerprinting

DNA fingerprints are used in pedigree analysis and in establishing paternity and maternity. The patterns are so specific that half of a child's DNA fragments (RFLPs) will be shared with the father and half with the mother. Figure 27.18 shows DNA fingerprints (profiles) used in a paternity dispute. The RFLP patterns of the father (A), the mother (B) and two children (C and D) are shown. The profiles show that child C has inherited one DNA band from each parent, whereas child D has one band inherited from the mother and an unrecognized band that is absent from A. A is therefore not the father of child D.

Figure 27.18 DNA fingerprints (profile) used in a paternity dispute. RFLP patterns of father, mother and two children are shown here.

DNA profiling is nowadays used by immigration authorities in establishing family relationships. The technique has considerable potential in forensics. DNA fingerprints can be used for personal identification. They can also be used for diagnosing inherited disorders such as cystic fibrosis, hemophilia, Huntington's disease, familial Alzheimer's disease, thalassemia, sickle cell anemia and many others in prenatal and newborn babies. DNA fingerprinting is also applicable to animals for livestock breeding and to plants for authentication of seeds and germplasm. There is a need for stringent quality control checks in DNA fingerprinting laboratories worldwide. Identification of mutilated dead bodies is possible from their tissue remnants with the help of DNA fingerprints of close relatives. DNA fingerprinting can be used in social security record identification. In murder cases, DNA fingerprints obtained from blood on the murder weapon, or on the clothes of the accused, can be compared with those of the victim; the two should be identical.

In sexing biological samples, DNA fingerprinting can be carried out by in situ hybridization with Y chromosome-specific probes. It can detect specific bands in close linkage with disease loci in large pedigrees, and marker loss in tumors, which can then be used for diagnosis; this may help in identifying and cloning the defective gene. DNA fingerprinting of mammalian cell cultures is used to confirm genetic homogeneity or detect contamination, and to verify cell hybrids and monoclonal cell lines. DNA fingerprinting can also be used in the identification of post-natal cell populations; in providing assurance in livestock breeding, and hence in animal breeding programs; in the authentication of seed stocks and germplasm; and in demographic studies of animal populations.
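
The reasoning used in the paternity dispute above (every band of a child must be traceable to the mother or to the father) can be sketched as a simple set comparison. The band sizes below are hypothetical and are not read from Figure 27.18; the function names are likewise illustrative.

```python
def unexplained_bands(child, mother, alleged_father):
    """Bands in the child's profile found in neither parent's profile."""
    return sorted(set(child) - set(mother) - set(alleged_father))


def is_excluded(child, mother, alleged_father):
    """True if at least one band of the child cannot come from either parent."""
    return bool(unexplained_bands(child, mother, alleged_father))


# Hypothetical band sizes in kb (not read from Figure 27.18).
father_a = {9.0, 4.5}
mother_b = {7.2, 3.1}
child_c = {9.0, 3.1}   # one band from each parent
child_d = {7.2, 5.8}   # maternal band plus a band absent from A

for name, child in (("C", child_c), ("D", child_d)):
    print(name, is_excluded(child, mother_b, father_a),
          unexplained_bands(child, mother_b, father_a))
```

A real analysis would also allow for measurement error in band sizes and for rare new mutations, which this sketch ignores.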


DNA MARKERS

Molecular markers include biochemical constituents such as secondary metabolites in plants, and macromolecules such as proteins and DNA. Properties desirable in an ideal DNA marker include a highly polymorphic nature, preferably codominant inheritance, frequent occurrence in the genome, neutral behavior with respect to environmental conditions and management practices, an easy and fast assay, high reproducibility and easy exchange of data between laboratories. Depending on the type of study to be undertaken, a marker system can be identified that fulfills at least some of the above characteristics. DNA markers are based on the following molecular mechanisms: (a) single-base substitution in the restriction sites or PCR priming sites, (b) rearrangement within the DNA between the two restriction sites or PCR priming sites, (c) errors in replication of arrays of tandemly repeated DNA and (d) mutation in the DNA sequence. The different types of DNA markers are described below.

Restriction Fragment Length Polymorphisms (RFLPs)

The genetic variation among different species, populations or genotypes may be due to inversions, translocations, deletions or transpositions. These natural variations in DNA sequence can be detected in several ways. One of these methods uses a special class of enzymes called restriction endonucleases. When a cloned DNA fragment used as a probe is hybridized with restriction-digested DNA extracted from two different individuals, and the hybridized bands appear at two different positions on the gel, the two individuals are said to be polymorphic. RFLPs are based on Southern blotting (Southern 1975) (Figure 27.19), and the term RFLP is used because this technique relies on restriction enzymes, which recognize specific sequences of DNA and cleave the DNA at or adjacent to those sites (Botstein et al. 1980).

Figure 27.19 Method of RFLP analysis

RFLP analysis comprises the following steps:
(1) RFLP analysis begins with a blood sample.
(2) DNA is extracted from the nuclei of white blood cells and digested with a restriction endonuclease.
(3) The resulting DNA fragments are separated according to size by gel electrophoresis.
(4) The RFLP is then detected by Southern blotting. First, the DNA in the gel is denatured (the two strands of DNA separate) and blotted onto a nylon membrane.
(5) A probe, a radioactively labeled segment of single-stranded DNA that is complementary to the RFLP locus, is applied to the membrane.
(6) The probe hybridizes with the fragment carrying the locus. A sheet of X-ray film placed over the membrane detects the radioactively tagged fragment and reveals the RFLP.
(7) DNA samples from several individuals are often analyzed at the same time.

RFLP analysis is based on two assumptions: (1) all four bases are present in equal amounts in a genome, and (2) the bases are randomly distributed throughout the genome. On the basis of these assumptions, we can work out the expected molecular weight (size) of the fragments and the average frequency of probes showing polymorphism. Salient characteristics of RFLPs are: codominant mode of inheritance (lack of dominance); multiple allelic forms; lack of pleiotropic effect on agronomic traits; no measurable effect on phenotype; and no effect of environment on their evaluation. All these characteristics make RFLPs extremely useful in genetic improvement programs. There are some limitations of RFLP analysis. A large amount of DNA is required for restriction digestion and Southern blotting. The assay is time-consuming and labor-intensive. The requirement for a radioactive isotope makes the analysis relatively expensive and hazardous.
Satellites and highly repetitive DNA sequences are usually inaccessible with naturally occurring probes. Also, RFLP analysis cannot detect a single-base change unless it creates or destroys a restriction site. RFLPs were initially used as genetic markers for the diagnosis of human diseases and disorders. The use of RFLPs as genetic markers in plant genetic analysis can also offer an alternative approach to specific problems that were previously not amenable to conventional techniques. RFLPs are potentially useful in basic plant genetic studies. They can be used for varietal identification, identification and mapping of both quantitative and qualitative loci, early selection of recombinant lines, etc.
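
The two assumptions of RFLP analysis lead directly to an expected fragment size, and the origin of a polymorphism can be shown with an in silico digest. The sketch below is a simplification: it cuts only the strand given, the two allele sequences are hypothetical, and the function names are illustrative. EcoRI recognizes GAATTC and cuts after the first G.

```python
def expected_fragment_length(site_length):
    """Average spacing (bp) of a recognition site of the given length,
    assuming equal base frequencies and a random base order."""
    return 4 ** site_length


def digest(seq, site, cut_offset):
    """Return fragment lengths after cutting `seq` at every `site`.

    `cut_offset` is the number of bases of the site that stay on the
    upstream fragment.
    """
    lengths = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        lengths.append(cut - start)
        start = cut
        pos = seq.find(site, pos + 1)
    lengths.append(len(seq) - start)
    return lengths


ECORI = "GAATTC"  # cuts between G and AATTC

# Hypothetical alleles: allele_2 carries a single-base change in the second site.
allele_1 = "A" * 30 + ECORI + "C" * 40 + ECORI + "T" * 20
allele_2 = "A" * 30 + ECORI + "C" * 40 + "GAGTTC" + "T" * 20

print(expected_fragment_length(6))   # 6-bp site
print(digest(allele_1, ECORI, 1))
print(digest(allele_2, ECORI, 1))
```

Loss of the second site in allele_2 fuses two fragments into one longer fragment, which is what a probe would reveal as a band at a different position on the gel.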

Microsatellites and Minisatellites

Microsatellites are tandem repeats of DNA sequences only a few base pairs (1-6 bp) in length, the most abundant being the dinucleotide repeats (Litt and Luty 1989). The term microsatellite was introduced to characterize the simple-sequence stretches amplified by polymerase chain reaction (PCR). Microsatellites are also known as short tandem repeats (STRs) or simple-sequence repeats (SSRs). They differ from minisatellites, often called variable-number-of-tandem-repeats (VNTRs), which are repeated sequences having repeat units ranging from 11 to 60 bp in length. The minisatellites were first reported by Jeffreys et al. (1985), though their utility through PCR was suggested later. The microsatellites are more evenly dispersed in the genome than minisatellites, which are generally confined to telomeric regions. For example, the dinucleotide SSR (CA)n occurs as many as 50,000 times in the human genome, with n ranging from 10 to 60. The tri- and tetranucleotide repeats are also common in the human genome. Both SSRs and VNTRs serve as multilocus probes creating complex banding patterns, and they are usually not species-specific, occurring ubiquitously. Both belong to the repetitive DNA family, and fingerprints generated by these probes are also known as oligonucleotide fingerprints. The DNA sequences flanking SSRs are known to be conserved in the same manner as those flanking minisatellites (VNTRs). These conserved sequences have been used for designing suitable primers for amplification of the SSR loci using PCR. Any such primer or pair of primers, when used to amplify a particular SSR locus in a number of genotypes, reveals SSR polymorphism in the form of differences in length of the amplified product, each band representing an allele at that locus. The length differences are attributed to variation in the number of repeat units at a particular SSR locus, possibly caused by slippage during replication. SSRs are codominant like RFLPs when used as sequence-tagged microsatellites, but dominant when used as fingerprinting markers. Many alleles exist in a population and the level of heterozygosity is extremely high. The markers are inherited in Mendelian fashion and can thus be used for linkage analysis.

Polyallelic markers are very useful for mapping both simple Mendelian traits and polygenic traits in segregating populations. The most widespread of these polyallelic markers are minisatellites, or VNTR loci, which are uncovered by locus-specific probes and exhibit highly polyallelic fragment length variation. The VNTR loci are hypervariable because of their tandem repeats; such loci are presumably generated by unequal crossing over. Since VNTR loci are highly variable, probing for one of them in a population reveals many alleles. The Southern blots create a DNA fingerprint of extreme value in forensics. This technique has greater power to positively identify individuals than conventional fingerprints. An interesting and common example of VNTRs involves the tandemly repeated rRNA genes (rDNA) concentrated at the nucleolar organizing regions (NORs) of specific chromosomes of an organism. Like rRNA genes, most VNTR loci are concentrated in proterminal regions of chromosomes, and thus may not be able to provide the desired density of markers. This problem has largely been overcome with microsatellite or SSR loci.
Microsatellites or SSRs represent variation in the repeat number of sequences 1-6 bp long (VNTRs can be as long as 1 kbp), such as (AC)n or (AAC)n. Being smaller in size, SSRs are more common and more amenable to PCR analysis.
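
How an SSR locus gives alleles of different length can be sketched with a regular-expression search for tandem repeats. The flanking sequences, the repeat numbers and the threshold of five repeat units are hypothetical choices made for illustration; the search reports the shortest repeat unit it can find at each position.

```python
import re


def find_ssrs(seq, min_repeats=5, max_unit=6):
    """Return (start, unit, number_of_repeats) for each tandem repeat found.

    The lazy quantifier makes the search prefer the shortest repeat unit.
    """
    pattern = re.compile(
        rf"([ACGT]{{1,{max_unit}}}?)\1{{{min_repeats - 1},}}")
    return [(m.start(), m.group(1), len(m.group(0)) // len(m.group(1)))
            for m in pattern.finditer(seq)]


# Hypothetical conserved flanks (primer binding regions) around a (CA)n repeat.
left_flank = "GGATCCTGAGTCAAGCTTGG"
right_flank = "TTGACCGTAGGCTAGCATGC"


def allele(n_repeats):
    return left_flank + "CA" * n_repeats + right_flank


for n in (12, 15):
    product = allele(n)
    print(len(product), find_ssrs(product))
```

The two alleles differ only in the number of CA units, so their PCR products differ in length by a multiple of 2 bp, each length being scored as a separate band.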

Expressed Sequence Tags (ESTs)

These are PCR-based markers that use a pair of primers. An EST is a DNA sequence from a cDNA clone that corresponds to an mRNA or a part thereof (Adams et al. 1991). ESTs in most genomes are 150-400 bp long and are useful in similarity searches and in mapping of genomes. High-throughput approaches have also been used for producing ESTs. The basic principle is end sequencing of random cDNA clones. For this, the cDNA clones are first mapped as RFLP markers and then partially sequenced to convert them into PCR-based markers. Thus they are like STS (sequence tagged site) markers. ESTs are being generated in several animals and in a variety of plant systems. EST databases have proven to be a tremendous resource for finding genes. ESTs, which are accumulating rapidly in large numbers due to the current emphasis on functional genomics, can be converted into STS markers and tested for polymorphism. SSRs have been developed from ESTs in human beings.

Sequence Tagged Sites (STSs)

An STS is a short unique sequence that identifies one or more specific loci and can be amplified through PCR (Olson et al. 1989). Each STS is characterized by a pair of PCR primers, which are designed by partial end sequencing of an RFLP probe (genomic DNA or cDNA probe) representing a mapped low-copy-number DNA sequence. These primers (generally 20-mers) are then used for amplifying specific genomic sequences using PCR. The limitation here is that the level of polymorphism is lower than that of the corresponding RFLP marker. STS markers have been developed for RFLP markers linked with the bacterial blight resistance genes Xa5, Xa13 and Xa21 in rice and a stem rust resistance gene in barley.


Sequence Characterized Amplified Regions (SCARs)

These markers overcome the limitations of RAPDs. Here, the RAPD fragments that are linked to a gene of interest are cloned and their ends are sequenced. Based on the end sequences, 20-mer primers are designed, which lead to more specific amplification of a particular locus. SCARs are similar to STS markers in their construction and application. Both are dominant markers, i.e., the presence or absence of bands reveals the polymorphism.

Single-Strand Conformation Polymorphism (SSCP)

This technology allows the detection of polymorphism due to differences of one or more base pairs in PCR products. The technique relies on the secondary structure being different for single strands derived from PCR products that differ by one or more nucleotides at an internal site within the strand. In order to detect such differences, PCR products are denatured and electrophoretically separated in neutral (non-denaturing) acrylamide gels. PCR products that do not differ in fragment length have been shown to exhibit SSCP in several studies. Sequence information is required for designing the primers, and radioactive labeling and autoradiography are then used to detect SSCP variants; this makes the technique relatively unsuitable for routine mapping/tagging studies.

Random Amplified Microsatellite Polymorphism (RAMP)

Here two primers are used: one is an SSR-anchored primer and the other is a RAPD primer. The amplified products reveal length polymorphism that may be present either at the SSR target site itself or in the associated sequence between the two primer binding sites. The RAPD primer binding site serves as an arbitrary end point for the SSR-based specific amplified sequence. The amplified product may also be digested with a restriction enzyme to further resolve the polymorphism; such markers are named digested RAMPs (dRAMPs), as in barley. The merit of this procedure is that undigested total genomic DNA is used as template, instead of the pre-amplified, restriction enzyme-digested DNA used in AFLP. In SCARs, likewise, the amplified product can be digested with a restriction enzyme to further resolve the polymorphism, giving a new marker, viz., the cleaved amplified polymorphic sequence (CAPS).

Random Amplification of Polymorphic DNA (RAPD)

RAPD is pronounced "rapid". It is a type of PCR, but the segments of DNA that are amplified are random. The scientist performing RAPD designs several arbitrary, short primers (8-12 nucleotides), then proceeds with the PCR using a large template of genomic DNA, hoping that fragments will amplify. By resolving the resulting patterns, a semi-unique profile can be gleaned from a RAPD reaction. No knowledge of the DNA sequence of the targeted gene is required, as the primers will bind somewhere in the sequence, but it is not certain exactly where. This makes the method popular for comparing the DNA of biological systems that have not had the attention of the scientific community, or in a system in which relatively few DNA sequences are compared (it is not suitable for forming a DNA databank). Because it relies on a large, intact DNA template sequence, it has some limitations in the use of degraded DNA samples. Its resolving power is much lower than that of targeted, species-specific DNA comparison methods, such as short tandem repeats. In recent years, RAPD has been used to characterize, and trace the phylogeny of, diverse plant and animal species.
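
Why a single arbitrary primer yields a presence/absence band pattern can be sketched in silico. The sketch assumes perfect-match priming only (real RAPD reactions tolerate some mismatches) and an arbitrary upper size limit for amplifiable products; the primer and the two templates are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def rapd_products(template, primer, max_product=2000):
    """Sizes of products expected from one arbitrary primer.

    A product needs the primer sequence on the given strand and, further
    downstream, the reverse complement of the primer, so that the same
    primer can prime synthesis inwards from both ends.
    """
    n = len(primer)
    rc = reverse_complement(primer)
    positions = range(len(template) - n + 1)
    forward = [i for i in positions if template.startswith(primer, i)]
    reverse = [i for i in positions if template.startswith(rc, i)]
    sizes = []
    for f in forward:
        for r in reverse:
            size = r + n - f
            if r >= f + n and size <= max_product:
                sizes.append(size)
    return sorted(sizes)


primer = "GTCCAGGTCA"  # hypothetical arbitrary 10-mer
site = reverse_complement(primer)

# Hypothetical templates: the second has one base changed in a priming site.
template_1 = primer + "A" * 100 + site + "C" * 50
template_2 = "GTCGAGGTCA" + "A" * 100 + site + "C" * 50

print(rapd_products(template_1, primer))
print(rapd_products(template_2, primer))
```

A single-base change in one priming site removes the band altogether, which is why RAPD markers are scored as present or absent.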


Amplified Fragment Length Polymorphism (AFLP)

This technique is a combination of RAPDs and RFLPs. AFLP is based on PCR amplification of a set of restriction fragments, selected from a pool of fragments generated by digestion with a pair of specific restriction enzymes, one of them being a frequent cutter, e.g., MseI (4 bp), and the other a rare cutter, e.g., EcoRI (6 bp) (Vos et al. 1995). To facilitate the design of primers for selection of restriction fragments, oligonucleotide adapters a few (≈20) bp long are ligated to the ends of these DNA fragments (e.g., MseI and EcoRI adapters). The sequence of the adapter, together with the sequence of the adjacent restriction site, serves as the primer binding site for subsequent amplification of the restricted and ligated fragments. One, two or three selective nucleotides are added to the 3′ ends of the PCR primers, which can therefore recognize only a subset of the restriction fragments. Only restriction fragments in which the nucleotides flanking the restriction site match the selective nucleotides are amplified. The selective nucleotides thus restrict the number of DNA fragments to be amplified, since the primers are designed to bind to the sequence of the ligated adapter + restriction site + 1-3 arbitrarily chosen selective bases.

The basic steps involved in this technique are restriction of genomic DNA and ligation of specific adapters, followed by selective amplification of sets of restriction fragments and, finally, gel electrophoretic analysis of the amplified fragments. For restriction of genomic DNA, a set of two enzymes is used, one a frequent cutter and the other a rare cutter (MseI and EcoRI). Once the genomic DNA is cut completely with these two enzymes (complete digestion), three types of fragments are expected: EcoRI-EcoRI, MseI-MseI and EcoRI-MseI. When the two enzymes are used together, they generate DNA fragments of optimal size, which amplify well and are in the optimal size range for separation on denaturing PAGE. Also, the number of fragments to be amplified is reduced by using two enzymes, since only those fragments are preferentially amplified that have the EcoRI site sequence on one side and the MseI site sequence on the other, i.e., EcoRI-MseI fragments.

The advantages of AFLPs are that no prior sequence knowledge is required, yet they can detect 16-fold more loci and 8 times more polymorphic bands than RFLP. AFLP can detect polymorphism at multiple loci within a short period of time. AFLPs are relatively easy and inexpensive to generate. AFLPs are also able to detect small sequence variations and require only a small amount of initial DNA. However, AFLPs have a disadvantage: because a large number of bands has to be scored, the gels are difficult to handle and the bands are difficult to score.
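
The effect of the selective nucleotides can be sketched as follows. Each fragment is represented only by its internal sequence, read 5′ to 3′ from the EcoRI end to the MseI end; this representation, the example fragments and the function names are hypothetical simplifications. The expected fraction assumes equal base frequencies and a random base order.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def expected_fraction(n_selective_e, n_selective_m):
    """Expected share of EcoRI-MseI fragments amplified, given the number
    of selective nucleotides on each primer."""
    return 0.25 ** (n_selective_e + n_selective_m)


def selectively_amplified(fragments, e_selective, m_selective):
    """Keep fragments whose bases next to each site match the primers.

    The EcoRI primer reads into the fragment from its start; the MseI
    primer reads into it from the other end, on the opposite strand.
    """
    kept = []
    for frag in fragments:
        matches_e = frag.startswith(e_selective)
        matches_m = reverse_complement(frag).startswith(m_selective)
        if matches_e and matches_m:
            kept.append(frag)
    return kept


# Hypothetical internal sequences of four EcoRI-MseI fragments.
fragments = ["ACGTTGCA", "AGGCTTAC", "TCGATCGG", "ACCCGTTG"]

print(expected_fraction(3, 3))
print(selectively_amplified(fragments, "A", "C"))
```

With three selective nucleotides on each primer, only about one fragment in 4,096 is expected to be amplified, which brings the number of bands down to a range that can be resolved on a gel.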

Single-Nucleotide Polymorphisms (SNPs)

This class of markers has been referred to as the mother of all markers. These are the most recently discovered and the most frequent markers in any genome (Rafalski et al. 2002). Often pronounced 'snips', they represent the sites where the DNA sequence differs by a single base. They are also referred to as simple-nucleotide polymorphisms. Some workers also include indels (insertions + deletions) and other sequence variations in this class. In the human genome, SNPs have been shown to be the most abundant markers, with an estimated average frequency of one SNP per kilobase pair, i.e., approximately 3,000,000 SNPs in the entire human genome. Both gel-based and nongel-based assays are available for SNPs. The gel-based assays are time-consuming and expensive; therefore, nongel-based assays are increasingly being used. The disadvantage of SNPs lies in their complex assay and detection. Their being biallelic, as against the polyallelic SSRs, could be a disadvantage, but their abundance makes them more attractive. There is also some evidence that they are more stable and are inherited with higher fidelity than SSRs and AFLPs. These advantages led to the rapid development of a number of methods for SNP detection, leading to the construction of a human SNP genetic map. Recently, SNPs have been discovered in large numbers in crops like barley, wheat and maize. The principle used is the capacity to distinguish a perfect match from a single-base mismatch. PCR products are assayed for SNP detection.

Gel-based assays

The presence of an SNP can be detected by RFLP or AFLP analysis conducted on PCR products whenever the SNP creates or destroys a specific restriction site for an enzyme. SSCP can also be used for the detection of SNPs. However, these gel-based assays have recently been overtaken by several nongel-based assays because of their ease of use and high efficiency.

Nongel-based assays

Several nongel-based methods are available for SNP detection. If an SNP is present at the 3′ end of a PCR primer binding site, it can be detected simply by the failure of amplification due to the mismatch between the primer sequence and the binding site in the template, although it may be difficult to distinguish this failure from PCR amplification failure due to other reasons. The common nongel-based assays for detection of SNPs at internal sites are based on the detection of a mismatch between the PCR product and an oligonucleotide used as a probe. The common methods used are the Taqman assay and the molecular beacon assay.

Taqman assay. Here an oligonucleotide probe is labeled with a fluorescent reporter molecule at the 5′ end and a quencher molecule at the 3′ end. This probe is called a Taqman probe because it is degraded by the 5′ to 3′ exonuclease activity of the Taq polymerase enzyme. When the probe is digested during PCR, the reporter is released, leading to a rise in the fluorescent signal. However, when the probe mismatches with the template due to the presence of an SNP, leading to a failure in duplex formation, no such degradation occurs, no reporter is released, and no rise in fluorescent signal occurs. Different combinations of reporters and quenchers also allow multiplexing, so that as many as 6 SNPs can be scored in a single PCR.
Molecular beacon assay. In this assay, an oligonucleotide probe (molecular beacon) is used, which comprises the target SNP sequence flanked by two ends that are complementary to each other. The two ends of the molecular beacon are labeled just like the Taqman probe, i.e., the 5′ end with a reporter and the 3′ end with a quencher. When the probe fails to form a duplex with the template DNA due to the presence of an SNP, it forms a hairpin structure through self-annealing of its two ends, thus quenching the reporter. But when the probe anneals with the template, it is linearized, separating the reporter from the quencher and permitting a fluorescence signal. These fluorescent signals can be detected by appropriate sensing devices.
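
The principle of distinguishing a perfect match from a single-base mismatch, and the estimate of the number of SNPs in a genome, can be sketched briefly. The two short sequences are hypothetical, the genome size of 3 × 10^9 bp is the value implied by the figures quoted above, and the primer rule is a deliberate oversimplification of allele-specific PCR.

```python
def snp_positions(seq_a, seq_b):
    """Positions (0-based) at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def expected_snp_count(genome_size_bp, snps_per_kb=1):
    """Number of SNPs expected at a given average density."""
    return genome_size_bp * snps_per_kb // 1000


def primer_can_extend(primer, binding_region):
    """Crude rule for allele-specific PCR.

    The primer is written in the same sense as `binding_region`, starting
    at its first base; extension is assumed to need a match at the
    3'-terminal base of the primer.
    """
    return primer[-1] == binding_region[len(primer) - 1]


# Hypothetical alleles differing by one base (C/T).
allele_c = "ATGCCGTAGCTAGGCTA"
allele_t = "ATGCCGTAGTTAGGCTA"
primer = allele_c[:10]  # 3' end sits on the SNP

print(snp_positions(allele_c, allele_t))
print(primer_can_extend(primer, allele_c), primer_can_extend(primer, allele_t))
print(expected_snp_count(3_000_000_000))
```

The primer designed on one allele fails on the other, which is the amplification failure described above for an SNP at the 3′ end of a primer binding site.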

Random Amplified Hybridized Microsatellites (RAHM)

A novel strategy was developed to combine the advantages of oligonucleotide fingerprinting with those of RAPD-PCR and microsatellite-primed PCR (MP-PCR). In this approach, the genomic DNA is amplified either with a single arbitrary 10-mer primer, as in RAPD, or with a microsatellite-complementary 15-mer/10-mer primer, as in MP-PCR. The PCR products are electrophoresed, Southern blotted and hybridized to a 32P- or digoxigenin-labeled microsatellite probe, e.g., (CA)8, (GA)8, (GAC)5, etc. Subsequent autoradiography reveals reproducible, probe-dependent fingerprints that are polymorphic at the interspecific level. This provides speed of assay along with high sensitivity, so that a high level of polymorphism is detected. The method is termed random amplified hybridized microsatellites (RAHM).


Molecular markers are now well established as tools in plant breeding and genetics. They have provided a major new impetus to plant breeding programs, offering considerable improvements in the efficiency and sophistication of breeding. Their use as research tools is also well developed, and they have played a key role in improving our understanding of genome organization, structure and behavior for many of our major crops. The key developmental challenges for molecular markers lie in developing new breeding strategies whose objectives are to broaden the germplasm base and to increase the number of traits that can be effectively selected simultaneously. The new marker technologies that offer greatly reduced costs in marker screening and high multiplexing capabilities will be central to these developments. Essentially, we will move to whole-genome-based selection (WGBS) strategies in which specific recombination events are sought and changes are assessed on a genome-wide scale. In this way we can look to better manage chromosome regions that may come from wild relatives or land races, track several traits at once and keep the population sizes as small as possible. There is a need for a collaborative approach to develop the new class of markers called gene-targeted markers or functional markers, such as cDNA-RFLPs and cDNA-RAPDs (Gupta and Rastogi 2004). None of the molecular markers is ideal. Some markers are better for some purposes than others, but codominant markers are usually preferred.

DNA tags

Inspired by commercial barcodes, DNA tags provide a quick, inexpensive way to identify species (Stoekle and Herbert 2008). A small segment of mitochondrial DNA, the same short stretch in every species, is used for the identification of animal species. The segment chosen comes from a gene called CO1. It is only 648 bp long, which makes for quick reading of its DNA sequence. Yet this short segment varies enough from creature to creature for one species to be distinguished from another.
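The idea of identifying a species from a short barcode segment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 20-bp sequences and the species labels are invented stand-ins for the 648-bp CO1 segment, the sequences are assumed to be already aligned and of equal length, and the function names `percent_identity` and `identify_species` are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of matching positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(seq_a.upper(), seq_b.upper()) if x == y)
    return 100.0 * matches / len(seq_a)


def identify_species(query: str, reference_barcodes: dict) -> tuple:
    """Return (species, identity) for the reference barcode most similar to the query."""
    return max(
        ((species, percent_identity(query, barcode))
         for species, barcode in reference_barcodes.items()),
        key=lambda pair: pair[1],
    )


# Toy 20-bp stand-ins for the 648-bp CO1 segment (invented, not real data).
REFERENCE_BARCODES = {
    "species_A": "ATGGCACTATTAGGAGACGA",
    "species_B": "ATGGCTCTTTTAGGTGATGA",
    "species_C": "ATGACACTCCTAGGCGACGA",
}

query = "ATGGCTCTTTTAGGTGACGA"  # differs from species_B at a single position
best_species, best_identity = identify_species(query, REFERENCE_BARCODES)
print(best_species, round(best_identity, 1))
```

A real barcoding workflow would first align the query against a curated reference library; the sketch only shows why a short but variable segment is enough to tell species apart.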

MICROARRAYS

Microarrays are also referred to as DNA/RNA chip technology. Nucleic acid arrays are based on hybridization of a sample (labeled RNA or DNA) in solution to DNA fragments immobilized on a solid surface. The arrayed DNA fragments often come from cDNA, genomic DNA or plasmid libraries. Usually an array is designed on the basis of specific sequence information, a process sometimes referred to as downloading the genome onto a chip. There are several variations on this basic technical theme: the hybridization reaction may be driven by an electric field, detection methods other than fluorescence can be used, and the surface may be made of materials other than glass, such as plastic, silicon, gold, gel or membrane, or may even consist of beads at the ends of fiber-optic bundles.

Oligonucleotide chips and cDNA chips

Based on production technology and source of probe DNA, nucleic acid arrays are categorized into two types: oligonucleotide chips and cDNA chips. Oligonucleotide chips are synthesized in situ, base by base, by photolithography. On the basis of prior sequence information, about 10^7 copies of each selected oligonucleotide, 20-25 nucleotides long, are built on a 24×24 µm area of a glass surface. This is a light-directed synthesis strategy in which a solid surface is derivatized with chemical linkers containing photolabile protecting groups. A region of the surface is selectively activated by shining light through a photomask, which deprotects the 5'-OH group. The wafer is then flooded with a modified nucleotide, resulting in the coupling of residues to the activated region of the chip. A second region of the chip is selectively activated with light using a different photomask, allowing the coupling of residues to a different activated region. A repeated series of such steps allows parallel oligonucleotide synthesis at many addresses. Using a combinatorial synthesis strategy, a complete set of 4^K oligonucleotides of length K (K-mers) can be generated in 4 × K synthesis cycles (Lemieux et al. 1998). Oligonucleotide chips use multiple probes per gene. In the case of cDNA chips, cDNAs of 0.5-2 kb are amplified by PCR and about 1 ng of cDNA is deposited by robots at intervals of 100-300 µm, either by mechanical microspotting or by inkjetting. Here, a single long double-stranded DNA probe per gene is used. In both cases, probes are usually designed from sequences located near the 3'-end of the gene, close to the poly(A) tail in eukaryotic mRNA, and different probes can be used for different exons (Lockhart and Winzeler 2000). The system consists of a chip, a hybridization chamber to control hybridization conditions, a reader, and software to analyze the chip data. The array is mounted in a temperature-controlled hybridization chamber. A target sequence is fluorescently tagged, injected into the chamber and left overnight, during which time the target hybridizes to its complementary sequences on the array. Laser excitation enters through the back of the array and is focused at the interface of the array surface and the target solution. Fluorescence emission is collected by a confocal lens and passes through a series of optical filters to a sensitive detector. A quantitative fluorescence image proportional to hybridization intensity is obtained by a charge-coupled device (CCD). Quantitative estimates of the number of transcripts per cell can be obtained directly by averaging the signal from multiple probes. Rare mRNA transcripts present at a frequency of 1:100,000 can be detected (Bassett et al. 1999). A few characteristic features of microarrays are: (1) Parallelism: This method allows parallel acquisition and analysis of massive data in a single reaction. (2) Miniaturization: This leads to lower consumption of DNA probes and reagents. (3) Multiplexing: Multiple samples can be analyzed in a single assay.
With multicolor fluorochrome labeling, multiple samples can be compared on a single DNA chip. This removes chip-to-chip variation and discrepancies in reaction conditions. (4) Automation: Advanced manufacturing techniques permit the mass production of DNA chips (Gupta et al. 1999). The major drawbacks of microarrays are their cost and the requirement for a specialized arraying robot and scanner. Also, the arrays cannot be reused, which further increases the cost.

Spotted vs. in situ synthesized microarrays

In spotted microarrays, the probes are oligonucleotides, cDNA or small fragments of PCR products that correspond to mRNAs. The probes are synthesized prior to deposition on the array surface and are then "spotted" onto glass. A common approach utilizes an array of fine pins or needles controlled by a robotic arm; the pins are dipped into wells containing DNA probes and then deposit each probe at a designated location on the array surface. The resulting "grid" of probes represents the nucleic acid profiles of the prepared probes and is ready to receive complementary cDNA or cRNA "targets" derived from experimental or clinical samples. In oligonucleotide microarrays, the probes are short sequences designed to match parts of the sequence of known or predicted open reading frames. Although oligonucleotide probes are often used in "spotted" microarrays, the term "oligonucleotide array" most often refers to a specific technique of manufacturing. One technique used to produce oligonucleotide arrays is photolithographic synthesis on a silica substrate, in which light and light-sensitive masking agents are used to "build" a sequence one nucleotide at a time across the entire array.

Two- vs. one-channel microarray detection

In single-channel or one-color microarrays, the arrays provide intensity data for each probe or probe set, indicating a relative level of hybridization with the labeled target. However, these data do not truly indicate the abundance level of a gene, but rather its relative abundance when compared with other samples or conditions processed in the same experiment. Each RNA molecule encounters protocol- and batch-specific bias during the amplification, labeling and hybridization phases of the experiment, which makes comparisons between genes on the same microarray uninformative. The comparison of two conditions for the same gene requires two separate single-dye hybridizations. One strength of the single-dye system is that an aberrant sample cannot affect the raw data derived from other samples, because each array chip is exposed to only one sample (as opposed to a two-color system, in which a single low-quality sample may drastically impinge on overall data precision even if the other sample is of high quality). Another benefit is that data are more easily compared with arrays from different experiments, so long as batch effects have been accounted for. Two-color or two-channel microarrays are typically hybridized with cDNA prepared from two samples to be compared (e.g., diseased tissue versus healthy tissue), each labeled with a different fluorophore. Fluorescent dyes commonly used for cDNA labeling include Cy3, which has a fluorescence emission wavelength of 570 nm (corresponding to the green part of the light spectrum), and Cy5, which has a fluorescence emission wavelength of 670 nm (corresponding to the red part of the light spectrum). The two Cy-labeled cDNA samples are mixed and hybridized to a single microarray, which is then scanned in a microarray scanner to visualize the fluorescence of the two fluorophores after excitation with a laser beam of a defined wavelength. The relative intensities of the two fluorophores may then be used in ratio-based analysis to identify up-regulated and down-regulated genes.
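The ratio-based analysis mentioned above can be sketched as follows. This is a minimal illustration, not a complete analysis pipeline: the spot intensities and gene names are invented, it is assumed that Cy5 labels the test sample and Cy3 the reference sample, the two-fold cut-off (a log2 ratio of ±1) is an arbitrary choice made for this example, and background subtraction and dye normalization are deliberately left out.

```python
import math


def log2_ratio(cy5_intensity: float, cy3_intensity: float, floor: float = 1.0) -> float:
    """log2(Cy5/Cy3) for one spot; intensities below `floor` are raised to it
    so that the logarithm is always defined."""
    red = max(cy5_intensity, floor)
    green = max(cy3_intensity, floor)
    return math.log2(red / green)


def classify(ratio: float, threshold: float = 1.0) -> str:
    """Call a gene 'up', 'down' or 'unchanged' relative to the Cy3-labeled sample."""
    if ratio >= threshold:
        return "up"
    if ratio <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical (Cy5, Cy3) intensities for three spots on one array.
spots = {
    "gene_1": (8000.0, 2000.0),  # 4-fold more red than green
    "gene_2": (1500.0, 1400.0),  # roughly equal signal
    "gene_3": (500.0, 4000.0),   # 8-fold more green than red
}

calls = {gene: classify(log2_ratio(cy5, cy3)) for gene, (cy5, cy3) in spots.items()}
for gene, call in calls.items():
    print(gene, call)
```

Working on the log2 scale makes up- and down-regulation symmetric: a four-fold increase gives +2 and a four-fold decrease gives -2.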

RESTRICTION ENDONUCLEASES

Nucleases are enzymes that cleave nucleic acids, and they are of various types. Some deoxyribonucleases (DNases) cleave double-stranded DNA, while others cleave single-stranded DNA molecules. DNases may act as endonucleases (enzymes that cleave DNA from within) or exonucleases (enzymes that cleave DNA at the ends). Restriction endonucleases cleave DNA only at specific sites. Ribonucleases (RNases) cleave RNA molecules. C.B. Anfinsen, S. Moore and W.H. Stein were awarded the Nobel Prize in 1972 for the discovery of the chemical structure and activity of the enzyme ribonuclease (Anfinsen et al. 1961; Haber and Anfinsen 1962; Epstein et al. 1963; Lin et al. 1968; Takahashi et al. 1967). Some RNases cleave double-stranded RNA, while others cleave single-stranded RNA molecules. Nucleases are universally distributed enzymes. A nuclease isolated from one organism may differ from that isolated from another organism. Various well-documented functions of nucleases are given below:

• During DNA replication, DNA polymerase I degrades the primer through its exonuclease activity.
• During the process through which a circular molecule becomes linear, an endonuclease activity produces nicks at two specific positions of the λ chromosome.
• During generalized recombination in prokaryotes and eukaryotes, endonucleases produce nicks.
• Endonucleases play an important role in dark repair of UV-induced DNA damage. The mechanism for excision of thymine dimers utilizes an endonuclease.
• During dark repair of UV-induced damage, the 3′-phosphate group is removed by a 3′ endonuclease, and then a 5′ exonuclease removes sections 6 or 7 nucleotides long, including the thymine dimers.
• Restriction endonucleases used in host restriction recognize a specific nucleotide sequence on DNA and cleave both strands of DNA at staggered positions.
• Exonuclease III nibbles double-stranded DNA in the 3′→5′ direction, thus exposing 5′ single strands at each end. It can thereby help in transforming a linear DNA molecule into a circular DNA molecule.
• An RNase chews up the excess transcript up to the point where the poly(A) chain is to be added during RNA processing.


W. Arber, D. Nathans and H. Smith were awarded the Nobel Prize in 1978 for discovering restriction endonucleases and their application to the problems of molecular genetics (Nathans and Smith 1975; Smith and Nathans 1973; Linn and Arber 1968; Arber and Linn 1969). Restriction endonucleases are enzymes that recognize and cut specific nucleotide sequences in DNA, usually four or six nucleotides long. There are three types of restriction endonucleases: type I, type II and type III. Type I restriction endonucleases (REs) are multimers and carry out both endonucleolytic and methylation activities. It is uncertain what proportion of bacteria have a type I system, but such bacteria are certainly less common than those having a type II system. The best known examples of type I REs are EcoB and EcoK, which are variants of the hsd system. Type II restriction endonucleases recognize, bind and cleave at palindromic sequences. A palindromic sequence is a sequence of letters (DAD, MADAM, MALAYALAM, REDIVIDER), words (AND MADAM DNA), phrases, nucleotides, or nucleotide pairs (GAATTC/CTTAAG) that reads the same regardless of the direction from which one starts. Type II restriction endonucleases have found application in the construction of recombinant DNA molecules. Characteristics of some type II restriction endonucleases are given in Table 27.3. These restriction endonucleases may produce cohesive (sticky) or blunt ends. Cohesive ends are used in the construction of recombinant DNA molecules. Blunt ends are used to produce sticky ends of interest. A type II RE is responsible for restriction only, and a separate enzyme is responsible for methylating the same target sequence. The structures of these proteins are also complicated. Best understood are the components of the EcoRI system, in which the restriction enzyme is a dimer of identical subunits and the methylase is a monomer. The target sites for type II REs are palindromic sequences of 4-6 bp. Because of this symmetry, the bases to be methylated occur on both strands.
Table 27.3 Characteristics of some type II restriction endonucleases (only one strand of DNA is shown, written 5'→3'; the arrow (↓) shows the site of the cut)

Enzyme    Organism from which enzyme isolated   Recognition sequence    Number of cleavage sites in DNA from
                                                and position of cut     λ     Ad2   SV40
BamHI     Bacillus amyloliquefaciens H          G↓GATCC                 5     3     1
BglII     Bacillus globigii                     A↓GATCT                 5     12    0
EcoRI     Escherichia coli RY13                 G↓AATTC                 5     5     1
HaeIII    Haemophilus aegyptius                 GG↓CC                   50    50    18
HhaI      Haemophilus haemolyticus              GCG↓C                   50    50    2
HindIII   Haemophilus influenzae Rd             A↓AGCTT                 6     11    6
HpaI      Haemophilus parainfluenzae            GTT↓AAC                 11    6     5
PstI      Providencia stuartii                  CTGCA↓G                 18    25    3
SmaI      Serratia marcescens                   CCC↓GGG                 3     12    0
SalI      Streptomyces albus G                  G↓TCGAC                 2     3     0

So a target site could be fully methylated, hemi-methylated or non-methylated. A fully methylated site is not a target for restriction or methylation. A hemi-methylated site is not recognized by the RE but may be converted by the methylase into the fully methylated condition. This occurs during the perpetuation of fully methylated DNA, which on replication produces hemi-methylated DNA; it is the usual mode of action of the methylase in vivo. A non-methylated target site may be a substrate for either restriction or modification in vivo. It is rare for unmodified DNA to survive by gaining the modification pattern of a new host. Most type II REs cleave DNA at an unmethylated target site by cleaving one bond in each strand. This can produce either blunt ends or staggered cuts (cohesive ends).

Among type III REs, three modification and restriction enzymes are recognized: EcoP1, EcoP15 and HinfIII. Each enzyme consists of two subunits, R and MS. The R subunit is responsible for restriction, whereas the MS subunit is responsible for both modification and recognition. Modification and restriction activities are performed simultaneously. Cleavage occurs 24-26 bases to one side of the recognition site and involves staggered cuts. Restriction arises when methylation is lacking. In a hemi-methylated site, the strand carrying the non-methylated adenine is a target for repair by mutH, mutL, mutS and uvrD, and a mismatch is thus corrected. In fully methylated sequences, the repair system has no basis for distinguishing between the strands.

Restriction enzymes offer a powerful tool for analysis of DNA organization, particularly in higher organisms, in which the large amount of DNA in each cell has been an obstacle to investigating one particular region of DNA. If a particular piece of DNA can be identified, it can be isolated and studied. One method of identification is to develop an appropriate DNA probe. A common technique for this is to purify the mRNA product of a particular gene and to form a radioactively labeled cDNA copy with the help of reverse transcriptase. The labeled cDNA is the probe for the gene from which the original mRNA was produced.
By denaturing and then renaturing together the cDNA and the restriction fragment of DNA containing the gene under study, the complementary strands of the two can be made to hybridize. The identity of the hybrid can be established by the radioactive label contributed by the probe. EcoRI, a type II RE isolated from E. coli RY13, produces sticky ends. SmaI, a restriction endonuclease isolated from Serratia marcescens, produces blunt ends. With the use of restriction endonucleases, recombinant DNA molecules can be generated that combine DNA segments from any two species. Over 90 restriction enzymes, each with a different site specificity, have so far been isolated. These enzymes provide powerful tools for analysis of DNA organization, gene structure and gene regulation. Recombinant DNA molecules produced using these enzymes are playing an important role in genetic engineering and in the transfer of genes from one organism to another, bypassing the limitations imposed by biological methods (sexual cycle, lack of pollination, fertilization, etc.).
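The notions of a palindromic recognition site and of site-specific cleavage can be illustrated with a short Python sketch. The recognition sequences and cut positions are those listed in Table 27.3; everything else is an assumption made for illustration: the test sequence is invented, only the top strand of a linear molecule is modeled (so overhangs are not represented), methylation is ignored, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Recognition sequence (top strand, 5'->3') and offset of the cut within it,
# taken from Table 27.3: G^AATTC, CCC^GGG, GG^CC.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "SmaI": ("CCCGGG", 3),
    "HaeIII": ("GGCC", 2),
}


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True if both strands read the same in the 5'->3' direction."""
    return site == reverse_complement(site)


def find_cut_positions(dna: str, enzyme: str) -> list:
    """Top-strand positions at which the enzyme cuts a linear molecule."""
    site, offset = ENZYMES[enzyme]
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start + offset)
        start = dna.find(site, start + 1)
    return positions


def digest(dna: str, enzyme: str) -> list:
    """Fragments (top strand only) produced by complete digestion."""
    bounds = [0] + find_cut_positions(dna, enzyme) + [len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:])]


dna = "TTAGAATTCGGCCCGGGATAGAATTCAA"  # invented sequence with two EcoRI sites
print(all(is_palindromic(site) for site, _ in ENZYMES.values()))
print(digest(dna, "EcoRI"))
print(digest(dna, "SmaI"))
```

Two EcoRI sites in a linear molecule give three fragments, while the single SmaI site gives two; counting fragments in this way is how the numbers of cleavage sites in λ, Ad2 and SV40 DNA in Table 27.3 would be read.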

RECOMBINANT DNA TECHNOLOGY

Recombinant DNA is DNA that has been artificially created: DNA from two or more sources is incorporated into a single recombinant molecule. Recombinant DNA molecules are constructed outside living cells by joining natural or synthetic DNA segments to DNA molecules that can replicate in a living cell; the term also covers the molecules that result from their replication. Smith and Wilcox (1970) identified a restriction enzyme in the bacterium Haemophilus influenzae. Subsequently, Cohen et al. (1973) formed the first recombinant DNA hybrid. They published their findings in a 1973 paper entitled "Construction of Biologically Functional Bacterial Plasmids in vitro", which described a technique to isolate and amplify genes or DNA segments and insert them into another cell with precision, creating a transgenic bacterium. Recombinant DNA technology was made possible by the discovery of restriction endonucleases. It comprises a battery of experimental procedures used to isolate, characterize and clone individual genes at the molecular level. The technology is based on restriction enzymes, which cut DNA into defined fragments having sticky ends, allowing them to be


inserted into a vector capable of replicating in a bacterial cell. Such a molecular hybrid is known as recombinant DNA. Using recombinant DNA technology, genes have been transferred in a single step into different plants and animals, crossing all fertilization barriers. Plant diseases are one of the major factors that limit crop productivity; they lead to injudicious, large-scale use of pesticides, with potentially harmful impacts on environmental safety and human health, besides the threat of pest resistance. Recombinant DNA technology has helped to protect crops against detrimental pests, which pose a major challenge to meeting the increasing demand for food. Recombinant DNA technology, also known as in vitro recombination, is used to create and purify desired genes. Molecular cloning (i.e., gene cloning) involves creating recombinant DNA and introducing it into a host cell to be replicated. One of the basic strategies of molecular cloning is to move desired genes from a large, complex genome to a small, simple one. The process of in vitro recombination makes it possible to cut different strands of DNA in vitro (outside the cell) with a restriction enzyme and to join the DNA molecules together via complementary base pairing. Recombinant DNA is a form of artificial DNA, engineered through the combination or insertion of one or more DNA strands, thereby combining DNA sequences that would not normally occur together. In terms of genetic modification, recombinant DNA is produced through the addition of relevant DNA into an existing genome, such as a bacterial plasmid, to code for or alter different traits for a specific purpose, such as immunity. It differs from genetic recombination in that it does not occur through processes within the cell but is exclusively engineered. The first production of recombinant DNA molecules, using restriction enzymes, occurred in the early 1970s.
Recombinant DNA technology involves joining DNA from different species and subsequently inserting the hybrid DNA into a host cell, often a bacterium. Researchers at UC San Francisco and Stanford used restriction enzymes to cut DNA from different species at specific sites, and then fused the cut strands from the different species back together. Stanley Cohen of Stanford and Herbert Boyer of UCSF applied for a patent on recombinant DNA technology in 1974; it was granted in 1980. P. Berg, who was among the first to produce a recombinant DNA molecule in 1972 (Jackson et al. 1972), wrote a letter shortly afterwards, along with ten other researchers, to the journal Science. In the letter, they urged the National Institutes of Health (NIH) to regulate the use of recombinant DNA technology and, meanwhile, urged scientists to halt most recombinant DNA experiments until it was better understood whether the technique is safe. These concerns eventually led to the 1975 Asilomar Conference, where one hundred scientists gathered to discuss the safety of manipulating DNA from different species. In 1975, Berg opined that international research should continue but under stringent guidelines (Jackson et al. 1972; Berg 1981; Berg 2008). The meeting resulted in a set of NIH guidelines. NIH has revised the document, "Guidelines for Research Involving Recombinant DNA Molecules," several times since 1976. P. Berg was awarded the Nobel Prize in 1980 for discovering a method for joining unrelated DNA molecules with terminal transferase.

The steps involved in the formation of recombinant DNA are as under:

(1) Treat the DNA taken from both sources with the same restriction endonuclease, so that the enzyme cuts both molecules at the same recognition site.
(2) The ends of the cut carry an overhanging piece of single-stranded DNA; these are called "sticky" or "cohesive" ends. They are able to base-pair with any DNA molecule that contains the complementary sticky end.
(3) Because both DNA preparations have complementary sticky ends, they pair with each other when mixed.
(4) DNA ligase covalently links the two into a molecule of recombinant DNA.
(5) To be useful, the recombinant molecule must be replicated many times to provide material for analysis, sequencing, etc. Producing many identical copies of the same recombinant molecule is called cloning. Cloning can be done in vitro (by PCR) or in vivo (inside the cell) using unicellular prokaryotes (e.g., E. coli), unicellular eukaryotes (e.g., yeast), or mammalian tissue culture cells.

The procedure for construction of a recombinant DNA molecule is illustrated in Figure 27.20.
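The cut-and-ligate steps described above can be sketched as a toy string model in Python. The sketch makes several simplifying assumptions that are not part of the text: only the top strand is represented, EcoRI (G↓AATTC) is the enzyme chosen, the vector and donor sequences are invented, and ligation is modeled simply as joining fragments whose ends are compatible. The names `cut` and `compatible` are hypothetical.

```python
SITE = "GAATTC"   # EcoRI recognition sequence (top strand, 5'->3')
CUT_OFFSET = 1    # the cut falls after the first G: G^AATTC


def cut(dna: str) -> list:
    """Cut a linear top strand at every EcoRI site and return the fragments."""
    fragments, start = [], 0
    pos = dna.find(SITE)
    while pos != -1:
        fragments.append(dna[start:pos + CUT_OFFSET])
        start = pos + CUT_OFFSET
        pos = dna.find(SITE, pos + 1)
    fragments.append(dna[start:])
    return fragments


def compatible(left: str, right: str) -> bool:
    """In this top-strand model, two EcoRI ends can pair if the left fragment
    ends with G and the right fragment starts with AATTC."""
    return left.endswith("G") and right.startswith("AATTC")


vector = "CCATGAATTCTTGG"          # invented vector with one EcoRI site
donor = "AAGAATTCGCGTACGAATTCTT"   # invented donor: insert flanked by two sites

# Step 1: treat both DNAs with the same restriction endonuclease.
vector_left, vector_right = cut(vector)
_, insert, _ = cut(donor)

# Steps 2-3: the ends are complementary, so they can pair when mixed.
assert compatible(vector_left, insert) and compatible(insert, vector_right)

# Step 4: ligation is modeled as joining the paired fragments.
recombinant = vector_left + insert + vector_right
print(recombinant)
print(recombinant.count(SITE))  # an EcoRI site is regenerated at each junction
```

Because both junctions regenerate the recognition site, the insert could be cut out again with the same enzyme, which is one reason the same restriction endonuclease is used on both DNAs.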

Figure 27.20 Construction of recombinant DNA molecule

Genetic Engineering

Genetic engineering deals with isolating, synthesizing, adding, removing or replacing genes in order to achieve permanent and heritable changes in plants, microbes, animals or even humans. The main advantage of the techniques of genetic engineering is that they bypass all reproductive barriers. Since the invention of recombinant DNA technology, a large number of viral and non-viral methods have been developed to transfer genes from one species to another, and consequently transgenic organisms with novel characteristics have been produced. It is beyond the scope of this book to discuss the principles and methods of genetic engineering here. The breakthrough genetic improvements brought about by genetic engineering methods could never have been produced through conventional breeding methods.

QUANTITATIVE TRAIT LOCI MAPPING

The improved productivity of domestic plants and animals through the collective efforts of breeders represents one of mankind's greatest achievements. For characters affected by major genes, conventional breeding procedures based on phenotypic selection have often been successful, because the major genes have a large effect and desirable genotypes can thus be identified by phenotypic evaluation. However, most measures of agricultural productivity, such as size, shape, yield and quality, are influenced by many genes (polygenes), so that traits in a population do not fall into discrete classes but show a continuous range of phenotypes. Quantitative variation in phenotype can be explained by the combined action of many discrete genetic factors, each having a rather small effect on the overall phenotype, together with the influence of the environment. As a result, breeding for quantitative traits tends to be a less efficient and more time-consuming process. The tools available for directed genetic manipulation of quantitative traits have recently undergone a crucial revolution with the development of molecular markers. Traits that were improved largely by conventional breeding and analyzed genetically by biometrical methods in the past are now manipulated using molecular markers. The location and effect of the genes controlling a quantitative trait are determined by marker-based genetic analysis. A chromosomal region that is linked to or associated with a marker gene and affects a quantitative trait is defined as a quantitative trait locus (QTL) (Geldermann 1975). A QTL that has a large effect and explains a major part of the total variation can be analyzed genetically in the same way as a major gene. The agronomic performance of crop varieties is mainly influenced by complex quantitative traits, for example, components of yield and quality. Since the development of molecular markers, it has become feasible to identify and genetically localize the contributing genetic factors as quantitative trait loci (QTLs) and to utilize these QTLs for crop improvement. This has led to an increasing number of QTL studies involving the most agronomically important crop species. Despite successes in mapping QTLs, the relevance of this information for breeding new varieties is limited. In most cases, the QTL analysis has been carried out in crosses utilizing parents drawn from elite germplasm sources. Hence, in most cases, the studies have identified only a limited number of alleles that are already present in mainstream breeding programs and offer little opportunity for variety improvement. In recent years, QTL research has attracted many scientists, resulting in the publication of hundreds of papers. Until now, most research has focused on QTL mapping and the related theoretical problems. From the viewpoint of theory and practice, however, one should consider the whole picture of QTLs: from single QTLs to multiple QTLs, from separating to pyramiding, from single traits to trait complexes, from single environments to multiple environments, and from mapping to cloning.
Advances in this field could answer the following questions: How many genes are involved in the genetic control of each quantitative trait in a segregating population? Can we separate closely linked QTLs into single units, and then pyramid favorable QTL alleles dispersed in genetic materials into a common background? Can we clone QTLs in the same way as major genes? One reason for wishing to locate QTLs is to provide a means of answering some fundamental questions about the genetic control of quantitative traits: How many genes are there? Are there as many genes as was suggested for yield, or are there just a few important genes segregating while the rest are of little or no significance? How are these genes distributed across the genome? Are there 'hotspots' on particular chromosomes for particular traits, or is there a relatively random distribution? In considering these questions, we must always bear in mind that we are solely concerned with those genes that are segregating in the crosses or populations under study. Although there are potentially hundreds of genes that might affect yield, only a small proportion of these may actually be segregating in any given instance. Quantitative genetics can say nothing about the monomorphic loci, because they do not contribute to variation. The principle of mapping a QTL to a particular chromosomal region using an RFLP marker is illustrated in Figure 27.21. A hypothetical chromosome pair in the fruit fly is shown. The flies have been selected for a geotactic score. QTL1 is the locus in the high line and QTL2 is the locus in the low line. RFLP1 is homozygous in the high line and RFLP2 is homozygous in the low line. Mapping of QTLs can be done with 'MAPMAKER'. 'MAPMAKER' is an interactive computer package for constructing genetic linkage maps and for mapping genes underlying complex traits using those linkage maps. MAPMAKER/EXP Version 3.0 and MAPMAKER/QTL 1.1 were issued in January 1993 by Eric S.
Lander, Whitehead Institute, 9 Cambridge Center, Cambridge, MA 02142, USA. MAPMAKER, along with its manual, is distributed freely to both academic and corporate researchers. This software should be cited as Lincoln et al. (1992a,b). MAPMAKER/EXP performs full multipoint linkage analyses for dominant, recessive and co-dominant (e.g., RFLP-like) markers in BC1 backcrosses, F2 and F3 (self) intercrosses, and recombinant inbred lines. MAPMAKER/QTL is a companion program to MAPMAKER/EXP that allows one to map genes controlling polygenic quantitative traits in F2 intercrosses and BC1 backcrosses relative to a genetic linkage map. The manual, which includes two tutorials, describes in detail the various commands commonly used. It includes a section that describes how to prepare data for use by this program.

Figure 27.21 Mapping a QTL to a particular chromosome using a RFLP marker

There are several limitations of QTL analysis. Normally, the chromosomal regions identified as QTLs are quite long and, as such, cannot be cloned or used practically. It is essential to map a QTL precisely so that it can be cloned and used. Sometimes false QTLs are identified, which are not only of no value but also a problem; how often this happens depends upon the level of significance and the LOD score threshold used. QTLs with minor effects account for only a little of the total variation and hence go undetected. In addition, if two QTLs, one with a positive effect and the other with a negative effect, are closely linked, their net effect comes to zero and the QTLs remain completely undetected.
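The idea of associating a marker with a quantitative trait, and of judging the association by a LOD score, can be illustrated with a single-marker analysis. This is a deliberately simplified sketch and not the multipoint method implemented in MAPMAKER/QTL: it assumes a backcross with two marker genotype classes, normally distributed residuals, and the regression form LOD = (n/2) log10(RSS0/RSS1), where RSS0 is the residual sum of squares around the overall mean and RSS1 is that around the genotype-class means. The phenotype values and marker scores are invented.

```python
import math


def lod_single_marker(genotypes: list, phenotypes: list) -> float:
    """LOD score for association between one marker and a quantitative trait,
    comparing a model with genotype-class means against a single overall mean."""
    n = len(phenotypes)
    overall_mean = sum(phenotypes) / n
    rss_null = sum((y - overall_mean) ** 2 for y in phenotypes)

    rss_marker = 0.0
    for genotype in set(genotypes):
        values = [y for g, y in zip(genotypes, phenotypes) if g == genotype]
        class_mean = sum(values) / len(values)
        rss_marker += sum((y - class_mean) ** 2 for y in values)

    if rss_marker == 0.0:
        raise ValueError("no residual variation within genotype classes")
    return (n / 2.0) * math.log10(rss_null / rss_marker)


# Invented backcross data: trait scores for eight individuals.
phenotypes = [10.0, 11.0, 9.0, 10.0, 14.0, 15.0, 13.0, 14.0]

# Marker 1 co-segregates with the trait; marker 2 does not.
marker_linked = ["A", "A", "A", "A", "H", "H", "H", "H"]
marker_unlinked = ["A", "H", "A", "H", "A", "H", "A", "H"]

lod_linked = lod_single_marker(marker_linked, phenotypes)
lod_unlinked = lod_single_marker(marker_unlinked, phenotypes)
print(round(lod_linked, 2), round(lod_unlinked, 2))
```

The linked marker gives a much higher LOD score than the unlinked one. The same arithmetic also shows the limitation noted above: if two closely linked QTLs with opposite effects cancel, the class means coincide, RSS1 approaches RSS0 and the LOD score stays near zero.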

REFERENCES

Aagard, J., and J.J. Rossi. 2007. RNA Therapeutics: Principles, prospects and challenges. Adv. Drug Delivery Rev. 59: 75-86.
Adams, M.D., J.M. Kelley, J.D. Gocayne, et al. 1991. Complementary DNA sequencing: expressed sequence tags and human genome project. Science 252: 1651-6.
Agarwal, K.L., H. Buchi, M.H. Caruthers, et al. 1970. Total synthesis of the gene for an alanine transfer ribonucleic acid from yeast. Nature 227: 27-34.
Alvardo-Urbina, G., G.M. Sathe, W.-C. Lin, et al. 1981. Automated synthesis of gene fragments. Science 214: 270-4.
Alwine, J.C., D.J. Kemp, and G.R. Stark. 1977. Method for detection of specific RNAs in agarose gels by transfer to diazobenzyloxymethyl paper and hybridization with DNA probes. Proc. Natl. Acad. Sci. USA 74: 5350-4.
Anfinsen, C.B., E. Haber, M. Sela, and F.H. White, Jr. 1961. The kinetics of formation of native ribonuclease during oxidation of the reduced polypeptide chain. Proc. Natl. Acad. Sci. USA 47: 1309-14.
Arber, W., and S. Linn. 1969. DNA modification and restriction. Annu. Rev. Biochem. 38: 467-500.
Baltimore, D. 1970. RNA dependent DNA Polymerase in virions of RNA tumour viruses. Nature 226: 1209-11.
Bassett, D.E., M.B. Eisen, and M.S. Boguski. 1999. Gene expression informatics – it's all in your mine. Nat. Genet. 21: 51-5.
Beaucage, S.L., and M.K. Caruthers. 1981. Deoxynucleoside phosphoramidites - a new class of key intermediates for deoxypolynucleotide synthesis. Tetrahedron Lett. 22: 1859-62.


Bender, W., P. Spierer, and D.S. Hogness. 1983. Chromosomal walking and jumping to isolate DNA from the ACE and rosy loci and the bithorax complex in Drosophila melanogaster. J. Mol. Biol. 168: 17-33.
Berg, P. 1981. Dissections and reconstructions of genes and chromosomes. Science 213: 296-303.
Berg, P. 2008. Asilomar 1975: DNA modification secured. Nature 455: 290-1.
Bier, M. (ed.) 1959. Electrophoresis. Theory, Methods and Applications. New York: Academic Press.
Bockaert, J.l., and J.P. Pin. 1999. Molecular tinkering of G protein-coupled receptors: an evolutionary success. EMBO J. 18: 1723-29.
Bohlar, C., P.E. Nielsen, and L.E. Orgel. 1995. Template switching between PNA and RNA oligonucleotides. Nature 376: 578-81.
Botstein, D.R., R.L. White, M. Skolnick, and R.W. Davis. 1980. Construction of a genetic linkage map in man using restriction fragment length polymorphisms. Am. J. Hum. Genet. 32: 314-31.
Bowman, B.H., F. Yang, and G.S. Adrian. 1988. Transferrin: evolution and genetic regulation of expression. Adv. Genet. 25: 1-38.
Brousseau, R., R. Wu, S. Narang, and D.Y. Thomas. 1983. Synthesis of a human insulin gene, VI. Expression of the synthetic proinsulin gene in yeast. Gene 24: 289-97.
Burke, T., G. Dolf, A.J. Jeffreys, and R. Wolf, eds. 1991. DNA Fingerprinting: Approaches and Applications. Basel, Berlin: Birkhäuser Verlag.
Cady, N., S. Stelick, and C.A. Batt. 2003. Nucleic acid purification using microfabricated silicon structures. Biosens. Bioelec. 19: 59-66.
Cantor, C.R. 1992. Report on the Sequencing by Hybridization Workshop. Genomics 13: 1378-93.
Cantor, C.R., M. Przetakiewicz, T. Sano, and C.L. Smith. 1999. Positional sequencing by hybridization. US Patent 6007987.
Ceballos, C., C.A.H. Prata, S. Giorgio, et al. 2009. Cationic nucleoside lipids based on a 3-Nitropyrrole universal base for siRNA delivery. Bioconjugate Chem. 20: 193-6.
Chamberlin, M.J., and P. Berg. 1963. Studies on DNA directed RNA polymerase: formation of DNA/RNA complexes with single-stranded φX174 DNA as template. Cold Sp. Harb. Symp. Quant. Biol. 28: 67-75.
Chu, C.Y., and T.M. Rana. 2008. Potent RNAi by short RNA triggers. RNA 14: 1714-9.
Cohen, S.N., A.C. Chang, H.W. Boyer, and H.B. Helling. 1973. Construction of biologically functional bacterial plasmids in vitro. Proc. Natl. Acad. Sci. USA 70: 3240-4.
Connell, C.R., C. Heiner, S.B.H. Kent, and L.E. Hood. 1986. Fluorescence detection in automated DNA sequence analysis. Nature 321: 674-9.
Edge, M.D., A.R. Greene, G.R. Heathcliffe, et al. 1981. Total synthesis of a human leukocyte interferon gene. Nature 292: 756-62.
Eghlom, M., O. Buchardt, L. Christensen, et al. 1993. PNA hybridizes to complementary oligonucleotides obeying the Watson-Crick base-pairing rules. Nature 365: 566-68.
Eid, J., A. Fehr, J. Gray, et al. 2009. Real-time DNA sequencing from single polymerase molecules. Science 323: 133-8.
Epstein, C.J., R.F. Goldberger, and C.B. Anfinsen. 1963. The genetic control of tertiary protein structure: Studies with model systems. Cold Sp. Harb. Symp. Quant. Biol. 28: 439-49.
Feinberg, A.P., and B. Vogelstein. 1983. A technique for radiolabeling DNA restriction endonuclease fragments to high specific activity. Anal. Biochem. 132: 6-13.
Fleischmann, R.D., M.D. Adams, O. White, et al. 1995. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269: 496-512.
Freitas, R. 1999. Nanomedicine. Austin, TX: Landes Bioscience.
Gaylord, B.S., A.J. Heeger, and G.C. Bazan. 2002. DNA detection using water-soluble conjugated polymers and peptide nucleic acid probes. Proc. Natl. Acad. Sci. USA 99: 10954-57.
Geldermann, H. 1975. Investigations on inheritance of quantitative characters in animals by gene markers. I. Methods. Theor. Appl. Genet. 46: 319-30.
Geoddel, D.V., D.G. Kleid, F. Bolivar, et al. 1979. Expression in Escherichia coli of chemically synthesized genes for human insulin. Proc. Natl. Acad. Sci. USA 76: 106-10.
Gilbert, W. 1978. Why genes in pieces? Nature 271: 401.
Goulian, M., A. Kornberg, and R.L. Sinsheimer. 1967. Enzymatic synthesis of DNA. XXIV. Synthesis of φX174 DNA. Proc. Natl. Acad. Sci. USA 58: 2321-8.

Molecular Techniques and Tools

27.49

Greenleaf, W.J., and S.M. Block. 2006. Single-molecule, motion-based DNA sequencing using RNA polymerase. Science 313: 801. Grunberg-Manago, M., and S. Ochoa. 1955. Enzymatic synthesis and breakdown of polynucleotides: polynucleotide phosphorylase. J. Am. Chem. Soc. 77: 3165-6. Grunstein, M. and D. Hogness. 1975. Colony hybridization: a method for the isolation of cloned DNAs that contain a specific gene. Proc. Natl. Acad. Sci. USA 72: 3961-5. Gupta, P.K. 2009. Genetics. Merrut; Rastogi Publ. Gupta, P.K., and S. Rastogi, 2004. Molecular markers from the transcribed/expressed region of the genome in higher plants. Funct. Integ. Genomics 4: 139-62. Gupta, P.K., J.K. Roy, and M. Prasad. 1999. DNA chips, microarrays and genomics. Curr. Sci. 77: 875-84. Haber, E., and C.B. Anfinsen. 1962. Side-chain interactions governing the pairing of half-cystine residues in ribonuclease. J. Biol. Chem. 237: 1839-44. Hannon, G.J. 2002. RNA interference. Nature 418: 244-51. Harris, T.D., P.R. Busby, H. Babock, et al. 2008. Single-molecule DNA sequencing of a viral genome. Science 320: 106-9. Hayden, E.C. 2009. Genome sequencing: the third generation. Nature 457: 768-69. Hogrefe, R. 2008. A short history of oligonucleotide synthesis. TriLink BioTechnologies [email protected] Holley, R.W. 1964. Alanine transfer RNA. In: Nobel Lectures in Mol. Biol. 1933-1975: 285-300, Elsevier North Holland, New York, NY, USA. Holley, R.W., J. Apgar, G.A. Everett, et al. 1965. Structure of a ribonucleic acid. Science 147: 1462-5. Houghton, M., M.A. Easton, A.G. Stewart, et al. 1980. The complete amino acid sequence of human fibroblast interferon as deduced using synthetic oligodeoxyribonucleotide primers of reverse transcriptase. Nucl. Acids Res. 8: 2885-94. Hurwitz, J., A. Bresler, and R. Diringer. 1960. The enzymic incorporation of ribonucleotides into polyribonucleotides and the effect of DNA. Biochem. Biophys. Res. Commun. 3: 15-9. Iannelli, F., L. Giunti, and G. Pozzi. 1998. 
Direct sequencing of long polymerase chain reaction fragments. Mol. Biotechnol. 10:183-5. Itakura, K., T. Hirose, R. Crea, and A.D. Riggs, 1977. Expression in Escherichia coli of a chemically-synthesized gene for the hormone somatostatin. Science 198: 1056-63. Jackson, D.A., S.H. Symonst, and P. Berg. 1972. Biochemical method for inserting new genetic information into DNA of simian virus 40: SV40 DNA molecules containing lambda phage genes and the galactose operon of Escherichia coli. Proc. Natl. Acad. Sci. USA 69: 2904-10. Jacob, F. 1983. Molecular tinkering and evolution. In: Bendall, D.S. ed. Evolution from Molecules to Men. Cambridge: Cambridge University Press, pp. 131-44. Jeffreys, A. 2005. DNA fingerprinting. Nature Med. 11: xiv-xviii. Jeffreys, A.J., V. Wilson, and S.L. Thein. 1985. Hypervariable minisatellite regions in human DNA. Nature 314: 67-72. Jin, L., and R.V. Lloyd. 1997. In situ hybridization: methods and applications. J. Clin. Lab. Anal. 11: 2-9. Jun, K., and H. Yoshihide. 1999. DNA sequencing by the slab-gel automated sequencer. Mol. Med. 36: 1191-200. Khorana, H.G. 1979. Total synthesis of a gene. Science 203: 614-25. Kornberg, A. 1957. Pathways of enzymatic snthesis of nucleotides. In: The Chemical Basis of Heredity. Eds. McEllory, W.D., and B Glass, p. 579. Baltimore: Johns Hopkins Press. Kornberg, A., I.R. Lehman, M.J. Bessman, and E.S. Simms. 1956. Enzymatic synthesis of desoxyribonucleic acid. Biochim. Biophys. Acta 21: 197-8. Kramer F.R., and D.R. Mills. 1978 RNA sequencing with radioactive chain-terminating ribonucleotides. Proc. Natl. Acad. Sci. USA 75(11): 5334-8. Lambertucci, C., G. Schepers, G. Cristalli, P. Herdewijn, and A. Van Aerschot. 2007. Synthesis and evaluation of hexitol nucleoside congeners as ambiguous nucleosides. Tetrahed. Lett. 48: 2143-5. Lemieux, B., A. Aharoni, and M. Schena. 1998. Overview of DNA chip technology. Mol. Breeding 4: 277–289. Letsinger, R.L., and V. Mahadevan. 1965. 
Oligonucleotide synthesis on a polymer support. J. Am. Chem. Soc. 87(15):3526-7. Letsinger, R.L., and W.B. Lunsford. 1976. Synthesis of thymidine oligonucleotides by phosphite triester intermediates. J. Am. Chem. Soc. 98(12):3655-61.

27.50

Essentials of Molecular Genetics

Letsinger, R.L., J.L. Finnan, and N.B. Lunsford. 1975. Phosphite coupling procedure for generating internucleotide links. J. Am. Chem. Soc. 97(11): 3278-9. Lin, M.C., W.H. Stein, and S. Moore. 1968. Further studies on the alkylation of the histidine residues at the active site of pancreatic ribonuclease. J. Biol. Chem. 243: 6167-70. Lincoln, S., M. Daly, and E. Lander. 1992a. Constructing Linkage Maps with MAPMAKER/EXP 3.0. Whitehead Institute Technical Report. 3rd Edition. Lincoln, S., M. Daly, and E. Lander. 1992b. Mapping Genes Controlling Quantitative Traits with MAPMAKER/QTL 1.1. Whitehead Institute Technical Report. 2nd Edition. Linn, S., and W. Arber. 1968. Host specificity of DNA produced by Escherichia coli, X. In vitro restriction of phage fd replicative form. Proc. Natl. Acad. Sci. USA 59:1300-6 Litt, M., and J.A. Lutty. 1989. A hypervariable microsatellite revealed by in-vitro amplification of a dinucleotide repeat with in the cardiac muscle actin gene. Am. J. Hum. Genet. 44: 397-401. Lockhart, D.J., and E.A. Winzeler. 2000. Genomics, gene expression and DNA arrays. Nature 405: 827-36. Manalis, S.R., S.C. Minne, and C.F. Quate. 2004. Direct DNA sequencing with a transcription protein and a nanometer scale electrometer. US Patent 6770472. Mann, M., and O.N. Jensen. 2003. Proteomic analysis of post-translational modifications. Nature Biotech. 21: 255-61. Maxam, A., and W. Gilbert. 1977. A new method for sequencing DNA. Proc. Natl. Acad. Sci. USA 74: 560-4. Merrifield, R.B. 1963. Solid phase peptide synthesis. I. The synthesis of a tetrapeptide. J. Am. Chem. Soc. 85 (14): 2149 doi:10.1021/ja00897a025. Meselson, M., and F.W. Stahl. 1958. The replication of DNA in Escherichia coli. Proc. Natl. Acad. Sci. USA 44: 671-82. Metzker, M.L. 2010. Sequencing technologies the next generation. Nat. Rev. Genet. 11: 31-46. Michelson, A.M., and A.R. Todd. 1955. Synthesis of a dithymidine dinucleotide containing a 3′: 5′-internucleotidic linkage. J. Chem. Soc. 2632-8. 
DOI: 10.1039/JR9550002632. Ming, X., P. Leonard, D. Heindl, and F. Seela. 2008. Azide-alkyne "click" reaction performed on oligonucleotides with the universal nucleoside 7-octadiynyl-7-deaza-2'-deoxyinosine. Nucl. Acids Symp. Series 52: 471-72. Narang, S.A. 1984. Chemical synthesis, cloning and expression of human preproinsulin Gene. J. Biosci. 6: 739– 55. Nathans, D. and Smith, H.O. 1975. Restriction endonucleases in the analysis and restructuring of DNA molecules. Annu. Rev. Biochem. 44: 273-93. Nichols, R., P. C. Andrews, P. Zhang, and D.E. Bergstrom. 1994. A universal nucleoside for use at ambiguous sites in DNA primers. Nature 369: 492-3. Nielsen, P.E. 2008. A new molecule of life. Scient. Am. 299(6): 64-71. Nirenberg, M.W., and J.H. Matthaei. 1961. The dependence of cell free protein synthesis in E. coli upon naturally occurring or synthetic polyribonucleotides. Proc. Natl. Acad. Sci. USA 47: 1588-602. Olson, M., L. Hood, C. Cantor, and D. Botstein. 1989. A common language of physical mapping of human genome. Science 245: 1434-44. Peattie, D.A. 1979. Direct chemical method for sequencing RNA. Proc. Natl. Acad. Sci. USA 76(4): 1760-4. Poole, A., D. Penny, and B.-M.Sjöberg. 2001. Confounded cytosine! Tinkering and the evolution of DNA. Nature Rev. Mol. Cell Biol. 2: 147-51. Rafalski, J.A. 2002. Application of SNP in crop genetics. Curr. Opn. Pl. Bio. 5: 94-100. Rigby, P.W. J., M. Dieckmann, C. Rhodes, and P. Berg 1977. Labeling deoxyribonucleic acid to high specific activity in vitro by nick translation with DNA polymerase I. J. Mol. Biol. 113: 237-51. Riley, L.K., and C.J. Caffrey. 1990. Identification of enterotoxigenic E. coli by colony hybridization with nonradioactive digoxigenin-labeled DNA probes. J. Clin. Microbiol. 28: 1465-68. Ronaghi, M. 2001. Pyrosequencing sheds light on DNA sequencing. Genome Res. 11: 3-11. Sanger, F., S. Nicklen, and A.R. Coulson. 1977. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74: 5463-7. 
Seela, F., and H. Debelak. 2000. The N8-(2′-deoxyribofuranoside) of 8-aza-7-deazaadenine: a universal nucleoside forming specific hydrogen bonds with the four canonical DNA constituents. Nucl. Acids Res. 28: 3224-32.

Molecular Techniques and Tools

27.51

Smith, H.O., and D. Nathans. 1973. A suggested nomenclature for bacterial host modification and restriction systems and their enzymes. J. Mol. Biol. 81: 419-23. Smith, H.O., and K.W. Wilcox. 1970. A restriction enzyme from Hemophilus influenzae. I. Purification and general properties. J. Mol. Biol. 51: 379-91. Smith, L. M. 1991. High-speed DNA sequencing by capillary gel electrophoresis. Nature 349: 812-3. Smith, L.M., Z. Sanders, JR.J. Kaiser, et al. 1986. Fluorescence detection in automated DNA sequence analysis. Nature 321: 674-9. Smithies, O. 1955. Zone electrophoresis in starch gels: group variations in the serum proteins of normal human adults. Biochem. J. 61: 629-41. Southern, E.M. 1975. Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol. 98: 503-17. Stanley, J., and S. Vassilenko. 1978. A different approach to RNA sequencing. Nature 274: 87-9. Stevens, A. 1960. Incorporation of the adenine ribonucleotide into RNA by cell fractions from E. coli B. Biochem. Biophys. Res. Commun. 3: 92-6. Stoekle, M.Y., and D.N. Herbert. 2008. Barcode of life. Scient. Am. 299(4): 82-88. Tabuchi, M., M.U. Kaji, Y. Yamasaki, Y. Nagasaki, K. Yoshikawa, K. Kataoka, and Y. Baba. 2004. Nanospheres for DNA separation chips. Nature Biotech. 22: 337-40. Takahashi, K., W.H. Stein, and S. Moore. 1967. The identification of a glutamic acid residue as part of the active site of ribonuclease T-1. J. Biol. Chem. 242: 4682-90. Tanaka, Y., T.A. Dyer, and G.G. Brownlee. 1980. An improved direct RNA sequence method; its application to Vicia faba 5.8S ribosomal RNA. Nucl. Acids Res. 8: 1259-72. Tanksley, S.D., M.W. Ganal, and G.B. Martin. 1995. Chromosome landing: a paradigm for map-based gene cloning in plants with large genomes. Trends Genet. 11(2): 63-68. Temin, H.M., and S. Mizutani, 1970. RNA-dependent DNA polmerase in virions of Rous sarcoma virus. Nature 276: 1211-3. Thomas, P.S. 1980. 
Hybridization of denatured RNA and small DNA fragments transferred to nitrocellulose. Proc. Natl. Acad. Sci. USA 77: 5201-5. Thomas, S., N. Thirumalapura, E.C. Crossley, N. Ismail, D.H. Walker, 2009. Antigenic protein modifications in Ehrlichia. Parasite Immunol. 31(6): 296-303. Towbin, H., T. Staehalin, and J. Gordon. 1979. Electrophoretic transfer of proteins from polyacrylamide gels to nitrocellulose sheets: procedure and some applications. Proc. Natl. Acd. Sci. USA 76: 4450-4. Van Aerschot, A., C. Hendrix, G. Schepers N. Pillet, and P. Herdewijn, 1995. In search of acyclic analogues as universal nucleosides in degenerate probes. Nucleos. Nucleot. Nucl. Acids 14: 1053-6. Vasudevan, H. 2011. DNA fingerprinting in the standardization of herbs and nutraceuticals. Sci. Creat. Quart. Issue 6. http://www.scq.ubc.ca/dna-fingerprinting-in-the-standardization-of-herbs-and-nutraceuticals/ Vera J.C., C.W. Wheat, H.W. Fescemyer, et al. 2008. Rapid transcriptome characterization for a nonmodel organism using 454 pyrosequencing. Mol. Ecol. 17: 1636-47. Veselkov, A.G., V.V. Demidov, and M.D. Frank-Kamenetskil. 1996. PNA as a rare genome-cutter. Nature 379: 214. Von Kiedrowski, G., B. Wlotzka, and J. Halbing. 1989. Sequence dependence of temple-directed synthesis of hexadenoxynucleotide derivatives with 3′-5′ pyrophosphate linkage. L. Angew. Chem. Int. Eng. 28: 1235-7. Vos, P., R. Hogers, M. Bleeker, et al. 1995. AFLP: a new technique for DNA fingerprinting. Nucl. Acids Res. 23: 4407-14. Walsh, C.T., S. Garneau-Tsodikova, and G.J. Gatto Jr. 2005. Protein posttranslational modifications: The chemistry of proteome diversifications. Angewandte Chemie (Internatl. Ed. English) 44: 7342-72. Weiss, S.B. 1960. Enzymatic incorporation of ribonucleoside triphosphates into the interpolynucleotide linkages of ribinucleic acid. Proc. Natl. Acad. Sci. USA 46: 1021-30. Wetzel, R., H.L., Heynejer, D.V. Goeddel, et al. 1980. 
Production of biologically active N-desacetylthymosin 1 in Escherichia coli through expression of a chemically synthesized gene. Biochemistry 19: 6096-104. Zamore, P.D., and B. Haley. 2005. Ribo-genome: the big world of small RNAs. Science 309: 1519-24. Zerbino, D.R. and E. Birney. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18(5): 821-9.


PROBLEMS

1. What are the principle, procedure, applications and limitations of each of the following molecular techniques: (a) DNA/RNA separation, (b) in situ hybridization, (c) squash dot hybridization, (d) Southern blotting, (e) northern blotting, (f) western blotting, (g) eastern blotting, (h) dot blot, (i) slot blot, (j) colony hybridization, (k) plaque hybridization, (l) chromosome walking, (m) chromosome jumping/hopping, (n) chromosome landing, (o) nick translation, (p) RNA sequencing, (q) DNA sequencing, (r) RNA synthesis, (s) DNA synthesis, (t) DNA fingerprinting, (u) DNA markers, (v) microarrays, (w) restriction endonucleases, (x) recombinant DNA technology, and (y) quantitative trait loci mapping?
2. Several techniques have been described in the preceding chapters. Make a list of these techniques and describe the principle, procedure, applications and limitations of each.
3. Who discovered the molecular techniques, tools and methods discussed in the above two problems?
4. The subject index of this book contains information on Nobel Prize winners whose research had a tremendous impact on the subject of molecular genetics. Identify these Nobel laureates and the contributions for which they were awarded the Nobel Prize.
5. There might be some problems in molecular genetics to which you are seeking an answer. Make a list of such questions and try to find the answers.

Glossary

A

Actin gene family – Actin genes in eukaryotes represent a multigene family whose members are homologous but non-identical. They give rise to slightly different variants, so that different members function either at different times or, at the same time, in different tissues.
Activation of tRNA – Before translation commences, activation of amino acids (formylated methionine in prokaryotes and methionine in eukaryotes) takes place in the cytosol at the expense of ATP. An amino acid reacts with ATP to become adenylated, with concomitant release of pyrophosphate. As a result of adenylation, the amino acid is attached to adenylic acid via a high-energy mixed anhydride bond in which the carboxyl group of the amino acid is joined to the phosphoryl group of AMP.
Addition of cap at 5′-end of pre-mRNA – Eukaryotic mRNA has a 5′ cap built on a 5′ → 5′ triphosphate linkage between two modified nucleotides: a 7-methylguanosine and a 2′-O-methylated nucleotide (commonly a purine). This cap (m7GpppN1) is added enzymatically to the transcript and serves to identify the RNA molecule as an mRNA to the translational machinery.
Addition of poly(A) tail at 3′-end – The polyadenylation signal in the trailer of the pre-mRNA is AAUAAA. Polyadenylation involves two steps: cleavage and addition of the poly(A) tail. Multiple copies of the polyadenylation signal are common.
Adenine methylation in mRNA – Messenger RNA (mRNA) is often chemically modified by addition of a methyl group to adenine, and this modification influences the expression of thousands of genes. Thus, a fifth base, N6-methyladenosine (m6A), pervades the transcriptome. m6A sites frequently occur in regions of mRNA that are highly conserved across several species of vertebrates. This modification is likely to have widespread effects on how genes are expressed. Adenine methylation in RNA is the first demonstration of an epitranscriptomic modification, that is, a change in the transcript that is not due to changes in the underlying sequence.
Alarmones – Alarmones are possible cellular signals of distress. Like hormones, they move to different sites in the cell, where they act on enzymes to counteract the problem whose metabolic consequences they represent. Some examples of alarmones are cAMP (3′,5′-cyclic AMP), ppGpp (guanosine tetraphosphate), pppGpp (guanosine pentaphosphate), AppppA (diadenosine tetraphosphate), and ZTP (5-amino-4-imidazolecarboxamide riboside 5′-triphosphate).
Allelic exclusion – In immunoglobulin-producing cells (plasma cells), only one member of each pair of alleles is involved in the synthesis of the immunoglobulin being expressed. Correct rearrangement of only one chromosome is needed for the generation of each expressed light or heavy chain gene. Since DNA rearrangement of one chromosome takes place, expression of the alternate allele is excluded.
Allelic mutations – Two mutations are allelic if they do not show complementation.
Alternative RNA splicing – The process through which more than one functional mRNA molecule is produced from one and the same primary transcript by differential removal of introns. Alternative splicing makes it possible for a single gene to produce more than one messenger RNA molecule, which contradicts the basic conceptual framework of the neoclassical view of the gene.
Alternative sigma factors – The sigma factor is an important component of transcription initiation by RNA polymerase. Various forms of sigma factors have been found in viruses, prokaryotes and eukaryotes.


Ambiguous code – A code in which one codon can specify more than one amino acid.
Amplified fragment length polymorphism (AFLP) – This technique combines features of RAPD and RFLP analysis. AFLP is based on PCR amplification of a set of restriction fragments, selected from a pool of fragments generated by digestion with a pair of specific restriction enzymes, one of them being a frequent cutter, MseI (4 bp), and the other a rare cutter, EcoRI (6 bp).
Ancillary sites – Sequences located between positions –30 and –40. Also referred to as the –35 sequence. In bacterial genes, these sites are known as the CAP (catabolite activator protein) site. The term ancillary site applies to both prokaryotic and eukaryotic genes.
Antibody diversity – Organisms are able to synthesize a very large number (millions) of different types of immunoglobulins to defend the body against the large number of foreign substances (antigens) that may otherwise harm it. DNA rearrangements are responsible for antibody diversity.
Anticodon – Sequence of three nucleotides on a tRNA that is complementary to the codon on mRNA.
Antioncogenes – Proteins encoded by antioncogenes can prevent tumorigenic transformation of cells. These genes act by suppressing malignant growth. Also known as tumor-suppressor genes (TSGs) or recessive oncogenes.
Antisense DNA strand – The DNA template strand for a given mRNA. Also known as the noncoding strand, transcribed strand, template strand, or Watson strand.
Antisense RNA technology – Antisense RNA is a fragment of RNA with a sequence of bases complementary to the sequence of a particular mRNA. Because antisense RNA is complementary to the mRNA, the two form a duplex that interferes with translation of the mRNA. This technology involves cloning a particular gene in reverse orientation with respect to the promoter, such that the coding (sense) strand acts as the template and the transcript is complementary to the normal mRNA. The antisense gene, when transcribed, gives rise to antisense RNA, which inhibits expression of the target gene by forming an RNA duplex. The antisense RNA binds to the complementary mRNA and prevents its transport to the cytoplasm or its translation into protein.
Antitermination – Process that causes RNA polymerase to continue transcription past the terminator sequence, an event called readthrough. Antitermination is used as a control mechanism in both bacterial operons and phage regulatory circuits. In operons controlled by attenuation, it provides a link between translation and transcription.
Antiterminators – Specific ancillary factors that interact with RNA polymerase (RNAP or Pol) and cause it to continue transcription past the terminator sequence. NusA protein controls the hairpin formation that promotes termination, and stabilization of these contacts by the phage lambda N protein leads to antitermination. Also known as antitermination factors.
Arabinose (ara) operon of E. coli – The arabinose (ara) operon of Escherichia coli has three structural genes, D, A and B, which code for the enzymes epimerase, isomerase and kinase, respectively. Gene C produces a regulatory protein called C protein, which complexes with arabinose. The C protein-arabinose complex binds at the initiator site I and induces the production of these enzymes. In E. coli cells growing in the absence of arabinose, the three enzymes involved in its breakdown are present only in very small quantities, but on addition of arabinose all these enzymes increase in amount. The ara operon is a glucose-sensitive operon, so ara mRNA synthesis also depends on the binding of CAP, activated by cAMP, to the promoter. Thus the arabinose operon demands the simultaneous presence of two positive controls.
Argonaute (AGO) proteins – Argonaute proteins, discovered in plants, are the major players in RNA-based gene silencing pathways. In many RNAi-based silencing mechanisms, Argonautes help to maintain the genome, produce small noncoding RNAs, form heterochromatin, and control RNA stability and protein synthesis.
Attenuation – A mechanism that links the supply of an aminoacyl-tRNA to the ability of RNA polymerase to read through a termination site. The terminator is located at the beginning of the cluster of structural genes coding for the enzymes that synthesize the amino acid carried by that tRNA.
Attenuator – A regulator site which serves as a barrier to transcription. The termination event at this site responds to the level of transcription.
Autogenous regulation – Autogenous regulation means that a protein or RNA regulates its own production. Several proteins act as repressors of their own production. This is achieved by binding of the repressor protein to the ribosome binding site, i.e., the Shine-Dalgarno (S-D) sequence, or to the translation initiation codon (AUG) of the mRNA. Autogenous regulation is found among proteins that are incorporated into macromolecular assemblies. The assembled particle may be unsuitable as a regulator because it is too large, too numerous, or too restricted in its location. If the assembly pathway is blocked for some reason, free subunits accumulate and shut off unnecessary synthesis of further components. Eukaryotic cells also have this type of regulation.
Autotransporter system – Another name for the Type V secretion system (T5SS) of Gram-negative bacteria.
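The fragment-selection idea in the AFLP entry above can be illustrated with a small in silico double digest. This is only a sketch under stated assumptions: the function names (cut_positions, double_digest, aflp_candidates) and the test sequence are hypothetical, the DNA is treated as a linear single-strand string, and the adapter ligation and selective PCR steps of a real AFLP protocol are left out. The recognition sites used are the standard ones, EcoRI (G^AATTC) and MseI (T^TAA).

```python
import re

# Recognition site and the offset within the site after which the top strand is cut.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),  # G^AATTC, rare (6-bp) cutter
    "MseI": ("TTAA", 1),     # T^TAA, frequent (4-bp) cutter
}


def cut_positions(seq):
    """Return sorted (position, enzyme) pairs for every cut site in seq."""
    cuts = []
    for name, (site, offset) in ENZYMES.items():
        # A lookahead pattern reports overlapping sites as well.
        for match in re.finditer(f"(?={site})", seq):
            cuts.append((match.start() + offset, name))
    return sorted(cuts)


def double_digest(seq):
    """Cut a linear sequence with both enzymes.

    Returns (fragment, left_end, right_end) tuples, where each end is the
    enzyme that produced it, or "end" for a terminus of the molecule.
    """
    bounds = [(0, "end")] + cut_positions(seq) + [(len(seq), "end")]
    return [(seq[a:b], left, right)
            for (a, left), (b, right) in zip(bounds, bounds[1:])]


def aflp_candidates(seq):
    """Keep only fragments with one EcoRI end and one MseI end."""
    return [frag for frag, left, right in double_digest(seq)
            if {left, right} == {"EcoRI", "MseI"}]


if __name__ == "__main__":
    demo = "AAGAATTCGGGTTAACCCTTAAGG"  # hypothetical test sequence
    for fragment in double_digest(demo):
        print(fragment)
    print(aflp_candidates(demo))  # ['AATTCGGGT']
```

In this toy sequence the digest yields four fragments, of which only one is flanked by an EcoRI cut on one side and an MseI cut on the other. Fragments of this mixed type are the ones usually described as the main targets of AFLP amplification.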

B

Bacterial signal sequences – In bacteria, proteins destined to be secreted are synthesized as pre-proteins with N-terminal signal sequences, sometimes termed leader peptides. Bacterial signal sequences are short (about 25 residues) and comprise a hydrophobic central core, which can adopt an α-helical structure, flanked by regions containing several charged residues.
Bacterial methylase systems – Well-studied methylase systems in bacteria are hsd, dam, and dcm.
Bacterial promoters – Promoters lack any extensive conservation of sequence over the 60 bp associated with RNAP, but some short sequences within them are conserved and could be referred to as signal sequences. A 6-bp sequence upstream from the start point is recognizable in almost all the promoters studied. This signal sequence is TATAAT, sometimes known as the Pribnow box. The occurrence of bases in the Pribnow box is T89 A89 T50 A65 A65 T100, where each number is the percentage occurrence of that base. The actual location of this hexamer varies from –11 to –5 in some promoters to –14 to –8 in others. The centre of the Pribnow box lies about 10 bp upstream of the start point, so it is called the –10 sequence. Sequence similarities also occur at another location centred about –35, the –35 sequence, sometimes called the recognition region. The consensus sequence of the –35 region, with the percentage occurrence of each base, is T85 T83 G81 A61 C69 A52. The distance between the –35 and –10 sites varies between 16 and 19 bp in known promoters. The –35 region is involved in the efficiency of polymerase recognition.
Bacterial RNA polymerase – Bacterial RNA polymerase is composed of six polypeptides (α2ββ′ωσ) of five different kinds. This means that five different genes are needed to make E. coli RNAP. The active form of the enzyme is called the holoenzyme. The holoenzyme is made up of a core enzyme and a sigma (σ) factor. The core enzyme has five subunits (α2ββ′ω). No covalent bonds link the various chains of RNAP.
β-galactosidase – This enzyme performs two functions: first, breakdown of the disaccharide lactose into two monosaccharides, namely glucose and galactose, and second, formation of the inducer by conversion of lactose into allolactose. Also known as protein “z”.
Bipolar gene regulation – Transcriptional regulation in yeast appears to be mechanistically bipolar, possibly reflecting a need to balance inducible stress-related responses with constitutive housekeeping functions.
Blank codons – Amber nonsense codons and, to a lesser extent, ochre, opal, or frameshift codons have been used to specify unnatural amino acids (UAAs). A sense codon has been used to encode a UAA in an auxotrophic strain by exploiting wobble position stability differences and limiting endogenous amino acid concentrations. A general solution that involves the conversion of sense codons into blank codons would be ideal.
Blunt DNA ends – DNA ends that have no single-stranded protruding extensions. Blunt ends can be used to produce sticky ends of interest.
Britten-Davidson model – This model explains regulation of gene expression in eukaryotes. It has four major components: sensor sites, integrator genes, receptor sites, and producer genes. The model also proposes that receptor sites and integrator genes may be repeated a number of times so as to control the activity of a large number of genes in the same cell. Repetition of receptor sites ensures that the same activator recognizes all of them, so that several enzymes of one metabolic pathway are synthesized simultaneously. A set of structural genes controlled by one sensor site is termed a battery. The Britten-Davidson model is also known as the gene-battery model.
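The layout described in the Bacterial promoters entry above (a –35 hexamer, a spacer of 16 to 19 bp, then the –10 hexamer) can be turned into a simple sequence scan. The sketch below is illustrative only: the function names (mismatches, find_promoters), the default tolerance of one mismatch per hexamer, and the test sequence are assumptions made here, and real promoter prediction weights each position by its conservation rather than counting mismatches equally.

```python
MINUS_35 = "TTGACA"  # consensus of the -35 (recognition) region
MINUS_10 = "TATAAT"  # consensus of the -10 region (Pribnow box)


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(seq, max_mismatch=1, spacers=range(16, 20)):
    """Scan seq for a -35 hexamer followed 16-19 bp later by a -10 hexamer.

    Returns (start of the -35 hexamer, spacer length, total mismatches)
    for every candidate in which each hexamer has at most max_mismatch
    differences from its consensus.
    """
    hits = []
    for i in range(len(seq) - 5):
        m35 = mismatches(seq[i:i + 6], MINUS_35)
        if m35 > max_mismatch:
            continue
        for gap in spacers:
            j = i + 6 + gap          # start of the putative -10 hexamer
            box = seq[j:j + 6]
            if len(box) < 6:         # ran off the end of the sequence
                break
            m10 = mismatches(box, MINUS_10)
            if m10 <= max_mismatch:
                hits.append((i, gap, m35 + m10))
    return hits


if __name__ == "__main__":
    # Hypothetical test sequence: perfect hexamers separated by a 17-bp spacer.
    demo = "GG" + MINUS_35 + "C" * 17 + MINUS_10 + "GGGGGGG"
    print(find_promoters(demo))  # [(2, 17, 0)]
```

Because natural promoters rarely match both consensus hexamers exactly, raising max_mismatch returns more candidates at the cost of more false positives.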

C

CAAT box – A conserved sequence present in promoters. This sequence, GGT/CAATCT, lies between –70 and –80.
Cap-independent mRNA translation – Proteins of the 14-3-3 family are crucial in a wide variety of cellular responses, including cell cycle progression, DNA damage checkpoints and apoptosis. A protein called 14-3-3σ inhibits the cell cycle and may act as a tumor suppressor. It now turns out that it is also involved in regulating cap-independent protein synthesis from messenger RNA during cell division.


Cardinal nucleotide – The 3′-nucleotide of anticodon triplet is considered to be the most important in codonanticodon recognition and is called cardinal nucleotide. Catabolite activator protein (CAP) – E. coli catabolite activator protein (CAP) is a helix-turn-helix motif sequence-specific DNA binding protein, which plays role in gene regulation. Cellular transformation – Primary cultures of mouse/rat fibroblasts consistently give rise to permanent lines, which can subsequently acquire all the usual properties of fully transformed cells. At least one of the steps in cellular transformation may be an epigenetic change in gene activity, such as the loss of DNA methylation induced by DNA damage. Cellulose utilization (cel) operon of E. coli – The cel (cellulose utilization) operon of E. coli K12 expresses when there are mutations in the operon. Two kinds of mutations are capable of activating the operon. The first type involves the mutations that allow the repressor to recognize the substrate cellulose, arbutin and salicin as inducers. Five different active alleles were studied and differences were found to be single base pair changes at one of the two lysine codons in the repressor gene. The second type involves the integration of the insertion sequences IS1, IS2 or IS5 into a 108-bp region 72-180 bp upstream of the start of transcription. Central dogma of molecular biology – Genetic information from DNA is transcribed to RNA and that information from RNA is translated to polypeptide chain. This concept about flow of biological information from DNA to RNA and then to protein was given by F.H.C. Crick in 1958. With the discovery of reverse transcription this concept was modified Central dogma, new contemporary – This model explains that the genome (all the genes of an organism) gives rise to transcriptome (the complete set of mRNA), which is then translated to produce the proteome (complete set of proteins in a given cell under particular conditions). 
The genome is a static information source with a defined gene content that with a few exceptions remains the same regardless of the cell type or environmental conditions. Both transcriptome and proteome are dynamic entities, whose content can fluctuate dramatically under different conditions due to regulation of transcription, RNA processing, translation, and protein modification. Chaperonins – A class of chaperones. Chaperonins occur in bacteria, chloroplast and mitochondria. The best known of these chaperonins is the protein GroE of E. coli. This protein in its active form is composed of two components. GroEL and GroES. GroEL (hsp60) is composed of two disks, each composed of seven copies of a polypeptide. GroES (hsp10) is a smaller component composed of seven copies of a small subunit. GroEL forms a barrel within which protein folding takes place. As a protein begins to emerge from the ribosome in E. coli, two proteins which are products of genes DnaJ and DnaK,, complex with it. A third protein, GrpE, causes release of DnaJ and DnaK leading to interaction with GroEL. Once inside, GroES can cycle on and off as ATPs are hydrolyzed, helping GroEL chaperone correct folding of the protein. Finally, the protein emerges. If still not folded correctly, it can recycle through GroEL again. Also see chaperones. Charging of transfer RNAs – During this charging reaction, specific amino acid is covalently attached to a specific tRNA by a tRNA-specific enzyme called aminoacyl-tRNA synthetase. Each tRNA can be charged only with the amino acid for which its anticodon is appropriate. Charging reaction involves two steps, activation and transfer reactions, which are crucial for functioning of the tRNAs. During charging reaction, each of the 20 different amino acids gets attached to its corresponding tRNA by specific aminoacyl-tRNA synthetases. The amino acid is linked by an ester bond involving its carboxyl group to one of the last base of the tRNA (which is always adenine). 
The meaning of a tRNA is determined by its anticodon and not by its amino acid. Chloroplast DNA transcription – A template-binding polypeptide has been identified in the pea chloroplast transcriptional complex. It has a molecular weight of 150 kDa and binds to both chloroplast ribosomal (16S rRNA) and messenger (psbA) promoters and it exhibits some degree of correlation to total transcriptional element. Chloroplast gene introns – Split genes for ribosomal RNA (rRNA), transfer RNAs and some proteins have also been reported in the chloroplast genomes of several plants including Chlamydomonas and Nicotiana. Introns found in chloroplast genes can be classified into three groups on the basis of intron boundary sequences. Group I introns (e.g., in trnL) can be folded into a secondary structure similar to that of the self-splicing rRNA precursor of Tetrahymena. These can be removed either by self-splicing or by a 'maturase' (as in cytochrome b and cytochrome oxidase mRNA precursors). Group II introns (e.g., majority of genes including trnA and trnI) can be folded into a complex secondary structure (as in introns of mitochondrial genes for cytochrome oxidase in

maize and yeast). Group III introns (e.g., trnG, trnK, trnV, rpl2, rps12, rps16, etc.) have conserved sequences at their borders (GTGCGNY at the 5′-end, and ANCNRYY(N)YYAY at the 3′-end), similar to those in the eukaryotic nuclear genes, where R = purine, Y = pyrimidine, and N = any nucleotide. Chromatin regulators – Chromatin, the protein wrapping of the genome, harbors information about how the genes it contains are to be regulated. A combination of histone modification marks, in the absence of a transcription factor, may suffice to stably recruit a regulatory protein complex to a regulatory site. Chromodomains – Chromodomains are modules implicated in the recognition of lysine-methylated histone tails and nucleic acids. Chromosome-ATPase/helicase-DNA binding (CHD) proteins regulate ATP-dependent nucleosome assembly and mobilization through their conserved double chromodomains and SWI2/SNF2 helicase/ATPase domain. Deletion of chromodomains impairs nucleosome binding and remodeling by CHD proteins. Chromosome imprinting – The classical example of epigenetic inheritance is the phenomenon of imprinting, in which the expression status of a gene depends upon the parent from which it is derived. The X chromosomes of paternal and maternal origin act differently (probably due to a difference in methylation). Also known as genetic imprinting. Cis-position – When two alleles or elements are present in the same DNA strand. Cis-TGS – Transcriptional gene silencing event affecting single or multiple copies inserted at one locus (i.e., it does not require the presence of homologous sequences in the genome). Compare with trans-TGS. Cistron – Operationally, a cistron is defined by cis-trans complementation tests. Two mutations belong to the same cistron if they do not show complementation in a cis-trans test or in a simple trans complementation test. Class discovery – This popular approach involves grouping similar genes or samples together.
The simplest form of class discovery would be to list all the genes that changed by more than a certain amount between two experimental conditions. This approach helps in gene annotation. Class I genes – Genes transcribed by RNA polymerase I. Pre-rRNA 45S (35S in yeast) genes. Class I oncogenes – The oncogenes (e.g., src, yes, neu, abl, fps, fms, erb, ros, mos, fgr) that are related to synthesis of cell surface receptor proteins. The cell surface receptor proteins are activated by growth factors produced by other cell types. An activated cell will proliferate as long as the second cell type produces its growth factor in response to some external stimulus. Class I oncogene products have kinase activity and are found in the cytoplasm. Class I tRNAs – Have only 3-5 bases in their extra loop; they represent ~75 per cent of all tRNAs. Class II genes – Genes transcribed by RNA polymerase II. Precursors of mRNAs and most snRNAs and microRNAs. Many class II genes contain a region located between 19 and 27 bp upstream from the transcription start site whose consensus sequence TATANA (where N means any nucleotide) is similar, except in location, to the prokaryotic Pribnow box; this eukaryotic element is known as the TATA box or Hogness box. Class II promoters contain yet another upstream element with the consensus sequence GGGCGG. This sequence functions in either direction, is often present in multiple copies, and binds the transcription factor SP1. Class II genes contain still other binding sites for transcription factors in a variety of locations. Certain sequences are present only in some classes of genes (e.g., those for heat shock proteins). These sequences bind to regulatory factors. Class II oncogenes – These oncogenes (e.g., Ha-ras, Ki-ras, N-ras) are commonly found in human tumors. Their protein products have a common characteristic of regulating cellular metabolism. Class II tRNAs – These tRNAs have a larger extra loop, with 13-21 bases in the loop and 5 base pairs in the stem.
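The simplest form of class discovery described above (listing genes that changed by more than a certain amount between two conditions) can be sketched as a fold-change filter. This is only an illustrative sketch: the function name `changed_genes`, the gene names, the expression values, and the two-fold default threshold are assumptions made for the example, not values from the text.

```python
import math


def changed_genes(cond_a, cond_b, min_fold=2.0):
    """Return {gene: log2 ratio} for genes changed by more than min_fold.

    cond_a and cond_b map gene names to expression values measured
    under two experimental conditions (hypothetical inputs).
    """
    threshold = math.log2(min_fold)
    hits = {}
    for gene in cond_a.keys() & cond_b.keys():
        a, b = cond_a[gene], cond_b[gene]
        if a <= 0 or b <= 0:
            continue  # a ratio is undefined without positive signal in both
        log_ratio = math.log2(b / a)
        if abs(log_ratio) > threshold:
            hits[gene] = log_ratio
    return dict(sorted(hits.items()))


# Hypothetical expression values for three genes.
control = {"geneA": 100.0, "geneB": 50.0, "geneC": 80.0}
treated = {"geneA": 410.0, "geneB": 55.0, "geneC": 20.0}
print(changed_genes(control, treated))
```

Working on the log2 scale treats increases and decreases symmetrically, so a four-fold drop and a four-fold rise are both reported.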
Class III oncogenes – These oncogenes (e.g., myc, myb, fos, ski, p53) regulate nuclear activities, possibly cell cycling. Class IV oncogenes – These oncogenes (e.g., sis, rel, B-lym, erb-A, ets, met) are relatively unrelated; some of them produce cell growth factors (proteins). Coactivator proteins – Gene activation in higher eukaryotes requires the concerted action of transcription factors and coactivator proteins. Coactivators exist in multiprotein complexes that dock on transcription factors and modify chromatin, allowing effective transcription to take place. Code dictionary – A table of all code words that specify amino acids. Also known as genetic code dictionary. Code letter – In the standard genetic code, nucleotides A, T, G, and C in DNA and A, U, G and C in mRNA. Codon – Sequence of nucleotides specifying an amino acid. Coding RNA – The RNA which serves as template during translation. Messenger RNA is the only coding RNA.
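The code dictionary and codon entries above can be illustrated with a small sketch that builds the standard genetic code table for mRNA and reads a message three letters at a time. The names `CODE_DICTIONARY`, `translate` and `synonymous_codons` are hypothetical helpers introduced for the example; one-letter amino acid symbols are used and `*` marks a stop codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one-letter amino acids, codons ordered with the
# first base varying slowest and the third base fastest (U, C, A, G).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODE_DICTIONARY = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Read non-overlapping triplets from the first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - len(mrna) % 3, 3):
        aa = CODE_DICTIONARY[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def synonymous_codons(amino_acid):
    """List every codon in the dictionary that specifies one amino acid."""
    return sorted(c for c, aa in CODE_DICTIONARY.items() if aa == amino_acid)


print(translate("UUUCCC"))            # two code words, two amino acids
print(len(synonymous_codons("L")))    # number of leucine codons
```

The same table also shows degeneracy directly: most amino acids map to more than one codon, while methionine and tryptophan each have a single codon.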

Codon length – When only as many amino acids are coded as there are code words in end-to-end sequence (e.g., the sequence UUUCCC has two code words; they code for only two amino acids). Codon length is three code letters. Cohesive ends – The DNA ends that have protruding single-stranded complementary ends. Cohesive ends are used in construction of recombinant DNA molecules. Also known as sticky ends. Colinearity – A point-for-point correspondence between a gene and its protein product. The sequence of nucleotides in a prokaryotic gene determines the sequence of amino acids in a protein/polypeptide. A prokaryotic gene and its protein product are colinear. Not all eukaryotic genes are colinear with their proteins. This is so because many eukaryotic genes contain interspersed blocks of sequences called intervening sequences or introns, which are transcribed into RNA but are not translated into the protein product. Two pairs of sites that are equidistant in the protein product may not necessarily be so in the gene. However, codons appear in the same order in DNA as their corresponding amino acids do in the protein. Combinatorial exons – In this type of alternative RNA splicing, inclusion or exclusion of a particular exon is independent of all other exons present in the transcript. Complementation – The ability of two recessive mutations to restore wild-type phenotype (partially or completely).
Complex gene – Complex genes are so called because either there is rearrangement of DNA at different levels before the 3'←5' strand of DNA is available for transcription (as is the case of immunoglobulin genes), or pre-mRNA is cleaved through a series of steps before mRNA in finished form is available (as is the case of dimorphic genes, exemplified by the kallikrein gene), or these genes have a cryptic structure in that the ultimate active product or products are carried within the precursor protein, are released after enzymatic breakdown of the precursor, and become functional following further processing (as is the case of cryptomorphic genes, exemplified by the yeast sex pheromone gene (MF1) and mammalian pancreatic glucagon genes). Complex promoters – These promoters have an interdigitated array of constitutive and regulatory elements. Both constitutive and regulatory elements are required for correct enhancer function. Complex transcription units – Those transcription units in which a primary transcript may give rise to two or more mRNAs that encode different proteins. Complex transcription units may have two or more poly(A) sites or two or more splice sites. Compound gene – Genes in which coding sequences (exons) are separated by noncoding sequences (introns). Discovery of split genes was made independently by two groups led by Phillip A. Sharp and Richard J. Roberts in 1977. Also known as split genes. Consensus sequence – A sequence found in the majority of the sequences analyzed to find out the structure of an element. Any consensus sequence is defined by aligning all known examples so as to maximize their homology. Constitutive elements in eukaryotic gene regulation – Constitutive elements are shared by a large number of different genes and are required for a basal level of expression.
One such constitutive element is the Goldberg-Hogness or TATA box, which is located 20-30 bases upstream of the transcription initiation site and is involved in determining the precise position of initiation. Constitutive sequences upstream of the TATA box, such as the CAAT and GC boxes, play the dominant role in controlling the frequency of transcription initiation. Constitutive gene – A gene whose expression is not regulated. The product of a constitutive gene is synthesized regularly and is invariably present in the cell at roughly the same concentration regardless of whether the substrate of that enzyme is ever presented to the cell or not. Such genes thus have invariant control. Constitutive genes encode constitutive enzymes, like the enzymes of glucose metabolism and those of the monophosphate pathway. Controlling elements – The sequences that are involved in regulation of gene expression. They have been shown to regulate phenotypic expression, being capable of moving to different locations and exerting their influence on a variety of genes. Controlling region of lactose operon – Nucleotide length of different components of the lac operon of E. coli: lac i promoter, 40 nucleotides; i gene, 111 nucleotides; lac i operator, 26 nucleotides; lac z gene, 3063 nucleotides; lac y gene, 800 nucleotides; lac a gene, 800 nucleotides. The E. coli lac promoter has three components: cgs, catabolic gene activator (CAP) site; ibs, initial binding site; and op, operator. Convergent transcription – Sister chromatids, the products of eukaryotic DNA replication, are held together by the chromosomal cohesin complex after their synthesis. This allows the spindle in mitosis to recognize pairs

of replication products for segregation into opposite directions. Cohesin forms large protein rings that may bind DNA strands by encircling them. Cohesin localizes almost exclusively between genes that are transcribed in converging directions. Active transcription, not the underlying DNA sequence, positions cohesins at these sites. Cohesin localization to places of convergent transcription is conserved in fission yeast, suggesting that it is a common feature of eukaryotic chromosomes. Compare with divergent transcription. Cooperation between nuclear and chloroplast DNA – The small subunit of fraction I protein is coded by nuclear DNA, is synthesized in the cytoplasm and is then transported to the chloroplast. The large subunit of fraction I protein is synthesized in the chloroplast. The small and large subunits of fraction I protein join each other in the chloroplast. Cooperation between nuclear and mitochondrial DNA – There are many proteins in mitochondria that have dual origin, some polypeptides having cytoplasmic origin and others having mitochondrial origin. Coordinated gene expression – A three-dimensional examination of gene regulation suggests that portions from different chromosomes “communicate” with each other, and bring related genes together in the nucleus to coordinate their expression, which is required for homeostasis, growth and development in all organisms. Such coordination may be partly achieved at the level of messenger RNA stability, in which the targeted destruction of a subset of transcripts generates the potential for cross-regulating metabolic pathways. Copia gene family – This gene family is 4-5 kb long and appears to consist of genes of very similar structure located at multiple sites that are characterized by different flanking sequences. Co-repressor – In repressible gene control, the end product is known as the co-repressor. Effectors of repressible operons are known as co-repressors.
Co-suppression in plants – R.A. Jorgensen coined this term in 1990 to describe the loss of mRNAs of both the endogenous gene and the transgene. Now this term is used in plants, Caenorhabditis elegans and Drosophila to describe the reciprocal silencing of transgenes and (partially) homologous endogenous genes. Co-suppression occurs either at the transcriptional level, when the homology is within the promoter, or at the post-transcriptional level, when the homology is within the coding sequence. Co-transcriptional cleavage (CoTC) – This primary cleavage event within β-globin pre-messenger RNA, downstream of the poly(A) site, is critical for efficient transcriptional termination by RNA polymerase II. The CoTC process involves a self-cleaving activity. The autocatalytic core of the CoTC ribozyme has a functional role in efficient termination in vivo. CoTC may be a general phenomenon and functionally it may provide an entry point for exonucleases involved in mRNA maturation, turnover and, in particular, transcriptional termination. Co-translational protein translocation – Secreted and membrane proteins are translocated across or into cell membranes through a protein-conducting channel (PCC) while they are being synthesized. This translocating PCC forms connections with ribosomal RNA hairpins on two sides and ribosomal proteins at the back, leaving a frontal opening. For entry of these proteins into the ER, a leader sequence is required at the N-terminal end of the protein. Also known as co-translational transfer. CpG methylation maintenance – DNA methyltransferase 1 (Dnmt1) is the principal enzyme responsible for maintenance of CpG methylation and is essential for regulation of gene expression, silencing of parasitic DNA elements, genomic imprinting and embryogenesis. Cryptic genetic variation – The variation that is not obvious at the phenotypic level. Electrophoresis also fails to detect this type of variation.
Cryptic variation is detected at the protein level in terms of some biochemical (kinetic) properties. Cryptic variation is suggested to compensate in response to varying environmental conditions (temperature) during different stages of development (larvae, pupae and adults). Such variation was suggested to play a role in adaptation. Cyclic adenosine monophosphate (cAMP) – Essential for activation of cga protein. The cAMP-cga protein system is influenced by the level of glucose. This exerts a positive control on the lactose operon. cAMP is the key metabolite which influences the glucose effect. Also known as second messenger. Cytoplasmic gene control – A wide range of half-lives exists for different specific mRNAs in the same cells, and different half-lives exist for the same mRNA in the same cell under different circumstances. Some evidence shows that the cytoplasm can control nuclear activity. Cytosine methylation in DNA – In mammalian DNA, 2 to 7 per cent of the total cytosine is converted to 5-methylcytosine (m5C). Methylation occurs enzymatically after DNA synthesis by methyl transfer from S-adenosylmethionine (SAM or AdoMet) to position 5 of cytosine. The essential function of m5C is to modify

protein-DNA interactions. The conversion of cytosine to m5C introduces a methyl group into an exposed position in the major groove of the DNA helix, and the binding to DNA of proteins such as the lactose repressor, histones, and hormone receptors is known to be affected by changes in the major groove. Cytosine methylation in RNA – Post-transcriptional RNA modification is a characteristic feature of noncoding RNAs, and has been described for rRNAs, tRNAs and miRNAs. (Cytosine-5) RNA methylation has been detected in stable and long-lived RNA molecules. Cytosine methylation in RNA can be reproducibly and quantitatively detected by bisulfite sequencing. The function of cytosine methylation in RNA is still unclear.
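The bisulfite sequencing mentioned in the entry above rests on a simple readout: bisulfite treatment converts unmethylated cytosine so that it is read as T (U in RNA) after sequencing, whereas 5-methylcytosine is protected and still reads as C. The sketch below simulates that readout on a toy sequence; the function names, the sequence, and the methylated position are assumptions made for illustration only, and the read is assumed to come from the same strand as the reference.

```python
def bisulfite_convert(reference, methylated_positions):
    """Simulate the sequenced product: unmethylated C reads as T, m5C stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(reference)
    )


def call_methylation(reference, converted_read):
    """For each reference cytosine, report True if it still reads as C."""
    if len(reference) != len(converted_read):
        raise ValueError("reference and read must be the same length")
    return {
        i: converted_read[i] == "C"
        for i, base in enumerate(reference)
        if base == "C"
    }


# Toy example: only the cytosine at index 1 is assumed to be methylated.
reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions={1})
print(read)
print(call_methylation(reference, read))
```

Comparing many reads at the same position in this way gives the fraction of molecules methylated at each cytosine, which is what makes the method quantitative.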

D

dam system – This system distinguishes the strands of newly replicated DNA by methylating adenine. It may be involved in control of replication and marking DNA strands for repair. dcm system – Cytosine in DNA is methylated in this case. Its role is suggested to be in cellular differentiation and development. In E. coli, internal cytosine residues in the sequence CCWGG (W stands for A or T) are converted to 5-methylcytosine (m5C) by DNA cytosine methylase (Dcm). This methylation system might protect the genome from the action of restriction enzymes. Degenerate code – When there is more than one codon for a particular amino acid. Demethylation models – In the early undifferentiated state, the DNA is postulated to be fully or “uniformly” methylated, in that all sites that ever will be methylated are methylated. During development, sequence-specific proteins would inhibit methylation during DNA replication, leading to methylation patterns specific for each tissue. Once the specific demethylation events occur, the differentiated methylation pattern would be inherited clonally as a result of the maintenance methylase system. Demethylation alone cannot turn on any gene. If appropriate trans-acting factors are present to repress transcription, or if the tissue is missing specific trans-acting factors that positively regulate transcription, demethylation may have little or no effect. Deoxyribonucleic acid (DNA) – A polymer of deoxyribonucleoside-phosphates. Deoxyribonucleases (DNases) – Those enzymes that cleave double-stranded DNA or single-stranded DNA molecules. Differential RNA processing – Differential processing is known for cellular mRNAs. For example, during B-lymphocyte development and immunoglobulin synthesis, the μ heavy chain is first inserted as an integral protein in the plasma membrane. Later, a similar μ chain is found as a part of secreted immunoglobulins.
Analysis of cDNAs of the two mRNAs showed that the carboxyl termini of the two μ chains were different and the amino termini were the same. Further examination of genomic DNA revealed two poly(A) sites in the same transcription unit, and the same primary transcript seemed to cover both of them. Depending upon the choice of poly(A) site, two heavy chain mRNAs with different 3'-most sequences can be formed, one for the membrane μ chain and the other for the secreted μ chain. Divergent multigene family – Different members of the family have resulted from concerted evolution leading to the formation of divergent sets of duplicates. Examples of divergent multigene families are immunoglobulin genes, globin genes, actin genes, albumin genes, α-fetoprotein genes, serine protease genes, the interferon genes (for defense against viral infection), and chorion protein-making genes. Divergent transcription – Transcription initiation by RNAP II is thought to occur unidirectionally from most genes. Transcription start site-associated RNAs (TSSa-RNAs) non-randomly flank active promoters, with peaks of antisense and sense short RNAs at 250 nucleotides upstream and 50 nucleotides downstream of TSSs, respectively. Divergent transcription over short distances is common for active promoters. Compare with convergent transcription. Divided operons – When the structural genes of an operon are located at different positions in the chromosome, the operon is said to be divided. Each of these structural genes lies next to its operator. Examples are the tryptophan (trp) operon, arabinose (ara) operon and arginine (arg) operon. Divisive introns – Random insertions of DNA (bearing rudimentary splice signals) into previously intact genes. Such introns could arise by aberrant recombination, retrovirus or transposon integration, or retrovirus-mediated insertion of cellular mRNA sequences. Also known as type B introns.
DNA fingerprinting – DNA fingerprinting is a method of establishing the identity of an unidentified body by tracing and matching “signatures” peculiar to an individual of a species. DNA is a highly stable biochemical molecule that does not lose its characteristics over millennia. Also known as DNA profiling.

DNA footprinting – A method of investigating the sequence specificity of DNA-binding proteins in vitro. This technique can be used to study protein-DNA interactions both outside and within cells. DNA looping – Looping of DNA seems to be the most likely mechanism facilitating protein-protein interactions involved in gene regulation. This mechanism imagines that proteins bound at widely separated sites act in the same way, with the intervening DNA looping or bending to allow protein-protein interactions. According to this idea, it is the interaction between DNA-bound proteins, not the looping per se, that regulates gene expression. DNA markers – Properties desirable for ideal DNA markers include highly polymorphic nature, preferably codominant inheritance, frequent occurrence in the genome, neutral behavior to environmental conditions and management practices, easy and fast assay, high reproducibility and easy exchange of data between laboratories. DNA markers are based on the following molecular mechanisms: (a) single base substitution in the restriction sites or PCR priming sites, (b) rearrangement within the DNA intervening the two restriction sites or PCR priming sites, (c) error in replication of arrays of tandemly repeated DNAs, and (d) mutation in DNA sequence. DNA methylation analysis – The majority (about 90%) of the m5C residues in eukaryotic DNA are found in the dinucleotide sequence CpG. Fortunately, several restriction enzymes include CpG in their recognition sequence. Some of these “CpG enzymes” (and their recognition sites) are HpaII (CCGG), MspI (CCGG), HhaI (GCGC), XhoI (CTCGAG), AvaI (CPyCGPuG), SalI (GTCGAC), and SmaI (CCCGGG). Most of these enzymes do not cut the DNA if the CpG sequence is methylated. Thus, these enzymes can be used to probe for methylation.
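The idea of probing methylation with CpG-containing restriction sites can be sketched as a site search plus a check for methylated cytosines inside each site. This is a simplified illustration, not a description of any laboratory protocol: the helper names, the toy sequence and the methylated position are hypothetical, HpaII is written with its recognition sequence CCGG, and every enzyme in the table is simply assumed to be blocked by a methylated cytosine within its site.

```python
import re

# IUPAC symbols needed for the sites below (Y = pyrimidine, R = purine).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}

# Recognition sequences of some CpG-containing restriction sites.
SITES = {
    "HpaII": "CCGG",
    "HhaI": "GCGC",
    "XhoI": "CTCGAG",
    "AvaI": "CYCGRG",
    "SalI": "GTCGAC",
    "SmaI": "CCCGGG",
}


def find_sites(sequence, site):
    """Return 0-based start positions of every occurrence (overlaps allowed)."""
    pattern = "(?=(" + "".join(IUPAC[base] for base in site) + "))"
    return [match.start() for match in re.finditer(pattern, sequence)]


def blocked_sites(sequence, site, methylated_positions):
    """Sites containing a methylated cytosine, assumed here to resist cutting."""
    return [
        start
        for start in find_sites(sequence, site)
        if any(start <= pos < start + len(site) for pos in methylated_positions)
    ]


# Toy sequence; the cytosine at index 3 (within CCGG) is assumed methylated.
sequence = "AACCGGTTGCGCAACTCGAGTT"
methylated = {3}
for enzyme, site in SITES.items():
    print(enzyme, find_sites(sequence, site), blocked_sites(sequence, site, methylated))
```

In practice, enzymes differ in their sensitivity to methylation, so the blocking rule would need to be set per enzyme rather than assumed for all of them.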
DNA methylation and gene expression – DNA methylation is a mechanism that chemically alters at least two of the four bases by adding methyl groups to them. Adenine gets methylated at the N6 position (the amino group on carbon 6) and cytosine gets methylated at the 5th carbon position. Both adenine and cytosine are amino bases. Methylation of either the coding or the noncoding DNA strand was sufficient to block expression of the hemimethylated chromatin. DNA methylation is known to have an important role in gene expression. Methylation of DNA leads to B → Z transformation, which affects the interaction of regulatory proteins with DNA. There are several examples which show that methylation at specific sites in a gene has a regulatory role, particularly the methylation at the 5′-end of a gene. DNA methylation and gene regulation – Adding methyl groups to DNA is a way of regulating some genes and genomic sequences. Structural analysis reveals that the enzyme complex that mediates this process shows unexpected sequence specificity. In these regions, CG dinucleotides occur with a periodicity of 8-10 bp. The studies suggest that different patterns of CG periodicity might reveal different functional specificities within the genome. An antisense transcript not naturally occurring but induced by genetic mutation leads to gene silencing and DNA methylation. DNA methylation is an important epigenetic mark for transcriptional gene silencing (TGS) in diverse organisms. Correlation has been found between undermethylation and gene activity. Ethionine (a methionine analog) and 5-azacytidine (an analog of cytidine) are potent inhibitors of DNA methylation that cause undermethylation of cytosine in DNA and, with it, gene activity. DNA methylation in plants – Cytosine DNA methylation is important in regulating gene expression and in silencing transposons and other repetitive sequences. Recent genomic studies in A.
thaliana have revealed that many endogenous genes are methylated either within their promoters or within their transcribed regions, and that gene methylation is highly correlated with transcription levels. However, plants have different types of methylation controlled by different genetic pathways. DNA methylation in transposable element silencing – DNA methylation is an epigenetic mark associated with transposable element silencing and gene imprinting in flowering plants and mammals. Imprinting in plants evolved from targeted methylation of transposable element insertions near genic regulatory elements, followed by positive selection when the resulting expression change was advantageous. DNA methylation of imprinted genes – Genome imprinting, found in flowering plants and placental mammals, uses DNA methylation to yield gene expression that is dependent on the parent of origin. DNA tags – DNA tags provide a quick, inexpensive way to identify species. A small segment of DNA is selected from the mitochondria – the same short strand for each species – to use for the identification of animal species. The segment chosen comes from a gene called CO1. It contains only 648 bp, making for quick reading of its DNA sequence. But the small piece varies enough from creature to creature for one species to be distinguished from another.
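The DNA tags entry above depends on the same short segment differing enough between species to tell them apart. One simple way to express that, sketched here under stated assumptions, is to compare an unknown tag against aligned reference tags and pick the reference with the smallest proportion of differing sites. The function names and the 10-base sequences are invented for illustration; they are not real CO1 sequences, and real identification would use the full-length segment and curated reference data.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def identify(query, references):
    """Return the name of the reference tag closest to the query tag."""
    return min(references, key=lambda name: p_distance(query, references[name]))


# Hypothetical, very short stand-ins for aligned reference tags.
references = {
    "species_A": "ATGGCATTCC",
    "species_B": "ATGACGTTAC",
}
query = "ATGGCATTAC"

for name, tag in references.items():
    print(name, p_distance(query, tag))
print(identify(query, references))
```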

DNA topoisomerase Type I – Type I DNA topoisomerase relaxes the supercoiling left behind the transcribing RNA polymerase by making a transient break in one strand of DNA. DNA topoisomerase Type II – Type II DNA topoisomerase relaxes negative supercoiling during transcription by introducing a transient double-stranded break in DNA. DNA topoisomerases – RNA polymerase generates positive supercoiling ahead and leaves negative supercoiling behind. DNA topoisomerases are the enzymes that introduce or remove turns from the double helix by transient breakage of one or both polynucleotides. These enzymes are important during transcription. There exist two types of DNA topoisomerases – Type I and Type II. See DNA topoisomerase Type I and DNA topoisomerase Type II. DNA viruses – Those viruses that contain DNA as genetic material. DNA viruses can be further divided into the following two classes: (a) those that have their genes in a double-stranded DNA molecule (dsDNA) and (b) those that have their genes in a molecule of single-stranded DNA (ssDNA). Examples of dsDNA viruses are: smallpox, vaccinia, varicella-zoster, adenoviruses, SV40, T2, T4, lambda, herpes viruses, KSHV, cytomegalovirus (CMV), Epstein-Barr virus (EBV). Well-known examples of ssDNA viruses are φX174 and adeno-associated virus (AAV). Domains – Functional regions that form the modular architecture of proteins, which is made up of discrete structural regions. Different exons code for the different domains of a protein. Each domain is also found in other proteins. Genes coding for such proteins may have originated by exon shuffling. Dosage compensation complex (DCC) – The DCC in C. elegans is recruited to specific regions of the X chromosome and spreads out along the chromosome from these initial binding sites. Multiple discrete recognition elements on the X chromosomes recruit DCCs and initiate their chromosomal spreading. Association of DCCs with both X chromosomes leads to transcriptional repression of genes on those chromosomes.
Double-stranded RNA (dsRNA) viruses – The viruses that have double-stranded RNA (dsRNA) as genetic material. Examples of dsRNA viruses are wound tumor virus (a plant virus) and reovirus (an animal virus). Down mutations – Mutations with lost or reduced transcription of the adjacent gene are known as down mutations. Down mutations could be due to deletion of an extragenic part of the promoter. Down mutations can also be produced by a change in only a single base pair.

E

Early lambda phage genes – As soon as the phage infects, transcription is initiated with the help of two promoters, PR and PL, on the right and left side, respectively. The transcription starts on both DNA strands. One strand is transcribed on the left side and the other on the right side. The early gene cro is transcribed on the right side under the control of promoter PR and the N gene is transcribed on the left side under the control of promoter PL. Also known as the immediate early genes. Early steps in processing 5S rRNA – After transcription, the primary transcript is cleaved into two parts; one bears the genes for the minor rRNA and one or two tRNAs present on the intergenic spacer, and the other contains the major and the supplemental rRNAs, depending upon whether or not a tRNA occurs on the 3′-end. Neither RNase III nor E is involved in the initial enzymatic reaction. Editing reactions during translation – Synthesis of proteins containing errors (mistranslation) is prevented by aminoacyl-tRNA synthetases through their accurate aminoacylation of cognate tRNAs and their ability to correct occasional errors of aminoacylation by editing reactions. A principal source of mistranslation comes from mistaking glycine or serine for alanine. Effector gene – An effector gene is a gene that produces a regulatory molecule that drives the expression of another gene. Effector molecule – A molecule (a sugar, an amino acid or a nucleotide) that can bind to a regulator protein and thereby change the ability of the regulator molecule to interact with the operator. Effectors in inducible operons are called inducers. Effector plasmid – Effector plasmids carry a gene that expresses a regulatory protein, which in turn regulates the expression of another gene carried in a reporter plasmid. Encrypted genes – These genes are found as separate segments around the genome, so that, for example, all building blocks of a given mRNA molecule can be located, as modules, on separate chromosomes. In the

organelles of microbial eukaryotes and in the prokaryotes many examples of so-called encrypted genes are known. Endonucleases – Enzymes that cleave DNA from within. Engineered vaccines – Vaccines prepared by recombinant DNA technology. Enhanceosome – An assembly which instructs a recruitment program of chromatin modifiers/remodelers and general transcription factors to the promoter. This program culminates with sliding of a nucleosome blocking the core promoter to a downstream position, a prerequisite for transcriptional activation. This assembly is, for example, required for transcriptional activation of the IFN-β gene in response to virus infection. The identity of a gene expression program is achieved and maintained by the dynamic interplay between specific enhanceosomes and specific local chromatin structure. Enhancers – A common feature of enhancers is the sequence GGTGTGGAAAG. Enhancers have been detected in eukaryotic genes also. Enhancers mostly are cis-acting but trans-acting ones are also known. Enhancers are effective whether lying upstream or downstream from the promoter. They are active whether they lie in the same or the opposite polarity as the mature gene. When attached to foreign DNA, they are equally effective regardless of the organism from which the gene is derived. The first enhancer found has a 72-bp tandemly repeated sequence located 100 nucleotides upstream of the ancillary site in the DNA of SV40. Enhansons – The enhancer elements cooperate with one another or with duplicates of themselves to enhance transcription. These elements are bipartite, being composed of subunits called enhansons, which can be duplicated or interchanged to create new enhancer elements. Enhansons differ from the enhancer elements because they are very sensitive to changes in spacing. Enzyme-linked immunosorbent assay (ELISA) – This assay works by using antibodies immobilized on a microtiter plate to capture proteins of interest from samples added to the well.
Using a detection antibody conjugated to an enzyme or fluorophore, the quantity of bound protein can be accurately measured by fluorometric or colorimetric detection. The detection process is very similar to that of a Western blot, but by avoiding the gel steps more accurate quantification can be achieved.
Epigene – Geneticists study the gene; however, for epigeneticists, there is no obvious ‘epigene’. So epigene is a hypothetical term.
Epigenetic changes – The changes that influence the phenotype without altering the genotype. Such changes occur through DNA modification. The numerous types of epigenetic modification of DNA include methylation of cytosine at position 5, a process that is known to inactivate eukaryotic genes.
Epigenetic code – In the case of DNA methylation, the enzymes involved actually alter DNA structure, thereby imposing an epigenetic code over and above the genetic code and thus specifically altering the information content of the DNA.
Epigenetic gate keepers – Genes such as p16, SFRPs, GATA-4, GATA-5, and APC, whose epigenetic silencing in stem/precursor cells of adult cell-renewal systems may abnormally lock these cells into stem-like states that foster abnormal clonal expansion.
Epigenetic gene silencing maintenance – The establishment and maintenance of epigenetic gene silencing is fundamental to cell determination and function. The essential epigenetic systems involved in heritable repression of gene activity are the Polycomb group (PcG) proteins and the DNA methylation systems.
Epigenetic landscape – A mechanism proposed by C.H. Waddington in which a cell proceeds through development by traversing a landscape of decision points. The cell begins in a totipotent state and becomes more and more restricted by determinative events. The general properties of this process can be applied to both cytodifferentiation and pattern formation.
Extrinsic forces such as hormonal stimuli or induction can influence the decision points, and the decisions will be dependent upon the genetic response of the cell to that stimulus. Waddington referred to this process as canalization.
Epigenetic regulation – It involves reversible changes in DNA methylation and/or histone modification patterns. Short interfering RNAs (siRNAs) can direct DNA methylation and heterochromatic histone modifications, causing sequence-specific transcriptional gene silencing. For example, H2B deubiquitination by SUP32/UBP26 is required for heterochromatic histone H3 methylation and DNA methylation.
Epigenetic silencing – Embryonic cells silence transcription by retroviruses, but how? They recognize viral DNA. A TRIM28 corepressor complex binds to the retroviral primer binding site. Epigenetic silencing of retrovirus transcription is accomplished by “writing” a dimethyl mark on lysine 9 of histone H3 that is read by the heterochromatin protein HP1γ.


Epigenetic trait – A trait that is transmitted independently of the DNA sequence itself. Epigenetic variation has the potential to contribute to natural variation. Epigenome analysis aids in explaining how natural epigenetic variation causes phenotypic differences in plants. Stable inheritance of complex traits such as flowering time has been observed in epigenetic recombinant inbred line (epi-RIL) populations, providing important evidence that epigenetic variation can contribute to complex traits.
Epigenetics – Gene-regulating activity that does not change the DNA code and can persist through one or more generations. Epigenetics involves the process by which a genotype gives rise to a new phenotype without a change in the genetic code. Epigenetics aims to describe the inheritance of information on the basis of gene expression, in contrast to genetics, which aims to describe the inheritance of information on the basis of DNA.
Epigenome – The complete description of the potentially heritable epigenetic changes across the genome. The composition of the epigenome within a given cell is a function of genetic determinants, lineage, and environment.
Epimutations – Changes in gene activity due to DNA methylation, or any such changes, should be termed epimutations to distinguish them from classical mutations. Thus epimutations are heritable changes in gene activity due to DNA modification, in contrast to gene mutations, which are heritable changes due to changes in DNA sequence.
Eukaryotic gene – Ten regions: recognition region (~50 kb), transcription initiation site, 5′ untranslated region, translation initiation, alternating exon/intron, splice donor and acceptor sites, translation stop site, 3′ untranslated region, polyadenylation signal, and transcription stop site.
Eukaryotic gene activation – The interactions involving four classes of elements – sensor site, integrator gene, receptor site, and producer gene – regulate gene expression in eukaryotes.
Transcription of a producer gene could occur only if at least one of its receptor sites was activated by forming a sequence-specific complex with activator RNA. This RNA (or protein) would be synthesized by an integrator gene in response to signals from a sensor site that is sensitive to external or internal developmental signals.
Eukaryotic ribosomes – Ribosomes of eukaryotes and prokaryotes differ in size and other details. The cytoplasmic ribosomes of eukaryotes are 80S in size, contain 60 per cent rRNA and 40 per cent protein, and dissociate into a smaller 40S subunit and a larger 60S subunit. The 60S subunit has 5S, 5.8S and 28S rRNA and the 40S subunit has 18S rRNA.
Eukaryotic RNA polymerases – Eukaryotic RNA polymerases are characterized by the type of RNA they synthesize. Five types of eukaryotic RNA polymerases are well known. See RNA polymerase I, RNA polymerase II, RNA polymerase III, RNA polymerase IV, and RNA polymerase V.
Eukaryotic translation elongation – Elongation factor eEF3 is an ATPase that, in addition to the two canonical factors eEF1A and eEF2, serves an essential function in the translation cycle of fungi. eEF3 is required for the binding of the aminoacyl-tRNA–eEF1A–GTP ternary complex to the ribosomal A-site and has been suggested to facilitate the clearance of deacyl-tRNA from the E-site.
Eukaryotic translation initiation – In eukaryotes, at least 11 eukaryotic initiation factors exist (eIF1, eIF2, eIF3, eIF4A, eIF4B, eIF4C, eIF4D, eIF4E, eIF4F, eIF5, eIF6). There are two cap binding proteins (CBP1 and CBP2). CBP1 binds to the 5′ cap (m7GpppX) of mRNA and facilitates formation of a complex between the mRNA and the 40S ribosomal subunit. CBP2 performs some as yet unknown function. The 5′ cap is required for efficient translation. The eIF1 assists mRNA binding; eIF2 binds met-tRNA. It has 3 subunits: α (binds ATP), β (may be a recycling factor) and γ (binds met-tRNAfMet). The eIF3 binds mRNA.
The initiation factor eIF4A assists mRNA binding and also binds ATP, eIF4B assists mRNA binding and unwinding, and eIF4C binds the 60S subunit. The function of eIF4D is unknown. Initiation factor eIF4F mediates the function of the cap. Initiation factor eIF4E binds the cap. This step is thought to be regulated by phosphorylation. Initiation factor eIF5 releases eIF2 and eIF3, while eIF6 prevents 40S-60S joining. The eukaryotic initiation factor 4G (eIF4G) is the core of a multicomponent switch controlling gene expression at the level of translation initiation. It interacts with the small ribosomal subunit interacting protein, eIF3, and the eIF4E/cap-mRNA complex in order to load the ribosome onto mRNA during cap-dependent translation. Initiation factor eIF5A seems to have a hand in every step. Mammalian eIF6 is required for efficient initiation of translation in vivo.
Eukaryotic translation termination – In the eukaryotic system, only one release factor, eRF, has been found. GTP seems to be necessary for the activity of eRF. The released polypeptide enters the cytoplasm.


Exon – Sequences present in split (compound) genes whose complementary sequences are represented in messenger RNA.
Exon shuffling – Mixing up of exons during evolution. It can lead to significant genome rearrangement due to meiotic division. Exon shuffling is believed to be responsible for the evolution of split genes in the nuclear genome. During the splicing stage, different coding sequences (segments) could splice in a large number of different ways, thus generating a large number of different genes.
Exonucleases – Enzymes that cleave DNA at the ends.
Exosome – (1) Exosomes are 50-90 nm vesicles secreted by a wide range of mammalian cell types. They constitute a mechanism for selective removal of many plasma membrane proteins. (2) The exosome is also a major eukaryotic nuclease, located in both the nucleus and the cytoplasm, that contributes to the processing, quality control and/or turnover of a large number of cellular RNAs.
Expansion of genetic alphabet – A new Watson-Crick base pair with a hydrogen-bonding pattern different from that in the A∙T and G∙C base pairs is incorporated into duplex DNA and RNA by DNA and RNA polymerases, and expands the genetic alphabet from 4 to 6 letters. The genetic code can thus be extended artificially.
Expressed sequence tags (ESTs) – These are PCR-based markers using a pair of primers. An EST is a DNA sequence from a cDNA clone that corresponds to an mRNA or a part thereof. It has been shown that ESTs in most genomes are 150-400 bp long and are useful in similarity searches and genome mapping. EST databases have proven to be a tremendous resource for finding genes.
Expression component – One of the two components of riboswitches. This component regulates gene expression and, unlike the other component (the aptamer), it can vary greatly in order to affect the different processes of transcription, translation, and RNA processing. Also see aptamer.
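As a rough illustration of the entry on expansion of the genetic alphabet above, the following Python sketch counts the triplet codons available with four and with six letters. The letters X and Y are placeholder names for a hypothetical third base pair, and the codon length is assumed to stay at three; neither detail comes from the text.

```python
from itertools import product


def count_codons(alphabet, codon_length=3):
    """Count the distinct codons of the given length over an alphabet."""
    return sum(1 for _ in product(alphabet, repeat=codon_length))


natural = "ACGU"
# X and Y are placeholder letters for a hypothetical extra base pair.
expanded = natural + "XY"

print(count_codons(natural))   # 64
print(count_codons(expanded))  # 216
```

With four letters there are 4 × 4 × 4 = 64 triplets; with six letters there are 6 × 6 × 6 = 216.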
Expression of immunoglobulin genes – Cells that synthesize antibody molecules, called immunoglobulins, undergo DNA rearrangements during cell differentiation, suggesting that although a genome usually remains constant in different cells, it does undergo a change with regard to immunoglobulin genes in mammals.
Expression system – A system specifically designed for the production of a gene product of choice. This is normally a protein, although it may also be RNA, such as tRNA or a ribozyme. An expression system consists of a gene, normally encoded by DNA, and the molecular machinery required to transcribe the DNA into mRNA and translate the mRNA into protein using the reagents provided. In the broadest sense this includes every living cell, but the term is more normally used to refer to expression as a laboratory tool.
Extein – The remaining portions of the protein after inteins (protein introns) have been spliced out.
Extended anticodon hypothesis – The structure of the anticodon loop and the proximal anticodon stem are related to the sequence of the anticodon. In other words, the anticodon is extended into the nearby structure and consists of (a) two nucleosides at the 5′-end of the anticodon loop, (b) three nucleosides of the anticodon, (c) two nucleosides at the 3′-end of the anticodon loop, and (d) five nucleoside pairs in the anticodon stem. Extended anticodons are involved in translation.
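The four parts listed in the extended anticodon entry above add up to a 17-nucleotide stem-loop (5 + 2 + 3 + 2 + 5). The sketch below, in Python, simply slices such a stem-loop into those parts and checks that the stem arms can pair. The function names and the example sequence are hypothetical, chosen only for illustration, and only Watson-Crick pairs are counted.

```python
# Watson-Crick partners only; wobble pairs are ignored in this sketch.
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def split_extended_anticodon(stem_loop):
    """Split a 17-nt anticodon stem-loop (written 5' to 3') into its parts."""
    if len(stem_loop) != 17:
        raise ValueError("expected 5 + 2 + 3 + 2 + 5 = 17 nucleotides")
    return {
        "stem_5prime": stem_loop[0:5],
        "loop_5prime": stem_loop[5:7],
        "anticodon": stem_loop[7:10],
        "loop_3prime": stem_loop[10:12],
        "stem_3prime": stem_loop[12:17],
    }


def stem_is_paired(parts):
    """Check that the two stem arms form five antiparallel base pairs."""
    left = parts["stem_5prime"]
    right = parts["stem_3prime"][::-1]  # read the 3' arm backwards
    return all(PAIRS[a] == b for a, b in zip(left, right))


# Hypothetical sequence, not taken from any real tRNA.
parts = split_extended_anticodon("GCAGACUGAAGAUCUGC")
print(parts["anticodon"])     # GAA
print(stem_is_paired(parts))  # True
```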

F
Feedback inhibition – A cellular control mechanism in which an enzyme that catalyzes the production of a particular substance in the cell is inhibited when that substance has accumulated to a certain level, thereby balancing the amount provided with the amount needed. This type of control is generally found in metabolic pathways. Feedback inhibition in eukaryotic cells involves time delays of gene expression resulting from transcription, transcript splicing and processing, and protein synthesis. In principle, such a phenomenon could result in the oscillation of gene expression. Feedback inhibition, also known as end-product inhibition, is a type of control that operates after protein synthesis has taken place.
Fidelity of transcription – During transcription elongation, the hydrolytic reaction stimulated by misincorporated nucleotides proofreads most of the misincorporation events and thus serves as an intrinsic mechanism of transcriptional fidelity.
5S RNA gene family – These genes are present independently of the rDNA, which is localized in the nucleolar organizer region (NOR) in eukaryotes. In prokaryotes and yeast, 5S rRNA genes are present in close vicinity of the rDNA. The 5S rRNA genes are transcribed by RNA polymerase III.
Fold-back elements – Some inverted repeats in DNA may be immediately adjacent or separated by up to several thousand base pairs. During renaturation, as the reaction is intramolecular, the structures formed are called


snap-back (SB) or fold-back (FB) structures. There are 2,000-4,000 pairs of inverted repeats or potential fold-back structures in D. melanogaster, which comprise about 3 per cent of the total genome.
Forward epimutations – De novo methylation leads to forward epimutations.
Functional alleles – Alleles determined on the basis of the complementation test.
Functional domains in eukaryotic chromosome – Eukaryotic chromatin is separated into functional domains differentiated by post-translational histone modifications, histone variants and DNA methylation. Methylation is associated with repression of transcriptional initiation in plants and animals, and is frequently found in transposable elements.
Functional state of chromatin – Eukaryotic genomes are organized into active (euchromatic) and inactive (heterochromatic) domains. Post-translational modifications of histones are the key to defining these functional states, particularly in promoter regions. Asymmetric di-methylation of histone H3 arginine 2 (H3R2me2a) is inversely correlated with di- and tri-methylation of H3 lysine 4 (H3K4me2, H3K4me3) on human promoters.
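The fold-back entry above rests on inverted repeats: a segment followed, further along the same strand, by its reverse complement, so that the strand can pair with itself. A minimal Python sketch of how such pairs might be located by brute force is given below. The function names and the example sequence are hypothetical, and real repeat finders tolerate mismatches and scale far better than this.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_inverted_repeats(seq, arm_length):
    """Return (left_start, right_start) for every pair of non-overlapping
    arms in which the right arm is the reverse complement of the left arm."""
    hits = []
    last_start = len(seq) - arm_length
    for left in range(last_start + 1):
        target = reverse_complement(seq[left:left + arm_length])
        for right in range(left + arm_length, last_start + 1):
            if seq[right:right + arm_length] == target:
                hits.append((left, right))
    return hits


# Hypothetical sequence: arm ACGTTG, a 4-bp spacer, then CAACGT.
example = "TTACGTTGAAAACAACGTTT"
print(find_inverted_repeats(example, 6))  # [(2, 12)]
```

The spacer between the two arms would form the loop of the fold-back structure, and the arms would form its stem.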

G
Galactose permease – Also known as protein “y” (galactoside permease, the product of the lac y gene), which facilitates entry of lactose into the cell.
Gel-based assays – The presence of an SNP can be detected by RFLP or AFLP conducted on PCR products, whenever such an SNP generates or destroys a specific restriction site for an enzyme. SSCP can also be used for the detection of SNPs.
Gene – A DNA sequence coding for a specific polypeptide; the DNA sequence of a split gene also includes introns (noncoding segments), exons (coding segments that make proteins) and other DNA pieces. Any definition of gene must also include promoters, enhancers, regulator genes, operators, and also segments that code for rRNA, tRNA, and snRNPs. The gene is a sequence in a nucleic acid (DNA, except in RNA viruses) that provides a code for a protein, polypeptide, or RNA of direct value to cellular metabolic processes, plus those parts, adjacent or internal, important to its being located, identified, transcribed, often translated, and processed.
Gene annotation – Gene annotation provides functional and other information, for example the location of each gene within a particular chromosome.
Gene discoveries – The phase in the understanding of gene function when many new types of genes, viz., repeated genes, moveable genes (transposable genetic elements), pseudogenes and retrogenes, and related phenomena such as multiple polyadenylation sites and protein trans-splicing, were discovered. These discoveries made the neoclassical concepts of gene structure and gene function obsolete.
Gene expression – The process by which information from a gene is used in the synthesis of a functional gene product. These products are often proteins, but in non-protein coding genes such as ribosomal RNA (rRNA), transfer RNA (tRNA) or small nuclear RNA (snRNA) genes, the product is a functional RNA.
Gene expression networks – Genes have sometimes been regarded as nodes in a network, with inputs being proteins such as transcription factors, and outputs being the level of gene expression. The node itself performs a function, and the operation of these functions has been interpreted as performing a kind of information processing within cells that determines cellular behavior.
Gene function – What does a gene do? The answer to this question came from understanding the relationship between a gene and its product. The first crucial step was to understand the genetic control of biochemical reactions. The relationship between genotype and phenotype was then understood. A better picture of gene function emerged with the fine structure analysis of the gene by S. Benzer in 1955. Discovery of new types of genes further changed our thinking on gene structure and gene function.
Gene fusion – Mutations that cause portions of two genes to fuse together and form a hybrid gene are frequent in blood-related and lung cancers.
Gene imprinting – The differential expression of maternal and paternal alleles evolved independently in mammals and in flowering plants. A unique feature of flowering plants is a double-fertilization event in which the sperm fertilizes not only the egg, which forms the embryo, but also the central cell, which develops into the endosperm (an embryo-supporting tissue). The distinctive mechanisms of gene imprinting in the endosperm, which involve DNA methylation and histone methylation, begin in the central cell and sperm prior to fertilization.


Gene organization – Refers to the way genes are arranged in a cell. It may be considered from different points of view. One may consider whether individual genes are independently controlled or a group of them is under the control of one and the same operator, whether the message is contiguous or split, whether genes are overlapping or non-overlapping, functional or non-functional, repeated or not, and whether the genes occupy a fixed position or shift from one place to another. The present gene organization in different organisms is the net result of genome evolution.
Gene regulation at the level of translation – Differential translation of finished mRNA occurs in the cell cytoplasm. For this process to occur, the dormant mRNAs must have characteristics, such as structural determinants and subcellular location, that permit their selection and timely translational activation.
Gene regulation at translational level – For the lac operon of E. coli, there is a 10:5:2 ratio of lac z, lac y, and lac a gene products. The relative amounts of the products translated from the polycistronic mRNA depend upon the primary structure of the mRNA. At the end of each cistron on the mRNA there is at least one chain termination codon followed by an intercistronic segment prior to the next translation initiation codon. The intercistronic nucleotide sequences govern whether or not a ribosome “out of gear” will fall off the message before initiating the next polypeptide chain. Presumably, in the lac operon, half of the ribosomes fall off between lac z and lac y, and more than half (60%) of the remainder fall off between lac y and lac a.
Gene regulation by hormonal action – In higher plants and animals, various hormones control gene expression by acting as signals. Hormones such as insulin (a peptide) and epinephrine do not enter the cell. Their effect appears to be mediated by receptor proteins located in target cell membranes and by the intracellular levels of cyclic AMP, called the second messenger.
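The 10:5:2 ratio quoted for the lac operon above follows directly from the two fall-off fractions given in that entry. A short Python sketch of the arithmetic is shown below; the function name is hypothetical, and the starting value of 10 is just the arbitrary unit used in the ratio.

```python
from fractions import Fraction


def relative_translation(start_units, fall_off_fractions):
    """Track how many ribosomes remain after each intercistronic gap,
    given the fraction that falls off the message at each gap."""
    levels = [Fraction(start_units)]
    for fraction in fall_off_fractions:
        levels.append(levels[-1] * (1 - fraction))
    return levels


# Half fall off between lac z and lac y; 60% of the rest between lac y and lac a.
levels = relative_translation(10, [Fraction(1, 2), Fraction(3, 5)])
print(" : ".join(str(level) for level in levels))  # 10 : 5 : 2
```

Exact fractions are used instead of floating-point numbers so that the result matches the whole-number ratio without rounding.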
Gene regulation by nonhistone proteins – The acidic proteins are considered to be activators of gene expression as they can take out histones and thereby open some regions of DNA for transcription. Certain properties of the nonhistone proteins that assemble with DNA are: (a) a high rate of synthesis and metabolism, (b) high concentrations on the chromatin of metabolically active tissues, (c) localization on those parts of chromatin that demonstrate especially high levels of RNA synthesis, and (d) phosphorylation during the gene activation period. The acidic proteins promote transcription of free DNA and they also determine transcriptional specificity.
Gene regulation – Mechanisms that turn on the genes when their product is required by the cell and turn off the genes when their product is not required by the cell.
Gene silencing – A method that turns down or switches off the expression of a gene by using mechanisms other than genetic modification. Broadly, silencing can be brought about at two levels, viz., the transcriptional level and the post-transcriptional level.
Gene squelching – Squelching occurs when an overexpressed transcription factor not only activates all its target genes but also sequesters, by protein-protein interactions, components of the basic transcription apparatus needed by genes that it does not normally regulate, causing global downregulation.
Gene targeting – A technique allowing scientists to block or alter a gene's function. It works by infusing strands of lab-made DNA into stem cells, where they latch on to their target genes. These cells are injected into a mouse embryo, spreading the new gene throughout. Embryonic stem cells are collected from fertilized eggs a few days after they have started to divide.
GeneCalling – An mRNA profiling technology finding increased use in the field of genomics, invented by Dr. Jonathan M. Rothberg and developed by CuraGen Corporation.
This genomics technique rapidly identifies candidate genes for use in drug discovery and development.
Gene-enzyme relationship – The first clue to the nature of the primary gene function came from studies on humans. Garrod in 1909 noted that several hereditary human defects are produced by recessive mutations. Such disorders were attributed to an absence of, or a defect in, an enzyme. Thus, a relationship between genes and enzymes, which are responsible for conducting various biochemical reactions, was suggested.
Gene-specific translational silencing – Gene-specific translational (GST) silencing has been found to be a novel function of the fused glutamyl- and prolyl-tRNA synthetase (GluProRS). GluProRS is released from a multisynthetase translation complex in response to γ-interferon and forms a four-protein GAIT complex that silences translation of ceruloplasmin (Cp), a protein linked to the inflammatory response.
Genetic code – The processes involved in translating or decoding the information contained in the primary structure of DNA.


Genetic code specificity – A particular base modification in a tRNA is important for reading its specific codon for an amino acid and for the specificity of its aminoacylation.
Genetic material – The substance of which the genes are made. DNA is the genetic material in all eukaryotes and prokaryotes. Some viruses have RNA as the genetic material.
Genetic RNA – RNA that serves as genetic material. RNA viruses may have single-stranded RNA (ssRNA) or double-stranded RNA (dsRNA) as genetic material.
Genome – Sum total of all the genes of an organism. Further classified according to the location of the genome: nuclear genome, chloroplast genome, mitochondrial genome.
Genome constancy – The gene content of all the cells of an organism is the same. The phenomena that question the concept of genome constancy in development are: random and programmed changes, gene amplification, chromatin diminution and other instances of DNA loss, and programmed recombination.
Genome stability – One of the important functions of RNA-directed DNA methylation (RdDM) is to target transposons to protect the genome from their deleterious effects. Transposable elements may carry some features that make them more distinguishable from other sequences and hence more susceptible to RdDM. Several lines of evidence have pointed out that transposable elements could possess long untranslated regions or specific secondary structures that make them more recognizable to the methylation machinery. Transposons are methylated at the ends as well as in the middle of their sequence.
Genomics – Genomics studies DNA isolated from any part of the plant or animal cell, since the DNA in different cells does not differ in time and space. Two major branches of genomics are structural genomics and functional genomics.
Genotype-phenotype relationship – Beadle and Ephrussi in 1936 and 1937 conducted eye disc transplantation experiments in Drosophila melanogaster, which indicated a strong link between phenotype and genotype.
Germline theory of antibody diversity – The genome contains thousands of related but unique DNA sequences, each of which specifies one of the V regions synthesized by an organism. Thus a large percentage of the mammalian genome is devoted to carrying variable sequence families, and each person will inherit and transmit all V regions through his or her germline. Each lymphocyte will selectively express only one of these regions (genes) for the V portion of a light chain and only one for the V portion of a heavy chain. Because most lymphocytes will happen to select different combinations of sequences to express, the organism will come to express its vast diversity of antibody-producing cells.
Global transcription machinery engineering (gTME) – An approach for reprogramming gene transcription to elicit cellular phenotypes important in technological applications.
Globin gene family – In adult humans, only single genes specifying the alpha (α) and beta (β) polypeptide chains of hemoglobin are expressed. During development this is not sufficient, and a whole family of related α-globin genes and another family of β-globin genes function. These genes are turned off and on in a coordinated and sequential manner during embryo development until the fetus is born, depending upon the intracellular milieu.
Goldberg-Hogness box – Sequences discovered in 1978 by Goldberg and Hogness in promoters of eukaryotic genes. These sequences are similar to the Pribnow box. The sequence of the Hogness box is T82 A97 T93 A35 A43/T27 A83 A50/T37, where each number is the percentage frequency of that base at its position. The Hogness box is surrounded by G·C-rich sequence. Also called the TATA box.
G-protein–coupled receptors (GPCRs) – An organism’s body is a fine-tuned system of interactions between billions of cells. Each cell has tiny receptors that enable it to sense its environment, so it can adapt to new situations. About a thousand genes code for such receptors, for example, for light, flavor, odor, adrenalin (also called epinephrine), histamine, dopamine and serotonin.
About half of all medications, including beta-blockers, anti-histamines and several psychiatric medications, achieve their effect through GPCRs.
Guide RNA – This RNA is involved in RNA editing. RNA interference and related RNA silencing phenomena use short antisense guide RNA molecules to repress the expression of target genes.

H
Half genes – Analysis of the genome sequence of the small hyperthermophilic archaeal parasite Nanoarchaeum equitans has not revealed genes encoding the glutamate, histidine, tryptophan and initiator methionine transfer RNA species. A computational approach was used to search for widely separated genes encoding tRNA


halves that, on the basis of structural prediction, could form intact tRNA molecules. The search revealed nine genes that encode tRNA halves; together they account for the missing tRNA genes.
Heterochromatic siRNAs (hcRNAs) – These are 24-nt long small RNAs that are implicated in transcriptional gene silencing, which results in silencing of endogenous genes or transgenes through the inactivation of promoter sequences.
Heterochromatin – Heterochromatin, representing the silenced state of transcription, consists largely of transposon-enriched and highly repetitive sequences.
Hidden genes – These are short open reading frames, i.e., sequences initiated by the alternative translation initiation codons ACG, AUA, and GUG in the 5′-terminal extra-cistronic region.
Hidden transcription – Cryptic unstable transcripts (CUTs) have been recently described as a principal class of RNAPII transcripts in budding yeast. These transcripts are targeted for degradation immediately after synthesis by the action of the Nrd1-exosome-TRAMP complexes.
Highly repetitive DNA sequences – Highly repetitive DNA sequences (10⁶ or more copies) have a simple sequence organization consisting of a basic unit of about 10 nucleotide pairs. As these basic units are repeated in tandem, these sequences are also called clustered repeats. They are generally not transcribed and apparently arise in the chromosome from sudden replication of pre-existing sequences. Divergence of family members proceeds by nucleotide substitution in individual members, and the extent of divergence more or less reflects the age of a family. Highly repetitive DNAs have been demonstrated to play an important role in the process of speciation.
Highly unstable RNAs – The bulk of eukaryotic genomes is transcribed. A class of short, polyadenylated and highly unstable RNAs has been reported. These promoter upstream transcripts (PROMPTs) are produced ~0.5 to 2.5 kilobases upstream of active transcription start sites.
PROMPT transcription occurs in both sense and antisense directions with respect to the downstream gene. It requires the presence of the gene promoter and is positively correlated with gene activity. PROMPT transcription is a common characteristic of RNA polymerase II transcribed genes, with a possible regulatory potential.
Histocompatibility – Tissues can be transplanted between genetically identical animals without concern for immunologic rejection, whereas transplants between genetically non-identical animals are usually rejected with time. Also known as tissue compatibility.
Histocompatibility antigens – Those antigens that determine the acceptance or rejection of a tissue graft. These antigens are produced by histocompatibility genes.
Histocompatibility genes – The genes that code for histocompatibility antigens. These genes are involved in the production of cell surface antigens whose recognition determines the rejection or tolerance of tissue transplants.
Histone acetylation/deacetylation – Various acetyl transferases having histone acetyl transferase activity have been discovered. These enzymes control gene expression and cell cycle regulation. Deacetylases help in transcriptional silencing.
Histone code – This term has been loosely used to describe the role of modifications in enabling DNA functions. The term, although useful in defining the need for a specific set of modifications for a given task, is unlikely to truly reflect the presence of a predictable “code” in the strictest sense of the word.
Histone gene family – There are five types of histones: H1, H2a, H2b, H3 and H4. Genes encoding these histones are present in multiple copies.
Histone modification – Histone modifications are implicated in influencing gene expression. Typical patterns of histone methylation exhibited at promoters, insulators, enhancers, and transcribed regions have been identified.
The monomethylations of H3K27, H3K9, H4K20, H3K79, and H2BK5 are all linked to gene activation, whereas trimethylations of H3K27, H3K9, and H3K79 are linked to repression. H2A.Z associates with functional regulatory elements, and CCCTC-binding factor (CTCF) marks boundaries of histone methylation domains. Studies of histone modifications and of the proteins that bind to nucleosomes have revealed a mechanism that might control DNA packaging. Phosphorylation of histones is another mechanism for neutralization of basic proteins. This reaction occurs after the completion of histone synthesis. It is a reversible reaction. Phosphorylation of lysine- and arginine-rich histones precedes an increase in RNA synthesis.
Histone shuttling – A mechanism proposed to guide specific proteins to sites of repair. In addition, histone shuttling driven by the poly ADP-ribosylation system seems to be involved in nucleosomal unfolding of chromatin in DNA excision repair.


Histone tail loss – Clipping of the histone H3 tail affects the expression of genes with which the histone is associated. At the cellular level, H3-tail clipping could simply clear all repressive marks from chromatin, thereby allowing the binding of transcription-activator complexes to the affected DNA.
Histone turnover – Chromatin plays a role in processes governed by different time scales. Nucleosomes at promoters are replaced more rapidly than those at coding regions, and the replacement rate over coding regions correlates with polymerase activity. In addition, rapid histone turnover is found at known chromatin boundary elements. Rapid histone turnover serves to functionally separate chromatin domains and prevent the spread of histone states.
Histone ubiquitination – Ubiquitination is an enzymatic, protein post-translational modification (PTM) process in which the carboxylic acid of the terminal glycine from the di-glycine motif in the activated ubiquitin forms an amide bond to the epsilon amine of the lysine in the modified protein.
Histones as repressors – Histones act as repressors of gene activity. There are two hypotheses. According to the first hypothesis, histones, which are assembled with DNA, prevent RNA polymerase motion during transcription. It is possible that this obstacle is simply mechanical in character and is the result of the position of histones in the minor groove of the DNA double helix, along which RNA polymerase is moving. The second hypothesis suggests that the structure of DNA is altered as a result of histone-DNA association and that supercoiling of DNA occurs. RNA polymerase is not able to transcribe those regions that are supercoiled.
Histone modifications in epigenetic control – Chromosomal regions can adopt stable and heritable alternative states, resulting in bistable gene expression without changes to the DNA sequence. Such epigenetic control is often associated with alternative covalent modifications of histones.
The stability and heritability of the states are thought to involve positive feedback where modified nucleosomes recruit enzymes that similarly modify nearby nucleosomes.
HIV reverse transcriptase – The reverse transcriptase of human immunodeficiency virus (HIV) catalyzes a series of reactions to convert the single-stranded RNA genome of HIV into double-stranded DNA for host-cell integration. This task requires the reverse transcriptase to discriminate a variety of nucleic acid substrates such that the active sites of the enzyme are correctly positioned to support one of three catalytic functions: RNA-directed DNA synthesis, DNA-directed DNA synthesis and DNA-directed RNA hydrolysis.
Homology-dependent gene silencing (HDGS) – Genes can be silenced due to sequence homology either between the transgene and the endogenous gene or among the transgenes themselves. It can be due to cis-inactivation, trans-inactivation or co-suppression.
Homology-dependent post-transcriptional gene silencing (HDPTGS) – The integration of extra copies of an endogenous gene or multiple copies of a transgene into a plant genome often results in epigenetic silencing of both endogenous and exogenous genes. The silencing can occur at two levels: transcriptionally (homology-dependent transcriptional gene silencing, which may involve cis or trans DNA pairing) or post-transcriptionally (homology-dependent post-transcriptional gene silencing or cosuppression, which may involve antisense RNA (degradosomes)); in neither case is there any change in DNA sequence. In many cases, epigenetic silencing correlates with hypermethylation of the repetitive sequences.
Hormones and transcription – Study of sex differentiation and the role of hormones shows that realization of genotypic capability depends upon and can be modified by hormones, while the production of hormones is itself guided by the genetic constitution of the individual.
Hormones in gene regulatory sequence – The hormone (H) attaches to a specific cytoplasmic receptor (R) which is then further processed into a form (R′) enabling the H-R′ complex to cross the nuclear membrane. In the nucleus, the H-R′ complex is bound to a nonhistone protein acceptor to which phosphate groups are then added by specific phosphorylating kinase enzymes. The negatively charged phosphorylated nonhistone protein can combine with the positively charged basic histones to cause removal of histones from DNA, and thereby allow transcription to proceed. Some phosphorylated nonhistones may also act as sigma factor components of RNA polymerase, enabling the enzyme to recognize promoters and transcribe specific DNA sections.
Host restriction and modification – The phenomenon of restriction and modification was originally observed during the lytic phage infection of bacteria. When bacteriophage λ carrying the K modification (λ.K) infects E. coli K, phage infection and replication are normal, leading to lysis of the cell and release of progeny phage. By contrast, when bacteriophage λ.B (carrying the B modification) or λ.O (lacking modification) infects E. coli K, the


host's restriction and modification system recognizes phage DNA as foreign and cleaves it before it has a chance to replicate. This modification protects the bacterial DNA from its restriction enzymes.
Host specificity sites – These are the sites in host DNA where recognition and methylation take place. The cleavage may take place elsewhere. Thus a restriction enzyme recognizes two sites on DNA: the recognition site (host specificity site) and the cleavage site. The sK and sB are the recognition sites of the EcoK and EcoB systems, respectively. These recognition sites are very similar structurally: sB 5' T-G-A*-N-N-N-N-N-N-N-N-T-G-C-T 3'/3' A-C-T-N-N-N-N-N-N-N-N-A*-C-G-A 5'; sK 5' A-A*-C-N-N-N-N-N-N-G-T-G-C 3'/3' T-T-G-N-N-N-N-N-N-C-A*-C-G 5'. Adenines that may be methylated are marked with asterisks (*).
Housekeeping genes – Those genes that code for proteins involved in basic metabolic processes common to all cell types and as such probably constitute the major class of mammalian genes. Many lack a TATA box and, presumably as a consequence of this, have multiple initiation sites. They all have CG-rich regions with multiple binding sites. Further, promoters of housekeeping genes are characterized by their ability to support bidirectional transcription. The housekeeping gene promoter has a simple array of control elements.
Hpa tiny fragment (HTF) islands – HTF islands are non-methylated CpG-rich regions in the genome. Eighty per cent of these regions occur at or near genes, particularly the housekeeping genes. These HTF islands are not methylated on active X-chromosomes.
hsd locus – The hsd locus of E. coli K includes the restriction and modification genes hsdR, hsdM and hsdS, which code for the hsdR, hsdM and hsdS subunits.
hsd system – This host restriction and modification system confers specificity by methylating adenine. There are counterparts to this system in many bacteria. The specificity is characteristic of each system.
Human leukocyte antigen (HLA) complex – A complex comprising genes that are responsible for synthesis of antigens that determine acceptance or rejection of a tissue graft. The HLA system is based on three classes of HLA genes on chromosome number 6 which determine immunological acceptability. Each gene may possess 8 to 40 alleles, each allele specifying a particular antigen. A particular number 6 chromosome may have one of 75,000 theoretically possible combinations of HLA alleles; each particular combination is called a haplotype.
Hut operon of Salmonella typhimurium – In S. typhimurium, histidine is converted into glutamate and formate by four enzymes: histidase, urocanase, imidazolone propionate hydrolase, and formiminoglutamate hydrolase. The Hut operon controls the synthesis of these enzymes. This operon carries two operators and two promoters. The Hut repressor mediates the negative control. Only when the Hut repressor is prevented from binding to the respective operators can the signal of either C or N starvation be transmitted.
Hybrid arrested translation – Hybrid arrested translation is based on the fact that an mRNA will not direct the synthesis of a protein in a cell-free system when it is in a hybrid form with its complementary DNA.
Hybrid released translation – Hybrid released translation (HRT) enables a cloned DNA to be correlated with the protein(s) which it encodes. HRT is a direct method in which cloned DNA is bound to a cellulose nitrate filter and hybridized with an unfractionated preparation of mRNA or even total cellular RNA. The filter is washed and hybridized mRNA is eluted by heating in low salt buffer. Recovered mRNA is then translated in a cell-free translation system.
Hybridoma – The term hybridoma is applied to fused cells resulting from fusion of cells from two different sources.
The role of the myeloma cell in this process is to provide the normal B lymphocytes with the cancerous property of uncontrolled growth, thereby allowing hybridomas to grow easily both in tissue culture and after injection into mice. Normal non-cancerous B cells would quickly die in either situation.
Hydroxyproline-based DNA mimics – A type of DNA mimic that has improved hybridization ability and is an efficient gene silencer.

I

Identical multigene families – Different members of the family are multiple copies of the same gene.
Immunoglobulin allotype – Refers to the allele of the antibody chains found in the individual.
Immunoglobulin class-switch recombination (CSR) – Activation-induced cytidine deaminase (AID) is required for the DNA cleavage step in immunoglobulin class-switch recombination. AID is proposed to deaminate cytosine to generate uracil (U) in either mRNA or DNA. DNA cleavage depends on uracil DNA glycosylase


(UNG) for removal of U. UNG is also involved in the repair step of CSR, by an as yet unknown mechanism. Uracil removal introduces immunoglobulin class-switch recombination.
Immunoglobulin genes – If the convention that one polypeptide is coded for by one gene is accepted, each final DNA segment coding for a complete heavy or light chain should be considered a single gene. The various alternative sequences within each set (V, J, D or C) should, according to this definition, be termed simply DNA sequences, since they are all parts of a single gene coding for one polypeptide chain. With respect to immunoglobulin genes, the genome of somatic cells is not constant. They constitute a multigene system which comprises three unlinked multigene families: λ, κ and H.
Immunoglobulin heavy chain genes – These genes code for heavy chains of immunoglobulins. There are five known types of heavy chain constant region (CH): µ, δ, γ, ε and α, which define five types of immunoglobulins, IgM, IgD, IgG, IgE and IgA, respectively. Each of the five heavy chain classes may be associated with either κ or λ light chains. L (leader), V (variable), J (joining), D (diversity) and C (constant) sequences are present in heavy chain genes.
Immunoglobulin light chain genes – These genes code for light chains of immunoglobulins. Only two types are known, κ and λ, each differing in the amino acid sequence of the constant light (CL) segment and including light chains (L) with many different variable light chain (VL) sequences. L (leader), V (variable), J (joining) and C (constant) sequences are present in light chain genes.
Immunoglobulins (Ig) – The antigenically related proteins that are antigen-binding and are synthesized by B lymphocytes.
Immunopurification – The process that involves separation of a specific antigen from a mixture of very similar antigens. This purified antigen can then be used for developing a vaccine against a pathogen.
Importin – Receptor on the pore to which the signal sequence binds.
In vitro recombination – Another name for recombinant DNA technology.
Inducer – In inducible control, the substrate (effector) is known as the inducer.
Inducible control – The substrate, also known as the inducer, acts to induce the production of the enzymes. Repression occurs in the absence of substrate.
Informational gene family – It has individual members that can differ markedly in sequence from one another although all are homologous and obviously share an ancient ancestry.
Initiation codon – The first codon in translation (usually AUG) is the initiation codon. GUG acts as an initiation codon in vitro (although it has been assigned to valine) but with lesser efficiency.
Insertional RNA editing – It involves editing of kinetoplastid mitochondrial mRNA by base pairing to specific guide RNAs with poly(U) tails. The mechanism involves identification of editing sites in the mRNA by mismatches with the guide RNA, cleavage of the mRNA, and religation of the mRNA halves with added or deleted U residues by TUTase.
Integrator gene – In eukaryotic gene regulation, in response to activation of a sensor site, integrator genes transcribe activator RNA molecules. The integrator gene is comparable to the prokaryotic regulator gene and is responsible for synthesis of an activator RNA molecule that may or may not give rise to a regulatory protein before it activates the receptor site.
Intein – An intein is a segment of a protein that is able to excise itself and rejoin the remaining portions (the exteins) with a peptide bond. Inteins have also been called protein introns.
Intergenic spacers (IGS) – (1) The genomic regions that are not directly involved in the final product and are found between the actual coding sectors of two adjacent genes. These spacers vary in length from a single base pair to many thousands of base pairs. These are frequently referred to as flanking regions. Certain parts of these sequences exhibit specialization of function.
The sector of the noncoding sequence preceding the 5'-end of the gene proper (mature gene, antisense strand) is called the leader while that following the 3'-end is the trailer. (2) Each member of the repeat unit in a ribosomal gene family has a coding region with genes that specify 18S, 5.8S and 28S rRNA molecules, and a spacer region. This spacer region is called the intergenic spacer (IGS). The variation in number and size of the subrepeats in the IGS is responsible for variation in length of the repeat unit (IGS + coding region).
Intergenic transcription – Transcription by RNA polymerase II in budding yeast and in humans is widespread, even in genomic regions that do not encode proteins. The purpose of such intergenic transcription is largely unknown, although it can be regulatory.


Internal transcribed sequences – Each member of the repeat unit in a ribosomal gene family has a coding region with genes that specify 18S, 5.8S and 28S rRNA molecules, a spacer region called the intergenic spacer (IGS), and internal transcribed sequences (ITS).
Intron-exon junctions – Consensus sequences at the boundaries of intron-exon junctions have shown that GT is always found at the 5′-side of the intron and AG at the 3′-side. This is known as Chambon's rule.
Introns – Sequences present in split (compound) genes whose complementary sequences are absent in messenger RNA. These sequences of noncoding DNA, which interrupt the coding sequence of genes, are called introns and are excised from gene transcripts during RNA processing. Intron-encoded proteins act as helpers in RNA splicing.
Invariant DNA code – A variant code would mean that a mutation in the genetic code would place new amino acids in certain loci and entirely eliminate amino acids from other loci of practically all proteins in an organism. This is not the case, as the DNA code is invariant.
Ionome – The ionome is defined as the mineral nutrient and trace element composition of an organism and represents the inorganic component of cellular and organismal systems.
Ionomics – Ionomics is the study of elemental accumulation in living systems using high-throughput elemental profiling. This approach has been applied extensively in plants for forward and reverse genetics, screening diversity panels, and modeling of physiological states.
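
As a small illustration of the GT...AG rule given under Intron-exon junctions, the following Python sketch checks whether a candidate intron sequence (DNA, sense strand) begins with GT and ends with AG. The function name and the example sequences are hypothetical, and satisfying this check is only a necessary condition, not evidence that a sequence is a real intron.

```python
def follows_gt_ag_rule(intron_dna):
    """Return True if the sequence starts with GT and ends with AG.

    Hypothetical helper for illustration: it tests only the two
    boundary dinucleotides named in the glossary entry.
    """
    seq = intron_dna.upper()
    # An intron needs at least the two dinucleotides themselves.
    if len(seq) < 4:
        return False
    return seq.startswith("GT") and seq.endswith("AG")


# Made-up example sequences
print(follows_gt_ag_rule("GTAAGTCCTTTTCAG"))  # boundaries GT ... AG
print(follows_gt_ag_rule("ATAAGTCCTTTTCAC"))  # boundaries AT ... AC
```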

K

Kinetoplast DNA maxicircle – The maxicircle in trypanosome kinetoplast DNA has 23 to 36 kbp. Maxicircle molecules are homologs of informational mitochondrial DNA molecules in animal and fungal cells and possess various genes. Several maxicircle genes are cryptogenes, i.e., incomplete genes whose transcripts are edited to yield translatable sequences.
Kinetoplast DNA minicircle – Minicircles in trypanosome kinetoplast DNA have 465 to 2,500 bp. No obvious protein-encoding genes are known to be present in minicircles.
Knockout technique – O. Smithies in 1985 studied gene function by knocking off genes in mice. Knockout mice are genetically modified mice that have one or more genes silenced. Knockout mice are used to recreate human diseases in mice and to study the effect of individual genes on an organism's development.
Kozak's scanning hypothesis – The first initiation codon AUG in 90 per cent of the cases occurs in the form of the consensus sequence PuNNAUGG. This is known as Kozak's sequence. In 5 per cent of cases, one or more AUG codons occur upstream of the first AUG codon. These extra AUG codons are also read according to Kozak's hypothesis.
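
The consensus PuNNAUGG quoted above can be read as a simple pattern: a purine (A or G) three bases before the AUG, any two bases, the AUG itself, and a G immediately after it. The Python sketch below, offered only as an illustration, scans an mRNA string for AUG codons that sit in this context. The function names and the example sequence are assumptions made for the illustration.

```python
import re

# Pu N N A U G G: purine, two arbitrary bases, AUG, then G.
# The lookahead lets overlapping candidate sites be reported.
KOZAK_PATTERN = re.compile(r"(?=([AG][ACGU]{2}AUGG))")


def all_aug_positions(mrna):
    """0-based positions of every AUG triplet in the sequence."""
    seq = mrna.upper()
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "AUG"]


def kozak_aug_positions(mrna):
    """0-based positions of AUG codons lying in a PuNNAUGG context."""
    seq = mrna.upper()
    # The AUG begins three bases after the start of each match.
    return [m.start() + 3 for m in KOZAK_PATTERN.finditer(seq)]


example = "GGACCAUGGCUUAUGC"  # made-up sequence
print(all_aug_positions(example))    # every AUG
print(kozak_aug_positions(example))  # only AUGs in the consensus context
```

In this made-up sequence the first AUG is preceded by a purine at the -3 position and followed by G, whereas the second is not, so only the first is reported by `kozak_aug_positions`.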

L

Lactose operon in E. coli – The lactose operon in E. coli consists of a promoter (p), an operator (o) and three structural genes, lac z, lac y and lac a. Gene lac z codes for the enzyme β-galactosidase, lac y codes for the enzyme β-galactoside permease, and lac a codes for the enzyme thiogalactoside transacetylase; the polypeptide sizes of these three enzymes are: β-galactosidase, 1021 amino acids; permease, 275 amino acids; transacetylase, 275 amino acids.
Lambda bacteriophage lytic cascade – Among the most elegant examples of gene regulation known so far are the many precisely coordinated interactions between λ genes. When sensitive E. coli strains are infected by the temperate λ virus, a choice exists between two alternative pathways: lysis or lysogeny. If the lytic pathway is followed, approximately 100 viral particles are produced within one hour by an infected bacterial cell, and the cell is destroyed. On the other hand, the lysogenic pathway leads to the integration of the λ chromosome into the bacterial chromosome, and the cell can now continue to live and replicate, immune from lysis by λ until induction occurs. The choice of which pathway to follow is dependent on factors in both host and virus, including the nutritive state of the host and the number of λ viruses infecting a cell. Lysogeny is increased both in starved cells and when there is a high multiplicity of infection.
Large-scale gene function analysis – Microarray transcript profiling and RNA interference are two technologies crucial for large-scale gene function studies in multicellular eukaryotes. Both rely on sequence-specific


hybridization between complementary nucleic acid strands. For this, gene-specific sequence tags (GSTs) are created representing almost all the genes. GST resources provide novel and powerful tools for functional genomics.
Late lambda phage genes – Late genes for lysis, S and R, are located on the right end while those for tail and head (A to J) are located on the left end. Genes for recombination are also found on the left side of cIII and are transcribed as late genes. Soon after the infection, the two ends of the DNA molecule join to form a ring, so that all late genes are arranged in a single group containing the S-R genes from the right end and the head and tail genes A to J from the left end. There is a promoter PR′ between genes Q and S. In the absence of the product pQ of gene Q, transcription from PR′ is constitutive but terminates at tR3, lying close to PR′, giving a product 6S RNA (194 bases long). However, when pQ is present, it suppresses tR3 and the 6S RNA is extended, with the result that late genes are heavily expressed. Thus, gene Q induces transcription of late genes, which continue to be transcribed. Transcription from both sides stops on the ring molecule before the RNA polymerase molecules from the two sides can clash.
Later steps in processing 5S rRNA – The 10S RNA contains 6 stem-and-loop structures generated by RNase III. Then, by the action of RNase P, the fragment 7S RNA is generated, which contains 5S rRNA and a termination stem. Removal of this stem is accomplished by RNase E.
Leader peptides – In bacteria, proteins destined to be secreted are synthesized as pre-proteins with N-terminal signal sequences which are short (25 residues) and comprise a hydrophobic central core, which can adopt an α-helical structure, flanked by a region containing several charged residues.
Leader sequence – Sector of the noncoding sequence preceding the 5'-end of the gene proper (mature gene, antisense strand).
Leucine zippers – A family of eukaryotic transcription factors that are rich in leucine. It includes C/EBP, Fos, Jun and GCN4.
Levels of gene expression – Gene regulatory mechanisms operate at various levels. Important levels of gene control are the DNA, transcriptional, post-transcriptional, RNA transport, translational, and post-translational levels.
Levels of gene regulation in eukaryotic cells – Various potential control points recognized in eukaryotes are: regulation at the level of gene structure, initiation of transcription, processing of the transcript (including alternative RNA splicing and RNA editing), mRNA transport from the nucleus to the cytoplasm, mRNA stability, time and frequency of translation in the cytoplasm, post-translational control, and protein targeting.
Live vaccines – Large DNA viruses, such as Vaccinia virus, are used as biological delivery agents. Vaccinia virus has a linear double-stranded DNA genome of approximately 185,000 bp and replicates in the cytoplasm of infected cells.
Localized protein secretion – The signal sequences of two surface proteins, M (PrtM) and F (PrtF), direct secretion to different subcellular regions. The signal sequence of protein M promotes secretion at the division septum, whereas that of F preferentially promotes secretion at the old pole. A signal sequence may contain information that directs the secretion of a protein to one subcellular region, in addition to its classical role in promoting secretion.
Long noncoding (lnc) RNA – Many of the noncoding RNAs synthesized are longer than 200 nucleotides. These are called long noncoding (lnc) RNAs. Certain regulatory functions, like controlling genome dynamics, cell biology and developmental programming, have been associated with them.
Long nuclear-retained noncoding RNAs (lnr ncRNA) – One of the ncRNAs.
One of the roles assigned to the lnr ncRNA Malat1 is to regulate synapse formation by modulating the expression of genes involved in synapse formation and/or maintenance.
Long-term (irreversible) gene regulation – This type of gene regulation is associated with determination, differentiation, or more generally development. Short-term regulation is a feature of both developing and fully differentiated eukaryotic cells, i.e., even when a cell is undergoing differentiation, it responds to a changing environment. During differentiation, certain features of a short-term regulatory process may change dramatically.
Lysis – When a virulent bacteriophage, such as lambda (λ), T2, T4, SPO, infects a host bacterium, it depends entirely on the host for its replication. It subverts the host's function and utilizes host machinery for producing a large number of phage particles. The bacterial cell undergoes lysis and dies to liberate the


phage particles, each of which is then ready to start another cycle by infecting new bacterial cells. This cycle is known as the lytic cycle.
Lysogeny – Some phages, the temperate phages, have a dual existence: they may either perpetuate through a lytic cycle, as above, or take the form of a prophage by integration of their DNA with the DNA of the bacterial cell.

M

Major splicing pathway – The major splicing pathway removes U2-type introns, allowing the adjacent exons to be spliced together in the nucleus before export of the fully spliced transcripts to the cytoplasm. The transcripts that also contain U12-type introns are exported as partially spliced transcripts and their exons are spliced together by the U12 pathway in the cytoplasm. The mechanism by which they evade RNA surveillance in the nucleus is unknown. Although U2-type introns are found globally, U12-type introns may occur preferentially in particular sets of genes such as those involved in cellular proliferation.
Mammalian RNA polymerase II gene promoters – These promoters are typically organized in the following order: upstream sequence motif(s)/TATA box/initiation site. Polarity of transcription is primarily determined by the linear order of an upstream sequence relative to a TATA box, rather than by the individual orientations of either of these two elements.
Many genes-one polypeptide hypothesis – The immunoglobulin genes, which can be called assembled genes, do not fit any classical or neoclassical definition of the gene, since the genetic unit in the germline and in the mature immune cell is completely different. Functional genes of immunoglobulins mature by means of somatic recombination from a few units in the germline during the maturation of immune cells. The somatic recombination in immunoglobulin genes led to the many genes-one polypeptide hypothesis.
Meiome – Meiome is the term used in functional genomics for the meiotic transcriptome. It is studied by transcript profiling using mRNA isolated from enriched meiotic cells (meiocytes).
Merodiploids – Partial diploid genotypes. Also known as merozygotes.
Messenger RNA (mRNA) – Messenger RNA serves as a template on which a polypeptide is constructed and contains an initiation codon (AUG or GUG), at least one of the termination codons (UAA/UGA/UAG) and a base sequence in the form of triplet codons that dictate the order of amino acids in a polypeptide chain. mRNA also includes certain trailer and leader sequences that are not translated. The messenger RNA fraction is heterogeneous in size, ranging from 500 to 6,000 nucleotides in E. coli. Protein-encoding regions of mRNA are composed of contiguous, non-overlapping strings of codons called open reading frames (ORFs), each of which specifies a single protein. Translation starts at the 5′-end of the open reading frame and proceeds one codon at a time to the 3′-end. The first and last codons of an ORF are called start and stop codons, respectively.
Messenger RNA decay – Messenger RNA (mRNA) decay is recognized as a major control point in the regulation of gene expression. After messenger RNA has been used as a template for translation, it is protected till needed by the cell or degraded by the cellular machinery when not required. mRNA has noncoding nucleotides at either end of the molecule. These segments contain information about the number of times the mRNA is translated before being destroyed by ribonucleases. Hormones stabilize certain mRNA transcripts.
Metabolic engineering – It is the improvement of cellular activities by manipulation of enzymatic, transport and regulatory functions of the cell with the use of recombinant DNA technology. The introduction of heterologous genes and regulatory elements has made metabolic engineering a very fascinating area of research. Cell function can be modified using targeted alterations in normal cellular activities. This may involve not only synthesis of a metabolite, but sometimes also manipulation of processing pathways. Metabolic engineering has been successfully used for improving the nutritional value of crops like rice and maize.
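
The description of an open reading frame in the Messenger RNA entry (a start codon, then contiguous non-overlapping codons read 5′ to 3′ until a stop codon) can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: only AUG is treated as a start codon by default, only the first AUG is considered, and the function name and example sequence are hypothetical.

```python
STOP_CODONS = {"UAA", "UGA", "UAG"}


def first_orf(mrna, start_codon="AUG"):
    """Return the codons of the first ORF, start and stop included.

    Reads non-overlapping triplets from the first start codon towards
    the 3' end. Returns None if there is no start codon or no in-frame
    stop codon.
    """
    seq = mrna.upper()
    start = seq.find(start_codon)
    if start == -1:
        return None
    codons = []
    # Step through the sequence one complete codon at a time.
    for i in range(start, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        codons.append(codon)
        if codon in STOP_CODONS:
            return codons
    return None


# Made-up sequence: two leader bases, then AUG GCU UAA, then trailer
print(first_orf("GGAUGGCUUAAUGA"))
```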
Metabolites – Intermediate products of metabolism are called metabolites. These are small molecules that exhibit enormous variation. A metabolite is usually defined as any molecule less than 1 kDa in size. In plant-based metabolomics, it is common to refer to "primary" and "secondary" metabolites.
Metabolome – Sum total of all the metabolites in the cell or tissue at different developmental stages and under different conditions. Metabolites are unique chemical fingerprints that specific cellular processes leave behind.
Metabolome analysis – Metabolome analysis involves (i) alignment of chromatographic and spectroscopic data using specific bioinformatics tools, and (ii) identification of different metabolites.


Metabolomics – The scientific study of chemical processes involving metabolites. It places a greater emphasis on metabolic profiling at a cellular or organ level and is primarily concerned with normal endogenous metabolism. The terms metabolomics and metabonomics are sometimes used synonymously.
Metastatic genes – Genes which specifically promote dissemination of cancerous cells to other organs.
Methyl/phospho switch – Combined histone methylation and phosphorylation is known as the 'methyl/phospho' switch. The generation of H3K9me3S10ph depended on Suv39h and Aurora B, and occurred at pericentric heterochromatin during mitosis in different eukaryotes. Most HP1 typically dissociated from chromosomes during mitosis, but if phosphorylation of H3 serine 10 was inhibited, HP1 remained chromosome-bound throughout mitosis. H3 phosphorylation by Aurora B is therefore part of a 'methyl/phospho' switch mechanism that displaces HP1 and perhaps other proteins from mitotic heterochromatin.
Methylase reaction – Though methylase and cleavage reactions are radically different from each other, the first step of both reactions seems to be the same. EcoK enzyme and AdoMet produce an EcoK-AdoMet complex through non-covalent binding. EcoK enzyme has a minimum of 2-3 AdoMet binding sites. The EcoK-AdoMet complex undergoes a transition into an activated form (EcoK*). This is the rate-limiting step of the reaction. Then EcoK* interacts with any DNA, regardless of the presence of sK sites. Initial complexes are unstable and can be dissociated by exposure to heparin, a polyanion that mimics DNA. EcoK* then specifically binds to sK sites, regardless of their methylated state. It is the interaction between EcoK* and the sK site that determines whether EcoK* undergoes transition to restriction or modification mode.
Methylated patterns – In higher organisms, 5-methylcytosine (m5C) is the only modified base yet found in DNA. Methylation occurs through an enzymatic reaction.
This is a post-synthetic chemical modification of a base in DNA. These DNA modification mechanisms provide a molecular basis for inheritance of a particular pattern of gene activities. A particular pattern of unmodified or modified cytosines could be inherited. Maintenance methylase recognizes hemimethylated (half-methylated) DNA formed after replication and methylates the nascent strands. Such an enzyme does not act on sequences that contain non-methylated cytosine. Cytosine methylation occurs in CpG doublets.
Methylated sites – There are three possible forms of methylated sites: (1) Homoduplex modified DNA has a methyl group on each strand. (2) Heteroduplex DNA carries a methyl group on only one strand. There are two variants of heteroduplex DNA. (3) Homoduplex unmodified DNA has no methyl groups at all. The methylated state of DNA determines whether the DNA will be a substrate for restriction or modification. Unmodified DNA is susceptible to endonucleolytic action but is methylated very slowly in the presence of AdoMet and EcoK. Heteroduplex DNA is rapidly methylated but undergoes neither single- nor double-stranded cleavage. Modified DNA is neither methylated nor cleaved.
Methylation specificity – Most DNA methylation is maintenance-type methylation following DNA replication. The enzyme is not sequence-specific. De novo methylation occurs rarely. Here non-methylated DNA is the substrate. It is not known whether or not de novo DNA methylase is sequence-specific. CG doublets do not by themselves provide specificity.
Microarrays – Nucleic acid arrays are based on hybridization of a sample (labeled RNA or DNA) in solution to DNA fragments immobilized on a solid surface. The arrayed DNA fragments often come from cDNA, genomic DNA or plasmid libraries. Usually an array is designed based on specific sequence information, a process sometimes referred to as downloading the genome onto a chip.
There are several variations on this basic technical theme: the hybridization reaction may be driven by an electric field, detection methods other than fluorescence can be used, and the surface may be made of materials other than glass, such as plastic, silicon, gold, gel or membrane, or may even comprise beads at the ends of fiber optic bundles. A few characteristic features of microarrays are: (1) Parallelism: this method allows parallel acquisition and analysis of massive data in a single reaction. (2) Miniaturization: this leads to less consumption of DNA probes and reagents. (3) Multiplexing: multiple samples can be analyzed in a single assay. With multicolor fluorochrome labeling, comparison of multiple samples can be made on a single DNA chip. This removes chip-to-chip variation and discrepancies in reaction conditions. (4) Automation: advanced manufacturing techniques permit the mass production of DNA chips. The major drawbacks of microarrays are their cost and the requirement for a specialized arraying robot and scanner. Also, the arrays cannot be reused, which further increases the cost. Also known as DNA/RNA chip technology.
MicroRNAs – MicroRNAs (miRNAs) are noncoding RNAs that have emerged as key post-transcriptional regulators of gene expression, involved in diverse physiological and pathological processes. The precursor of


a microRNA (pri-miRNA) is transcribed in the nucleus. It forms a stem-loop structure that is processed to form another precursor (pre-miRNA) before being exported to the cytoplasm. Further processing by the Dicer protein creates the mature miRNA, one strand of which is incorporated into the RNA-induced silencing complex (RISC). Base pairing between the miRNA and its target directs RISC to either destroy the mRNA or impede its translation into protein.
Microsatellites – Tandem repeats of DNA sequences of only a few base pairs (1-6 bp) in length, the most abundant being the dinucleotide repeats. The term microsatellite was introduced to characterize the simple sequence stretches amplified by polymerase chain reaction (PCR). These are also known as short tandem repeats (STRs) or simple sequence repeats (SSRs), and they differ from minisatellites, often called variable-number-of-tandem-repeats (VNTRs), which are repeated sequences having repeat units ranging from 11 to 60 bp in length.
Middle lambda phage genes – cII, cIII and Q are the middle genes that exert control on lysogeny and the lytic cycle. cII (on the left side) and cIII (on the right side) are regulatory genes that regulate synthesis of repressor by cI, to allow the phage to enter lysogeny. On the other hand, gene Q (on the right side) is a regulator which acts as an antiterminator and allows the transcription of late genes meant for lysis. The two pathways, lytic and lysogenic, are so intimately related that it is difficult to predict which pathway will be followed and how the decision between alternative pathways is actually taken. Also known as delayed early genes.
Middle repetitive DNA – See moderately repetitive DNA.
Minisatellites – First reported by A.J. Jeffreys and coworkers in 1985, though their utility through PCR was suggested later.
The microsatellites are more evenly dispersed in the genome than minisatellites, which are generally confined to telomeres. For example, the dinucleotide SSR (CA)n occurs as many as 50,000 times in the human genome, with n ranging from 10 to 60. The tri- and tetra-nucleotide repeats are also common in the human genome.

Minor splicing pathway – The minor spliceosome is a ribonucleoprotein complex that catalyzes the removal of an atypical class of spliceosomal introns (U12-type) from eukaryotic messenger RNAs. It was first identified and characterized in animals, where it was found to contain several unique RNA constituents that share structural similarity with, and seem to be functionally analogous to, the small nuclear RNAs (snRNAs) contained in the major spliceosome. Subsequently, minor spliceosomal components and U12-type introns have been found in plants but not in fungi.

–1 and –3 rule – In the mature protein, amino acids with small side chains are often found adjacent to the cleavage site (the –1 position) and at the next but one upstream residue (the –3 position).

miRNA processing RNAi pathway – The miRNAs are expressed in the nucleus as parts of long primary miRNA transcripts (pri-miRNAs) that contain multiple miRNAs. The hairpin structure that forms around the miRNA sequence of the pri-miRNA acts as a signal for digestion by the ribonuclease Drosha to produce pre-miRNA. Exportin-5 mediates nuclear export of pre-miRNAs. Then the cytoplasmic dsRNA nuclease (Dicer) cleaves the pre-miRNA, leaving 1-4 nucleotide 3′ overhangs. The resulting miRNA duplex then associates with RISC, which is activated by cleavage of one of the two strands. The activated complex then represses protein translation by binding to sequences in the 3′-UTR of specific mRNAs.

Misfolded proteins – After insertion into endoplasmic reticulum (ER), proteins that fail to fold there are destroyed.
Through a process termed dislocation, such misfolded proteins arrive in the cytosol, where ubiquitination, deglycosylation and finally proteasomal proteolysis dispense with the unwanted polypeptides.

Mitochondrial DNA transcription – The basal machinery for initiation of mitochondrial DNA transcription has been molecularly defined.

Mitochondrial gene introns – Introns of split genes in fungal mitochondria are classified into two groups on the basis of their internal organization. Group I introns, found in the majority of fungal mitochondrial split genes, do not carry any conserved sequence at intron-exon junctions, but carry internally a short conserved sequence called the 'internal guide sequence'. Group II introns resemble the introns of nuclear genes and have consensus sequences (GT and APy) and a branch sequence that resembles the TACTAAC box.

Moderately repetitive DNA sequences – Moderately repetitive DNA sequences (having 10³–10⁵ copies) are present mostly interspersed among unique sequences throughout the genome and are, therefore, also referred to as dispersed repeats. They have been proposed to play roles in the regulation of transcription, production of evolutionary novelty, the encoding of regions within large RNAs which allow them to function as regulatory activators, the regulation of specific processing of large RNA transcripts, and the integration of chromosome structure or recognition. In addition to these, some middle repetitive DNA sequences,


known as nomadic sequences, are able to migrate from one position to another in the genome to regulate the expression of other genes. Examples of moderately repetitive sequences include rRNA and tRNA genes and storage protein genes in plants such as corn. Also called middle repetitive DNA.

Modified central dogma of molecular biology – Crick's concept of the central dogma was modified with the emerging knowledge that in some viruses RNA is the genetic material and genetic information flows from RNA to DNA with the help of the enzyme RNA-dependent DNA polymerase, also known as reverse transcriptase. Another modification of Crick's central dogma was that RNA can self-replicate. Protein cannot self-replicate, nor can it use amino acid sequence information to reconstruct RNA or DNA. However, under laboratory conditions DNA can directly be used as a template for translation.

Molecular beacon assay – In this assay, an oligonucleotide probe (molecular beacon) is used, which is made of the target SNP sequence, with its two ends being complementary to each other. The two ends of this molecular beacon are labeled, i.e., the 5′-end with a reporter and the 3′-end with a quencher. When the probe fails to form a duplex with the template DNA because of the presence of the SNP, its two ends self-anneal to form a hairpin structure, thus quenching the reporter. But when the probe anneals with the template, it is linearized, thus separating the reporter from the quencher and permitting a fluorescence signal (as digestion will occur in PCR). These fluorescent signals can be detected by appropriate sensing devices.

Molecular chaperone – A molecular chaperone is a protein that is needed for the assembly or proper folding of some other protein but which is not itself a component of the target complex. The molecular chaperones provide a structure on which proper folding of proteins takes place.
The chaperones do not appear to provide the three-dimensional structure of the proteins they help; rather, they seem to bind to a protein in its early stages of folding and prevent unproductive folding. Thus molecular chaperones allow proteins to find a functional, stable state by giving them opportunities to fold into a thermodynamically stable and functional configuration. Each cycle of refolding requires ATP energy. A well-studied class of chaperones is known as chaperonins.

Molecular markers – Consist of special molecules which show easily detectable differences among different strains of a species, or among different species. Molecular markers include biochemical constituents like secondary metabolites in plants and macromolecules like proteins and DNA.

Molecular zippers in gene regulation – Molecular zippers are a class of transcriptional regulator proteins that bind to DNA at dyad-symmetric sites through a motif consisting of (a) a "leucine zipper" sequence that associates into non-covalent, parallel, α-helical dimers and (b) a covalently connected basic region necessary for binding DNA. Many gene-regulating proteins "zip" together into pairs. This linkage is critical to their ability to bind to DNA. The "teeth" that join the molecules almost always consist of the amino acid leucine.

Monoallelic gene expression – Gene expression is termed "monoallelic" when only one of the two copies of a gene is active while the other is not. Monoallelic gene expression is frequently initiated in the development of an organism and is stably maintained thereafter. Mammalian X inactivation, imprinting, and allelic exclusion are classic examples of monoallelic gene expression.

Monoclonal antibody – Homogeneous antibody derived from a single B-cell clone, all molecules of which therefore bear identical antigen-binding sites and isotype.

Movable genes – Movable genes are DNA elements that can move from one location to another in the genome of an organism.
These genes were first discovered in maize by Barbara McClintock in the 1950s. The existence of movable genes shows that the hypothesis of a fixed location of the gene in the chromosome does not always hold.

mRNA helicase – Most mRNAs contain secondary structure, yet their codons must be in single-stranded form to be translated. The ribosome itself is an mRNA helicase. The mRNA helicase activity is localized to the middle of the downstream tunnel, between the head and shoulder of the 30S subunit.

mRNA reading frame maintenance – The triplet-based genetic code requires that translating ribosomes maintain the reading frame of a messenger RNA faithfully to ensure correct protein synthesis. However, in programmed –1 ribosomal frameshifting, a specific subversion of frame maintenance takes place, wherein the ribosome is forced to shift one nucleotide backwards into an overlapping reading frame and to translate an entirely new sequence of amino acids.

Multigene family – A group of nucleotide sequences or genes that exhibits four properties, namely, multiplicity, close linkage, sequence homology, and related or overlapping phenotypic functions. Multigene families may comprise divergent or identical genes.
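The programmed –1 ribosomal frameshift described under "mRNA reading frame maintenance" can be illustrated with a minimal Python sketch. The function names and the 18-nucleotide toy mRNA are assumptions made for illustration only; no real gene or slippery-site context is modeled.

```python
def codons(seq, start=0):
    """Split seq into consecutive non-overlapping triplets, beginning at `start`."""
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]


def read_with_minus1_frameshift(seq, shift_after):
    """Read `shift_after` codons in frame 0, then slip back one nucleotide.

    After the slip, the last nucleotide of the final in-frame codon is read
    a second time as the first nucleotide of the next codon.
    """
    in_frame = codons(seq[:3 * shift_after])
    shifted = codons(seq, start=3 * shift_after - 1)
    return in_frame + shifted


# Hypothetical toy mRNA (not taken from any real gene).
mrna = "AUGGGUUUAAACGGCAUU"
print(codons(mrna))                          # normal reading, frame 0
print(read_with_minus1_frameshift(mrna, 4))  # slip back one nt after codon 4
```

With the slip placed after the fourth codon, the downstream codons change from GGC AUU to CGG CAU, which shows why every amino acid after the shift site is new.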


Multiple polyadenylation sites – During RNA processing, in which the primary transcript matures to form messenger RNA, about 200 adenosine nucleotides are added to the 3′-end in a polyadenylation reaction. These are not coded by the corresponding gene. In certain cases, there are multiple alternative polyadenylation sites in the primary transcript. This was first observed in adenoviruses.

Multiple sigma factors in E. coli – For a long time, a single sigma factor (σ70 or σD) was thought to exist in E. coli. However, new sigma factors have since been discovered in E. coli which operate under the conditions of heat shock (σ32 or σH), nitrogen starvation (σ54 or σN) or chemotaxis and flagellar structure (σ28 or σF). Sigma factors from E. coli, B. subtilis, phage SPO1, and phage T4 are homologous proteins.

Multiple sigma factors in phage SPO1 – The regulation of gene expression through a cascade has been studied in phage SPO1, which infects Bacillus subtilis. For the expression of the early genes of SPO1, the bacterial RNA polymerase (α2ββ′σ55) is used, because the promoters of the early genes, like the promoters of host genes, can be recognized by this RNA polymerase holoenzyme. The consensus sequences in SPO1 at –35 and –10 bp resemble those of E. coli and B. subtilis. For the expression of the middle and the late genes, however, expression of at least three regulator genes, namely, 28, 33, and 34, is needed. The products of these genes are called gp28, gp33 and gp34, respectively. Gene 28 is an early gene whose expression is needed for the expression of the middle genes. Similarly, genes 33 and 34 are middle genes whose expression is a prerequisite for the expression of the late genes. It has been shown that gp28 replaces σ55 in the holoenzyme α2ββ′σ55 and, therefore, gp28 is also referred to as σgp28. This RNA polymerase with gp28 can then transcribe the middle genes.
The next transition involves replacement of gp28 by gp33 and gp34, so that RNA polymerase may now transcribe the late genes. Thus selective expression of genes can be brought about by changes in the sigma factor of the RNA polymerase enzyme.

Multiple small RNAs as regulators – Small regulatory RNAs can act by pairing with their target messages, targeting themselves and the mRNA for degradation. Multiple small RNAs are essential regulators of the quorum-sensing systems of Vibrio species, including the regulation of virulence in V. cholerae.

Multiplicational gene family – It consists of 10¹ to 10⁴ copies of a gene, 80-100 nucleotides long. Repeat units are essentially identical.

Multivalent vaccines – Multiple foreign genes are inserted into Vaccinia virus. Vaccinia virus strain VP168 carried three foreign genes: InfHA, HSVgD and HBsAg. Hosts inoculated with the Vaccinia virus recombinant VP168 responded by making antibodies to all three of the foreign antigens synthesized under Vaccinia virus regulation.

Muton – Two mutations belong to the same muton if they do not show recombination. The unit of mutation is defined by the recombination test. The smallest mutational unit in a double-stranded DNA molecule is a single nucleotide residue.

Mutually exclusive mRNA splicing – A eukaryotic gene contains a large number of introns; many of them are alternatively spliced. Four mechanisms of mutually exclusive splicing are: steric interference, spliceosomal incompatibility, disposal by nonsense-mediated decay, and antagonism of a repressor by docker:selector pairing. Reverse splicing provides considerable insight into the catalytic flexibility of the spliceosome.

N

Natural antisense transcript-derived siRNAs (nat-siRNA) – A type of small RNA which is produced from partially overlapping transcripts of antisense gene pairs encompassing an inducible and a constitutive gene.

Negative control – Product of a regulatory gene acts to repress transcription.

Negative-stranded RNA viruses – These viruses have a genome that consists of single-stranded antisense RNA; that is, RNA that is the complement of the message sense. Examples of negative-stranded RNA viruses are measles, mumps, respiratory syncytial virus (RSV), parainfluenza viruses (PIV), human metapneumovirus, rabies, Ebola and influenza viruses.

N-end rule – According to the N-end rule, the amino acid at the amino or N-terminal end of a protein is a signal to proteases that control the average length of life of a protein. Almost complete predictability was achieved in determining the life of the β-galactosidase protein based on its N-terminal amino acid. Protein life spans range from 2 minutes for those with N-terminal arginine to greater than 20 h for those with N-terminal methionine.

Nested gene – A single gene which produces two or more nested products by modulating the end point of protein synthesis. This can occur by leaky read-through of a termination codon or by co-translational frameshifting. Similar strategies are used by eukaryotic RNA viruses and nested products can also be produced from


eukaryotic genes by alternative RNA splicing or use of alternative polyadenylation sites. Nested genes also serve as an example of a situation in which one gene resides within an intron of another gene.

New patterns of regulation – Evolutionary loss of the ability of two species to hybridize probably results from the accumulation of incompatibilities between genes during embryonic development. Mammals appear to have undergone both rapid regulatory evolution and rapid rearrangement of genes. Thus, gene rearrangement provides an important means of achieving new patterns of regulation.

No-go decay pathway – This pathway results in rapid mRNA cleavage and degradation. It differs from other mRNA degradation pathways in yeast in that stalling of the ribosomes by a stem-loop structure leads to cleavage of the mRNA, a process that involves the Dom34 and Hbs1 proteins. This cleavage generates free ends that are subject to degradation: the enzyme Xrn1 chews up the mRNA from the 5′-end, whereas the multienzyme exosome complex degrades it from the 3′-end. Dom34 and Hbs1 might interact directly with the 'A site' of the stalled ribosome, possibly to promote release of the ribosome and allow subsequent mRNA cleavage, or they may directly stimulate cleavage of the mRNA.

Nonambiguous code – There is no ambiguity about a particular codon. A particular codon will always code for the same amino acid wherever it is found.

Noncoding RNA – All nongenetic RNAs except messenger RNA are noncoding RNA. Also referred to as nonmessenger RNAs (nmRNAs) or functional RNAs (fRNAs).

Noncoding RNA central dogma of molecular biology – mDNA is transcribed into pre-mRNA. uDNA is transcribed into uRNA which, as a part of the spliceosome, is involved in the processing of pre-mRNA into mRNA. rDNA is transcribed into pre-rRNA, which with the help of RNase MRP and snoRNP is processed into rRNAs, which are integral components of a ribosome.
tDNA is transcribed into pre-tRNA, which with the help of RNase P matures into tRNA. SRP and tmRNA are involved in translocating nascent proteins to the site of action.

Nondegenerate code – When there is only one codon for each amino acid.

Nongel-based assays – Several nongel-based methods are available for single-nucleotide polymorphism (SNP) detection. If an SNP is present at the 3′-end of a PCR primer binding site, it can be detected simply by the failure of amplification due to the mismatch between the primer sequence and the binding site in the template, although it may be difficult to distinguish this failure from PCR amplification failure due to other reasons. The common nongel-based assays for detection of SNPs at internal sites are based on the detection of a mismatch between the PCR product and an oligonucleotide used as a probe. The common methods used are the TaqMan assay and the molecular beacon assay.

Nongenetic RNA – The RNAs made in a cell through the process of transcription. There are three major types of nongenetic RNA: messenger RNA (mRNA), transfer RNA (tRNA) and ribosomal RNA (rRNA). There are a number of other types of RNA present in smaller quantities as well, including small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), the 4.5S signal recognition particle (SRP) RNA, etc.

Non-overlapping code – When only as many amino acids are coded for as there are code words represented in end-to-end sequence (UUUCCC has two code words, and the two codons actually present are UUU and CCC).

Non-repetitive DNA – Non-repetitive sequences are found once or a few times in the genome. Many of the sequences which encode functional genes fall into this class. These sequences contain, but are not exclusively composed of, protein coding sequences. Also known as unique DNA or single-copy DNA.

Nonsense codon – A codon that does not code for an amino acid; also called termination codon.
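The "Non-overlapping code" and "Nonsense codon" entries can be tied together with a short Python sketch: the message is read as end-to-end triplets, and reading stops at the first termination codon. The function names and the second test sequence are illustrative assumptions; the stop codons listed are those of the standard genetic code (as the "One codon-two amino acids" entry notes, UGA is not always a stop).

```python
# Termination (nonsense) codons of the standard genetic code.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def read_in_frame(mrna):
    """Read an mRNA as consecutive, non-overlapping triplets from position 0."""
    return [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]


def codons_before_stop(mrna):
    """Return the sense codons read before the first in-frame nonsense codon."""
    sense = []
    for codon in read_in_frame(mrna):
        if codon in STOP_CODONS:
            break
        sense.append(codon)
    return sense


print(read_in_frame("UUUCCC"))                # the glossary's own example
print(codons_before_stop("AUGUUUCCCUAAGGG"))  # hypothetical message with a UAA stop
```

The first call reproduces the two code words UUU and CCC given in the entry; in the second, the triplet GGG after UAA is never reached.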
Nonsense-mediated mRNA decay – In multicellular eukaryotes, long introns are recognized through exon definition and most genes produce multiple mRNA variants through alternative RNA splicing. The nonsense-mediated mRNA decay (NMD) pathway may further shape the observed sets of variants by selectively degrading those containing premature termination codons, which are frequently produced in mammals. Thus NMD functions in mRNA quality control. Independently of alternative RNA splicing, species with large intron numbers universally rely on NMD to compensate for suboptimal splicing efficiency and accuracy.

Non-synonymous substitutions – Those mutations that occur at non-degenerate sites. The rate of non-synonymous substitutions for an evolving protein is indicative of the intensity of the selection pressure to maintain its structure. The observed non-synonymous substitution rate differs from protein to protein.

Non-transcribed spacer (NTS) – These spacers in a ribosomal gene family make up only a part of the intergenic spacers (IGS).

Nuclear DNA (nDNA) – DNA present in the nucleus of a eukaryotic cell.


Nucleolar organizer regions (NORs) – Regions where rRNA heteroclusters are present. For the production of about 10 million ribosomes per cell in eukaryotes, rRNA genes are present in multiple copies in tandem arrays on specific satellite chromosomes. The number of these genes may vary from 50 to 30,000 in a cell and this number may be equally distributed on NORs. The DNA comprising these genes is called ribosomal DNA (rDNA), which is repetitive in nature.

Nuclear RNA editing – The post-transcriptional modification of a glutamine codon CAA to create a translational stop codon UAA has been reported in apolipoprotein B (apoB) RNA in mammals.

Nucleic acid vaccines – When a plasmid DNA encoding an antigen of interest is injected, it results in sustained expression of the antigen and generation of an immune response.

Nucleus-cytosol compartmentalization – The origin of the eukaryotic nucleus marked a seminal evolutionary transition. The nuclear envelope's incipient function was to allow mRNA splicing, which is slow, to go to completion so that translation, which is fast, would occur only on mRNA with intact reading frames. The rapid, fortuitous spread of introns following the origin of mitochondria is adduced as the selective pressure that forged nucleus-cytosol compartmentalization.

O

Oligonucleotide microarrays – In oligonucleotide microarrays, the probes are short sequences designed to match parts of the sequence of known or predicted open reading frames.

Oncogenes – The genes that confer the ability to convert cells to a tumorigenic state. All human and subhuman species contain a complement of oncogenes that in their non-activated state produce substances necessary for normal cell proliferation and cell surface properties. In an attempt to locate the sequence responsible for oncogenicity, the proto-oncogene and the oncogene were cleaved with restriction enzymes at the same sites, the fragments recombined, and the recombinants tested for transforming activity. In this way, the exact position was narrowed to a 350-bp region, which was sequenced. The cause of oncogenic activity was found to be a single base change (G → A) that results in a valine substitution for glycine. There are four main classes of oncogenes: Class I-IV. Viral oncogenes and cellular oncogenes are formed from cellular proto-oncogenes.

One cistron-one polypeptide hypothesis – With the discovery of the cistron by Benzer in 1962 and of colinearity by Yanofsky in 1964, the cistron was considered to be that part of the DNA molecule which codes information for the synthesis of a single polypeptide. This led to the proposal of the one cistron-one polypeptide hypothesis. Some consider this hypothesis an alternative name for the one gene-one polypeptide hypothesis.

One codon-two amino acids – Strict one-to-one correspondence between codons and amino acids is thought to be an essential feature of the genetic code. However, it has been reported that one codon can code for two different amino acids with the choice of the inserted amino acid determined by a specific 3′ untranslated region structure and location of the dual-function codon within the messenger RNA (mRNA).
For example, codon UGA specifies insertion of both selenocysteine and cysteine in the ciliate Euplotes crassus, and the dual use of this codon can occur even within the same gene.

One enzyme-two functions concept – One enzyme may have two functions. For example, the enzyme PON2 (human paraoxonase 2) has two functions: enzymatic lactonase activity and reduction of intracellular oxidative stress. This led to the one enzyme-two functions concept.

One gene-many proteins hypothesis – Alternative splicing is a mechanism by which more than one protein is made from a single gene. This observation gave rise to the one gene-many proteins hypothesis. It means that the old paradigm that one gene makes one protein is clearly in need of revision. The discovery of polyproteins also contributed to the one gene-many proteins hypothesis.

One gene-one antigen hypothesis – This hypothesis states that each of the genes in a tissue transplant governing the host's reaction to it is responsible for the manufacture of a particular transplant antigen alone.

One gene-one chromomere hypothesis – This hypothesis was put forward on the basis of the observation that there is one essential function expressed by the genetic information contained in each chromomere. The one chromomere-one function relationship holds regardless of the amount of DNA the chromomere contains.

One gene-one enzyme hypothesis – This hypothesis states that each gene controls the reproduction, function and specificity of a particular enzyme. In this way, Beadle and Tatum demonstrated in 1941 the relationship between


gene and enzyme in Neurospora. Ample support for the one gene-one enzyme hypothesis came from studies on microorganisms like E. coli, Salmonella, and Neurospora.

One gene-one mRNA-one protein hypothesis – The one gene-one ribosome-one protein hypothesis was rejected when it was reported that RNA is translated into protein by the ribosome, and this led to the one gene-one mRNA-one protein hypothesis.

One gene-one polypeptide hypothesis – The one gene-one enzyme hypothesis could not be universally applied. Studies on mutant hemoglobins, lactate dehydrogenase and tryptophan synthetase challenged the one gene-one enzyme hypothesis and led to the one gene-one polypeptide relationship. The one gene-one polypeptide hypothesis states that one gene controls the synthesis of one polypeptide.

One gene-one primary cellular function hypothesis – The one gene-one mRNA-one protein hypothesis, put forward to explain gene function, was based on studies of protein-encoding genes. In addition to the structural genes, which code for polypeptides, there are other genes (e.g., tRNA genes, rRNA genes) which produce nongenetic RNAs required in protein biosynthesis. The three major types of nongenetic RNAs (mRNAs, tRNAs and rRNAs) are considered primary cellular products of genes. Thus one gene was noted to perform one primary cellular function. This led to the one gene-one primary cellular function hypothesis.

One gene-one reaction hypothesis – This hypothesis states that every specific biochemical reaction is under the ultimate control of a different gene. It was later redefined as the one gene-one enzyme hypothesis.

One gene-one ribosome-one protein hypothesis – According to an early version of the theory of the information structure of protein synthesis, the RNA transcript was thought to provide the RNA moieties for newly formed ribosomes.
Hence, each gene was imagined to give rise to the formation of one specialized kind of ribosome, which in turn would direct the synthesis of one and only one kind of protein – a scheme that Brenner and coworkers in 1961 epitomized as the one gene-one ribosome-one protein hypothesis. Now we know that synthesis of new kinds of ribosomes is not a precondition for the synthesis of new kinds of proteins.

Operator – A sequence of bases in DNA which interacts with a regulatory molecule called the repressor to control transcription of a group of genes that are co-transcribed. In the lactose operon, the operator was mapped very close to the lac z gene.

Operator mutations – Mutations in the operator that determine how much of the structural gene proteins are made, but which do not affect the structure of the three proteins themselves. These mutations are called regulatory mutations as they occur in the regulatory elements. The operator class of regulatory mutations controls all the structural genes of an operon at the same time and in the same way. These mutants are termed operator-constitutive (oc) mutants as they synthesize all the proteins constitutively regardless of the presence or absence of the substrate.

Operon – (1) A group of genes that are co-transcribed (under the control of a single promoter-operator system). (2) An operon consists of a promoter (p), an operator (o) and a number of structural genes that are transcribed as a unit. The operon concept was given by F. Jacob and J. Monod in 1961; it applies to genes that are closely linked, co-transcribed and regulated coordinately. In prokaryotes, metabolically related genes lying close together are co-transcribed.

Overlapping code – When more amino acids are coded for than there are code words represented in end-to-end sequence. UUUCCC has two code words, but in the case of an overlapping code the codons may be UUU, UUC, UCC and CCC. But this actually is not found to be the case.
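The contrast drawn in the "Overlapping code" entry can be reproduced with a minimal Python sketch. The helper names are illustrative assumptions; the second function adds the well-known consequence that, under a fully overlapping code, a single base change would alter up to three adjacent codons rather than one.

```python
def triplets(seq, step):
    """Read triplets from seq, advancing `step` bases each time.

    step=3 gives a non-overlapping reading; step=1 gives a fully
    overlapping reading in which each base belongs to up to three triplets.
    """
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, step)]


def changed_triplets(original, mutant, step):
    """Count the triplets that differ between two equal-length sequences."""
    pairs = zip(triplets(original, step), triplets(mutant, step))
    return sum(a != b for a, b in pairs)


print(triplets("UUUCCC", step=3))  # non-overlapping reading
print(triplets("UUUCCC", step=1))  # overlapping reading from the entry
# A single base change (third U to A) under each reading scheme:
print(changed_triplets("UUUCCC", "UUACCC", step=3))
print(changed_triplets("UUUCCC", "UUACCC", step=1))
```

The overlapping reading yields UUU, UUC, UCC and CCC, as stated in the entry, and the point mutation changes three triplets under that scheme but only one under the non-overlapping scheme.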
Overlapping genes – A single nucleotide sequence coding for more than one polypeptide. Overlapping genes share some of the same sequence. Such an arrangement of genes was first discovered by Barrell and coworkers in 1976. Later overlapping genes were also found in eukaryotic multicellular organisms.

P

Palindromic sequence – A sequence of letters (e.g., DAD, MADAM, MALAYALAM, REDIVIDER), words (AND MADAM DNA), phrases, nucleotides, or nucleotide pairs (e.g., GAATTC/CTTAAG) that reads the same regardless of which direction one starts from.

Paracodons – Paracodons refer to the existence of a second code, which is imprinted in the structure of the aminoacyl-tRNA synthetases (AASs) and matches the amino acid with structural features of the tRNA. The main features of this code are that it is non-degenerate, deterministic and older than the


classical genetic code. A number of cases where the anticodon is not instrumental in recognition have also been reported.

Paramutation – In epigenetics, paramutation is an interaction between two alleles of a single locus, resulting in a heritable change of one allele that is induced by the other allele.

Paternal X chromosome inactivation – In rodents and marsupials, only the X chromosome of paternal origin (Xp) is silenced during early embryogenesis. This could be due to a carry-over effect of the X chromosome's passage through the male germline, where it becomes transiently silenced together with the Y chromosome during meiotic sex chromosome inactivation (MSCI).

Peptidyl transferase – The peptide-bond-forming enzyme peptidyl transferase is required in one copy per 50S subunit. On completion of peptide bond formation, the A site is occupied by fMet-amino acid 1-tRNA(A1) and the P site is occupied by the deacylated tRNA(fMet). The translocation step follows: tRNA(fMet) is discharged from the ribosome, and the dipeptidyl-tRNA, fMet-amino acid 1-tRNA(A1), is shifted from the A site to the P site as the ribosome moves by the length of one codon along the mRNA in the 5′ → 3′ direction, thus bringing a new codon into the A site. Thus polypeptide assembly begins at the amino end and ends at the carboxyl end, and mRNA is read in the 5′ → 3′ direction.

Periodic introns – Many of the anomalous introns are periodic, that is, relics of internal sequence repetitions within the ancestral gene. Some of these periodic-intron patterns are shared between related genes.

Pervasive transcription – Certain transcripts either exist stably in cells (stable unannotated transcripts, SUTs) or are rapidly degraded by the RNA surveillance pathway (cryptic unstable transcripts, CUTs). One characteristic of pervasive transcription is the extensive overlap of SUTs and CUTs. Transcription of SUTs and CUTs can be functional, through mechanisms involving the generated RNAs or their generation itself.
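The distinction made in the "Palindromic sequence" entry between letter palindromes (MADAM) and nucleotide-pair palindromes (GAATTC/CTTAAG) can be made concrete with a small Python sketch. The function names are illustrative assumptions. A double-stranded DNA site is palindromic when a strand equals its own reverse complement, which is not the same test as reading a word backwards.

```python
# Watson-Crick complements for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand of seq, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic_dna(seq):
    """True if both strands read the same in the 5' to 3' direction."""
    return seq.upper() == reverse_complement(seq)


def is_text_palindrome(word):
    """True if the word reads the same forwards and backwards."""
    return word == word[::-1]


print(is_text_palindrome("MALAYALAM"))  # letter palindrome from the entry
print(is_palindromic_dna("GAATTC"))     # nucleotide-pair palindrome from the entry
print(is_text_palindrome("GAATTC"))     # the same site is not a letter palindrome
```

GAATTC passes the DNA test because its complementary strand, read 5′ to 3′, is also GAATTC, even though the single strand read backwards gives CTTAAG.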
PEST hypothesis – According to the PEST hypothesis, protein degradation is determined by one of four amino acids: proline, glutamic acid, serine and threonine; the one-letter abbreviations for these four amino acids are, respectively, P, E, S and T. Different proteins are programmed to survive for different lengths of time in the cell.

Phage display technology – This technology allows production of monoclonal antibodies for a chosen target antigen, without the use of any animal or hybridoma. DNA sequences coding for the antibody V region are fused with the DNA sequence coding for the amino terminus of the minor coat protein pIII of the phage. This fused DNA sequence, inserted in the phage genome, will produce a fused protein (carrying the antibody) that will be displayed on the surface of the phage (due to the coat protein sequence).

Phase transition in development – The switch from one developmental phase to the next is under the control of spatial and temporal patterns of gene expression, so that selective activation or silencing of genes directs development through different phase changes. For example, the floral transition marks the beginning of reproductive development, and the timing of this developmental switch is crucial in determining the reproductive success of plant species; for that reason plants control very precisely the time of flowering initiation in response to both endogenous and environmental factors, ensuring that the production of flowers and fruits takes place under optimal conditions.

Phosphoproteins in gene regulation – Phosphoproteins have been shown to play an important role in gene regulation and display a high level of tissue specificity. The nuclear phosphoproteins meet most of the requirements of a regulator of gene activity.
These requirements are: (a) nuclear phosphoproteins are located on active segments of chromatin, (b) their distribution in various cell cycles is specific, (c) phosphorylation of the hydroxyl group of serine residues reflects the intensity of RNA synthesis in the tissue, (d) these proteins are species-specific, (e) transcription of a DNA-phosphoprotein complex is greater than that of DNA alone, and (f) some nonhistone proteins that are synthesized in the S phase in HeLa cells are able to activate the transcription of histone genes in the G1 phase.

Plant hormones – Plant hormones are the main internal chemical signals that influence plant development. Auxins promote shoot growth through cell elongation, promote lateral root formation, and help prevent senescence. They also inhibit growth in lateral buds in favor of growth in the apical meristem (apical dominance) and inhibit cell elongation in roots. Cytokinins promote growth by stimulating cell division, including cytokinesis.

Plant-specific RNA polymerases – Eukaryotic organisms have three multisubunit DNA-dependent RNA polymerases (RNAPs) with distinct functions, namely, RNAP I, II and III. Higher plants have two additional RNA polymerases, RNAPIV and RNAPV, that function mainly in the RNA-directed DNA methylation (RdDM) pathway. RNAPIV functions upstream in RdDM to produce and amplify the small-RNA trigger for silencing, whereas RNAPV acts

G.32

Essentials of Molecular Genetics

downstream to transcribe noncoding RNAs that provide scaffolds for attracting silencing complexes and could also be involved in the reinforcement of silencing through a positive-feedback loop. Polar mutations – A polar mutation reduces all wild-type activity distal to it. Direction of transcription was determined by studies on polar mutants. For example, a polar mutation in lac z gene influences lac z, lac y and lac a; a polar mutation in lac y influences lac y and lac a genes only; and a polar mutation in gene lac a influences gene lac a only. These studies showed that direction of transcription is distally, from promoter to the operator and on to lac z, lac y, lac a genes in that order. Poly(ADP-ribosyl)ation – Poly(ADP-ribosyl)ation often involves the addition of long chains of ADP-ribose units, linked by glycosidic ribose-ribose bonds, and is critical for a wide range of processes, including DNA repair, regulation of chromosome structure, transcriptional regulation, mitosis and apoptosis. Polycistronic mRNA – mRNA containing multiple ORFs are called polycistronic mRNA which often encode proteins that perform related functions such as different steps in the biosynthesis of amino acids and nucleotides. Prokaryotic mRNA frequently contains two or more ORFs and hence can encode multiple polypeptide chains. Polyploidy – Refers to multiple copies of a complement in a single cell. It is a common phenomenon in the evolution of most species of plants but it is much rarer phenomenon in animals. Polyploidy has played role in evolution by having several copies of the same genome but it does not seem to have played role in causing evolutionary changes in genome size. Polyprotein genes – Many genes encode for one single large polypeptide which, however, after translation, is cleaved enzymatically into smaller subunits. Such polyprotein genes are also known in multicellular eukaryotes. 
Polyprotein genes thus contradict the hypothesis, adopted by the neoclassical view of the gene, that each gene encodes a single polypeptide.
Polyribosome formation – If only one ribosome is attached to a single mRNA, protein synthesis will not be efficient. Instead, a single mRNA usually has several ribosomes attached to it, as many as 1 for every 90 nucleotides. This mRNA with its attached ribosomes is called a polyribosome or polysome.
Polyteny – It is the multiplication of the number of DNA strands within a chromosome. This occurs in certain animal tissues, e.g., the salivary glands, the gut and the Malpighian tubules of Diptera. Polyteny can increase the amount of DNA manifold in somatic cells but not in germ cells. Thus polyteny is not a general mechanism for increasing the amount of DNA in evolution.
Positive control of gene regulation – The product of a regulatory gene acts to stimulate transcription. For example, in addition to negative control, there also exists positive control of gene action in the lac operon. Positive control in the lac operon of E. coli was studied through the glucose effect. Cyclic adenosine monophosphate (cAMP) is the key metabolite that mediates the glucose effect. Glucose blocks the activity of other operons through cAMP. When glucose is available as an energy source, the cAMP level decreases, which prevents the functioning of operons involved in the metabolism of other sugars, such as lactose, arabinose, galactose and maltose. When glucose levels decrease, the cAMP level increases and other sugars can be utilized. cAMP is also called a second messenger.
Positive-stranded RNA viruses – These viruses have a genome that consists of single-stranded sense RNA; that is, RNA that can act as a messenger. Examples of positive-stranded RNA viruses are: polioviruses, rhinoviruses, coronaviruses, rubella virus, yellow fever virus, West Nile virus, dengue fever viruses, equine encephalitis viruses, hepatitis A and hepatitis C viruses, and tobacco mosaic virus.
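The spacing figure quoted in the polyribosome entry above (as many as one ribosome for every 90 nucleotides) lends itself to a quick calculation. The following is a minimal sketch only, assuming that figure is treated as a uniform maximum packing density along the whole mRNA; the function name `max_ribosomes` and its parameters are hypothetical and are introduced purely for illustration.

```python
def max_ribosomes(mrna_length_nt: int, spacing_nt: int = 90) -> int:
    """Upper bound on ribosomes carried by one mRNA.

    Assumption (not from the glossary): ribosomes are packed uniformly,
    one per `spacing_nt` nucleotides, over the full length of the mRNA.
    The default of 90 is the spacing quoted in the polyribosome entry.
    """
    if mrna_length_nt < 0:
        raise ValueError("mRNA length cannot be negative")
    if spacing_nt <= 0:
        raise ValueError("spacing must be a positive number of nucleotides")
    # Integer division: a partial 90-nt stretch cannot hold another ribosome.
    return mrna_length_nt // spacing_nt


if __name__ == "__main__":
    for length in (300, 900, 1500):
        print(length, "nt ->", max_ribosomes(length), "ribosomes at most")
```

Under these assumptions a 900-nucleotide mRNA could carry at most 10 ribosomes; real polysomes are usually less densely loaded.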
Post-transcriptional control by leader sequences – In the tryptophan operon of E. coli, the mRNA of gene trpE, which codes for the enzyme anthranilate synthetase, has a sequence of 162 bases between its initiation codon (AUG) and the promoter-operator region of this gene. This sequence of 162 bases is called the leader. The leader determines post-transcriptional control, which is in addition to the normal transcriptional control provided in the operon model.
Post-transcriptional control in phage T4 – Phage T4 RegA protein binds to a consensus sequence that includes the AUG initiation codon in several T4 early mRNAs. T4 DNA polymerase binds to a sequence in its mRNA that includes the Shine-Dalgarno sequence needed for ribosome binding.
Post-transcriptional control in RNA phage R17 – The coat protein of RNA phage R17 binds to a hairpin that encompasses the ribosome binding site in the phage mRNA.
Post-transcriptional control of gene regulation – Post-transcriptional regulation and stabilization might be involved in finer regulation such as enzyme induction. In eukaryotes, mRNA production is not a simple matter of transcribing RNA. The transcription units are actually much longer than the final RNA product. Several "processing" steps occur to primary RNA transcripts in the nucleus. These steps involve: (a) addition of a methylated guanylate residue (mG), a "cap", at the 5'-end of primary transcripts, (b) addition of adenylic acid residues to a site in the primary transcript to create a 3' poly(A) segment, (c) removal of specified internal sequences (introns) and rejoining or splicing of the remaining RNA pieces (exons) to create a final mRNA. The initiation site is approximately 30 nucleotides to the 3' side of the TATA box.
Post-transcriptional control through attenuators – In the tryptophan operon, when the tryptophan supply is adequate, transcription stops about 147 bases after initiation so that no mRNA for trpE and the other genes of the operon is formed. But if tryptophan is in short supply, transcription continues uninterrupted. When the supply of tryptophan is adequate, the mRNA for the leader sequence is translated on the ribosome. This synthesis of the leader peptide leads to the formation of a hairpin loop.
Post-transcriptional gene silencing (PTGS) – (1) The mechanism whereby RNA is not allowed to be translated into protein. In this case, the RNA is either degraded or blocked (i.e., translation is suppressed). (2) A form of gene silencing identified by a combination of transcription run-on and northern blotting analyses showing that RNAs encoded by sense transgenes are degraded after transcription.
Post-translational gene regulation – (1) This level of control of gene regulation occurs after translation. Stringent response and feedback (end-product) inhibition are mechanisms of post-translational gene regulation. (2) The function and longevity of a protein may be affected by proteolytic enzymes that cleave the polypeptide chain at specific places and also by kinases, methylases, etc. that modify specific amino acid residues on a specific polypeptide chain.
It is only after such modification that active proteins or enzymes are produced.
Post-translational modification (PTM) – Post-translational events such as phosphorylation, glycosylation and sulfation are very important post-translational modifications (PTMs) for protein function. PTM is a key mechanism regulating protein function.
Post-translational steps – From the newly synthesized polypeptide, the N-formyl group is removed by the enzyme deformylase. The first amino acid is then removed by methionine-specific aminopeptidase. After that the terminal tRNA is removed and the complete protein becomes available.
Pre-messenger RNA processing – Messenger RNA is not made directly in a eukaryotic cell. RNA transcribed from eukaryotic genes is called heterogeneous nuclear RNA (hnRNA); one component of it is pre-mRNA, a large primary transcript that is processed into mRNA. It is 5-10 per cent of total RNA in the cell. It codes for amino acid sequence and is therefore also known as coding RNA. The steps of pre-mRNA processing are: removal of parts of the leader and trailer, addition of a "cap" at the 5'-end, addition of a poly(A) tail at the 3'-end, splicing out of noncoding sequences (introns) while retaining only the sequences known as exons, and methylation of some of the adenines present.
Pre-ribosomal RNA processing – Prokaryotic cells have three types of rRNA: 16S, 23S and 5S. The genes are arranged in the sequence 5'-16S, tRNAs, 23S, 5S, tRNA(s)-3' in a single operon. The entire operon is often transcribed as a unit. Cleavage enzymes are required at a number of points even to separate the several classes of pre-rRNAs. Ribosomal RNA transcripts are processed in both prokaryotes and eukaryotes to generate the mature rRNA molecules with unique 3′ and 5′ ends.
The rRNA transcript undergoes various changes either during or, less likely, after its synthesis but before its processing: (a) To each transcript, ~110 methyl groups are added to the ribose moieties; these methyl groups are retained in the mature molecules. Thus ribose methylation seems to act as a marker for the cleavage sites. (b) The RNA becomes bound to proteins so that it is processed as a ribonucleoprotein complex rather than as free RNA.
Pre-transfer (soluble) RNA processing – The stages of tRNA maturation, in fixed temporal order, are: (a) Trimming of the 5' and 3' ends, in which RNase P acts endonucleolytically on the 5' terminus and generates the mature 5'-terminus by characterizing mature tRNA sequence. The 3'-end maturation takes place by the exonuclease activity of RNase D, which trims back to the 3'-CCA sequence of the tRNA (as in prokaryotes). When the 3'-CCA is not encoded in the DNA, this triplet is added enzymatically, and it has to be added before removal of the intron. This second situation is found in eukaryotes. (b) Modification of the bases in the mature coding region can occur on either the precursor molecule or the cleavage products. (c) Removal of introns takes place when they are present. Introns are very short (13-60 nucleotides). Cleavage and splicing at intron junctions have been found to be highly dependent upon the presence of the -CCA end piece.

Pribnow box – Sequence discovered by Pribnow in 1975 in promoters of prokaryotic genes. Also known as TATA box.
Primary metabolite – A primary metabolite is one that is directly involved in normal growth, development, and reproduction.
Processing (P) body – Translation and mRNA degradation are affected by a key transition in which eukaryotic mRNAs exit translation and assemble into an mRNP state that accumulates in processing bodies (P bodies). These cytoplasmic sites of mRNA degradation contain non-translating mRNAs and the mRNA degradation machinery. Decapping activators (Dhh1p and Pat1p) function as translational repressors and facilitators of P body formation. The competition between translation and repression creates a finely balanced system that sets the relative translation rate for an mRNA. mRNAs can be driven into either translation or repression by tipping the balance of this competition via any number of events. Eukaryotic cells contain non-translating messenger RNA concentrated in P bodies, which are sites where the mRNA can be decapped and degraded. mRNA molecules within yeast P bodies can also return to translation.
Processing of 5S rRNA – The processing for maturation of 5S RNA takes place in two stages, the early steps and the late steps. In the early steps, after transcription, the primary transcript is cleaved into two parts: one bears the genes for the minor rRNA and one or two tRNAs present on the intergenic spacer, and the other contains the major and the supplemental rRNAs, depending upon whether or not a tRNA occurs on the 3'-end. Neither RNase III nor RNase E is involved in the initial enzymatic reaction. In the late steps, the 10S RNA contains 6 stem-and-loop structures generated by RNase III. Then, by the action of RNase P, a 7S RNA fragment is generated which contains the 5S rRNA and a termination stem. Removal of this stem is accomplished by RNase E.
Processing of the 5.8S rRNA – Processing of the supplemental species of rRNA is confined to the nucleolus and involves the small RNA U3 and another type of RNA, 7-1. During much of its maturation, it is hydrogen bonded to the other pre-rRNAs. The first product of this species is a 12S RNA particle, followed by one sedimenting at 8S. Finally, 5.8S rRNA is produced.
Processing the major species of rRNA – The primary transcript of shorter operons, 30S pre-rRNA, is in the form of two loops containing the minor and major species, respectively, each closed at one side by a strongly base-paired stem. It is these stems that are cleaved by RNase III, the enzyme which specifically acts on double-stranded segments. The final trimming of 30S pre-rRNA occurs at the ribosome to produce 23S RNA.
Producer genes – Producer genes in eukaryotes are the same as structural genes in prokaryotes. A producer gene produces pre-mRNA, which after processing becomes mRNA. Expression of a producer gene may be under the control of many receptor sites.
Progenote stage – DNA sequences of the most primitive unicellular eukaryotes may have originated from primordial DNA sequences which had been randomly generated. Before a self-replicating cell could come into existence but after a biological energy-generating system had evolved, DNA molecules were synthesized in the primordial soup by random addition of the four nucleotides without the help of templates. The nucleotide sequences that coded for proteins were selected from the randomly generated nucleotide sequences in the primordial soup by natural selection. This represents the progenote stage in the terminology of Woese and Fox given in 1977.
Prokaryotic gene – Eight regions: recognition region (~50 bp), transcription initiation site, 5' untranslated region, translation initiation site, coding region, translation stop site, 3' untranslated region, and transcription stop site.
Prokaryotic ribosome – It has a sedimentation coefficient of 70S.
It contains 65 per cent rRNA and 35 per cent protein. Prokaryotic ribosomes have two subunits. The larger subunit has a sedimentation coefficient of 50S. It contains one molecule of 5S rRNA, one molecule of 23S rRNA and 34 different proteins. The 50S subunit has a fairly compact body (~150 × 200 × 200 Å) from which a central protuberance and a stalk stick out; this subunit is relatively more spherical than the smaller one. The smaller subunit (30S) contains 16S rRNA and 21 proteins. Its size is 55 × 220 × 220 Å. The 30S subunit decodes the codon units of mRNA.
Prokaryotic versus eukaryotic genomes – Organization of prokaryotic and eukaryotic genomes differs with respect to split versus non-split nature of the gene, coding versus noncoding DNA, applicability of the operon concept, polycistronic versus unicistronic mRNAs, nature of regulatory control, unicellularity versus multicellularity, singularity versus plurality of chromosomes, DNA packaging, haploidy versus diploidy, unique versus repeated sequences, and colinearity between a gene and its product.
Promoter – Specific sequences of a given gene that serve in recognition of the gene and subsequent attachment of the transcribing enzyme. The promoter is part of the leader region. It contains the Pribnow box (term used for prokaryotic genes) or the Goldberg-Hogness box (term used for eukaryotic genes). The minimum size of a promoter is 12 bp.
Promoter melting – This is a crucial step in transcription initiation. The N terminus of the β' subunit (amino acids 1-314) and amino acids 94-507 of the σ subunit, together less than one-fifth of the RNAP holoenzyme, were able to melt an extended –10 promoter in a reaction remarkably similar to that of authentic holoenzyme.
Promoter, consensus sequences in – The start point in more than 90 per cent of cases is a purine. Just upstream of the start point, a 6-bp region is recognizable in almost all promoters. The centre of the hexamer is generally close to 10 bp upstream of the start point. This hexamer is therefore often called the –10 sequence. The –10 consensus sequence is TATAAT; conservation of these bases varies from 45 to 96 per cent (T80 A95 T45 A60 A50 T96). At the –35 sequence of the promoter, the consensus sequence is T82 T84 G78 A65 C54 A45.
Proofreading in translation – The protective role of proofreading in translation is evident from a case in which a small mistake has a catastrophic effect. Transfer of information from the DNA sequence of a gene to the corresponding amino-acid sequence of a protein requires transcription of the sequence into messenger RNA, then translation of that RNA into an amino-acid sequence at ribosomes, through the agency of amino-acid-specific transfer RNAs. These steps are all prone to error.
Proteasome – In protein degradation, ubiquitin-tagged proteins are recognized by specific proteases in the cytosol, which in turn cleave and degrade the tagged protein. These proteases are combined in a very large protein complex called the proteasome. The proteasome (20S) comprises 28 subunits and has a molecular weight of 700 kDa.
Protein degradation models – Two well-known models of protein degradation are the N-end rule and the PEST hypothesis.
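The conservation figures quoted in the promoter consensus entry above (T80 A95 T45 A60 A50 T96 for the –10 hexamer and T82 T84 G78 A65 C54 A45 for the –35 hexamer) can be used to compare a candidate hexamer with the consensus. The sketch below is illustrative only: the weighted-match scoring scheme is an assumption made for this example, not a method described in the text, and the names `MINUS_10`, `MINUS_35`, `consensus_string` and `match_score` are hypothetical.

```python
# Consensus base and per cent conservation at each position,
# copied from the glossary entry on promoter consensus sequences.
MINUS_10 = [("T", 80), ("A", 95), ("T", 45), ("A", 60), ("A", 50), ("T", 96)]
MINUS_35 = [("T", 82), ("T", 84), ("G", 78), ("A", 65), ("C", 54), ("A", 45)]


def consensus_string(profile):
    """Return the consensus hexamer spelled out by a profile."""
    return "".join(base for base, _ in profile)


def match_score(hexamer, profile):
    """Fraction of total conservation weight matched by `hexamer`.

    Assumed scoring scheme (for illustration only): each position that
    agrees with the consensus contributes its conservation percentage;
    the sum is divided by the total, so a perfect match scores 1.0.
    """
    hexamer = hexamer.upper()
    if len(hexamer) != len(profile):
        raise ValueError("hexamer and profile must have the same length")
    matched = sum(
        weight
        for base, (consensus_base, weight) in zip(hexamer, profile)
        if base == consensus_base
    )
    total = sum(weight for _, weight in profile)
    return matched / total


if __name__ == "__main__":
    print(consensus_string(MINUS_10), consensus_string(MINUS_35))
    print(round(match_score("TATGAT", MINUS_10), 3))
```

Under this scheme a mismatch at a weakly conserved position (such as the third base of the –10 hexamer, 45 per cent) costs less than a mismatch at a strongly conserved one (such as the final T, 96 per cent).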
Protein engineering – Technology has been developed for incorporating non-standard amino acids into polypeptides by ribosome-based translation. For this, the genetic code is expanded through creation of a 65th codon-anticodon pair from unnatural nucleoside bases having non-standard H-bonding patterns. This new codon-anticodon pair efficiently supports translation in vitro to yield polypeptides containing a non-standard amino acid.
Protein folding – Co-translational chaperones function in protein folding. Ribosome-associated trigger factor (TF) is the first molecular chaperone encountered by nascent polypeptides in bacteria. TF interacts with ribosomes and translating polypeptides in a dynamic reaction cycle. Ribosome binding stabilizes TF in an open, activated conformation. Activated TF departs from the ribosome after a mean residence time of ~10 s, but may remain associated with the elongating nascent chain for up to 35 s, allowing entry of a new TF molecule at the ribosome docking site. The duration of nascent-chain interaction correlates with the occurrence of hydrophobic motifs in translating polypeptides, reflecting a high aggregation propensity. Protein misfolding events during translation may provide a paradigm for the regulation of nucleotide-independent chaperones.
Protein secretion – Free ribosomes synthesize specific non-serum liver proteins. It is possible that leader sequences on mRNAs determine which ribosomes shall translate them. In eukaryotes, ribosomes may be free or may be attached to the endoplasmic reticulum (ER). Membrane-bound ribosomes synthesize proteins that are exported from the cell into the surrounding serum. The 60S subunit of the ribosome is attached to the ER. The ER membrane is used to form a vesicle containing the protein. This vesicle then becomes a secretory vesicle whose membrane fuses with the cell membrane, emptying its contents outside the cell.
Protein sorting – Protein sorting involves interaction of cell membranes either with parts of a newly synthesized protein or with a protein sequence added to this new protein. The protein sequences interacting with membranes are called signal sequences, transit peptides or leader sequences, and are recognized by receptors located within the membrane. The transfer of proteins may be coupled with translation (cotranslational transfer, for proteins synthesized on ribosomes attached to the endoplasmic reticulum) or may take place after translation (post-translational transfer, for proteins synthesized on free ribosomes). Signal sequences are amino acid sequences found in transported proteins that selectively guide the distribution of the proteins to specific cellular compartments.
Protein synthesis in cell organelles – In mitochondria and chloroplasts, the following components of protein synthesis are found: DNA, RNA polymerase, ribosomes, tRNAs, initiation factors, elongation factors and termination factors. All these components are specific to cell organelles. Some of these components of protein synthesis may be synthesized outside the organelles and then transported into them.

Protein synthesis in organelles versus cytoplasm – It has been shown that the translation apparatus in chloroplasts and mitochondria differs from that in the cytoplasm of eukaryotes in the following respects: (a) Ribosomes in these organelles are smaller (70S) than those in the cytoplasm (80S). (b) The tRNAs are specific and differ in number, being 22 in mitochondria as against 55 in the cytoplasm. (c) Initiation of translation takes place by formyl-methionyl tRNA in both chloroplasts and mitochondria, whereas no formylation takes place in the cytoplasm. (d) Translation in chloroplasts and mitochondria can be inhibited by chloramphenicol, as in bacteria, since the 70S ribosomes are sensitive to chloramphenicol and not to cycloheximide; translation in the cytoplasm, on the other hand, is inhibited by cycloheximide, since 80S ribosomes are sensitive to cycloheximide.
Protein synthesis inhibition by microRNAs – miR2 inhibited translation initiation without affecting mRNA stability. Inhibition of m7GpppG cap-mediated translation initiation is the mechanism of miR2 function.
Protein synthesizing machinery – In addition to mRNA, tRNA, rRNA and ribosomes, other factors are also necessary for protein synthesis. Peptidyl transferase forms peptide bonds between amino acids. Protein factors catalyze partial reactions in the initiation, elongation and termination of peptides. Miscellaneous factors, such as ATP, GTP, Mg2+, K+ and NH4+, are important for various biochemical reactions. The enzymes required for protein synthesis are aminoacyl-tRNA synthetases, phosphatase, transformylase, deformylase, methionine-specific peptidase, etc.
Protein targeting – The processes directing proteins to particular destinations in the cell. In bacteria, the choice of destination is between the cytoplasm, the inner and outer cell membranes, and the periplasmic space between them. Proteins can also be secreted.
In eukaryotes, proteins can also be targeted to any one of several intracellular organelles, as well as to the nucleus. Since all bacterial proteins and most eukaryotic proteins are synthesized in the cytoplasm, targeted proteins must carry recognizable sequences or structures which allow them to be transported to the appropriate cellular compartment. This process is termed protein sorting or protein trafficking.
Protein targeting in eukaryotes – Proteins synthesized on cytoplasmic ribosomes of eukaryotic cells can either be retained in the cytoplasm or transferred to subcellular organelles or inserted into membranes or secreted outside the cell. All living cells require specific mechanisms that target proteins to the cell surface. In eukaryotes, the first part of this process involves recognition in the endoplasmic reticulum of amino-terminal signal sequences and translocation through Sec translocons, whereas subsequent targeting to different surface locations is promoted by internal sorting signals. This process is termed protein sorting or protein trafficking.
Protein targeting in prokaryotes – The prokaryotic signal recognition particle (SRP) targets membrane proteins into the inner membrane. It binds translating ribosomes and screens the emerging nascent chain for a hydrophobic signal sequence, such as the transmembrane helix of inner membrane proteins. If such a sequence emerges the SRP binds tightly, allowing the SRP receptor to lock on. This assembly delivers the ribosome-nascent chain complex to the protein translocation machinery in the membrane.
Protein translation without ribosomes – Some short proteins are not synthesized on ribosomes. Such proteins are antibiotic polypeptides, such as gramicidin and tyrocidine.
Protein trans-splicing – Intein-mediated protein splicing is a self-catalytic process in which the intervening intein sequence is removed from a precursor protein and the flanking extein segments are ligated with a native peptide bond.
Splice junction proximal residues and internal residues within the intein direct these reactions. Protein trans-splicing poses a problem for the gene concept in that start and end sites are not determined by genes.
Protein turnover – It is the balance between protein synthesis and protein degradation. The turnover of some transcription factors is regulated through ubiquitin-dependent proteolysis. Usually, such processes are regulated by extracellular stimuli through phosphorylation of the target protein, which allows recognition by F box-containing E3 ubiquitin ligases.
Protein-encoding gene – A gene which on transcription yields messenger RNA (mRNA) or a precursor of mRNA (pre-mRNA) which, after various processing steps, matures into functional mRNA. Also known as structural gene.
Protein-nucleic acid interactions – Protein-nucleic acid interactions are involved in almost all biological processes, viz., generalized recombination, site-specific recombination, DNA replication and initiation of transcription. The precise association of DNA-binding proteins with localized regions of DNA is crucial for regulated expression of the genome. For certain DNA transactions, the requirement for precision in localization and control is extremely high. Multiple DNA-protein interactions are also required for controlled transcription of the eukaryotic genome.
Protein-protein interaction – Proteins do not act in isolation to perform a biological function in a cell. Protein-protein interaction is an important step in many cellular functions. Thus a study of protein-protein interactions is an important component of proteomics research. These interactions can be exploited for biotechnological applications. Purification of entire multiprotein complexes can be done by affinity-based methods.
Proteome – Sum total of all the proteins in a cell at a particular developmental stage or in a particular environment.
Proteomics – A study focused on gene products (proteins) in a cell to conduct qualitative and quantitative measurements of different proteins. It aims to understand the function of protein-encoding genes.
Proto-oncogene activation – (1) A point mutation induced by a chemical or radiation carcinogen can cause a change in the protein, thereby initiating cancerous growth. (2) Induction of a chromosomal rearrangement that places the proto-oncogene next to the regulatory region of an immunoglobulin gene. This leads to inappropriate expression of the proto-oncogene. (3) If the proto-oncogene is amplified, either as repeat segments within the chromosome or extrachromosomally, the gene product will be overproduced. (4) A retrovirus that has picked up a proto-oncogene transfers it to another animal cell. Transcription is then controlled by proviral regulatory signals, not by the signals associated with the proto-oncogene in the cell's genome, and the abnormal regulation and/or abnormal product of v-onc expression yields the tumor phenotype.
Proto-oncogenes – Normal cellular genes from which the viral oncogenes are derived. Proto-oncogenes do not produce cancer cells under normal circumstances but can do so if they are modified or changed by their incorporation into viral genomes.
Almost every viral oncogene tested so far has one or more closely related DNA sequences in uninfected host cells.
Pseudoalleles – Those alleles which are allelic in the complementation test and non-allelic in the recombination test. Large progeny studies showed that no such alleles exist. Now a discarded term.
Pseudogenes – Defective copies of genes present in an organism's genome. Such genes may arise due to mutations in initiation codons. One of the pseudogenes in mouse has no intron. This pseudogene seems to have arisen from reverse transcription of mRNA. Two important characteristics of pseudogenes are: (a) Most pseudogenes outnumber their genes and are therefore repetitive sequences. (b) They are flanked by direct repeats 6-21 bases long.

Q

Quantitative trait locus (QTL) – A chromosomal region, linked to or associated with a marker gene, which affects a quantitative trait. A QTL that has a large effect and can explain a major part of the total variation can be analyzed genetically in the same way as a major gene.
Quelling – Homology-dependent gene silencing phenomena observed in fungal systems were called quelling.

R

Random amplified hybridized microsatellites (RAHMs) – A novel strategy developed to combine the several advantages of oligonucleotide fingerprinting with RAPD-PCR and microsatellite-primed PCR (MP-PCR). In this approach, the genomic DNA is amplified with either a single arbitrary 10-mer primer, as in RAPD, or with a microsatellite-complementary 15-mer/10-mer primer, as in MP-PCR, and the PCR products are electrophoresed, Southern blotted and hybridized to a 32P- or digoxigenin-labeled microsatellite probe, e.g., (CA)8/(GA)8 (GAC)5, etc. Subsequent autoradiography reveals reproducible, probe-dependent fingerprints that are polymorphic at the interspecific level. This provides speed of assay along with high sensitivity, so that a high level of polymorphism is detected.
Random amplified microsatellite polymorphism (RAMP) – In RAMP, two primers are used: one is an SSR-anchored primer and the other is a RAPD primer. The amplified products resolve length polymorphism that may be present either at the SSR target site itself or in the associated sequence between the two primer binding sites. The RAPD primer binding site actually serves as an arbitrary end point for the SSR-based specific amplified sequence.

Reading frame – The particular nucleotide sequence that starts at a specific translation initiation point and is then partitioned into codons until the final word (in fact a termination codon) of that sequence is reached.
Receptor site – A component of the eukaryotic gene regulatory system. These are the sites with which activator RNA molecules or their protein products bind. At least one such receptor site is assumed to lie adjacent to each producer gene. This interaction leads to transcription of producer genes. A receptor site is comparable to the operator in a bacterial operon.
Recoding – Certain instructions in mRNA may result in reprogramming of the genetic information which the mRNA receives from DNA. This reprogramming of mRNA is reflected in the amino acid sequence of proteins. Most RNA editing systems are mechanistically diverse, informationally restorative, and scattershot in eukaryotic lineages. In contrast, genetic recoding by adenosine-to-inosine RNA editing seems common in animals, usually altering highly conserved or invariant coding positions in proteins.
Recoding signals – The set of instructions in mRNA which brings about recoding is called recoding signals. Two mechanisms for recoding have been proposed: first, a change in the linear read-out of the nucleotide sequence and, second, a change in the meaning of the code.
Recombinant DNA technology – Recombinant DNA is DNA that has been artificially created. It is DNA from two or more sources incorporated into a single recombinant molecule. This technology comprises a battery of experimental procedures used to isolate, characterize and clone individual genes at the molecular level. The technology is based on restriction enzymes, which cut DNA into defined fragments having sticky ends, allowing them to be inserted into a vector capable of replicating in a bacterial cell. Also known as in vitro recombination.
Recon – A unit of recombination. It is defined by the recombination test.
Two mutations belong to the same recon if they do not show recombination. Although there is no lower limit on the size of a recon, the smallest recombinational unit is probably two neighboring nucleotide pairs.
Redundant genes – Presence of several copies of a single gene in a cell. It allows the organism to obtain large amounts of the gene product in a short time interval. These are known as redundant genes. Examples are: rRNA genes, tRNA genes, histone genes, antibody genes.
Regulated gene – A gene whose expression is regulated.
Regulation of Cap-dependent translation – Eukaryotic messenger RNAs contain a modified guanosine, termed a cap, at their 5' ends. Translation of mRNAs requires the binding of an initiation factor, eIF4E, to the cap structure. A family of proteins regulates cap-dependent translation through a shared sequence.
Regulation of chromatin structure – DNA methylation, histone modifications, and RNA interference play critically important roles in regulating chromatin structure, thereby profoundly affecting transcription and other molecular events.
Regulator(y) genes – (1) The genes that produce regulatory proteins; that is, genes whose product has a regulatory function. These genes have their own operator. (2) The genes that may switch or select between developmental pathways. There are three groups of such genes: maternal genes, segment genes and homeotic genes. The distinction between structural and regulatory genes is more or less arbitrary, because a protein such as the lac repressor of E. coli is the product of a structural gene; however, its function is regulatory in that it directly controls the expression of a set of structural genes coding for enzymes.
Regulatory code – The genome has more than one code for specifying life. Buried in DNA sequences is a regulatory code akin to the genetic code.
The regulatory code is encoded in the arrangement of an enhancer's DNA binding sites, in the spacing between binding sites, or by the loss or gain of one or more of these sites.
Regulatory elements in eukaryotic gene regulation – Regulatory elements are specific to certain types of genes which are involved in regulating expression in response to changes in the intra- or extracellular environment. It is in the upstream region of the promoter that regulatory elements, such as those involved in transcriptional induction in response to heat shock or hormone/receptor complexes, are located.
Regulatory sequences – Specialized genetic determinants, called regulator(y) and operator genes, control the rate of protein synthesis through the intermediacy of cytoplasmic components or repressors.
Removal of introns – Introns in many eukaryotic genes begin with the dinucleotide 5′-GT-3′ and end with the dinucleotide 5′-AG-3′. The process of pre-mRNA splicing is a two-step pathway. In the first step, a precise cleavage occurs at the 5′-end of the intron, and a 2′-5′ phosphodiester linkage is formed between the 5′ position of the G at the intron cleavage site and a conserved A residue located near the 3′-end of the intron. In the second step, the two exons are joined by a normal 3′-5′ phosphodiester bond, and the lariat-shaped intron is released. Small RNAs, known as U RNAs, are used in removal of introns.
Repeated gene family – It consists of many identical or nearly identical genes that cohabit a single haploid genome. In some cases the genes are contained within tandemly repeated units.
Repeated genes – Certain genes have many copies in the genome of a cell. The genes of ribosomal RNA are repeated in several tandem copies. Each one consists of one transcription unit, but the gene cluster is usually transmitted from one generation to the next as a single unit. Thus, the units of transmission and transcription are not always the same. The histone genes have been observed to be repeated in such tandem repeats in many higher eukaryotic organisms.
Repeat-induced gene silencing (RIGS) – A cis-TGS event induced by the insertion of multiple copies of a transgene at one locus. RIGS is abolished by reducing copy number.
Repeat-induced point (RIP) mutation – A phenomenon similar to MIP that is associated with point mutation in the duplicated sequences in the fungus N. crassa.
Repeats in promoters – The nucleosome-free region directly upstream of genes (the promoter region) is enriched in repeats. One-fourth of all gene promoters contain tandem repeat sequences. Genes driven by these repeat-containing promoters show significantly higher rates of transcriptional divergence. Tandem repeats are variable elements in promoters that may facilitate evolutionary tuning of gene expression by affecting local chromatin structure.
Repetitive DNA – Repeats in DNA sequences vary in size from 5 to 100,000 nucleotides. Most repeats are located at telomeres and centromeres. Putative functions of repeated DNA are suggested to be many, including involvement in chromosome pairing, control of gene expression, processing of RNA (excision of introns) and participation in DNA replication.
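The idea of tandem repeats in a promoter (see Repeats in promoters and Repetitive DNA) can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name `find_tandem_repeats`, the toy promoter sequence and the thresholds are hypothetical, and real repeat annotation uses more tolerant methods that allow imperfect copies.

```python
import re

def find_tandem_repeats(seq, unit_len=2, min_copies=3):
    """Return (start, unit, copies) for each perfect tandem repeat.

    start is 0-based; only exact, uninterrupted copies are counted
    (an assumption made to keep the sketch short).
    """
    # A unit of fixed length followed by at least (min_copies - 1) more copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        copies = len(m.group(0)) // unit_len
        hits.append((m.start(), unit, copies))
    return hits

# Hypothetical promoter fragment containing a (TA)n and a (GCA)n repeat.
promoter = "GGCTATATATATATCCGCAGCAGCAGTT"
print(find_tandem_repeats(promoter, unit_len=2))  # [(3, 'TA', 5)]
print(find_tandem_repeats(promoter, unit_len=3))  # [(16, 'GCA', 3)]
```

Comparing the copy number reported for the same locus in different individuals is one simple way to see why such repeats are described as variable elements.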
Replisome-RNA polymerase collision – Replication forks are impeded by DNA damage and protein-nucleic acid complexes such as transcribing RNA polymerase. Head-on collision of the replisome with RNA polymerase results in replication fork arrest. However, co-directional collision of the replisome with RNA polymerase has little or no effect on fork progression.
Reporter gene – A reporter gene is a gene that researchers attach to a regulatory sequence of another gene of interest in bacteria, cell culture, animals, or plants.
Reporter plasmid – A reporter plasmid carries a gene whose expression is driven by the product of a regulatory gene carried by an effector plasmid.
Representing gene in literature – An mRNA sequence is complementary to the antisense strand while it is similar to the sense strand, except that where there is thymine in DNA, there is uracil in RNA. Therefore, in literature a gene is represented through the sense DNA strand (in the 5′→3′ direction).
Repressible control – The end product, also known as co-repressor, acts to repress production of the enzymes needed for its synthesis. Transcription is initiated in the absence of the end product. In some enzyme systems, such as those for many amino acids as well as purines and pyrimidines, enzymes are repressible in the sense that their production is repressed by a specific substance that is activated during the course of metabolism.
Repressor-operator binding – The gacS repressor, which contains a helix-turn-helix motif, binds to an operator within its own coding sequence. This operator is identical to the DNA sequence coding for residues 1-6 of the recognition helix of this repressor. Sometimes the products of regulator genes are affected by inducers in such a way that they cross-regulate the working of two operons simultaneously.
Repressors – The regulatory proteins that are themselves specified by regulatory genes.
Restriction endonucleases – The enzymes that cleave DNA only at specific sites.
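Site-specific cleavage by a restriction endonuclease can be sketched in a few lines. The example below assumes the EcoRI recognition site GAATTC, cut after the first G on the top strand; the function name `digest` and the two toy alleles are hypothetical and only the top strand is modelled. Loss of one site in the second allele changes the fragment lengths, which is the kind of difference that restriction-based comparisons detect.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Cut seq at every occurrence of site and return the fragments.

    cut_offset is the position inside the site at which the top strand
    is cleaved (1 for EcoRI, G^AATTC). Single-strand model only.
    """
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments

# Hypothetical alleles: allele_b has lost the second site (GAATTC -> GAGTTC).
allele_a = "ATGCGAATTCGGTTAGAATTCCA"
allele_b = "ATGCGAATTCGGTTAGAGTTCCA"
print([len(f) for f in digest(allele_a)])  # [5, 11, 7]
print([len(f) for f in digest(allele_b)])  # [5, 18]
```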
Restriction fragment length polymorphisms (RFLPs) – RFLP analysis is based on Southern blotting. It comprises the following steps: (1) RFLP analysis begins with a blood sample. (2) DNA is extracted from nuclei of white blood cells and digested with a restriction endonuclease. (3) The resulting DNA fragments are separated by gel electrophoresis, which separates them according to size. (4) The RFLP is then detected by Southern blotting. First, DNA in the gel is denatured (the two strands of DNA separate) and is blotted on a nylon membrane. (5) A probe, a radioactively labeled segment of single-stranded DNA that is complementary to the RFLP locus, is applied to the membrane. (6) The probe hybridizes with the fragment carrying the locus. A sheet of X-ray film placed over the membrane detects the radioactively tagged fragment and reveals the RFLP. (7) In RFLP analysis, DNA samples from several individuals are often analyzed at the same time. RFLP analysis is based on two assumptions: (a) all the bases are present in equal amounts in a genome and (b) the bases are randomly distributed in the whole genome.
Restriction mapping – Restriction mapping is an efficient way of finding differences in mtDNAs because loss of a restriction site represents a mutational difference. Since mtDNA is known to be maternally inherited, the human mtDNA evolutionary tree data could lead to a speculation that all present-day humans might have shared a common ancestral mother.
Retained introns – Retained introns may also result in the production of different mRNAs with correspondingly different translated products. Intron retention, one form of alternative splicing, is common in plants but rare in higher eukaryotes, because messenger RNAs with retained introns are subject to cellular restriction at the level of cytoplasmic control and expression.
Retrogene – In certain genes there is reverse transcription of an mRNA and insertion of the product into the genome. This is a case of RNA-to-DNA flow of information. Some examples of retrogenes are RPL36AL, HSPA2, SPIN2B and FAM50B.
Reverse epimutations – Removal of epigenetic marks. For example, some agents, such as 5-azacytidine, lead to demethylation and hence to reverse epimutations.
Reverse RNA splicing – Introns are self-spliced in a very simple way. Catalytic steps of splicing can be efficiently reversed under appropriate conditions. The finding that reverse splicing occurs in vivo provides a piece of the jigsaw that one day may allow us to reconstruct the paths that introns take when on the move.
Reverse transcription – The process in which mRNA is used as a template for synthesis of DNA. DNA thus produced is called complementary DNA or copy DNA (cDNA).
Reverse transcription in bacteria – Reverse transcriptase encoded by cellular genes has also been described in bacteria (myxobacteria and E. coli).
This enzyme is thus not unique to eukaryotes. This suggests that reverse transcriptase existed before retroviruses. Retrons with reverse transcriptase became retroposons, which with their ability to transpose became retrotransposons, which with long terminal repeats became retroviruses, which with the ability to form virus became pararetroviruses, which, having lost the ability to integrate and transpose, became DNA viruses.
Reverse transcription in DNA viruses – One expects reverse transcription in RNA viruses only. Reverse transcription takes place in DNA viruses also. A reverse transcription step has been proposed in replication of cauliflower mosaic virus (CaMV), a double-stranded DNA virus which infects plants, on the basis of structural studies of encapsidated CaMV DNA and of intracellular DNA and RNA species judged to be intermediates in the replication of CaMV.
Reversible histone methylation – The discovery of enzymes that reverse the methylation of lysine and arginine challenges our current thinking on the unique nature of histone methylation, and substantially increases the complexity of histone modification pathways.
Rho-dependent terminators – In the case of rho-dependent terminators, there is a need for addition of rho (ρ) factor for termination in vitro. Rho factor is required for termination in vivo also. Most of the known rho-dependent terminators are found in phage genomes.
Rho-independent terminators – At rho-independent or simple terminators, the core enzyme can terminate in vitro at certain sites in the absence of any other factor. These terminators have two structural features: a hairpin in secondary structure and a run of 6 uracils (U's) at the very end of the unit. Both features are needed for termination. The hairpins contain a G-C rich region near the base of the stem. Probably all the hairpins that form in the RNA product cause the polymerase to slow or pause in RNA synthesis. Pausing creates an opportunity for termination to occur.
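The two structural features of a rho-independent terminator, a hairpin followed by a run of uracils, lend themselves to a simple pattern search. The sketch below is a deliberately crude illustration: it assumes a perfectly base-paired stem of fixed length lying immediately upstream of the U run, and the function names, the stem and loop sizes and the test transcript are all hypothetical. It does not score G-C content or thermodynamic stability, which real terminator predictors take into account.

```python
import re

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def revcomp(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))

def find_terminators(rna, stem=5, loop_range=(3, 8), u_run=6):
    """Return (hairpin_start, u_run_start) for each hairpin + U-run hit.

    Assumes a perfect stem of length `stem` that ends directly before a
    run of at least `u_run` uracils; positions are 0-based.
    """
    rna = rna.upper()
    hits = []
    for m in re.finditer("U{%d,}" % u_run, rna):
        u_start = m.start()
        right_arm = rna[u_start - stem:u_start]
        for loop in range(loop_range[0], loop_range[1] + 1):
            left_start = u_start - (2 * stem + loop)
            if left_start < 0:
                continue
            left_arm = rna[left_start:left_start + stem]
            if right_arm == revcomp(left_arm):
                hits.append((left_start, u_start))
                break
    return hits

# Hypothetical transcript end: G-C rich stem, 4-nt loop, six U's.
transcript = "AACGAGCCCGCUUAAGCGGGCUUUUUUGA"
print(find_terminators(transcript, stem=6))  # [(5, 21)]
```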
Ribonuclease E – This enzyme has a key role in mRNA degradation and the processing of catalytic and structural RNAs in E. coli. An evolutionarily conserved 17.4-kDa protein, named RraA (regulator of ribonuclease activity), binds to RNase E and inhibits RNase E endonucleolytic cleavages without altering cleavage site specificity or interacting detectably with substrate RNAs. Overexpression of RraA circumvents the effects of an autoregulatory mechanism that normally maintains the RNase E cellular level within a narrow range, resulting in the genome-wide accumulation of RNase E-targeted transcripts.
Ribonuclease III family – Members of the ribonuclease III (RNase III) family are double-stranded RNA (dsRNA) specific endoribonucleases characterized by a signature motif in their active centers and a two-base 3' overhang in their products. Bacterial RNase III is structurally simpler.
Ribonucleases (RNases) – The enzymes that cleave double-stranded or single-stranded RNA molecules.

Ribonucleic acid (RNA) – A polymer of ribonucleoside-phosphates. RNA is a close cousin of DNA.
Ribosomal RNA (rRNA) genes – The genes which on transcription yield precursors of rRNAs (pre-rRNA), which after various processing steps mature into functional rRNA. Ribosomes have two subunits: a large subunit and a small subunit. The large subunit has three rRNAs of size 28S, 5.8S and 5S. The small subunit has 18S rRNA. There are four rRNA genes. The 18S rRNA, 5.8S rRNA and 28S rRNA genes are present in the form of clusters, 18S rRNA-5.8S rRNA-28S rRNA, known as heteroclusters. In eukaryotic cells, the genes for different rRNA molecules are transcribed as a single precursor. In most eukaryotes, the rRNA genes are found in clusters composed of 100 to 5,000 repeats of these genes. Also known as Class I genes.
Ribosomal RNAs (rRNAs) – A family of noncoding RNAs responsible for the principal functions of the ribosome. rRNAs contribute directly to the catalytic properties of protein synthesis. A large number of copies of 3 to 4 rRNAs are present in each cell. Genes coding for rRNA are present in nucleolar organizing regions (NOR) of the chromosome. rRNAs are important constituents of the ribosome, the site of protein synthesis.
Ribosomal elongation cycle – This cycle describes a series of reactions prolonging the nascent polypeptide chain by one amino acid, driven by two universal elongation factors termed EF-Tu and EF-G in bacteria. The extremely conserved LepA protein, present in all bacteria and mitochondria, is a third elongation factor required for accurate and efficient protein synthesis.
Ribosomal RNA transcription regulation – Ribosomal RNA (rRNA) transcription is regulated primarily at the level of initiation from rRNA promoters. The unusual kinetic properties of these promoters result in their specific regulation by two small molecule signals, ppGpp and the initiating NTP, that bind to RNA polymerase at all promoters.
Ribosomal translocation – During ribosomal translocation, the binding of elongation factor G (EF-G) to the pre-translocational ribosome leads to a ratchet-like rotation of the 30S subunit relative to the 50S subunit in the direction of the mRNA movement. This rotation is accompanied by a 20 Å movement of the L1 stalk of the 50S subunit, implying that this region is involved in the translocation of deacylated tRNAs from the P to the E site. These ribosomal motions can occur only when the P-site tRNA is deacylated. Prior to peptidyl-transfer to the A-site tRNA or peptide removal, the presence of the charged P-site tRNA locks the ribosome and prohibits both of these motions.
Ribosome – The ribosome translates the DNA code into life. A ribonucleoprotein structure that coordinates the correct recognition of mRNA by each tRNA and catalyzes peptide bond formation between the growing polypeptide chain and the amino acid attached to the selected tRNA. The ribosome is the site of protein synthesis. The intact ribosome contains three tRNA binding sites that reach between the two subunits: an A-site, where charged tRNA enters the ribosome, a P-site, which contains peptidyl-tRNA, and an E-site, where deacylated tRNA exits the ribosome. The ribosome consists of a large subunit, which contains the peptidyl transferase center, and a small subunit, which contains the decoding centre. Each subunit is composed of one or more rRNAs and multiple proteins. It is the proteins in the ribosome that perform a largely structural function, not the RNA.
Ribosome code – There is a possibility of a ribosome code. In the budding yeast, 59 of the 79 cytoplasmic ribosomal proteins are encoded by two genes, stemming from an ancient genome duplication event. These paralogous genes are not functionally equivalent, suggesting the possible existence of a "ribosome code". Duplicated genes escape gene loss by conferring a dosage benefit or evolving diverged functions.
Ribosome profiling – A strategy that is based on the deep sequencing of ribosome-protected mRNA fragments and enables genome-wide investigation of translation with subcodon resolution.
Riboswitches – Messenger RNA (mRNA), the RNA transcribed from a DNA template in order to make proteins, contains elements able to sense and bind to specific target molecules (metabolites or metal ions). In bacteria, fungi and plants, these binding mechanisms are used to control gene expression, and therefore act as genetic "switches", which is why these RNA elements are called "riboswitches". They are often found at the 5′-end of the mRNA, in the untranslated region (the stretch that precedes the start codon): this way, they are the first domain to be synthesized and can therefore influence expression before the entire mRNA is created.
RNA antiswitches – These molecules contain an aptamer domain that binds to a specific effector (small molecule ligand), as well as a sequence recognition domain complementary to a target mRNA. Binding of the effector changes the conformation of the switch and enables control of target gene expression.
RNA editing – A recent challenge to the central dogma of molecular biology has been the phenomenon of RNA editing. It is defined as a process resulting in a difference in the nucleotide sequence of RNA from its template DNA as a result of any process other than splicing. RNA editing is involved in the insertion, deletion or modification of nucleotides. RNA editing is a process that results in changes in the nucleotide sequence of mitochondrial transcripts such that the RNA sequence differs from the DNA template from which it is transcribed. RNA editing is thus a post-transcriptional process that results in radical changes in the amino acid specified by a codon. Most of the editing events occur in the first two nucleotides of the coding sequences.
RNA editing adenosine-to-inosine – Primary transcripts of certain microRNA (miRNA) genes are subject to RNA editing that converts adenosine to inosine. A-to-I RNA editing leads to transcriptome diversity.
RNA editing and self-splicing – There is parallelism between RNA editing (done by gRNA) and self-splicing of introns in hnRNA (done by ribozymes). A parallelism between the function of IG-S in RNA splicing and the proposed role of guide RNAs in RNA editing in kinetoplastid mitochondria has been recognized. A chimeric intermediate with gRNA covalently joined to the 3' portion of the mRNA has been identified in vitro, suggesting an evolutionary analogy to the catalytic mechanism involved in RNA splicing. Like self-splicing of introns, RNA editing also involves transesterification, but with the help of guide RNA. In the first step, the guide RNA aligns itself (by base pairing) with the unedited RNA, splits it into two, and makes a new bond between one of the broken ends and the uridine at the tip of the tail of gRNA at its 3'-end.
RNA editing cytosine-to-uracil – All canonical transfer RNAs (tRNAs) have a uridine at position 8, involved in maintaining tRNA tertiary structure. However, the hyperthermophilic archaeon Methanopyrus kandleri harbors 30 (out of 34) tRNA genes with cytidine at position 8. Cytosine-to-uracil (C-to-U) editing at this location in the tRNA's tertiary core guarantees the proper folding and functionality of tRNAs.
Three RNA editing mechanisms are potentially responsible for C-to-U conversion: deamination (or transamination), base exchange, and nucleotide exchange.
RNA editing cytosine-to-uracil through base exchange – Exchange of a base without cleavage of the sugar-phosphate backbone is another possible way of biochemical modification. Such a transglycosylation reaction has been described in transfer RNAs. Pyrimidine replacement is also found in DNA repair metabolism. This process, like deamination, does not include any cut in the RNA chain.
RNA editing cytosine-to-uracil through deamination/amination – The simplest mechanism for C-to-U conversion is deamination at position 4 of cytosine, leading to a uracil residue. This could be achieved by cytidine deaminase, an enzyme of nucleotide metabolism. The reverse U-to-C modification found in plant mitochondria could be carried out by a CTP synthetase in an ATP-dependent reaction. However, the existence of reverse editing from U to C implies the presence of two different enzymatic activities and not the reverse action of the same enzyme.
RNA editing cytosine-to-uracil through nucleotide exchange – Nucleotide deletions at RNA editing sites have been reported in wheat mitochondrial cDNA clones. These cDNAs were interpreted to represent single-nucleotide deleted intermediates of the editing process, and a deletion and insertion mechanism was proposed by these investigators. Nucleotide exchange involves cleavage of the RNA chain, deletion of a cytosine, addition of uracil, followed by religation. The nucleotide exchange involves several steps and is similar to the one described for U addition and deletion in editing of mRNA of trypanosomes. This mechanism is the most complex and would probably need an apparatus composed of multiple factors.
RNA editing in chloroplasts – An ACG codon appears at the 5'-terminus of chloroplast genes where the initiation codon ATG would be expected.
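The effect of C-to-U editing on an ACG codon at the start of a chloroplast transcript can be shown with a minimal sketch. The function name `edit_c_to_u`, the toy transcript and the edited position are hypothetical; the code only models the outcome of editing (the changed sequence), not any of the three biochemical mechanisms described above.

```python
def edit_c_to_u(transcript, sites):
    """Return the transcript with C converted to U at the given positions.

    transcript may be written as DNA or RNA (T is read as U);
    sites are 0-based positions and must each hold a C.
    """
    bases = list(transcript.upper().replace("T", "U"))
    for pos in sites:
        if bases[pos] != "C":
            raise ValueError("position %d is %s, not C" % (pos, bases[pos]))
        bases[pos] = "U"
    return "".join(bases)

# Hypothetical unedited transcript beginning with ACG instead of AUG.
unedited = "ACGGCUUCAUAA"
edited = edit_c_to_u(unedited, [1])
print(edited)                   # AUGGCUUCAUAA
print(edited.startswith("AUG"))  # True
```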
RNA editing in plant mitochondria – Mitochondria have adopted various modes of transcript maturation, such as RNA editing and trans-splicing. Transcripts of C-terminal gene modules undergo polyadenylation, and contiguous mRNAs are generated via concatenation of separate module transcripts. The range of bases being edited varies from 0.8 to 5.8 per cent of the total nucleotides present in a transcript. The mechanism of RNA editing in plant mitochondria requires at least two steps: (a) the specific identification of the cytosines to be edited and (b) the biochemical modification.
RNA interference (RNAi) – A process within living cells that moderates the activity of their genes. Historically, it was known by other names, including co-suppression, post-transcriptional gene silencing (PTGS), and quelling. Two types of small ribonucleic acid (RNA) molecules, microRNA (miRNA) and small interfering RNA (siRNA), are central to RNA interference. RNAs are the direct products of genes, and these small RNAs can bind to other specific messenger RNA (mRNA) molecules and either increase or decrease their activity, for example by preventing an mRNA from producing a protein. RNAi can be divided into four stages: double-stranded RNA cleavage, silencing complex formation, silencing complex activation, and mRNA degradation.

RNA interference and dosage compensation – In mammals, dosage compensation is achieved by X chromosome inactivation (XCI) in the female. The noncoding Xist gene initiates silencing of the X chromosome, whereas its antisense partner Tsix blocks silencing.
RNA interference proteins – Three nuclear-localized RNAi proteins have been implicated in generating and utilizing small RNAs in the RNA-directed DNA methylation pathway. RDR2, which is one of six RNA-dependent RNA polymerases encoded in the Arabidopsis genome, presumably produces double-stranded RNA from single-stranded templates. DCL3, which is one of four DICER-LIKE activities present in Arabidopsis, processes double-stranded RNA into 24-nucleotide 'heterochromatic' small RNAs. AGO4, which is one of ten ARGONAUTE proteins encoded in the Arabidopsis genome, is thought to be the small RNA-binding component of at least a subset of nuclear silencing effector complexes.
RNA methylation – Methylation of RNA occurs at a variety of atoms, nucleotides, sequences and tertiary structures. Methylation, which is strongly related to other post-transcriptional modifications, affects different RNA species including tRNA, rRNA, mRNA, tmRNA, snRNA, snoRNA, miRNA, and viral RNA. Different catalytic strategies are employed for RNA methylation by a variety of RNA methyltransferases. Different functions of methyl groups in RNA include biophysical, biochemical and metabolic stabilization of RNA, quality control, resistance to antibiotics, mRNA reading frame maintenance, deciphering of normal and altered genetic code, selenocysteine incorporation, tRNA aminoacylation, ribotoxins, splicing, intracellular trafficking, immune response, and others.
RNA polymerase DNA-binding site – After initial binding to the recognition site, RNA polymerase diffuses to the RNAP binding site, which is an A=T rich region. This binding region is called the TATA box. RNA polymerase tracks along DNA as template rotation generates twin domains of supercoiling.
The binding sites at which RNAP forms a stable initiation complex with DNA lie within the promoter.
RNA polymerase I (RNAPI) – Synthesizes pre-rRNA 45S (35S in yeast), which matures into 28S, 18S and 5.8S rRNAs, which form the major RNA sections of the ribosome.
RNA polymerase II (RNAPII) – It transcribes all protein-coding genes but also some noncoding RNAs (e.g., snRNAs, snoRNAs or long noncoding RNAs).
RNA polymerase II transcription preinitiation complex (PIC) – In PICs, the TFIIB linker and core domains are positioned over the central cleft and wall of the RNAPII. This positioning is not observed in the smaller RNAPII-TFIIB complex.
RNA polymerase II ubiquitylation sites – Transcriptional arrest triggers ubiquitylation of RNA polymerase II. The yeast RNAPII ubiquitylation sites have been mapped. They play an important role in transcription elongation and the DNA damage response.
RNA polymerase III (RNAPIII) – It transcribes 5S rRNA, transfer RNA (tRNA) genes, and some small noncoding RNAs (e.g., 7SK). Transcription ends when the polymerase encounters a sequence called the terminator.
RNA polymerase III-specific promoters – Promoter sequences for RNAPIII are located downstream, with the exception of 7S genes of signal recognition particles, where they are present both upstream and downstream.
RNA polymerase II-specific promoters – Promoters for RNAPII are present upstream of the start point and show three regions. These three regions are the TATA box or Hogness box (7-bp long), which is located at position –20, the CAAT box, which is located between –70 and –80, and the GC box, which is located between –60 and –100.
RNA polymerase I-specific promoters – Promoter sequences for RNAPI are located upstream (before the start point of transcription).
RNA polymerase IV (RNAPIV) – Found in plants. Presumed maize RNAPIV is involved in paramutation, an inherited epigenetic change facilitated by an interaction between two alleles, as well as normal maize development.
RNA polymerase V (RNAPV) – Synthesizes RNAs involved in siRNA-directed heterochromatin formation in plants.
RNA polymerase recognition site – A site on the template strand which RNA polymerase initially recognizes and binds. The bases of this site are not transcribed.
RNA polymerase secondary channel – High-resolution crystal structures have highlighted functionally important regions in multisubunit RNAPs, including the secondary channel, or pore, which is postulated to allow the diffusion of small molecules both into and out of the active center of the enzyme. Regulatory factors and small molecules can exploit the secondary channel to gain access to the active site and modify the transcription properties of RNA polymerase.
RNA processing – (1) Refers to various post-transcriptional steps that a transcript undergoes before it becomes a functional RNA molecule. (2) RNA processing may be defined as modification, mainly through cleavage and/or splicing, of RNA transcripts so as to release functional transfer RNA (tRNA), ribosomal RNA (rRNA), messenger RNA (mRNA) and other species of RNA molecules from them.
RNA quality-control systems – Eukaryotic cells contain numerous RNA quality-control systems that are important for shaping the transcriptome. These systems not only prevent accumulation of non-functional RNAs, but also regulate normal mRNAs, repress viral and parasitic RNAs, and potentially contribute to the evolution of new RNAs and hence proteins. RNA quality-control circuits depend on specific adaptor proteins that target aberrant RNAs for degradation as well as on coupling of individual steps in mRNA biogenesis and function.
RNA transport – How does mRNA get to the cytoplasm? The successive steps of mRNA maturation need to be emphasized. Studies on regulation of RNA transport by the HIV-1 Rev protein have provided only a glimpse into the complex interplay between nuclear RNA splicing and RNA transport. RNA processing and assembly require many factors, but the nucleus apparently lacks any active transport system to deliver these to the RNA. Spliced mRNA is exported to the cytoplasm for translation. The mechanism for recruitment of the conserved mRNA export machinery (TREX complex) to mRNA has been investigated. The human TREX complex is recruited to a region near the 5'-end of mRNA, with the TREX component Aly bound closest to the 5' cap. Both TREX recruitment and mRNA export require the cap, and these roles for the cap are splicing-dependent.
CBP80, which is bound to the cap, associates efficiently with TREX, recruiting it to the 5'-end of mRNA, where it functions in mRNA export. As a consequence, the mRNA would be exported in a 5' to 3' direction through the nuclear pore.
RNA world hypothesis – This hypothesis holds that during evolution the structural and enzymatic functions initially served by RNA were assumed by proteins, leading to the latter's domination of biological catalysis. This progression can still be seen in modern biology, where ribozymes, such as the ribosome and RNase P, have evolved into protein-dependent RNA catalysts (RNPzymes). Similarly, group I introns use RNA-catalyzed splicing reactions, but many function as RNPzymes bound to proteins that stabilize their catalytically active RNA structure.
RNA-dependent RNA polymerase – RNA polymerase II (RNAPII), in addition to DNA-dependent RNA polymerase activity, also possesses RNA-dependent RNA polymerase (RdRNAP) activity. RNAPII can use a homopolymeric RNA template, can extend RNA by several nucleotides in the absence of DNA, and has been implicated in the replication of the RNA genomes of hepatitis delta virus (HDV) and plant viroids.
RNA-directed DNA methylation (RdDM) – The most familiar type of RNA silencing occurs primarily in the cytoplasm and is termed post-transcriptional gene silencing (PTGS) in plants, quelling in Neurospora, and RNA interference (RNAi) in animals. A second form of RNA silencing involves sequence-specific changes at the genome level. RdDM has been described so far only in plants. It leads to de novo methylation of almost all cytosine residues within the region of sequence identity between the triggering RNA and the target DNA. RdDM is very important in many aspects of plant growth, development, variation, responses to biotic and abiotic stresses and genome stability.
As plants rely heavily on epigenetic changes for their development and responses to the environment, and owing to their sessile nature, plants may need to assume several epigenomes to control the global expression pattern at any given point in their life cycle.
RNA-directed RNA synthesis – Noncoding small RNAs regulate gene expression in all organisms, in some cases through direct association with RNA polymerase. For example, the DNA-dependent RNA polymerase uses bound 6S RNA as a template for RNA synthesis, producing a 14- to 20-nucleotide RNA product (pRNA).
RNA-DNA hybrid in biological functions – Formation of RNA-DNA hybrids is important in transcription of DNA, reverse transcription of viral RNA and DNA replication.
RNomics and gene regulation – RNomics deals mainly with the study of the structure and function of the different RNA populations comprising the total noncoding RNAs in the cell. Noncoding RNAs play very important roles in biological systems, such as in DNA methylation, chromatin conformational changes and the histone code. Regulatory RNA research has marked a new paradigm of RNA-directed regulation of gene expression.
rRNA transcription regulation – Ribosomal RNA (rRNA) transcription is regulated primarily at the level of initiation from rRNA promoters. The unusual kinetic properties of these promoters result in their specific regulation by two small molecule signals, ppGpp and the initiating NTP, that bind to RNA polymerase at all promoters.

S

Second code – Correct recognition of transfer RNAs by aminoacyl-tRNA synthetases is central to the maintenance of translational fidelity. The anticodon is indeed important for some of the 20 E. coli isoaccepting groups. For many of the isoaccepting groups, the acceptor stem or position 73 (or both) is important as well. For accurate translation of genetic messages the precision of two matchings is very important: the first that of amino acids with transfer RNA and the second that of tRNA with mRNA. The latter matching is a straightforward interaction of the codon of mRNA with the anticodon in tRNA, but the first matching is indirect and is mediated by specific enzymes, the aminoacyl-tRNA synthetases (AAS). These enzymes are specific for each amino acid, there being 20 AASs for the 20 standard amino acids.
Secretion of proteins synthesized on free ribosomes – Free ribosomes synthesize specific non-serum liver proteins. It is possible that leader sequences on mRNAs determine which ribosomes shall translate them.
Secretion of proteins synthesized on membrane-bound ribosomes – In eukaryotes, ribosomes may be free or may be attached to the endoplasmic reticulum (ER). Membrane-bound ribosomes synthesize proteins that are exported from the cell into the surrounding serum. The 60S subunit of the ribosome is attached to the ER. The ER membrane is used to form a vesicle containing the protein. This vesicle then becomes a secretory vesicle whose membrane fuses with the cell membrane, emptying its contents outside the cell.
Selfish DNA – The stimulating term "selfish" DNA or "parasitic" DNA, given in 1980 by W.F. Doolittle and C. Sapienza, and L.E. Orgel and F.H.C. Crick, refers to DNA sequences which appear to have little or no function. This includes all sorts of repetitive DNAs, introns, transposable elements and the DNA sequences present between the genes. Much of the DNA in higher organisms was considered junk. Molecular analysis is helping in understanding the function of this class of DNA in the genome.
Self-splicing RNA – Group II introns are a type of RNA enzyme (ribozyme) that catalyze their own excision from RNA transcripts and insertion into new genetic locations. Structural and functional analogies support the hypothesis that group II introns and the spliceosome share a common ancestor.
Sense DNA strand – Strand complementary to the antisense strand. Also known as noncoding strand, nontranscribing strand, non-template strand, or Crick strand. Compare with antisense strand.
Sense word – A codon that specifies an amino acid normally present in that position in a protein.
Sense-PTGS – Post-transcriptional gene silencing (PTGS) mediated by sense transgenes that results in RNA degradation and DNA methylation of the transcribed region.
Sensor site – A component of the eukaryotic gene regulation system. A sensor site regulates the activity of an integrator gene, which can be transcribed only when the sensor site is activated. Sensor sites respond to internal signals or external stimuli and thus turn on the integrator genes.
Sequence characterized amplified regions (SCARs) – The DNA markers that overcome the limitation of RAPDs. The RAPD fragments that are linked to a gene of interest are cloned and their ends are sequenced. Based on the end sequencing, 20-mer primers are designed, which lead to a more specific amplification of a particular locus. These are similar to STS markers in their construction and application. Both are dominant markers, i.e., presence/absence of bands reveals the polymorphism.
Sequence tagged sites (STSs) – An STS is a short unique sequence that identifies one or more specific loci and can be amplified through PCR. Each STS is characterized by a pair of PCR primers, which are designed by partial end sequencing of an RFLP probe (including genomic DNA and cDNA probes) representing a mapped low-copy-number DNA sequence. These primers (generally 20-mers) are then used for amplifying specific genomic sequences using PCR.
Sequon – The tripeptide motif Asn-Xaa-Ser/Thr in proteins, whose asparagine residue is N-glycosylated with preformed oligosaccharide units. Other residues may also be involved in the determination of a glycosylation site, and the structure of the protein itself may inhibit the glycosylation of certain polypeptides.
Serial analysis of gene expression (SAGE) – SAGE gives quantitative information on the abundance of transcripts and helps to identify novel expressed genes. This technique is powerful but not very convenient for the comparison of many different samples or for the study of rare transcripts.
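As an illustration of the sequon entry above, the following minimal sketch scans a protein sequence (one-letter code) for candidate Asn-Xaa-Ser/Thr motifs. The function name `find_sequons` and the example sequence are hypothetical. Excluding proline at the Xaa position is a commonly used convention and is an assumption here, since the entry does not state it. A match marks only a candidate site; as the entry notes, other residues and protein structure may still prevent glycosylation.

```python
import re

# Candidate N-glycosylation sequon: Asn-Xaa-Ser/Thr.
# Assumption: Xaa is any residue except Pro (not stated in the glossary entry).
# The lookahead lets overlapping motifs (e.g. N-N-T-S) all be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")

def find_sequons(protein):
    """Return (1-based position of Asn, tripeptide) for each candidate sequon."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]

# Hypothetical example sequence, used only for illustration.
print(find_sequons("MKNGSAANPTLLNNTS"))
# → [(3, 'NGS'), (13, 'NNT'), (14, 'NTS')]
```

The tripeptide N-P-T at position 8 is skipped because of the assumed proline exclusion.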

Shine-Dalgarno sequence – A short sequence, called the ribosome binding site, that many prokaryotic ORFs contain upstream (on the 5′ side) of the start codon and that facilitates binding by a ribosome. This sequence, typically located 3-9 bp on the 5′ side of the start codon, is complementary to a sequence located near the 3′-end of one of the RNA components of the ribosome (16S rRNA). The ribosome binding site base-pairs with the 16S rRNA, thereby aligning the ribosome with the beginning of the ORF.
Short hairpin RNAs (shRNAs) – A sequence of RNA that makes a tight hairpin turn and that can be used to silence target gene expression via RNA interference (RNAi). Also known as small hairpin RNAs.
Short-term (reversible) gene regulation – This type of gene regulation represents the cell's response to fluctuations in the environment. Short-term regulation involves changes in the activities or concentrations of enzymes as levels of particular substrates or hormones increase or decrease as the cell cycle is reversed. This is the only type of gene regulation present in viruses and bacteria.
Sigma (σ) factors – Bacterial transcription factors that bind core RNA polymerase and direct transcription initiation at cognate promoter sites.
Signal hypothesis – The signal hypothesis explains the process of secretion of proteins. Ribosomes synthesizing proteins are attached to the membrane via a leader sequence. The mechanism applies to both prokaryotes and eukaryotes. Proteins have intrinsic signals that govern their transport and location in the cell.
Signal peptide/sequence – The signal peptide takes part in a chain of events leading to membrane attachment by the ribosome and membrane insertion of the protein. The signal peptide does not seem to have a consensus sequence like the transcription or translation recognition boxes. Rather, similarities (at least for the endoplasmic reticulum and bacterial membrane-bound proteins) include a positively charged (basic) amino acid (commonly lysine or arginine) near the beginning (N-terminal end) followed by about a dozen hydrophobic (non-polar) amino acids, commonly alanine, isoleucine, leucine, phenylalanine, and valine. The signal peptide of the bovine prolactin protein reported by Sasavage and co-workers in 1982 is: NH2-Met Asp Ser Lys Gly Ser Ser Gln Lys Ser Arg Leu Leu Leu Leu Leu Val Val Ser Asn Leu Leu Leu Cys Gln Gly Val Val Ser Thr Pro Val …. Asn Asn Cys-COOH. The amino acids represented in bold separate the signal peptide from the rest of the protein, which consists of 199 amino acid residues.
Signal Recognition Particle RNA (SRP RNA) – Noncoding RNAs that are involved in protein translocation to the endoplasmic reticulum.
Signal sequences – The amino acid sequences that target proteins for secretion from cells or for integration into cell membranes. As nascent proteins emerge from the ribosome, signal sequences are recognized by the signal recognition particle (SRP), which subsequently associates with its receptor (SR). In this complex, the SRP and SR stimulate each other's GTPase activity, and GTP hydrolysis ensures unidirectional targeting of cargo through a translocation pore in the membrane. The signal sequence is presented at the ribosomal tunnel exit in an exposed position, ready for accommodation in the hydrophobic groove of the rearranged SRP54 M domain. Upon ribosome binding, the SRP54 NG domain also undergoes a conformational rearrangement, priming it for the subsequent docking reaction with the NG domain of the SRP receptor. Also known as leader sequences.
Signal transduction – Signal transduction pathways precede most of the mechanisms regulating gene expression. In eukaryotes, a cascade of molecules leading to the activation of one or more specific transcription factors is involved. Among oncoproteins, G proteins are represented by Ras proteins. Since G proteins transduce signals, they are also described as transducers.
Silent epigenetic changes – Epigenetic changes and mutations in structural genes may have differing effects. The effect of epigenetic changes will primarily be on the regulation of gene activity, rather than on the integrity of coding sequences. The effects of epigenetic changes may be less harmful than those of mutations in structural genes. Genes that are inactive in somatic cells may often have a cluster of m5C rather than one m5C in the adjacent promoter region. Thus, the expression of a gene depends not on the loss of a single methyl group but on the sequential loss of several, and many silent epigenetic defects are required to produce a deleterious phenotypic effect. Changes associated with aging accumulate more and more rapidly as the end of the normal life span is approached.
Similarities between prokaryote and eukaryote genomes – Prokaryotic and eukaryotic chromosomes are similar with respect to two features: unineme structure and semi-conservative DNA replication.
Simple gene – A continuous sequence in a nucleic acid that specifies a particular polypeptide or functional RNA. This condition exists in prokaryotes and viruses.

Simple transcription units – Those transcription units where only one functional mRNA is formed from the transcription unit. The primary transcript may or may not require poly(A) addition or splicing. For example, histone genes do not require poly(A) addition or splicing of their RNAs.
Simple-sequence gene family – The simple-sequence family encompasses segments of DNA derived from 10^2 to 10^7 repetitions of a short fundamental sequence, generally 6-15 nucleotides in length. The degree of homology amongst the repeat units is often 80-100 per cent.
Single-channel microarray detection – In single-channel or one-color microarrays, the arrays provide intensity data for each probe or probe set indicating a relative level of hybridization with the labeled target. However, they do not truly indicate abundance levels of a gene but rather relative abundance when compared to other samples or conditions processed in the same experiment.
Single-nucleotide polymorphisms (SNPs) – This class of marker has been referred to as the mother of all markers. These are the recently discovered and most frequent markers in any genome. They are often pronounced ‘snips’ and represent the sites where DNA sequence differs by a single base. Also referred to as simple-nucleotide polymorphisms. Some workers also include indels (insertions and deletions) and other sequence variations in this class.
Single-polypeptide nuclear RNA polymerase – Transcription of some mRNAs in humans and rodents is mediated by a previously unknown single-polypeptide nuclear RNA polymerase (spRNAP-IV). It is expressed from an alternative transcript of the mitochondrial RNA polymerase gene POLRMT. The spRNAP-IV lacks 262 amino-terminal amino acids of mitochondrial RNA polymerase, including the mitochondrial-targeting signal, and localizes to the nucleus.
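The definition of SNPs above (sites where DNA sequence differs by a single base) can be sketched in a few lines. This is a minimal illustration, not a variant-calling method: it assumes the two sequences are already aligned and of equal length, so indels are not handled. The function name `find_snps` and the example sequences are hypothetical.

```python
def find_snps(ref, alt):
    """Return (1-based position, ref base, alt base) for each single-base difference.

    Assumption: `ref` and `alt` are pre-aligned and of equal length,
    so insertions and deletions are outside the scope of this sketch.
    """
    if len(ref) != len(alt):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [(i + 1, a, b)
            for i, (a, b) in enumerate(zip(ref.upper(), alt.upper()))
            if a != b]

# Hypothetical example: the two sequences differ only at position 4 (C versus A).
print(find_snps("ATGCCGTA", "ATGACGTA"))
# → [(4, 'C', 'A')]
```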
Single-strand conformation polymorphism (SSCP) – This molecular tool allows the detection of polymorphism due to differences of one or more base pairs in PCR products. It relies on the secondary structure being different for single strands derived from PCR products that differ by one or more nucleotides at an internal site within the strand. In order to detect such differences, PCR products are denatured and electrophoretically separated in neutral acrylamide gels. PCR products that do not differ in fragment length have been shown to exhibit SSCP in several studies. Sequence information is required for designing the primers, and radioactive labeling and autoradiography are used to detect SSCP variants. This makes the technique relatively unsuitable for routine mapping/tagging studies.
Single-stranded RNA (ssRNA) viruses – Those viruses that have single-stranded RNA (ssRNA) as genetic material. Examples of ssRNA viruses are Tobacco Mosaic Virus (TMV), Influenza Virus, Rous Sarcoma Virus, Poliomyelitis virus, and bacteriophages MS2, f2 and R17.
siRNA processing RNAi pathway – siRNAs (small interfering RNAs) are noncoding RNAs that are involved in RNA interference. In this processing pathway, the first step includes ATP-dependent processive cleavage of double-stranded RNA into double-stranded fragments 21-25 nucleotides long. They contain 5′ phosphate and 3′ hydroxyl termini and two additional overhanging nucleotides on their 3′ ends. The fragments thus generated are called siRNA molecules. Experiments have revealed that this step is carried out by an RNase III-like nuclease named Dicer. In the second step, siRNAs are incorporated into a protein complex (RISC), which in this form is inactive for RNAi. The third step involves unwinding of the siRNA duplex and remodeling of the complex to generate an active form of RISC. The final step includes the recognition and cleavage of mRNA complementary to the siRNA strand present in RISC.
Small molecules in gene regulation – Small molecules that target specific DNA sequences have the potential to control gene expression. For example, pyrrole-imidazole polyamides are cell permeable and can inhibit the transcription of specific genes.
Small nuclear RNAs (snRNAs) – Noncoding RNAs that are involved in splicing of pre-mRNAs.
Small nucleolar RNAs (snoRNAs) – Noncoding RNAs that are involved in modification of other RNAs.
Small temporal RNAs (stRNAs) – These RNAs downregulate target mRNAs via untranslated region (UTR) elements that are complementary to the regulatory RNAs, thereby specifying the temporal expression of cell fates. Temporal modulation of gene expression is regulated by chromatin modification.
Smart genes – This term was coined to describe genes that respond to combinations of signals sent from one gene to another in control networks. The brain of these smart genes is a complicated assembly of proteins, known as the transcriptional complex, which is composed of transcription factors. These transcription factors are DNA-binding proteins which can discriminate between distinctive DNA sequences and thus help RNA polymerase transcribe the gene.

Smart transcription factors – A smart transcription factor first gets phosphorylated and then binds to a co-activator binding protein. This complex then binds to a DNA element, which leads to initiation of transcription. For example, the smart transcription factor cAMP response element binding (CREB) protein first gets phosphorylated and then binds to the co-activator CREB binding protein (CBP). This complex then binds to a DNA element known as the cAMP-regulated enhancer, which leads to initiation of transcription.
snRNA gene family – snRNA genes are of 11-13 types. The major 5-6 types of snRNA genes (U1, U2, U3, U4, U5 and U6) are present in many copies and are transcribed frequently. These small nuclear RNAs, called U RNAs, constitute the spliceosome formed during splicing of introns from precursors of messenger RNA. The remaining small nuclear RNA genes are less frequent.
snRNA genes – The term snRNAs stands for small nuclear RNAs. They also include U RNAs. No intervening sequences are found in the U1, U2, and U3 genes. These genes are dispersed throughout the genome. The putative adenylation signal (AATAAA) and the termination signal (TTT, at the 3′-end of the RNA) are not found for the U1 gene. U1 RNA is not polyadenylated. The suggested number of genes for each snRNA is 2,000.
snRNP – The term snRNP stands for small nuclear ribonucleoprotein. Also known as small nucleoriboprotein structures (snurps). snRNAs are complexed with proteins to form small nuclear ribonucleoproteins. There are three possibilities for their mechanism of action: (a) they help to arrange the intron in a configuration that encourages self-splicing, (b) in some organisms precursor mRNA forms a lariat structure and splicing of the intron occurs without the aid of protein or other RNA factors, and (c) splicing may provide missing intron sequences necessary for spontaneous splicing.
Snurps in pre-mRNA splicing – The introns of nuclear mRNA precursor transcripts are spliced out in a two-step reaction carried out by complex ribonucleoprotein particles called spliceosomes. They also attach new noncoding segments to the leading and trailing ends of the mRNA. This process is called splicing. Snurps stands for small nuclear ribonucleoprotein particles that help to remove meaningless introns from the messages issued by a cell's genes. Without snurps cellular activity would come to a halt.
Somatic diversity theory – Only a small number of V sequences are transmitted from parents to offspring, perhaps as few as 3 Vk sequences, 5 Vγ sequences and a few VH sequences. These sequences then experience thousands of different base-pair changes in individual lymphocyte lines, creating the observed diversity. One lymphocyte clone will, for example, carry sequences Vδ1a and Vγ11c, another Vδ1b and Vδ11e, and so on. Two of these sequences are then selected for expression in a given lymphocyte clone.
Spatio-temporal regulation of mRNAs – MicroRNAs participate in spatio-temporal regulation of messenger RNA and protein synthesis. Aberrant miRNA expression leads to developmental abnormalities and diseases, such as cardiovascular disorders and cancer.
Specialized genes – In such genes, different combinations of factor binding sites would produce promoters and enhancers with different properties. They have complex promoters with an interdigitated array of constitutive and regulatory elements. Both constitutive and regulatory elements are required for correct enhancer function.
Splice site compatibility – This phenomenon leads to differential use of splice sites, as certain splice site combinations are specifically favored over others and different mRNAs are formed. Differential trans-factors also affect splice site compatibility, as they can regulate the selection of splice sites in a positive or negative manner.
Splice site recognition – SR proteins have important roles in facilitating splice site recognition. For example, they recruit the U1 snRNP to the 5′ splice site and the U2AF complex and U2 snRNP to the 3′ splice site by binding to an ESE and directly interacting with protein targets. Inhibition of splice site recognition can be achieved in many ways. First, when splicing silencers are located close to splice sites or to splicing enhancers, inhibition can occur by sterically blocking the access of snRNPs or of positive regulatory factors. Some silencers can be over 100-200 bp away from enhancers. Such splicing inhibitors function by masking splice site recognition through multimerization along the RNA.
Spliceosome – A dynamic structure that is assembled on pre-mRNA templates in a stepwise fashion and then dissociates as the splicing reaction is completed. Even a small error in splicing is intolerable, resulting in a nonfunctional polypeptide. Thus RNA processing involves a high degree of specificity. Specialized nucleoriboprotein structures (snurps)-like particles identified in the cytoplasm and nucleolus are named “scRNPs” (or “scyrps”) and “snorps”, respectively.

Spliceosome assembly – The exon-intron 5′ splice site is recognized by U1 snRNP. The precursor RNA is probably also associated with hnRNP core proteins. U5 snRNP binds to the 3′ splice site at this stage. Incubation of substrate RNA in the presence of ATP results in the rapid formation of a U2 snRNP complex. Further incubation yields assembly of the spliceosome containing at least U2, U4, U5 and U6 snRNAs. The U4 and U6 snRNAs can be associated in one snRNP particle. The two RNAs characteristic of the intermediate in splicing, the 5′ exon and the lariat intervening sequence terminating in the 3′ exon RNAs, are found exclusively in the spliceosome.
Splicing and polyadenylation – Patterns of alternative RNA splicing and of alternative cleavage and polyadenylation are strongly correlated across tissues, suggesting coordinated regulation of splicing and polyadenylation.
Splicing in mRNA cytoplasmic localization – An important newly discovered function of splicing is regulation of messenger ribonucleoprotein complex assembly and organization for mRNA cytoplasmic localization.
Splicing mechanisms – Two types of splicing mechanisms, known as major and minor splicing mechanisms, have been identified. Accordingly, major and minor spliceosomes have been identified. These two splicing pathways are spatially separated in the cell and may have distinct functions.
Spotted microarrays – In spotted microarrays, the probes are oligonucleotides, cDNA or small fragments of PCR products that correspond to mRNAs. The probes are synthesized prior to deposition on the array surface and are then "spotted" onto glass. A common approach utilizes an array of fine pins or needles controlled by a robotic arm, which is dipped into wells containing DNA probes and then deposits each probe at designated locations on the array surface. The resulting "grid" of probes represents the nucleic acid profiles of the prepared probes and is ready to receive complementary cDNA or cRNA "targets" derived from experimental or clinical samples.
Startsite – Start point of transcription. A site on the template strand required for initiation of transcription. The start point is the base pair on DNA that corresponds to the first nucleotide incorporated into the transcript by RNA polymerase.
Stochastic rearrangements – The rearrangements in DNA that take place without any need, so that a small fraction of the cell population carrying rearranged genes is always ready.
Storage protein gene family – Storage proteins in crop plants are encoded by multiple gene families and are represented mainly as prolamins or globulins. Homologies in genes coding for prolamins and globulins in different crop plants have also been shown. Their multiplicity meets the demand for rapid synthesis of storage proteins in the developing seeds.
Stringent response – The stringent response occurs when prokaryotic cells are starved of amino acids and an uncharged tRNA finds its way into the A site of the ribosome. This event occurs under general amino acid starvation. It causes an idling reaction by the ribosome, which entails the production of the odd nucleotide guanosine tetraphosphate (3′-ppGpp-5′). This is a post-translational control of gene regulation.
Structural alleles – Alleles determined on the basis of a recombination test.
Structural gene mutations – Mutations in structural genes may lead to amino acid changes in the protein.
Structural genes – In a general sense, the term refers to genes that are transcribed to produce RNAs that either have a functional role (e.g., tRNA genes, rRNA genes) or are translated to produce proteins (mRNA-producing or protein-encoding genes). In a strict sense, this term refers to the genes that determine the molecular organization (amino acid sequence) of proteins.
Substitutional editing – No guide RNA is implicated. The phosphodiester backbone of the RNA is not cut and the new base is synthesized in situ. It blurs the distinction between editing and RNA modification.
Symmetric RNA structures – Small RNA molecules containing a 19-bp duplex with 2-nt overhangs at each 3′-end. This is the standard siRNA structure.
Synergistic transcription – Gene expression is said to be synergistic when the combined quantitative effect of two transcriptional activation elements is more than the sum of their individual effects.
Synonymous codons – Different codons that specify the same amino acid in a degenerate code.
Synonymous mutations – Synonymous mutations do not alter the encoded proteins, but they can influence gene expression.
Synonymous substitutions – Those mutations that occur at degenerate sites, where substitutions will not alter the sense of the codon, e.g., often in the third position of the codon. Such substitutions are neutral to selection pressure. The rate of synonymous substitutions in a gene is independent of selective pressure.
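To make the synonymous codons entry concrete, the sketch below groups the 64 codons of the standard genetic code by the amino acid they specify. It is an illustration only and makes several assumptions: the standard nuclear code is used, codons are written in DNA letters (T rather than U), and the names `CODON_TABLE` and `synonymous_codons` are hypothetical. Stop codons are marked with `*`.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (DNA alphabet).
# Assumption: standard nuclear code; '*' marks the three stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def synonymous_codons():
    """Map each amino acid (one-letter code) to the list of codons specifying it."""
    groups = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        groups[aa].append(codon)
    return dict(groups)

groups = synonymous_codons()
print(groups["L"])  # leucine has six synonymous codons
print(groups["M"])  # methionine has a single codon
```

Most synonymous groups differ only in the third codon position, which is why the synonymous substitutions entry singles out that position.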

Synthetic gene networks – Networks constructed to emulate digital circuits and devices, giving one the ability to program and design some of the principles of modern computing, such as counting.
Synthesis of 28S and 18S rRNA – At the nucleolar organizer region, 45S RNA is synthesized, which acts as a precursor molecule for the synthesis of 28S and 18S rRNAs. An endonuclease cleaves 45S RNA into two shorter precursor molecules of size 41S and 20S, which are cleaved into 28S and 18S rRNAs, respectively. The 41S RNA undergoes three sequential cleavage steps by exonuclease to yield 28S rRNA, whereas 20S RNA undergoes only one cleavage step by exonuclease to produce 18S rRNA.
Synthetic biology – Branch of biology that deals with the application of engineering approaches to produce novel devices using biological building blocks.
Synthetic genetic circuit – It consists of genetic components optimized to function in their natural context, not in the context of the synthetic circuit. The objective of constructing such circuits is to regulate a cellular function.
Synthetic vaccines – These vaccines do not contain intact viruses or complete polypeptides but merely small peptides that have been synthesized in the laboratory to mimic a very small region of the outer coat of the virus. These peptides elicit antibodies capable of neutralizing the virus. Synthetic vaccines are prepared through short synthetic peptide chains. Also known as synthetic peptide vaccines.
Systems biology approach – This approach aims to identify metabolic switches and transcriptional regulators and to reveal the intrinsic biological networks that occur within cells. Mutants and metabolically altered plants are analyzed using transcriptomics (study of the transcript profile), proteomics (study of the protein profile) and metabolomics (study of the metabolite profile) to understand how differently intrinsic networks are influenced.

T
Taqman assay – In this assay, an oligonucleotide probe is labeled with a fluorescent reporter molecule at the 5′-end and a quencher molecule at the 3′-end. This probe is called a Taqman probe because it is degraded by the 5′→3′ exonuclease activity of the Taq polymerase enzyme.
T-cell receptor genes – T-cell receptor genes are more complex in structure than immunoglobulin genes.
T-cell receptors (TCRs) – Proteins present on the cell membrane that bind to special molecules to facilitate their entry into the cell.
Telomerase RNA processing – The spliceosome is best known for shepherding primary messenger RNA transcripts to maturity. This enzyme complex also contributes to the synthesis of an enzyme that maintains chromosome ends. Chromosomal ends contain telomeric repeats, which are added to the 3′-end of DNA molecules by the telomerase enzyme complex. Telomerase consists of a reverse transcriptase enzyme (T), which uses another component of this complex, the telomerase RNA, as a template for telomere synthesis.
Temporal sequence of gene expression – Regulation of gene expression during the life cycle of virulent bacteriophages is quite different from the reversible on-off switches characteristic of bacterial operons. In phage-infected bacteria, the viral genes are expressed in genetically preprogrammed sequences or cascades.
Tet-on and Tet-off – Tet-off activates expression in the absence of doxycycline (Dox), whereas Tet-on activates expression in the presence of Dox. Doxycycline is used in "Tet-on" and "Tet-off" tetracycline-controlled transcriptional activation to regulate transgene expression in organisms and cell cultures.
Thiogalactoside transacetylase – Also known as protein “a”, which is involved at some point in lactose metabolism. The acetylation reaction conducted by this enzyme may serve as a detoxifying mechanism.
3′-end processing of pre-mRNAs – During 3′-end processing, histone pre-mRNAs are cleaved 5 nucleotides after a conserved stem loop by an endonuclease dependent on the U7 small nuclear ribonucleoprotein (snRNP). The upstream cleavage product is degraded by a 5′-3′ exonuclease, also dependent on the U7 snRNP.
Tile-based nanostructures – Short oligonucleotides, known as motifs, assemble to form a tile, and tiles assemble to form a complete lattice.
Toeprinting assay – Method to look at the interaction of messenger RNA with ribosomes or other RNA-binding proteins. It is different from the standard DNA footprinting assay. Originally called the extension inhibition assay. Compare with DNA footprinting.
Torpedo model of transcriptional termination – The ‘torpedo’ model postulates that poly(A) site cleavage provides an unprotected RNA 5′-end that is degraded by 5′→3′ exonuclease activities (torpedoes), which induces dissociation of RNAPII from the DNA template. The 3′-end of the cleaved RNA is likely to be degraded by the exosome (Exo).

Train sequence – Sector of the noncoding sequence following the 3′-end of the gene proper (mature gene, antisense strand). The train contains signals concerned with termination of transcription. Typically, there exists a series of T-A bp, in prokaryotes usually six or more in number, while in metazoans four seem to suffice. A stem-and-loop structure could conceivably form in the transcript.
Trans-activating small interfering RNAs (tasiRNAs) – A class of endogenous small RNAs which are produced from noncoding transcripts of TAS loci. They are negative regulators of gene expression.
Transcription – Synthesis of RNA using DNA as a template. Transcription is a vital process in biological life forms. It is through this process that the biological road map encoded in a strand of DNA is used to produce a complementary RNA copy. During transcription, only one of the two DNA strands is transcribed into RNA. DNA-dependent RNA polymerases, simply called RNA polymerases (RNAPs), are the transcribing enzymes.
Transcription elongation – Elongation of transcription proceeds in alternating laps of monotonous and inchworm-like movement, with the flexible RNA polymerase configuration being subject to direct sequence control. Elongation continues till a termination complex is formed. The mechanism of substrate loading in multisubunit RNA polymerase is crucial for understanding the general principles of transcription. Transcription elongation factors enhance the overall activity of RNA polymerase II, leading to an increase in elongation rate. Two such factors are TFIIF and TFIIS. TFIIF accelerates RNA chain growth uniformly and TFIIS helps in elongation by relieving obstacles. TFIIS causes hydrolytic cleavage at the 3′-end of RNA chains which are blocked. Its action is similar to cleavage by GreA and GreB in E. coli. Rpb9 is a small subunit of yeast RNAPII participating in elongation.
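The statement in the Transcription entry above, that one DNA strand serves as the template for a complementary RNA copy, can be illustrated with a short sketch. It assumes the template strand is supplied in the 3′→5′ direction, so that the RNA is produced 5′→3′ by simple base-by-base complementation. The function name `transcribe` and the example sequence are hypothetical.

```python
# Base-pairing rules for copying a DNA template into RNA (A pairs with U in RNA).
PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3to5):
    """Return the RNA (written 5'→3') complementary to a DNA template strand.

    Assumption: the template is given 3'→5', so no reversal is needed.
    Raises KeyError if the input contains a non-DNA letter.
    """
    return "".join(PAIR[base] for base in template_3to5.upper())

# Hypothetical template strand, written 3'→5'.
print(transcribe("TACGGCATT"))
# → AUGCCGUAA
```

The resulting RNA has the same sequence as the non-template DNA strand, with U in place of T.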
Transcription factor CCAAT-enhancer-binding proteins (C/EBPs) – Transcription factors of the C/EBP family use a bipartite structural motif to bind DNA. The C/EBP family consists of six transcription factors (α, β, γ, δ, ε and ζ). Two protein chains dimerize through a set of amphipathic α-helices termed leucine zippers. The “scissors grip” model has been given to explain how C/EBP binds DNA in the major groove.
Transcription in eukaryotes – In eukaryotes, transcription is performed by three major types of RNA polymerases (RNAPI, RNAPII, and RNAPIII), each of which needs a special DNA sequence called the promoter and a set of DNA-binding proteins, the transcription factors, to initiate the process. In plants, two additional RNA polymerases, RNAPIV and RNAPV, have been found.
Transcription in prokaryotes – Transcription in prokaryotes is carried out by a single type of RNA polymerase (RNAP), which needs a DNA sequence called a Pribnow box as well as a sigma factor (σ factor) to start transcription.
Transcription in viruses – The viral polymerases are diverse, and include some forms which can use RNA as a template instead of DNA. This occurs in negative-strand RNA viruses and dsRNA viruses, both of which exist for a portion of their life cycle as double-stranded RNA. However, some positive-strand RNA viruses, such as polio, also contain these RNA-dependent RNA polymerases.
Transcription initiation – During this step of transcription, RNA polymerase initially recognizes and binds to the RNA polymerase recognition site, then diffuses to the RNA binding site. DNA-binding proteins collectively designated CCAAT-binding transcription factors (CTF) bind to the GCCAAT recognition motif, often referred to as the CCAAT box, and act as bona fide sequence-specific transcription initiation proteins.
Transcription of centromeric repeats – Heterochromatin in eukaryotic genomes regulates diverse chromosomal processes including transcriptional silencing. However, in fission yeast RNA polymerase II transcription of centromeric repeats is essential for RNA interference-mediated heterochromatin assembly. RNA polymerase II transcription during S phase is linked to the loading of RNA interference and heterochromatin factors such as the Ago1 subunit of the RITS complex and the Clr4 methyltransferase complex subunit Rik1.
Transcription preinitiation complex (PIC) – No eukaryotic RNA polymerase appears to bind specifically and directly to DNA. Instead, interactions between polymerase and DNA require the presence of sequence-specific accessory proteins called transcription factors. The PIC includes a set of general transcription factors (TFIIA, B, D, E, F and H) along with RNA polymerase II.
Transcription proofreading mechanism – Mistakes can occur as RNA polymerase copies DNA into transcripts. A proofreading mechanism that removes incorrect RNA is triggered by the erroneous RNA itself. Correction of errors at the growing end of the transcribed RNA is stimulated by the misincorporated nucleotide.

Transcription termination – Once RNA polymerase starts transcription, the enzyme continues to move along the template synthesizing RNA until it meets a signal to stop this synthesis. At this stage, the enzyme stops adding nucleotides to the RNA chain, releases the completed product and dissociates from the DNA template. The termination process in prokaryotes involves formation of a ‘hairpin structure’. The DNA sequence that provides the signal to stop transcription is called a terminator. Termination requires that all the hydrogen bonds holding the RNA-DNA hybrid together be broken. Termination of transcription occurs because the elongation complex is less stable when transcribing certain specific DNA sequences. Terminators may be rho-dependent or rho-independent.

Transcriptional complexity – Transcribed portions of the genomes of several organisms are larger and more complex than expected, and many functional properties of transcripts are based not on coding sequences but on regulatory sequences in untranslated regions or noncoding RNAs. Alternative start and polyadenylation sites and regulation of intron splicing add further dimensions to the rich transcriptional output.

Transcriptional control of eukaryotic gene regulation – Expression of mammalian genes may be controlled by repressors acting on the translation of messenger RNA. Eukaryotic DNA used for RNA polymerase recognition, binding and transcription shows the presence of a TATA box, analogous to the Pribnow box of prokaryotes. The TATA box lies about 25-30 nucleotides upstream of the point where transcription begins. In addition, a number of promoters show a 9-nucleotide sequence, GG(C/T)CAATCT, called the CAAT box. This CAAT box lies about 40 nucleotides upstream from the TATA box. Other sequences implicated in gene regulation in eukaryotes are enhancer sequences, which seem to be involved in recognition and binding of RNA polymerase to DNA. Enhancers also seem to be important in cell differentiation.

Transcriptional elements – Transcriptional elements have a consensus sequence.
They play a crucial role in gene expression. For example, maximal transcription of the heat shock protein hsp70 gene requires two copies of a 14-bp heat shock element (HSE) with the consensus sequence CNNGAANNTTCNNG.

Transcriptional factors (TFs) – Those proteins that are required for initiation of transcription. TFs are sequence-specific binding proteins that control transcription by RNA polymerase II. They interpret the genetic regulatory information, such as that in transcriptional enhancers and promoters. Salient properties of transcriptional factors are: (1) They are composed of functional modules. (2) They regulate transcription via recruitment of coactivators and corepressors. (3) They can be regulated by post-translational modifications. (4) They are often members of multiprotein families. (5) Chromatin is an integral component in the function of transcriptional factors. (6) Recognition sites for transcriptional factors tend to be located in clusters.

Transcriptional forests – Genomic mapping of the transcriptome reveals transcriptional forests, with overlapping transcription on both strands, separated by deserts in which few transcripts are observed. The data provide a comprehensive platform for the comparative analysis of mammalian transcriptional regulation in differentiation and development.

Transcriptional gene silencing (TGS) – The mechanism whereby a gene is made inaccessible to the transcriptional machinery, i.e., enzymes (RNA polymerase), transcriptional factors, etc. In this case, the end result is that RNA is not formed. Two prevailing models exist for silencing: (1) steric hindrance in silenced chromatin inhibits the binding of upstream activator protein or polymerase, or (2) silencing primarily blocks steps downstream of transcription preinitiation complex formation. Recent studies contradict both models.
It has been shown instead that transcriptional silencing at several URA3 transgenes, and the naturally silenced endogenous HMRa and HMRα mating type genes, acts downstream of gene activator protein binding to strongly reduce the occupancy of TFIIB, RNA polymerase II, and TFIIE at the silenced promoters.

Transcriptional hub – Why would two distinct genes on separate chromosomes and from different nuclear locations unite in response to signals for gene expression? They might be seeds for the formation of transcriptional hubs. In the absence of estrogen, two genes (TFF1 and GREB1) that are activated by this hormone and found on different chromosomes reside in different locations within the nucleus.

Transcriptional pausing – Transcriptional pausing by RNA polymerase plays an important role in the regulation of gene expression. Defined, sequence-specific pause sites have been identified biochemically. Single-molecule studies have also shown that bacterial RNA polymerase pauses frequently during transcription elongation. Elongation may be impeded by pause sites, which induce a temporary, reversible block to nucleotide addition, or by arrest sites (dead ends), which stop transcription until it is resumed by factors such as GreA and GreB. These elongation factors suppress transcription arrest, enhance transcription fidelity and facilitate the transition from
abortive initiation to productive elongation. During transcription, RNAP moves processively along a DNA template, creating a complementary RNA. This stage in early elongation appears to be an important, broadly used target of gene regulation. Also known as transcriptional stuttering.

Transcriptional processivity – The affinity of RNA polymerase for template DNA is referred to as transcriptional processivity.

Transcriptional regulators – DNA-binding transcriptional regulators interpret the genome’s regulatory code by binding to specific sequences to induce or repress gene expression. A transcriptional regulator may act as a therapeutic target.

Transcriptional regulatory networks – Transcriptional regulatory networks consist of physical and functional interactions between transcription factors (TFs) and their target genes.

Transcriptional unit – A sequence of DNA transcribed into a single RNA, starting at the promoter and ending at the terminator.

Transcriptome – Sum total of all RNA populations transcribed in a cell. The transcriptome is dynamic as it depends on cell type, developmental stage and environment. The transcriptome is more complex than the genome.

Transcriptome analysis – Refers to analysis of the transcriptional profile of a cell. Transcriptome analyses indicate that many of the noncoding regions, previously thought to be functionally inert, are actually transcriptionally active regions with various features.

Transcriptomics – A field of biological research which analyzes all RNA populations transcribed in a cell.

Transfer messenger RNAs (tmRNAs) – One of the noncoding RNAs that helps in bacterial translation and acts as a quality-control factor.

Transfer of amino acid to tRNA – The adenylated amino acid, which remains tightly bound to the synthetase, reacts with tRNA. Each amino acid thus gets bound to the acceptor stem of tRNA by a specific aminoacyl-tRNA synthetase.
These enzymes face two important challenges: they must recognize the correct set of tRNAs for a particular amino acid, and they must charge all of these isoaccepting tRNAs with the correct amino acid. Some features of tRNA, called identity elements, allow aminoacyl-tRNA synthetases to discriminate their isoaccepting tRNAs from the tRNAs for the other 19 amino acids. The acceptor stem is an especially important determinant for the specificity of tRNA synthetase recognition.

Transfer RNA (tRNA) – A noncoding RNA to which amino acids must attach prior to their incorporation into polypeptides. The secondary structure of tRNA has a characteristic cloverleaf configuration. It is further folded through alternative hydrogen bonding geometries (including Hoogsteen base pairs) into an L-shaped tertiary structure. tRNA contains several unusual (modified) bases. The principal features of the tRNA cloverleaf are an acceptor stem, three stem loops, which are referred to as the ψ loop, the D loop and the anticodon loop, and a fourth, variable loop. Transfer RNA is the direct interface between the amino-acid sequence of a protein and the information in DNA. Also known as adaptor RNA or soluble RNA.

Transfer RNA (tRNA) gene – A gene which on transcription yields a precursor of tRNA (pre-tRNA), which after various processing steps matures into functional tRNA.

Transformation – In oncogenesis, transformation means cancerous growth of cells.

Translation – The process in which genetic information contained within the order of nucleotides is used to generate the linear sequence of amino acids in a protein. Four steps involved in the process of translation are: charging of tRNA (preinitiation), initiation, elongation, and termination.

Translation elongation – The polypeptide chain is lengthened by covalent attachment of successive amino acid units, each carried to the ribosome by a tRNA that base-pairs with the corresponding codon in the mRNA at the A site. Elongation is promoted by elongation factors.
Elongation requires (a) the initiation complex, (b) the next aminoacyl-tRNA, specified by the next codon in the mRNA, (c) a set of three soluble cytosolic proteins called elongation factors (EF-Tu, EF-Ts, and EF-G), and (d) GTP.

Translation initiation – Three steps involved in this process are: (1) The 30S initiation complex is formed. It requires mRNA, the 30S subunit of the ribosome, a special initiating species of aminoacyl-tRNA (which apparently starts all polypeptide chains), and three initiation protein factors: IF1, IF2 and IF3. The initiation factor 3 (IF3) prevents the 30S and 50S subunits from combining prematurely. Binding of the mRNA to the 30S subunit then takes place in such a way that the initiation codon (AUG) binds to a precise location on the 30S subunit. The initiating AUG is guided to the correct position on the 30S subunit by an initiating signal called the Shine-Dalgarno sequence in the mRNA. The first amino acid incorporated in prokaryotes is N-formylmethionine. At this stage, formation of the 30S initiation complex is complete. (2) The complex consisting of the
30S subunit, IF3, and mRNA now forms a still larger complex by binding IF2, which is already bound to GTP and the initiating fMet-tRNAfMet. The anticodon of this tRNA pairs correctly with the initiation codon in this step. (3) This large complex combines with the 50S ribosomal subunit; simultaneously, the GTP molecule bound to IF2 is hydrolyzed to GDP and Pi (which are released). IF3 and IF2 also depart from the ribosome. The 70S initiation complex is thus formed. The precise function of IF1 is not known. However, it has been demonstrated to play a role in the initial phase of protein synthesis in E. coli. Studies reveal RNA chaperone activity of IF1. During initiation, the mRNA bearing the code for the polypeptide to be formed binds to the smaller subunit of the ribosome; this is followed by binding of the initiating aminoacyl-tRNA and the large subunit to form the initiation complex. The initiating aminoacyl-tRNA base-pairs with the mRNA codon AUG at the P site, which signals the beginning of the polypeptide chain.

Translation initiation complex – This step involves reactions prior to forming the peptide bond between the first two amino acids of the protein, which is a relatively slow step in protein synthesis. The initiation of polypeptide synthesis in prokaryotes requires (a) the 30S ribosomal subunit, which contains 16S rRNA, (b) the mRNA coding for the polypeptide to be made, (c) the initiating fMet-tRNAfMet, (d) a set of three proteins called initiation factors (IF1, IF2 and IF3), (e) GTP, (f) the 50S ribosomal subunit and (g) Mg2+.

Translation preinitiation – Activation and charging of transfer RNAs, binding of charged tRNA to the ribosome, and loading of mRNA onto the ribosome are important translation preinitiation steps.

Translation termination – The completion of the polypeptide chain is signaled by a termination codon in the mRNA. The polypeptide chain is then released with the aid of proteins called release factors (RFs).
Once a termination codon occupies the ribosomal A site, three termination or release factors, the proteins RF1, RF2 and RF3, contribute to (a) the hydrolysis of the terminal peptidyl-tRNA bond, (b) release of the free polypeptide and the last tRNA, now uncharged, from the P site, and (c) the dissociation of the 70S ribosome into its 30S and 50S subunits, ready to start a new cycle of polypeptide synthesis. Termination is signaled by one of three termination codons in the mRNA (UAA, UAG, UGA), immediately following the last amino acid codon. RF1 recognizes the termination codons UAG and UAA, and RF2 recognizes UGA and UAA. The class II RF3 is a GTPase that removes the class I RFs (RF1 and RF2) from the ribosome after release of the nascent polypeptide. RF3 activates RF1 and RF2.

Translational gene silencing – When gene silencing takes place through suppression of translation.

Translational initiation in eukaryotic gene regulation – Assembly of the eIF4E/eIF4G complex has a central role in the regulation of gene expression at the level of translational initiation. This complex is regulated by the 4E-BPs, which compete with eIF4G for binding to eIF4E and which have tumor-suppressor activity.

Trans-position – When two alleles or elements are present in different DNA strands.

Transposons – These elements, identified in many prokaryotes (bacteria) and eukaryotes (Drosophila, maize, etc.), have the ability to insert at various places in the genome. If they carry any other gene(s), they can insert these into the genome along with themselves. This system is thus capable of incorporating genes from one species into an unrelated species. This leads to an increase in genome size. This is also a mode for generating genetic variability. Also known as transposable genetic elements, insertion sequences,
transposable DNA, jumping DNA, jumping genes, or translocatable genetic elements.

Trans-TGS – Unidirectional TGS event affecting an active locus (i.e., a locus that does not undergo cis-TGS spontaneously) induced by allelic, ectopic or extrachromosomal homologous sequences. Compare with cis-TGS.

Triple-helical DNA – Triple-helical complexes can repress transcription primarily by blocking promoter DNA assembly into initiation complexes rather than by occluding Sp1 binding. Triple-helix-induced repression most likely involves changes in DNA flexibility.

Triplet code – Three consecutive nucleotides in DNA or RNA constitute one code word (codon).

tRNA acceptance – During transfer RNA (tRNA) selection, a cognate codon-anticodon interaction triggers a series of events that ultimately results in the acceptance of that tRNA into the ribosome for peptide-bond formation. High-fidelity discrimination between the cognate tRNA and near- and non-cognate ones depends on their differential dissociation rates from the ribosome and on specific acceleration of forward rate constants by cognate species.
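
The release-factor specificities given under Translation termination (RF1 for UAG and UAA, RF2 for UGA and UAA) can be expressed as a small lookup. The Python sketch below is illustrative only: the function name and the example mRNA strings are assumptions, and reading starts at the first AUG found, which is a simplification of real initiation.

```python
# Class I release factors that recognize each termination codon,
# as stated in the Translation termination entry.
RELEASE_FACTORS = {
    "UAA": ("RF1", "RF2"),
    "UAG": ("RF1",),
    "UGA": ("RF2",),
}

def find_termination(mrna, start="AUG"):
    """Read triplets in frame from the first start codon.

    Returns (number of sense codons read, stop codon, release factors),
    or None if there is no start codon or no in-frame stop codon.
    Hypothetical helper, for illustration only.
    """
    mrna = mrna.upper().replace("T", "U")
    begin = mrna.find(start)
    if begin == -1:
        return None
    for i in range(begin, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in RELEASE_FACTORS:
            return (i - begin) // 3, codon, RELEASE_FACTORS[codon]
    return None

# Made-up messages
print(find_termination("GGAUGGCUUUUUGACC"))
print(find_termination("AUGAAAUAA"))
```

In the first message the in-frame stop codon is UGA, so only RF2 is listed; in the second it is UAA, which either class I factor can recognize.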

tRNA CCA-adding polymerase – CCA-adding polymerases mature the essential 3'-CCA terminus of transfer RNA without any nucleic-acid template. Transfer RNA nucleotidyltransferases (CCA-adding enzymes) are responsible for the maturation or repair of the functional 3'-end of tRNAs by means of the addition of the essential nucleotides CCA.

tRNA gene family – Each type of tRNA gene has 10 to several hundred copies per haploid genome. The tRNA genes are also present in the form of heteroclusters. These heteroclusters may be tandemly repeated or dispersed at 50-60 sites on different chromosomes.

Trypanosome surface antigen switching – About 1,000 genes for ~100 different variant surface glycoproteins (VSGs) are scattered on different chromosomes in the trypanosome genome. Diversity depends on changing expression from one pre-existing gene to another. Each VSG is coded by a single basic copy gene, which may be telomeric or internal in location, and there may be several isogenes for the same VSG or similar VSGs. The copy of the gene that is active is called the expression-linked copy (ELC) and is located at an expression site. Creation of an ELC may involve transfer of a basic copy gene to the expression site or vice versa. Almost all switches in VSG type involve replacement of the ELC by a pre-existing silent copy.

Tryptophan (trp) operon in E. coli and Salmonella – The tryptophan operon of E. coli is probably the best known repressible operon. In this operon there are five different structural genes, viz., trpA, trpB, trpC, trpD and trpE, which encode the enzymes for synthesis of tryptophan from chorismate. Genes trpE and trpD code for the two polypeptides of anthranilate synthetase, trpC codes for indoleglycerol phosphate synthetase, trpB codes for the tryptophan synthetase β chain and trpA codes for the tryptophan synthetase α chain.
Flanking the five structural genes are promoter-operator-leader-attenuator (P-O-L-a) sequences (162 nucleotides) upstream and the t (36 nucleotides) and t′ (250 nucleotides) terminator sequences downstream. In this operon, the operator trpO lies within the promoter.

Two genes-one polypeptide hypothesis – The lambda chain seems to be encoded by two separate germline genes (a specificity region gene and a common region gene) which are expressed as a single continuous polypeptide chain. Antibody light chains appear to be an exception to the one gene-one polypeptide chain hypothesis. There is still a debate whether two genes-one polypeptide chain is fact or fiction.

Two-channel microarray – Two-channel microarrays are typically hybridized with cDNA prepared from two samples to be compared (e.g., diseased tissue versus healthy tissue) that are labeled with two different fluorophores. Fluorescent dyes commonly used for cDNA labeling include Cy3, which has a fluorescence emission wavelength of 570 nm (corresponding to the green part of the light spectrum), and Cy5, with a fluorescence emission wavelength of 670 nm (corresponding to the red part of the light spectrum). The two Cy-labeled cDNA samples are mixed and hybridized to a single microarray that is then scanned in a microarray scanner to visualize fluorescence of the two fluorophores after excitation with a laser beam of a defined wavelength. Relative intensities of each fluorophore may then be used in ratio-based analysis to identify up-regulated and down-regulated genes. Also known as two-color microarrays.

Type I restriction endonucleases – These are multimeric, complex, multifunctional enzymes, as they carry out both endonucleolytic and methylation activities. The best known examples of type I REs are EcoB and EcoK, which are variants of the hsd system. They require AdoMet, ATP, and Mg2+ for restriction. DNA sequences recognized in sB and sK strains are – sB: T-G-A-N8-T-G-C-T; sK: A-A-C-N6-G-T-G-C.
DNA cleavage sites are possibly random, at least 1,000 bp from the host specificity site.

Type I secretion system (T1SS) – T1SS consists of only three protein subunits: the ABC protein, the membrane fusion protein (MFP), and the outer membrane protein (OMP). T1SS transports various molecules, from ions and drugs to proteins of various sizes (20-900 kDa). The molecules secreted vary in size from the small E. coli peptide colicin V (10 kDa) to the Pseudomonas fluorescens cell adhesion protein LapA (900 kDa).

Type II restriction endonucleases – Type II restriction endonucleases recognize, bind and cleave at palindromic sequences. These enzymes have separate endonucleases and methylases. They require Mg2+ for restriction. DNA sequences recognized have two-fold symmetry. DNA cleavage sites are generally at the host specificity site. These enzymes are used in the construction of recombinant DNA molecules.

Type II secretion system (T2SS) – T2SS depends on the Sec or Tat system for initial transport into the periplasm. Once there, the secreted proteins pass through the outer membrane via a multimeric (12-14 subunits) complex of pore
forming secretin proteins. In addition to the secretin protein, 10-15 other inner and outer membrane proteins compose the full secretion apparatus.

Type III restriction endonucleases – These are complex multifunctional enzymes. They require AdoMet, ATP, and Mg2+ for restriction. DNA sequences recognized in sP1 and s15 strains are – sP1: A-G-A-C-C; s15: C-A-G-C-A-G. DNA cleavage sites are 24-27 bp to the 3' side of the host specificity site. Three Type III modification and restriction enzymes are recognized: EcoP1, EcoP15 and HinfIII. Each enzyme consists of two subunits, R and MS. The R subunit is responsible for restriction whereas MS is responsible for both modification and recognition. Modification and restriction activities are performed simultaneously.

Type III secretion system (T3SS) – T3SS is like a molecular syringe through which a bacterium can inject proteins into eukaryotic cells. The low Ca2+ concentration in the cytosol opens the gate that regulates T3SS.

Type IV secretion system (T4SS) – T4SS is homologous to the conjugation machinery of bacteria. It was discovered in Agrobacterium tumefaciens, which uses this system to introduce the T-DNA portion of the Ti plasmid into the plant host, which in turn causes the affected area to develop into a crown gall (tumor).

Type V secretion system (T5SS) – T5SS is also called the autotransporter system. It involves use of the Sec system for crossing the inner membrane. Proteins which use this pathway have the capability to form a beta-barrel with their C-terminus, which inserts into the outer membrane, allowing the rest of the peptide (the passenger domain) to reach the outside of the cell.

Type VI secretion system (T6SS) – T6SS gene clusters contain from 15 to more than 20 genes, two of which, Hcp and VgrG, have been shown to be nearly universally secreted substrates of the system. Structural analysis of these and other proteins in this system reveals a striking resemblance to the tail spike of the T4 phage.
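
The two-fold symmetry noted under Type II restriction endonucleases means that a recognition site reads the same on both strands, i.e., it equals its own reverse complement. A minimal Python sketch of that check follows. The function names and the test sequence are hypothetical; GAATTC (the well-known EcoRI site) is an example added here and is not taken from the text, while CAGCAG is the s15 sequence of the Type III entry.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def is_palindromic(site):
    """True if the site is identical to its reverse complement."""
    return site.upper() == reverse_complement(site)

def palindromic_sites(seq, length=6):
    """List (position, site) for every palindromic window of the given length."""
    seq = seq.upper()
    return [
        (i, seq[i:i + length])
        for i in range(len(seq) - length + 1)
        if is_palindromic(seq[i:i + length])
    ]

print(is_palindromic("GAATTC"))   # symmetric site (EcoRI, added example)
print(is_palindromic("CAGCAG"))   # asymmetric s15 sequence (Type III)
print(palindromic_sites("TTGAATTCAA"))
```

The asymmetric Type III sequence fails the check, which is consistent with the glossary attributing two-fold symmetry to Type II sites only.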
Types of introns – Four different types of introns are known. Type 1 introns are self-splicing and are found in nuclear, mitochondrial and chloroplast rRNA, tRNA and mRNA genes. In Type 1 introns, the 3′ OH group of a free guanosine acts as a nucleophile, attacking the 5′ phosphate at the splice site and displacing it from the exon. The 3′ OH at the exon, usually a U, then acts as a nucleophile, attacking the phosphodiester bond, removing the other end of the intron, and reforming the phosphodiester bond with itself. Type 2 introns are also self-splicing, and are found in genes in mitochondria and chloroplasts of fungi, algae and plants. In Type 2 introns, a similar mechanism is used, except that an internal 2′ OH of an adenosine is used as the nucleophile, attacking the 5′ splice site of the exon to produce a lariat structure. In the final step, the 3′ OH of the 5′ exon attacks the phosphodiester bond of the 3′ uracil to remove the lariat. Type 3 introns require a spliceosome and small nuclear ribonucleoproteins (snRNPs) and are responsible for solely eukaryotic splicing. In Type 3 introns, a lariat structure is formed, but the snRNPs are used to form the secondary structures and to mediate the reaction, though it still occurs without free energy input. There are 6 snRNP subunits that make up the spliceosome. Type 4 introns require ATP and an endonuclease.

Types of transfer RNAs – There are two types of tRNAs based on the size of the extra or variable loop – Class I tRNAs and Class II tRNAs.
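
The ratio-based analysis mentioned in the Two-channel microarray entry earlier in this section is commonly carried out on log2 ratios of the two channel intensities. The Python sketch below is only an illustration under stated assumptions: intensities are taken as already background-corrected, Cy5 is assumed to label the test sample and Cy3 the reference, the two-fold threshold is arbitrary, and the gene names and values are made up.

```python
import math

def classify(cy5, cy3, threshold=1.0):
    """Return (log2 ratio, call) for one spot.

    threshold is in log2 units, so 1.0 corresponds to a two-fold change.
    Hypothetical helper, for illustration only.
    """
    ratio = math.log2(cy5 / cy3)
    if ratio >= threshold:
        call = "up-regulated"
    elif ratio <= -threshold:
        call = "down-regulated"
    else:
        call = "unchanged"
    return ratio, call

# Made-up (Cy5, Cy3) intensities for three spots
spots = {
    "geneA": (4000.0, 1000.0),
    "geneB": (500.0, 2000.0),
    "geneC": (1100.0, 1000.0),
}
results = {name: classify(cy5, cy3) for name, (cy5, cy3) in spots.items()}
for name, (ratio, call) in results.items():
    print(f"{name}: log2 ratio {ratio:+.2f} ({call})")
```

Working on the log2 scale makes a four-fold increase and a four-fold decrease symmetric about zero, which is why it is usually preferred to the raw ratio.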

U

Ubiquitin – A small regulatory protein found in almost all tissues of eukaryotic organisms. It directs proteins to compartments in the cell, including the proteasomes, which destroy and recycle proteins.

Ubiquitination – A post-translational modification process in which the carboxylic acid of the terminal glycine from the di-glycine motif in activated ubiquitin forms an amide bond with the epsilon amine of a lysine in the modified protein.

Unfolded protein response – Protein folding in the endoplasmic reticulum is a complex process whose malfunction is implicated in disease and aging. By use of the cell’s endogenous sensor (the unfolded protein response), several hundred yeast genes with roles in endoplasmic reticulum folding have been identified, and their functional interdependencies have been systematically characterized by measuring unfolded protein response levels in double mutants.

Unicistronic mRNA – When one messenger RNA contains information from only one cistron (gene) and is used as a template to produce, under defined conditions, only one polypeptide. Eukaryotic mRNAs are unicistronic.

Unique DNA sequences – The majority of the structural genes are unique DNA sequences. The role of noncoding unique DNA sequences in evolution is still very obscure. While most of these unique DNA sequences do not
code for proteins and are not transcribed (regulatory regions such as promoters and other sequences of as yet unknown function), some of them (introns, leader sequences and intercistronic sequences) are transcribed in association with structural genes. Any mutation that interferes with the correct removal of intron sequences may have a considerable effect on the control of gene expression. Also known as non-repetitive DNA sequences or single-copy DNA sequences.

Universality of code – Utilization of the same genetic code in all organisms. A codon codes for the same amino acid in all organisms. Mitochondrial genetic systems tolerate relatively frequent changes in codon assignment.

Up mutations – Mutations that increase the level of transcription are known as up mutations. Up mutations can be produced by changes involving only a single base pair, which shows that the reaction of RNA polymerase with these sites must be highly specific.
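
The Universality of code entry notes that mitochondrial systems tolerate changes in codon assignment. The Python sketch below contrasts the standard code with the vertebrate mitochondrial code as an illustration. The four reassignments used (UGA to Trp, AUA to Met, AGA and AGG to stop) are well-established examples added here, not taken from the text, and the mRNA string and function name are made up.

```python
# Standard genetic code in the conventional U, C, A, G ordering
# (one-letter amino acid codes, '*' for a termination codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Vertebrate mitochondrial code: same table with four reassigned codons.
VERTEBRATE_MITO = dict(STANDARD, UGA="W", AUA="M", AGA="*", AGG="*")

def translate(mrna, table):
    """Translate in frame from the first codon up to the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

message = "AUGAUAUGAAGAUUU"  # made-up mRNA: AUG AUA UGA AGA UUU
print(translate(message, STANDARD))
print(translate(message, VERTEBRATE_MITO))
```

The same message yields different peptides under the two tables: UGA terminates translation in the standard code but is read as tryptophan in the mitochondrial one, where AGA terminates instead.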

V

V(D)J recombination – Developing B lymphocytes assemble immunoglobulin genes from widely scattered gene segments, using a somatic DNA rearrangement process known as V(D)J recombination. V(D)J recombination is catalyzed by a “recombinase” enzymatic machinery. An immunoglobulin heavy chain variable region gene is generated from three sequences of DNA: VH, D and JH. Conserved hepta- and decanucleotides in the vicinity of V, D, and J genes form the recognition system required for recombination. These conserved sequences are separated by a 10-11 or 21-23 bp non-homologous stretch of nucleotides. Presumably similar sequences are found in the vicinity of D regions of the heavy chain genes. Histone H3 trimethylation at lysine 4 is involved in V(D)J recombination. V(D)J recombination assembles antigen receptor genes from component gene segments.

Vaccines – The antigens that do not reproduce but elicit production of antibodies.

Variant repetitions of DNA sequences – Some repetitive DNA sequences differ in the number of repeats in different individuals. The different members of a variant repetitive gene family evolve by gene duplication and divergence of an ancestral gene.

Viral oncogenes – Many of the oncogenic retroviral genomes carry coding segments other than the usual viral genes. These segments are modified cellular genes that account for oncogenicity. Viral oncogenes are altered normal cellular genes.

Virus-induced gene silencing (VIGS) – A form of post-transcriptional gene silencing (PTGS) that is induced by viruses rather than transgenes.

VSG switching – Trypanosoma brucei is the causative agent of African sleeping sickness in humans and one of the causes of nagana in cattle. This protozoan parasite evades the host immune system by antigenic variation, a periodic switching of its variant surface glycoprotein (VSG) coat. Introduction of a DNA double-strand break (DSB) adjacent to the ~70 bp repeats upstream of the transcribed VSG gene increases switching in vitro.
A DSB is a natural intermediate of VSG gene conversion, and VSG switching results from repair of this DSB by break-induced replication. Antigenic switching is induced by a single I-SceI-generated DSB.

W

Western blotting – A technique used to identify and locate proteins based on their ability to bind to specific antibodies. Western blotting can give information about the size of a protein (by comparison with a size marker or ladder in kDa), and also about protein expression (by comparison with a control such as an untreated sample or another cell type or tissue).

Wobble rules – The base at the 5′-end of the anticodon is not as spatially confined as the other two, allowing it to form hydrogen bonds with any of several bases located at the 3′-end of the codon. For example, U at the wobble position can pair with either adenine or guanine, while I (inosine) can pair with U, C or A.
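
The Wobble rules entry can be turned into a small lookup that lists the codons a given anticodon can read. In the Python sketch below, the pairings for U and I at the wobble position are those stated in the entry; the pairings for G (with C or U), C (with G) and A (with U) are added from the standard wobble hypothesis and are not in the text. The function name and the example anticodons are chosen for illustration.

```python
# Watson-Crick partners for the two non-wobble positions.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Codon 3'-end bases that each anticodon 5'-end (wobble) base can pair with.
# U and I are as stated in the entry; G, C and A are added assumptions
# taken from the standard wobble hypothesis.
WOBBLE = {"U": "AG", "I": "UCA", "G": "CU", "C": "G", "A": "U"}

def codons_read(anticodon):
    """Return the codons (5'->3') readable by an anticodon written 5'->3'.

    Pairing is antiparallel: the anticodon's 3' base pairs with the codon's
    first base, and the anticodon's 5' (wobble) base with the codon's third.
    """
    first, middle, last = anticodon.upper()
    return [
        WATSON_CRICK[last] + WATSON_CRICK[middle] + third
        for third in WOBBLE[first]
    ]

print(codons_read("IGC"))  # inosine at the wobble position
print(codons_read("UUU"))
print(codons_read("CAU"))
```

An anticodon with I at the wobble position reads three codons, one with U reads two, and one with C reads only a single codon.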

Y

Yeast mating type switching – A remarkable feature in yeast is the ability of some strains (those which carry the dominant HO allele, and not those which carry the recessive ho allele) to switch their mating type from a to α and vice versa. Consequently, an HO haploid strain, irrespective of its original mating type, will soon have cells of both mating types, giving diploids (MATa/MATα). This is achieved by the presence of a MAT (a or α) locus with an active cassette (type a or α) associated with two silent cassettes at loci HML(α) on the left and HMR(a) on the right. The switching is achieved by the activity of an endonuclease coded by the dominant gene HO.

Yeast protein RAP1 – The yeast protein RAP1 was initially described as a transcriptional regulator that binds in vitro to sequences found in a number of seemingly unrelated genomic loci. These sequences include the silencers at the transcriptionally repressed mating-type genes, the promoters of many genes important for cell growth and the poly[(cytosine)1-3 adenine] repeats of telomeres. RAP1 may be involved in telomere formation in vivo.

Yeast transcription activator GCN4 – A “bZIP” DNA-binding protein consists of a basic region that contacts DNA and an adjacent “leucine zipper” that mediates protein dimerization. A peptide model for the basic region of the yeast transcription activator GCN4 has been developed in which the leucine zipper has been replaced by a disulfide bond.

Yeast transcription regulator – A yeast transcription regulator has two functions – DNA binding and gene activation. These functions can be separated. The yeast GAL4 regulator consists of two domains: the amino domain, which recognizes sequences in the galactose upstream activating sequence (UASG), and the carboxyl domain, which, when bound to DNA through the amino domain, contacts another protein to mediate positive control.

Yeast two-hybrid (Y2H) system – Yeast two-hybrid is an in vivo system for protein-protein interaction studies. A transcription factor generally consists of a DNA-binding domain (DBD) and an activation domain (AD), the former helping in DNA binding and the latter facilitating the activation of a gene lying downstream.
If one wants to study the interaction between two proteins X and Y, two hybrid gene constructs are prepared, one having a hybrid gene coding for the fusion protein DBD+X along with a reporter gene and the other coding for the fusion protein AD+Y. Yeast cells are co-transformed with both hybrid constructs. If X and Y interact physically, DBD and AD are brought into close proximity. This activates transcription of the reporter gene, which is thus expressed. The yeast two-hybrid system is therefore used for the study of protein-protein interactions.

Z

Zinc fingers – Zinc fingers constitute important eukaryotic DNA binding domains, being present in many transcription factors. Also known as zinc finger proteins.

Subject Index A typical promoter 12.9 A typical RNA operon 17.13 Ac element 10.11 Ac transposition mechanism 10.13 Actin gene family 14.4, G.1 Activating transcription factor (ATF) 24.20 Activation of tRNA 19.6, G.1 Activator protein 24.7 RNA 24.6, 24.7 Adaptor RNA 19.3-19.4, G.53 Addition of cap at 5′-end of pre-mRNA 17.1-17.2, G.1 poly(A) tail at 3′-end 17.2, G.1 Adenine methylation in mRNA 25.20-25.21, G.1 A-DNA 3.6 AdoMet complex 25.7 Advantages of Watson-Crick model 3.5 Alanine transfer RNA in yeast 17.8 Alarmones 22.24-22.25, G.1 Allelic exclusion 13.8, G.1 mutations G.1 Alteration of DNA-binding specificity 24.19 Alternate base-pair conformations in trp leader 12.14 forms of DNA 3.6-3.13 Alternative RNA splicing, 15.13-15.14, 17.18-17.21, G.1 functional effects of 17.20 future goals of 17.20-17.21 patterns of 17.18-17.20 alternative donor and acceptor splice sites 17.20 promoters and polyadenylation sites 17.20 combinatorial exons 17.19 mutually exclusive patterns 17.19 retained introns 17.19 sigma factors 22.1-22.2, G.1 Ambiguous code 18.2, G.2

Amino acid sequences of light and heavy chains 13.2 Aminoacyl site (A site) 19.8 Amplified fragment length polymorphism (AFLP) 27.37, G.2 Anaphase-promoting complex/cyclosome 24.22-24.23 Anchoring enzyme 21.4 Ancient jumping DNA 13.16 Ancillary sites 12.10, G.2 Aneuploids, production of 9.7 Aneuploidy and cancer 13.20-13.21 Animal mitochondria 4.5 picornaviruses 2.8 Antibody diversity G.2 Anticodon 18.2, G.2 Antigen -marked macrophages 23.16 receptor genes 13.8 Antioncogenes 13.22-13.23, G.2 Antirepressor 23.9 Antisense DNA strand 12.8, G.2 molecules as anticancer drugs 13.23-13.24 oligonucleotides 24.17-24.18 RNA technology G.2 transcript stabilization 24.14 Antitermination 12.11-12.12, G.2 Antiterminator protein pN 23.12 Antiterminators 12.11, G.2 Aporepressor 22.16 Arabinose (ara) operon of E. coli 22.12-22.13, G.2 Arginine synthesis in Neurospora 15.5-15.6 Argonaute (AGO) proteins 26.14-26.15, G.2 Artificial gene repressors 24.17 Assays for SNPs, gel-based 27.37-27.38 non-gel-based 27.37-27.38 molecular beacon 27.38

TaqMan assay 27.38 Attenuation 12.12, G.2 Attenuator(s) 22.21-22.22, G.2 AU-rich elements (AREs) 19.10 Autogenous circuit of repressor production 23.8 maintenance of repressor 23.11-23.12 regulation G.2 Autonomous transposons 10.17 Autoregulation of repressor synthesis 23.11-23.12 Autotransporter system 20.4, G.3 Avery et al. (1944) experiment 2.3 Bacterial signal sequences G.3 Bacterial methylase systems 25.2, G.3 dam system 25.2, G.8 dcm system 25.2, G.8 hsd system 25.2, G.19 Bacterial nucleoid 6.3 promoters 12.8-12.9, G.3 RNA polymerase G.3 transposable elements 10.3-10.4 Bar variants of Drosophila 9.4 B-DNA 3.6-3.7 Beads-on-a-string model 6.11 β-galactosidase 22.4, G.3 β-galactoside permease 22.4 Bidirectional replication 7.4, 7.12, 7.28 Binding sites on the ribosome for tRNA 19.6-19.7 Biosynthetic pathway of tryptophan 15.9 Bipolar gene regulation G.3 Blank codons G.3 Blunt DNA ends G.3 Bone morphogenetic protein (BMP) 26.17 Branch migration 8.13, 8.18 Britten-Davidson model 24.6-24.7, G.3 C+.G.C. triplets 3.10 C+GC and TAT base pairing 3.10 CAAT box 12.17, 24.6, G.3 Cairns' experiment 7.3 cAMP-CAP complex 22.10 cAMP-cga 22.9 cAMP-CRP complexes 24.24 Cap-independent mRNA translation 19.18, G.3-G.4 Caps of mRNA 17.2 Cardinal nucleotide G.4 CAT/Enhancer-binding protein 24.15-24.16 Catabolic gene activator (cga) 22.9 Catabolite activator protein (CAP) 22.9, G.4 CAT-binding transcription factors 24.13 Categorizing regulated genes 21.9 Catenation by topoisomerase 7.20

CCCTC-binding factor (CTCF) 24.23 C-DNA 3.8 cDNA chips 27.39 Cell cycle control of 24.21 memory 25.15 Cellular genomics 21.6 transformation G.4 Cellulose utilization operon of E. coli 22.13, G.4 Central dogma and its modification 16.1-16.3 Central dogma of molecular biology 16.3, G.4 Central dogma, new contemporary 21.10, G.4 Centriole DNA 4.9 Change from lysogeny to lytic cycle 23.13-23.14 in coding regions 14.12-14.13 DNA sequences 14.12-14.13 genome size 14.10-14.12 noncoding regions 14.13 Chaperone of histone H3/H4 6.10 Chaperonins 20.13-20.14, G.4 Charge-coupled device (CCD) 27.40 Charging of transfer RNAs 19.5-19.6, G.4 Chemical complexity 5.7-5.8 composition of chromosomes 6.7-6.10 mutagenesis 9.35-9.41 Chemicals affecting replicating nucleic acids 9.35-9.37 resting nucleic acids 9.37-9.41 Chiasma (plural chiasmata) 8.6 Chi-sites 8.10 Chloroplast DNA packaging 6.19 transcription 16.26, G.4 maternally inherited 4.2-4.3, 4.4 gene introns G.4-G.5 genomes 4.1 mutations 9.43-9.44 Chromatin 6.18 assembly factor 1 (CAF1) 6.13 dynamics and gene expression 24.17 regulators G.5 Chromodomains G.5 Chromomeres 6.17 Chromonema (plural chromonemata) 6.17 Chromosome end replication problem 7.22-7.23 imprinting 25.13-25.16, G.5

jumping/hopping 27.12 landing 27.12-27.13 mis-segregation 9.8 packaging, rules of 6.11 segregation 8.13 walking 27.11-27.12 breakage during crossing-over 8.7 Cis -position 12.3, G.5 TGS 25.4, G.5 trans complementation test 12.3 trans effect 12.2 Cistron 12.6, G.5 Class discovery 21.8-21.10, G.5 Classification of eukaryotic transposons based on nature of terminal repeats 10.5-10.17 Classification of mutations based on 9.1-9.17 cause of mutations 9.8 induced mutations 9.8 spontaneous mutations 9.8 changes in chromosome number 9.6-9.8 monoploid 9.6 haploid 9.6 aneuploids 9.7-9.8 trisomic 9.7 disomic 9.7 monosomic 9.7 double trisomic 9.7 double monosomic 9.6 nullisomic 9.7 tetrasomic 9.7 euploids 9.8 allopolyploids (amphidiploids) 9.8 autopolyploids 9.8 hexaploids 9.8 pentaploids 9.8 tetraploids 9.8 triploids 9.8 changes in chromosome structure 9.2-9.6 deletion 9.2-9.3 duplication 9.3 inversion 9.4 paracentric 9.4 pericentric 9.4 translocation 9.4-9.6 intercalary 9.5 simple 9.5 reciprocal 9.5 degree of character expression 9.17 amorphic mutations 9.17 hypermorphic mutations 9.17 hypomorphic mutations 9.17 isoallelic mutations 9.17


change in the sense of codon 9.10 same-sense mutation 9.10-9.11 missense mutation 9.11 nonsense mutation 9.11 direction of mutation 9.10-9.12 forward mutation 9.10 reverse (back) mutation 9.10 exact reversion 9.10 equivalent reversion 9.10 expression or effect on function 9.13-9.14 morphological mutations 9.13 conditional mutations 9.13 biochemical mutations 9.14 resistance mutations 9.14 polar mutations 9.14 loss-of-function mutations 9.14 gain-of-function mutations 9.14 dominant negative mutations 9.14 location of genes 9.15-9.16 cytoplasmic mutations 9.16 nuclear mutations 9.15 autosomal mutations 9.15 sex-linked mutations 9.15 holandric mutations 9.15 X- and Y-linked mutations 9.15 relevance to evolution 9.17 favored mutations 9.17 forbidden mutations 9.17 size of mutation 9.1-9.2 intermediate lesions 9.1 macrolesions 9.2 microlesions 9.1 survival of the organism 9.14-9.15 lethal 9.14 recessive lethals 9.14 dominant lethal 9.14 conditional lethal 9.14 balanced lethal 9.14 gametic lethal 9.14-9.15 sub/semi lethal 9.15 supervital 9.15 vital 9.15 type of amino acid substituted 9.16 conservative substitutions 9.16 drastic substitutions 9.16-9.17 type of damage 9.8-9.10 altered bases 9.9 blocked DNA replication 9.9 cross-linking of DNA bases 9.9 double-strand breaks 9.9 incorrect base 9.9

misreading by RNA polymerase 9.9 missing bases 9.9 single-strand breaks 9.9 translesion replication type of tissue in which mutation induced 9.13 somatic mutations 9.13 germinal mutations 9.13 type of suppressor mutation 9.12-9.13 intragenic 9.12 frameshift of opposite type at a second site 9.12 second site missense mutations 9.13 extragenic suppressors 9.13 nonsense suppressor 9.13 missense suppressor 9.13 frameshift suppressor 9.13 Class-switch recombination 13.7 Clathrin-coated vesicles 20.9 Clock theory 24.22 Coactivator proteins 24.22, G.5 Code for pyrrolysine 18.10 selenocysteine 18.10 letter 18.2, G.5 Coding dictionary 18.2, G.5 DNA 5.8 strand 12.8 RNA 3.13, G.5 strand 16.2, G.45 Codon 18.2 length 18.2, G.6 Cohesin complex 24.23 Cohesinopathies 24.23 Cohesins 8.7 Cohesive ends 27.44, G.6 Colinearity 5.9, 12.24-12.27, G.6 of code 18.6 Colony/plaque hybridization 27.10-27.11 Combinatorial exons 17.18, G.6 Commaless code 18.2, 18.4 Complementary DNA (cDNA) G.40 Complementation 12.2, G.6 map 12.5-12.6 mapping 12.2 matrix 12.5-12.6 Complex gene 12.17-12.18, G.6 promoters 24.12, G.6 transcription units G.6 Components of eukaryotic genomes 5.11 Compound gene 12.13-12.17, G.6

Condensins in bacteria 6.6 Conjugation 8.3-8.4, 8.5 Consensus sequence 12.9, G.6 Constitutive gene 22.2, G.6 transcription factors 24.12 Construction of short oligonucleotides 27.4, 27.5 Continuous synthesis of leading strand 7.10-7.11 Controlling elements G.6 region of lac operon 22.9-22.10, G.6 Convergent transcription G.6-G.7 Conversion of ATP into cAMP and to AMP 22.11 Cooperation between nuclear and chloroplast DNA 19.22, G.7 mitochondrial DNA 19.22-19.23, G.7 response genes 13.19-13.20 Coordinated gene expression G.7 ribosomal biogenesis and translation 19.19 Copia gene family 14.5, G.7 Core enzyme 16.4, G.3 particle of nucleosome 6.14 Corepressors 22.3, G.7, G.39 Cosuppression 26.5 in plants 26.17, G.7 Cot curve 5.3 of mouse DNA 5.5 Cot½ 5.4 Co-transcriptional cleavage (CoTC) 16.25, G.7 Co-translational import 20.7 protein translocation G.7 targeting in eukaryotes 20.7-20.9 transfer G.7 translocation 20.8 Coupled expression 4.11 CpG doublets 25.1 islands 25.14 methylation maintenance G.7 Crick strand 12.8, 16.2, G.45 Cro protein 23.10, 23.14 Crossing-over event 8.17-8.18 Crossing-over, base pairing in 8.17 specific enzymes in 8.18 Cross-talk at composite response elements 24.19-24.20

between histone modifications 25.27 Cruciform DNA 3.9 Cryptic genetic variation G.7 satellite DNA 5.19 unstable transcripts (CUT) G.31 Cryptomorphic gene 12.18 Curved and bent DNA 3.10 C-value 5.1-5.2 paradox 5.2-5.3 Cyanelle DNA 4.9 Cyclic adenosine monophosphate (cAMP) G.7 Cyclobutane pyrimidine dimers 11.21 Cytoplasmic gene control G.7 Cytosine methylation in DNA 25.1, G.7-G.8 RNA G.8 vertebrates 25.7-25.9 Daisy-like rosette 6.17 D-DNA 3.8 Deciphering the genetic code 18.7-18.9 Deep division in DNA code 18.7 Defense genes 13.1-13.26 Defining gene 12.18-12.19 Degenerate code 18.2, 18.4-18.5, G.8 Deletion loop 9.3 mapping method 12.4 /deficiencies 14.11 Demethylation models 25.9, G.8 Deoxyribonucleic acid (DNA) G.8 Deoxyribonucleases (DNases) 27.41, G.8 Detection of mutations in 9.17-9.27 bacteria 9.18 Ames test 9.18-9.19 replica-plating technique 9.18 Drosophila 9.20-9.23 autosomal recessive lethals 9.21-9.22 chromosomal aberration detection 9.22 dominant lethal test 9.22 sex-linked recessive lethals 9.20 ClB technique 9.20-9.21 Muller-5 technique 9.21 yellow-Bar test 9.21 visible mutations 9.22 toxicity test 9.23 translocation test 9.22-9.23 man 9.24 Neurospora crassa 9.19 plants 9.19-9.20 auxotrophic mutants 9.19 cytological effects 9.19-9.20 disease-resistant mutants 9.19-9.20

mutations at unspecified loci 9.20 for endosperm characters 9.20 recessive mutations 9.20 small mammals 9.23-9.24 dominant lethal assay 9.24 host-mediated assay 9.24 mammalian spot test 9.24 microbial-like selection technique 9.24 micronucleus test 9.23 viruses 9.17 molecular methods 9.24-9.27 diagnostic assays allele-specific amplification 9.26-9.27 oligonucleotide hybridization 9.26 artificial introduction of restriction sites 9.27 ligation 9.27 primer extension 9.27 screening methods 9.25-9.26 cleavage of DNA heteroduplexes with bacteriophage resolvases 9.25 oligonucleotide ligation assay 9.25 PCR-single strand conformation method 9.25 with carbodiimide 9.26 denaturing gradient gel electrophoresis 9.25-9.26 ribonuclease 9.25 Differences in prokaryotic and eukaryotic DNA replication 7.14 Differential RNA processing G.8 Differentiation and genetic cascades 24.14-24.15 Digested random amplified microsatellite polymorphism (dRAMPs) 27.36 Dimorphic gene 12.8 Direction of replication 7.3-7.4 Directionality problem of replisome 7.10 Discontinuous synthesis of lagging strand 7.11 Disease -causing mutations in mitochondrial DNA 4.8 due to DNA repair defects 11.34 Dissociator (Ds) locus (element) 10.11 element, variants of 10.13 Distribution of methylated sequences 25.4 Divergent multigene family G.8 transcription G.8 Divided operons G.8 Divisive introns G.8 DNA

and non-DNA repair mechanisms 11.1-11.40 damage checkpoints and response 11.1 double-strand break sensing and repair pathways 11.29 end-joining reaction 7.17 fingerprinting 27.31-27.32, G.8 footprinting G.8 hybridization 5.5 kinetics 5.1-5.8 looping G.8-G.9 markers 27.33-27.39, G.9 restriction fragment length polymorphisms (RFLPs) 27.33 methylase specificity 25.4 methylation analysis 25.7-25.8, G.9 and gene expression 25.8, G.9 in differentiation 25.10 gene regulation 25.10 gene silencing 25.11 genome stability 25.11 invertebrates 25.18 plants 25.18-25.19, G.9 transposable element silencing G.9 of imprinted genes G.9 methylation, characteristics of 25.3-25.5 cyclical changes in 25.20 methyltransferase 1 (Dnmt1) 25.1 modification 25.1 packaging 5.9 in bacteria 6.3-6.6 bacterial plasmids 6.6 eukaryotic organelles 6.19 nucleus of eukaryotic cells 6.6-6.18 T4 phage 6.2 viruses/bacteriophages 6.1-6.3 DNA polymerase I 7.4-7.5 II 7.5 III 7.5-7.6 III β subunit 7.6 IV 7.6 V 7.6 processivity 7.9 profiling 27.31-27.32 reassociation 5.5 recombination in bacteria 8.1-8.4 bacteriophages 8.1 chloroplasts 8.23

eukaryotes 8.4 mitochondria 8.23 renaturation 5.5 repair mechanisms in eukaryotes 11.20-11.34 DNA damage tolerance 11.20, DNA direct reversal of damage 11.20-11.21 alkyl DNA transferase-mediated repair 11.20 photoreactivation 11.20-11.21 DNA damage removal 11.21-11.26 excision repair 11.21 base excision repair 11.22-11.23 long-patch BER 11.22-11.23 short-patch BER 11.22-11.23 nucleotide excision repair 11.23-11.25 global genome repair 11.23 transcription-coupled repair 11.24 mismatch repair 11.25-11.26 DNA damage tolerance 11.26-11.34 alternate end-joining (Alt-EJ) 11.31-11.32 microhomology-mediated end-joining 11.31-11.32 synthesis-dependent microhomology-mediated end-joining 11.31-11.32 homologous recombination repair 11.28 non-homologous end-joining 11.29-11.31 repair of double-strand breaks 11.20, 11.26-11.34 post-replication repair 11.27-11.32 translesion synthesis (TLS) 11.32-11.34 repair mechanisms in prokaryotes 11.1-11.20 direct DNA damage repair 11.2-11.4 alkyltransferase-mediated damage reversal 11.2 photoreactivation 11.3-11.4 DNA damage removal 11.4-11.14 excision repair 11.4-11.5 base excision repair 11.5-11.8 oxidative damages 11.8 thymidine dimers 11.7 incision 11.5 pre-excision step 11.5 excision and reinsertion 11.5 short-patch repair 11.5 long-patch repair 11.5 nucleotide excision repair 11.8-11.9 mismatch repair 11.9-11.13 correction 11.11 proteins involved 11.12 DNA damage tolerance 11.14-11.20 recombinational (post-replication) DNA repair 11.14-11.15, 11.17 SOS

regulon 11.17-11.19 regulatory system 11.18 response 11.16-11.17 translesion synthesis 11.14 replication in chloroplasts 7.25-7.26 eukaryotic nuclear chromosomes 7.12-7.24 features of 7.16 initiation 7.17 mitochondria 7.24 D-loop model 7.24 prokaryotes 7.2 viruses 7.12 steps of 7.13 models of 7.26-7.27 Butterfly model 7.26 D-loop model 7.27 Fork model 7.27-7.28 Loop rolling model 7.26, 7.27 Rolling circular model 7.26, 7.27 steps of 7.6-7.11 elongation 7.9-7.11 gap filling 7.15 initiation 7.6-7.9 termination 7.11 sites 7.11 separation from proteins, lipids, RNA and carbohydrates 27.3 sequence of cancer genomes 13.17 polymorphism 14.13 sequencing 27.17-27.22 automated 27.20-27.21 chemical-cleavage method 27.17 dideoxy method 27.19 enzymatic method 27.17-27.18 structure, some facts about 3.5-3.6 synthesis 27.23-27.29 from precursor tRNA 27.27-27.28 automated solid-phase technique 27.30-27.31 organo-chemical DNA synthesis 27.24-27.26 various approaches 27.24 phosphodiester method 27.24 phosphoramidite approach 27.4 phosphoryl chloridate method 27.4 phosphotriester method 27.4 solid phase peptide synthesis (SPPS) 27.24 tags 27.39, G.9 topoisomerases 16.8, G.10 type I 16.8, G.9 type II 16.8, G.9 transmethylase 25.10

transposition 10.1-10.27 viruses 2.7, G.10 double-stranded 2.7 single-stranded 2.7 wrapping by histone-like protein HU 6.5 structure of 3.1-3.18 /RNA chip technology 27.39 /RNA hybridization 5.14 Domains G.10 Domino theory 24.22 Dosage compensation 25.15-25.17 complex (DCC) G.10 recruitment map 25.17 Dot blots and slot blots 27.8-27.9 Double -helix structure for DNA 3.1-3.5 stranded DNA, reassociation patterns of 5.3 RNA (dsRNA) 26.6 in viruses G.10 Down mutations G.10 Dspm elements, variants of 10.10 dsRNA-induced TGS 26.6 Duplication(s) 14.11-14.12 loop 9.3 of RNA 7.28 Early intron theory 12.21-12.22 lambda phage genes G.10 steps in processing 5S rRNA G.10 Eastern blotting 27.8 Ecdysone 24.24 E-DNA 3.8-3.9 Effector(s) 22.2, G.10 gene G.10 plasmid G.10 18S-5.8S-28S heterocluster of rRNA genes 14.7 Electrophoresis 27.9-27.10 Electrostatic interactions in DNA 3.3 Encrypted genes G.10 Endonucleases G.10 Engineered vaccines G.10 Enhanceosome G.11 Enhancers 12.10, G.11 Enhansons G.11 Environmental mutagenesis 9.46 Enzyme and protein activities in DNA replication 7.11 Enzyme-linked immunosorbent assay G.11 Enzymes of mismatch repair 11.26 Epigene G.11 Epigenetic(s) 25.1-25.30, G.12 alterations and neoplasia 13.20-13.22

changes 25.1, G.11 code G.11 gate keepers G.11 gene silencing maintenance G.11 landscape G.11 mechanisms 25.12-25.13 patterns 25.1 programming 25.11-25.12 regulation G.11 silencing G.11 trait G.11-G.12 variations 25.17-25.18 Epigenome 25.18, G.12 Epimutations 25.10, G.12 Establishment of global chromatin environment 25.26-25.27 Estimates of amount of double-stranded DNA 5.6 Ethionine 25.10 Euchromatin 6.18 Eukaryotic DNA polymerase(s) 7.18-7.20 DNA polymerase  7.18 DNA polymerase  7.19 DNA polymerase  7.19 DNA polymerase Rev1 7.20 DNA polymerase β 7.18 DNA polymerase γ 7.18-7.19 DNA polymerase δ 7.20 DNA polymerase ε 7.20 DNA polymerase ζ 7.20 DNA polymerase η 7.19-7.20 DNA polymerase η 7.20 DNA polymerase θ 7.20 DNA polymerase ι 7.20 DNA polymerase κ 7.20 DNA polymerase σ 7.20 DNA polymerase φ 7.20 gene G.12 activation G.12 regulation at translational initiation level 24.10 constitutive elements G.6 transcription, specificity in 24.18-24.19 genome 5.10-5.19 components 5.6 fast component of 5.11 intermediate component of 5.11 slow component of 5.11 gyrase 7.20 kinases 7.16-7.17 primases 7.20 promoters 12.17, 16.13-16.16 Class I 16.14

Class II 16.15, 16.16 Class III 16.15-16.16 ribosomal recycling 19.20 ribosomes 19.16, G.12 Eukaryotic RNA polymerase(s) 16.16-16.22, G.12 RNA polymerase I 16.17 RNA polymerase II 16.18-16.19 RNA polymerase III 16.20-16.21 RNA polymerases IV and V 16.22 transcription elongation 16.24 initiation 16.23-16.24 and reinitiation 24.14 intermediate complexes in 24.13 factors 19.16-19.17 termination 16.24, 16.25 'Torpedo' model 16.25 transcription, process of 16.22-16.25 transcriptional factors 16.23 leucine zipper family 16.23 yeast protein RAP1 16.23 yeast transcription activator GCN4 16.23 zinc fingers 16.23 translation elongation 19.19, G.12 initiation G.12 termination 19.19-19.20, G.12 transposable elements 10.5-10.17 Exit site (E site) 19.8 Exocytosis 20.9 Exon(s) G.12 array 21.1 shuffling 12.21, G.12-G.13 theory 12.22 -intron boundaries in pre-mRNA 17.4 Exonucleases 27.4, G.13 Exosome 20.17, G.13 Expanded DNA 3.10-3.11 repeats 5.20 Expansion of genetic alphabet 18.10, G.13 Expressed sequence tags (ESTs) 21.1, 27.35, G.13 Expression component G.13 of immunoglobulin genes G.13 system 21.3, G.13 Extein G.13 Extended anticodon hypothesis G.13 Extrachromosomal rDNA of Tetrahymena thermophila 17.15, 17.16 Extranuclear genomes 4.1-4.14 Eye transplantation experiment in Drosophila 15.4

F Factor 8.2 F– strain 8.2 F+ strain 8.2 Fate of nascent proteins, ribosomes and messenger RNAs 20.1-20.20 ribosomes after translation 20.16 F-duction 8.3 Feedback inhibition 22.23-22.24, 24.10-24.12, G.13 aspartate transcarbamylase 22.24 isoleucine ← threonine pathway 22.23-22.24 Fidelity of transcription G.13 5S RNA gene family 14.8, G.13 5-azacytidine 25.10 5-methylcytosine (m5C) 25.1 5S RNA from Bacillus subtilis 17.5 Flanking regions G.20 Flexible DNA 3.11 Flow of genetic information from organelles 4.10 Flower-like model of RNA 3.15, 3.16 F-mediated sexduction 8.3 Foldback (FB) loop 10.5 elements G.13 Folded mRNA 19.8 Forbidden base pairs 9.29 Fork junction 7.8 Formylated methionine 19.6 Forward epimutations G.13 14-3-3σ in cap-independent translation 19.18 Fraenkel-Conrat and Singer experiment (1957) 2.6 FTO (fat mass and obesity-associated) gene 25.21 Functional alleles 12.1, G.13 domains in eukaryotic chromosome G.14 state of chromatin G.14 G.C-rich sequence 12.17 Galactose permease G.14 GCRMA LVS-GCRMA 21.1 Gel-based assays G.14 Gene 12.18, 12.19, G.14 annotation 21.9, G.14 concept 12.1-12.6 classical phase 12.1 modern phase 12.2-12.6 transitional phase 12.2 conversion 8.18-8.20 and DNA crossover 8.19 biased 8.20 effect of 8.20 copy number variants 21.2 discoveries 15.11-15.12, G.14 expression 16.1, G.14

analysis 21.1-21.14 applications of 21.8 glossary 21.1-21.2 limitations of 21.10 networks 21.3, G.14 profiling, single-cell 21.5-21.6 systems biology approach in 21.10-21.13 families 14.1-14.10 function 15.1-15.16, G.14 recent thoughts on 15.14 fusion G.14 imprinting G.14 organization 14.1-14.14, G.14 regulation 22.1-24.30, G.15 at DNA level 22.1-22.2 translational level G.15 transcriptional level 22.2-22.18 by hormonal action 24.24, G.15 nonhistone proteins 24.16, G.15 controlling elements 22.2-22.3 in bacteria 22.1-22.26 alternative sigma factors 22.20 antisense RNA in 22.18-22.20 control at translational level 22.22 control of transcription initiation 22.18 lactose operon in E. coli 22.4-22.12 structure 22.5 models of 22.2-22.3 multiple sigma factors in E. coli 22.20 post-transcriptional control 22.20-22.22 post-translational control 22.22-22.23 sporulation in bacteria 22.20, 22.21 eukaryotes 24.1-24.30 at DNA level 24.3-24.6 immunoglobulin genes 24.3-24.4 trypanosome surface antigen switching 24.4-24.5 yeast mating type switching 24.4, 24.5 eukaryotes, Britten-Davidson model 24.6-24.7 control at the level of translation 24.10 cytoplasmic gene control 24.9-24.10 post-transcriptional control 24.7-24.8 post-translational control 24.10-24.12 transcriptional control 24.6 viruses 23.1-23.16 human immunodeficiency virus 23.15-23.16 lambda bacteriophage 23.4-23.24 SPO1 bacteriophage 23.2-23.4

T4 bacteriophage 23.14-23.15 T7 bacteriophage 23.15 regulation, molecular zippers in 24.15 Set Enrichment Analysis (GSEA) 21.2 silencing 26.2-26.4, G.15 by antisense-mediated RNA degradation 26.4 autoregulation 26.4 cellular mechanisms 26.4 chromatin-mediated repression 26.4 DNA-DNA pairing 26.4 DNA-RNA pairing 26.4 in normal cells 13.22 silencing, exploitation of 26.22-26.23 squelching G.15 structure 12.1-12.30 synthesis machines 27.29 using PCR 27.29-27.30 targeting 9.45, G.15 Gene battery 24.7 Calling 21.6, G.15 enzyme relationship G.15 protein colinearity 12.24-12.27 Genes associated with entire pathways 21.9 Class I G.5 Class II G.5 -specific sequence tags (GSTs) 21.9-21.10 translational silencing 26.22, G.15 Genetic code 18.1-18.14, G.15 at work 18.9 dictionary 9.2, 18.3 glossary 18.2 in nuclear and mitochondrial genes 18.6-18.7 size of 18.1 specificity 18.9, G.15 control of biochemical reactions 15.1, 15.3 differentiation 24.16-24.17 engineering 27.45 evidence for triplet code 18.3-18.4 material 1.3, G.15 characteristics of 2.1 DNA as 2.2 fundamental properties 3.15-3.16 continuity of genetic information 3.15 mutation 3.15 recombination 3.16 repair 3.17 storage of information 3.15

nature of the 2.1-2.10 organization of 5.1-5.20 RNA as 2.5 structure of 3.1-3.18 recombination 8.1-8.30 RNA G.16 switches 24.21-24.22 variation, amplification of 8.24 Genome 21.10, G.16 alignment and reads assembly 27.15 complexity 5.6-5.7 constancy G.16 imprinting 26.9 sequencing, clone by clone approach 27.22 shotgun approach 27.21-27.22 strategies of 27.21-27.22 size and transposable elements 10.2 stability G.16 Genomic(s) 21.10, G.16 map of the psq gene 10.8 Genotype-phenotype relationship G.16 Germline theory of antibody diversity G.16 Global gene expression profiling 21.2 regulator of gene expression 16.10 transcription machinery engineering G.16 Globin gene family G.16 Glossary G.1-G.58 Glucocorticoid receptor (GR) 24.19 element (GRE) 24.20 Goldberg-Hogness box G.16 G-protein-coupled receptors (GPCRs) 24.25, G.16 Griffith effect 2.2 GroEL-GroES complex 20.14 Guide RNA G.16 in RNA editing 17.23-17.25 interference 26.17 Gyrases 7.7 Half genes G.16 Haplodiploids 9.7 Haploid number (n) 9.6-9.7 versus diploidy 5.9 Heavy chain genes 13.4 Helicase(s) 7.7 movement 7.8 Helix destabilizing (HD) proteins 7.21 Hemoglobin 24.11 Hepta- and decanucleotides 13.5 Hershey and Chase experiment (1952) 2.5 Heterochromatic siRNAs (hc-siRNAs) 26.16, G.16

Heterochromatin 6.18, G.17 protein 1 (HP1) 25.21 Heterocluster of tRNA genes 14.10 Heteroduplexes 8.11-8.12 Heterogeneous nuclear (hn) RNA 24.6 Heterologous genes for metabolic engineering 21.12 Hexameric helicases 16.10 Hfr strain 8.2 Hidden genes G.17 transcription G.17 Highly repetitive DNA 5.18-5.19, G.17 unstable RNAs G.17 Histidine metabolism in Salmonella 15.5 Histocompatibility G.17 antigens 13.8, G.17 genes 13.8, G.17 Histone acetylation 25.22 acetylation/deacetylation G.17 code G.17 deacetylation 25.23 demethylation 25.22 gene clusters 14.5-14.6, 14.7 family G.17 -like proteins of bacteria 6.5 methylation 25.21 modification 25.21-25.25, G.17 in cellular differentiation 25.25 epigenetic control 25.26, G.18 consequences of 25.26-25.27 functions of 25.25-25.26 phosphorylation 25.23-25.24 proteins 6.7-6.10 shuttling G.17 synthesis 6.8, 7.22 tail loss G.17 turnover G.17-G.18 ubiquitination 25.24, G.18 Histones as repressors G.18 HIV reverse transcriptase G.18 HLA antigens and diseases 13.11 genes, functions of 13.10 HMG boxes 6.19 H-NS and H-NS-DNA2 complexes 6.4-6.5 Hogness box 12.17 Holliday junction 8.13 resolution 8.13 -resolving enzyme 8.12-8.13

Holoenzyme 16.4, 16.5, G.3 Homology-dependent gene silencing 26.3, G.18 post-transcriptional gene silencing G.18 Homopurine-homopyrimidine DNA sequences 3.10 Hormonal control systems in plants 24.28 Hormone-receptor protein complexes 24.24 Hormones and transcription G.18 in development 24.25-24.26 gene regulatory sequence G.18 Host restriction and modification 25.5-25.7, G.18 biological significance of 25.7 enzyme, action of 25.6 enzymes 25.5 Host specificity sites 25.6, G.18-G.19 Housekeeping genes 24.12, G.19 Hox gene cluster 12.27 Hpa II methylation 25.3 tiny fragment (HTF) islands G.19 hsd/hsd locus 25.5, G.19 M gene 25.5 M protein 25.5 R gene 25.5 R protein 25.5 S gene 25.5 S protein 25.5 Human α and β globin gene clusters 14.3 disorders due to aneuploidy 9.7 leukocyte antigen complex 13.8-13.10, G.19 mitochondrial genomes 4.5-4.6 Hut operon of S. typhimurium 22.13-22.14, G.19 Hybrid arrested translation 19.24, G.19 dysgenesis 26.10 gene formation in blood-related cancers 13.17 released translation 19.24, G.19 Hybridoma G.19 Hydroxyproline-based DNA mimics G.19 Hypermethylation 25.19 Hypervariable regions 13.2 Identical multigene families G.19 repetitive DNA 5.15 Identifying a human oncogene 13.15-13.16 Idiotypic diversity 13.8 Illegitimate recombination 10.18-10.19 transposon immunity 10.18 transposon tagging 10.18-10.20

non-targeted approach 10.19 targeted approach 10.18-10.19 transposon-mediated gene transfer 10.20 Immediate early genes 23.4, G.10 Immune system counteracts oncogenes 13.22 Immunoglobulin(s) 13.1-13.3, G.20 allotype G.19 class-switch recombination (CSR) G.19 genes 13.1-13.8, G.19 heavy chain genes 13.2, G.20 light chain genes 13.2, G.20 Immunopurification G.20 Importin G.20 In vitro biologically active DNA synthesis 27.27 DNA synthesis from mRNA 27.27 DNA synthesis using DNA template 27.26-27.27 mutagenesis 9.42 ordered deletions for DNA 9.42 using a PCR technique 9.42 recombination G.20, G.38 Indels 9.12 Induced mutations 9.31-9.41 due to α-particles 9.33 alkylating agents 9.40-9.41 antibiotics 9.40 azide 9.40 base analog(s) 9.35-9.36 5-bromouracil (5-BU) 9.35, 9.36 2-aminopurine 9.35, 9.36 β-particles 9.33 dyes 9.36-9.37 γ-irradiation 9.34 high temperature 9.38-9.39 hydrazine 9.38 hydrogen peroxide 9.39 hydroxylamine 9.38 maleic hydrazide 9.40 neutrons 9.33 nitrous acid 9.37 protons 9.33 radiations 9.32-9.35 ultraviolet (UV) light 9.33-9.34 X-rays 9.34-9.35 Inducer(s) 22.3, G.10, G.20 Inducible and repressible operons 22.16-22.17 control 22.2, 22.3-22.13, G.20 Informational gene family G.20 Inheritance of HLA antigen alleles 13.10 Inhibitors of nucleic acid precursors 9.35 Initiation

and termination codons 18.7 codon 18.2, G.20 Insertion sequence (IS) elements 10.3-10.4, G.54 Insertional RNA editing 17.22, G.20 Integrator gene 24.6, 24.7, G.20 Intein G.20 Interaction of transcription factors with DNA 24.19 Intergenic spacers (IGS) G.20 transcription G.20 Internal control region (ICR) of a 5S rRNA gene 16.21 tRNA gene 16.17 transcribed sequences G.20 Interspersed repetitive DNA 5.16-5.17 Intragenic recombination 8.23-8.25 Intramolecular triplex helical DNA 3.10 Intron 12.19-12.21, G.20 -exon junctions 17.3-17.4, G.20 Introns, classification of 12.19 functions of 12.20-12.21 alternative splicing 12.21 gene regulation 12.21 intron-dependent spatial expression 12.21 intron-mediated enhancement 12.21 markers 12.21 origin of 12.21-12.22 Invariant DNA code 18.7, G.21 Inverted terminal repeats in transposons 10.3 Ionizing and non-ionizing radiations 9.32 Ionome 21.13, G.21 Ionomics 21.13, G.21 Jacob and Monod model, confirmation of 22.9 Jumping DNA 10.1 genes 10.1 Kallikrein gene 12.8 Kinetoplast DNA 4.8 maxicircle G.21 minicircle G.21 Klenow fragment 7.5 Knocking off genes 15.13 'Knockout' technique 15.13, G.21 Kozak's scanning hypothesis 19.18, G.21 Lac a gene 22.4 operon in E. coli 22.4-22.12, G.21 nucleotide length of different components 22.9 working of 22.5-22.6 repressor binding to lac operator 22.12 y gene 22.4

z gene 22.4 Lactate dehydrogenase (LDH) 24.11-24.12 Lambda bacteriophage genes, the 23.4-23.9 early genes 23.4-23.8 late genes 23.8-23.9, G.21-G.22 middle (delayed early) genes 23.8 lytic cascade G.21 repressor 23.14 Large-scale gene function studies 21.9, G.21 Late intron view 12.21-12.22 Latent viruses 2.9 Later steps in processing 5S rRNA G.22 Leader 12.8-12.10 peptides/sequence(s) 22.20, G.22, G.46 Leucine zippers G.22 Levels of DNA packaging 6.11 gene control in eukaryotic cell 24.1-24.2 expression G.22 regulation in eukaryotic cells G.22 Life cycle of bacteriophage 23.2 Light chain genes 13.3-13.4 κ 13.3 λ 13.3 Linker DNA 6.8 Links among regulated genes 21.9 Live vaccines G.22 Localized protein secretion G.22 Long interspersed nuclear elements 5.17-5.18 noncoding RNAs (lncRNAs) 26.1, G.22 nuclear-retained noncoding RNAs G.22 -term (irreversible) gene regulation G.22 lozenge alleles 12.2 Lyon hypothesis 25.14 Lysis G.22 Lysogeny G.22 Lytic cascade 23.4 cycle and lysogeny 23.9-23.10 Main band DNA 5.19 Major and minor grooves of DNA 3.5 splicing mechanisms 17.6 splicing pathway G.22-G.23 Mammalian dosage compensation 25.15 Mammalian RNA polymerase II gene promoters G.23

X chromosome inactivation 25.13-25.14 Many genes-one polypeptide hypothesis G.23 Mass spectrometry 21.1 Maturation-promoting factor (MPF) 24.22 Mechanism for diversity and specificity 24.19 of microhomology-mediated end-joining 11.32 Meiome 21.6, G.23 Meiotic segregation pattern 8.7 Membrane proteins – TatA, TatB, and TatC 20.3 Mendelism down to molecular level 1.2 Merodiploids 22.7, G.23 Merozygotes 22.7, G.23 Meselson-Stahl experiment 7.2-7.3 Messenger RNA (mRNA) 19.1-19.3, G.23 decay 20.16-20.19, G.23 no-go decay (NGD) pathway 20.16, 20.18 nonsense-mediated mRNA decay 20.19 Metabolic engineering 21.12, G.23 switches 21.11 Metabolites 21.12, G.23 Metabolome 21.12, G.23 analysis 21.12, G.23 Metabolomics 21.11, G.23 Metabonomics 21.11, G.23 Metal-responsive elements (MREs) 24.20-24.21 transcription factors (MRTFs) 24.21 Metastatic genes G.23 Methionine structure 19.6 tRNA (tRNAMet) structure 19.6 Methyl transferases in RdDM 26.7 /phospho switch G.23 Methylase reaction G.24 patterns G.24 Methylated sites G.24 forms of 25.6 Methylation changes, overall levels of 25.4-25.5 induced premeiotically (MIP) 25.4 patterns 25.3 clonally inherited 25.3-25.4 symmetrical 25.3 tissue-specific 25.4 specificity G.24 mic genes 22.19 micFRNA 22.19 Microarray(s) 21.1, 27.39-27.41, G.24 data normalization 21.1

S.14 quality controls 21.1 Microarrays, features of, 27.40 automation 27.40 miniaturization 27.40 multiplexing 27.40 parallelism 27.40 spotted vs. in situ synthesized 27.40 two- vs. one-channel 27.40-27.41 MicroRNA/miRNA(s) 26.1, G.24 as oncogenes 13.20 tumor suppressors 13.20 -directed TGS 26.5-26.6 in protein synthesis 19.23 processing RNAi pathway 26.16-26.17, G.25 Microsatellite(s) 27.34, G.24 and minisatellites 27.34-27.35 Primed-PCR (MP-PCR) 27.38 Middle lamba phage genes G.25 repetitive DNA 5.17, G.25 Minimum size of promoter 12.9 Minisatellites G.24-G.25 Minor splicing pathway G.25 –1 and –3 rule 20.3, G.25 –10 sequence 12.9 –35 sequence 12.9 Misfolded proteins 20.13, G.25 Mistake pairing between neoguanine and common guanine 9.29, 9.30 synadenine and common adenine 9.29, 9.30 Mistranslation 19.14 Mitochondrial diseases 4.7-4.8 DNA 4.3 packaging 6.19 transcription 16.26, G.25 maternally inherited 4.5, 4.7 gene introns 12.16, G.25 genome(s) 4.3-4.4 analysis of 4.7 mutations 9.42-9.43 RNA 4.7 Mixed copolymers 18.8 Mobile elements and genome evolution 10.26 Models of genetic recombination 8.10 double-strand break repair 8.14, 8.15 Holliday 8.10-8.14 one-sided invasion (OSI) 8.14 single-strand annealing (SSA) 8.14-8.15 Moderately repetitive DNA sequences G.25 Modes of DNA replication 7.1-7.2

Essentials of Molecular Genetics Modified central dogma G.25-G.26 Molecular basis of mutation 9.27-9.41 beacon assay 27.32, G.26 chaperone 20.13, G.26 genetics, birth of 1.1 markers G.26 in plant breeding and genetics 27.39 properties of histones 6.8 techniques and tools 27.1-27.52 weight of DNA 5.1 zippers in gene regulation G.26 Monintron gene 12.14 Monoallelic gene expression 26.12, G.26 Monoclonal antibody G.26 Monoploid number (x) 9.6-9.7 Mouse IgH enhancer 14.3 satellite DNA 25.4 Moveable genes 15.11-15.12, G.26 mRNA helicase G.26 -interfering complementary RNA 22.18 quantification 21.3-21.4 reading frame maintenance G.26 -Seq analyses 27.16 Multigene families, 14.1-14.10, G.26 divergent ,14.1-14.5 actin gene family 14.4 globin gene family 14.2 immunoglobulin genes 14.2 identical 14.5-14.10 copia gene family 14.5 412 gene family 14.5 5S RNA genes 14.8 histone gene family 14.5 ribosomal RNA gene family 14.6-14.7 small nuclear RNA gene family 14.10 storage protein gene family 14.10 tRNA gene family 14.9-14.10 Multintron gene 12.14 Multiple alleles 8.25 origins of DNA replication 7.12, 7.13 polyadenylation sites 15.11, G.26 sigma factors in E. coli 22.20, 22.21, G.26 in phage SPO1 23.3, G.27 small RNAs as regulators G.27 steps in transcription 16.19 Multiplicational gene family G.27 Multisite mutant alleles 8.24 Multivalent vaccines G.27 Mutagenesis in organelle genomes 9.42

Subject Index Mutation(s) 9.1-9.49 in operator 22.7 promoter 22.9 regulatory gene 22.7, 22.8 structural genes 22.7 usefulness of studies on 9.44-9.46 Muton 12.6, G.27 Mutually exclusive mRNA splicing 17.18, G.27 Myoglobin differentiation 14.4 N6-methyladenosine (m6A) 25.20 Natural antisense transcript-derived siRNAs (natsiRNAs) 26.16, G.27 Nature of regulatory control 5.8 Negative control 22.2, 22.4, G.27 -stranded RNA viruses G.27 N-end rule 20.14-20.15, G.27 Nested gene 12.24, G.27 New codon generation 13.7-13.8 patterns of regulation G.27 N-formylmethionine tRNA structure 19.6 Nick translation 27.13 Nobel Prize winner(s) 1933 T.H. Morgan 12.2 1946 H.J. Muller 9.21 1958 F. Sanger 27.20 1958 J. Lederberg 8.2 1959 Arthur Kornberg 1959 S. Ochoa 18.7 1960 F.M. Burnet and P.B. Medawar 13.1 1962 J.D. Watson, F.H.C. Crick and M.H.F. Wilkins 3.1 1965 A. Lwoff 2.9 1965 J. Monod and F. Jacob 22.4 1967 R. Granit, H.K. Hartline and G. Wald 1968 M.W. Nirenberg and H.G. Khorana 9.2, 18.3, 18.9 1968 R. Holley 27.14 1969 S.E. Luria and M. Delbruck 9.27 1971 Earl W. Sutherland, Jr. 24.25 1972 C.B. Anfinsen, S. Moore and W.H. Stein 27.41 1972 R.R. Porter and G.M. Edelman 13.1 1974 G. Palade 20.7 1975 M. Temin and D. Baltimore 16.12 1978 W. Arber, D. Nathans and H. Smith 27.42 1980 F. Sanger and W. Gilbert 27.20 1980 G. Snell, A. Benacerraf and J. Dausset 13.9 1980 P. Berg 27.44 1983 Barbara McClintock 10.11, 15.12 1987 S. Tonegawa 13.5. 24.3

S.15 1988 Sir James W. Black 24.25 1989 T.R. Cech and S. Altman 17.16 1989 J.M. Bishop and H.E. Varmus 13.12 1993 Michael Smith 9.41 1993 P.A. Sharp and R.J. Roberts 12.13 1994 A.G. Gilman and M. Rodbell 24.4 1999 G. Blobel 20.12 2001 L. Hartwell, R.T. Hunt and P.M. Nurse 13.11 2004 Linda B. Buck and Richard Axel 24.25 2006 A. Fire and C. Mello 26.13 2006 R. Kornberg 16.22 2007 M. Capeechi 9.45 2007 O. Smithies 15.13 2008 H. zur Hausen 13.23 2009 V. Ramakrishnan, T.A. Steitz and A.E. Yonath 19.4 2012 R.J. Lefkowitz and B.K. Kobilka 24.25 No-go decay pathway 20.18, G.27-G.28 Nomenclature of the two strands of double-stranded DNA 16.2 Nonambiguous code 18.4, G.28 Nonautonomous transposons 10.17 Noncoding DNA 5.8 strand 12.8 RNA central dogma 16.3, G.28 RNAs (ncRNAs) 26.1-26.2, G.28 and gene silencing 26.1 antisense 26.2 classes of 26.2 cytosine methylation of 20.21 methylation of 26.21 strand G.2 Nondegenerate code 18.2, G.28 histone proteins 6.10 identical repetitive DNA 5.15 messenger RNAs (nmRNAs) 26.1 overlapping code 18.2, 18.4, G.28 particulate and particulate radiations 9.32-9.33 repetitive DNA sequences 5.11-5.14, G.28, G.57 Nongenetic RNA(s) 15.11, G.28 Nonsense codon 18.2, G.28 -mediated mRNA decay 20.19, G.28 Nondisjunction 9.7 DNA repair mechanisms 11.34-11.37 degeneracy of the genetic code 11.34 suppression 11.34-11.37 intergenic/extragenic suppression 11.35-11.37 nonsense suppressors 11.35-11.36, 11.37 missense suppressors 11.37

S.16 frameshift suppressors 11.37 physiological suppressors 11.37 codon-anticodon misreading suppressors 11.37 through change in amino acid activating enzyme 11.37 intragenic suppression 11.34-11.35 gel-based assays 27.38, G.28 synonymous substitutions G.28 template strand G.45 transcribed spacer (NTS) G.28 transcribing strand G.45 viral family class 1.2 elements 10.16-10.17 long interspersed repeated sequences 10.17 short interspersed repeated sequences 10.17 Northern blotting 27.6-27.7 Novel biosynthetic pathways 21.13 Nuclear DNA G.28 localization signal 20.2 pore 20/2 RNA editing 17.26, G.28 in mammals 17.26 Nucleases, 27.41 functions of 27.41 Nucleic acid vaccines G.28 Nucleolar organizer regions (NORs) 17.13, G.28 Nucleosome 6.13-6.15 structure 6.12 Nucleotide polymorphism 14.13 Nucleotidyltransferases 7.10 Nucleus-cytosol compartmentalization G.29 Nutritional mutants of Neurospora 15.5, 15.6 Okazaki fragments 7.11, 7.17 Oligonucleotide chips 27.39 microarrays 27.39, G.29 Oncogene(s), 13.10-13.18, G.29 Class I 13.10, G.5 Class II 13.10, G.5 Class III 13.10, G.5 Class IV 13.11, G.5 classes of 13.10-13.11 fine structure of an 13.17-13.18 identification of an 13.15 origin of 13.14-13.15 One cistron-one polypeptide hypothesis 15.10, G.29 codon-two amino acids 18.9, G.29 enzyme-two functions concept 15.14, G.29 gene-many proteins hypothesis 15.13-15.14, G.29 gene-one antigen hypothesis 15.10, G.29

Essentials of Molecular Genetics gene-one chromomere hypothesis 15.9, G.29 gene-one enzyme hypothesis 15.5-15.8, G.29 gene-one mRNA-one protein hypothesis 15.1015.11, G.29 gene-one polypeptide hypothesis 15.8-15.9, G.29 hemoglobin 15.8 lactate dehydrogenase 15.8 tryptophan synthetase 15.8-15.9 gene-one primary cellular function hypothesis 15.11, G.29-G.30 gene-one reaction hypothesis 15.3, G.30 gene-one ribosome-one protein hypothesis 15.10, G.30 -start solenoid model 6.18 Onset of lytic cycle 23.8 Operator 22.2, G.30 -constitutive (oc) mutants 22.7 mutations G.30 Operon 22.2, G.30 concept 5.8 Orchestration of DNA-based processes 25.27 Ordered and cooperative activation by cAMP 22.12 Organization of 5S rDNA 14.9 of ribosomal DNA 14.8-14.9 Organo-chemical synthesis of gene, examples 27.26 Ovalbumin gene 12.19, 12.20 Overlapping code 18.2, G.30 Overlapping genes 12.23-12.24, G.30 p53 gene 13.19, 13.23 Packaging of nucleic acids 6.1- 6.22 Palindromic sequence(s) 27.42, G.30 Paracodons 18.12, G.30 Parallel strand switches 8.18 Paramutation 25.19, 26.9-26.10, G.30 Partial diploids 22.7 Paternal X chromosome inactivation 25.18, G.30 Pathway of histidine biosynthesis 15.7 T4 head assembly 6.3 Patterns among regulated genes 21.9 P-DNA 3.9 Peptide bond formation 19.9, 19.11 and translocation cycle 19.12 release 19.9 Peptidyl site (P site) 19.8 transferase 19.11, G.30-G.31 Periodic introns G.31 Pervasive transcription G.31 PEST hypothesis 20.15, G.31 Phage display technology 21.7, G.31 Phase transition in development G.31

Subject Index Phenomena associated with TGS 26.7-26.11 Phosphoproteins in gene regulation G.31 Phs1 gene 8.22-8.23 Phylogenetic relationships 14.4 Plant comoviruses 2.8 hormones 24.28, G.31 metabolomics, challenges in 21.13 mitochondria 4.5 -specific RNA polymerases 16.22, G.31 Plasmid replication 7.12 pM-pR region of lambda 23.7 Polar mutations 22.9, G.31 Poly(ADP-ribosyl)ation 25.24-25.25, G.31 Polycistronic mRNA 5.8, G.31 versus unicistronic mRNAs 5.8 Polymerase chain reaction (PCR) 7.29-7.30 Polypeptide release from the ribosome 19.15 Polyploidy 14.10, G.32 Polyprotein(s) 15.14 genes 15.14, G.32 Polyribosome formation 19.15, G.32 Polysome 19.15 Polyteny 14.11, G.32 Porin 22.19 Position effect 9.3 variegation 25.14, 26.10-26.11 Positive control of 22.2, 22.4 gene regulation G.32 lac operon 22.10 -stranded RNA viruses G.32 Post-transcriptional control by leader sequences G.32 in phage T4 G.32 in RNA phage R17 G.32 of gene regulation G.32 through attenuators G.32 Posttranscriptional gene silencing (PTGS) 26.6, 26.12-26.21, G.32-G.33 Co-suppression 26.18 Nonsense-mediated mRNA decay 26.18-26.19 Quelling 26.18 Riboswitches 26.20 RNA interference 26.12-26.17 -directed chromatin modification 26.12 transgene silencing 26.20-26.21 virus-induced gene silencing 26.19 translational gene regulation G.33 modification (PTM) 21.7, G.33

S.17 steps 19.16, G.33 targeting in eukaryotes 20.9-20.12 Premature termination codon (PTC) 20.19 Pre-messenger RNA processing 17.1-17.6, G.33 processing of leaders 17.1-17.2 addition of cap at 5´-end 17.1-17.2 processing of trains 17.2 3'-end processing 17.2 addition of poly(A) tail 17.2 removal of introns Pre-miRNAs 26.16 Preproinsulin 24.11 Preprokallikrein 24.11 Pre-ribosomal RNA processing 17.12-17.16, G.33 Pre-transfer RNA processing 17.6-17.12, G.33 Pre-transfer (soluble) RNA processing, intron removal 17.9-17.12 Pribnow box 12.17, 22.3, G.33 Primary metabolite 21.12, G.33 Primase 7.9 Pri-miRNAs 26.16 Primosome assembly 7.9 Process of translation in prokaryotes 19.5-19.15 Processing of early steps in 17.14 5S rRNA, G.34 5.8S rRNA 17.15, G.34 major rRNA species 17.14-17.15, G.34 P body 20.17, G.33 telomerase RNA 17.6, 17.7 Producer gene 24.6, 24.7, G.34 Progenote stage 12.22, G.34 Prokaryotic and eukaryotic genome differences 5.8-5.10 similarities 5.8 DNA polymerases 7.4-7.6 genes 12.19, G.34 genome 5.10 insertion elements 10.4 origin of histones H2A and H4 6.10 ribosome 19.4, G.34 70S initiation complex 19.7 subunit composition 19.4-19.5 versus eukaryotic genomes G.34 Proliferating cell nuclear antigen 11.33, 7.21 Promiscuous DNA 4.10 Promoter 12.8-12.10, 22.2, G.34 Promoter consensus sequences in G.34 melting G.34 Proofreading in translation 19.12, G.34-G.35 Pro-opiomelanocortin (POMC) complex 24.11

S.18 Properties of genetic code 18.2-18.7 Proteasome G.35 Protection of telomeres 11.33 Protein lac “a” 22.5, G.50 lac “y” 22.5, G.14 lac “z” 22.5, G.3 biosynthesis 19.1- 19.27 degradation 20.14-20.16 models G.35 N-end rule 20.14-20.15 PEST hypothesis 20.15-20.16 engineering 19.24-19.25, G.35 folding 20.12-20.13, G.35 modification 20.1-20.2 quantification 21.6-21.8 secretion 20.1, G.35 in Grampositive bacteria 20.5-20.6 negative bacteria 20.2-20/5 I prokaryotes 20.2-20.5 sorting 20.1, G.35, G.36 synthesis in cell organelles G.35 chloroplasts and mitochondria 19.20-19.21 organelles versus cytoplasm G.35 inhibition by microRNAs G.35 synthesizing machinery G.36 targeting 20.1, G.36 in eukaryotes 20.7, 20.9, G.36 prokaryotes 20.2, 20.5, G.36 trafficking G.36 translation without ribosomes G.36 translocation 20.1 in prokaryotes 20.5-20.6 trans-splicing 15.12, G.36 turnover 20.14, G.36 Proteinconducting channel 20.8 encoding gene G.36 nucleic acid interactions G.36 protein interactions 21.7, G.36 phage display 21.7 yeast two-hybrid (Y2H) system 21.7-21.8 Proteins involved in DNA repair in Escherichia coli 11.20 genetic recombination 8.16-8.17 Proteome 21.10, G.36 Proteomics 21.10, G.36 Proto-oncogene 13.12-13.15, 13.18-13.20, G.37

Essentials of Molecular Genetics activation 13.18, G.36-G.37 Pseudoalleles 12.2, G.37 Pseudogenes 12.24, 15.12, G.37 Pyrrolysine insertion sequence element 18.10-18.11 Quadruplex DNA 3.11 Quantitative trait locus (QTL) G.37 mapping 27.45-27.47 limitations of 27.47 principle of 27.46 using a RFLP marker 27.47 Quelling 26.18, G.37 RAD system 8.22 Radial-loop scaffold model 6.17 Random amplification of polymorphic DNA 27.36 Random amplified hybridized microsatellites 27.38, G.37 microsatellite polymorphism 27.36, G.37 RAPD-PCR 27.38 RdDM functions of 26.8 in defense against viruses 26.9 development 26.8 genome stability 26.9 stress responses 26.8 transcriptional activation in plants 26.9 Reactive female and inducer male Drosophila 10.8 Reading frame 18.2, G.37 Reassociation kinetics 5.4-5.5 Rec system 8.21-8.22 RecBCD complex 11.26 enzyme 8.21 proteins 8.10 Receptor site 24.6, 24.7, G.37 Recessive oncogenes G.2 RecF pathway 8.11 Recoding 18.10-18.12, G.37 signals 18.10-18.11, G.38 applications 18.12 Recognition system V(D)J recombination 13.5-13.7 Recombinant DNA 27.44 Recombinant DNA construction of 27.45 technology 27.43-27.45, G.20, G.38 Recombination genetics and enzymology of 8.20-8.23 nodules 8.6 repair 11.28 Recon 12.6, G.38 Redirecting metabolic flow 21.12-21.13 Redundancy of integrator genes 24.8

Subject Index receptor as well as integrator genes 24.9 receptor genes 24.8 Redundant genes 14.12, G.38 Regulated gene 22.2, G.38 Regulation of Cap-dependent translation 19.17, G.38 chromatin structure G.38 histone protein levels 6.9 transposable elements 24.21 Regulator(y) genes 22.7, G.38 code G.38 elements in eukaryotic gene regulation G.38 sequences G.38 steps in lambda development 23.6 transcription factors 24.12 Relationship among tRNA, rRNA and mRNA in protein synthesis 17.17 between gene and enzyme 15.1, 15.2 genotype and phenotype 15.3-15.5 TGS and PTGS 26.6 Removal of introns 17.2-17.6, G.38 Repeated DNA sequences and diseases 5.19-5.20 gene family G.38 genes 15.11, G.38 Repeat-induced gene silencing (RIGS) 25.4, G.38 mutation (RIP) 25.4 point (RIP) mutation G.38 Repeats in promoters G.39 Repetition frequency of DNA 5.7-5.8 Repetitive DNA 5.14-5.19, G.39 Replication of factor A 7.21 factor C7.21 DNA termini 7.22-7.23 nucleic acids 7.1-7.32 Replicons 7.4 in eukaryotes 7.14 Replisome progression complex 7.21 -RNA polymerase collision G.39 Reporter gene G.39 plasmid G.39 Representing gene in literature 12.7, G.39 Repressible control 22.2, 22.13-22.16, G.39 Repressor(s) G.39 establishment 23.10 -operator binding 22.18, G.39

synthesis 23.11 Requirements of recombination 8.7-8.8 Restriction endonucleases 27.41-27.43, G.39 fragment length polymorphisms (RFLPs) G.39 mapping G.39 Retained introns G.39 Retrogene 15.12, G.39 Retroposition 10.16 Retroviruses 2.8 Reverse epimutations G.40 RNA splicing G.40 transcription 16.12-16.13, G.40 in bacteria G.40 DNA viruses G.40 Reversible histone methylation G.40 Reversion of imprinted X chromosome 25.14-25.15 RFLP analysis 27.33 assumptions 27.34 limitations 27.34 Rho-dependent terminators G.40 Rho-independent terminators G.40 Ribonuclease(s) (RNases) 27.41, G.40 E G.40 III family G.40 Ribonucleic acid (RNA) G.40 Ribosomal elongation cycle G.41 (r) RNA(s) 19.3-19.4, G.40 genes G.40 RNA transcription regulation G.41 translocation 19.10, G.41 Ribosome(s) G.41 -associated trigger factor 20.12 code G.41 in prokaryotes, eukaryotes, mitochondria and chloroplasts 19.21 profiling G.41 recycling factor 19.16 stalling 20.16 Riboswitches G.41 gene regulation by 26.20 rII locus 12.2 fine structure of 12.6 RNA antiswitches G.41 editing 17.21-17.26, G.41 adenosine-to-inosine 17.22, G.41 and self-splicing G.41-G.42 cytosine-to-uracil 17.22, 17., G.42


S.20 through base exchange 17.22, G.42 deamination/amination 17.22, G.42 nucleotide exchange 17.22, G.42 in chloroplasts 17.22-17.23, G.42 kinetoplastid mitochondria 17.24 plant mitochondria 17.26, G.42 editing, functional significance of 17.26 mechanisms of 17.21-17.22 hairpin loops 3.15 interference (RNAi) G.42 and dosage compensation G.42 proteins G.42 methylation 25.20-25.21, G.43 polymerase(s) I (RNAPI) 16.16-16.18, G.43 I-specific promoters G.43 II (RNAPII) 16.18-16.20, G.43 II-specific promoters G.43 II transcription machines 16.24 II transcription preinitiation complex G.43 II ubiquitylation sites G.43 III (RNAPIII) 16.20-16.21, G.43 III-specific promoters G.43 IV (RNAPIV) 16.22, G.43 V (RNAPV) 16.22, G.43 DNA-binding site G.43 of RdDM pathway 26.7 recognition site G.43 secondary channel G.43 primer removal 7.15, 7.17, 7.22 synthesis 7.8-7.9, 7.22 processing 17.1-17.21, G.43 and RNA editing 17.1-17.29 quality-control systems G.43-G.44 recombination, 8.25-8.27 classification of 8.26-8.27 homologous 8.26-8.27 molecular mechanisms of 8.25-8.26 class I copy-choice mechanism, 8.25 class II copy-choice mechanism 8.25-8.26 class III copy-choice mechanism 8.26 non-homologous 8.27 replicase-driven 8.27 template-switching 8.27 transesterification reactions in 8.27 sequencing 27.13-27.16 direct RNA sequencing 27.15-27.16 ion exchange chromatography 27.14 next generation sequencing 27.14-27.15

radioactive chain-terminating ribonucleotides 27.14 through reverse transcription 27.13 silencing, model for 26.13, 26.14 synthesis 27.22-27.23 homopolymer RNA synthesis method 27.23 phage RNA polymerase method 27.23 RNA phosphorylase method 27.22 RNA polymerase method 27.22-27.23 single-stranded DNA virus method 27.23 transport G.44 virus(es) 2.7 replication 7.28 with a DNA phase 7.29 without a DNA phase 7.28 virus(es), double-stranded 2.8 negative-stranded 2.7 positive-stranded 2.8 shaping genome of 8.26 single-stranded 2.7 world hypothesis G.44 RNA, secondary structure of 3.15 structure of 3.13-3.15 three-dimensional structure of 3.15 RNAdependent RNA polymerase dependent RNA polymerase 16.19-16.20, G.44 directed DNA methylation 26.7-26.9, G.44 directed RNA synthesis G.44 DNA hybrid in biological functions G.44 RNA recombination 8.26 RNAi processing pathways 26.15-26.17 proteins for RdDM 26.7 RNAi, discovery of 26.12-26.13 stages of 26.13 RNomics 26.1-26.2 and gene regulation G.44 RNomics, cDNA capture analysis 26.1 computational 26.1 direct RNA sequencing 26.1 experimental 26.1 genomic SELEX‟ 26.1 microarray analysis 26.1 rRNA transcription regulation G.44 RuvABC system 8.22 S-adenosyl methionine (SAM) 25.10 SAM-dependent DNA methylases 25.4 Satellite (band) DNA(s) 5.16-5.18, 5.19

Subject Index sB recognition site 25.6 Scaffold structure 6.15-6.16 Sec translocons 20.10 Second code 18.12, G.44-G.45 messenger 20.12, G.7, G.32 Secretion of proteins synthesized on free ribosomes G.45 membrane-bound ribosomes G.45 Secretory (Sec) pathway(s) 20.5 in eukaryotes 20.7-20.9 SecYEβ structure 20.3 Selenocysteine insertion sequence element 18.10 Selfish DNA G.45 Self-splicing pre-rRNA intron in 28S precursor RNA of Tetrahymena 17.16-17.17, G.45 Semi-conservative DNA replication 5.8 Sense DNA strand 12.8, G.45 -PTGS G.45 -suppression 26.18 word 18.2, G.45 Sensitivity of protein synthesis to antibiotics 19.21 Sensor site 24.6, 24.7, G.45 Separation of DNA/RNA 27.1CaCl2 density gradient centrifugation 27.1 ethidium bromide 27.1 fluorescent in situ hybridization (FISH) 27.4-27.5 hydroxyapatite columns and nitrocellulose filters in situ hybridization (ISH) 27.3 nanospheres method 27.3 Silica adsorption method Sequence characterized amplified regions 27.36, G.45 distribution 5.12 tagged sites (STSs) 27.35, G.45 Sequon 20.11, G.45 Serial analysis of gene expression (SAGE) 21.1, 21.4-21.5, G.45 70S ribosome binding sites 19.8 Sexduction 8.3 Shine-Dalgarno sequence 19.3, 22.21, G.45 Short hairpin RNA (shRNA) 26.16, G.45 interspersed nuclear elements (SINES) 5.17 tandem repeats (STRs) 27.34, G.24 -term (reversible) gene regulation G.45 Sigma (σ) factors G.46 Signal anchor-dependent docking of SRP 20.6 hypothesis 20.9, G.46 peptidase (SPase) 20.5

S.21 peptide 20.8, G.46 recognition particle (SRP) 20.6 RNA G.49 sequence(s) G.46 at the ribosomal tunnel exit 20.8 -dependent SRP-ribosome interaction 20.9 transduction G.46 Silent epigenetic changes G.46 Similarities between prokaryote and eukaryote genomes G.46 Simple gene 12.7-12.13, G.46 sequence repeats (SSRs) 27.34, G.24, G.47 transcription units G.46 -sequence gene family G.46 Singlechannel microarray detection G.46 copy DNA sequences G.57 nucleotide polymorphism (SNPs) 27.37, G.46 polypeptide nuclear RNA polymerase G.47 strand binding (ssb) proteins 7.10, 8.10 conformation polymorphism 27.36, G.47 stranded DNA 3.12 multicopy 3.12 RNA (ssRNA) viruses G.47 Singularity versus plurality of chromosomes 5.9 SiRNA(s) in RdDM 26.8 processing RNAi pathway 26.15, G.47 SiRNP assembly 26.15 Site-directed mutagenesis 9.41 procedure of 9.41 65th codon 19.25 sK recognition site 25.6 Slipped DNA 3.11 Sloppy copier 11.33 Small hairpin RNAs 26.16, G.45 molecules in gene regulation G.47 noncoding RNAs (sncRNAs) 26.1 nuclear RNAs (snRNAs) 17.2-17.3, G.47 nucleolar organizer RNAs (snoRNAs) 26.1, G.47 temporal RNAs (stRNAs) 26.1, G.47 Smart genes G.47 transcription factors G.47 Snapback (SB) loop 10.5 snRNA gene(s) G.47-G.48 family G.47 snRNPs (snurps) 17.4, G.48 pre-mRNA splicing 17.1-17.6, G.48 Solenoid 6.15

S.22 formation by H1 polymers 6.12 Somatic diversity theory G.48 Southern blotting 27.5-27.6 Spatial, temporal, and quantitative colinearity 12.27 Spatio-temporal regulation of mRNAs G.48 Specialized genes 24.12, G.48 Specificity of composite DNA elements 24.1924.20 Sperm RNAs 1.3 Splice-site compatibility 17.20, G.48 mutations 9.12 recognition G.48 Spliceosome 17.4, G.48 assembly 17.5, G.48 Splicing and polyadenylation G.48 in mRNA cytoplasmic localization G.49 mechanisms G.49 of rRNA precursors, 17.10-17.11 autocatalytic 17.14 mechanism of 17.13-17.14 Split gene(s) 12.13-12.17, G.6 in chloroplasts 12.16 mitochondria 12.16 discovery of 12.15-12.16 versus non-split nature of gene 5.8 Spm element, 10.1-10.11 developmental regulation of 10.10-10.11 Spontaneous mutation(s) 9.27-9.31 due to agents residing inside the cell 9.31 changes in electronic structure of bases 9.27 endogenous DNA damage 9.30 alkylation 9.31 deamination 9.30 oxidative damage 9.30-9.31 errors in DNA replication 9.28-9.29 mispairing during replication 9.30 rare base pairing 9.28 tautomerization 9.30 transposition 9.31 Spotted microarrays G.49 Spotting in corn kernels 10.12 Squash and dot hybridization 27.5 src activation 13.14 oncogene 13.13-13.14 SRP receptor 20.8 -signal sequence-mRNA-ribosome complex 20.7

Essentials of Molecular Genetics Stable unannotated transcripts ( SUTs) G.31 Standard amino acids 19.2 Startise 16.6, G.49 Stem-and-loop 10.5 Steroid hormones 24.24 Stochastic rearrangements G.49 Storage protein gene family 14.10, G.49 Strand resolution 8.12 separation 7.7 specification 11.14 transfer repair 11.19 Strange alleles 13.8, 13.9 Stringent response 22.22-22.23, G.49 Structural alleles 12.1, G.49 basis of TGS 26.5 gene(s) 22.3, G.36, G.49 in non-repetitive DNA 5.12 mutations G.49 of deoxyribonucleotides of DNA 3.2 mammalian immunoglobulin 13.2 molecules forming nucleic acids 3.2 ribonucleotides 3.14 tRNA precursor 17.9 Substitutional editing G.49 Supercoiled DNA 3.12-3.13 Supercoiling in bacterial chromosome 6.4-6.5 Surveillance controls of LTR retrotransposons 24.21 SV40 enhancer region 12.10-12.11 Symmetric RNA structures G.49 Synapsis and crossing-over 8.7 Synaptonemal complex 8.5-8.6 Synchronization of polymerization on leading and lagging strands 7.11 Synergistic transcription G.49 Synonymous codons 18.2, G.49 mutations G.49 substitutions G.49 Synthesis of a piece of double helix 27.25 arginine 15.7 28S and 18S rRNA 17.15, G.49 Synthetic biology G.49, G.50 gene networks G.49 genetic circuit G.50 vaccines G.50 Systems of numbering of nitrogenous bases 3.2 T.A.T triplets 3.10

Subject Index Tagging enzyme 21.4 Tandemly repeated DNA 5.15 Taqman assay G.50 Target mRNA recognition and cleavage 26.17 theory 9.34 TATA binding transcription factors 24.13 box 12.17, 22.3, 24.6, G.16, G.33 Tautomerism 9.28 Taylor et al. (1957) experiment 7.16 T-cell receptor(s) (TCRs) 13.1, G.50 genes G.50 Telomerase 7.23-7.24 RNA processing G.50 Telomere replication 7.22-7.24 Template strand G.2 Temporal sequence of gene expression 23.3, G.50 Terminator(s) 12.10 in a tRNA operon 16.8 Tet-on and Tet-off 21.3, G.50 TGS through antisense RNA stabilization 26.6 convergent transcription 26.5 The second code 18.12-18.13 Thiogalactoside transacetylase 22.4, G.50 3′-end processing of pre-mRNAs G.50 Three classes (I-III) of genes in human leukocyte antigen (HLA) complex 13.9 parts of lambda operator OR 23.13 states of methylation 25.3 Thymine dimers 9.34 Tile-based nanostructures G.50 Tissue compatibility G.17 Toeprinting assay G.50 Tools of proteome analysis 21.7 Topoisomerases 7.7-7.8 Torpedo model of transcriptional termination G.50 Train 12.10-12.13, G.51 Trans-activating small interfering RNAs (tasiRNAs) 26.16, G.51 Transactivator of transcription synthesis 23.16 Transcribing strand G.2 Transcription 16.1-16.28, 24.22, G.51 elongation 16.6-16.8, G.51 inchworm-like movement 16.6 monotonous movement 16.6 two-step mechanism 16.7 factor C/EBP G.51 CCAAT-enhancer-binding proteins G.51 in bacteria 16.4-16.10

S.23 RNA polymerase 16.4-16.5 operons 16.6 start, elongation and termination 16.5 initiation 16.5-16.6, G.51 termination, 16.8-16.9, G.52 factor rho 16.10 hairpin structure 16.8 kinds of 16.9-16.10 rho-dependent terminators 16.10 rho-independent terminators 16.10 bacteriophages 16.11-16.12 eukaryotes, G.51 metal-regulated 24.20-24.21 non-bacterial viruses 16.10-16.11 prokaryotes G.51 viruses 16.10-16.12, G.51 of centromeric repeats G.51 eukaryotic genes 16.13-16.26 preinitiation complex (PIC) G.51 proofreading mechanism 16.7, G.52 start site tagging 16.20 Transcriptional complex G.47 complexity G.52 control of eukaryotic gene regulation G.52 elements G.52 factors (TFs) G.52 forests G.52 gene silencing (TGS) 26.4-26.12, G.52 hub G.52 initiation in mammalian genes 24.12-24.13 pausing 16.8, 16.20, G.52-G.53 processivity G.53 regulators G.53 regulatory networks G.53 stuttering 16.8, G.53 unit 12.8, G.53 Transcriptionassociated mutagenesis 16.25-16.26 -coupled nucleotide excision repair 11.9 Transcriptome 21.1, 21.10, G.53 analysis G.53 Transcriptomics 21.10, G.53 Transduction, 2.3, 8.2-8.3 generalized 8.2-8.3 messenger RNAs (tmRNAs) G.53 of amino acid to tRNA G.53 RNA (tRNA) 19.3, G.53 genes G.53 nucleotidyltransferases 17.9 Transformation 2.2, G.53 Transforming

S.24 growth factor β (TGF-β) 26.17 principle 2.3 Transinduction 26.11 Translation 19.1-19.27, G.53 editing reactions 19.13-19.14, G.10 elongation 19.9-19.4, 19.16-19.19, G.53 in eukaryotic cytoplasm 19.16-19.20 prokaryotes, 19.1-19.16 elongation 19.9-19.14 initiation 19.7-19.9, G.53-G.54 complex G.54 post-termination complexes 19.15-19.16 preinitiation steps 19.5-19.7, G.54 termination 19.14-19.15, G.54 quality-control mechanism 19.12-19.13 Translational gene silencing 26.21-26.22, G.54 initiation in eukaryotic gene regulation G.54 Translocases 16.10 Translocatable genetic elements 10.1 Translocation-and-pause cycles 19.10 Transposable DNA 10.1 phages 10.4-10.5 element structures 10.13 elements, defense against spread of 25.19-25.20 genetic elements 10.1, G.54 Trans-position G.54 Transposition, 10.23 mechanism of 10.20-10.23 DNA To DNA transposition 10.21 tconservative transposition 10.21-10.22 replicative transposition 10.21-10.22 through RNA as an intermediate 10.23 Transposon(s) 10.1, 14.12, G.54 hypothesis 12.21-12.22 mutagenesis 10.18 advantages of 10.18 silencing 26.10 tagging, isolation of a gene through 10.19-10.20 Tn3, functional components of 10.2 with long terminal direct repeats (LTDR) 10.5 copia and copia-like elements 10.5 intracisternal a type particles (IAPs) 10.5 Ty elements in yeast 10.6 long terminal inverted repeats 10.5, 10.6 short terminal inverted repeats 10.5, 10.7 Ac-Ds system in maize 10.11 I-R elements in D. melanogaster 10.7-10.8 Piggy Back 10.9 P-M elements in D. melanogaster 10.7

Essentials of Molecular Genetics Sleeping Beauty 10.9 Tam1 in Antirrhinum majus 10.14 Tam3 in Antirrhinum majus 10.14 Tc1 in Caenorhabditis elegans 10.10 Te1 element in Caenorhabditis elegans 10.7 without terminal repeats 10.5, 10.14-10.16 pararetroviruses 10.15 retroelements 10.14 retrotransposons 10.15 retroviruses 10.14-10.15 Transposons, characteristics of 10.1-10.3 limitations of 10.25-10.26 role of 10.23-10.24 Tn family of 10.4 uses of 10.24-10.25 Trans-TGS 25.4-25.5, G.54 Transvection 26.11 Tri-isopropyl-benzenesulphonylchloride (TPS)mediated condensation reaction 27.4 Triple helical DNA 3.9-3.10, G.54 Triplet code 18.2, G.54 repeat amplification 5.20 tRNA acceptance G.54-G.55 CCA-adding polymerase 17.8-17.9, G.55 gene family G.55 processing in E. coli 17.13 eukaryotes 17.11 yeast 17.12 tRNAs, Class I G.5 Class II G.5 trp gene leader 12.10, 12.12 leader region 22.15-22.16 Trypanosome surface antigen switching G.55 Tryptophan (trp) operon 12.12 in E. coli and Salmonella 22.14-22.15, G.55 production 22.16, 22.17 Tumor-suppressor genes (TSGs) 13.22, G.2 12±1 or 23±1 nucleotides spacers 13.6 Twin-arginine translocation (Tat) pathway 20.3 Two genes-one polypeptide hypothesis 15.13, G.55 Twochannel/color microarray 27.40-27.41, G.55 start supercoiled model 6.18 Type B introns G.8 I restriction endonucleases 27.42, G.55

Subject Index I secretion system (T1SS) 20.2, G.55 II restriction endonucleases 27.42, G.55-G.56 II secretion system (T2SS) 20.2-20.4, G.56 III restriction endonucleases 27.42, G.56 III secretion system (T3SS) 20.4, G.56 IV secretion system (T4SS) 20.4, G.56 V secretion system (T5SS) 20.4, G.56 VI secretion system (T6SS) 20.4-20.5, G.56 Types of genes 12.6-12.18 introns 12.19, G.56 nucleotides in DNA 3.1 recombination 8.8 DNA transposition 8.8-8.9 homologous recombination 8.8 illegitimate recombination 8.9 aberrant site recognition 8.9 illegitimate end joining 8.9 illegitimate replication 8.9 illegitimate strand exchange 8.9 illegitimate V-(D)-J joining 8.9-8.10 unequal crossing-over 8.9 restriction endonucleases 27.42-27.43 transfer RNAs G.56 transitions 9.29 Ubiquitin 25.24, G.56 Ubiquitination G.56 Understanding gene regulation through mutations 22.7-22.9 Unfolded protein response 20.12-20.13, G.56 Unicellularity versus multicellularity 5.9 Unicistronic mRNA 5.8, G.57 Unidirectional replication 7.12 Unineme structure 5.8 Unique DNA sequences G.57 versus repeated sequences 5.9 Unisite mutant alleles 8.24 Universality of code 18.2, 18.6, G.57 Untranslatable exons 14.13 active and passive 7.7

S.25 Up mutations G.57 Unwinding of DNA 7.2 Updated central dogma of molecular biology 16.3 V(D)J recombinase 13.7 recombination G.57 Vaccines G.57 Variable-number-of-tandem-repeats 27.34, G.24 Variant repetitions of DNA sequences G.57 Viral and non-viral retrotransposons 10.15-10.16 family class 1.1 retroelements 10.15 retrogenes 10.17 retron 10.17 retrosequence 10.17 genomes 2.7 infectivity 2.4 oncogenes 13.11, 13.14-13.15, G.57 RNA replicase 7.29 Virtual Southern blot 21.1 Virus-induced gene silencing (VIGS) G.57 VSG switching 24.4-24.6, G.57 Waalwizk-Flavell experiment 25.4 Watson strand 12.8, 16.2, G.2 Western blotting 27.7-27.8, G.57 Wobble rules 18.5-18.6, G.57 WT1 gene 13.23 X chromosome counting 25.15 X-inactivation center (Xic) 25.15 Xist (X-inactivation-specific transcripts) gene 25.14 Yeast mating type switching 24.4, G.58 mitochondrial genomes 4.6 protein RAP1 G.58 transcription activator GCN4 G.58 transcription regulator G.58 Yeast two-hybrid (Y2H) system G.58 Z-DNA 3.7-3.8 Zinc fingers G.58 Zinder and Lederberg experiment (1952) 2.4

    About the Author  

Gurbachan S. Miglani retired as Professor of Genetics from the Department of Plant Breeding, Genetics and Biotechnology, Punjab Agricultural University (PAU), Ludhiana, India, after 35 years of service. He taught general genetics, advanced genetics, biotechnology, biochemical genetics, molecular genetics, immunogenetics, developmental genetics, and evolution to undergraduate and graduate students of the constituent colleges of PAU. He also taught genetics to undergraduates at Howard University, Washington, D.C., USA, as a Graduate Teaching Assistant. In March 2010, the School of Agricultural Biotechnology, PAU, invited Dr. Miglani as Adjunct Professor to teach biotechnology to B.Sc. (Biotechnology), M.Sc. (Biotechnology) and Ph.D. (Biotechnology) students, which he did until December 2010. He was rehired in January 2011 as Visiting Professor and continues to teach there. He has guided eleven M.Sc. students and one Ph.D. student and completed two research projects funded by the University Grants Commission, New Delhi, India. He has authored 140 publications (including research papers, review papers, books, book chapters, short notes/communications, abstracts, and popular science articles) in Indian and foreign journals and magazines. Six laboratory manuals that he authored for different genetics courses were published by the Punjab Agricultural University, Ludhiana, India. He has contributed several chapters to books edited by Indian and foreign authors. His books include Dictionary of Plant Genetics and Molecular Biology (1998), Basic Genetics (2000), Advanced Genetics, First Edition (2002), Developmental Genetics (2006), Advanced Genetics, Second Edition (2007), Fundamentals of Genetics (2008), Genetic Material (2013), Gene Expression (2013), and Gene Regulation (2013). He is keen on popularizing the science of genetics through radio talks and popular articles for magazines and newspapers.
He has been associated with The Journal of Plant Science Research, published by the Society for the Promotion of Plant Science Research, Jaipur (India), for more than 12 years in different capacities, including editor. He is a recipient of the Meritorious Teacher Award of the Punjab Agricultural University, Ludhiana (for the year 1997-98) and the Sneh Prabha Shukla Memorial Award of Honor from Punjab Sahitya Kala Manch (Regd.), Ludhiana, Punjab, India, for the year 2001.