English Pages 221 [222] Year 2023
An Introductory Course on Molecular Biology
By Ramón Serrano
This book first published 2023

Cambridge Scholars Publishing
Lady Stephenson Library, Newcastle upon Tyne, NE6 2PA, UK

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

Copyright © 2023 by Ramón Serrano

All rights for this book reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.

ISBN (10): 1-5275-1009-3
ISBN (13): 978-1-5275-1009-8
For my wife Mariche, light of my shadows, and for our sons and daughters, sons-in-law and daughters-in-law and our 11 grandchildren. Without them this book would have been completed much quicker but everything would be meaningless.
TABLE OF CONTENTS
Acknowledgments
Introduction

Chapter 1. What is Molecular Biology?
1.1. What is not and what is Molecular Biology
1.2. Gene nano-programs and protein nano-machineries explain the hallmarks of life
1.3. Molecular Biology is a basic science but its methods have enormous applications
1.4. An example of a Molecular Biology approach: growth control by proton transport
1.5. Hallmarks of a Molecular Biology
1.6. The chemical bonds important in Molecular Biology
1.7. References

Chapter 2. Genes and Genomes
2.1. The discovery of the double helix of DNA
2.2. Scarcity of genes is solved by Genetic Engineering or Technology of Recombinant DNA: cloning in vivo and in vitro
2.3. Reverse transcriptase makes RNA accessible to Genetic Engineering
2.4. The polymerase chain reaction (PCR)
2.5. DNA sequencing: Sanger method and Next Generation Sequencing (NGS)
2.6. Genome sequencing: from yeast to human and beyond
2.7. Genetic modification of organisms
2.8. References
Chapter 3. Protein Structure
3.1. The four levels of protein structure
3.2. Flexibility for conformational changes but fragility for denaturation
3.3. Visualizing protein structures
3.4. Alpha-helices
3.5. Beta-sheets
3.6. Reverse turns
3.7. Chaperones or HSPs
3.8. LEA proteins and osmolytes
3.9. The Proteasome
3.10. References

Chapter 4. Bioinformatics and Evolution
4.1. Sequence alignment and homology
4.2. Comparison of protein structures
4.3. The concept of homology
4.4. Evolutionary trees
4.5. Substitution matrixes
4.6. Genomics and proteomics: “big data”
4.7. The European Bioinformatics Institute, EMBOSS programs and Data Banks
4.8. References

Chapter 5. The Different Works of Protein Nanomachineries and the Chemical Work of Enzymes
5.1. The different works of proteins
5.2. Why enzymes are so specific and efficient
5.3. Mechanisms of enzymatic catalysis in reactions of nucleophilic substitutions
5.4. Covalent catalysis
5.5. The special work of ligases or synthetases to drive biosynthetic reactions
5.6. Every enzyme is a potential motor
5.7. P-loop is a sequence motif specialized to bind nucleotide triphosphate
5.8. Classes of proteases
5.9. References
Chapter 6. Protein Nano-Motors and Mechanical Work
6.1. Myosin walks over actin
6.2. Conformational changes of myosin
6.3. Role of myosin in muscle
6.4. Other cellular motors
6.5. Bacteria have a rotary motor in their flagella

Chapter 7. Regulation of Gene Transcription
7.1. Promoters, enhancers (UAS) and repressing sequences (URS)
7.2. RNA polymerases and helicases
7.3. Antibiotics that inhibit transcription
7.4. Protein complexes involved in transcription and splicing
7.5. Types of transcription factors
7.6. Chromatin immunoprecipitation (ChIP)
7.7. Epigenetic modifications
7.8. References

Chapter 8. Membrane Proteins Involved in Transport across Biological Membranes and Biophysical Regulation of Cellular Activities
8.1. Ion homeostasis: the role of major cellular ions
8.2. The two chemiosmotic circuits: the one of sodium showed up first, the one of protons later
8.3. Pumps, channels and transporters
8.4. The mechanisms of primary active transporters or pumps
8.5. Channels and transporters
8.6. Antiporters, symporters and uniporters or facilitated diffusion
8.7. The regulatory cations: Na+, H+, Ca2+ and K+
8.8. Ion sensors and downstream pathways
8.9. References

Chapter 9. Biochemical Regulation by Proteins
9.1. The regulatory work of proteins
9.2. Protein kinases and phosphatases
9.3. G proteins
9.4. Ubiquitin ligases and the proteasome
9.5. Second messengers: phosphoinositides and cyclic nucleotides
9.6. Receptors of hormones and growth factors. Action potentials
Chapter 10. Growth Control in Cells and Cancer
10.1. The three stages of growth and proliferation control
10.2. Markers for growing state and ion homeostasis
10.3. The hallmarks of cancer
10.4. pH homeostasis and cancer
10.5. Doxorubicin and the pH gradient
10.6. How to conquer cancer
10.7. Metastasis
10.8. Inhibition of pH regulation
10.9. The triggering role of Hypoxia via HIF1 in cancer development
10.10. Inhibition of bicarbonate transport
10.11. A promising approach to specifically inhibit growth of cancer cells
10.12. References

Chapter 11. The Tolerance of Plants to Abiotic Stresses
11.1. The importance of tolerance of crops to abiotic stresses
11.2. The physiology and biochemistry of abiotic stress
11.3. The molecular biology of abiotic stress
11.4. The usefulness of biodiversity
11.5. Cold and freezing stress
11.6. LEA proteins and chaperones
11.7. References

Chapter 12. An Example of Scientific Research in Molecular Biology
12.1. The goal of the research on intracellular pH
12.2. A forward genetic approach with the model plant Arabidopsis thaliana to identify targets and defense mechanisms of intracellular acidification
12.3. Results of an exhaustive screening of the activation-tagging collection
12.4. Additional phenotypes of the sbt4.13 mutant
12.5. Electrophysiological characterization of the sbt4.13 mutant
12.6. How can less H+-ATPase explain acid and oxidation tolerance
12.7. References
Recapitulation
Index
ACKNOWLEDGEMENTS
I want to thank the many students of the degree in Biotechnology at the Polytechnic University of Valencia (UPV), Spain, who attended my courses on Molecular Biology and Genetic Engineering, because by answering my questions and asking me questions they helped to clarify a lot of concepts. I want to recognize the work of the architect Rafael Ligorit for the layout of the book. Thanks also to Hilario Teruel, consul of Burkina Faso, who always encouraged me to finish this book, and to my son Ramón, a professional translator, and proofreader Laurence Fenton for a full correction of the manuscript. Last, but not least, thanks to the computer scientist Ramón Nogales, who helped a lot with the computer work.

Organizations such as Wikipedia, ResearchGate and others that offer figures and illustrations free of copyright on the internet have been very useful, especially given the enormous amounts of money demanded by some publishers for reproduction of figures and illustrations.

My gratitude to Prof. Alberto Sols (Research Institute on Enzymology, Autonomous University of Madrid, Spain), Richard Reeves (University of Louisiana, New Orleans, USA), Efraim Racker (Cornell University, Ithaca, New York, USA) and Gerald R. Fink (Whitehead Institute, Massachusetts Institute of Technology, Boston, USA) for being my teachers in scientific research and for dealing good-naturedly with my sometimes difficult character. I learned with them most of what I know and I will always be indebted to them.

Finally, thanks to the Polytechnic University of Valencia, Spain, and the Research Institute on Molecular and Cellular Plant Biology (IBMCP) for providing me the opportunity to teach and enabling me to continue research after retirement as Emeritus Professor.
INTRODUCTION
This book is intended to introduce students and scholars to Molecular Biology, the modern approach to living organisms that has revolutionized biology and generated novel biotechnological applications. The best textbook in the field of Molecular Biology is that of Berg, Tymoczko, Gatto and Stryer (2015). However, that book is a mixture of Biochemistry and Molecular Biology, which in my opinion are not the same thing.

I want to highlight in this introduction two great paragraphs from Steven Pinker (2018, page 378):

Any curriculum will be pedagogically ineffective if it consists of a lecturer yammering in front of a blackboard, or a textbook that students highlight with a yellow marker. People understand concepts only when they are forced to think them through. A second impediment to effective teaching is that pupils don’t spontaneously transfer what they learned from a concrete example to another in the same abstract category. Science is not a game with an arbitrary rulebook; it’s the application of reason to explaining the universe and to ascertaining whether its explanations are true.

Finally, we should consider the thinking of the great Russian science fiction author Ivan Efremov (1952, 2):

The school must teach the latest knowledge. A lot of time is wasted teaching past things.

Another consideration is that the best way to think about a subject is to discuss it frequently with others, as Socrates and Plato did. For example, an isolated person like the one in the figure below gets bored while the group in the figure below has a good time.
Finally, Albert Einstein once said: “You do not really understand something unless you can explain it to your grandmother.”

I would also like to make a digression about scientific literature. The Thomson Reuters impact factor of a journal in 2022 is calculated as follows:

A = citations in 2022 in all journals to articles published in journal X during 2020 and 2021.
B = number of articles published in journal X during 2020 and 2021.

Then the 2022 impact factor of journal X is A/B. This method has some problems: (a) the citation period of only 2 years discriminates against papers that are slow to be accepted; and (b) it does not correct for the size of each scientific field (for example, biomedicine is much larger than plant sciences). A fashionable field has more journals that could provide citations (A above), but one particular journal might publish a similar number of articles in all fields (B above).

Journals can also be classified as: (a) journals with no publication fee (with the exception of color figures) but where institutions must pay to have access to the digital form of the journal or to receive the printed form in the library; and (b) journals with expensive publication fees (from €2,000 up to €10,000). The latter are the Open Access journals. In this case, some potential authors of articles may be excluded based on financial but not scientific criteria.

Randy W. Schekman, the 2013 Nobel Prize winner in Physiology or Medicine from the University of California at Berkeley, has said: “Pressure to publish in luxury journals encouraged researchers to cut corners and pursue trendy fields of science instead of doing more important work. The problem was exacerbated by editors who were not active scientists but professionals who stimulated studies that were likely to make a splash”.
Randy W. Schekman (left) and Jorge E. Hirsch (right).
In order to evaluate the scientific production of scientists, Jorge E. Hirsch, presently professor of Physics at the University of California at San Diego, USA, proposed the “h index” (Hirsch 2005, 16569-16572). This is defined as the number h of articles with a number of citations ≥ h. A scientist has an “h index” of X if X of his N articles have at least X citations each, and the other N-X papers have ≤ X citations each. It is calculated by ordering publications by decreasing number of citations. See an example below. How to calculate the “h index” of an author:
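The two measures described in this digression can be sketched in a few lines of Python. This is only an illustration: the function names and the citation counts are invented for the example and do not come from any real journal or author.

```python
def impact_factor(citations_to_recent_articles: int, recent_articles: int) -> float:
    """Impact factor A/B: citations in one year (A) to the articles a journal
    published in the two preceding years, divided by the number of those articles (B)."""
    if recent_articles <= 0:
        raise ValueError("the journal must have published at least one article")
    return citations_to_recent_articles / recent_articles


def h_index(citations: list[int]) -> int:
    """Largest h such that h articles have at least h citations each."""
    # Order publications by decreasing number of citations, as described in the text.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # the article at this rank still has enough citations
        else:
            break         # every later article has even fewer citations
    return h


if __name__ == "__main__":
    # Hypothetical author with seven articles (made-up citation counts).
    print(h_index([25, 8, 5, 3, 3, 1, 0]))   # 3
    # Hypothetical journal: 300 citations to 120 articles.
    print(impact_factor(300, 120))           # 2.5
```

In the made-up example, the third most cited article has 5 citations (at least 3) but the fourth has only 3 (fewer than 4), so the h index is 3.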
The advantage of this index is that the impact factor of the journals does not matter because journals with a high impact factor may have poor articles with few citations. The important thing is the number of citations received by the articles of one scientist, independently of the impact factor of the journals where the articles are published.
References

Berg, Jeremy M., Tymoczko, John L., Gatto, Gregory J. and Stryer, Lubert. 2015. Biochemistry. New York: W.H. Freeman and Company.
Efremov, Ivan. 1952. The Andromeda Nebula. Translation from Russian by Maria K. Molodaya Gvardiya, Moscow, Russia, Foreign Language Publishing House.
Hirsch, Jorge E. 2005. An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences (USA), vol. 102, pages 16569-16572.
Pinker, Steven. 2018. Enlightenment Now. The case for Reason, Science, Humanism and Progress. Great Britain, Allen Lane Penguin Books.
CHAPTER 1 WHAT IS MOLECULAR BIOLOGY?
1.1 What is not and what is Molecular Biology

Molecular Biology is not the study of the molecules of living organisms. Biochemistry, also called Biological Chemistry, fits this definition because it studies biological molecules and their chemical reactions in cells (metabolism). The adjective “Molecular” before Biology does not refer to all molecules of living organisms but only to the two crucial ones: nucleic acids and proteins (structural and catalytic, as well as motors, membrane transporters and regulators).

Molecular Biology is not the classical study of genes. This corresponds to Genetics, which deals with genes in relation to heredity, loci and genetic variation.

Molecular Biology is an approach to biological phenomena that is based on the atomic structures and genetic modifications of the mechanisms and physiological functions of the two crucial molecules for life: nucleic acids (DNA and RNA) and proteins. It is a reductionist approach derived from Biochemistry and from Genetics and it has developed a specific tool called Genetic Engineering (also known as Recombinant DNA Technology) to isolate genes, modify them in vitro and reintroduce them into organisms to investigate the physiological functions of genes and encoded proteins.

It has also developed physical methods to determine the atomic structure and mechanism of these macromolecules. In X-Ray Diffraction, ordered molecules in fibers or crystals give a three-dimensional atomic picture of nucleic acids and proteins. More recently, in Cryo-Electron Microscopy (cryo-EM), many pictures of single, unordered molecules at liquid helium temperature are combined to provide similar atomic pictures of macromolecules because at such low temperatures the molecules are static, and without need of crystallization. Both Genetic Engineering and structural methods require a specific informatics approach: Bioinformatics, which is crucial to generate and utilize the big data provided by the above methods.
Some time ago, the physicist Ernest Rutherford (1871-1937, Nobel Prize in Chemistry 1908, Figure 1.1) said the following: “All science is either Physics or stamp collecting”. He was right because Physiology is fantasy, just correlations without cause-effect connections, Biochemistry may be an artefact because it consists of just “in vitro” studies and Genetics is just phenotypes and loci and does not reach the molecular level (genes and proteins). However, Molecular Biology has raised Biology to the level of an exact science because Genetic Engineering can prove hypotheses about the functions of genes and proteins, and structural studies of nucleic acids and proteins can demonstrate their mechanisms.
Figure 1.1. Ernest Rutherford. Courtesy of gettyimages.com.
Genetic Engineering can be used to generate mutants of specific genes with either gain-of-function (over-expression) or loss-of-function (knockout) and then check the phenotypes of the organisms. If a function of an organism is worsened by the loss of function of a gene, this is proof of the role of the gene and its corresponding protein in this function. If a function of an organism is improved by the gain of function of a gene, this demonstrates that this gene and its corresponding protein not only participate in this function but are also rate-limiting for it. Molecular Biology explains “what is life”, the question and title of a famous 1944 book by Erwin Schrödinger, by testing hypotheses thanks to Genetic Engineering and uncovering biological mechanisms at the atomic level thanks to X-Ray Crystallography and Cryo-Electron Microscopy. These methodologies provide biotechnological tools such as transgenic organisms and rational drug design.
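The two inference rules above can be written as a short Python sketch. The germination percentages are those reported for the acetic acid treatment in Table 1.1 of this chapter; the function name `interpret_mutant` and the cut-off of 20 percentage points are assumptions made only for this illustration and are not a statistical test.

```python
def interpret_mutant(wild_type: float, mutant: float, kind: str,
                     min_change: float = 20.0) -> str:
    """Apply the loss-of-function / gain-of-function reasoning to one phenotype.

    wild_type, mutant: value of the measured function (here, % germination).
    kind: "loss" for a knockout, "gain" for an over-expression line.
    min_change: arbitrary illustrative threshold for a 'clear' difference.
    """
    change = mutant - wild_type
    if kind == "loss" and change <= -min_change:
        # The function worsens without the gene: the gene takes part in it.
        return "required"
    if kind == "gain" and change >= min_change:
        # The function improves with more gene product: the step is limiting.
        return "rate-limiting"
    return "no clear effect"


# Acetic acid row of Table 1.1 (percentage of germinated seedlings).
WILD_TYPE = 43
lines = {
    "rof1": (36, "loss"),
    "rof2": (28, "loss"),
    "rof1 rof2": (18, "loss"),
    "OE ROF2": (75, "gain"),
}

if __name__ == "__main__":
    for name, (value, kind) in lines.items():
        print(f"{name}: {interpret_mutant(WILD_TYPE, value, kind)}")
```

With this threshold the single mutants show no clear effect, the double mutant marks the two genes as jointly required, and the over-expression line marks ROF2 as rate-limiting, which is the interpretation given in the text accompanying Table 1.1.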
1.2 Gene nano-programs and protein nano-machineries explain the hallmarks of life

The mystery of life has always intrigued scientists. With regard to the chemical activities of organisms, the Swedish chemist Jöns Jacob Berzelius proposed in 1815 that organic compounds (those made by organisms) could only be produced by a God-given “vital force”. But a German disciple of Berzelius, Friedrich Wöhler, synthesized urea in the laboratory in 1828, without the use of animals, and so dispelled the need for a mysterious vital force. However, the chemical activities of organisms (metabolism) are relatively simple functions of life and cells are much more complex than just a bag of enzymes (the biological catalysts).

The six hallmarks of life, as adapted from SCIENCEFACTS, https://www.scifacts.net/biology/what-is-life/, are:

1. Carbon-based and cellular organization. Living things are made from organic compounds (carbon-based); they are composed of cells and are either unicellular or multicellular.
2. Reproduction and heredity. Asexual or sexual reproduction, the latter involving the joining of gametes or sex cells. Daughter cells and offspring inherit features from mother cells and parent organisms, respectively.
3. Metabolism. The chemical reactions of living things; nutrients are taken up and utilized as sources of energy (catabolism) and as sources of building blocks for cell components (anabolism).
4. Growth and development. Cells enlarge and divide; development of multi-cellular organisms occurs by growth and differentiation of cells.
5. Response to stimuli and homeostasis. Cells and organisms make changes in response to external stimuli to keep their internal environment within a narrow range.
6. Adaptation and evolution. Variation of individuals is important for adaptation of species to certain environments; selection of adaptation traits is the basis for evolution.

Living things have all these features, while other beings, such as viruses, demonstrate only a few of these characteristics, specifically numbers 2 and 6 above, and are not living organisms. Because all living things have these common features (the hallmarks of life) and utilize the same basic molecules, nucleic acids and proteins as key molecules and carbohydrates and lipids as secondary ones, it has been proposed that all present organisms derived from a single ancestral cell called LUCA (Last Universal Common Ancestor), which lived about 4 billion years ago (see https://phys.org/news/2018-12-luca-universal-common-ancestor.html).

The emerging picture of Molecular Biology is a scientific understanding of the complexity of life with no need to invoke mysterious forces in organisms (“vitalism” theories). Molecular Biology was born in the middle of the 20th century through the realization that life consisted of molecular phenomena that transcended classical biochemistry and genetics. Genes are complex nano-programs of DNA that dictate all cellular activities and determine inheritance and evolution (hallmarks 2 and 6). Sophisticated nano-machineries made from proteins (and sometimes including RNA) replicate and express genes, convert chemical energy into mechanical and osmotic energy, move molecules across biological membranes, and regulate all cellular activities. See Figure 1.2 for a graphic description of these statements.
1.3 Molecular Biology is a basic science but its methods have enormous applications

Molecular Biology has raised Biology from a descriptive level into the category of an “exact science” because hypotheses can be tested by genetic modifications of organisms and the mechanisms of nucleic acids programs and protein nano-machineries can be understood at the atomic level through either X-ray Crystallography or Cryo-Electron Microscopy. Genetic Engineering is the basic tool that proves the physiological function of genes and proteins by generating gain-of-function and loss-of-function mutants. Basically, modifications of isolated genes are made “in vitro” to generate gain-of-function (higher expression or higher protein
activity) or loss-of-function (knock out/null mutants or mutants with partial loss of function) and then these mutated genes are introduced into organisms by replacing the wild type copy. Phenotypes are investigated to ascertain what functions of the organism are modified. A final development has been the study of thousands of genes at the same time with special microarrays containing thousands of nano-samples in small pieces of glass. This process has been named “Genomics”, and the word “omics” has been generalized to massive studies with proteins (Proteomics).
Figure 1.2. Molecular Biology is an approach to understanding life by considering proteins and nucleic acids as the “hardware” (nano-machineries) and “software” (nano-programs), respectively, of living cells. These crucial molecules are studied at the atomic level and modified to test hypotheses.
An example is shown in Table 1.1. The physiological function during intracellular acid stress of the two prolyl isomerases in the model plant Arabidopsis thaliana was investigated using two kinds of transgenic plants: (a) single (rof1 or rof2) and double (rof1 rof2) mutants with loss-of-function (knock out) by insertion of T-DNA (the part of the Ti plasmid of Agrobacterium tumefaciens transferred to plant cell genomes by conjugation); and (b) gain-of-function by over-expression of ROF2 (OE ROF2) through transformation with a chimeric gene containing the strong 2x35S viral promoter and the coding regions of ROF2. Intracellular acid stress was produced by the addition of acetic acid, which diffuses into
cells in the protonated form and dissociates, releasing protons inside. The experiment demonstrates that these two prolyl isomerases are redundant (single mutants have little phenotype but the double mutant is clearly more sensitive to the acid than the wild type) and required for maximum tolerance to intracellular acidification. On the other hand, the improved tolerance to intracellular acidification resulting from over-expression of ROF2 suggests that this prolyl isomerase gene is limiting for tolerance to intracellular acid stress. Over-expression of a gene corresponding to a non-limiting step would not improve stress tolerance.

Table 1.1. Arabidopsis ROF1 and ROF2 prolyl isomerases modulate germination in media containing weak organic acids. Percentage of germination and seedling establishment of Arabidopsis wild-type, the rof1 mutant, rof2 mutant, rof1 rof2 double mutant, and a line over-expressing ROF2 (OE ROF2) four days after planting in normal MS medium (pH 5.5) and in this medium supplemented with 3.5 mM acetic acid. Data from the author.
_______________________________________________________________
Percentage of small germinated plants with green cotyledons

               Wild type   rof1   rof2   rof1 rof2   OE ROF2
_______________________________________________________________
Control           99        98     98       98          99
Acetic acid       43        36     28       18          75
_______________________________________________________________

These methodologies, developed to test the physiological function of genes, have also led to transgenic organisms (GMOs or Genetically Modified Organisms) with applications in agriculture (useful transgenic crops), animal breeding (useful transgenic animals) and medicine (useful transgenic microorganisms producing drugs, hormones or antibodies). One example of a widely utilized genetically modified organism is the insect-resistant potato, which expresses insecticidal proteins from Bacillus thuringiensis called Bt, which are encoded by Cry genes (see Figure 1.3). Other examples are the fast-growing salmon (which over-expresses growth hormone) and the insulin-producing bacteria (which express the human insulin gene).

An irrational rejection of transgenic crops has developed in Europe, promoted by environmental (Green) parties and non-scientific populist organizations such as Greenpeace. This rejection has no scientific basis and it is hoped it will go away with time because GMOs are the future of agriculture and cattle raising. In America and China there is no such rejection. When the
Spanish discoverers of America brought tomatoes and potatoes from the new world to Europe (16th century) they were immediately eaten in Spain but it took two centuries for the French and Germans to eat these wonderful healthy foods. Non-Mediterranean Europeans are very conservative and easily confused by environmental activists. The European Union, for example, had for a period a Scientific Advisor, the English scientist Prof. Anne Glover, but when she produced a document in favor of transgenic organisms she was immediately fired. Clearly, the European Union operates by the non-scientific ideology of Greenpeace and has no respect for scientists.
Figure 1.3. Transgenic potato plants that express the Bt protein and are resistant to the potato beetle (Leptinotarsa sp.) (top of picture) and control plants (bottom of picture). Courtesy of Professor Francisco Garcia-Olmedo (School of Agricultural Engineers, Madrid, Spain).
Atomic structures require the biochemical purification of proteins from either cell tissues or from transgenic microorganisms or convenient organisms expressing the protein of interest. X-ray diffraction of crystals or cryo-Electron Microscopy (cryo-EM) of single particles provide the required structures. Protein crystals provide a static array of molecules for X-ray diffraction, but by using single protein molecules at the liquid helium temperature (-269 ºC) during Cryo-EM, molecules are static and, as indicated above, the three-dimensional structure can be solved without the tedious crystallization. Atomic structures allow us to understand the mechanisms of the protein nano-machineries involved in chemical, mechanical, osmotic and regulatory works in cells. An example of a complex protein machinery is the mitochondrial Fo.F1-ATPase or ATP synthase of bacteria, mitochondria
and chloroplasts, which is composed of 31 protein subunits and has had its atomic structure solved by modern Cryo-Electron Microscopy (see Figure 1.4). Mitochondria carry out oxidation of reduced substrates with oxygen and the respiratory complexes couple the chemical energy of redox reactions to pumping protons out of mitochondria. This is the chemiosmotic theory of Peter Mitchell, who received the Nobel Prize in Chemistry in 1978 and buried the previous model based on chemical phosphorylated intermediates as occurs in glycolysis. The electrochemical proton gradient (pH difference and electrical potential) drives ATP synthesis by forcing mitochondrial Fo.F1-ATPase to run in reverse, working as an ATP synthase and coupling proton uptake by mitochondria to ATP synthesis. The enzyme was discovered by Efraim Racker and is made of two parts: Fo - with the “o” coming from conferring sensitivity to the drug oligomycin, binding to the trans-membrane part - and F1 - factor 1 of the oxidative phosphorylation with ATPase activity protruding from the membrane. Both parts contain static subunits and rotary ones. The mechanism was solved by the model of Paul Boyer and the enzyme structure of John E. Walker, who shared the Nobel Prize in Chemistry in 1997. This protein nano-machinery works as a rotary motor: ATP hydrolysis (ATP + H2O -> ADP + Pi + 30 kJ/mol) forces rotation of the rotor part with respect to the stator part (counterclockwise from the top of F1), resulting in the pumping of protons out of the mitochondria. When the proton gradient generated by the mitochondrial respiratory chain is big enough (> 40 kJ/mol), protons move into mitochondria through the ATPase and force rotation of the rotor part clockwise from the top of F1, i.e. in the opposite direction to before, and the enzyme is converted into ATP synthase.
Rotation of the rotor part modifies the three catalytic subunits (β) sequentially from active site open (O), to active site binding ADP + Pi (L), to active site binding preferentially ATP + H2O (T). Alternatively, ATP hydrolysis when no respiration is working results in a T > L > O sequence. See Figure 8.19.
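To give a feeling for the energies involved, the free energy released when one mole of protons moves down an electrochemical gradient can be estimated from the standard expression F·Δψ + 2.303·R·T·ΔpH (electrical plus chemical component). The Python sketch below is only a worked example: the membrane potential (150 mV), the pH difference (0.75 units) and the temperature (310 K) are illustrative assumptions, not values taken from this book, and both components are assumed to favor proton entry.

```python
import math

FARADAY = 96485.0      # C/mol
GAS_CONSTANT = 8.314   # J/(mol K)


def proton_gradient_energy_kj(delta_psi_volts: float, delta_ph: float,
                              temperature_k: float = 310.0) -> float:
    """Free energy (kJ per mole of protons) available from the gradient.

    delta_psi_volts: magnitude of the membrane potential, in volts.
    delta_ph: magnitude of the pH difference across the membrane.
    Both contributions are assumed to push protons in the same direction.
    """
    electrical = FARADAY * delta_psi_volts
    chemical = math.log(10) * GAS_CONSTANT * temperature_k * delta_ph
    return (electrical + chemical) / 1000.0


def protons_needed(energy_required_kj: float, energy_per_proton_kj: float) -> int:
    """Smallest whole number of protons whose combined energy covers the requirement."""
    return math.ceil(energy_required_kj / energy_per_proton_kj)


if __name__ == "__main__":
    per_proton = proton_gradient_energy_kj(0.150, 0.75)   # assumed values
    print(f"{per_proton:.1f} kJ per mole of protons")
    # Using the 40 kJ/mol threshold quoted in the text as the energy to be supplied:
    print(protons_needed(40.0, per_proton), "protons per ATP under these assumptions")
```

Under these assumed conditions each mole of protons contributes a little under 20 kJ, so several protons must cross the membrane through the enzyme for every ATP made, which is consistent with a rotary mechanism in which many protons pass per turn.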
Figure 1.4. Structure of mitochondrial Fo.F1 ATPase. Courtesy of Protein Data Bank.
In addition to providing insight into mechanisms, the atomic structures of crucial proteins in human diseases, such as viral proteins and oncogenic proteins, paved the way for rational drug design. Major examples are inhibitors of the essential protease of human immunodeficiency virus such as Crixivan (Figure 1.5 A) and inhibitors of tyrosine kinases such as Gleevec (imatinib mesylate), which inhibits the oncogenic Bcr-Abl fusion protein active in chronic myelogenous leukemia (Figure 1.6 A). Both cases illustrate the point that determination of the atomic structure of targets is essential for rational drug design. Organic molecules are then designed to block the active site of the protein. In the first case the inhibitor resembles the peptide substrate of HIV protease (Figure 1.5B). In the second case, Gleevec binds to the catalytic cleft of the kinase without resembling the substrates of the enzyme (Figure 1.6 B).
Figure 1.5. The design of indinavir as an inhibitor of the essential protease of Human Immunodeficiency Virus (HIV). A: determination of the atomic structure of HIV protease was essential for the rational design of indinavir (Crixivan, bound to the enzyme); B: this drug resembles a peptide substrate of the protease. Courtesy of Protein Data Bank (A) and SciFinder (B).
Figure 1.6. Gleevec was designed to bind the catalytic cleft of the Bcr-Abl protein tyrosine kinase. A: structure of Gleevec; B: binding of the inhibitor into the catalytic cleft. A courtesy of es.m.wikipedia.or and B courtesy of The Medical Dictionary.
The three steps of rational drug design are: 1. Identifying the protein responsible for a disease. 2. Solving the atomic structure of this protein. 3. Designing an organic molecule able to block the active site of the protein. Another methodology of Molecular Biology is Bioinformatics, which is the analysis of the sequences and structures of genes and proteins. This has become especially prevalent with the “big data” that resulted from the “omic” approaches (global studies of all the genes and proteins of an organism – called genomics and proteomics, respectively). Algorithms for “big data” analysis have enormous importance in health and marketing
studies. An example is the genotyping of human beings and microorganisms, plants and animals. The sequencing of different genomes from the same species has shown differences between individuals belonging to geographical and racial groups. This has become a very profitable business for an American company named “23andMe” that utilizes saliva samples from customers to isolate DNA and analyze SNPs (Single Nucleotide Polymorphisms - changes of one nucleotide at a position of the genome in different individuals) to estimate ancestry and health risks. More than 600,000 oligonucleotides are printed onto small chips by the company Illumina (San Diego, California, USA) to be hybridized with the DNA samples of the organism by automated techniques and the resulting map is correlated with phenotypes (https://www.23andme.com/?mdc2=true).
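As a minimal illustration of what comparing SNPs between two individuals involves, the sketch below lists the positions at which two genotypes differ. The SNP identifiers and genotypes are invented for the example and the function name is hypothetical; real genotyping pipelines are far more elaborate.

```python
def snp_differences(genotype_a, genotype_b):
    """Return the SNP identifiers at which two individuals carry different genotypes.

    Each genotype maps a SNP identifier to its two alleles, e.g. "AG".
    Allele order is ignored, so "AG" and "GA" count as the same genotype.
    """
    shared = genotype_a.keys() & genotype_b.keys()
    return sorted(
        snp for snp in shared
        if sorted(genotype_a[snp]) != sorted(genotype_b[snp])
    )


# Invented data, for illustration only.
person_1 = {"snp_001": "AG", "snp_002": "CC", "snp_003": "TT"}
person_2 = {"snp_001": "GA", "snp_002": "CT", "snp_003": "TT"}

if __name__ == "__main__":
    print(snp_differences(person_1, person_2))
```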
1.4. An example of a Molecular Biology approach: growth control by proton transport
Physiological and biochemical studies have shown that cell plasma membranes have transport systems for two monovalent cations: protons and sodium. Two types of transporters exist: primary active transporters and secondary transporters. The former couple light or chemical energy (redox reactions or ATP hydrolysis) to pumping H+ or Na+ out of cells, creating the electrochemical gradient of these cations. Secondary transporters couple the uptake of nutrients or the efflux of toxic molecules to the energy released by the downhill movement of H+ or Na+ into cells. Animal cells and some prokaryotes have a primary transport of sodium and a secondary one of protons, while fungi, plants and most prokaryotes have a primary transport of protons and a secondary one of sodium. In addition to nutrient uptake, intracellular pH and sodium concentrations seem to have a more specific regulatory role in cell growth (see Chapters 8, 9 and 10). A Molecular Biology approach to this important aspect of Biology has the following steps:
1. Isolation by biochemical methods of the plasma membrane H+-ATPase (Pma1) from the model organism Saccharomyces cerevisiae (baker’s yeast) and demonstration in vitro (upon reconstitution in lipid membranes) that the enzyme couples ATP hydrolysis to the pumping of protons into vesicles. See Table 10.1.
2. Isolation by Genetic Engineering methods of the PMA1 gene encoding the yeast proton pump.
3. Generation by reverse genetics of mutant yeast cells with different levels of proton pump activity.
4. Correlation between growth rate and Pma1 activity in the above series of mutants (Table 1.2).
5. Expression in mouse fibroblasts of the yeast PMA1 gene, resulting in fast growth and tumorigenic transformation (Table 1.3).
6. The conclusion is that proton transport regulates growth not only in yeast, where H+ transport is primary, but also in mouse fibroblasts, where H+ transport is secondary to Na+ transport.
Cancer is a disease of unregulated growth control where proton efflux and intracellular pH are greater than normal and the above results open novel therapeutic strategies centered on proton transport.
Table 1.2. Correlation between activity of yeast plasma membrane H+-ATPase in a series of yeast mutants generated by genetic engineering and intracellular pH and growth rate in media at pH 4.0. Data from the author.
___________________________________________
Relative ATPase    Growth rate    Intracellular pH
activity (%)       (h-1)
___________________________________________
30                 0.05           5.4
50                 0.07           5.6
60                 0.10           5.9
100                0.15           6.1
___________________________________________
Table 1.3. Correlation between H+-ATPase activity, tumorigenic transformation and intracellular pH of mouse fibroblasts expressing different mutants of yeast plasma membrane H+-ATPase. Data from the author.
________________________________________________________
(%) ATPase activity of    (%) Tumorigenic    Cell pH
the introduced gene       transformation
________________________________________________________
No gene introduced
>> R-OH + HO-PO3 where A- is R-O-, x+ is +PO3- and B- is OH- and H+ is H+
A concrete example is the reaction of lysozyme, as shown in Figure 5.2.
Figure 5.2. The reaction of lysozyme. The active site binds six monosaccharides but only four are shown. NAG corresponds to N-acetyl glucosamine and NAM to N-acetyl muramate. The broken bond is the one between central NAM and NAG. Courtesy of Encyclopedia of Biophysics.
The Different Works of Protein Nanomachineries and the Chemical Work of Enzymes
The mechanism of lysozyme suggested by its atomic structure is shown in Figure 5.3. Binding of substrate is made by hydrogen bonds of side chains from D101 and W62 and by carbonyl groups of the main chain of amino acids at positions 59 and 107. Catalysis is mediated by E35 and D52. In the first case, this is by general acid catalysis, where the protonated carboxylic group donates a H+ to the oxyanion generated at the transition state when the glycosidic bond is broken. In the second case the dissociated carboxylic group of D52 stabilizes the positive charge acquired by carbon 1 of the substrate when a water molecule attacks the sugar residue (electrostatic catalysis).
Figure 5.3. The catalytic mechanism of lysozyme. Residues involved in stabilization of the transition state. Courtesy of Iowa Western Community College. Adapted from Tanaka, Nishinomiya, Goto, Shimazaki et al., 2021, page 288.
5.4 Covalent catalysis
A special mechanism utilized by many enzymes consists in forming an intermediate substrate covalently bound to the enzyme while a strong nucleophile at the active site of the enzyme facilitates subsequent attack by water. The family of serine proteases contains a highly reactive nucleophilic serine at position 195 that attacks ester or amide bonds of substrates, making an acyl-enzyme. Hydrolysis of this intermediate substrate is easier than that of the starting substrate (Figure 5.4).
Chapter 5
Figure 5.4. The mechanism of covalent catalysis in the family of serine proteases. Courtesy of ACS Publications, American Chemical Society.
The high reactivity of this serine is due to its forming part of a catalytic triad that facilitates the dissociation of the hydroxyl group (see Figure 5.5). The alcoholic hydrogen of S195 makes a hydrogen bond with the nitrogen of a histidine (H57). This side chain contains a hydrogen that makes a hydrogen bond with one oxygen of the carboxylate from D102. This chain of events stabilizes the alkoxide anion of dissociated serine (Ser195--O-).
Figure 5.5. The catalytic triad of chymotrypsin, a serine protease. Courtesy of Researchgate.net.
5.5 The special work of ligases or synthetases to drive biosynthetic reactions
A category of enzymes known as ligases or synthetases acts as nanomachines, utilizing the energy of ATP hydrolysis to create new bonds by a thermodynamically irreversible reaction. As indicated below,
basically the condensation of alcohol and carboxylic acid is made possible by avoiding the liberation of a water molecule (because the high concentration of water in the medium opposes the reaction). The reaction is divided into two parts: first, ATP reacts with the carboxylate group of one substrate to make an acyl-AMP intermediate and pyrophosphate (PPi). The latter product is hydrolyzed by soluble pyrophosphatases. The acyl-AMP bond is high energy and in the second step the alcohol substrate attacks the carbonyl-AMP bond to create the ester bond. In this way no water is liberated during condensation because it is utilized for ATP hydrolysis to AMP and PPi. See the scheme below:
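The two steps described above, together with the hydrolysis of pyrophosphate, can be written out as follows. This is a generic sketch consistent with the description in the text, not a reproduction of the original scheme; R and R' are placeholder groups and charges are simplified.

```latex
\begin{align*}
\text{R-COO}^- + \text{ATP} &\longrightarrow \text{R-CO-AMP} + \text{PP}_i
  && \text{(step 1: acyl-AMP intermediate)} \\
\text{R-CO-AMP} + \text{R}'\text{-OH} &\longrightarrow \text{R-CO-O-R}' + \text{AMP}
  && \text{(step 2: ester bond)} \\
\text{PP}_i + \text{H}_2\text{O} &\longrightarrow 2\,\text{P}_i
  && \text{(pyrophosphatase)}
\end{align*}
```

Summing the first two lines gives the condensation of acid and alcohol at the expense of ATP being converted into AMP and PPi, with no water released in the condensation itself.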
5.6. Every enzyme is a potential motor
During the catalytic cycle of enzymes there are conformational changes as illustrated in Figure 5.6.
Figure 5.6. Changes in conformation during the catalytic cycle of an enzyme.
The binding of the substrate triggers the first change and conversion into product the second one. The number of conformations increases if more than one substrate or product is involved. This creates enormous possibilities for protein nano-machineries, as will be described in the next chapters. Visualization of the structure of these multiple conformations is not easy because an enzyme performing its catalytic cycle cannot be crystallized for X-ray analysis or used for Cryo-Electron Microscopy, as it is continuously changing in structure. Therefore, special tricks are needed to fix particular conformations. In the case of enzymes using ATP, one such trick is a non-hydrolyzable analogue of ATP that contains a nitrogen bridge between the beta and gamma phosphates (AMP-PNP), as shown in Figure 5.7.
Figure 5.7. Structure of AMP-PNP. Courtesy of Brend-enzymes.org.
Another difficulty is that we cannot see videos of the changing conformation of an enzyme, as mentioned previously. Therefore, researchers must collect structures of different conformations of the protein and make a guess at the sequence of events.
5.7 P-loop is a sequence motif specialized to bind nucleoside triphosphates
Many proteins binding ATP or GTP have a common sequence motif between a beta-sheet and an alpha-helix. This sequence is (X means any amino acid): GXXXXGK(T/S). It constitutes an anion hole to bind the
beta- and gamma-phosphate of the ATP or GTP (Saraste, Sibbald and Wittinghofer 1990, page 430). One example is shown in Figure 5.8.
Figure 5.8. Structure of one P-loop. Courtesy of ResearchGate.com.
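The consensus GXXXXGK(T/S) given above translates directly into a regular expression, which is how such motifs are commonly searched for in protein sequences. The sketch below is a minimal illustration; the function name is hypothetical and the test sequence is a short stretch modelled on the N-terminus of a Ras-like GTPase, chosen here as an assumption and not taken from the text.

```python
import re

# P-loop consensus from the text: G, four residues of any kind, G, K, then T or S.
P_LOOP = re.compile(r"G.{4}GK[TS]")


def find_p_loops(protein_sequence):
    """Return (start, motif) for each non-overlapping match; start is 0-based."""
    return [(m.start(), m.group()) for m in P_LOOP.finditer(protein_sequence.upper())]


# Illustrative one-letter amino acid sequence (assumption, see lead-in).
example = "MTEYKLVVVGAGGVGKSALTIQ"

if __name__ == "__main__":
    print(find_p_loops(example))
```

A match only shows that the sequence fits the consensus; whether the stretch really forms a nucleotide-binding loop depends on the structure of the protein.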
5.8 The classes of proteases
According to the crucial amino acid or prosthetic group in the active center there are four classes of proteases:
(a) Serine proteases (Figure 5.5)
(b) Cysteine proteases (Figure 5.9)
(c) Aspartyl proteases (Figure 5.10)
(d) Metalloproteases (Figure 5.11).
In papain (a cysteine protease), the cysteine in position 25 is the attacking nucleophile, assisted by the histidine in position 159. The ribbon structure of papain and the details of the active site are shown in Figure 5.9. The asparagine, histidine and cysteine form a catalytic triad.
Figure 5.9. Structure of papain, a cysteine protease (top, courtesy of Protein Data Bank) and detail of the active site catalytic triad (below, courtesy of puns.rsc.org).
As indicated in Figure 5.10, in renin (an aspartyl protease) the aspartate in the active center accepts a proton from a water molecule, generating a hydroxide ion that attacks the peptide bond of the substrate, giving rise to a carboxylate and an amine. As indicated in Figure 5.11B, in metalloproteases a metal ion (Zn2+) coordinates the oxygen of the carbonyl group of the peptide bond to be broken. The carbon atom of the carbonyl group thereby acquires a partial positive charge and is attacked by a water molecule, one of whose O-H bonds is broken. Histidine 231 accepts the proton in the nearby nitrogen and the >N-H group of the histidine ring gives a proton to aspartate 226. Protease inhibitors are important drugs. One example is captopril (Figure 5.11C). This drug inhibits the Angiotensin Converting Enzyme (ACE), a metalloprotease that increases blood pressure.
Figure 5.10. (A) structure of renin, an aspartyl protease (Courtesy of Protein Data Bank) and (B) detail of the active site (Courtesy of eu.wikipedia.org).
Figure 5.11. (A) structure of thermolysin, a metalloprotease, bound to an inhibitor (Courtesy of Protein Data Bank); (B) detail of the active site (Courtesy of jbc.org); (C) Structure of captopril (Courtesy of es.wikipedia.org).
5.9 References
Tanaka, Ichiro; Nishinomiya, Ryoto; Goto, Ryosuke; Shimazaki, Shun and Chatake, Toshiyuki. 2021. Recent structural insights into the mechanism of lysozyme hydrolysis. Acta Crystallographica Section D: Structural Biology, vol. 77: pages 288-292.
Saraste, Matti; Sibbald, Peter R. and Wittinghofer, Alfred. 1990. The P-loop, a common motif in ATP- and GTP-binding proteins. Trends in Biochemical Sciences, vol. 15: pages 430-434.
CHAPTER 6 PROTEIN NANO-MOTORS AND MECHANICAL WORK
An important motor of eukaryotic cells is myosin, which functions as a locomotive moving on rails made of actin filaments. The scheme of this dimeric protein is shown in Figure 6.1. One accessory subunit is essential for activity and the other contains an EF hand and is regulatory by sensing calcium. The impressive muscles of the rearing horse (Figure 6.2), like the muscles of all animals, are powered by the molecular motor protein myosin. A part of myosin, the globular catalytic domain, moves dramatically in response to ATP binding, hydrolysis, and product release, propelling myosin along an actin filament (Figure 6.3). This molecular movement is translated into the movement of the entire animal, as vividly depicted in Leonardo da Vinci’s rearing horse. The crucial conformational change of the myosin ATPase domain during the catalytic cycle is shown in Figure 6.3. At the carboxyl-terminus of this domain, some parts move as much as 2.5 nm. The filamentous domain of myosin is shown in Figure 6.4.
Figure 6.1. Structure of myosin. (A) Schematic view. Every monomer has (from left to right) a globular catalytic domain for ATP hydrolysis and actin binding, a neck (also known as lever arm) with two small accessory subunits (LC17 and LC20) wrapped around the neck of each heavy chain and connecting to the long filamentous (rod) domain. (B) Ribbon diagram of the structure of myosin. The actin binding site, the nucleotide binding site with a P-loop, the essential and the regulatory light chains and the lever arm are shown. Courtesy of researchgate.net.
Figure 6.2. The rearing horse painted by Leonardo da Vinci. Courtesy of alamy.com.
Figure 6.3. The conformational change of the myosin ATPase domain (head domain). The ATP-bound structure is on the left (with the non-hydrolyzable ADP-N-phosphate, Figure 6.5) while the myosin structure bound to ADP-vanadate (Figure 6.6), the analogue of the transition state of the reaction, is on the right. Courtesy of Protein Data Bank.
Figure 6.4. The filamentous domain of myosin. Courtesy of sciencedirect.com.
The non-hydrolyzable ATP analogue is shown in Figure 6.5. The analogue of the transition state of the reaction, with the γ-phosphate of ATP replaced by orthovanadate (VO4 3-), is shown in Figure 6.6.
Figure 6.5. The non-hydrolyzable ATP analogue ADP-N-phosphate, with the oxygen bridge between the γ and β phosphates replaced by an amino group. Courtesy of Researchgate.net.
Figure 6.6. The analogue of the transition state of the reaction, with orthovanadate replacing the γ-phosphate of ATP. Courtesy of pubchem.ncbi.gov.
These conformational changes are transmitted to the lever arm upon binding the transition-state analogue ADP-VO4 3- (Figure 6.7).
Figure 6.7. Structure of the ATPase site of myosin and the conformational change of the lever arm when ADP-VO43- is bound (transition state with ADP + Pi). Courtesy of pinterest.com and ksumsc.com.
The regions known as Switch I and Switch II bind to the ATP analogue (ADP-N-phosphate) but not to ADP. A long α-helix changes position according to the bound nucleotide. All these conformational changes are transformed into “walking” over a path made by microfilaments of the actin protein. These filaments have directionality because the actin molecule is not symmetric and there are plus (barbed) and minus (pointed) ends (see Figure 6.8).
Figure 6.8. An actin filament has directionality. The direction of myosin movement is from minus to plus ends. Courtesy of Protein Data Bank.
The movement of myosin over actin filaments is caused by conformational changes of this motor protein (Figure 6.9). When myosin binds ATP, it dissociates from the actin filament and when ATP is hydrolyzed there is a big conformational change of the lever arm that afterwards allows the myosin to bind the actin filament in a different position. Finally, the release of phosphate results in the “power stroke”, with the movement of actin relative to myosin. The release of ADP from the active site of the enzyme allows another reaction cycle.
Figure 6.9. Scheme of the movement of a myosin monomer over a filament of actin. The myosin head is a circle and the lever arm a line. Small arrows indicate the direction of movement of the different parts of the myosin.
Some functions of myosin resemble that of a locomotive pulling wagons of cell secretory vesicles to the plasma membranes, where they should fuse. It can also move organelles inside cells along actin tracks.
6.3. Role of myosin in muscle
A specialized function occurs in the striated muscles of animals, where the locomotive activity results in muscle contraction. As indicated in Figure 6.10, filaments of actin (called thin filaments) are anchored to the so-called “Z-lines”, which are made of structural proteins. Myosin molecules are
grouped into thick filaments by the inter-binding of their filamentous domains. Two groups bound to separate Z lines are joined by interlocking their filamentous domains. When the myosin heads walk along thin filaments the two connected Z lines approach and the muscle contracts. Muscle contraction is activated by cytosolic free calcium, which binds to the regulatory protein troponin; in resting muscle the associated protein tropomyosin blocks the binding sites for myosin on actin. After binding calcium, troponin experiences a conformational change and tropomyosin releases these binding sites. Calcium is released from the sarcoplasmic reticulum of muscle cells when a nerve impulse arrives at these cells and a component of the troponin complex senses the increase in calcium. In a resting muscle fiber, tropomyosin partially covers: (a) calcium binding sites on troponin, (b) actin binding sites on myosin, (c) myosin binding sites on actin and (d) calcium binding sites on actin.
Figure 6.10. Role of myosin during muscle contraction. Courtesy of ib.bioninja.com.au.
6.4. Other cellular motors
Eukaryotic cells contain two other nano-motors, kinesin and dynein, that move on tracks made of microtubules (Figure 6.11). Microtubules are made of two similar tubulins: α-tubulin and β-tubulin.
Figure 6.11. Structure of microtubules, which are hollow cylindrical polymers. The α-tubulin is shown in purple and the β-tubulin in blue. Courtesy of shutterstock.com.
Kinesins move organelles, secretory vesicles and chromosomes, while dyneins power the movement of cilia and flagella. Microtubules in eukaryotic flagella form doublets and aggregate in a complex of nine surrounding two single microtubules (Figure 6.12).
Figure 6.12. Microtubules in eukaryotic flagella. Courtesy of shutterstock.com.
The movement of eukaryotic flagella and cilia is shown in Figure 6.13.
Figure 6.13. Movement of flagella and cilia. Courtesy of microscopemaster.com.
The anticancer drug taxol (see Figure 6.14) binds to microtubules and stabilizes the polymerized form. Therefore, it is not possible to make new microtubules within the cells and growth and proliferation are inhibited.
Figure 6.14. Structure of taxol. Courtesy of en.wikipedia.org.
The structure of kinesin is shown in Figure 6.15. The kinesin motor has two steps (see Figure 6.16). In the first, the head domain with ADP (black neck) binds to the microtubule. Then the ATP replaces the ADP and this produces a conformational change and the other head domain (black neck) moves ahead. Upon ATP hydrolysis the head domain with the black neck moves ahead of the head domain with the red neck. Then ATP hydrolysis produces ADP in the head domain with the red neck. Finally, the head domain with the red neck moves ahead of the head domain with the black neck to complete the cycle.
Figure 6.15. Kinesin contains two head domains and two tail domains intertwined as a coiled coil that connects the head domains to the cargo binding domains with light chains. Courtesy of eng.thesaurus.rusnano.com.
Figure 6.16. Kinesin moving along a microtubule.
6.5. Bacteria have a rotary motor in their flagella
Bacteria swim by rotating their flagella. This motor spins on a central axis instead of moving along a track. The structure of bacterial flagella is shown in Figure 6.17. It contains many different types of proteins: the flagellum itself is made of the protein flagellin, which can be recognized by plant defenses against pathogens. After a hook it crosses the outer membrane (L ring) and the peptidoglycan layer (P ring) and finally is inserted in the inner membrane at the MS ring, surrounded by another ring made by the proteins MotA and MotB.
The proton gradient across the inner bacterial membrane forces the uptake of protons and this causes the rotation of the flagella by a mechanism described in Figure 6.17. The MotA subunits have two half proton channels and H+ movements force rotation of the MS ring. When protons, driven by the electrical membrane potential (positive outside), enter the cell they go through the external half-channel and, to complete the entrance into the cell, they force the counter-clockwise rotation of the MS ring to bridge the separation of the two half-channels. Finally, protons are released through the inner half-channels (see Figure 6.18). This mechanism is similar to the one of the rotary ATPases that will be described in Chapter 8 (Figure 8.19).
Figure 6.17. Structure of the bacterial flagellar motor. A: a realistic picture (courtesy of commons.wikipedia.org). B: a schematic view.
However, the bacterial flagella are more complicated than the rotary ATPases and contain many more subunits.
Figure 6.18. The rotary mechanism of bacterial flagella.
CHAPTER 7 REGULATION OF GENE TRANSCRIPTION
7.1. Promoters, enhancers (UAS) and repressing sequences (URS)
As indicated in Figure 7.1, upstream of the TATA box of eukaryotic promoters there are Upstream Activating Sequences (UAS) and Upstream Repressing Sequences (URS) that can be recognized by different transcription factors. They not only promote or hinder the recruitment of RNA polymerase II but also recruit the machineries that modify the chromatin structure to activate or inhibit accessibility.
Figure 7.1. The basics of regulation of gene expression.
In bacteria and their viruses (phages), genes coding for functionally related proteins are clustered into operons, and are under the control of a single promoter. Transcription produces a single mRNA that is either translated directly or cut into mono-cistronic mRNAs.
7.2. RNA polymerases and helicases
RNA polymerase II unwinds the template double helix. This is achieved by the preferential binding of single-stranded DNA, without the need for ATP. Then RNA polymerase catalyzes the nucleophilic attack of the 3′-hydroxyl group of the last nucleotide in the chain on the α-phosphoryl group of the incoming nucleoside triphosphate, and pyrophosphate (PPi) is produced as in DNA replication (Figure 7.2).
Figure 7.2. The reaction of RNA polymerase. Courtesy of web-books.com.
The RNA polymerase from Escherichia coli has 5 subunits and the TATA binding protein is subunit σ. It is important to distinguish between the RNA transcript, the template/antisense strand and the coding/sense strand (Figure 7.3).
           CAGUACC …   Transcript
…ATTCGTAACGGTCATGG …   Antisense
 …TAAGCATTGCAGTACC …   Sense
Figure 7.3. Sense, antisense and transcript strands.
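The relationship between the three strands can be checked with a few lines of code. The sketch below is illustrative only and its function names are hypothetical; it uses the seven-nucleotide stretch CAGTACC, corresponding to the transcript shown in Figure 7.3, and writes the template strand in the 3'->5' direction so that it aligns under the sense strand.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcript_from_sense(sense_5to3):
    """The transcript has the same sequence as the sense strand, with U instead of T."""
    return sense_5to3.upper().replace("T", "U")


def template_3to5(sense_5to3):
    """Template (antisense) strand written 3'->5', base by base under the sense strand."""
    return "".join(DNA_COMPLEMENT[base] for base in sense_5to3.upper())


def transcript_from_template(template_3to5_seq):
    """Pair each template base (read 3'->5') with its RNA partner, giving the RNA 5'->3'."""
    return "".join(RNA_PARTNER[base] for base in template_3to5_seq.upper())


if __name__ == "__main__":
    sense = "CAGTACC"
    template = template_3to5(sense)
    print(template, transcript_from_template(template), transcript_from_sense(sense))
```

Both routes give the same RNA, which is why the sense strand is also called the coding strand although it is the antisense strand that the polymerase actually reads.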
RNA polymerase distinguishes between sense and antisense strands because the TATA box has directionality (Figure 7.4).
Eukaryotes: 5’-TATAAA
               ATATTT-5’
Bacteria:   5’-TATAAT
               ATATTA-5’
Figure 7.4. The directionality of TATA box.
RNA polymerases backtrack and correct errors but the final error rate (1 mistake per 10^4-10^5 nucleotides) is higher than that for DNA replication (1 mistake per 10^9-10^10 nucleotides). The human genome has 3 x 10^9 bp and therefore there is 1 or no errors at all during replication of the whole genome. On the other hand, mRNAs have from 300 to 3000 nucleotides and therefore there are almost no errors during transcription of one particular mRNA.
RNA polymerases bind to promoter sites on the DNA template and initiate transcription. In bacteria the core promoter contains a TATA box (TATAAT) at 10 nucleotides before the transcription start site (-10) and another recognized motif (TTGACA) at -35:
5’…….TTGACA…………..TATAAT…………Transcription start site
It must be indicated that the above motifs are consensus sequences and that individual promoters may differ from the consensus at some positions. In bacteria the sigma subunit of RNA polymerase recognizes both the -10 and -35 motifs. There are alternative bacterial promoters recognized by alternative sigma subunits, as indicated in Figure 7.5.
5’ ……. TNNCNCNCTTGAA…..CCCATNT….. Heat-shock promoter
5’…….. CTGGGNA….TTGCA…………Nitrogen-starvation promoter
Figure 7.5. Alternative bacterial promoters. “N” means any nucleotide.
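The error-rate comparison made earlier in this section can be checked with simple arithmetic: the expected number of errors is the length of the sequence multiplied by the error rate per nucleotide. The sketch below uses only the figures quoted in the text; the function and variable names are illustrative.

```python
def expected_errors(length_nt, error_rate_per_nt):
    """Expected number of mistakes when copying length_nt nucleotides."""
    return length_nt * error_rate_per_nt


HUMAN_GENOME_BP = 3e9

# DNA replication: 1 mistake per 10^10 to 10^9 nucleotides.
replication_low = expected_errors(HUMAN_GENOME_BP, 1e-10)
replication_high = expected_errors(HUMAN_GENOME_BP, 1e-9)

# Transcription: shortest mRNA with the lowest rate, longest mRNA with the highest rate.
transcription_low = expected_errors(300, 1e-5)
transcription_high = expected_errors(3000, 1e-4)

if __name__ == "__main__":
    print(f"Replication of the genome: {replication_low:.1f} to {replication_high:.1f} errors")
    print(f"One mRNA: {transcription_low:.3f} to {transcription_high:.1f} errors")
```

With these figures, copying the whole genome gives on the order of 0.3 to 3 expected errors, and a single mRNA between about 0.003 and 0.3, so most individual transcripts are indeed error-free despite the much higher error rate per nucleotide.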
Another important aspect is that RNA polymerases require no primers while DNA polymerases require primers to start replication. Elongation of mRNA takes place at transcription bubbles that move along the DNA (Figure 7.6).
Figure 7.6. Transcription bubble during transcription. In blue and red the two strands of DNA and in green the emerging mRNA.
Termination of transcription is better understood in bacteria, where it is determined by sequences of the transcribed RNA that make a hairpin and a repetition of uracils (see Figure 7.7).
Figure 7.7. A hairpin and a repetition of uracils signal transcriptional termination in bacteria.
Finally, the prokaryotic protein Rho (ρ) acts as an ATP-dependent helicase that binds the nascent RNA chain and pulls it away from the transcription bubble of RNA polymerase and the DNA template. It is an example of mechanical work with ATP energy (Figure 7.8).
Figure 7.8. The Rho helicase utilizes the energy of ATP hydrolysis to pull the mRNA away from the transcription bubble.
Transcription termination is different in eukaryotes. In RNA polymerase I, the “transcription termination factor” binds downstream of the pre-rRNA coding regions, causing the dissociation of the RNA polymerase from the template and the release of the new RNA strand. In RNA polymerase II, the termination occurs via a polyadenylation/cleaving complex. The 3' tail on the end of the strand is bound at the polyadenylation site, but the strand will continue to code. The newly synthesized ribonucleotides are removed one at a time by the cleavage factors CSTF and CPSF, in a process that is still not fully understood. The remainder of the strand is disengaged by a 5′-exonuclease when the transcription is finished. RNA polymerase III terminates after the addition of uracil residues in the transcribed RNA. Unlike in bacteria and in RNA polymerase I, the termination hairpin needs to be upstream to allow for correct cleaving. See Xie et al., 2023.
7.4. Antibiotics that inhibit transcription
There are some important antibiotics that inhibit transcription in different biological systems. Rifampicin (see Figure 7.9) inhibits initiation of RNA synthesis in bacteria but not in eukaryotes. It binds to a pocket in the channel occupied by the newly formed RNA-DNA hybrid in bacterial RNA polymerase.
Figure 7.9. Structure of Rifampicin. Courtesy of CAS SciFinder.
Actinomycin D (Figure 7.10) intercalates into a DNA double helix and prevents formation of DNA-RNA hybrids in both eukaryotes and bacteria.
Figure 7.10. Structure of actinomycin D. Courtesy of CAS SciFinder.
α-Amanitin (Figure 7.11) strongly inhibits RNA polymerase II of eukaryotes, with little or no effect on polymerases I and III.
Figure 7.11. Structure of α-amanitin. Courtesy of CAS SciFinder.
7.5. Types of transcription factors
The central role of transcription factors for life was a discovery of Roger D. Kornberg (Figure 7.12), who proposed that transcription factors regulate the expression of the genetic message and who received the 2006 Nobel Prize in Chemistry. All transcription factors contain a DNA binding domain and a domain for interaction with proteins involved in transcription, including RNA polymerases and chromatin remodeling proteins. Sometimes these interactions occur through intermediate mediator proteins.
Figure 7.12. Roger D. Kornberg, Nobel Prize in Chemistry 2006.
There are several kinds of transcription factors (TFs) classified by their DNA-binding domains and oligomerization domains. As indicated in
Figure 7.13, Heat Shock Factors (HSFs) are trimers with monomers bound by leucine zipper repeats (Perisic, Xiao and Lis, 1989, page 797).
Figure 7.13. Scheme of HSFs. Arrows indicate binding sequences in different orientations.
They bind to DNA sequences known as the heat shock promoter elements (HSPE) with the sequence xGAAxxTTCxxGAAx, where “x” is any nucleotide. A scheme of the binding is shown in Figure 7.13 and a more detailed view in Figure 7.14 (Perisic, Xiao and Lis 1989, pages 797-806).
Figure 7.14. Binding of HSF to HSPE (Heat Shock Promoter Element).
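The heat shock promoter element xGAAxxTTCxxGAAx described above can be located in a DNA sequence with a regular expression in which each “x” matches any of the four nucleotides. The sketch below is a minimal illustration: the function name is hypothetical and the test sequence was invented for the example.

```python
import re

# xGAAxxTTCxxGAAx, with x = any nucleotide. The lookahead lets overlapping sites be reported.
HSPE = re.compile(r"(?=([ACGT]GAA[ACGT]{2}TTC[ACGT]{2}GAA[ACGT]))")


def find_hspe(dna):
    """Return (start, site) for every occurrence of the motif; start is 0-based."""
    return [(m.start(), m.group(1)) for m in HSPE.finditer(dna.upper())]


# Invented sequence containing one copy of the motif.
example = "CCTAGAAGCTTCTAGAAGCC"

if __name__ == "__main__":
    print(find_hspe(example))
```

The alternating GAA and TTC blocks correspond to the binding sequences in opposite orientations, one for each monomer of the HSF trimer.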
Homeodomain transcription factors are heterodimers with two different DNA-binding domains containing one homeodomain (see Figure 7.15).
Each homeodomain has a helix-turn-helix motif and one helix is inserted into the major groove of the DNA for recognition.
Figure 7.15. Binding of homeodomain transcription factors to DNA. Courtesy of es.m.wikipedia.
Basic-leucine zippers are heterodimers of two basic-leucine zipper proteins stabilized by the zippers. The basic regions are intercalated into the major groove of the DNA (Figure 7.16).
Figure 7.16. How a basic-leucine zipper transcription factor binds to its binding site in DNA. Courtesy of es.m.wikipedia.org
Transcription factors with zinc-finger domains are a family containing several structures with a Zn2+ cation chelated by two cysteines and two histidines (see Figures 7.17 and 7.18).
Figure 7.17. In zinc-finger domains the Zn2+ cation is chelated by two cysteines and two histidines. Courtesy of en.wikipedia.org.
Figure 7.18. Binding of zinc fingers to DNA occurs at the major groove. Courtesy of youtube.com.
Another important kind of transcription factor are the mammalian nuclear receptors for hydrophobic hormones, such as estradiol, retinoic acid and thyroxine (see Figure 7.19). These hydrophobic hormones diffuse into cells and reach the nucleus to bind these transcription factors.
Figure 7.19. Structure of some hydrophobic hormones of mammals. (A) Estradiol (an oestrogen). Courtesy of en.wikipedia.org. (B) Thyroxine (L-3,5,3’,5’-tetraiodothyronine, a thyroid hormone). Courtesy of researchgate.net. (C) All-trans-retinoic acid (a retinoid). Courtesy of researchgate.net.
Nuclear hormone receptors have two functional domains: a ligand-binding domain and a DNA-binding domain made of zinc fingers (Figure 7.20 B). When the hormone binds to its site, a conformational change occurs at the DNA-binding domain and the transcription factor becomes able to bind DNA (compare Figures 7.20 A and 7.20 C).
Figure 7.20. Structure of estradiol receptor with bound estradiol (A). Structure of the DNA binding domain with DNA bound. The points correspond to the Zn2+ of the zinc fingers (B). Structure of the estradiol receptor without hormone (C). Courtesy of researchgate.net.
7.6. Chromatin immunoprecipitation

The DNA sequences recognized by transcription factors can be determined “in vitro” by so-called retardation gels. Labeled double-stranded oligonucleotides are incubated with the purified protein and the mixture is run in an electrophoresis gel to separate DNA molecules. If the oligonucleotide binds to the tested protein, its mobility is greatly reduced. This procedure does not ensure that the binding occurs “in vivo”, because many regions of chromatin are blocked by histones in nucleosomes (see below) and are not accessible “in vivo”. The method called “Chromatin ImmunoPrecipitation” (ChIP) determines DNA binding sites under “in vivo” conditions (Figure 7.21).
Figure 7.21. The steps of a ChIP experiment. A purple protein is used as an example
Cells or tissues are first fixed with formaldehyde, which crosslinks amino groups of the protein of interest with amino groups of the nucleotides bound to it (step 1). Then chromatin (DNA plus associated proteins) is isolated (step 2) and sonicated to fragment it into small pieces, each of which should contain a single protein (step 3). The fragments containing the protein under investigation are then isolated with a specific antibody that binds to that protein (step 4). There are bacterial proteins (protein A and protein G) that bind to the Fc part of antibodies (IgGs), a domain common to all antibodies produced by an animal. If protein A or protein G is immobilized on beads of a material such as Sepharose, the antibody-bound transcription factor and its recognized sequence can be easily isolated. Then the formaldehyde crosslinking is reversed with a high concentration (0.1 M) of a substance with amino groups, such as glycine (step 5). Finally, the free DNA fragment is sequenced or detected by PCR or hybridization. It is usually found that the sites bound “in vivo” by a transcription factor are only a small fraction of the identical sequences found in the genome that are recognized “in vitro” by the protein.
7.7. Protein complexes involved in eukaryotic transcription and splicing

Eukaryotes have three types of RNA polymerase that can be differentiated by their sensitivity to α-amanitin (see Figure 7.11), a poison produced by the mushroom Amanita phalloides (see Figure 7.22). Eating this mushroom is a common cause of poisoning among unwary mushroom collectors.
Figure 7.22. Picture of Amanita phalloides (left) and structure of α-amanitin (right). Courtesy of Getty Images.
Type I RNA polymerases are insensitive to α-amanitin and transcribe the 28S, 18S and 5.8S ribosomal RNAs (rRNAs). Their promoter contains a TATA-like transcriptional initiator sequence and an upstream promoter element. Type II RNA polymerases are very sensitive to α-amanitin and transcribe mRNAs for translation into proteins. In addition to the TATA box, the promoters of this RNA polymerase contain the CAAT box (5’-GCNCAATCT) and the GC box (5’-GGGCGG). Type III RNA polymerases have little sensitivity to α-amanitin and transcribe 5S rRNA and the tRNAs involved in protein synthesis at the ribosomes. The initiation of transcription by RNA polymerase II is shown in Figure 7.23.
Figure 7.23. The initiation of transcription by RNA polymerase II involves formation of protein complexes at the initiation site.
The general transcription factors of RNA polymerase II (TFII) are named A, B, D, E, F and H. TFIID initiates the assembly of the transcription complex and includes the TATA-binding protein (TBP). The ten-subunit TFIIH opens the DNA double helix with a helicase that uses ATP hydrolysis. It also phosphorylates the C-terminal domain of RNA polymerase II, which allows the polymerase to leave the promoter and start transcription.

The initial mRNA transcript undergoes two modifications: a 5’-cap and a 3’-polyA tail. The 5’-cap includes a 7-methylguanylate (see Figure 7.24) and the methylation of some adjacent riboses. The polyA tail is made by two enzymes: an RNase that recognizes a signal sequence (AAUAAA) and cleaves the nascent mRNA, and a poly(A) polymerase that then adds up to 200 adenylates from ATP, which is split into AMP and PPi (see Figure 7.25). Both modifications improve the translation and stability of the mRNA.
Figure 7.24. The 5’-cap of eukaryotic mRNAs. Courtesy of researchgate.net.
Figure 7.25. Polyadenylation of a primary transcript.
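The two enzymatic steps of polyadenylation described above (cleavage after the AAUAAA signal, then addition of adenylates) can be sketched as a string operation. This is only an illustration: the function name `polyadenylate` is hypothetical, and the distance between the signal and the cleavage point (`cleavage_offset`, set here to 20 nucleotides) is an assumption that the text does not specify.

```python
POLYA_SIGNAL = "AAUAAA"  # signal sequence quoted in the text

def polyadenylate(transcript, cleavage_offset=20, tail_length=200):
    """Cleave a transcript downstream of the first AAUAAA and append a poly(A) tail.

    cleavage_offset is an assumed number of nucleotides kept after the signal;
    tail_length follows the 'up to 200 adenylates' stated in the text.
    """
    signal_start = transcript.find(POLYA_SIGNAL)
    if signal_start == -1:
        return transcript  # no signal found: the transcript is left untouched
    # Step 1: the RNase cleaves the nascent RNA downstream of the signal.
    cut = min(signal_start + len(POLYA_SIGNAL) + cleavage_offset, len(transcript))
    cleaved = transcript[:cut]
    # Step 2: poly(A) polymerase adds one AMP per ATP consumed, releasing PPi.
    return cleaved + "A" * tail_length

# Hypothetical primary transcript: 3 nt, the signal, then 30 nt of downstream sequence.
primary = "GGG" + POLYA_SIGNAL + "C" * 30
mature = polyadenylate(primary)
print(len(mature), mature[:35])
```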
Primary transcripts of eukaryotic mRNAs contain coding sequences (exons) and insertions of non-coding sequences (introns) that must be removed by a process called “splicing” (see Figure 7.26). Sequences at the ends of introns specify splicing sites (see Figure 7.27). The splicing machinery, called the spliceosome, is made of small nuclear RNAs with catalytic activity (ribozymes) and structural proteins.
Figure 7.26. Removal of introns by splicing.
Figure 7.27. Specification of splice sites. N means any nucleotide.
The splicing reaction (see Figure 7.28) proceeds as follows: the 2’-OH of a ribose at the branch site makes a nucleophilic attack on the phosphate bridge between exon 1 and the intron, forming a lariat intermediate. Then the liberated 3’ end of exon 1 attacks the phosphate bridge between the intron and exon 2, giving rise to the spliced product and the liberated intron in lariat form. No energy in the form of ATP is needed because each step is a transesterification: in forming the lariat intermediate one bond is made and another is broken, and the same happens in generating the final products.
Figure 7.28. The splicing reactions. P refers to the phosphate between two nucleotides. Y refers to pyrimidines (cytosine and uracil), “x” means many pyrimidines (Y).
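The bond bookkeeping behind the statement that splicing needs no ATP can be checked with a small sketch. The function names and the example sequence are hypothetical, and the intron coordinates are supplied by hand rather than predicted from splice-site signals. The sketch only counts phosphodiester bonds before and after the two reactions.

```python
def splice(pre_mrna, intron_start, intron_end):
    """Remove the intron pre_mrna[intron_start:intron_end]; return (mRNA, intron)."""
    exon1 = pre_mrna[:intron_start]
    intron = pre_mrna[intron_start:intron_end]
    exon2 = pre_mrna[intron_end:]
    return exon1 + exon2, intron

def bond_balance(pre_mrna, intron_start, intron_end):
    """Count phosphodiester bonds before and after splicing."""
    mrna, intron = splice(pre_mrna, intron_start, intron_end)
    # A linear RNA of n nucleotides has n - 1 backbone (3'-5') bonds.
    before = len(pre_mrna) - 1
    # The lariat keeps its len(intron) - 1 backbone bonds and gains one
    # 2'-5' bond between the branch site and the 5' end of the intron.
    lariat_bonds = (len(intron) - 1) + 1
    after = (len(mrna) - 1) + lariat_bonds
    return before, after

# Hypothetical pre-mRNA: exon 1 (CAG), an intron, exon 2 (GCC).
intron_seq = "GUAAGUACUAACUUUUCAG"
pre = "CAG" + intron_seq + "GCC"
print(splice(pre, 3, 3 + len(intron_seq))[0])
print(bond_balance(pre, 3, 3 + len(intron_seq)))
```

The two counts are equal: every bond broken is replaced by a bond formed, which is the sense in which the two transesterifications need no external energy.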
It has been proposed that exons encode structural modules of proteins and that the recombination of exons in the course of evolution (exon shuffling) has contributed to the generation of protein diversity (Gö 1985, page 91). As indicated in Figure 7.29, polyadenylation, capping and splicing occur in eukaryotes when nascent mRNAs are still bound to the transcriptional complex.
Figure 7.29. Polyadenylation, capping and splicing occur in eukaryotes when nascent mRNAs are still bound to the transcriptional complex. The upper part shows the subunits of TFII in the transcriptional complex.
7.8. Epigenetic modifications

DNA in chromatin is bound to histones (Figure 7.30). DNA wraps around histone octamers, making cylindrical structures known as nucleosomes (Figure 7.31). The basic amino-terminal tails of the histones interact with the negatively charged phosphates of DNA. Octamers have two copies each of histones H2A, H2B, H3 and H4. The nucleosome core has a diameter of 10 nm and contains 146 base pairs (bp) of DNA. The linker between nucleosomes has 55 bp and binds histone H1.
Figure 7.30. Ribbon diagram of the four histones of chromatin. The basic α-helix is courtesy of pinterest.com and the rest is the work of the author.
Figure 7.31. Structure of nucleosomes. (A): General structure (courtesy of researchgate.net). (B) Top view (courtesy of ib.bioninja.com.au). (C): Side view (courtesy of en.wikipedia.org).
Epigenetic modifications are those that do not affect the coding sequence of genes but change their expression, and these modifications can be inherited. The French scientist Jean-Baptiste Lamarck (see Figure 7.32) postulated the inheritance of acquired characters long ago, but this idea was not taken into consideration because of the opposition of Charles Darwin (Figure 7.33), whose ideas pointed to random genetic variation and natural selection, without inheritance of acquired characters (Burkhardt Jr. 2013, page 793).
Figure 7.32. Jean-Baptiste Lamarck (1744-1829) proposed the inheritance of acquired characters.
Figure 7.33. Charles Darwin (1809-1882), the father of the theory of evolution, opposed Lamarck’s theory on the inheritance of acquired characters. At present we know that Lamarck was right.
Chromatin has several levels of packing, as indicated in Table 7.1. After the nucleosome there are helical arrays of nucleosomes, also called the 30 nm chromatin fiber (Wu, Bassett and Travers 2007, page 1129). This is followed by interphase chromosomes and metaphase chromosomes. The packing factor is defined as the length of naked DNA divided by the length of the chromatin level considered.

Table 7.1. The packing factor of different chromatin stages.
________________________________________________
Chromatin structure                Packing factor
________________________________________________
Nucleosome                         7
Helical arrays of nucleosomes      ≈ 90
Interphase chromosome              ≈ 10³
Metaphase chromosome               ≈ 10⁴
________________________________________________
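The nucleosome entry of Table 7.1 can be reproduced from the definition of the packing factor. The sketch below rests on two assumptions that are not stated in the text: a rise of 0.34 nm per base pair for naked B-form DNA, and that one nucleosome repeat (core plus linker) occupies about the 10 nm core diameter along the fiber. The function name `packing_factor` is hypothetical.

```python
BP_RISE_NM = 0.34  # assumed rise per base pair of naked B-DNA, in nm

def packing_factor(dna_bp, structure_length_nm):
    """Length of naked DNA divided by the length of the packed structure."""
    naked_length_nm = dna_bp * BP_RISE_NM
    return naked_length_nm / structure_length_nm

# One nucleosome repeat: 146 bp in the core plus 55 bp of linker (values from the text).
repeat_bp = 146 + 55
# Assumption: the repeat spans roughly the 10 nm diameter of the core.
nucleosome_factor = packing_factor(repeat_bp, 10.0)
print(round(nucleosome_factor, 1))
```

Under these assumptions the result is close to the value of 7 listed in the table for the nucleosome.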
The inheritance of acquired characters was first observed in bacteria, which methylate their DNA at cytosines to protect it from degradation by restriction endonucleases. Both the restriction enzyme and the methylase are encoded by the same operon. When cells duplicate their DNA, these enzymes are distributed between the two copies of DNA and the methylation is inherited.

In eukaryotes, histone acetyltransferases (HATs) add acetyl groups from acetyl-coenzyme A (acetyl-CoA) to the lysines in the histone N-terminal tails (Figures 7.34 and 7.35). This reaction eliminates the positive charge of these lysines and weakens the interaction of histones with the negative charges of the phosphates in DNA. Histone deacetylases remove the acetyl groups from the lysines and facilitate the interaction of histones with DNA.
Figure 7.34. The reaction of histone acetylases. Courtesy of researchgate.net.
Figure 7.35. Ribbon diagram of one acetylated histone (histone H4). The histone part is from Figure 7.30 and the acetylated lysine is courtesy of pubchem.ncbi.nlm.nih.gov.
In addition, acetylation of histones is recognized by bromodomains, which bind acetylated lysines in the histones. These domains are present in chromatin remodeling machineries, systems that use ATP hydrolysis to disassemble nucleosomes at one particular site and reassemble them at another. Other epigenetic reactions are those of histone methyl-transferases (Figure 7.36) and of DNA methyl-transferases (Figure 7.37). Histone methylation is required for DNA methylation because the DNA methyl-transferases contain chromodomains that recognize histone methylation. The latter modification (DNA methylation) hampers the interaction of transcription factors with the major groove of DNA. DNA methylation is reversed by DNA demethylases.
Figure 7.36. Reactions of histone methyl-transferases and histone demethylases. Courtesy of mdpi.co.
Figure 7.37. Reaction of DNA methyl-transferases. Courtesy of researchgate.net.
Protein bromodomains bind to acetylated histones (Figure 7.38) and protein chromodomains bind to methylated DNA to stabilize the inactive state of chromatin (Figure 7.39).
Figure 7.38. How protein bromodomains bind acetylated histones. Courtesy of researchgate.net.
Figure 7.39. How protein chromodomains bind methylated DNA. Courtesy of researchgate.net.
A scheme of gene regulation in eukaryotes is shown in Figure 7.40.
Figure 7.40. One possible scheme for regulation of gene expression in eukaryotes.
First, a transcription factor binds to its regulatory sequence in DNA (step 1). Then, binding of an intermediate coactivator protein helps to recruit a histone acetyl-transferase that acetylates histone tails (step 2). The next step is the recruitment of a chromatin remodeling machinery, which binds the acetylated histone tails through its bromodomain (step 3). Movements of nucleosomes mediated by the chromatin remodeling machinery expose transcription initiation sites in DNA (step 4), which are then recognized by the initiation complex of RNA polymerase II.

A derivative of cytidine known as 5-azacytidine (see Figure 7.41) can activate gene expression in growing cells because it cannot be methylated (methylation occurs at position 5 of cytosine, where 5-azacytidine carries a nitrogen instead of a carbon) and because it also inhibits DNA methyl-transferase.
Figure 7.41. Structure of 5-azacytidine. Courtesy of medchemexpress.com.
DNA methylation can be analyzed with the restriction enzyme Hpa II, which cleaves CCGG sites but cannot do so if any of the cytosines are methylated. As shown in Figure 7.42, most mouse DNA remains uncut after treatment with Hpa II, whereas DNA from Drosophila and Escherichia coli is fully cut. This indicates that CCGG sites are extensively methylated in mouse but not in the other two organisms.
Figure 7.42. Effect of restriction enzyme Hpa II on DNA from mouse, Drosophila melanogaster and Escherichia coli determined by gel electrophoresis.
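The logic of the Hpa II assay can be sketched as a simulated digest. This is a simplified, single-strand model under stated assumptions: the function name `hpaii_digest`, the example sequence and the methylated positions are hypothetical, and the cut is placed between the two cytosines of the site (C^CGG).

```python
import re

HPAII_SITE = re.compile("CCGG")  # recognition site quoted in the text

def hpaii_digest(dna, methylated_cytosines=frozenset()):
    """Return the fragments left after cutting every unmethylated CCGG site.

    methylated_cytosines holds 0-based positions of methylated cytosines.
    A site is skipped if either of its two cytosines is methylated.
    """
    dna = dna.upper()
    methylated = set(methylated_cytosines)
    cut_points = []
    for match in HPAII_SITE.finditer(dna):
        start = match.start()
        if {start, start + 1} & methylated:
            continue  # methylation protects this site from cleavage
        cut_points.append(start + 1)  # cut between the two cytosines
    fragments = []
    previous = 0
    for cut in cut_points:
        fragments.append(dna[previous:cut])
        previous = cut
    fragments.append(dna[previous:])
    return fragments

# Hypothetical sequence with two CCGG sites (starting at positions 2 and 8).
sequence = "AACCGGTTCCGGAA"
print(hpaii_digest(sequence))                            # unmethylated DNA
print(hpaii_digest(sequence, methylated_cytosines={3}))  # first site methylated
```

Unmethylated DNA yields more and shorter fragments, while methylated DNA stays in larger pieces, which corresponds to the difference between the lanes of the gel in Figure 7.42.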
7.9. References

Burkhardt Jr., Richard W. 2013. Lamarck, evolution, and the inheritance of acquired characters. Genetics, vol. 194: pages 793-805.
Gö, Mitiko. 1985. Protein structures and split genes. Advances in Biophysics, vol. 19: pages 91-131.
Perisic, Olga; Xiao, Hua and Lis, John T. 1989. Stable binding of Drosophila Heat Shock Factor to Head-to-Head and Tail-to-Tail Repeats of a Conserved 5 bp Recognition Unit. Cell, vol. 59: pages 797-806.
Wu, Chenyi; Bassett, Andrew and Travers, Andrew. 2007. A variable topology for the 30-nm chromatin fiber. EMBO Reports, vol. 8: pages 1129-1134.
Xie, Juanjuan; Libri, Domenico and Porrua, Odil. 2023. Journal of Cell Science, jcs259873.
CHAPTER 8 MEMBRANE PROTEINS INVOLVED IN TRANSPORT ACROSS BIOLOGICAL MEMBRANES AND BIOPHYSICAL REGULATION OF CELLULAR ACTIVITIES
8.1. Ion homeostasis: the role of major cellular ions

Nucleic acids, especially RNA, are the major source of negative charges inside cells. The negative charges of nucleic acids correspond to the phosphate groups between nucleotides: adenine-ribose-phosphate-…uracil-ribose-phosphate-… If these charges were free rather than incorporated into nucleic acids, their concentration inside cells would be 0.1-0.2 M. Inorganic phosphate (Pi) and phosphorylated metabolites could contribute another 10-20 mM and chloride another 1-20 mM.

Concerning cations, potassium (K+) at 0.1-0.2 M is the major positive charge neutralizing the negative charges of nucleic acids. In addition, K+ is the largest contributor to cellular osmotic concentration and turgor and is required for the proper activity of ribosomes and of some metabolic enzymes.

Protons (H+, 10⁻⁶-10⁻⁷ M, pH 6-7) have important permissive and regulatory roles through intracellular and extracellular pH. Permissive refers to allowing some activities, for example by matching the optimum pH of enzymes. Regulatory refers to modulation of the activity of proteins that are crucial for metabolism, cell growth and proliferation. Protons are essential for the chemiosmotic circuit of most prokaryotes, fungi and plants.
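The correspondence between the proton concentrations and the pH values quoted above follows from the definition pH = -log10[H+]. A minimal sketch, with hypothetical function names:

```python
import math

def ph_from_concentration(h_molar):
    """pH for a proton concentration given in mol/L (activity taken as concentration)."""
    return -math.log10(h_molar)

def concentration_from_ph(ph):
    """Proton concentration in mol/L for a given pH."""
    return 10.0 ** (-ph)

# The cellular range quoted in the text.
for h in (1e-6, 1e-7):
    print(h, ph_from_concentration(h))
```

On this scale the potassium concentration of 0.1-0.2 M is about a million times higher than that of free protons at pH 7.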
Sodium (Na+, ≈ 10 mM) is toxic at high concentrations because it counteracts K+, Ca2+ and Mg2+, but it is essential to animals for the plasma membrane chemiosmotic circuit and for osmotic regulation. Na+ extrusion compensates for the colloid osmotic pressure of macromolecules (see below). Calcium (Ca2+, 0.1-1 μM free, ≈ 1 mM bound) has a structural role in membranes and a regulatory role as a second messenger. At high concentrations it would precipitate with phosphates. Magnesium (Mg2+, ≈ 1 mM free, 10 mM bound) is required by many enzymes, either directly or in complex with nucleotide triphosphates. Iron (Fe3+/2+, 0.1-1 μM free, ≈ 1 mM bound) is required by many enzymes in the form of either heme groups or iron-sulfur complexes in proteins. High concentrations produce oxidative stress. Finally, Cu2+, Zn2+ and Mn2+ are trace elements present as prosthetic groups in many proteins.
8.2. The two chemiosmotic circuits: the one of sodium showed up first, the one of protons later

When students hear “ion homeostasis”, their response is one of surprise (Figure 8.1). They have rarely heard these words! Ion homeostasis has two aspects: ion transporters and their regulation. In present-day organisms there are two chemiosmotic circuits: one based on Na+ and another based on H+ (Figure 8.2). Intracellular potassium provides ionic strength and turgor, while sodium, chloride and protons are toxic and must be extruded. The sodium circuit, on the left part of the figure, is the animal solution, although a few prokaryotes also use this type of circuit. It is based on a plasma membrane enzyme called Na+/K+-ATPase, which couples the hydrolysis of one ATP to ADP and phosphate (Pi) to the extrusion of three sodium ions and the uptake of two potassium ions. Therefore, it is electrogenic and generates a modest (≈ 60 mV) membrane potential ('