Genetic Engineering 9789350433430, 9788183183161



English Pages 325 Year 2008


GENETIC ENGINEERING

MOHAN P. ARORA M.Sc., M.Phil., Ph.D., F.E.S.I., F.A.Z., F.A.S.E.A., A.I.C.C.E.

Himalaya Publishing House • Mumbai • Delhi • Bangalore • Hyderabad • Chennai • Ernakulam • Nagpur • Pune • Ahmedabad • Lucknow

© No part of this book shall be reproduced, reprinted or translated for any purpose whatsoever without prior permission of the publisher in writing.

ISBN :978-81-83183-16-1

REVISED EDITION: 2009

Published by

Mrs. Meena Pandey for HIMALAYA PUBLISHING HOUSE, "Ramdoot", Dr. Bhalerao Marg, Girgaon, Mumbai-400 004. Phones: 23860170/23863863 Fax: 022-23877178 Email: [email protected] Website: www.himpub.com

Branch Offices

Delhi: "Pooja Apartments", 4-B, Murari Lal Street, Ansari Road, Darya Ganj, New Delhi-110 002. Phones: 23270392, 23278631 Reliance: 30180394/96 Fax: 011-23256286 Email: [email protected]

Nagpur: Kundanlal Chandak Industrial Estate, Ghat Road, Nagpur-440 018. Phone: 2721216, Telefax: 0712-2721215

Bangalore: No. 16/1 (old 1211), 1st Floor, Next to Hotel Highland, Madhava Nagar, Race Course Road, Bangalore-560 001. Phones: 22281541, 22385461 Fax: 080-2286611

Hyderabad: No. 2-2-1 16712H, 1st Floor, Near Railway Bridge, Tilak Nagar, Main Road, Hyderabad-500 044. Phone: 26501745, Fax: 040-27560041

Chennai: No. 2, Rama Krishna Street, North Usman Road, T-Nagar, Chennai-600 017. Phone: 28144004, 28144005 Mobile: 09380460419

Pune: No. 527, "Laksha Apartment", First Floor, Mehunpura, Shaniwarpeth (Near Prabhat Theatre), Pune-411 030. Phone: 020-24496333, 24496333, 24496323

Lucknow: C-43, Sector C, Ali Gunj, Lucknow-226 024. Phone: 0522-4047594

Ahmedabad: 114, ShaH, 1st Floor, Opp. Madhu Sudan House, C.G. Road, Navrang Pura, Ahmedabad-380 009. Mobile: 9327324149

Ernakulam: No. 3911 04A, Lakshmi Apartment, Karikkamuri Cross Road, Ernakulam, Cochin-622 011, Kerala. Phone: 0484-2378012, 2378016

Printed at

A to Z Printers, Daryaganj, New Delhi-110 002

CONTENTS

1. TOOLS IN GENETIC ENGINEERING

Restriction Enzymes, Type II Restriction Endonucleases, Uses of Restriction Endonucleases, Restriction Mapping, DNA Modifying Enzymes, Nucleases, Polymerases, Enzymes that Modify the Ends of DNA Molecule, DNA Ligase.

2. ISOLATION OF NUCLEIC ACIDS

Lysing of Cells, Breaking up Cells and Tissues, Enzyme Treatment, Phenol-chloroform Extraction, Alcohol Precipitation, Gradient Centrifugation, Alkaline Denaturation, Column Purification, Detection and Quantification of Nucleic Acids, Gel Electrophoresis, Analytical Gel Electrophoresis, Preparative Gel Electrophoresis, Radiolabelling of Nucleic Acid, End Labelling, Nick Translation, Nucleic Acid Hybridization.

3. CUTTING AND JOINING DNA MOLECULES

Cutting DNA Molecules (DNA Molecule Cleaving), Restriction Endonucleases, Restriction and Modification (R-M) System Classification, Specificity, Nomenclature, Recognition Sequences, Isoschizomers, Number and Size of Restriction Fragments, Variations on Cutting and Joining DNA Molecules, Dam and Dcm Methylases of E. coli, Importance of Eliminating Restriction Systems in E. coli Strains Used as Hosts for Recombinant Molecules, Enzyme Quality Importance, Ligation, Optimum Conditions for Ligation, Alkaline Phosphatase, Double Digests, Modification of Restriction Fragment Ends, Trimming and Filling, Linkers and Adapters, Homopolymer Tailing, Joining Polymerase Chain Reaction (PCR) Products, Incorporation of Extra Sequence at the 5' End of a Primer into Amplified DNA, Joining DNA Molecules without DNA Ligase.

4. CLONING VECTORS

Properties of a Good Vector, Properties of a Good Host, Plasmid Vectors, Host Range of Plasmids, Bacterial Transformation, Electroporation, Incompatibility of Plasmids, Purification of Plasmid DNA, Desirable Properties of Plasmid Cloning Vehicles, pBR322, a Purpose-built Cloning Vehicle, Vectors Based on the Lambda Bacteriophage, Lambda Biology, In vitro Packaging, Insertion Vectors, Replacement Vectors, M13 Vectors, Expression Vectors, Vectors for Cloning and Expression in Eukaryotic Cells, Yeasts, Mammalian Cells, Supervectors: YACs and BACs.

5. COSMIDS, PHASMIDS AND OTHER ADVANCED VECTORS

Vectors Used for Cloning Large Fragments of DNA, Cosmid Vectors, Alternatives to Cosmids (BACs and PACs), Choice of Vector, Specialist-purpose Vectors, Vectors for Making Single-stranded DNA for Sequencing, Expression Vectors, Vectors for Making RNA Probes, Vectors for Maximizing Protein Synthesis, Vectors for Facilitating Protein Purification, Vectors to Promote Solubilization of Expressed Proteins, Vectors for Promoting Protein Export, Putting it All Together: Vectors with Combinations of Features.

6. GENOMIC AND cDNA LIBRARIES

Genomic Libraries, Partial Digests, Choice of Vectors, Construction and Evaluation of a Genomic Library, Growing and Storing Libraries, cDNA Libraries, Isolation of mRNA, cDNA Synthesis, Bacterial cDNA, Random, Arrayed and Ordered Libraries.

7. FINDING THE RIGHT CLONE

Screening Libraries with Gene Probes, Hybridization, Labelling Probes, Steps in a Hybridization Experiment, Screening Procedure, Probe Selection and Generation, Screening Expression Libraries with Antibodies, Rescreening, Subcloning, Characterization of Plasmid Clones, Restriction Digests and Agarose Gel Electrophoresis, Southern Blots, PCR and Sequence Analysis.

8. OTHER SORTS OF CLONING

History, First Steps Towards Cloning, Nuclear Totipotency, Frogs and Toads and Carrots, A Famous Sheep-the Breakthrough Achieved, Beyond Dolly.

9. POLYMERASE CHAIN REACTION

History of the PCR, Methodology of the PCR, Essential Features of the PCR, Design of Primers for PCR, DNA Polymerases for PCR, PCR in Practice, Optimization of the PCR Reaction, Analysis of PCR Products, Cloning PCR Products, Long-range PCR, Reverse-transcription PCR, Rapid Amplification of cDNA Ends (RACE), More Exotic PCR Techniques, PCR using mRNA Templates, Nested PCR, Inverse PCR, RAPD and Several other Acronyms, Applications of PCR, PCR Cloning Strategies, Analysis of Recombinant Clones and Rare Events, Diagnostic Applications.

10. SEQUENCING AND SEQUENCE ANALYSIS

Sequencing, Sequence Analysis, Further Manipulation of the DNA Sequence, Is the sequence complete? What does it encode? What does this protein potentially look like? Some Useful Internet Addresses.

11. BACTERIAL CLONING

Introducing DNA into Bacterial Cells, Maintenance of Recombinant DNA in New Hosts, Integration of Recombinant DNA, Cloning in Gram-negative Bacteria other than E. coli, Vectors Derived from the IncQ-group Plasmid RSF1010, Vectors Derived from the IncP-group Plasmids, Vectors Derived from the IncW Plasmid Sa, Vectors Derived from pBBR1, Cloning in Gram-positive Bacteria, Vectors for Cloning in Bacillus subtilis and other Low-GC Organisms, Influence of Mode of Replication: Vectors Derived from pAMβ1, Transcription and Translation, Controlled Expression in B. subtilis and other Low-GC Hosts, Secretion Vectors for Low-GC Bacteria, Vectors for Systematic Gene Inactivation, Cloning in Streptomycetes, Vectors for Streptomycetes, Homoeologous Recombination.

12. CLONING IN FUNGI

Introducing DNA into Fungi, Fate of DNA Introduced into Fungi, Plasmid Vectors for Use in Fungi, Yeast Episomal Plasmids, Yeast Replicating Plasmids, Yeast Centromere Plasmids, Yeast Artificial Chromosomes, Retrovirus-like Vectors, Choice of Vector for Cloning. Plasmid Construction by Homologous Recombination in Yeast. Expression of Cloned Genes, Overexpression of Proteins in Fungi, Specialist Vectors, Yeast Surface Display, Detecting Protein-protein Interactions, Identifying Genes Encoding Particular Cellular Activities, Determining Functions Associated with Particular Genes.

13. GENETIC ENGINEERING IN YEAST

Transformation, Procedures, Genetic Markers for Yeast Transformation, Constructing Recipient Strains with Suitable Genotypes, Yeast Cloning Vectors, Integrating Vectors, Extrachromosomally Replicating Vectors, Yeast Episomal Plasmids, Yeast Replicating Vectors, Yeast Centromere Vectors, DNA Cloning in Yeast Plasmids, DNA Cloning in Yeast Cosmids, Fusion Plasmids, Vectors for High Level Gene Expression, Industrial Applications.

14. WEED CONTROL BY GENETICALLY ENGINEERED FUNGI

Potential of Toxin-producing Pathogens for Weed Control, Precedents for Biological Control of Pests, Prospects for Genetic Engineering of Weed Pathogens, Feasibility of Cloning Genes for Toxin, Development of Cochliobolus for Gene Cloning Experiments, Mutants, DNA, Construction of Genomic Libraries, Autonomously Replicating Sequences, Vectors for Transformation, General Considerations, Vectors for Transformation Based on Correction of Auxotrophy, Vectors for Transformation Based on Drug Resistance, Transformation with Minichromosomes, Vectors for Cochliobolus Transformation, Transformation Procedure, Advantages of Microorganisms over Pathogen Produced Chemicals Alone, Assessment of Risk.

15. RECOMBINANT DNA VACCINES

Rationale, Methods, Preparation of DNA, DNA Cloning Vectors, DNA Restriction Endonucleases, DNA Ligation, Cloning Strategy, Cosmid Cloning, Identification of Recombinants, Development of an r-DNA Vaccine to Prevent ETEC Induced Diarrhea, Rationale for Development, Cloning of E. coli Pilus Genes, Subcloning.

16. TRANSGENIC ORGANISMS

Transgenic Plants, Why Transgenic Plants? Ti Plasmids as Vectors for Plant Cells, Making Transgenic Plants, Putting the Technology to Work, Transgenic Animals, Why Transgenic Animals? Producing Transgenic Animals, Applications of Transgenic Animal Technology.

17. TRANSGENIC FISHES

What is Transgenic Fish, Transgenic Technology, Genes and Transgenic Fish, Gene Manipulation, Types of Microinjection for Fish, Method of Gene Transfer, Integration, Transmission, Expression, Generalization on Expression, Application.

18. TRANSGENIC TREES

Steps in the Production of Transgenic Trees, Gene Isolation and Characterization, Insect Pest Resistance, Fungal and Bacterial Disease Resistance, Herbicide Resistance, Flower Sterility, Phytoremediation, Lignin Modification, Frost and Drought Tolerance, In Vitro Tissue Culture, Genetic Transformation Procedures, Agrobacterium-mediated DNA Delivery, Direct DNA Transfer, Environmental Aspects of Deploying Transgenic Trees, Integration of Genetic Engineering in the Tree Improvement Cycle, Transgenic Insects, Improvement of Parasites and Parasitoids, Conclusion.

19. DRUGS FROM PLANTS

The Biotic Resource, The Role of Plants in Drug Discovery and Development, Progress in Plant Drug Research, Some Significant Plant Drugs, Drugs for Heart Diseases, Local Anaesthetics, Analgesics, Antimuscarinics, Miotics, Muscle Relaxants, Bronchodilators, Antineoplastic Agents, Antiprotozoals, Other Miscellaneous Drugs from Plants, Recently Discovered Plant-based Drugs, Taxol and Camptothecins, Artemisinin, Summary and Conclusions.

20. APPLICATIONS OF RECOMBINANT DNA TECHNOLOGY

Nucleic Acid Sequences as Diagnostic Tools, Detection of Sequences at the Gross Level, Comparative Sequence Analysis: Single-Nucleotide Polymorphisms (SNPs), Variable Number Tandem Repeat (VNTR) Polymorphisms, Forensic Applications of VNTRs, Historical Genetics, New Drugs and New Therapies for Genetic Diseases, Proteins as Drugs, Transgenic Animals and Plants as Bioreactors: 'Pharming', Plants as Bioreactors, Impact of Genomics, Transgenic Animals as Models of Human Disease, Gene Transfer to Humans-Gene Therapy, Importance of SNPs, Combating Infectious Disease, Novel Routes to Vaccines, Selecting Targets for New Antimicrobial Agents, In vivo Expression Technology (IVET), Differential Fluorescence Induction, Signature-tagged Mutagenesis, Environomics, Protein Engineering, Improving Therapeutic Proteins with Single Amino Acid Changes, Improving Enzymes: Subtilisin as a Paradigm for Protein Engineering, Methods for Engineering Proteins: Rational Approach, Protein Engineering through Directed Evolution, Gene Families as Aids to Protein Engineering, Metabolic Engineering, Designed Overproduction of Phenylalanine, New Routes to Small Molecules, Combinatorial Biosynthesis, Engineering Metabolic Control over Recombinant Pathways, Metabolic Engineering in Plant Cells, Plant Breeding in the Twenty-first Century, Improving Agronomic Traits, Modification of Production Traits, Epilogue: from Genes to Genomes.

21. MEDICAL APPLICATIONS, PRESENT AND FUTURE

Vaccines, Subunit Vaccines, Live Attenuated Vaccines, Live Recombinant Vaccines, DNA Vaccines, Detection and Identification of Pathogens, Human Genetic Diseases, Identifying Disease Genes, Genetic Diagnosis, Gene Therapy.

22. BIOTECHNOLOGY: APPLICATIONS TO GENETICS

Principles of Genetics, Heredity and Its Analysis, Gene Expression and Regulation, Genetic Transactions, Principles of Biotechnology, Pathways of Biotechnological Development, Mutants, Hybridization and Fusion, Transformation, Killer Characters, Biotechnological Applications to Genetics, Genomic Banks and Libraries, Cloned Natural Genes, Crossing the Organismic Barriers, Evolution Adaptation and Concept of Functional Species, Conclusions.

23. MAPPING OF GENES

Recombination, Mechanism of Homologous Recombination, Role of Recombination in Nature, Recombination and Mapping, How are Maps Derived in the Laboratory? Mapping in Bacteria, Mapping in Yeast, Chromosome Walking, Mapping in Humans.

24. GEL ELECTROPHORESIS AND BLOTTING

Electrophoresis, Purity and Concentration of Nucleotides, Spectroscopy, Gel Electrophoresis, Analysis by Blotting, Southern Blotting, Northern Blotting.

GLOSSARY

INDEX

1 TOOLS IN GENETIC ENGINEERING

Genetic engineering can be defined as "a biological science whose aim includes control of hereditary defects by the modification or elimination of certain genes and the mass production of useful biological substances by the transplanting of genes". This is done by producing, cloning and identifying recombinant DNA molecules, and by carrying out certain modifications on DNA. A genetic engineer must therefore be able to cut and join DNA from different sources, and needs tools that make these manipulations possible. The tools are enzymes procured from different organisms. In this lesson we will discuss some important classes of enzymes that constitute these tools.

RESTRICTION ENZYMES

Restriction enzymes are enzymes that cut DNA at defined sites. They represent one of the most crucial groups of enzymes needed for the modification of DNA. These enzymes are present in bacterial cells, where they act as part of a protective mechanism called the "restriction-modification" system. In this system, the restriction enzymes hydrolyse any exogenous DNA that is introduced into the cell, but they do not act on host DNA because a methylase (a modification enzyme) modifies particular bases in the recognition sequence and prevents the restriction enzyme from cutting the DNA. Restriction enzymes are classified into three types: type I, type II and type III. Most of the enzymes used today are type II enzymes, which have the simplest mode of action. These enzymes are nucleases, and as they cut at an internal position in a DNA strand, as opposed to beginning degradation at one end, they are known as endonucleases. They are often simply called restriction enzymes, or more specifically type II restriction endonucleases.

Type II Restriction Endonucleases

The nomenclature of restriction enzymes follows several conventions. The generic and specific names of the organism from which the enzyme is taken give the first part of the name, which comprises the first letter of the generic name and the first two letters of the specific name. Thus, an enzyme from a strain of Escherichia coli is termed Eco, and so on. Further descriptors may be added, depending on the bacterial strain involved and on the presence or absence of extrachromosomal elements. One widely used enzyme from this bacterium is EcoRI.

Fig. 1.1. Binding of the restriction enzyme BamHI to the DNA helix. (a) and (b) show different views with respect to the axis of the DNA helix.

The value of restriction endonucleases lies in their specificity. Each enzyme recognizes a specific sequence of bases in the DNA; the most common recognition sequences are four, five or six base-pairs in length. With four bases in the DNA, and assuming a random distribution of bases, any particular sequence is expected to occur once in every 4^n base-pairs, where n is the length of the recognition sequence. Thus tetranucleotide sites will occur on average every 256 base-pairs, pentanucleotide sites every 1024 base-pairs, and hexanucleotide sites every 4096 base-pairs. There may be considerable deviation from these values, but generally the fragment lengths produced will be around the calculated value. An enzyme recognizing a tetranucleotide sequence (a 'four-cutter') will therefore produce shorter DNA fragments than a six-cutter.

Fig. 1.2. Types of ends generated by different restriction enzymes. The enzymes are listed in (a), with their recognition sequences and cutting sites shown in (b) and (c), respectively. (d) A schematic representation of the types of ends generated. [Panels: PstI, 5'-CTGCAG-3', 3'-protruding; EcoRI, 5'-GAATTC-3', 5'-protruding; HaeIII, 5'-GGCC-3', blunt.]
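The naming convention and the 4^n estimate can be illustrated with a short sketch. The function names are hypothetical and chosen only for this example; the calculation assumes, as the text does, four equally frequent and randomly distributed bases.

```python
def enzyme_prefix(genus: str, species: str) -> str:
    """First letter of the generic name plus first two letters of the specific name."""
    return genus[0].upper() + species[:2].lower()


def expected_site_spacing(site_length: int, bases: int = 4) -> int:
    """Average number of base-pairs between sites, assuming random base distribution."""
    return bases ** site_length


def expected_site_count(dna_length: int, site_length: int) -> float:
    """Rough expected number of recognition sites in a DNA of the given length."""
    return dna_length / expected_site_spacing(site_length)


if __name__ == "__main__":
    print(enzyme_prefix("Escherichia", "coli"))
    for n in (4, 5, 6):
        print(n, expected_site_spacing(n))
```

Real genomes are not random, so these figures are only a guide to the fragment sizes to expect.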

Uses of Restriction Endonucleases

Restriction enzymes are very simple to use: a suitable amount of enzyme is added to the target DNA in a buffer solution, and the reaction is incubated at 37°C. Enzyme activity is expressed in units, with one unit being the amount of enzyme that will cleave one microgram of DNA in one hour at 37°C. Although most experiments need complete digestion of the target DNA, there are some instances where various combinations of enzyme concentration and incubation time may be used to achieve only partial digestion.

The recognition sequence, and the position of the cutting site within it, determine the type of DNA fragment formed by a particular enzyme. Fragment length depends on how frequently the recognition sequence occurs. The actual cutting site determines the type of ends that the cut fragment has, which is important for further manipulation of the DNA. Three types of fragments may be produced: (i) blunt or flush-ended fragments, (ii) fragments with protruding 3' ends, and (iii) fragments with protruding 5' ends. Enzymes such as EcoRI produce DNA fragments with cohesive or 'sticky' ends, as the protruding sequences can base-pair with complementary sequences produced by the same enzyme. Thus, recombinant DNA can be produced by cutting two different DNA samples (for example, insert DNA and vector DNA) with the same enzyme and mixing the fragments together. This is one of the most useful applications of restriction enzymes, and is a crucial part of many manipulations in genetic engineering.

Fig. 1.3. Generation of recombinant DNA.
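As a rough illustration of a complete digest, the sketch below cuts the top strand of a linear sequence at every EcoRI site (5'-GAATTC-3', cut after the G). The function names and the test sequence are invented for the example, and the unit calculation assumes an idealized linear relationship between enzyme amount, DNA amount and time.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut a linear top strand after `cut_offset` bases of every occurrence of `site`."""
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset          # position of the cut within the strand
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # remainder after the last cut
    return fragments


def units_required(dna_micrograms: float, hours: float = 1.0) -> float:
    """One unit cleaves 1 microgram in 1 hour at 37 C (idealized, linear assumption)."""
    return dna_micrograms / hours


if __name__ == "__main__":
    # Hypothetical 18-base sequence containing two EcoRI sites
    print(digest("AAGAATTCGGGAATTCTT"))
    print(units_required(5, hours=2))
```

Each internal fragment begins with AATTC on the top strand, which reflects the four-base 5' protruding (sticky) end left by EcoRI.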


Restriction Mapping

Most DNA molecules have recognition sites for several different restriction enzymes, and it is often useful to know the locations of some of these sites. The technique used to obtain this information is known as restriction mapping. It involves cutting a DNA fragment with a selection of restriction enzymes, singly and in different combinations. The sizes of the resulting fragments are determined on an agarose gel, and the relative locations of the cutting sites are worked out from these data.
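The reasoning behind restriction mapping can be sketched as a small search: try every ordering of the fragments from each single digest, and keep the arrangements whose combined cut sites reproduce the double-digest fragment sizes. This is a minimal brute-force illustration for a linear molecule and two enzymes; the function names and the fragment sizes are hypothetical.

```python
from itertools import permutations


def cut_positions(ordered_fragments):
    """Cut positions implied by fragment sizes laid end to end along a linear DNA."""
    positions, total = [], 0
    for size in ordered_fragments[:-1]:
        total += size
        positions.append(total)
    return positions


def map_sites(a_sizes, b_sizes, double_sizes):
    """Return every (enzyme A cuts, enzyme B cuts) consistent with the double digest."""
    length = sum(a_sizes)
    target = sorted(double_sizes)
    solutions = set()
    for a_order in set(permutations(a_sizes)):
        for b_order in set(permutations(b_sizes)):
            a_cuts = cut_positions(a_order)
            b_cuts = cut_positions(b_order)
            edges = [0] + sorted(set(a_cuts) | set(b_cuts)) + [length]
            fragments = sorted(right - left for left, right in zip(edges, edges[1:]))
            if fragments == target:
                solutions.add((tuple(a_cuts), tuple(b_cuts)))
    return solutions


if __name__ == "__main__":
    # Hypothetical 10 kb fragment: enzyme A gives 3 + 7, enzyme B gives 8 + 2,
    # and the double digest gives 3 + 5 + 2 (sizes in kb).
    print(map_sites([3, 7], [8, 2], [3, 5, 2]))
```

For this example the search returns two maps that are mirror images of each other, which is expected because gel data alone cannot tell which end of the fragment is which.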

DNA MODIFYING ENZYMES

Restriction enzymes and DNA ligase provide the cutting and joining functions that are essential for the production of recombinant DNA molecules. Other enzymes used in genetic engineering may be loosely termed DNA modifying enzymes, which are used for the degradation, synthesis and alteration of DNA. Some of the commonly used enzymes are:

Nucleases

Nuclease enzymes degrade nucleic acids by cutting the phosphodiester bonds that hold the nucleotides together. Restriction enzymes are good examples of endonucleases, which cut within a DNA strand. A second group of nucleases, which degrade DNA from the termini of the molecule, are known as exonucleases. Apart from restriction enzymes, four nucleases are widely used in genetic engineering. These are Bal 31 and exonuclease III (exonucleases), and deoxyribonuclease I (DNase I) and S1-nuclease (endonucleases). These enzymes differ in their precise mode of action, and provide the genetic engineer with a variety of tools for manipulating DNA. There are also ribonucleases, which act on RNA. These may be needed at many of the stages in the preparation and analysis of recombinants.

Fig. 1.4. Mode of action of various nucleases: (a) Bal 31, (b) Exo III, (c) DNase I, (d) S1.

Polymerases

Polymerase enzymes are very useful in genetic engineering. They make copies of nucleic acid molecules. When describing a polymerase enzyme, the terms 'DNA-dependent' or 'RNA-dependent' may be used to indicate the type of nucleic acid template that the enzyme uses. So, a DNA-dependent DNA polymerase copies DNA into DNA, an RNA-dependent DNA polymerase copies RNA into DNA, and a DNA-dependent RNA polymerase transcribes DNA into RNA. These enzymes synthesize nucleic acids by joining together nucleotides whose bases are complementary to the bases of the template strand. Synthesis proceeds in the 5'→3' direction, as each nucleotide addition requires a free 3'-OH group for the formation of the phosphodiester bond. This requirement also means that a short double-stranded region with an exposed 3'-OH is necessary for synthesis to begin.
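The three kinds of template copying described above can be sketched with simple base-pairing tables. This is only an illustration of complementarity and strand direction, not of enzyme mechanism; the names are hypothetical, and the primer requirement is ignored.

```python
# Base-pairing rules: template base -> base added to the new strand
DNA_FROM_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}   # DNA-dependent DNA polymerase
RNA_FROM_DNA = {"A": "U", "T": "A", "G": "C", "C": "G"}   # DNA-dependent RNA polymerase
DNA_FROM_RNA = {"A": "T", "U": "A", "G": "C", "C": "G"}   # RNA-dependent DNA polymerase


def copy_template(template_5_to_3: str, pairing: dict) -> str:
    """Return the new strand written 5'->3'.

    The new strand grows 5'->3', so the template is read from its 3' end,
    which is why the template string is traversed in reverse.
    """
    return "".join(pairing[base] for base in reversed(template_5_to_3))


if __name__ == "__main__":
    print(copy_template("ATGC", DNA_FROM_DNA))  # DNA copy of a DNA template
    print(copy_template("ATGC", RNA_FROM_DNA))  # RNA transcript of a DNA template
    print(copy_template("AUGC", DNA_FROM_RNA))  # DNA copy of an RNA template
```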


The enzyme DNA polymerase I has, in addition to its polymerase function, 5'→3' and 3'→5' exonuclease activities. The enzyme catalyses a strand replacement reaction, in which the 5'→3' exonuclease function degrades the non-template strand as the polymerase forms a new copy. A major use of this enzyme is in the nick translation procedure for radiolabelling DNA. The 5'→3' exonuclease function of DNA polymerase I can be removed by splitting the enzyme to produce what is called the 'Klenow fragment', which retains the polymerase and 3'→5' exonuclease activities. The Klenow fragment is used where a single-stranded DNA molecule needs to be copied; because the 5'→3' exonuclease function is missing, the enzyme cannot degrade the non-template strand of dsDNA during formation of the new DNA. The 3'→5' exonuclease activity is suppressed under the conditions generally used for the reaction. Major uses for the Klenow fragment include radiolabelling by primed synthesis and DNA sequencing by the dideoxy method, in addition to the copying of single-stranded DNAs during the formation of recombinants.

Reverse transcriptase (RTase) is an RNA-dependent DNA polymerase, and so synthesizes a DNA strand from an RNA template. The enzyme is used mainly for copying mRNA molecules in the synthesis of cDNA for cloning, although it will also act on DNA templates.

Enzymes that Modify the Ends of DNA Molecule

The enzymes alkaline phosphatase, polynucleotide kinase and terminal transferase act on the termini of DNA molecules, and provide crucial functions that are used in different ways. As their names suggest, the phosphatase and kinase enzymes are involved in the removal or addition of phosphate groups. Bacterial alkaline phosphatase (BAP), like the similar enzyme calf-intestinal alkaline phosphatase (CIP), removes phosphate groups from the 5' ends of DNA, leaving a 5'-OH group. The enzyme is used to avoid unwanted ligation of DNA molecules. It is also used before the addition of radioactive phosphate to the 5' ends of DNAs by polynucleotide kinase. Terminal transferase (terminal deoxynucleotidyl transferase) repeatedly adds nucleotides to any available 3' terminus. Although it works best on protruding 3' ends, conditions can be adjusted so that blunt-ended or 3'-recessed molecules may be used. The enzyme is mainly used to add homopolymer tails to DNA molecules prior to the construction of recombinants.
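Why phosphatase treatment prevents unwanted ligation can be shown with a toy model of DNA ends. It assumes the standard requirement that a phosphodiester bond can only be formed between a 5'-phosphate and an adjacent 3'-OH; the class and function names are hypothetical and used only for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DnaEnd:
    """One end of a duplex: state of its 5' and 3' termini."""
    five_prime_phosphate: bool = True   # ends cut by a restriction enzyme carry a 5'-phosphate
    three_prime_hydroxyl: bool = True


def phosphatase(end: DnaEnd) -> DnaEnd:
    """Alkaline phosphatase removes the 5'-phosphate, leaving a 5'-OH."""
    return DnaEnd(False, end.three_prime_hydroxyl)


def kinase(end: DnaEnd) -> DnaEnd:
    """Polynucleotide kinase adds a phosphate to the 5' end."""
    return DnaEnd(True, end.three_prime_hydroxyl)


def bonds_formed(end_a: DnaEnd, end_b: DnaEnd) -> int:
    """Number of strands (0, 1 or 2) that can be sealed where two duplex ends meet."""
    return (int(end_a.five_prime_phosphate and end_b.three_prime_hydroxyl)
            + int(end_b.five_prime_phosphate and end_a.three_prime_hydroxyl))


if __name__ == "__main__":
    vector_end = phosphatase(DnaEnd())   # dephosphorylated vector
    insert_end = DnaEnd()                # untreated insert
    print(bonds_formed(vector_end, vector_end))  # vector closing on itself
    print(bonds_formed(vector_end, insert_end))  # vector joined to insert
```

In this model a dephosphorylated vector cannot close on itself, but it can still be joined to an untreated insert through one strand at each junction.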

DNA LIGASE

DNA ligase is a very important cellular enzyme. Its function is to mend broken phosphodiester bonds that may occur randomly or as a result of DNA replication or recombination. In genetic engineering, it is used to repair discontinuities in the sugar-phosphate chains that are produced when recombinant DNA is made by joining DNA molecules from different sources. It therefore acts as a molecular glue, sticking pieces of DNA together. This function is crucial to the success of many experiments, which makes DNA ligase a key enzyme in genetic engineering. The enzyme used most often in experiments is T4 DNA ligase, which is purified from E. coli cells infected with bacteriophage T4. The enzyme performs best at 37°C, but is used at a much lower temperature (4-15°C) to prevent thermal denaturation of the short base-paired regions that hold the cohesive ends of DNA molecules together.
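Whether two cohesive ends can base-pair before ligase seals them can be checked by comparing one overhang with the reverse complement of the other. The sketch below assumes both overhangs are of the same type (both 5' protruding or both 3' protruding) and written 5'→3'; the function names are hypothetical, and the overhangs AATT (EcoRI) and GATC (BamHI) are used as examples.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse so the result reads 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def ends_compatible(overhang_a: str, overhang_b: str) -> bool:
    """True if the two single-stranded overhangs can base-pair along their full length."""
    return overhang_a == reverse_complement(overhang_b)


if __name__ == "__main__":
    print(ends_compatible("AATT", "AATT"))  # two EcoRI ends
    print(ends_compatible("AATT", "GATC"))  # EcoRI end with a BamHI end
```

Ends produced by the same enzyme pair with each other because the recognition sequence is palindromic.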

2 ISOLATION OF NUCLEIC ACIDS

In genetic engineering, every gene manipulation experiment needs a source of nucleic acid, in the form of DNA or RNA. Reliable methods are available for isolating these compounds from cells. The whole process has three basic requirements:
(a) Opening the cells in the sample so as to expose the nucleic acids for further processing,
(b) Separation of the nucleic acid from other cell components, and
(c) Recovery of the nucleic acid in purified form.
Procedures range from simple ones to more complex purifications involving several different stages, for which a variety of techniques are used.

LYSING OF CELLS

Breaking up Cells and Tissues

There are many methods for the purification of nucleic acids, but they share a number of basic features. First of all we need the starting material. This could be a culture of bacterial or eukaryotic cells, which need to be separated from the growth medium, or a more complex tissue sample, which must first be homogenized so that the individual cells can be lysed. The material should be freshly harvested, or kept frozen until ready to use, to avoid degradation by enzymes present in the cell extract.

Fig. 2.1. Preparation of mRNA by affinity chromatography using oligo(dT)-cellulose. (a) Total RNA in solution is passed through the column in a high-salt buffer, and the oligo(dT) tracts bind the poly(A) tails of the mRNA. (b) Residual RNA is washed away with high-salt buffer, and (c) the mRNA is eluted by washing with a low-salt buffer. The mRNA is then precipitated under ethanol and collected by centrifugation (d).

To release their components, the cells must be lysed. The nature of the treatment differs according to the type of cell. Bacterial cell walls need to be broken down before the cell contents can be released. This is generally accomplished by using lysozyme, often in combination with EDTA and a detergent such as SDS (sodium dodecyl sulphate). Lysozyme is an enzyme that occurs naturally in egg white and breaks down bacterial cell walls. EDTA chelates divalent cations and thus destabilizes the outer membrane in bacteria like E. coli. It also inhibits DNase, which would otherwise tend to degrade the DNA, while the detergent solubilizes the membrane lipids. In plants and fungi, the cell wall differs from that of bacteria and needs different treatments, either enzymatic or mechanical. One technique uses SDS with phenol to denature and dissolve macerated hyphae while leaving the DNA intact. Since animal cells do not have a cell wall, they can generally be lysed by gentler treatment with a mild detergent.

Breakdown of the cell walls and plasma membrane releases a complex solution of DNA, RNA, proteins, lipids and carbohydrates. Sudden lysis of the cell generally results in some breakage of chromosomal DNA. In particular, the bacterial chromosome, which is usually circular in its native state, will be broken into linear fragments. Gentler lysis conditions are essential for obtaining very large chromosomal DNA. Bacterial plasmids, however, are readily obtained in their original circular state under standard lysis conditions.

Enzyme Treatment

RNA can easily be removed from a DNA preparation by treatment with RNase. Because RNase is a very heat-stable enzyme, it can be freed of traces of deoxyribonuclease (DNase), which would otherwise degrade the DNA, simply by heating the enzyme before use. Removal of DNA from RNA preparations used to be less easy, since it requires DNase without any RNase activity. However, it is now possible to buy RNase-free DNase as well as DNase-free RNase. Protein contamination can be removed by digestion with a proteolytic enzyme such as proteinase K. These treatments are used as required in different nucleic acid purification methods. They are left out of some methods, either because the contamination is not significant for a specific purpose or because there are other ways of removing the contaminants, as described below.

Phenol-chloroform Extraction

A cell contains a number of enzymes that will degrade nucleic acids, and other proteins that will interfere with subsequent procedures by binding to the nucleic acids, so the removal of proteins is very important. Extraction with liquefied phenol, or preferably a mixture of phenol and chloroform, is the classical method of removing proteins. Phenol and chloroform are largely immiscible with water, so two layers form when they are added to the cell extract. When the mixture is vigorously agitated, the proteins are denatured and precipitate at the interface. The nucleic acids remain in the aqueous phase when the phenol used has been equilibrated with a neutral or alkaline buffer.

Fig. 2.2. Phenol extraction. [Panels: at neutral pH, aqueous layer (DNA and RNA), protein precipitate, phenol layer; at acidic pH, aqueous layer (RNA), protein precipitate, phenol layer (DNA).]

If, on the other hand, extraction is carried out with acidic phenol, DNA will partition into the organic phase, which is discarded. This allows RNA to be recovered from the aqueous phase. Phenol is naturally acidic, so equilibration with water, or the use of an acidic buffer, will produce the required conditions. Phenol extraction is also useful in later stages of manipulation, when it is necessary to make sure that all traces of an enzyme have been removed before going on to the next step.

Alcohol Precipitation

Phenol extraction yields a protein-free sample of nucleic acid. However, the sample still contains traces of phenol and chloroform, and it is also relatively dilute. Phenol in particular has a significant solubility in water, and can therefore denature enzymes in subsequent steps. The solution is concentrated, and these traces removed, by precipitating the nucleic acid. This is done by adding an alcohol, either isopropanol or ethanol, in the presence of monovalent cations (Na+, K+ or NH4+). The nucleic acid is then deposited at the bottom of the tube by centrifugation. Some of the salt will precipitate as well, and is removed by washing with 70% ethanol. The details of the alcohol precipitation procedure vary considerably with the nature of the nucleic acid.

Fig. 2.3. Ethanol precipitation. (Steps shown: add ethanol and salt solution to the DNA; centrifuge and remove the liquid, leaving the DNA precipitate.)

Gradient Centrifugation

Centrifugation at moderate speeds is used throughout DNA purification to separate particulate matter from a solution, whether to remove cell debris or to recover precipitated nucleic acids. It is also frequently used in column purification methods. Apart from these uses, there is an older method for separating nucleic acids based on ultracentrifugation in gradients, which has been used mainly to prepare plasmid DNA (pDNA). It involves caesium salt gradients, generally with the addition of ethidium bromide, for the separation of plasmid DNA from bacterial genomic DNA, or of RNA from DNA. In addition, sucrose gradients can be used for size-selection of large DNA fragments when constructing a genomic library. In these systems, DNA molecules are separated because of differences in size and/or conformation.

Alkaline Denaturation

In bacterial cell extracts, plasmids can be separated from chromosomal DNA by a process of alkaline denaturation. Because of the fragmentation that occurs during lysis of the cell, chromosomal DNA is present in a bacterial cell extract as linear fragments. Plasmids, being more robust, are not disrupted by cell lysis and remain intact as supercoiled circular DNA. When the pH is raised to about 12, the hydrogen bonds are disrupted and the two strands of each linear fragment separate. The hydrogen bonds of the plasmid are disrupted as well, but the two circular strands cannot separate physically and remain interlinked. When the pH is lowered again, the interlinked plasmid strands snap back together to re-form the double-stranded plasmid. The separated linear chromosomal fragments cannot do this: instead they aggregate into an insoluble network that can be removed by centrifugation, leaving the plasmids in solution.
This procedure also removes other cell components, including cell wall debris and many proteins, so phenol extraction may not be required. The plasmid preparation obtained is pure enough for many purposes, including restriction digestion. Where further purification is required, it can be achieved by using affinity matrices.

Fig. 2.4. Alkaline denaturation procedure for plasmid purification. (Stages shown: chromosomal DNA as linear fragments and plasmid as covalently closed circular DNA; denatured plasmid DNA strands remain interlocked while linear chromosomal fragments dissociate; interlocked plasmid strands snap together while chromosomal fragments aggregate.)

Column Purification

Two types of column purification are used for nucleic acids. Both also make use of centrifugation.

(i) Size-selection chromatography. In this method, the sample is passed through a column of small porous beads. Smaller molecules, such as salts and unincorporated nucleotides, enter the beads, while larger molecules, such as longer nucleic acid chains, pass straight through the column. This type of purification is a fast and simple alternative to alcohol precipitation.

(ii) Affinity chromatography purification. Here the macromolecules present in the sample bind to the resin in the column. This could be an anionic resin, which binds to the negatively charged phosphate groups in the nucleic acid backbone, or a more sophisticated one such as a resin coated with oligo-dT sequences, which bind specifically to the poly-A tails of eukaryotic mRNA molecules. In both cases, undesirable molecules can be washed from the column. The stringency conditions are then changed and the bound nucleic acids are eluted in a small volume of water or buffer.

DETECTION AND QUANTIFICATION OF NUCLEIC ACIDS

The concentration of DNA in a reasonably pure preparation is estimated by measuring the absorbance of the solution in a spectrophotometer at 260 nm. This is convenient but not very sensitive: a solution of 50 µg/ml of double-stranded DNA has an absorbance of 1. The estimate is also affected by the presence of proteins or phenol. Dyes such as ethidium bromide are commonly used for detecting and quantitating nucleic acids. Ethidium bromide has a flat ring structure which is able to stack in between the bases in nucleic acids. This is called intercalation. The dye is detected by its fluorescence in the red-orange region of the spectrum when exposed to UV radiation. This is the most widely used method for staining electrophoresis gels. It can also be used for estimating the amount of DNA or RNA in a sample, by comparing the intensity of the fluorescence with that of a sample of known concentration on the same gel. Ethidium bromide should be handled with great care to minimize health risks, since it is a well-known mutagen.

GEL ELECTROPHORESIS

Gel electrophoresis is an important technique used for the purification of nucleic acids. When a charged molecule is placed in an electric field, it will move towards the electrode with the opposite charge; nucleic acid molecules, being negatively charged, move towards the positive pole (anode). A gel consists of a complex network of pores, and the rate at which a nucleic acid molecule moves through it is determined by its ability to penetrate this network. For linear fragments of double-stranded DNA within a certain size range, this reflects the size of the molecule, i.e., the length of the DNA. The effective size range of a gel is determined by its composition. Agarose gels are used for separating nucleic acid molecules greater than a few hundred base pairs, with the agarose concentration reduced to obtain separation of longer fragments or increased for small fragments. Polyacrylamide gels are used for smaller molecules, down to only a few base pairs.

Fig. 2.5. A typical system used for agarose gel electrophoresis. (Labelled parts: gel, buffer, platinum electrodes, leads to power supply.)

Analytical Gel Electrophoresis

Fig. 2.6. Analytical gel electrophoresis; using the standard marker to provide a calibration curve, the size of fragment A is estimated as 2.5 kb and fragment B as 6.0 kb. (Gel lanes M, A and B, with the position of the wells marked and marker bands labelled 23.1, 9.6, 6.6, 4.4, 2.3, 2.0 and 0.56; the distance of each band from the well is measured and plotted on the calibration graph against the distance from the well.)

Agarose gel electrophoresis is used for analysing the composition and quality of a nucleic acid sample. It is particularly valuable for determining the size of the DNA fragments from a restriction digest or the products of a PCR reaction. This is made possible by calibrating the gel, which is done by running a standard marker containing fragments of known sizes. A HindIII digest of DNA from the lambda bacteriophage provides such a standard marker. Over much of the range there is a linear relationship between the logarithm of the fragment size and the distance it has moved, so the size of unknown fragments of double-stranded DNA can be worked out from this calibration graph. For larger fragments the graph becomes non-linear, because larger DNA molecules move through the gel in a different way. DNA molecules are very long and thin, so they can in effect slide through the gel end-on. It takes some time for them to become lined up, but once they are, the rate at which they move does not depend on their size. Thus, for a particular gel, all molecules above a certain size have virtually the same mobility. DNA molecules larger than about 20 kb are therefore not separated from one another in a standard gel, although the use of gels with a lower agarose concentration extends the size range. Special techniques are available for separating very large DNA molecules; these involve periodic switching of the direction of the electric field. If a more precise confirmation of the nature of the sample is required, the gel is blotted onto a membrane support and hybridized with a nucleic acid probe.

Polyacrylamide gel electrophoresis offers a much more precise size-separation of nucleic acid molecules, down to the resolution of fragments that differ in size by a single base. It is used in methods such as primer extension analysis and gel retardation assays, and is the basis of DNA-sequencing methods.

Most DNA molecules analysed in this way are linear; plasmids, however, are circular. Native plasmids are supercoiled circular molecules, but if one of the strands is broken (i.e. nicked), the loose ends are free to rotate and the molecule relaxes into an open circular form. A double-strand break produces a linearized plasmid. Although all three forms are the same size, having the same number of base pairs, they move differently in a gel, with the open circular form moving more slowly than either the linear or the supercoiled DNA.

Fig. 2.7. Electrophoretic mobility of forms of plasmid DNA. (Nicked, open circular plasmids, formed by single-strand breaks: slower than supercoiled or linear DNA of the same mass. Linear molecules, formed by double-strand breaks, and supercoiled plasmids, the native form of covalently closed circles: size dependent, with different mobility.)
It is more difficult to predict the relative mobility of the latter two forms, which depends on the size of the plasmid and the electrophoretic conditions. It is therefore quite normal for a purified plasmid preparation to show two or three bands in a gel, and this does not necessarily mean that more than one plasmid is present. Most DNA molecules are double-stranded, but single-stranded DNA also occurs. Single-stranded nucleic acids tend to fold up into complex secondary structures, which shield the hydrophobic bases from the aqueous environment. The way in which a molecule folds greatly influences its migration, so a true indication of its size is obtained only in the unfolded state. For this purpose, denaturing gels containing agents such as urea or formaldehyde are used.
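The calibration described for analytical gels, in which the logarithm of fragment size is plotted against the distance migrated, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the marker sizes are bands of the lambda HindIII digest, but the migration distances are hypothetical values invented for the example, and only marker bands within the roughly linear range are used.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Marker fragment sizes in kb (lambda HindIII bands in the linear range).
marker_kb = [6.6, 4.4, 2.3, 2.0]
# Hypothetical migration distances in mm, measured from the well.
marker_mm = [30.0, 38.0, 52.0, 55.0]

# Calibration: log10(size) is treated as a linear function of distance.
slope, intercept = fit_line(marker_mm, [math.log10(s) for s in marker_kb])

def estimate_size_kb(distance_mm):
    """Estimate the size of an unknown linear fragment from its distance."""
    return 10 ** (slope * distance_mm + intercept)

if __name__ == "__main__":
    print(round(estimate_size_kb(45.0), 2), "kb")
```

The fit should only be used within the range covered by the markers; as noted above, the relationship breaks down for large fragments, and it does not apply to supercoiled or open circular plasmid DNA.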


Preparative Gel Electrophoresis

Gel electrophoresis also plays an important role in the purification of specific nucleic acid fragments from a complex mixture. In this case, recovery of the DNA fragment is made much easier by using a low-melting-point preparation of agarose. After the sample has been separated, it is visualized with ethidium bromide. With the gel still on the transilluminator, and using suitable equipment for protection from UV radiation, the bands to be purified are cut out with a razor blade or scalpel. The DNA can then be recovered from the gel fragment and purified by standard DNA purification procedures.

RADIOLABELLING OF NUCLEIC ACID

A major problem in nearly all cloning procedures is that of keeping track of the small amounts of nucleic acid involved. This problem becomes worse at each stage of the process, because losses mean that the amount of material usually decreases after each step. One way of tracing the material is to label the nucleic acid with a radioactive molecule. A portion of each reaction can then be counted in a scintillation counter to determine the amount of nucleic acid present.


A second application of radiolabelling is the production of highly radioactive nucleic acid molecules for use in hybridization experiments. Such molecules are called radioactive probes, and have a variety of uses. The difference between labelling for tracing purposes and labelling for probes is largely one of specific activity, i.e., the measure of how radioactive the molecule is. A low specific activity will suffice for tracing purposes, while a high specific activity is essential for probes. In probe preparation the radioactive label is usually the high-energy β-emitter 32P. There are several methods of labelling nucleic acid molecules, some of which are given below.

End Labelling

In this technique, the enzyme polynucleotide kinase transfers the terminal phosphate group of ATP onto the 5'-hydroxyl termini of nucleic acid molecules. If the ATP donor is radioactively labelled, this produces a labelled nucleic acid of relatively low specific activity, as only the termini of each molecule become radioactive.

Fig. 2.8. End labelling DNA using polynucleotide kinase (PNK). (a) DNA is dephosphorylated using phosphatase, to generate 5'-OH groups. (b) The terminal phosphate of [γ-32P]ATP (solid circle) is then transferred to the 5' terminus by PNK. The reaction can also occur as an exchange reaction with 5'-phosphate termini.

Nick Translation

This method depends on the ability of the enzyme DNA polymerase I to translate (move along the DNA) a nick created in the phosphodiester backbone of the DNA double helix. Nicks may occur naturally or may be introduced by a low concentration of the nuclease DNase I in the reaction mixture. DNA polymerase I catalyses a strand-replacement reaction which incorporates new dNTPs into the DNA chain. If one of the dNTPs supplied is radioactive, the result is a highly labelled DNA molecule.

Fig. 2.9. Labelling DNA by nick translation. (a) A single-strand nick is introduced into the phosphodiester backbone of a DNA fragment using DNase I. (b) DNA polymerase I then synthesizes a copy of the template strand, degrading the non-template strand with its 5' to 3' exonuclease activity. If [α-32P]dNTP is supplied this will be incorporated into the newly synthesized strand (filled circles).

NUCLEIC ACID HYBRIDIZATION

Nucleic acid hybridization is an extremely sensitive detection method, capable of picking out specific DNA sequences from complex mixtures, in addition to providing information about sequence complexity. Generally a single pure sequence is labelled with 32P and used as a probe. Before use, the probe is denatured so that the strands are free to base-pair with their complements. The DNA to be probed is also denatured and is generally fixed to a supporting membrane made from nitrocellulose or nylon. Hybridization is carried out in a sealed plastic bag or tube at 65-68°C for several hours to allow the formation of duplexes. The excess probe is then washed off, and the degree of hybridization is monitored by counting the sample in a scintillation spectrometer or by preparing an autoradiogram, for which the sample is exposed to X-ray film. Nucleic acid hybridization is also used for identifying cloned DNA fragments.

3 CUTTING AND JOINING DNA MOLECULES

CUTTING DNA MOLECULES (DNA MOLECULE CLEAVING)

During the early 1960s, no procedures were available for cutting DNA at specific points. The methods then available for fragmenting DNA were not very specific. The known endonucleases had little site specificity, and chemical methods produced very small fragments of DNA. Mechanical shearing was the most common method of cutting DNA. Duplex DNA molecules are long, thin threads which are quite rigid, and so are easily broken by shearing forces in solution. Intense sonication with ultrasound can reduce the length of DNA to about 300 nucleotide pairs. More controlled shearing can be achieved by high-speed stirring in a blender. Breakage occurs essentially at random with respect to DNA sequence. The termini consist of short single-stranded regions, which may have to be taken into account in subsequent joining procedures.

Some years later, phage biologists explained the biochemical basis of the phenomenon of host restriction and modification. The restriction endonuclease of E. coli K12 was isolated by Meselson and Yuan in 1968. This endonuclease cuts unmodified DNA into large specific fragments, and it was reasoned that it must recognize a target sequence. This was the beginning of the search for controlled manipulation of DNA. Unfortunately, the K12 endonuclease turned out to be perverse in its properties: although the enzyme binds to a defined recognition sequence, cleavage occurs at a 'random' site several kilobases away. The much-needed breakthrough finally came in 1970 with the discovery in Haemophilus influenzae of an enzyme that behaves more simply. This enzyme recognizes a particular target sequence in a duplex DNA molecule and breaks the polynucleotide chain within that sequence, giving rise to specific fragments of defined length and sequence.
Restriction Endonucleases

The enzymes which are used to cleave DNA molecules at specific places are called "restriction endonucleases". They take their name from the phenomenon of host-controlled restriction and modification. This can be observed when a bacteriophage preparation grown on one bacterial strain is used to infect a different strain. For example, phage grown on E. coli strain C will infect E. coli strain K very inefficiently: the growth of the phage obtained from E. coli C is restricted by E. coli K. The reason for this restriction of phage growth is that E. coli K produces an endonuclease, which is therefore called a restriction endonuclease. It cuts the incoming phage DNA into pieces, so that the DNA is rapidly broken down and only occasionally escapes to produce phage progeny. The host DNA must be protected against the action of the endonuclease, and this is achieved by a second enzyme that modifies the host DNA by methylation; DNA that is methylated at the recognition sequence is not attacked by the endonuclease.


Restriction and Modification (R-M) System Classification

Four different kinds of R-M systems are known: type I, type II, type III and type IIs.

(a) Type I: One enzyme with different subunits for recognition, cleavage and methylation. It recognizes and methylates a single sequence but cuts DNA up to 1000 bp away.
(b) Type II: Two different enzymes which recognize the same target sequence, which is symmetrical. One enzyme cuts and the other modifies the recognition sequence.
(c) Type III: One enzyme with two different subunits, one for recognition and modification and one for cutting. It recognizes and methylates the same sequence but cleaves 24-26 bp away.
(d) Type IIs: Two different enzymes, but the recognition sequence is asymmetric. Cleavage occurs on one side of the recognition sequence, up to 20 bp away.

Fig. 3.1. Bacteriophage restriction. (Chromosomal DNA is protected by methylation; the restriction endonuclease attacks incoming DNA, giving degraded phage DNA.)

The type I systems were the first to be characterized. The best-known example is that of E. coli K12. The active enzyme consists of two restriction subunits, two modification (methylation) subunits and one recognition subunit. These subunits are the products of the hsdR, hsdM and hsdS genes, respectively. The methylation and cutting reactions both require ATP and S-adenosylmethionine as co-factors. The recognition sequences are quite long, with no recognizable features such as symmetry. The enzyme cuts unmodified DNA at some distance from the recognition sequence. However, because the methylation reaction is performed by the same enzyme that mediates cleavage, the target DNA may be modified before it is cut. These characteristics mean that type I systems are of little value for gene manipulation. However, their presence in E. coli strains can affect the recovery of recombinants. Type III enzymes have asymmetrical recognition sequences but otherwise resemble type I systems, and are likewise of little value. Most of the useful R-M systems are of type II.
They have several advantages over type I and III systems. First, restriction and modification are performed by separate enzymes, so it is possible to cleave DNA without modification. Secondly, the restriction activities do not require co-factors such as ATP or S-adenosylmethionine, which makes them easier to use. Most important of all, type II enzymes recognize a defined, usually symmetrical, sequence and cut within it. Most of them also make a staggered break in the DNA. In type IIs systems, the co-factors and macromolecular structure are similar to those of type II systems.

Specificity

A very large number of type II restriction endonucleases have been characterized and are used in the manipulation of DNA. They are identified by the name of the organism from which they are obtained, using the first letter of the genus and the first two letters of the species name, together with a suffix indicating the specific enzyme from that species. Thus, PstI indicates a specific enzyme obtained from the bacterium Providencia stuartii, and HaeI, HaeII and HaeIII indicate three different enzymes, with different specificities, from Haemophilus aegyptius. The convention is to write the first part in italics, just as is done with the species name.

Nomenclature

The discovery of a large number of restriction and modification systems called for a uniform system of nomenclature. Smith and Nathans (1973) proposed a suitable system, a simplified version of which is used today. Its important features are as follows:

(a) The species name of the host organism is identified by the first letter of the genus name and the first two letters of the specific epithet, to generate a three-letter abbreviation. This abbreviation is always written in italics.
(b) Where a particular strain has been the source, this is identified.
(c) When a particular host strain has several different R-M systems, these are identified by roman numerals.

Table 3.1. Examples of restriction endonuclease nomenclature.

Enzyme    Enzyme source                                        Recognition sequence
SmaI      Serratia marcescens, 1st enzyme                      CCCGGG
HaeIII    Haemophilus aegyptius, 3rd enzyme                    GGCC
HindII    Haemophilus influenzae, strain d, 2nd enzyme         GTPyPuAC
HindIII   Haemophilus influenzae, strain d, 3rd enzyme         AAGCTT
BamHI     Bacillus amyloliquefaciens, strain H, 1st enzyme     GGATCC
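The naming rules listed above are mechanical enough to be sketched as a small function. The following is a minimal illustration only; the function names and argument order are hypothetical, and plain text cannot show the italics that the convention requires for the three-letter abbreviation.

```python
def to_roman(n):
    """Convert a small positive integer (1-39) to a roman numeral."""
    numerals = [(10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = ""
    for value, symbol in numerals:
        while n >= value:
            out += symbol
            n -= value
    return out

def enzyme_name(genus, species, number, strain=""):
    """Build an enzyme name from genus, specific epithet, strain and order.

    Rule (a): first letter of the genus + first two letters of the epithet.
    Rule (b): strain designation, if any.
    Rule (c): roman numeral for the enzyme's order of discovery in that host.
    """
    abbreviation = genus[0].upper() + species[:2].lower()
    return abbreviation + strain + to_roman(number)

if __name__ == "__main__":
    print(enzyme_name("Haemophilus", "influenzae", 3, strain="d"))
    print(enzyme_name("Serratia", "marcescens", 1))
```

Applied to the sources listed in Table 3.1, this reproduces the names given there.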

Recognition Sequences

Most, but not all, type II restriction endonucleases recognize and cut DNA within particular sequences of four to eight nucleotides which possess a two-fold axis of rotational symmetry. Because of their similarity to words that read the same backwards as forwards, such sequences are generally called palindromes. For example, the restriction and modification enzymes R.EcoRI and M.EcoRI recognize the sequence:

5'-GAATTC-3'
3'-CTTAAG-5'

The axis of symmetry lies at the centre of this sequence. The position at which the restriction enzyme cuts is usually indicated by the symbol '/', and the nucleotides methylated by the modification enzyme are usually marked with an asterisk. For EcoRI these would be shown as:

5'-G/AA*TTC-3'
3'-CTTA*A/G-5'

For convenience, it is usual practice to simplify the description of recognition sequences by showing only one strand of the DNA, the one which runs in the 5' to 3' direction. Thus, the EcoRI recognition sequence would be shown as G/AATTC. It can be seen that EcoRI makes single-stranded breaks four bases apart in the opposite strands of its target sequence, generating fragments with protruding 5' termini:

5'-G             5'-AATTC-3'
3'-CTTAA-5'          G-5'

These DNA fragments can associate by hydrogen bonding between the overlapping 5' termini, or a fragment can be circularized by an intramolecular reaction.

Fig. 3.2. Cohesive ends of DNA fragments produced by digestion with EcoRI. (Panels: intermolecular association and intramolecular association, both by base-pairing of the AATT overhangs.)
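The two ideas in this section, that a recognition sequence is palindromic when it equals its own reverse complement and that EcoRI cuts after the G of G/AATTC, can be sketched as follows. This is an illustrative sketch rather than a standard tool: the function names are hypothetical, the test sequence is invented, and only the top strand of a linear molecule is modelled.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def is_palindromic(site):
    """A site has two-fold rotational symmetry if it equals its reverse complement."""
    return site == reverse_complement(site)

def digest(seq, site, cut_offset):
    """Cut the top strand of a linear sequence at every occurrence of site.

    cut_offset is the number of bases of the site that lie 5' of the cut,
    e.g. 1 for EcoRI (G/AATTC).
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments

if __name__ == "__main__":
    example = "TTGAATTCAAGGAATTCCC"  # hypothetical sequence with two EcoRI sites
    print(is_palindromic("GAATTC"))
    print(digest(example, "GAATTC", 1))
```

Each internal fragment of the top strand begins with AATTC, which corresponds to the protruding 5' terminus described above.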

Table 3.2. Some restriction endonucleases and their recognition sites.

Enzyme                  Recognition sequence

4-Base cutters
MboI, DpnI, Sau3AI      /GATC
MspI, HpaII             C/CGG
AluI                    AG/CT
HaeIII                  GG/CC
TaiI                    ACGT/

6-Base cutters
BglII                   A/GATCT
ClaI                    AT/CGAT
PvuII                   CAG/CTG
PvuI                    CGAT/CG
KpnI                    GGTAC/C

8-Base cutters
NotI                    GC/GGCCGC
SbfI                    CCTGCA/GG

Not all type II enzymes cleave their target sites like EcoRI. Some, such as PstI (CTGCA/G), produce fragments bearing 3' overhangs, while others, such as SmaI (CCC/GGG), produce blunt or flush ends.

Isoschizomers

An additional element of flexibility arises from the existence of several enzymes that recognize the same sequence of bases. These are called isoschizomers (Greek iso, equal; skhizo, to split). In most cases, isoschizomers not only have the same recognition site but also cut in the same place within that recognition sequence. Although this can be of some use, for technical reasons that need not be detailed here, more important are isoschizomers which recognize the same sequence but cut at a different position within it. For example, Acc65I and KpnI both recognize the sequence GGTACC, but cut it at different places, and so generate different sticky ends. This gives the option of obtaining virtually the same fragment of DNA but with different sticky ends that can be ligated to other fragments. Another example is the pair of isoschizomers XmaI and SmaI. XmaI makes a staggered cut and produces sticky ends that can be ligated to other XmaI fragments, while SmaI, as mentioned above, generates blunt-ended fragments at the same site, allowing the fragment to be ligated to other blunt-ended DNA sequences.

Fig. 3.3. Isoschizomers. (Cleavage of the same recognition sequence by Acc65I and by KpnI, giving fragments with different ends.)

Number and Size of Restriction Fragments

The number and size of the fragments produced by a restriction enzyme depend on the frequency of occurrence of the target site in the DNA to be cleaved. Assuming a DNA molecule with a 50% G+C content and a random distribution of the four bases, a four-base recognition site occurs every 4^4 (256) bp. Similarly, a six-base recognition site occurs every 4^6 (4096) bp and an eight-base recognition sequence every 4^8 (65,536) bp. In practice, the four bases are not randomly distributed and many organisms are AT- or GC-rich, e.g. the nuclear genome of mammals is 40% G+C and the dinucleotide CG is five-fold less common than statistically expected. Similarly, CCG and CGG are the rarest trinucleotides in most A+T-rich bacterial genomes, and CTAG is the rarest tetranucleotide in G+C-rich bacterial genomes. Thus, different restriction endonucleases with six-base recognition sites can generate average fragment sizes significantly different from the expected 4096 bp.
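The expected spacing of recognition sites under the random-sequence assumption is simple arithmetic, sketched below. The function name and the `gc_content` parameter are illustrative choices; the model treats each base as independent, so it reproduces the 4^n figures at 50% G+C and shows the effect of overall base composition, but it cannot capture the dinucleotide and trinucleotide biases mentioned above.

```python
def expected_spacing(site, gc_content=0.5):
    """Expected distance in bp between occurrences of site in random DNA.

    Each G or C is assumed to occur with probability gc_content / 2 and
    each A or T with probability (1 - gc_content) / 2, independently.
    """
    p_gc = gc_content / 2
    p_at = (1 - gc_content) / 2
    probability = 1.0
    for base in site:
        probability *= p_gc if base in "GC" else p_at
    return 1 / probability

if __name__ == "__main__":
    for site in ("GATC", "GAATTC", "GCGGCCGC"):
        print(site, expected_spacing(site))
    # Effect of base composition alone, at 40% G+C:
    print("GCGGCCGC at 40% G+C:", expected_spacing("GCGGCCGC", gc_content=0.4))
```

Even this crude model shows why a site composed entirely of G and C becomes much rarer in a genome with 40% G+C than the 4^8 estimate suggests.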

Table 3.3. Average fragment size (bp) produced by different enzymes with DNA from different sources.

Enzyme   Target    Arabidopsis   Nematode   Drosophila   E. coli   Human
ApaI     GGGCCC    25000         40000      6000         15000     2000
AvrII    CCTAGG    15000         20000      20000        150000    8000
BamHI    GGATCC    6000          9000       4000         5000      5000
DraI     TTTAAA    2000          1000       1000         2000      2000
SpeI     ACTAGT    8000          8000       9000         60000     10000

Certain restriction endonucleases show preferential cleavage of some sites within the same DNA molecule. For example, phage λ DNA has five sites for EcoRI, but the different sites are cleaved non-randomly. The site nearest the right terminus is cleaved 10 times faster than the sites in the middle of the molecule. There are four sites for SacII in λ DNA, but the three sites in the centre of the molecule are cleaved 50 times faster than the remaining site. A group of three restriction enzymes shows an even more dramatic site preference. These are NarI, NaeI and SacII, and they require simultaneous interaction with two copies of their recognition sequence before they will cleave DNA. Thus, NarI will cleave two of the four recognition sites on plasmid pBR322 DNA rapidly but will seldom cleave the remaining two sites.

Variations on Cutting and Joining DNA Molecules

In order to join two fragments of DNA together, it is not necessary for them to have been produced by the same restriction endonuclease. Many different restriction endonucleases produce compatible cohesive ends. For example, AgeI (A/CCGGT) and AvaI (C/CCGGG) produce molecules with identical 5' overhangs, which can therefore be ligated together. What is more, if the cohesive ends were produced by six-base cutters, the ligation products are often recleavable by four-base cutters. Thus, in the example cited above, the hybrid site ACCGGG can be cleaved by HpaII (C/CGG), NciI (CC/GGG) or ScrFI (CC/NGG).

New restriction sites can be produced by filling in the overhangs generated by restriction endonucleases and joining the products together. After filling in the cohesive ends produced by EcoRI, ligation produces restriction sites recognized by four other enzymes. Several other examples of creating new target sites by filling in and ligation are known.

Fig. 3.4. The generation of three new restriction sites after filling in the overhangs produced by endonuclease EcoRI and ligating the products together. (Steps shown: EcoRI cleaves GAATTC; DNA polymerase fills in the overhangs to give ends -GAATT and AATTC-; DNA ligase joins them to give GAATTAATTC, which contains new sites, including one for Tsp509I, and can also be cut in a way identical to EcoRI cleavage.)

There are also many examples of combinations of blunt-end restriction endonucleases which produce recleavable ligation products. For example, when molecules generated by cleavage with AluI (AG/CT) are joined to ones produced by EcoRV (GAT/ATC), some of the ligation sites will have the sequence GATCT and others will have the sequence AGATC. Both can be cleaved by MboI (GATC).
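The compatibility of ends and the creation of hybrid sites described in this section can be checked with a short sketch. The function names are hypothetical, only the top strand is modelled, and the overhang helper assumes a symmetric site cut to leave a 5' overhang (the cut lies in the first half of the site). The recognition sequences used are those quoted in the text.

```python
def five_prime_overhang(site, cut):
    """Single-stranded 5' overhang left by a symmetric staggered cut.

    cut is the number of bases of the site 5' of the cut on the top strand;
    this helper assumes cut is less than half the site length.
    """
    return site[cut:len(site) - cut]

def hybrid_site(left_site, left_cut, right_site, right_cut):
    """Top-strand sequence formed by ligating the left-hand product of one
    cut to the right-hand product of another."""
    return left_site[:left_cut] + right_site[right_cut:]

if __name__ == "__main__":
    # AgeI (A/CCGGT) and AvaI (C/CCGGG) leave the same overhang.
    print(five_prime_overhang("ACCGGT", 1), five_prime_overhang("CCCGGG", 1))
    # Joining an AgeI end to an AvaI end gives the hybrid site.
    print(hybrid_site("ACCGGT", 1, "CCCGGG", 1))
    # Blunt ends: AluI (AG/CT) joined to EcoRV (GAT/ATC), in both orientations.
    print(hybrid_site("AGCT", 2, "GATATC", 3), hybrid_site("GATATC", 3, "AGCT", 2))
```

The hybrid AgeI/AvaI site contains CCGG, the HpaII site, and both blunt-end junctions contain GATC, the MboI site, in agreement with the examples above.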
A methyltransferase, M.SssI, that methylates the dinucleotide CpG has been isolated from Spiroplasma. This enzyme can be used to modify in vitro those restriction endonuclease target sites which contain the CG sequence. Some of the target sequences modified in this way will be resistant to endonuclease cleavage, while others will remain sensitive. In the genomic DNA of many animals, including vertebrates and echinoderms, 90% of the methyl groups occur as 5-methylcytosine in the sequence CG. M.SssI can therefore be used to imprint DNA from other sources with a vertebrate pattern.


Dam and Dem Methylases of E. coli In most of the laboratory strains of E. coli, three site-specific DNA methylases are present. The methylase encoded by the dam gene transfers a methyl group from S-adenosylmethionine to the N6 position of the adenine residue in the sequence GATe. The methylase encoded by the dcm gene (the

Cutting and Joining DNA Molecules

19

Dcm methylase, previously called the Mec methylase) modifies the internal cytosine residues in the sequences CCAGG and CCTGG at the C5 position. In DNA in which the GC content is 50%, the sites for these two methylases occur, on average, every 256-512 bp. The enzyme M.EcoKI is the third methylase, but the sites for this enzyme are much rarer and occur about once every 8 kb.

These enzymes are important for two main reasons. First, some or all of the sites for a restriction endonuclease may be resistant to cleavage when the DNA is isolated from strains expressing the Dcm or Dam methylases. This happens only when a particular base in the recognition site of the restriction endonuclease is methylated. The relevant base may be methylated by one of the E. coli methylases if the methylase recognition site overlaps the endonuclease recognition site. For example, DNA isolated from Dam+ E. coli is completely resistant to cleavage by MboI, but not Sau3AI, both of which recognize the sequence GATC. In the same way, DNA from a Dcm+ strain will be cut by BstNI but not by EcoRII, even though both recognize the sequence CC(A/T)GG. It is worth noting that most cloning strains of E. coli are Dam+ Dcm+, but double mutants are available.

The second reason these methylases are important is that the modification state of plasmid DNA can affect the frequency of transformation in special situations. When Dam-modified plasmid DNA is introduced into Dam- E. coli, or Dam- or Dcm-modified DNA is introduced into other species, the transformation efficiency is reduced. It is best to use a strain lacking the Dam and Dcm methylases whenever DNA is to be moved from E. coli to another species.

As will be seen later, it is difficult to stably clone DNA that contains short, direct repeats. Deletion of the repeat units occurs quickly, even when the host strain is deficient in recombination. However, Dam methylation appears to be involved in the deletion mechanism, for deletion does not occur in dam mutants.
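The 256-512 bp spacing quoted above for the Dam and Dcm sites follows from simple counting. The sketch below is a minimal illustration only: it assumes random sequence in which all four bases are equally frequent (the 50% GC case), writes the Dcm site as CCWGG using the IUPAC symbol W for "A or T", and uses a function name chosen purely for this example.

```python
# Expected spacing of a recognition site in random DNA with equal base
# frequencies (an idealisation; real genomes deviate from it).

# Number of bases each IUPAC symbol can stand for.
DEGENERACY = {"A": 1, "C": 1, "G": 1, "T": 1, "W": 2, "N": 4}


def expected_spacing_bp(site: str) -> float:
    """Return the average distance (in bp) between occurrences of `site`."""
    probability = 1.0
    for symbol in site:
        # Chance that a random base matches this position of the site.
        probability *= DEGENERACY[symbol] / 4
    return 1 / probability


dam_spacing = expected_spacing_bp("GATC")   # Dam site
dcm_spacing = expected_spacing_bp("CCWGG")  # Dcm site: CCAGG or CCTGG
print(dam_spacing, dcm_spacing)
```

Under these assumptions the four-base Dam site is expected about once every 256 bp and the Dcm site about once every 512 bp, which matches the range given in the text.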

Importance of Eliminating Restriction Systems in E. coli Strains Used as Hosts for Recombinant Molecules

If foreign DNA is introduced into an E. coli host, it may be attacked by restriction systems active in the host cell. An important feature of these systems is that the fate of the incoming DNA in the restrictive host depends not only on the sequence of the DNA but also upon its history: the DNA sequence may or may not be restricted, depending upon its source immediately before transforming the E. coli host strain. The post-replication modifications of the DNA are usually in the form of methylation of particular adenine or cytosine residues in the target sequence. This provides protection against cognate restriction systems but not, in general, against different restriction systems.

Because restriction provides a natural defence against invasion by foreign DNA, it is usual to use a K restriction-deficient E. coli K12 strain as a host in transformation with newly formed recombinant molecules. Thus where, for example, mammalian DNA has been ligated into a plasmid vector, transformation of an EcoK restriction-deficient host removes the possibility that the incoming sequence will be restricted, even if the mammalian sequence contains an unmodified EcoK target site. If the host is EcoK restriction-deficient but EcoK modification-proficient, propagation on the host will confer modification methylation and so allow subsequent propagation of the recombinant in EcoK restriction-proficient strains, if desired.

Whereas the EcoKI restriction system, encoded by the hsdRMS genes, cuts DNA that is not protected by methylation at the target site, the McrA, McrBC and Mrr endonucleases cut DNA that is methylated at specific positions. All three endonucleases restrict DNA modified by CpG methylase (M.SssI), and the Mrr endonuclease will also attack DNA with methyladenine in specific sequences. The significance of these restriction enzymes is that DNA from many bacteria, and from all plants and higher animals, is extensively methylated. In cloning experiments, the recovery of such DNA will be greatly decreased if the restriction activity is not removed. There is no problem with DNA from Saccharomyces cerevisiae or Drosophila melanogaster, since there is little methylation of their DNA.

In E. coli, all the restriction systems are grouped together in an 'immigration control region' about 14 kb in length. Some strains carry mutations in one of the genes. For example, strains DH1


Genetic Engineering

and DH5 have a mutation in the hsdR gene and so are defective for the EcoKI endonuclease but still mediate the EcoKI modification of DNA. Strain DP50 has a mutation in the hsdS gene and so lacks both the EcoKI restriction and modification activities. Other strains, like E. coli C and the widely used cloning strain HB101, carry a deletion of the entire mcrC-mrr region and hence lack all restriction activities.

Fig. 3.5. The immigration control region of E. coli strain K12 (about 14 kb; gene order mcrC, mcrB, hsdS, hsdM, hsdR, mrr).

Importance of Enzyme Quality

Restriction enzymes can be obtained from several different commercial sources. In selecting a source of enzyme, it is important to consider the quality of the enzyme supplied. High-quality enzymes are purified extensively to remove contaminating exonucleases and endonucleases, and tests for the absence of such contaminants form part of routine quality control (QC) on the finished product. The absence of exonucleases is particularly important. If present, they can nibble away the overhangs of cohesive ends, thereby eliminating or decreasing the production of subsequent recombinants. Contaminating phosphatases can remove the terminal phosphate residues, thereby preventing ligation. Even where subsequent ligation is achieved, the resulting product may contain small deletions.

A typical QC procedure is as follows. DNA fragments are produced by an excessive overdigestion of substrate DNA with each restriction endonuclease. These fragments are then ligated and recut with the same restriction endonuclease. Ligation can occur only if the 3' and 5' termini are left intact, and only those molecules with a perfectly restored recognition site can be recleaved. The appearance of a normal banding pattern after cleavage indicates that both the 3' and 5' termini are intact and that the enzyme preparation is free of detectable exonucleases and phosphatases.

The blue/white screening assay is an additional QC test. The starting material in this case is a plasmid carrying the E. coli lacZ gene in which there is a single recognition site for the enzyme under test. The plasmid is overdigested with the restriction enzyme, religated and transformed into a lacZ- strain of E. coli. The transformants are plated on media containing the β-galactosidase substrate Xgal. If the lacZ gene remains intact after digestion and ligation, it will give rise to a blue colony. A white colony will be produced if any degradation of the cut ends has occurred.


LIGATION

In gene cloning, the next step is to join the DNA fragment to a vector molecule, such as a plasmid or bacteriophage, that can be replicated by the host cell after transformation. The joining, or ligation, of DNA fragments is carried out by an enzyme called DNA ligase. The natural role of DNA ligase is to repair single-stranded breaks (nicks) in the sugar-phosphate backbone of a double-stranded DNA molecule. Such breaks may arise through damage to DNA (or following the repair of such damage); the enzyme also joins the short fragments formed by replication of the 'lagging strand' during DNA replication. The action of the ligase requires that the nick should expose a 3'-OH group and a 5'-phosphate. Digestion with restriction endonucleases cuts the DNA in this way, i.e., it leaves the phosphate on the 5' position of the deoxyribose. The unstable pairing of two restriction fragments with compatible sticky ends can therefore be considered as a double-stranded DNA molecule with a nick in each strand, very close together, and so it serves as a substrate for DNA ligase. Some DNA ligases, like T4 DNA ligase (encoded by the bacteriophage T4), are also capable of ligating blunt-ended fragments, although much less efficiently, while others (notably the E. coli DNA ligase) are not; they require pairing of overlapping ends. To keep things simple, only the action of T4 DNA ligase, by far the most extensively used, is considered here. We will refer to this enzyme as T4 ligase (or just 'ligase'), although there is also a much less commonly used T4 RNA ligase.


ATP is needed as a co-substrate for the T4 ligase. In the first step, the ligase reacts with ATP to form a covalent enzyme-AMP complex, releasing pyrophosphate (PPi). This in turn reacts with the 5'-phosphate on one side of the nick, transferring the AMP to the phosphate group. In the final stage, the 3'-OH group attacks, leading to the formation of a new covalent phosphodiester bond (thus restoring the integrity of the sugar-phosphate backbone) and releasing AMP. The absolute requirement for the 5'-phosphate is extremely important: by removing the 5'-phosphate, unwanted ligation can be prevented.


Optimum Conditions for Ligation

In the cloning process, ligation can be one of the most unpredictable steps. Factors that may compromise the success of ligation include the presence of inhibitory material contaminating DNA preparations and degradation of the enzyme or the DNA. In addition, the conditions need to be adjusted correctly to achieve the optimum effect. Among these conditions, temperature is the most controversial. Originally, it was thought that since ligation was in effect a repair of single-strand nicks in a double-stranded molecule, we should optimize the pairing of the fragments. Therefore, much of the early work used ligation at 10°C (or even 4°C), which needs long incubation since the enzyme activity is low at that temperature. Many protocols now recommend 16°C, but this is inconvenient unless you have a cooled water bath at hand. Room temperature is often used as a compromise. Thanks to technical advances, buffers are now commercially available that allow much faster and more efficient ligations.

Fig. 3.6. Action of T4 DNA ligase: the ligase reacts with ATP to give ligase-AMP and PPi, and the nicked DNA is joined.

Since the expected reaction normally involves two different molecules of DNA (intermolecular ligation), it is extremely sensitive to DNA concentration. It is therefore important to use high concentrations of DNA; but of which DNA component? There are two components, the vector and the insert, and so a number of possible reactions can occur. The likelihood of these different reactions can be influenced by adjusting the relative amounts of the components as well as the overall concentration of DNA. At low concentrations of DNA, it is more likely that the two ends of the same molecule will join (an intramolecular reaction). The rate of a reaction involving one component is linearly related to its concentration, whereas the rate of a reaction involving two different components is proportional to the product of the two concentrations. If we increase the concentration of the vector but not the insert, we obtain a greater enhancement in the ligation of two vector molecules together (which we don't want) than in the production of the recombinant vector-insert product. Conversely, increasing insert concentration will give increased levels of insert-insert dimers (which we also do not want, although they are less of a problem as they will not give rise to transformants).
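The concentration argument can be made concrete with a toy calculation. This is only a sketch under stated assumptions: every intermolecular ligation is given the same rate constant (set to 1), intramolecular circularization is ignored, and the function and variable names are illustrative rather than taken from any protocol.

```python
def relative_rates(vector: float, insert: float) -> dict:
    """Relative rates of the three intermolecular ligations.

    Each rate is taken as proportional to the product of the two
    concentrations involved (arbitrary units, rate constant assumed 1).
    """
    return {
        "vector-vector": vector * vector,
        "vector-insert": vector * insert,
        "insert-insert": insert * insert,
    }


baseline = relative_rates(vector=1.0, insert=1.0)
more_vector = relative_rates(vector=2.0, insert=1.0)

# Fold change in each product when only the vector concentration is doubled.
fold_change = {name: more_vector[name] / baseline[name] for name in baseline}
print(fold_change)
```

In this simplified model, doubling the vector alone increases vector-vector joining fourfold but vector-insert joining only twofold, which is the imbalance described in the text.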


Thus, it is not only important to keep the overall DNA concentration high, but an optimum vector:insert ratio is also desirable. It is hard to predict reliably what that ratio should be, but typically it would range from 3:1 to 1:3. Note that these are molar ratios and have to take account of the relative sizes of the vector and the insert. For example, if the vector is 5 kb and the insert is 500 bases, then a 1:1 molar ratio would involve 10 times as much vector, by weight, as insert (e.g. 500 ng of vector and 50 ng of insert). Conversely, for the same 5 kb vector but an insert size of 50 kb, if you used 500 ng of vector, you would need 5 µg of insert to achieve a 1:1 molar ratio. In general, to convert the amount of DNA by weight into a value that can be used to calculate the molar ratio, divide the amount used (by weight) by the size of the DNA. Thus, if $W_V$ and $W_I$ are the weights used of vector and insert respectively, and $S_V$, $S_I$ are the sizes of vector and insert (e.g. in kilobases), then the vector:insert ratio is $W_V/S_V : W_I/S_I$.

Fig. 3.7. Ligation: some of the potential products (circular monomers, linear dimers or higher multimers, circular dimers and the recombinant plasmid); only the shaded circles are expected to be replicated after transformation.

A further complication arises if we are working with a heterogeneous collection of potential insert fragments, as would be the case if we were making a gene library. Putting more insert DNA into the ligation mixture will increase the possibility of obtaining multiple inserts. In other words, we may produce recombinant plasmids that carry two or more completely different pieces of DNA. This is not a good idea. When we come to characterize the clones in the library and try to relate them to the structure of the genome of our starting organism, it may lead to seriously misleading results.

So adjusting the relative amounts of the vector and insert will not only influence the success of ligation but will also have an effect on the nature of the products formed. Intramolecular ligation of vector molecules, or formation of vector dimers, will result in transformants that do not carry our insert, while putting in too much insert will result in insert dimers (which will not give transformants) or recombinant plasmids carrying multiple inserts. Fortunately, we do not need to depend entirely on the adjustment of the DNA levels to obtain the desired results.

Alkaline Phosphatase

The ligation process depends absolutely on the presence of a 5'-phosphate at the nick site. If this phosphate group is removed, the site cannot be ligated. Removal of 5'-phosphate groups is performed with an enzyme known as alkaline phosphatase (because of its optimum pH); the most commonly used enzyme is calf-intestinal phosphatase (CIP). Treatment of the vector molecule with CIP before ligation will remove the 5'-phosphates and so make self-ligation of the vector impossible. Ligation of vector to insert can still occur, however, as the insert retains its 5'-phosphates. It is of course important to remove the CIP before ligation, and this is best done by phenol extraction and subsequent ethanol precipitation, as heat inactivation is not sufficiently reliable. If the same vector is used again and again, it is possible to carry this out on a larger scale and store the unused treated vector for further experiments. This is desirable as CIP treatment is not entirely reliable, and a bulk preparation of dephosphorylated vector that you have already used successfully is a considerable asset in subsequent experiments. If a standard vector is used, a ready-made and tested dephosphorylated vector can be purchased.
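Returning to the molar-ratio rule given earlier (divide the weight of each DNA by its size), the arithmetic is easy to get wrong at the bench, so a small helper may be useful. The following is a minimal sketch; the function and parameter names are assumptions made for this example, and weights are taken in nanograms and sizes in kilobases.

```python
def vector_insert_molar_ratio(vector_ng: float, vector_kb: float,
                              insert_ng: float, insert_kb: float) -> float:
    """Vector:insert molar ratio expressed as vector molecules per insert."""
    # Weight divided by size is proportional to the number of molecules.
    return (vector_ng / vector_kb) / (insert_ng / insert_kb)


def insert_ng_needed(vector_ng: float, vector_kb: float, insert_kb: float,
                     inserts_per_vector: float = 1.0) -> float:
    """Weight of insert (ng) giving the requested insert:vector molar ratio."""
    return (vector_ng / vector_kb) * insert_kb * inserts_per_vector


# The two worked examples from the text: a 5 kb vector, 500 ng used.
ratio_small_insert = vector_insert_molar_ratio(500, 5, 50, 0.5)
ng_for_large_insert = insert_ng_needed(500, 5, 50)
print(ratio_small_insert, ng_for_large_insert)
```

With these inputs the first call reproduces the 1:1 ratio for 50 ng of a 500-base insert, and the second gives 5000 ng (5 µg) for a 50 kb insert, as in the text.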

Fig. 3.8. Ligation: effect of dephosphorylation of the vector. With a dephosphorylated vector the 5'-phosphate group has been removed, so one nick at each junction is left unligated.

However, further inspection of the situation will show that it is not that simple. There are two nicks at each junction between the two DNA fragments, and both require repair. At one of these, the 5'-phosphate is supplied by the insert, and that nick can be ligated normally. However, at the other nick, the 5'-phosphate should be on the vector, and that has been removed. So the second nick cannot be mended. In practice this is not a real problem. The recombinant plasmid will have unrepaired nicks in its DNA, one at each end of the inserted fragment, but it will be held together very stably by the base pairing all along the inserted DNA fragment. With hundreds or thousands of base pairs holding the two strands together, it will be maintained stably as a double-stranded molecule even at 37°C. When this nicked molecule is introduced into a bacterial cell by transformation, enzymes within the cell will rapidly repair the remaining nicks, by adding the missing phosphates and ligating the broken ends. The plasmid will then be replicated without any difficulty.

Some of the other potential problems still persist. There will still be insert dimers, or multiple inserts. However, since we no longer have to worry about self-ligation of the vector, we can increase the vector concentration relative to the insert and thus drive the reaction in the direction we want. Even so, if we are making a gene library, when the possibility of multiple inserts is at its most serious as a problem, we may want to turn this strategy around. Phosphatase treatment of the insert rather than the vector will prevent multiple inserts, and by using special vectors that are unable to produce clones unless they carry an inserted DNA fragment, the occurrence of non-recombinant transformants (carrying self-ligated vector rather than vector-insert recombinants) can be prevented.

Double Digests

If possible, we should try to avoid using alkaline phosphatase. It is easy to overdigest during the treatment, which not only removes the phosphate group but can also damage and disable the sticky end. In addition, the required phenol extraction and ethanol precipitation are lengthy processes and can also result in loss of, or damage to, the DNA. If you are simply trying to make a specific recombinant plasmid, and have efficient ways of distinguishing recombinants from self-ligated vectors, a perfectly optimal ligation is not essential. Only one clone is eventually required, and as long as it can be identified without trawling through hundreds of others, that is sufficient.


However, there are situations when it is necessary to ensure that a high proportion of the colonies obtained are genuine recombinants. For example, if you are trying to create a representative DNA library, self-ligated vectors and multiple inserts are a real problem. Even simple binary ligations can sometimes be problematic, in particular when there is no easy way of rapidly distinguishing recombinant plasmids from self-ligated vectors. If we wish to avoid the use of alkaline phosphatase, one alternative way of preventing self-ligation of vector and insert is to cut both components with two different restriction enzymes. Most modern vectors have multiple cloning sites, so we can cut the vector with, for example, EcoRI and BamHI. Provided that both enzymes have cut efficiently (and that we have removed the small fragment between the two sites), vector re-ligation will be impossible. If our insert fragment has been digested with the same two enzymes, then virtually all the colonies obtained will be recombinant, i.e., they will contain the insert fragment. Some other reactions are still possible, such as ligation of two vector molecules together, but such events are comparatively rare.

Double digests raise two types of problems. The first is to find optimum conditions that allow both enzymes to work. Suppliers' catalogues often contain tables of recommended buffers for a variety of combinations of restriction enzymes. Enzymes that require widely different temperatures are more difficult, and you may have to use the enzymes one after another. The second problem appears if the two sites are close together: some restriction enzymes do not cut very efficiently at a recognition site that is near the end of a DNA molecule.

MODIFICATION OF RESTRICTION FRAGMENT ENDS

Restriction fragments with sticky ends are useful as they can be readily ligated. However, there is a limitation on their usefulness, as they can only be ligated to another fragment with compatible ends. So an EcoRI fragment can be ligated to another EcoRI fragment, but not to one generated by BamHI. In that case, not only do we lose the advantage of the cohesive ends, but the unpaired DNA gets in the way, so that we would be better off with blunt-ended fragments. This is a potential problem, as the vectors we are using will only have a limited number of possible sites into which we can insert DNA (and we may specifically want to put our insert in a particular position), and it may not be possible to produce a suitable insert with the same enzyme. In this section, we will look at ways in which the ends of a DNA fragment can be modified to enable it to be ligated with other sites.
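The compatibility rule can be stated compactly: two overhangs of the same polarity can pair only if one is the reverse complement of the other. The sketch below illustrates this for the two enzymes named in the text. The overhang sequences are the standard 5' extensions left by EcoRI (G^AATTC) and BamHI (G^GATCC); the function names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def ends_compatible(overhang_a: str, overhang_b: str) -> bool:
    """True if two single-stranded overhangs (both 5'->3') can base-pair."""
    return overhang_a == reverse_complement(overhang_b)


ECORI_OVERHANG = "AATT"  # 5' overhang left by EcoRI
BAMHI_OVERHANG = "GATC"  # 5' overhang left by BamHI

print(ends_compatible(ECORI_OVERHANG, ECORI_OVERHANG))  # EcoRI with EcoRI
print(ends_compatible(BAMHI_OVERHANG, BAMHI_OVERHANG))  # BamHI with BamHI
print(ends_compatible(ECORI_OVERHANG, BAMHI_OVERHANG))  # EcoRI with BamHI
```

The same check explains why a vector cut with both EcoRI and BamHI cannot simply re-close on itself: its two ends carry overhangs that do not pair with each other.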

Trimming and Filling

One strategy that enables a more flexible use of restriction fragments is to convert the sticky ends into blunt ends, either by filling in the complementary strand or by trimming back the unpaired sequence. If the restriction fragment has a 5' overhang (such as those generated by EcoRI), the recessed 3'-OH group provides a primer site that can be extended by DNA polymerases, thus filling in the recessed end and converting it to double-stranded DNA. However, some DNA polymerases (such as E. coli DNA polymerase I) also have 5'-3' exonuclease activity, which would remove the 5' overhang (and maybe more); these enzymes are not suitable for filling in the ends. For this purpose, therefore, you would use an enzyme lacking this 5'-3' exonuclease activity, such as the Klenow fragment of E. coli DNA polymerase I (a proteolytic product of E. coli DNA polymerase I in which the region responsible for the 5'-3' exonuclease activity has been removed). Restriction fragments with 3' sticky ends, such as those produced by PstI, cannot be filled in, but the unpaired sequences can be trimmed back, using the 3'-5' exonuclease activity of an enzyme such as T4 DNA polymerase.

Linkers and Adaptors

Both the above procedures, filling in or trimming back, result in conversion of sticky ends to blunt ends, which can then be ligated to any other blunt-ended fragment, however produced. This is a start, but it is still some way short of the complete versatility that is our goal, especially as blunt-end

Fig. 3.9. Converting sticky ends to blunt ends: (a) filling in, by extension of the recessed 3'-OH of an EcoRI sticky end through addition of nucleotides; (b) trimming back, by removal of the unpaired bases of a PstI sticky end through 3'-5' exonuclease action.

ligation is so inefficient, and requires a much higher concentration of DNA (which may be difficult to come by). By the use of linkers, this process can be made much more efficient, and much more versatile. These are short synthetic pieces of DNA that contain a restriction site. For example, the sequence CCGGATCCGG contains the BamHI site (GGATCC). Furthermore, it is self-complementary, so you only need to synthesize (or buy) one strand; two molecules of it will anneal to produce a double-stranded DNA fragment 10 base pairs long. If this is joined to a blunt-ended potential insert fragment by blunt-end ligation, your fragment will now have a BamHI site near each end. Cutting this with BamHI will generate a fragment with BamHI sticky ends that can be ligated with a BamHI-cut vector. If your insert already has an internal BamHI site, this is not possible. The alternative is to use adaptors rather than linkers.

Fig. 3.10. Linkers: a self-complementary oligonucleotide (CCGGATCCGG) is annealed, ligated to a blunt-ended (SmaI) fragment and cut with BamHI to give a fragment with sticky ends.

You may object that joining the linker to the potential insert still requires blunt-end ligation. However, the efficiency of blunt-end ligation can be markedly improved by using high concentrations of at least one of the components. In this case, you can easily produce and use large amounts of the linker. Furthermore, since the linker is very small (e.g. 10 bases) and it is the molar concentration that is important, even modest amounts of the linker by mass will represent an enormous excess of linker in molar terms. For example, if you use 100 ng of a 1 kb insert, then 10 ng of linker will represent a 10:1 linker:insert ratio. The reaction can effectively be driven by the high molar concentration of the linkers. Of course, this efficient ligation is likely to add multiple copies of the linker to the ends of your insert, but this is not a problem, since the subsequent restriction digestion will remove them.

Further versatility can be obtained by the use of adaptors. These are pairs of short oligonucleotides that are designed to anneal together in such a way as to create a short double-stranded DNA fragment with different sticky ends (or with one sticky and one blunt end). For example, the sequences 5'-GATCCCCGGG and 5'-AATTCCCGGG will anneal to produce a fragment with a BamHI sticky end at one end and an EcoRI sticky end at the other, without needing to be cut by a restriction enzyme. Ligation of this adaptor to a restriction fragment generated by BamHI digestion will produce a DNA fragment with EcoRI ends that can now be ligated with an EcoRI-cut vector.
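The properties claimed for the linker and adaptor sequences above can be checked directly. The following sketch uses only the sequences and quantities quoted in the text; the helper names are illustrative, and the molar calculation simply divides weight by length, as in the earlier vector:insert rule.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_self_complementary(seq: str) -> bool:
    """True if two copies of `seq` can anneal to each other end to end."""
    return seq == reverse_complement(seq)


def molar_excess(linker_ng: float, linker_bp: float,
                 insert_ng: float, insert_bp: float) -> float:
    """Linker:insert molar ratio (weight / length is proportional to moles)."""
    return (linker_ng / linker_bp) / (insert_ng / insert_bp)


linker = "CCGGATCCGG"
linker_ok = is_self_complementary(linker) and "GGATCC" in linker

# 10 ng of the 10-base linker against 100 ng of a 1 kb insert.
excess = molar_excess(10, 10, 100, 1000)

# Adaptor oligonucleotides: the last six bases pair with each other,
# leaving the first four bases of each strand as 5' overhangs.
adaptor_a, adaptor_b = "GATCCCCGGG", "AATTCCCGGG"
core_pairs = adaptor_a[4:] == reverse_complement(adaptor_b[4:])
overhangs = (adaptor_a[:4], adaptor_b[:4])

print(linker_ok, excess, core_pairs, overhangs)
```

The overhangs come out as GATC and AATT, the BamHI-compatible and EcoRI-compatible ends described in the text.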

Fig. 3.11. Adaptors: two synthetic, partly complementary oligonucleotides are annealed and ligated to a fragment with BamHI sticky ends, giving a fragment with EcoRI-compatible ends.

Alternatively, an adaptor with one sticky end and one blunt end can be used to convert blunt-ended DNA fragments, such as those generated by cDNA synthesis, into fragments with a sticky end, which increases the cloning efficiency substantially. As with linkers, you can use high molar concentrations of adaptors to drive ligation very efficiently, and the use of adaptors with non-phosphorylated sticky ends ensures that you will not get multiple additions of the adaptor to the end of your DNA fragment.

Homopolymer Tailing

A general method for joining DNA molecules makes use of the annealing of complementary homopolymer sequences. Thus, by adding oligo(dA) sequences to the 3' ends of one population of DNA molecules and oligo(dT) blocks to the 3' ends of another population, the two types of molecules can anneal to form mixed dimeric circles. An enzyme purified from calf thymus, terminal deoxynucleotidyltransferase, provides the means by which the homopolymeric extensions can be synthesized, for if presented with a single deoxynucleotide triphosphate, it will repeatedly add nucleotides to the 3' OH termini of a population of DNA molecules.

Fig. 3.12. Tailing with terminal transferase (nucleotides from dGTP are added to the 3'-OH ends of the DNA).

DNA with exposed 3' OH groups, such as that arising from pretreatment with phage λ exonuclease or restriction with an enzyme such as PstI, is a very good substrate for the transferase. However, conditions have been found in which the enzyme will extend even the shielded 3' OH of 5' cohesive termini generated by EcoRI. The terminal transferase reactions have been characterized in detail with regard to their use in gene manipulation. Typically, 10-40 homopolymeric residues are added to each end.

One of the earliest examples of the construction of recombinant molecules, the insertion of a piece of λ DNA into SV40 viral DNA, made use of homopolymer tailing. In those experiments, the single-stranded gaps which remained in the two strands at each join were repaired in vitro with DNA polymerase and DNA ligase so as to produce covalently closed circular molecules. The recombinants were then transfected into susceptible mammalian cells. Subsequently, the homopolymer method, using either dA·dT or dG·dC homopolymers, was used extensively to construct recombinant plasmids for cloning in E. coli. In recent years, with the availability of a much wider range of restriction endonucleases and other DNA-modifying enzymes, homopolymer tailing has largely been replaced. However, it is still important for cDNA cloning.

Fig. 3.13. Cloning using homopolymer tailing: vector and insert are given complementary oligo(dG) and oligo(dC) tails by terminal transferase, using dGTP and dCTP.

Joining Polymerase Chain Reaction (PCR) Products

Most of the strategies for cloning DNA fragments do not work well with PCR products. The reason for this is that the polymerases used in the PCR have a terminal transferase activity. For example, the Taq polymerase adds a single 3' A overhang to each end of the PCR product. Thus PCR products cannot be blunt-end-ligated unless the ends are first polished (blunted). A DNA polymerase like Klenow can be used to fill in the ends. Alternatively, Pfu DNA polymerase can be used to remove extended bases with its 3' to 5' exonuclease activity. However, even when the PCR fragments are polished, blunt-end-ligating them into a vector may still be very inefficient. One solution to this problem is to use T/A cloning. In this method, the PCR fragment is ligated to a vector DNA molecule with a single 3' deoxythymidylate extension.

Fig. 3.14. TA cloning: a PCR product with 3' A overhangs is ligated to the prepared cloning vector to give a recombinant plasmid.

Incorporation of Extra Sequence at the 5' End of a Primer into Amplified DNA

A PCR primer may be designed which, in addition to the sequence required for hybridization with the input DNA, includes an extra sequence at its 5' end. The extra sequence does not participate in the first hybridization step (only the 3' portion of the primer hybridizes) but it subsequently becomes incorporated into the amplified DNA fragment. Great flexibility is available here because the extra sequence can be chosen at the will of the experimenter. A common application of this principle is the incorporation of restriction sites at each end of the amplified product. In order to ensure that the restriction sites are good substrates for the restriction endonucleases, four nucleotides are placed between the hexanucleotide restriction sites and the extreme ends of the DNA. The incorporation of these restriction sites provides one method for cloning amplified DNA fragments.
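The layout of such a primer can be sketched as a simple concatenation: four extra nucleotides at the extreme 5' end, then the hexanucleotide restriction site, then the 3' portion that actually hybridizes to the template. In the example below, the clamp sequence "GCGC" and the 20-base annealing sequence are hypothetical placeholders chosen for illustration, and the function name is not from any library; only the EcoRI site GAATTC and the four-nucleotide spacing come from the discussion above.

```python
def tailed_primer(annealing_seq: str, site: str, clamp: str = "GCGC") -> str:
    """Build a 5'->3' primer: clamp + restriction site + annealing portion.

    `clamp` is the short stretch placed between the restriction site and
    the extreme end of the DNA (four nucleotides here, an arbitrary choice
    of sequence for this example).
    """
    if len(clamp) != 4:
        raise ValueError("this sketch assumes a four-nucleotide clamp")
    if len(site) != 6:
        raise ValueError("this sketch assumes a hexanucleotide site")
    return clamp + site + annealing_seq


ECORI_SITE = "GAATTC"
# Hypothetical template-specific sequence (3' portion of the primer).
annealing = "ATGACCATGATTACGGATTC"

primer = tailed_primer(annealing, ECORI_SITE)
print(primer, len(primer))
```

Only the last 20 bases of this 30-base primer hybridize in the first cycle; the clamp and site are carried into the product in later cycles, as described in the text.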

Fig. 3.15. Action of topoisomerase: the topoisomerase cuts one strand, the second strand is passed through the gap, and the break is closed.


Joining DNA Molecules without DNA Ligase

Two separate protein components were required in all the cutting and joining reactions described above: a site-specific endonuclease and a DNA ligase. Shuman (1994) has described a novel approach to the synthesis of recombinant molecules in which a single enzyme, vaccinia DNA topoisomerase, both cleaves and rejoins DNA molecules. Placement of the CCCTT cleavage motif for vaccinia topoisomerase near the end of a duplex DNA permits efficient generation of a stable, highly recombinogenic protein-DNA adduct that can only religate to acceptor DNAs that contain complementary single-strand extensions. Linear DNAs containing CCCTT cleavage sites at both ends can be activated by topoisomerase and inserted into a plasmid vector.

Heyman et al. (1999) have used the properties of vaccinia topoisomerase to develop a ligase-free technology for the covalent joining of DNA fragments to plasmid vectors. Whereas joining molecules with DNA ligase requires an overnight incubation, topoisomerase-mediated ligation occurs in five minutes. The method is particularly suited to cloning PCR fragments. A linearized vector with single 3' T extensions is activated with the topoisomerase. On addition of the PCR product with 3' A overhangs, ligation is very rapid. In addition, the high substrate specificity of the enzyme means that there is a low rate of formation of vectors without inserts.

4 CLONING VECTORS

In general, genetic engineering is the science which deals with the transfer of DNA between hosts or species by in vitro manipulation using enzymes. This requires the selection and use of a suitable carrier molecule, called a vector, with which the DNA to be transferred is duplicated, and a living system or host in which the vector can be propagated. Most DNA fragments are not capable of self-replication in E. coli or any other host cell. Thus, an additional DNA segment which is capable of autonomous replication must be linked to the fragments which are to be cloned. This autonomously replicating fragment is called a cloning vector. Most cloning vectors are originally derived from naturally occurring extra-chromosomal elements such as bacteriophages and plasmids.

Properties of a Good Vector

A good vector must have the following properties:
1. It should be able to replicate autonomously.
2. It should be easy to isolate and purify.
3. It should be easily introduced into the host cells.
4. It should have suitable marker genes that allow easy detection and/or selection of the transformed host cells.
5. When the objective is gene transfer, it should be able to integrate either itself or the DNA insert it carries into the genome of the host cell.
6. Cells transformed with vector molecules containing the DNA insert (recombinant or chimaeric vector) should be identifiable or selectable from those transformed by the vector molecules only.
7. It should contain unique target sites for as many restriction enzymes as possible, into which the DNA insert can be integrated without disrupting an essential function.
8. When expression of the DNA insert is desired, it should contain at least suitable control elements, e.g., promoter, operator and ribosome binding sites; several other features may also be important.

It should be kept in mind that the DNA molecules used as vectors have coevolved with their specific natural host species, and hence are adapted to function well in them and in closely related species. Therefore, the choice of vector depends largely on the host species into which the DNA insert or gene is to be cloned.

Properties of a Good Host

A good host should have the following features:
(i) It is easy to transform.
(ii) It supports the replication of recombinant DNA.
(iii) It is free from elements that interfere with replication of recombinant DNA.
(iv) It lacks active restriction enzymes, e.g., E. coli K12 substrain HB101.
(v) It does not have methylases, since these enzymes would methylate the replicated recombinant DNA, which would then become resistant to useful restriction enzymes.
(vi) It is deficient in normal recombination function, so that the DNA insert is not altered by recombination events.

PLASMID VECTORS

A plasmid is a DNA molecule that can replicate and be transmitted independently of the host chromosome. Plasmids are extensively used as cloning vehicles. Each plasmid is a replicon that is stably inherited in an extrachromosomal state. Plasmids are by far the most extensively used, versatile and easily manipulated vectors; they are the workhorses of the molecular biology laboratory. They are naturally occurring extrachromosomal DNA molecules, which are circular, double-stranded and supercoiled. Plasmids occur widely in nature and are present in most bacterial species; they are widely distributed throughout the prokaryotes. Their molecular weight ranges from less than 1 × 10⁶ daltons to more than 200 × 10⁶ daltons, and they are generally dispensable. Size may vary from a few thousand base pairs up to several hundred kilobases. Plasmids used as gene cloning vectors are usually small (generally in the range of 2-5 kb). Most of the commonly used ones are based on (or are closely related to) a naturally occurring E. coli plasmid called ColE1. Plasmids to which phenotypic traits have not yet been ascribed are called cryptic plasmids.

Fig. 4.1. The interconversion of supercoiled, relaxed covalently closed circular DNA and open circular DNA. (Labels: supercoiled DNA; DNA gyrase; endonuclease; topoisomerase; DNA ligase; open, circular DNA; relaxed, covalently closed circular DNA.)

Not all plasmids are circular molecules. Linear plasmids are also found in a variety of bacteria, e.g. Streptomyces sp. The ends of linear plasmids must be protected to prevent nuclease digestion, and two general mechanisms are used for this purpose: (i) repeated sequences ending in a terminal DNA hairpin loop (Borrelia); (ii) covalent attachment of a protein to the ends (Streptomyces). The spreading of antibiotic resistance genes is the most notorious property of plasmids.
They are responsible, to a large extent, for the spread of antibiotic resistance. It should be noted, however, that the plasmids used for gene cloning are nearly always unable to spread from one bacterium to another, and there are restrictions on experimental protocols to make sure that these experiments do not add new antibiotic resistance genes to clinically important pathogenic bacteria. Antibiotic resistance is neither the limit of the abilities of plasmids nor the reason for their existence. Interest tends to focus on antibiotic resistance because of its importance in medical microbiology and the ease with which it can be studied. However, many naturally occurring plasmids code for other properties, or even for none at all. Plasmids are divided into two types according to whether they carry transfer genes (tra genes), which promote bacterial conjugation: (i) conjugative plasmids, which have tra genes, and (ii) non-conjugative plasmids, which do not. Plasmids can also be classified on the basis of whether they are maintained as multiple copies per cell (relaxed plasmids) or as a limited number of copies per cell (stringent plasmids). In general, conjugative plasmids have high molecular weight, whereas non-conjugative plasmids are of low molecular weight.
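The molecular weights and sizes quoted above for plasmids can be related by a rough calculation. The sketch below assumes an average mass of about 650 daltons per base pair of double-stranded DNA, a commonly used approximation that is not given in the text; the function and variable names are purely illustrative.

```python
# Rough conversion between plasmid molecular weight and length.
# Assumption (not from the text): one base pair of double-stranded DNA
# has an average mass of about 650 daltons.
AVG_DALTONS_PER_BP = 650.0


def daltons_to_bp(molecular_weight_da: float) -> float:
    """Approximate length in base pairs of a duplex DNA of the given mass."""
    return molecular_weight_da / AVG_DALTONS_PER_BP


small_plasmid_bp = daltons_to_bp(1e6)    # lower end of the range quoted in the text
large_plasmid_bp = daltons_to_bp(200e6)  # upper end of the range quoted in the text

print(f"1 x 10^6 Da   is roughly {small_plasmid_bp / 1000:.1f} kb")
print(f"200 x 10^6 Da is roughly {large_plasmid_bp / 1000:.0f} kb")
```

Under this assumption the quoted molecular weights correspond to roughly 1.5 kb and 300 kb, which is broadly in line with the size range given in base pairs.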


Host Range of Plasmids

Plasmids encode only a few of the proteins needed for their own replication, and in many cases only one of them. All the other proteins required for replication, e.g. DNA polymerases, DNA ligase, helicases, etc., are provided by the host cell. Those replication proteins that are plasmid-encoded are located very close to the ori (origin of replication) sequences on which they act. Thus, only a small region surrounding the ori site is needed for replication. Other parts of the plasmid can be deleted, and foreign sequences can be added to the plasmid, and replication will still occur. This feature of plasmids has greatly simplified the construction of versatile cloning vectors. The host range of a plasmid is determined by its ori region. Plasmids whose ori region is derived from plasmid ColE1 have a limited host range: they replicate only in enteric bacteria, such as E. coli, Salmonella, etc. Other, promiscuous plasmids have a broad host range; these include RP4 and RSF1010. Plasmids of the RP4 type will replicate in most gram-negative bacteria, to which they are readily transmitted by conjugation. Such promiscuous plasmids offer the potential of readily transferring cloned DNA molecules into a variety of genetic backgrounds. Plasmids like RSF1010 are not conjugative but can be transformed into a wide range of gram-negative and gram-positive bacteria, where they are stably maintained. Most of the plasmids isolated from Staphylococcus aureus also have a broad host range and can replicate in many other gram-positive bacteria. Plasmids with a broad host range encode most, if not all, of the proteins required for replication. They must also be able to express these genes, and thus their promoters and ribosome binding sites must have evolved so that they can be recognized in a diversity of bacterial families.
Plasmid replication

The ability of plasmids to replicate is exploited in genetic engineering because it makes it possible to insert pieces of DNA that are then copied as part of the plasmid, and hence passed on to the progeny when the cell divides. The most fundamental property of a plasmid is the ability to replicate in the host bacterium. Most, or all, of the enzymes and other products required for this replication are already present in the host cell; the amount of information that the plasmid has to supply may be only a few hundred base pairs. This region of the plasmid, which is crucial for replication, is generally called the origin of replication, although strictly speaking the site at which replication starts is one specific base. Plasmids that use the origin of replication from ColE1 or its relatives are multi-copy plasmids. Wild-type ColE1 is present at about 15 copies per cell, while most of the engineered vectors used today are present at many hundreds of copies per cell. This is convenient as it makes it easier to purify large amounts of the plasmid, and if we want to express a cloned gene we also get a gene dosage effect: the presence of so many copies of the gene in the cell is reflected in higher levels of the product of that gene. However, this may also be a drawback. The presence of a large amount of plasmid DNA may make the cell grow more slowly, even without expression of the cloned gene. If the gene or its product is in any way harmful to the bacterium, it becomes very difficult to isolate the required clone. For some purposes it is therefore advisable to use alternative vectors that exist at low copy number. Some plasmids can replicate in a wide range of bacterial species (broad host range plasmids), but most of those used for gene cloning are rather more limited in their host range.
In one way, this is useful: if there is any question about a potential health hazard or environmental consequences associated with cloning a specific fragment of DNA, use of a narrow host range plasmid makes it very unlikely that the gene will be transmitted to other organisms. Genetic manipulation may, however, need to be carried out in a bacterium other than E. coli, especially if we want to study the behaviour of specific bacteria. It will then usually be necessary to isolate or develop new vector plasmids, based on a replication origin that is functional in the species selected. The host range of the new vector will probably also be limited, and it may well be unable to replicate in E. coli. This is a disadvantage, because we are likely to want to use E. coli as an intermediate host for the initial cloning and for studying the structure and behaviour of the gene to be cloned. However, it is possible to insert two origins of replication into a plasmid, so that it will be replicated in E. coli using one origin, and in our selected host using the alternative replication origin. This type of vector is called a shuttle plasmid, because it can be transferred between the two species. Shuttle vectors can also be used to transfer cloned genes between E. coli and a eukaryotic organism. Therefore, the first crucial characteristic of a plasmid cloning vector is the origin of replication, usually designated as ori in plasmid maps.

Cloning sites

A cloning site is a unique restriction site, i.e. one at which the enzyme concerned cuts the plasmid once only. A single cut converts the circular molecule into a linear one, and it is relatively simple to join the ends together to re-form a complete circle. If an enzyme cuts more than once, the plasmid will be cut into two or more pieces, and joining them up again to make an intact plasmid will be much more difficult.

Fig. 4.2. Cloning with a plasmid vector: bla = beta-lactamase (ampicillin resistance), selective marker; ori = origin of replication. (Steps shown: cut vector with BamHI; mix with the BamHI fragment to be cloned and ligate; the recombinant plasmid carries the cloned fragment flanked by BamHI sites.)

A basic plasmid used as a cloning vector may be provided with only one or two such unique restriction sites, which must of course be located in a region of the plasmid that is not essential for replication or any other required function. With such a plasmid we are limited not only in our choice of restriction fragments that can be inserted, and in the position of insertion, but also in the number of different fragments that can be inserted. This is because in most cases ligation of two restriction fragments produced with the same enzyme recreates the original restriction site. Thus, when a BamHI fragment is inserted into a site on the vector that has been cut with BamHI, the resulting recombinant plasmid will have two BamHI sites, one at each end of the inserted fragment. This is the basis of a common test for the presence of such an insert. Digesting the supposed recombinant plasmid with, in this case, BamHI will release a DNA fragment that should be the size of the insert that we have tried to clone. However, since the recombinant plasmid now has two BamHI sites, it would be difficult to clone further BamHI fragments into it.

Fig. 4.3. Structure of the plasmid cloning vector pUC18. Restriction sites labelled: SphI, PstI, SalI, XbaI, BamHI, SmaI, KpnI, SstI. bla = beta-lactamase (ampicillin resistance), selective marker; ori = origin of replication; lacZ' = beta-galactosidase (partial gene); lacI = repressor of lac promoter.

This is a nuisance, because in many cases we do want to insert several fragments into the same plasmid. We may wish to combine the expression signals of one gene with the coding region of another, or to insert additional markers that can be used to identify the presence of the plasmid. Or we may, as described above, want to insert the replication origin from another plasmid so as to create a shuttle vector, and still leave sites available for further inserts. The best way to solve this problem is to create a multiple cloning site (MCS), i.e., a small DNA region that contains recognition sites for a number of different enzymes. This is done by synthesizing a short piece of DNA with the required restriction sites, and inserting it into the plasmid in the usual way.

Fig. 4.4. Use of the plasmid cloning vector pUC18. In the recombinant pUC18 plasmid the lacZ' gene has been disrupted by insertion of a DNA fragment, resulting in white colonies on X-gal plates. bla = beta-lactamase (ampicillin resistance), selective marker; ori = origin of replication; lacZ' = beta-galactosidase (partial gene); lacI = repressor of lac promoter.
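The diagnostic digest described in this section (one BamHI site in the vector, two in the recombinant, so that digestion releases the insert) can be sketched in a few lines. The sequences below are invented toy molecules, not pUC18 or any real plasmid, and the helper names are hypothetical; only the BamHI recognition sequence GGATCC is taken as a known fact.

```python
BAMHI = "GGATCC"  # BamHI recognition sequence (the enzyme cuts between the two Gs)


def find_sites(circular_seq, site):
    """0-based start positions of `site` on the top strand of a circular molecule."""
    n = len(circular_seq)
    # Append the first few bases so that a site spanning the origin is found too.
    wrapped = circular_seq + circular_seq[:len(site) - 1]
    return [i for i in range(n) if wrapped.startswith(site, i)]


def digest_circular(circular_seq, site):
    """Fragment lengths (largest first) after complete digestion of a circle."""
    n = len(circular_seq)
    cuts = find_sites(circular_seq, site)
    if not cuts:
        return []        # no site: the circle stays intact
    if len(cuts) == 1:
        return [n]       # unique site: the circle is simply linearized
    lengths = [(cuts[(k + 1) % len(cuts)] - cuts[k]) % n for k in range(len(cuts))]
    return sorted(lengths, reverse=True)


# Toy vector (96 bp) with a single BamHI site.
vector = "ATGCATGCAA" * 5 + BAMHI + "TTACGTTACG" * 4

# Toy BamHI fragment (36 bp), written as the top strand between two cut points.
bamhi_fragment = "GATCC" + "ACGTACGTAC" * 3 + "G"

# Open the vector at its BamHI cut point and ligate the fragment in.
cut = vector.index(BAMHI) + 1
recombinant = vector[:cut] + bamhi_fragment + vector[cut:]

print(digest_circular(vector, BAMHI))       # [96]
print(digest_circular(recombinant, BAMHI))  # [96, 36]
```

The vector gives a single linear band, whereas the recombinant gives the vector backbone plus a fragment of the insert's size, which is the test for an insert described above. The sketch searches one strand only, which is sufficient here because GGATCC is palindromic.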

Selectable markers

So we have a plasmid with a replication origin and one or more restriction sites. One further feature is essential for a functionally useful vector, and that is a selectable marker. The need for this arises from the inefficiency both of ligation and of bacterial transformation. Even with the high-efficiency systems that are now available for E. coli, the best yield available, using native plasmid DNA, means that only about one per cent of the bacterial cells actually take up the DNA. In practice the yields are likely to be lower than this, and if we are using a host other than E. coli, many orders of magnitude lower. In order to recover the transformed clones, it is necessary to prevent the non-transformed cells from growing. The presence of an antibiotic resistance gene on the plasmid vector means that the transformation mix can simply be plated out on an agar plate containing the relevant antibiotic, and only the transformants will be able to grow.

Insertional inactivation

In pUC18, the multiple cloning site is located near the 5' end of a β-galactosidase gene (lacZ). The synthetic oligonucleotide that creates the multiple cloning site was designed so that it does not affect the reading frame of the lacZ gene; it merely results in the production of a β-galactosidase with some additional amino acids near the amino terminus of the protein. This does not affect the function of the enzyme; it is still able to hydrolyse lactose. More accurately, we should say that pUC18 carries a part of the lacZ gene; we use E. coli strains that carry the remainder of the gene. The product of the host gene is unable to hydrolyse lactose by itself, and so the host strain without the plasmid is Lac-, i.e., it does not ferment lactose. When pUC18 is introduced into the host, the plasmid-encoded polypeptide associates with the host product to form a functional enzyme. We say that pUC18 is capable of complementing the host defect in lacZ. We can easily detect the activity of the β-galactosidase by plating the organism onto agar containing the chromogenic substrate 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (universally known as X-gal), together with the inducing agent isopropyl thiogalactoside (IPTG). The X-gal substrate is colourless, but the action of β-galactosidase releases the dye moiety, producing a deep blue colour. Colonies carrying pUC18 are therefore blue when grown on this medium. However, if we successfully insert a DNA fragment at the cloning site, the gene will (normally) be disrupted and the resulting E. coli colonies will be 'white'. The advantage of this insertional inactivation is that we can tell not only that the cells have been transformed with the plasmid, but also that the plasmid is a recombinant, and not merely the original pUC18 self-ligated.

An insertional inactivation marker such as this is not an essential feature of a cloning vector, but it does provide a useful way of monitoring the success of the ligation strategy and overcoming some of the problems. It is not infallible, however. If the insert is relatively small, and if it happens to consist of a multiple of three bases, transcription and translation of the lacZ gene may still occur and the enzyme may still have enough activity (despite the addition of still more amino acids at the N terminus) to produce a detectable blue colour. Conversely, a white colony is not a guarantee of cloning success, as the deletion of even a single base at the cloning site, or the insertion of undesirable junk fragments (in other than multiples of three bases), will put the lacZ gene in the wrong reading frame and thus inactivate it.

The advantage of plasmid vectors is that they are small, simple, universal and easy to manipulate. Plasmid vectors can be produced and used for a wide range of organisms without extensive knowledge of the molecular biology of the host or the vector. On the other hand, the basic plasmid vectors that we have considered so far are limited in their cloning capacity, i.e., the size of the insert they can accommodate. We will look later at other vectors that will accommodate larger inserts, but first we need to consider the ways in which we can introduce the recombinant plasmids into host bacterial cells.
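The reading-frame caveat for blue-white screening described above can be illustrated with a small sketch. It is a deliberate simplification: it assumes the insert begins at a codon boundary and is read in the orientation given, it additionally checks for in-frame stop codons (a point the text does not mention), and the function names are hypothetical. Real colony colour also depends on how much enzyme activity survives, which the code does not model.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def insert_keeps_frame(insert_seq: str) -> bool:
    """True if the insert could leave the downstream lacZ codons in frame.

    Assumptions: the insert starts at a codon boundary and is read in the
    orientation supplied.
    """
    seq = insert_seq.upper()
    if len(seq) % 3 != 0:
        return False  # a frameshift inactivates the gene
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return not any(codon in STOP_CODONS for codon in codons)


def expected_colony_colour(insert_seq: str) -> str:
    """Very rough prediction of colony colour on X-gal plus IPTG plates."""
    if not insert_seq:
        return "blue"           # vector without insert
    if insert_keeps_frame(insert_seq):
        return "possibly blue"  # small in-frame inserts may leave some activity
    return "white"


for candidate in ["", "GCTGCTGCT", "GCTGC", "GCTTAAGCT"]:
    print(repr(candidate), "->", expected_colony_colour(candidate))
```

The point of the sketch is only that a multiple-of-three insert without a stop codon does not necessarily give a white colony, so colour alone is not proof either way.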
Bacterial Transformation

The original bacterial transformation experiments relied on the natural ability of the pneumococcus to take up 'naked' DNA from its surroundings. This ability is called competence. Although a number of bacterial species develop competence naturally, natural competence is still too limited in scope to be of much use for genetic engineering. E. coli does not seem to show natural competence. It was therefore necessary to develop alternative ways of introducing plasmid DNA into bacterial cells. Although these methods are radically different, they are still referred to as transformation, which is defined as the uptake of naked DNA, in order to differentiate it from other methods of horizontal gene transfer, namely conjugation (direct transfer by cell-to-cell contact) and transduction (which is mediated by bacteriophage infection). Competence can be induced in E. coli cells by washing them with ice-cold calcium chloride, followed by adding the plasmid DNA and subjecting the mixture to a brief, mild heat shock (2 min at 42°C). Before plating the cells onto a selective medium containing the appropriate antibiotic, it is necessary to dilute them in growth medium, to allow the bacteria to recover and to express the resistance marker introduced on the plasmid. Although this procedure was a major step forward, it was very inefficient. The transformation process has been gradually improved over the years, both by modifying the preparation of competent cells (e.g. using salts other than calcium chloride) and by selecting E. coli strains with mutations that make them easier to transform. It is now possible to obtain transformation frequencies in excess of 10⁹ transformants per µg of plasmid DNA. Where such high yields are required, it is cost-effective to purchase pre-prepared competent cells of a strain with high transformation efficiency.
It should be noted that although transformation frequencies are generally quoted in these terms (number of transformants per µg of DNA), there is no linear relationship between the number of transformants and the amount of DNA used. Transformation works best with low levels of DNA, and the efficiency with which bacterial cells take up DNA falls off as the concentration of DNA is increased. If the amount of DNA is increased too much, not just the efficiency but the actual number of transformants may decrease. This creates a conflict: ligation works best with high concentrations of DNA, whereas the following step, transformation, works best with small amounts of DNA. The resolution is clear, although unpalatable: use only a small proportion of your ligation mix in the transformation step. If you really need very large numbers of transformants, scaling up the transformation step does not work very well; it is usually much better to carry out several separate small-scale transformations.

Transformation that depends upon induced competence and heat shock can be used for bacterial species other than E. coli, but all the advantages that have been obtained by optimizing transformation conditions for selected strains of E. coli are then lost. Transformation is therefore likely to be very inefficient at best, and laboratories interested in manipulating other bacterial species have had to develop alternative methods of transformation.
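Because transformation frequencies are quoted as transformants per µg of DNA, it may help to see how such a figure is obtained from a plate count. The following is a minimal sketch; the colony count, DNA amount and plated fraction are invented numbers for illustration, and the function name is hypothetical.

```python
def transformation_efficiency(colonies: int, dna_ng: float, fraction_plated: float) -> float:
    """Transformants per microgram of DNA.

    colonies        -- colonies counted on the selective plate
    dna_ng          -- total DNA added to the competent cells, in nanograms
    fraction_plated -- fraction of the final cell suspension spread on the plate
    """
    if dna_ng <= 0 or not 0 < fraction_plated <= 1:
        raise ValueError("DNA amount must be positive and fraction between 0 and 1")
    dna_plated_ug = (dna_ng / 1000.0) * fraction_plated  # micrograms represented on the plate
    return colonies / dna_plated_ug


# Invented example: 0.1 ng of plasmid, one tenth of the cells plated, 200 colonies.
example = transformation_efficiency(colonies=200, dna_ng=0.1, fraction_plated=0.1)
print(f"{example:.1e} transformants per microgram")
```

As the text stresses, this figure is a normalized rate measured at low DNA input; it cannot be multiplied up to predict the number of colonies obtained with a full microgram of DNA.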

Electroporation

Electroporation is the most versatile transformation procedure. Bacterial cells, washed with water to remove electrolytes from the growth medium, are mixed with DNA and given a brief pulse of high-voltage electricity. This produces temporary holes in the cell envelope through which the DNA can enter. The cells are then diluted into a recovery medium before plating on a selective medium in the same way as above. Although it is comparatively easy to obtain some transformants with a wide range of bacteria, several parameters need to be adjusted to obtain optimum performance, for example the conditions under which the cells are grown, the temperature of the suspension, and the duration and voltage of the electric pulse. The effect is neither specific to DNA nor directional, so material in the cell can also diffuse out, and the procedure can be used for isolating plasmid DNA from bacterial cells. It follows that, since the plasmid that comes out of one cell can enter another, electroporation can be used to transfer plasmids from one strain to another simply by applying it to a mixture of the two strains.

Incompatibility of Plasmids

Plasmid incompatibility is the inability of two different plasmids to coexist in the same cell in the absence of selection pressure. The term incompatibility can be used only when it is certain that entry of the second plasmid has taken place and that DNA restriction is not involved. Plasmids will be incompatible if they have the same mechanism of replication control. Not surprisingly, by changing the sequence of the RNA I/RNA II region of plasmids with antisense control of copy number, it is possible to change their incompatibility group. Alternatively, plasmids will be incompatible if they share the same par region.

PURIFICATION OF PLASMID DNA

For cloning in plasmids, the first step is the purification of the plasmid DNA. The trickiest stage in the purification of plasmid DNA is the lysis of the host cells; both incomplete lysis and total dissolution of the cells result in greatly reduced yields of plasmid DNA. The ideal situation occurs when each cell is just sufficiently broken to permit the plasmid DNA to escape without too much contaminating chromosomal DNA. After gentle lysis, most of the chromosomal DNA released will be of high molecular weight and can be removed, along with cell debris, by high-speed centrifugation to yield a cleared lysate. The production of satisfactory cleared lysates from bacteria other than E. coli, particularly if large plasmids are to be isolated, frequently requires a combination of skill, luck and patience.

Of the several methods available for isolating pure plasmid DNA from cleared lysates, only two will be described here. The first is the 'classical' method, attributed to Vinograd. It involves isopycnic centrifugation of cleared lysates in a solution of CsCl containing ethidium bromide (EtBr). EtBr binds by intercalating between the DNA base pairs, and in so doing causes the DNA to unwind. A covalently closed circular (CCC) DNA molecule, such as a plasmid, has no free ends and can unwind only to a limited extent, thus limiting the amount of EtBr bound. A linear DNA molecule, such as fragmented chromosomal DNA, has no such topological constraints and can therefore bind more EtBr molecules. Because the density of the DNA-EtBr complex decreases as more EtBr is bound, and because more EtBr can be bound to a linear molecule than to a covalent circle, the covalent circle has a higher density at saturating concentrations of EtBr. Thus covalent circles (i.e. plasmids) can be separated from linear chromosomal DNA.

The second method, generally regarded as the best for extracting and purifying plasmid DNA, is that of Birnboim and Doly (1979). This method makes use of the observation that there is a narrow range of pH (12.0-12.5) within which denaturation of linear DNA, but not of covalently closed circular DNA, occurs. Plasmid-containing cells are treated with lysozyme to weaken the cell wall and then lysed with sodium hydroxide and sodium dodecyl sulphate (SDS). Chromosomal DNA remains in a high-molecular-weight form but is denatured. When the lysate is neutralized with acidic sodium acetate, the chromosomal DNA renatures and aggregates to form an insoluble network. Simultaneously, the high concentration of sodium acetate causes precipitation of protein-SDS complexes and of high-molecular-weight RNA. If the pH of the alkaline denaturation step has been carefully controlled, the CCC plasmid DNA molecules will remain in a native state and in solution, while the contaminating macromolecules co-precipitate. The precipitate can be removed by centrifugation and the plasmid concentrated by ethanol precipitation. The plasmid DNA can be purified further by gel filtration if required.

Various kits are commercially available to improve the yield and purity of plasmid DNA. All of them take advantage of the benefits of alkaline lysis and have as their starting point the cleared lysate. The plasmid DNA is selectively bound to an ion-exchange material, pre-packed in columns or tubes, in the presence of a chaotropic agent (e.g. guanidinium hydrochloride). After the contaminants have been washed away, the purified plasmid is eluted in a small volume.

Various factors affecting the yield of plasmids have been studied.
The first of these is the actual copy number inside the cells at the time of harvest. The copy-number control systems described earlier are not the only factors affecting yield: the copy number is also affected by the growth medium, the stage of growth and the genotype of the host cell. The second, and most important, factor is the care taken in making the cleared lysate. Unfortunately, the commercially available kits have not removed the vagaries of this procedure. Finally, the presence in the host cell of a wild-type endA gene can affect the recovery of plasmid. The product of the endA gene is endonuclease I, a periplasmic protein whose substrate is double-stranded DNA. The function of endonuclease I is not fully understood, and strains bearing endA mutations have no obvious phenotype other than improved stability and yield of the plasmid obtained from them.

Although most cloning vehicles are of low molecular weight, it is sometimes essential to use the much larger conjugative plasmids. These high-molecular-weight plasmids can be isolated by the methods just described, but the yields are often very low. Either the plasmids are not released efficiently from the cells because of their size, or they are physically destroyed by shear forces during the various manipulative steps. A number of alternative procedures have been described, many of which are based on that of Eckhardt (1978). Bacteria are suspended in a mixture of Ficoll and lysozyme, which weakens the cell walls. The samples are then placed in the slots of an agarose gel, where the cells are lysed by the addition of detergent. Following electrophoresis, the plasmids are extracted from the gel. Extraction is facilitated by the use of agarose that melts at low temperature.

DESIRABLE PROPERTIES OF PLASMID CLONING VEHICLES

A good cloning vehicle should have the following three properties: (a) low molecular weight; (b) the ability to confer readily selectable phenotypic traits on host cells; and (c) single sites for a large number of restriction endonucleases, preferably in genes with a readily scorable phenotype.


There are several advantages to having a low molecular weight. First, the plasmid is much easier to handle, i.e., it is more resistant to damage by shearing, and is readily isolated from host cells. Secondly, low-molecular-weight plasmids are usually present as multiple copies, and this not only facilitates their isolation but also leads to gene dosage effects for all cloned genes. Finally, with a low molecular weight there is less chance that the vector will have multiple substrate sites for any restriction endonuclease.

After a piece of foreign DNA has been inserted into a vector, the resulting chimeric molecules have to be transformed into a suitable recipient. Since the efficiency of transformation is low, it is essential that the chimeras have some readily scorable phenotype. This generally results from some gene carried on the vector, but could also be produced by a gene carried on the inserted DNA.

The first step in cloning is to cleave both the vector DNA and the DNA to be inserted with the same endonuclease. If the vector has more than one site for the endonuclease, more than one fragment will be produced. When the two samples of cleaved DNA are subsequently mixed and ligated, the resulting chimeras will, in all probability, lack one of the vector fragments. It is advantageous if insertion of foreign DNA at endonuclease-sensitive sites inactivates a gene whose phenotype is readily scorable, for in this way it is possible to distinguish chimeras from cleaved plasmid molecules that have self-annealed. If the vector and insert are to be joined by the homopolymer tailing method, or if the insert confers a new phenotype on host cells, readily detectable insertional inactivation is not required.

pBR322, a Purpose-built Cloning Vehicle

In early cloning experiments, natural plasmids such as ColE1 and pSC101 were used as cloning vectors. These plasmids are small and have single sites for the common restriction endonucleases, but they have limited genetic markers for selecting transformants. Considerable effort was therefore devoted to constructing better cloning vectors in vitro. The best, and most widely used, of these early purpose-built vectors is pBR322. Plasmid pBR322 contains the ApR and TcR genes of RSF2124 and pSC101, respectively, combined with replication elements of pMB1, a ColE1-like plasmid.

Fig. 4.5. The origins of plasmid pBR322. (a) The boundaries between the pSC101-, pMB1- and RSF2124-derived material. The numbers indicate the positions of the junctions in base pairs from the unique EcoRI site. (b) The molecular origins of plasmid pBR322.


Plasmid pBR322 has been completely sequenced. The original published sequence was 4362 bp long. Position 0 of the sequence was arbitrarily set between the A and T residues of the EcoRI recognition sequence (GAATTC). The sequence was later revised by the inclusion of an additional CG base pair at position 526, increasing the size of the plasmid to 4363 bp. Watson (1988) revised the size yet again, this time to 4361 bp, by removing base pairs at coordinates 1893 and 1915. The most useful aspect of the DNA sequence is that it totally characterizes pBR322 in terms of its restriction sites, so that the exact length of every fragment can be calculated. These fragments can serve as DNA markers for sizing any other DNA fragment in the range from several base pairs up to the entire length of the plasmid.

There are more than 40 enzymes with unique cleavage sites on the pBR322 genome. The target sites of 11 of these enzymes lie within the TcR gene, and there are sites for a further two within the promoter of that gene. There are unique sites for six enzymes within the ApR gene. Thus, cloning in pBR322 with the aid of any one of those 19 enzymes will result in insertional inactivation of either the ApR or the TcR marker. Cloning in the other unique sites, however, does not permit the easy selection of recombinants, because neither of the antibiotic resistance determinants is inactivated.

Following manipulation in vitro, E. coli cells transformed with plasmids carrying inserts in the TcR gene can be distinguished from cells transformed with recircularized vector. The former are ApR and TcS, whereas the latter are both ApR and TcR. In practice, transformants are selected on the basis of their Ap resistance and then replica-plated on to Tc-containing media to identify those that are TcS. Cells transformed with pBR322 derivatives carrying inserts in the ApR gene can be identified more readily.
Detection is done on the basis of the ability of the ~-Iactamase produced by ApR cells to convert penicillin to penicilloic acid, which in turn binds iodine. Transformants are selected on rich medium containing soluble starch and Tc. When colonized plates are flooded with an indicator solution of iodine and penicillin, ~-lactamase-producing (ApR) colonies clear the indicator solution whereas APS colonies do not. In the ApR gene, the PstI site is particularly useful, because the 3' tetranucleotide extensions formed on digestion are ideal substrates for terminal transferase. Thus, for cloning by the homopolymer tailing, this site is excellent. The PstI site is regenerated and the insert maybe cut out with that enzyme if oligo(dG.dC} tailing is used. Plasmid pBR322 has been a widely used cloning vehicle and also, it has been widely used as a model system for the study of prokaryotic transcription and translation, as well as investigation of the effects of topological changes on DNA conformation. The popularity of pBR322 is a direct result of the availability of an extensive body of information on its structure and function. This in turn is increased with each new study.
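The statement that a complete sequence lets the exact length of every restriction fragment be calculated can be illustrated with a minimal sketch. The function name and the cut coordinates below are hypothetical, chosen only for illustration; only the 4361 bp plasmid size comes from the text.

```python
def circular_fragments(plasmid_length, cut_positions):
    """Return fragment lengths (bp), largest first, from cutting a circular molecule."""
    # Normalise coordinates onto the circle and drop duplicates.
    cuts = sorted(set(p % plasmid_length for p in cut_positions))
    if not cuts:
        return [plasmid_length]  # an uncut circle stays one molecule
    fragments = []
    for i, start in enumerate(cuts):
        end = cuts[(i + 1) % len(cuts)]
        # Distance to the next cut, wrapping round position 0.
        # A single cut gives 0 here, which means the full linearised length.
        fragments.append((end - start) % plasmid_length or plasmid_length)
    return sorted(fragments, reverse=True)


PBR322_LENGTH = 4361  # bp, the revised size quoted in the text
hypothetical_cuts = [0, 1000, 2500]  # illustrative coordinates, not real sites

print(circular_fragments(PBR322_LENGTH, hypothetical_cuts))
```

The fragment lengths always sum to the plasmid length, which is a convenient check when the fragments are used as size markers.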

Example of the use of plasmid pBR322 as a vector: Isolation of DNA fragments which carry promoters

Cloning into the HindIII site of pBR322 generally results in loss of tetracycline resistance. However, in some recombinants, TcR is retained or even enhanced. This is because the HindIII site lies within the promoter rather than the coding sequence. Thus, whether or not insertional inactivation occurs depends on whether the cloned DNA carries a promoter-like sequence able to initiate transcription of the TcR gene. Widera has used this technique to search for promoter-containing fragments. Within E. coli promoters, four structural domains can be recognized: (a) position 1, the purine initiation nucleotide from which RNA synthesis begins; (b) positions -6 to -12, the Pribnow box; (c) the region around base pair -35; and (d) the sequence between base pairs -12 and -35.

[Restriction map of a pUC plasmid, showing the positions of restriction sites around the plasmid and the polylinker cloning sites at positions 396-454.]

Fig. 4.6. Genetic maps of some pUC plasmids.

Improved vectors derived from pBR322

Numerous derivatives of pBR322 have been produced over the years, most of which serve special-purpose cloning needs. Balbas et al. (1986) compiled the properties of some of these plasmids. Much of the early work on the improvement of pBR322 focused on the insertion of additional unique restriction sites and selectable markers; e.g., pBR325 encodes chloramphenicol resistance in addition to ampicillin and tetracycline resistance and has a unique EcoRI site in the CmR gene. Initially, each new vector was constructed in a series of steps analogous to those used in the generation of pBR322 itself. The construction of improved vectors was then simplified by the use of polylinkers or multiple cloning sites (MCS), as exemplified by the pUC vectors. An MCS is a short DNA sequence carrying sites for many different restriction endonucleases; in pUC19, which is about 2.7 kb in total, the MCS occupies only a few tens of base pairs. An MCS increases the number of potential cloning strategies by extending the range of enzymes that can be used to generate a restriction fragment suitable for cloning. Because the sites are combined within an MCS, they are contiguous, so that any two sites within it can be cleaved simultaneously without excising vector sequences.

The pUC vectors also incorporate a DNA sequence that permits rapid visual detection of an insert. The MCS is inserted into the lacZ' sequence, which encodes the promoter and the α-peptide of β-galactosidase. The insertion of the MCS into the lacZ' fragment does not affect the ability of the α-peptide to mediate complementation, but cloning DNA fragments into the MCS does. Therefore, recombinants can be detected by blue/white screening on growth medium containing X-gal. The usual site for insertion of the MCS lies between the initiator ATG codon and codon 7, an area that encodes a functionally non-essential part of the α-complementation peptide. Slilaty and Lebel (1998) have reported that blue/white colour selection can be variable. They observed that reliable inactivation of complementation occurs only when the insert is made between codons 11 and 36.
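The idea of an MCS as a short stretch carrying single sites for several enzymes can be sketched as follows. The polylinker-like sequence and the helper names are assumptions written out for illustration and are not taken from the text; the four recognition sequences are the standard ones for these enzymes.

```python
# Standard recognition sequences for four commonly used enzymes.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "PstI": "CTGCAG",
    "HindIII": "AAGCTT",
}

# Illustrative polylinker-like sequence (an assumption, for demonstration only).
EXAMPLE_MCS = "GAATTCGAGCTCGGTACCCGGGGATCCTCTAGAGTCGACCTGCAGGCATGCAAGCTT"


def find_sites(sequence, site):
    """Return every 0-based start position of `site`, allowing overlaps."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def unique_cutters(sequence, enzymes=RECOGNITION_SITES):
    """Map each enzyme that cuts exactly once to the position of its site."""
    seq = sequence.upper()
    result = {}
    for name, site in enzymes.items():
        hits = find_sites(seq, site)
        if len(hits) == 1:
            result[name] = hits[0]
    return result


print(unique_cutters(EXAMPLE_MCS))
```

Enzymes that cut more than once, or not at all, are simply left out of the result, which mirrors why only unique sites are useful for cloning.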

VECTORS BASED ON THE LAMBDA BACTERIOPHAGE

Lambda Biology

Plasmid vectors give their best results when relatively small fragments of DNA are cloned. Although there is probably no fixed limit to the size of a DNA fragment that can be inserted into a plasmid, the recombinant plasmid may become less stable with larger DNA inserts, the efficiency of transformation is decreased, and the plasmid gives a much lower yield when grown and purified in E. coli. Vectors based on bacteriophage lambda allow efficient cloning of larger fragments, which is important in constructing gene libraries: the larger the inserts, the fewer clones you have to screen to find the one you want. Lambda vectors are beneficial in library construction for a second reason. It is much easier to screen large libraries with bacteriophage vectors, because the results obtained with bacteriophage plaques are much cleaner than those obtained with bacterial colonies. Some knowledge of the basic biology of bacteriophage lambda is necessary for understanding the nature and use of lambda cloning vectors.

Lysogeny

Lambda is a temperate bacteriophage, i.e., on infection of E. coli, it may enter a productive lytic cycle, resulting in lysis of the cell and liberation of a number of phage particles, or it may enter a more or less stable relationship with the host known as lysogeny. In the lysogenic state, expression of almost all the phage genes is switched off by the action of a phage-encoded repressor protein, the product of the cI gene. During the establishment of lysogeny, the expression of this gene requires two other genes, cII and cIII. The proportion of infected cells going down each route is influenced by environmental conditions, as well as by the genetic make-up of the phage and the host.
Some phage mutants will only produce lytic infection, and these give rise to clear plaques, while the wild-type phage produces turbid plaques due to the presence of lysogens, which are resistant to further attack by lambda phage (known as superinfection immunity). On the other hand, some bacterial host strains carrying a mutation known as hfl (high frequency of lysogenization) form a much higher proportion of lysogens when infected with wild-type lambda, which can be useful if we want a more stably altered host strain, for example, if we are studying the expression of genes carried by the phage. When lambda vectors are used, however, we are usually concerned with recombinant phage carrying the cloned genes, and in such cases the lytic cycle is the more relevant one.

Fig. 4.7. Lytic cycle and lysogeny.

Although the lysogenic state is relatively stable, that stability is not absolute. Due to a low level of spontaneous failure of the repression mechanism, a culture of a bacterial lysogen will normally contain phage particles in the supernatant. This rate of breakdown of repression can be increased by treating the culture with agents that damage the DNA, such as UV irradiation; the DNA damage induces the production of repair enzymes which, amongst other things, destroy the cI repressor protein, allowing initiation of the lytic cycle. Some widely used lambda vectors carry a mutation in the cI gene (the cI857 mutation) which makes the repressor protein temperature-sensitive. A strain carrying such a mutant phage can be grown as a lysogen at a reduced temperature, and the lytic cycle can then be induced by raising the temperature, which inactivates the repressor protein.

In the lysogenic state, lambda is normally integrated into the bacterial chromosome, by site-specific recombination at a specific position, and is therefore replicated as part of the bacterial DNA. Induction of the lysogen requires excision from the chromosome. However, this integration, although common amongst temperate phages, is not an essential feature of lysogeny. Lambda can continue to replicate in an extrachromosomal, plasmid-like state; with some bacteriophages (including P1, which we will encounter later in this chapter), this is the normal mode of replication in lysogeny.

Particles of wild-type bacteriophage lambda have a double-stranded linear DNA genome of about 48.5 kb, in which the 12 bases at each end are unpaired but complementary. These ends are therefore 'sticky' or 'cohesive', much like the ends of many restriction fragments, but the greater length of these sticky ends makes the pairing much more stable, even at 37°C. The ends can be separated by heating lambda DNA, and if it is then cooled rapidly, we will get linear monomeric lambda DNA. The ends of the molecule move slowly at low temperature, so re-annealing of the sticky ends takes a long time. Eventually, however, the molecule will resume a circular (although not covalently joined) structure.

Fig. 4.8. Cohesive ends of lambda DNA. Annealing of the cohesive ends produces a circular structure.

When lambda infects a bacterial cell and injects its DNA into the cell, the DNA will therefore form a circular structure, with the nicks being repaired in vivo by bacterial DNA ligase. At about this time, a complex series of events occurs that affects subsequent gene expression, determining whether the phage enters the lytic cycle or establishes lysogeny. It is not necessary to consider the details of the lytic-lysogenic decision here, except to emphasize that it is essentially irreversible, so that once started on one or the other route, the phage is committed to that process. However, the events in the lytic cycle should be considered.
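The requirement that the two 12-base single-stranded ends be complementary can be checked with a short sketch. The overhang sequence used here is an assumed 12-base example for illustration, and the function names are hypothetical; both overhangs are written 5' to 3'.

```python
# Translation table giving the complementary base for each DNA base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def ends_can_anneal(left_overhang, right_overhang):
    """True if the two single-stranded overhangs (each 5'->3') can base-pair."""
    return reverse_complement(left_overhang) == right_overhang.upper()


LEFT_END = "GGGCGGCGACCT"                 # assumed 12-base overhang, illustration only
RIGHT_END = reverse_complement(LEFT_END)  # the overhang that would pair with it

print(RIGHT_END, ends_can_anneal(LEFT_END, RIGHT_END))
```

Two copies of the same overhang do not pair with each other, which is why the molecule anneals end to end (or into a circle) rather than head to head.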

Lytic cycle

In the lytic cycle, this circular DNA structure is initially replicated in a plasmid-like manner (theta replication) to produce more circular DNA. Eventually, however, replication switches to an alternative mode (rolling-circle replication), which produces a long linear DNA molecule containing a large number of copies of the lambda genome joined end to end in a continuous structure. While all this is going on, the genes carried by the phage are being expressed to produce the components of the phage particle. These proteins are first assembled into two separate structures: an empty precursor of the head, into which the DNA will be inserted, and the tail, which will be joined to the head after the DNA has been packaged.

The packaging process involves enzymes that recognize specific sites on the multiple-length DNA molecule generated by rolling-circle replication and make asymmetric cuts in the DNA at these positions. These staggered breaks in the DNA give rise to the cohesive ends seen in the mature phage DNA; the sites are known as cohesive end sites (cos sites). Accompanying these cleavages, the region of DNA between two cos sites, representing a unit length of the lambda genome, is wound tightly into the phage head. This process is called packaging. Following successful packaging of the DNA into the phage head, the tail is added to produce the mature phage particle, which is eventually released when the cell lyses.

Fig. 4.9. Replication of bacteriophage lambda DNA. (Stages shown: linear DNA, with sticky ends, in the phage particle; injection into the cell followed by circularization; DNA nicks sealed in vivo by ligase; rolling-circle replication; the length of DNA packaged is that between two cos sites.)

Lysis of the bacterial cell is largely brought about by a phage-encoded protein, the product of gene S. Mutations in this gene can cause a delay or failure of lysis, which can be advantageous in increasing the yield of bacteriophage, as replication of the phage will continue for longer instead of being interrupted by lysis of the host cell. Many lambda vectors have such a mutation.

One of the most important features of this process is that the length of DNA packaged into the phage head is determined by the distance between two cos sites. If we insert a piece of DNA into our lambda vector, this distance is increased, and so the amount of DNA to be packaged will be greater. However, the head is a fixed size, and can only accommodate a certain amount of DNA. It can take somewhat more than is present in wild-type lambda (up to about 51 kb altogether, which is about 5 per cent more than wild-type). As one of the reasons for using lambda is to be able to clone large pieces of DNA, this would be a serious limitation. The way around it is to delete some of the DNA that is normally present. This is possible because the lambda genome contains a number of genes that are not absolutely necessary, especially if we require only lytic growth, when we can delete any genes that are solely required for the establishment of lysogeny. However, we cannot delete too much. The stability of the phage head requires a certain amount of DNA, so even though there are more genes that are not required, we cannot delete all that DNA. There has to be a minimum of 37 kb of DNA (about 75 per cent of wild-type) between the two cos sites that are cleaved to produce viable phage. The existence of these packaging limits is a very important consideration in the design and application of lambda vectors, and also of cosmids, which we will discuss later.
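The packaging limits described above amount to a simple window test on the distance between two cos sites. The following is a minimal sketch using the 37 kb and 51 kb figures from the text; the function and constant names are hypothetical.

```python
MIN_PACKAGE_KB = 37  # lower packaging limit quoted in the text
MAX_PACKAGE_KB = 51  # upper packaging limit quoted in the text


def packaging_outcome(cos_to_cos_kb):
    """Classify a cos-to-cos distance (kb) against the packaging window."""
    if cos_to_cos_kb < MIN_PACKAGE_KB:
        return "too small: no viable phage"
    if cos_to_cos_kb > MAX_PACKAGE_KB:
        return "too large: will not fit in the head"
    return "packageable"


for size_kb in (29, 37, 48.5, 51, 55):
    print(size_kb, "kb ->", packaging_outcome(size_kb))
```

Both limits are treated as inclusive here, which is an assumption; the text gives them only as approximate values.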

In vitro Packaging

Naked bacteriophage DNA can be introduced into a host bacterial cell by transformation, in much the same way as described for a plasmid (with phage DNA, this is called transfection). The big difference is that in this case, instead of plating on a selective agar and counting bacterial colonies, we mix the transfection mix with a culture of a phage-sensitive indicator bacterium in molten soft agar and look for plaques (zones of clearing due to lysis of the bacteria) when this is overlaid onto an agar plate. Note that there is no need for an antibiotic resistance gene as a selective marker. Transfection is inefficient in comparison to plasmid transformation because of the large size of the bacteriophage DNA molecule. It is therefore not suitable for the production of gene libraries, which is the principal application of lambda vectors.

However, there is a more efficient alternative. In an appropriate bacterial host strain, some mutant lambda phages will produce empty phage heads (as they lack a protein needed for packaging the DNA), while others are defective in the production of the head, but contain the proteins needed for packaging. The two extracts are thus complementary to one another. Use of the mixture allows productive packaging of added DNA, which occurs very effectively in vitro (including the addition of the tails). The resulting phage particles can then be assayed by adding a sensitive bacterial culture and plating as an overlay, as above. In vitro packaging of lambda DNA is the method that is almost always used, and it is much more effective than transfection.

Fig. 4.10. In vitro packaging of concatemerized phage λ DNA in mixed lysate. (Panels: an induced λ lysogen of E. coli BHB2690, carrying a D amber mutation, produces λ tails, pre-heads including protein E, and assembly proteins, but DNA packaging is blocked by lack of protein D; an induced λ lysogen of E. coli BHB2688, carrying an E amber mutation, produces λ tails, protein D and assembly proteins, but no protein E and hence no pre-heads. The cells are lysed and mixed, ATP and concatemerized λ DNA are added, and mature phage particles are formed.)

One feature of this system that is markedly different from working with plasmid vectors is that the packaging reaction is most efficient with multiple-length DNA. The enzyme involved in packaging the DNA normally cuts the DNA at two different cos sites on a multiple-length molecule; monomeric circular molecules with a single cos site are packaged very poorly. So, whereas with plasmid vectors the ideal ligation product is a monomeric circular plasmid consisting of one copy of the vector plus insert, for lambda vectors it is advantageous to adjust the ligation conditions so that multiple end-to-end ligation of lambda molecules together with the insert fragments is obtained. The stickiness of the ends of the linear lambda DNA means this happens very readily.

Insertion Vectors

Insertion vectors are the simplest form of lambda vector. In concept, they are similar to a plasmid vector, in that they contain a single cloning site into which DNA can be inserted. However, wild-type lambda DNA contains many sites for most of the commonly used restriction enzymes; you cannot just cut it with, say, HindIII and ligate it with your insert DNA. In normal lambda DNA, HindIII has seven sites, and so will cut it into eight pieces. (Note the difference between a circular DNA molecule such as a plasmid and a linear molecule like lambda: cut a circular DNA molecule once and you still have one fragment; cut linear DNA once and you have two fragments.) It would be almost impossible to join all these fragments (and your insert) together in the right order. To circumvent this, all lambda vectors have been genetically manipulated to remove unwanted restriction sites. In some cases, this has been done by deleting regions of DNA carrying these sites (and hence also increasing the cloning capacity). Another method is to select mutants with alterations in their sequence that result in the loss of the unwanted restriction sites.
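The difference between cutting circular and linear DNA noted above reduces to a one-line rule, sketched here with a hypothetical helper function.

```python
def fragment_count(n_cuts, circular):
    """Number of DNA pieces produced by n_cuts distinct cuts."""
    if n_cuts < 0:
        raise ValueError("number of cuts cannot be negative")
    if circular:
        # The first cut only linearises a circle; an uncut circle is one molecule.
        return max(n_cuts, 1)
    # Every cut in a linear molecule adds one more piece.
    return n_cuts + 1


print(fragment_count(7, circular=False))  # seven HindIII sites in linear lambda DNA
print(fragment_count(1, circular=True))   # one cut in a plasmid
```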


Lambda gt10 is one example of a lambda insertion vector. In this vector, there is only a single site at which EcoRI will cut the DNA. The deletions and other manipulations that this phage has undergone have removed some of the unwanted sites, and have also reduced the overall size of the phage DNA to 43.3 kb (which is still large enough to produce viable phage particles, and still contains the genes that are needed for viability), and hence allow the insertion of foreign DNA up to a maximum of 7.6 kb.

Fig. 4.11. Lambda insertion vector gt10 (43.3 kb), with its single EcoRI site in the immunity region.

Cutting this vector with EcoRI produces two DNA fragments, called the left and right arms. Although it therefore appears that the insert would have to be ligated to two different pieces of DNA, in practice this does not complicate the ligation as much as might be imagined. One end of each fragment is derived from the cohesive ends of the lambda DNA, and these ends will anneal quite stably at 37°C, so although the arms are not covalently joined, they can be considered as a single DNA fragment.

Lambda gt10 also provides another example of how insertional inactivation can be used to distinguish the parental vector (which may form by religation of the arms without an insert) from the recombinants. The EcoRI site lies within the repressor (cI) gene, so recombinant phage, which carry an insert at this position, are not able to make a functional repressor. As a result, they are unable to establish lysogeny and give rise to clear plaques, whereas the parental gt10 phage gives rise to turbid plaques. The distinction can be made even more marked by using a host strain carrying the hfl (high frequency of lysogenization) mutation. In such a strain, any parental phage will establish lysogeny extremely efficiently, rather than entering the lytic cycle, with the result that few, if any, plaques will be obtained. This does not affect the recombinants, which are unable to make a functional repressor, and therefore do not establish lysogeny. So a substantial enrichment of recombinant phage over the religated vector can be achieved without having to resort to dephosphorylation with alkaline phosphatase.

Lambda gt11 is another example of a lambda insertion vector, which is used in a rather different way. Since it allows expression of the cloned fragment, it is considered later in this chapter, together with other expression vectors.

For lambda DNA, the packaging limits are 37 kb and 51 kb. In other words, an insertion vector smaller than 37 kb cannot be produced, as it would not be possible to grow it to produce the DNA that we need. Nor can we insert a DNA fragment so big that it would make the product larger than 51 kb; the recombinant DNA could not be packaged into the phage heads. It follows that the maximum cloning capacity for an insertion vector is (51 minus 37) kb = 14 kb. This is larger than we would normally clone comfortably in a plasmid vector, but still smaller than we would like for some purposes. To enhance the available cloning capacity, we have to turn to a different type of lambda vector, called a replacement vector.

Replacement Vectors

The packaging limits that restrict the cloning capacity of insertion vectors are imposed by the physical requirements of the phage head rather than by the nature of the genes required. There are more genes that are not essential for lytic growth and could be deleted, except that this would make the phage DNA too small to produce viable progeny. That provides a clue to an alternative design of lambda cloning vectors. Instead of merely inserting extra DNA, we can arrange the vector so that a piece of DNA is removed and replaced by the insert. Instead of being cut just once by the restriction enzyme of choice (in this case BamHI), the DNA is cleaved at two sites.
The vector DNA will therefore be cleaved into three fragments: the left and right arms (which will anneal by virtue of their cohesive ends) and a third fragment which is not needed (except to maintain the size of the DNA) and can be discarded. It is called the stuffer fragment because its only purpose is to help fill up the phage head.

Fig. 4.12. EMBL4. (The vector has a left arm of 20 kb and a right arm of 9 kb flanking a stuffer fragment bounded by BamHI sites. Cutting with BamHI releases the stuffer, which is discarded, and an insert is ligated between the arms to give recombinant phage DNA.)

In use, therefore, this vector would be cut with BamHI and the fragments separated, e.g., by gel electrophoresis. The stuffer fragment would be thrown away, and the arms mixed with the restriction fragments to be cloned. These could be generated by BamHI digestion, or by cleavage of the target with another enzyme that produces compatible ends, e.g., Sau3A. Ligation of the mixture would produce recombinant phage DNA that would be packaged into phage heads by in vitro packaging. The cloning capacity of the vector is thus considerably enhanced; in this case the combined size of the arms is 29 kb, and thus you can clone fragments up to (51 minus 29) kb = 22 kb.

There is a further benefit of such a vector. The combined size of the arms is only 29 kb, which is less than the minimum required for packaging. Any pairs of arms that are ligated without an insert will therefore be too small to produce viable phage particles. Viable particles will only be produced if ligation results in an insert of at least (37 minus 29) kb = 8 kb. With other vectors, measures such as alkaline phosphatase treatment of the vector (to prevent religation without an insert) are used to ensure that recombinant progeny are obtained rather than parental vector molecules. This is not necessary with replacement vectors. Since we do not have to treat the vector with phosphatase, we have another possibility: dephosphorylation of the insert. In the formation of a gene library, the insertion of more than one fragment into the same vector molecule is a problem, as it can give rise to anomalies in characterizing the insert in relation to the genome it came from. Phosphatase treatment of the insert prevents insert-insert ligation, and hence ensures that all the recombinants carry only a single insert fragment.
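The arithmetic used above for the cloning capacity of lambda vectors can be written as a small helper. This is a sketch under the packaging limits quoted in the text (37 kb and 51 kb); the function name and default arguments are hypothetical.

```python
def insert_size_range(vector_kb, min_package_kb=37, max_package_kb=51):
    """Return (smallest, largest) insert size in kb that gives packageable DNA.

    vector_kb is the DNA the vector itself contributes between the cos sites
    (for a replacement vector, the combined size of the two arms).
    """
    if vector_kb > max_package_kb:
        raise ValueError("vector alone already exceeds the packaging limit")
    smallest = max(0, min_package_kb - vector_kb)  # 0 means no insert is needed
    largest = max_package_kb - vector_kb
    return smallest, largest


print(insert_size_range(29))  # arms of 29 kb, as in the replacement vector example
print(insert_size_range(37))  # smallest possible insertion vector
```

A non-zero lower bound is what gives a replacement vector its built-in selection for recombinants: arms religated without an insert fall below the minimum packageable size.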
There is yet another useful feature that can be built into a replacement vector. Since the stuffer fragment is not needed for phage production, it does not have to be lambda DNA. It can be anything we want, so we could, for example, put in a fragment carrying a β-galactosidase gene. Then any plaques formed by phage that still carry the stuffer fragment would be blue (on a medium containing X-gal). There may be some phage DNA molecules that have not been cut completely, or there may be some stuffer DNA contaminating the preparation of the vector arms, which could then be ligated back into the vector. Any plaques containing the stuffer will be blue, so we can check immediately whether everything has gone according to plan.

Lambda vectors thus provide a highly versatile and efficient system for primary cloning of unknown fragments, especially in the construction of genomic and cDNA libraries. They extend the cloning capacity over that which is readily obtainable with plasmid vectors, and can easily generate the very large numbers of recombinants that are required for a gene library. However, some people do not like working with lambda systems, mainly because they require a different set of techniques for growing, assaying and maintaining phage preparations. There is nothing really difficult about it; it is just unfamiliar.

The only real disadvantage of lambda cloning systems is the size of the vector DNA. With an insertion vector, your recombinant may contain 5 kb of insert and 45 kb of vector. This makes it more difficult to analyse or manipulate your insert than would be the case with a plasmid recombinant, especially as the lambda vector will contain a substantial number of recognition sites for different restriction enzymes. The normal procedure, having identified the recombinant clone of interest, is therefore to reclone the insert (or part of it) into a plasmid vector for further analysis and manipulation.

Lambda phages are the most widely used phage vectors, but there are vectors based on several other phages, including P1 and M13. These are discussed further in subsequent sections in this chapter; but first we need to look at a special class of vector that combines some of the features of lambda and plasmid vectors, and enables the cloning of even larger pieces of DNA. These are the cosmids.

Fig. 4.13. Structure of a cosmid. (Labelled features include a BamHI site and the cos site.)

Fig. 4.14. Replication of single-strand bacteriophages. (Stages shown: M13 phage DNA; synthesis of the complementary (-) strand; synthesis of the (+) strand; phage particles.)

M13 VECTORS

Like lambda, M13 is a bacteriophage that infects E. coli. That is about as far as the resemblance goes, either biologically or in their use as cloning vectors. M13 is a 'sex-specific' bacteriophage, or more accurately F-specific. It attaches to the tips of the pili that are produced on the surface of bacteria carrying an F-type plasmid, and is therefore unable to infect bacteria that do not carry such a plasmid. The very long, thin filamentous phage particles contain a circular, single-stranded DNA molecule of about 6 kb. After this DNA enters the cell, it is converted to a double-stranded molecule, called the replicative form (RF), by synthesis of the complementary strand. This molecule is replicated by producing a circular single-stranded copy of one strand of the RF, and this single-stranded DNA is in turn converted to a double-stranded form. The production of the single-stranded intermediate requires a specific signal on the DNA, and is therefore completely strand-specific, i.e., it is always the same strand that appears in this form. This separation of the synthesis of the two strands is not unique to M13 but is found in some other bacteriophages and also some classes of plasmids. However, most of the phage DNA within the cell is double-stranded circular DNA (RF), and it can be isolated by conventional plasmid purification methods.

Continued replication of the phage DNA builds up these plasmid-like DNA molecules within the cell. At the same time, expression of phage genes occurs, and the product of one of these genes binds to the single-stranded product, initiating production of phage particles. This occurs by extrusion of the DNA through the cell membrane, during which process it becomes coated with phage proteins. The length of the filamentous phage particle is determined by the length of the DNA molecule, unlike lambda, where the size of the particle is determined by the structure of the proteins of which it is composed. There are therefore no absolute packaging limits for M13, although the phage does become increasingly fragile if large DNA fragments are inserted.

A curious and significant feature of M13 is that infection does not lead to bacterial lysis. Phage particles continue to be produced, and the cell remains viable, although it grows more slowly. Infection does result in the appearance of 'plaques' in a bacterial lawn, but these are zones of reduced growth rather than zones of lysis. As a consequence of the continuing viability of the host cell, very high titres of phage can be produced.

The main advantage of M13 is that it provides a very convenient way of obtaining single-stranded versions of a gene, which would be difficult to do in any other way. Single-stranded DNA obtained from M13 clones has been widely used for DNA sequencing. Although double-stranded DNA templates are nowadays commonly used for sequencing, single-stranded vectors are still preferred by some laboratories. Another application where single-stranded DNA can be advantageous (although not essential) is site-directed mutagenesis. M13 vectors have another important role that has nothing to do with the production of single-stranded DNA. This is the technique called phage display.

Vectors based on M13 have been engineered to contain multiple cloning sites, and they usually include a beta-galactosidase gene to distinguish recombinants from parental vector, as was described for pUC18 earlier in this chapter. DNA fragments can be cloned into such sites, using the plasmid-like replicative form; after transformation, the progeny are detected by a plaque assay, with 'white' plaques indicating a recombinant clone and blue plaques (on a medium containing X-gal) indicating non-recombinant phage. As discussed earlier in relation to pUC18, some blue plaques may arise from insertion of small fragments if they do not shift the reading frame of the lacZ gene. If phage particles are then isolated from the supernatant of an infected culture, they will contain the gene in single-stranded form. Note that it is always the same strand that is found in all phage particles, owing to the specificity of the replication process.

EXPRESSION VECTORS

The above discussion has assumed that all you want to do is to clone a piece of DNA. It does not consider the possibility that you might want to obtain expression of the gene encoded by that DNA. If you take a DNA fragment from another organism and clone it in E. coli, there are many reasons why it may not be expressed. At the simplest level, these relate to the signals necessary for initiating transcription (a promoter) and translation (a ribosome-binding site and start codon). The basic way of encouraging expression of the cloned gene is to incorporate these signals into the vector, adjacent to the cloning site. Such a vector is known as an expression vector.

There are two main types of expression vector. If the vector just carries a promoter, and relies on the translation signals present in the cloned DNA, it is referred to as a transcriptional fusion vector. On the other hand, if the vector supplies the translational signals as well (so you are inserting the cloned fragment into the coding region of a vector gene), then you have a translational fusion. Note that in this case, the insert must be in frame with the start codon.

Fig. 4.15. Expression vectors: transcriptional fusions. (The vector supplies the promoter next to the cloning site; the insert carries its own translational signals, i.e., the ribosome-binding site and start codon, and the mRNA is translated to give the protein product.)

The plasmid vector pUC18 that we looked at earlier is actually a translational fusion vector, although it is not often used as such. A better example is the lambda vector gt11. This is an insertion vector, 43.7 kb in length (making the maximum cloning capacity about 7 kb). It has been engineered to contain a β-galactosidase gene, and has a single EcoRI restriction site within that gene. This confers two properties on the vector. Firstly, insertion of DNA at the EcoRI site will inactivate the β-galactosidase gene, so that recombinants will give 'white' plaques on a medium containing X-gal. Secondly, the insert, if in the correct orientation and in frame, will give rise to a fusion protein containing the product encoded by the insert fused to the β-galactosidase protein. This fusion protein is unlikely to have the biological functions associated with your cloned gene, as it contains too much extraneous material, but that is not the point. It is reasonably likely to react with some antibodies to the natural product, which makes it a useful way of detecting the clone of interest.

Insert without translational signals

Start codon

)>-_..;..l_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _.~~mRNA

Fusion protein

Fig. 4.16. Expression vectors-translational fus ions.
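The requirement that the insert be in frame with the vector's start codon can be illustrated with a short sketch. This is a minimal illustration only: the function names and the toy sequences are hypothetical and do not correspond to any real vector.

```python
# Minimal sketch of the reading-frame requirement for a translational fusion.
# All names and sequences here are illustrative assumptions, not real vector data.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a DNA sequence into complete triplets, reading from position 0."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def fusion_is_in_frame(upstream_cds, insert_orf_offset=0):
    """Return True if the insert's reading frame lines up with the vector start codon.

    upstream_cds: vector coding sequence from the ATG up to the cloning site.
    insert_orf_offset: bases in the insert that precede its first complete codon.
    """
    return (len(upstream_cds) + insert_orf_offset) % 3 == 0


def read_through(upstream_cds, insert):
    """Codons read from the vector start codon into the insert, up to the first stop."""
    translated = []
    for codon in codons(upstream_cds + insert):
        if codon in STOP_CODONS:
            break
        translated.append(codon)
    return translated


if __name__ == "__main__":
    insert = "GGTCATTAA"  # hypothetical insert: GGT CAT TAA
    print(fusion_is_in_frame("ATGACC"), read_through("ATGACC", insert))
    print(fusion_is_in_frame("ATGACCA"), read_through("ATGACCA", insert))
```

With one extra base upstream of the cloning site, the same insert is read as a different set of codons, which is why a frame-shifted fusion does not give the intended product.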

Fig. 4.17. Use of lambda gt11 for generation of fusion proteins. (Diagram labels: gt11 with a single EcoRI site in the beta-galactosidase gene; insert with EcoRI ends; recombinant; fusion protein.)

To make full use of an expression vector, we need to be able to choose whether expression is on or off. The regulation of most inducible bacterial promoters, such as that of the lac operon, is rather 'leaky', i.e., there is still some expression even in the uninduced or repressed state. Firmer control can be achieved by the use of promoters from bacteriophages, notably one from the bacteriophage T7. In T7, this promoter controls the expression of the 'late' genes, i.e., the genes that are only switched on at a late stage of infection. It is not recognized by E. coli RNA polymerase, but requires the T7 RNA polymerase, a product of one of the genes expressed earlier in the infection cycle. So if we clone our DNA fragment downstream from a T7 promoter, using an 'ordinary' E. coli host (lacking a T7 polymerase gene), we will get no expression at all. This can be useful, as the product might be deleterious to the cell. Once we are satisfied that we have made the right construct, we can isolate the plasmid and put it into another E. coli strain that has been engineered to contain a T7 polymerase gene, and hence will allow transcription of the cloned gene. If the expression of the T7 polymerase gene is itself regulated, for example by putting it under the control of a lac promoter, then we can turn the expression of the T7 polymerase up (by adding IPTG) or down. Thus, the level of expression remains under our control. Further devices can be included to inhibit the low level of T7 polymerase arising from the leakiness of the uninduced lac promoter.

The pGEM® series of vectors provides an example. In this case, there is a multiple cloning site adjacent to the T7 promoter, so any DNA inserted will be under the control of the T7 promoter. This is a transcriptional fusion vector, and it is often more useful for generating substantial amounts of an RNA copy of your cloned fragment, which can then be used as a probe for hybridization. The vector actually has a second specific promoter, derived from another bacteriophage (SP6), at the other side of the multiple cloning site, so if you provide an SP6 polymerase, you will get an RNA copy of the other strand, the antisense strand. The usefulness of this will be apparent when applications of antisense RNA, such as the RNase protection assay, are considered.

There are also a variety of translational fusion vectors with T7 promoters, which are designed for obtaining high, controllable levels of protein expression. We will return to the concept of expression vectors, and other factors that have to be considered for the optimization of protein production.

Fig. 4.18. Structure of the expression vector pGEM-3Z. bla, beta-lactamase (ampicillin resistance); ori, origin of replication.
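The relationship between the two RNA copies obtained from promoters on opposite sides of the cloning site can be sketched as follows. This is an illustrative assumption-laden example: the function names are hypothetical, the insert sequence is invented, and which promoter yields the sense strand depends on the orientation of the insert (here the T7 promoter is assumed to read the strand as written).

```python
# Sketch: RNA copies made from promoters flanking a cloned insert.
# Assumption: the T7 promoter reads the insert as written (5'->3'),
# and the SP6 promoter reads the opposite strand.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return dna.translate(_COMPLEMENT)[::-1]


def as_rna(dna):
    """RNA with the same sequence as the given DNA strand (T replaced by U)."""
    return dna.replace("T", "U")


def run_off_transcripts(insert_top_strand):
    """Return the RNA made from each promoter for an insert in the assumed orientation."""
    return {
        "T7": as_rna(insert_top_strand),                       # sense copy
        "SP6": as_rna(reverse_complement(insert_top_strand)),  # antisense copy
    }


if __name__ == "__main__":
    for promoter, rna in run_off_transcripts("ATGGCC").items():
        print(promoter, rna)
```

The two transcripts are complementary to each other, which is what allows the antisense copy to hybridize to the natural mRNA in applications such as the RNase protection assay.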

VECTORS FOR CLONING AND EXPRESSION IN EUKARYOTIC CELLS

Most primary cloning is carried out using a bacterial host such as E. coli, because of the ease of manipulation and the range of powerful techniques that have been developed. Eukaryotic hosts are more commonly used for studying the behaviour of genes that have already been cloned (but in an environment more closely related to their original source), for analysing their effect on the host cell and modifying it, or for obtaining a product which is not made in its natural state in a bacterial host. There is, therefore, more emphasis with eukaryotic vectors on obtaining gene expression than on making gene libraries or primary cloning (with the notable exception of YAC vectors, see below). A bewildering variety of vectors is available for cloning in different eukaryotic hosts, and a full review of them is way beyond the scope of this book.

Yeasts

Microbiologically, 'yeasts' are single-celled fungi, as opposed to filamentous fungi, but the term is quite imprecise. Not all 'yeasts' are related taxonomically, and indeed some filamentous fungi can also grow in a unicellular form that is referred to as a yeast form. Although in common usage the term 'yeast' would be taken to mean the brewer's/baker's yeast Saccharomyces cerevisiae, even molecular biologists are starting to have to recognize the existence of other yeasts (especially members of the genus Pichia). However, for the moment, we will limit ourselves to S. cerevisiae.

After reading about bacterial cloning vectors, the vectors that will be most familiar are the yeast episomal plasmids (YEp). These are based on a naturally occurring yeast plasmid known as the 2 μm plasmid, and they are therefore able to replicate independently in yeast, at a high copy number (25-100 copies per cell). As is usually the case with vectors for eukaryotic cells, these plasmids also have an E. coli origin of replication, enabling them to be grown and manipulated in an E. coli host (i.e. they are shuttle vectors).

Fig. 4.19. Structure of a yeast episomal vector.

There is one point of detail in which they differ from bacterial cloning vectors, and that is the nature of the selectable marker. For bacterial vectors, a large number of antibacterial antibiotics can be exploited to enable us to select our transformants. There are fewer antibiotics available to which yeasts are sensitive (although some fungicides can be used), and therefore selection more commonly makes use of complementation of auxotrophic mutations in the host strain. For example, a host strain of S. cerevisiae with a mutation in the trp1 gene will be unable to grow on a medium lacking tryptophan. If the vector plasmid carries a functional trp1 gene, then transformants can be selected on a tryptophan-deficient medium. Other commonly used selectable markers are ura3 (uracil), leu2 (leucine) and his3 (histidine). These vectors would usually also carry an antibiotic resistance marker for selection in E. coli.

Vectors which replicate as plasmids in S. cerevisiae are generally rather unstable, in that they tend to be lost from the culture as plasmid-free daughter cells accumulate. This is because of erratic partitioning during mitosis. Newer versions of YEp vectors, taking advantage of a better understanding of the biology of the 2 μm plasmid, are more stable. Autonomously replicating plasmids can also be constructed by inserting a specific sequence from a yeast chromosome, known as an autonomously replicating sequence, or ars. These plasmids are very unstable, but some constructs that also include a centromere are more stable. In contrast to the YEp vectors, these yeast centromere plasmids (YCp) are normally maintained at a low copy number (1-2 copies per cell), which can be advantageous if your product is in any way harmful to the cell, or if you want to study its regulation.

Fig. 4.20. Structure of a yeast centromere vector.
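The logic of selection by complementation of auxotrophic mutations, as described above for trp1, ura3, leu2 and his3, can be written out as a small sketch. The function name and the data structures are hypothetical; the model deliberately ignores everything except the marker/nutrient relationship.

```python
# Sketch of auxotrophic selection: a cell grows only if it can make
# every nutrient that is missing from the medium.
# The marker-to-nutrient pairs are those named in the text; the rest is illustrative.

MARKER_NUTRIENT = {
    "trp1": "tryptophan",
    "ura3": "uracil",
    "leu2": "leucine",
    "his3": "histidine",
}


def can_grow(host_mutations, plasmid_markers, medium_lacks):
    """Return True if the host (with or without plasmid) can grow on the medium.

    host_mutations: set of mutated host genes, e.g. {"trp1"}.
    plasmid_markers: set of functional marker genes carried by the vector.
    medium_lacks: set of nutrients omitted from the medium.
    """
    for gene in host_mutations:
        nutrient = MARKER_NUTRIENT[gene]
        if nutrient in medium_lacks and gene not in plasmid_markers:
            return False  # the mutation is not complemented, so the cell cannot grow
    return True


if __name__ == "__main__":
    host = {"trp1"}
    print("untransformed:", can_grow(host, set(), {"tryptophan"}))
    print("transformant: ", can_grow(host, {"trp1"}, {"tryptophan"}))
```

Only cells that have taken up a plasmid carrying a functional copy of the mutated gene can form colonies on the deficient medium, which is the basis of the selection.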

One of the reasons for using yeast as a host is to express the cloned gene, so these vectors are commonly designed as expression vectors. The principles involved are similar to those described above for bacterial expression vectors, except that of course the expression signals involved are those applicable to S. cerevisiae rather than E. coli. If required, this can include signals for secretion, or for targeting to the nucleus or other cellular compartments. It should be noted that, despite being eukaryotic, S. cerevisiae possesses very few introns, and is not the host of choice if the correct excision of introns is to be ensured.

The vectors described so far are maintained in yeast as circular DNA molecules, much like a bacterial plasmid. Two other classes of vectors deserve a mention. First, there are the yeast integrating plasmids (YIp). These do not replicate independently but integrate into the chromosome by recombination. The frequency of transformation is very low, and it is difficult to recover the recombinant vector. The main advantage is that the transformants are much more stable than those obtained with the autonomously replicating plasmids.

Second, there are the yeast artificial chromosomes (YACs), which carry telomeres that enable their maintenance in S. cerevisiae as linear structures resembling a chromosome. The use of these vectors, for cloning very large pieces of DNA, is quite distinct from the uses of the vectors described above. YACs are considered further, along with other vectors used for the same purpose, in a later section of this chapter.

Fig. 4.21. Structure and use of a yeast integrative plasmid (YIp). (Diagram labels: fragment cloned, and recombinant plasmid propagated, in E. coli; recombination at ura3 with yeast chromosomal DNA; plasmid DNA inserted into chromosome, giving a non-functional ura3, the insert and a functional ura3.)

Mammalian Cells

The cloning vectors used in bacteria replicate separately from the chromosome, as plasmids or bacteriophages. As we have seen above, the same is true of many types of vectors used in yeast. The situation with cloning in mammalian cells is somewhat different in that independent, plasmid-like replication is often not sustained. There are a few vectors which are capable of plasmid-like replication, especially those carrying the origin of replication from the virus SV40 (simian virus 40), which replicate episomally in some mammalian cells (such as COS cells). More stable clones are obtained by integration of the DNA into the chromosome, which happens readily in mammalian cells. In either case, the cloning vector enables you to organize your cloned gene in relation to a set of expression signals, many of which are derived from viruses such as SV40 or cytomegalovirus (CMV). In the vector shown in Fig. 4.22, a gene inserted at the multiple cloning site (MCS) is expressed constitutively at a high level from the CMV promoter, while the presence of a polyadenylation signal increases mRNA stability. The SV40 origin allows episomal replication in COS cells, and the neomycin phosphotransferase gene permits selection for resistance to the antibiotic G-418 (Geneticin®). Note that this is a shuttle vector, carrying an E. coli origin of replication and an ampicillin resistance gene (β-lactamase), so the construction can be carried out in E. coli before transferring the recombinant plasmid to a mammalian cell line. For mammalian cells there is a wide variety of commercially available expression vectors with more sophisticated features than those shown.

Fig. 4.22. Structure of a basic episomal vector for gene expression in mammalian cells.

Several other types of vector, based on various viruses, are available, which can be used to transmit the cloned gene from one cell to another. Of these, the retroviral vectors deserve a special mention, and in order to understand these, we need a brief account of retroviral biology.

Retroviruses have an RNA genome. When a cell is infected, the RNA is copied into double-stranded DNA by the action of a viral protein, reverse transcriptase. This protein is present in the virion and enters the cell along with the RNA. (Reverse transcriptase is formally an RNA-directed DNA polymerase.) This DNA then circularizes and, by the action of another virion protein known as integrase, is integrated into the host cell DNA. The efficiency of integration of the DNA into the genome is one of the main attractions of this system for genetic manipulation of animal cells. The integrated DNA is bounded by sequences known as long terminal repeats (LTR), which include a strong promoter for transcription of the integrated viral genes gag, pol and env. Full-length transcripts provide the viral RNA which is assembled into virus particles; one region of the virus, known as the psi site, is essential for this process. The packaged virus particles acquire envelope glycoproteins from the host cell membrane as they bud off from the cell, without lysis. These glycoproteins help to determine the type of receptors the virus uses to infect further cells.
Development of vectors based on retroviruses rests on the knowledge that most of these functions can be provided in trans, e.g., by genes from a defective helper virus already integrated into the genome of the host cell. The main features that are cis-acting, and therefore need to be located on the vector itself, are the LTR sequences and the psi site. The vector is a shuttle plasmid, so E. coli is used for construction of the recombinant plasmid, by inserting the required gene at the multiple cloning site. This construct is then used to transfect a culture of a special cell line (helper cells) that contains the gag, pol and env genes required for virus production, integrated into the genome. The transfected cells will therefore be able to produce virus particles containing an RNA copy of your construct. These particles are able to infect other cells that do not contain the integrated essential genes; and since the viral particles carry preformed reverse transcriptase and integrase, the RNA will be copied into DNA in such cells, and the DNA will be efficiently integrated into the genome. Since these cells do not carry the essential genes, no further production of viral particles will occur. However, the gene is now stably integrated into the chromosome and can be expressed from the adjacent promoter derived from the vector.

The specificity of the viral particles for other cells is determined by the envelope gene carried by the helper cells. Replacing that gene by genes for envelope glycoproteins from other viruses, in particular the VSV-G gene from vesicular stomatitis virus, enables a wider range of target cells to be used, not just mammalian cells but extending to, for example, chickens, oysters, toads, zebrafish and mosquitoes; in fact, cells from virtually all species, mammalian and non-mammalian, can be infected.

Fig. 4.23. Structure and use of a retroviral vector. (Diagram labels: 5' LTR; neo; insert at multiple cloning site; transfection; virus particles; select and analyse.)

SUPERVECTORS: YACs AND BACs

Although cosmids were the first vectors that made the production and use of mammalian gene libraries feasible, their limited capacity would still not have sufficed for decoding the human genome. This was made possible by the development of novel supervectors that were able to carry 100 kb or more. The yeast artificial chromosome (YAC) was the first of these. As with cosmids, these could be constructed through knowledge of what features were necessary to enable the vector to be maintained by its host (in the case of yeast: telomeres, a centromere and an origin of replication), as well as selectable markers and cloning sites.

As with the shuttle plasmids discussed earlier, the YAC vector is propagated as a circular plasmid in E. coli. Restriction enzyme digestion removes the stuffer fragment between the two telomeres, and cuts the remaining vector molecule into two linear arms, each carrying a selectable marker. The insert is then ligated between these arms, as in the case of phage lambda, and transformed into a yeast cell, with selection for complementation of both auxotrophic markers. This ensures that the recombinants contain both arms. Furthermore, a successful recombinant must contain the tel sequences at each end, so that the yeast transformant can use these sequences to build functional telomeres.

Fig. 4.24. Structure and use of a yeast artificial chromosome vector. trp1, ura3: selectable markers; cen/ars: centromere and autonomously replicating sequence enabling replication in S. cerevisiae; tel: telomere; for simplification the E. coli origin of replication and selectable markers for E. coli are not shown. (Diagram labels: cut with SnaBI and BamHI and discard stuffer; insert DNA.)

The titans amongst vectors, YACs are routinely used to clone 600 kb fragments, and specialized versions are available which can accommodate inserts close to 2 Mb. As such, they will not only easily accommodate almost any eukaryotic gene in its entirety, but will do so within its framework of three-dimensional structure and distant regulatory sequences. Thus, they prove to be quite beneficial in producing transgenic organisms. However, YACs have problems with the stability of the insert, especially with very large fragments, which can be subject to rearrangement by recombination. Furthermore, apart from the fact that many laboratories are not set up for the use of yeast vectors, the recombinant molecules are not easy to recover and purify. Thus, larger bacterial vectors are used more than YACs even though their capacity is lower. These include vectors based on bacteriophage P1, which are able to accommodate inserts in excess of 100 kb, and bacterial artificial chromosomes (BACs), which are based on the F plasmid and can accommodate 300 kb of insert. These vectors lack the instability problems found in YACs, and play an important role in genome sequencing projects.

5 COSMIDS, PHASMIDS AND OTHER ADVANCED VECTORS

In the 1970s, when recombinant DNA technology was in the initial stages of development, only a few vectors were available and these were based on either high-copy-number plasmids or phage λ. Later, phage M13 was developed as a specialist vector to facilitate DNA sequencing. Gradually, a number of purpose-built vectors were produced, of which pBR322 is probably the best example, but the creation of each one was a major undertaking. Over a period of time, a series of specialist vectors was produced, each for a particular purpose. During this period, there were many arguments about the relative benefits of plasmid and phage vectors. Today, the molecular biologist has an enormous range of vectors available, and these are notable for three reasons. First, many of them combine elements from both plasmids and phages and are known as phasmids or, if they contain an M13 ori region, phagemids. Secondly, many different features that facilitate cloning and expression can be found combined in a single vector. Thirdly, purified vector DNA plus associated reagents can be bought from molecular-biology suppliers. The aim of this chapter is to provide the reader with a detailed explanation of the biological basis for the different designs of vector.

There are two main uses of cloning vectors: (i) cloning large pieces of DNA and (ii) modification of genes. When mapping and sequencing genomes, the first step is to subdivide the genome into manageable pieces. The larger these pieces, the easier it is to construct the final picture; hence the need to clone large fragments of DNA. Large fragments are also needed if it is necessary to move along the genome to isolate a gene. In many cases, the desired gene will be relatively easy to isolate and a simpler cloning vector can be used. Once isolated, the cloned gene may be expressed as a probe sequence or as a protein, it may be sequenced or it may be mutated in vitro. For all these applications, small specialist vectors are used.

VECTORS USED FOR CLONING LARGE FRAGMENTS OF DNA

Cosmid Vectors

As already stated, concatemers of unit-length λ DNA molecules can be efficiently packaged if the cos sites, substrates for the packaging-dependent cleavage, are 37-52 kb apart (75-105% the size of λ+ DNA). In fact, only a small region in the proximity of the cos site is required for recognition by the packaging system. Plasmids have been constructed which contain a fragment of λ DNA including the cos site. These plasmids are named "cosmids" and can be used as gene-cloning vectors in combination with the in vitro packaging system. Packaging the cosmid recombinants into phage coats imposes a desirable selection upon their size. With a cosmid vector of 5 kb, we demand the insertion of 32-47 kb of foreign DNA, much more than a phage-λ vector can accommodate. After packaging in vitro, the particle is used to infect a suitable host. The recombinant cosmid DNA is injected and circularizes like phage DNA but replicates as a normal plasmid without the expression of any phage functions. Transformed cells are selected on the basis of a vector drug-resistance marker.

Fig. 5.1. Simple scheme for cloning in a cosmid vector. (Diagram labels: foreign DNA; restriction endonuclease target site; restriction endonuclease; DNA ligase (high DNA concentration); cos sites 37-52 kb apart; packaging in vitro; transducing particle containing cosmid recombinant; DNA circularizes after injection; select ApR clone.)

Cosmids provide an efficient means of cloning large pieces of foreign DNA. Because of their capacity for large fragments of DNA, they are also particularly attractive vectors for constructing libraries of eukaryotic genome fragments. Partial digestion with a restriction endonuclease provides suitably large fragments, but there is a potential problem associated with the use of partial digests: two or more genome fragments may join together in the ligation reaction, creating a clone containing fragments that were not initially adjacent in the genome. This would give an incorrect picture of their chromosomal organization. The problem can be overcome by size fractionation of the partial digest.

Even with sized foreign DNA, in practice cosmid clones may be constructed that have non-contiguous DNA fragments ligated to form a single insert. The problem can be solved by dephosphorylating the foreign DNA fragments so as to prevent their ligation together. This method is very sensitive to the exact ratio of target-to-vector DNAs because vector-to-vector ligation can occur. Furthermore, recombinants with a duplicated vector are unstable and break down in the host by recombination, resulting in the propagation of a non-recombinant cosmid vector.

Such difficulties have been overcome in a cosmid-cloning procedure devised by Ish-Horowicz and Burke (1981). By appropriate treatment of the cosmid vector pJB8, left-hand and right-hand vector ends are purified which are incapable of self-ligation but which accept dephosphorylated foreign DNA. Thus, the method eliminates the need to size the foreign DNA fragments and prevents formation of clones containing short foreign DNA or multiple vector sequences.

An alternative solution to these problems has been devised by Bates and Swift (1983), who have constructed cosmid c2XB. This cosmid carries a BamHI insertion site and two cos sites separated by a blunt-end restriction site. The creation of these blunt ends, which ligate only very inefficiently under the conditions used, effectively prevents vector self-ligation in the ligation reaction.

Modern cosmids of the pWE and sCos series contain features such as: (a) multiple cloning sites for simple cloning using non-size-selected DNA; (b) phage promoters flanking the cloning site; and (c) unique NotI, SacII or SfiI sites (rare cutters) flanking the cloning site to permit removal of the insert from the vector as single fragments. Mammalian expression modules encoding dominant selectable markers may also be present, for gene transfer to mammalian cells if required.
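The size selection imposed by packaging is simple arithmetic, and can be sketched as below. The 37-52 kb window between cos sites and the 5 kb vector are the figures given in the text; the function names are hypothetical.

```python
# Sketch: insert sizes that can be packaged for a cosmid of a given size,
# using the 37-52 kb spacing between cos sites quoted in the text.

PACKAGING_MIN_KB = 37.0
PACKAGING_MAX_KB = 52.0


def insert_window(vector_kb, min_kb=PACKAGING_MIN_KB, max_kb=PACKAGING_MAX_KB):
    """Return (smallest, largest) insert size in kb that gives a packageable molecule."""
    smallest = max(min_kb - vector_kb, 0.0)
    largest = max(max_kb - vector_kb, 0.0)
    return smallest, largest


def is_packageable(vector_kb, insert_kb):
    """True if vector plus insert falls within the packaging window."""
    total = vector_kb + insert_kb
    return PACKAGING_MIN_KB <= total <= PACKAGING_MAX_KB


if __name__ == "__main__":
    print(insert_window(5))        # window for a 5 kb cosmid
    print(is_packageable(5, 40))   # a 40 kb insert
    print(is_packageable(5, 20))   # a 20 kb insert
```

For a 5 kb cosmid this reproduces the 32-47 kb range stated above, and shows why recombinants with short inserts are not recovered after in vitro packaging.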


Fig. 5.2. Cosmid cloning scheme of Ish-Horowicz and Burke. (a) Map of cosmid pJB8. (b) Application to the construction of a genomic library of fragments obtained by partial digestion with Sau3A. This restriction endonuclease has a tetranucleotide recognition site and generates fragments with the same cohesive termini as BamHI.

Alternatives to Cosmids (BACs and PACs)

Phage P1 is a temperate bacteriophage which has been widely used for genetic analysis of Escherichia coli because it can mediate generalized transduction. Sternberg and co-workers have developed a P1 vector system which has a capacity for DNA fragments as large as 100 kb. Thus, the capacity is about twice that of cosmid clones but less than that of yeast artificial chromosome (YAC) clones. The P1 vector contains a packaging site (pac) which is necessary for in vitro packaging of recombinant molecules into phage particles. The vectors contain two loxP sites. These are the sites recognized by the phage recombinase, the product of the phage cre gene, and they lead to circularization of the packaged DNA after it has been introduced into an E. coli host expressing the recombinase. Clones are maintained in E. coli as low-copy-number plasmids by selection for a vector kanamycin-resistance marker. A high copy number can be induced by exploitation of the P1 lytic replicon. This P1 system has been used to construct genomic libraries of mouse, human, fission yeast and Drosophila DNA.

Shizuya et al. (1992) have produced a bacterial cloning system for mapping and analysis of complex genomes. This BAC system (bacterial artificial chromosome) is based on the single-copy sex factor F of E. coli. This vector includes the λ cosN and P1 loxP sites, two cloning sites (HindIII and BamHI) and several G+C-rich restriction enzyme sites (e.g. SfiI, NotI, etc.) for potential excision of the inserts. The cloning site is also flanked by T7 and SP6 promoters for generating RNA probes. This BAC can be transformed into E. coli very efficiently, thus avoiding the packaging extracts that are required with the P1 system.

BACs can maintain human and plant genomic fragments of greater than 300 kb for over 100 generations with a high degree of stability and have been used to construct genome libraries with an average insert size of 125 kb. Subsequently, Ioannou et al. have developed a P1-derived artificial chromosome (PAC), by combining ...

... are (presumably) redundant gene copies so that loss of function can be tolerated, and might therefore be labelled as 'junk DNA'. They nevertheless represent an evolutionary resource and may evolve later on into new genes with novel functions. To some extent the amount of apparent debris or junk in the genome is related to genome size. Bacterial genomes carry mobile elements and phages, and some other repetitive elements, but in most cases relatively few obvious pseudogenes (although there are many potential coding sequences to which we cannot ascribe any function). The percentage of junk DNA is much higher in larger genomes such as those of mammals.

A library based on mRNA, rather than a genomic library, will reflect only those genes that are actually expressed in a particular cell or tissue sample at a particular time. Hence all the real junk will be eliminated. However, it goes further than that. A cell will use only a part of its genetic capability at any one time. A bacterial cell switches genes on or off according to its environment and its stage of growth. In a multicellular organism, differentiation of cells into tissues and organs will be reflected in more or less permanent changes in the nature of the genes that are expressed. Some genes will be active only during specific developmental stages; others will be active during specific times of the day. A library can also be prepared from a cancerous sample, or from an individual with a genetic disease. A library of this sort will reflect the nature of the cells from which the mRNA was obtained. As we will see later on, this not only decreases very substantially the number of clones needed for a representative library, but also provides us with a variety of ways in which we can focus attention on the differences between various cells or tissues. It thus enables us to identify genes that are selectively expressed in different environments or in different tissues.

However, mRNA cannot be cloned directly. We have to produce a complementary DNA (cDNA) copy; hence the designation of such a library as a cDNA library. The synthesis of the cDNA is carried out using an enzyme known as reverse transcriptase. (Since transcription refers to the production of RNA from a DNA template, the opposite process, RNA-directed DNA synthesis, is known as reverse transcription.) Although this is not a normal process in most cells, some types of viruses, such as leukaemia viruses and HIV, replicate in this fashion; the viral particle contains RNA which is copied into DNA after infection, using a virus-encoded enzyme. Some cellular DNA polymerases are also capable of reverse transcription.

Isolation of mRNA

Most of the RNA in a cell is not messenger RNA. The initial RNA preparation will contain substantial amounts of ribosomal RNA and transfer RNA. For the effective production of cDNA to form a library, it is highly desirable to purify the mRNA. This can be done with eukaryotic cells, where we can take advantage of the fact that mRNA carries a tail at the 3' end: a string of A residues that is added post-transcriptionally. The polyadenylated mRNA will anneal to synthetic oligo(dT) sequences (i.e., short polymers of deoxythymidine, or in other words, short stretches of synthetic DNA containing just T residues). Other RNA species, and non-RNA components, will not anneal and can be rinsed off. Although in bacteria some mRNA does have polyA tails, these are much shorter and only a small proportion of mRNA is polyadenylated. Therefore, this does not provide a reliable way of isolating bacterial mRNA. The question of the production of bacterial cDNA is considered below.
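The principle of oligo(dT) selection can be mimicked in a few lines: molecules with a sufficiently long run of A residues at the 3' end are retained, and everything else flows through. This is only a toy model; the function names, the example sequences and the minimum tail length of 15 are illustrative assumptions, not experimental values.

```python
# Toy model of oligo(dT) selection: retain RNAs with a long 3' poly(A) tail.
# The threshold of 15 residues and the sequences are illustrative assumptions.


def polya_tail_length(rna):
    """Number of consecutive A residues at the 3' end of an RNA sequence."""
    return len(rna) - len(rna.rstrip("A"))


def oligo_dt_select(rnas, min_tail=15):
    """Split named RNAs into (bound, flow_through) lists by 3' poly(A) tail length."""
    bound, flow_through = [], []
    for name, seq in rnas.items():
        if polya_tail_length(seq) >= min_tail:
            bound.append(name)
        else:
            flow_through.append(name)
    return bound, flow_through


if __name__ == "__main__":
    sample = {
        "mRNA": "AUGGCUUAG" + "A" * 30,   # polyadenylated message
        "rRNA": "GGCUACGUAG",              # no tail
        "tRNA": "GCGGAUUUAGCUCCA",         # ends in a single A only
    }
    bound, flow_through = oligo_dt_select(sample)
    print("retained:", bound)
    print("washed through:", flow_through)
```

The short tails described for bacterial mRNA would fall below any reasonable threshold in such a model, which reflects why the method is unreliable for bacterial mRNA.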

Genomic and cDNA Libraries

Fig. 6.5. Principle of oligo-dT purification of mRNA. (Diagram labels: mixture of mRNA (polyadenylated) and other components; mRNA sticks to the oligo-dT beads; other components do not stick; mRNA washed off by elution buffer.)

The polyA tail will anneal to the oligo(dn residues and will be retained on the column while other RNA species will pass through when the RNA preparation is passed through a column of a polymer coated with synthetic oligo(dn fragments. This is in effect a hybridization process, and as such the hybrids can be made unstable by decreasing the salt concentration and raising the temperature, and enabling the elution of purified mRNA from the column. This will contain complex mixture of all the mRNA species present in the cell at the time of extraction. The relative amounts of the different transcripts will differ substantially, which has major implications for the ease of setting certain cDNA clones. This is a further clear distinction from a genomic library. Although the following description is presented in terms of a single mRNA, bear in mind that we would in reality be dealing with a complex mixture. Wash buffer

[Figure: a mixture of polyadenylated mRNA, other RNA and other components is loaded onto a column of beads carrying oligo-dT; non-mRNA components are washed through with wash buffer, and the mRNA is then eluted with elution buffer.]

Fig. 6.6. Purification of mRNA through oligo-dT column.
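
The selection principle behind the oligo-dT column can be illustrated with a small sketch. It is a toy model only: real binding depends on salt, temperature and tail accessibility. The function name, the example sequences and the minimum tail length of 15 residues are assumptions made for illustration and do not come from the text.

```python
def split_by_polya(rnas, min_tail=15):
    """Partition RNA sequences into (bound, flow_through) name lists.

    A sequence is treated as 'bound' to the oligo-dT beads when it ends
    in at least `min_tail` consecutive A residues (hypothetical cut-off).
    """
    bound, flow_through = [], []
    for name, seq in rnas.items():
        # Length of the run of A residues at the 3' end of the sequence.
        tail = len(seq) - len(seq.rstrip("A"))
        (bound if tail >= min_tail else flow_through).append(name)
    return bound, flow_through


# Made-up sequences: one polyadenylated mRNA and two non-polyadenylated RNAs.
sample = {
    "mRNA_1": "AUGGCCUUAGC" + "A" * 30,
    "rRNA_fragment": "GGCUAGCUAGGCUA",
    "tRNA_like": "GCGGAUUUAGCUCAGUUGGGAGAGCGCCA",
}
bound, flow_through = split_by_polya(sample)
print("retained on column:", bound)
print("washed through:", flow_through)
```

Lowering `min_tail` in this model mimics the situation described for bacteria, where short tails make the selection unreliable.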

cDNA Synthesis

The reverse transcription step is also helped by the presence of the polyA tail. Reverse transcriptase, like DNA-directed DNA polymerase, requires a primer to start synthesis. An oligo(dT) primer will anneal to the polyA tail; using the mRNA as the template, reverse transcriptase will extend this primer. Thus a single-stranded cDNA copy will be produced.

Now there is a double-stranded molecule that is a hybrid between DNA and RNA. To obtain a molecule that is stable and can be cloned into a vector, the RNA has to be replaced with a DNA strand of the same sequence. First, the RNA is partially degraded by a specific RNase called RNase H, which attacks the RNA strand of RNA-DNA hybrids. This leaves the cDNA strand largely in single-stranded form. Single-stranded nucleic acid molecules tend to form secondary structures, looping back on themselves, because of the hydrophobicity of the bases. Therefore, the single-stranded cDNA will tend to form a hairpin loop at the 3' end. This hairpin loop, and the partial remains of the RNA strand, are used by DNA polymerase I as primers for second strand synthesis.

[Figure: the oligo-dT primer anneals to the polyA tail of the mRNA; reverse transcriptase produces an mRNA-DNA hybrid; the mRNA is removed (alkaline degradation or RNase H); a hairpin forms at the 3' end of the cDNA; DNA polymerase synthesizes the second strand; the hairpin is removed and the ends are filled in with DNA polymerase to give double-stranded cDNA.]

Fig. 6.7. Synthesis of cDNA from mRNA.

The product is a double-stranded DNA molecule with a hairpin loop at one end. That loop is then removed by treatment with S1 nuclease (which will cut single-stranded DNA, including exposed loops). Further treatment with DNA polymerase will ensure that the molecule is fully blunt-ended. For cloning the cDNA, adapters are added by blunt-end ligation to make the cDNA molecules compatible with the chosen vector. After size fractionation, which eliminates excess adapters and small abortive cDNA fragments, the library is inserted into the vector in a second ligation. Because transcripts are not usually longer than a few kb, larger vectors are not considered for cDNA libraries; the choice is essentially between a plasmid vector and a phage lambda insertion vector such as λgt10 or λgt11.

[Figure: adapter oligonucleotides are joined to the cDNA by blunt-end ligation.]

Fig. 6.8. Cloning cDNA.

One limitation of the basic procedure described above is that you may not get full-length cDNA. Obstacles such as elements of secondary structure in the mRNA may interfere with reverse transcription, so that the enzyme often fails to reach the end of the mRNA. As a result, the regions at the 5' end of the mRNA may be under-represented in the cDNA library. This can be partially addressed by using random primers rather than oligo-dT primers. These random primers will initiate first strand cDNA synthesis at intermediate points, and hence the enzyme will be more likely to reach the 5' end of the mRNA. Obviously, you will not then get any clones containing full-length cDNA, but the clones carrying the 5' end can be compared to other clones carrying the 3' portion, making it possible to devise strategies for obtaining full-length molecules. The use of random primers rather than oligo-dT primers also overcomes the problem that some RNA molecules (e.g. bacterial mRNA and genomic RNA from viruses) are not polyadenylated.

[Figure: (a) oligo-dT primer, incomplete products: first-strand and second-strand synthesis give cDNA products that lack the 5' end of the mRNA; (b) random primers: enrich for 5' ends; (c) tailing enhances representation of 5' ends: first-strand synthesis, removal of mRNA, tailing with terminal transferase plus dCTP, annealing of an oligo-dG primer, and second-strand synthesis.]


Fig. 6.9. cDNA synthesis: enhancing representation of 5' mRNA ends.
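
The sequence relationship between the mRNA and the two cDNA strands can be made concrete with a short sketch. It models only base complementarity, not the enzymology (priming, hairpins or incomplete extension); the function names and the example sequence are hypothetical.

```python
# Base pairing used when a DNA strand is copied from an RNA template.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
# Base pairing used when a DNA strand is copied from a DNA template.
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """First-strand cDNA (written 5'->3') templated by an mRNA (5'->3')."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna))


def second_strand_cdna(first_strand):
    """Second strand (written 5'->3') templated by the first cDNA strand."""
    return "".join(DNA_TO_DNA[base] for base in reversed(first_strand))


mrna = "AUGGCUUAA" + "A" * 5          # made-up transcript with a short polyA tail
first = first_strand_cdna(mrna)       # begins with the oligo(dT) primer region
second = second_strand_cdna(first)    # same sequence as the mRNA, with T for U
print(first)
print(second)
```

The second strand reproduces the mRNA sequence in DNA form, which is why the double-stranded product can stand in for the transcript in a cloning vector.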

The requirement for hairpin formation to prime second strand synthesis can also reduce the representation of the 5' ends. One strategy to counteract this is to use terminal transferase to add a tail to the 3' end of the first cDNA strand. Terminal transferase, if provided with, say, dCTP, will add a string of C residues to the 3' ends of DNA molecules. This enables the use of an oligo-dG primer to initiate second strand synthesis, without requiring hairpin formation. Many variations of these strategies have been devised to obtain full-length cDNA. The most powerful strategy, known as rapid amplification of cDNA ends (RACE), exploits the amplification power of the polymerase chain reaction.

Bacterial cDNA

The arguments in favour of cDNA libraries, rather than genomic libraries, carry much less force with bacterial targets. The smaller size of bacterial genomes, and the (virtual) absence of introns, mean that a genomic library is usually quite adequate, and a lot easier to construct. There are also additional technical difficulties in producing cDNA from bacteria. Not only is the mRNA not consistently polyadenylated, but it is also very unstable: many bacterial mRNA species have a half-life (in vivo) of only a minute or two. Furthermore, the organization of bacterial genes into polycistronic operons (groups of genes that are transcribed into a single long mRNA) means that a bacterial mRNA can be as much as 10-20 kb in length. Not only is it difficult to isolate this mRNA intact, but it would also be very difficult to produce a full-length cDNA copy from it. As a result, bacterial cDNA libraries are rarely produced. However, for some purposes, such as the analysis of gene expression, cloning of bacterial cDNA can play an important role, for example in identifying those transcripts that are relatively abundant in the bacterial cells under selected conditions.

RANDOM, ARRAYED AND ORDERED LIBRARIES

So far we have treated a gene library as a single tube containing a mixture of a large number of clones. When you want to screen the library, you plate it out to give plates with a large number of bacterial colonies or phage plaques. This is perfectly adequate for many purposes. However, there are circumstances when you do not want to treat the library as a random collection of a large number of clones. For example, in a very complex screening process, you may be able to test only a portion of the library at a time. This might mean taking a small aliquot of the library and testing, say, 100 clones. If you then take another aliquot, and again test 100 clones, then some of those clones may be the same ones that you have already tested. (Remember that your library has been amplified, so each independent clone is present in many copies.) One way to avoid this is to produce an arrayed or gridded library. If we go back to the original transformation step (or infection with the packaged phage particles), then instead of simply pooling all the clones and storing them in one tube, we can pick them individually and store them separately. This can be done using individual wells in microtitre trays, or sometimes on filters. If you are doing this manually, then you are limited in the size of library that you can handle (depending on how patient you are). For a bacterial library with, say, a few thousand clones, this is possible. However, there are now machines available that will identify colonies on a plate, pick them individually and transfer them to individual wells in a microtitre tray.
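
The redundancy problem with repeated aliquots can be quantified with a simple estimate. The sketch below assumes, for illustration only, that the amplified library contains N independent clones in equal proportions, so that each tested clone is an independent random draw; the library size of 1000 is a made-up figure. Under that assumption the expected number of different clones seen after testing k clones is N(1 - (1 - 1/N)^k).

```python
def expected_distinct(n_clones, n_tested):
    """Expected number of different independent clones among n_tested draws.

    Assumes every independent clone is equally represented in the amplified
    library, so that draws are effectively made with replacement.
    """
    return n_clones * (1 - (1 - 1 / n_clones) ** n_tested)


library_size = 1000  # hypothetical number of independent clones
for tested in (100, 200, 1000):
    distinct = expected_distinct(library_size, tested)
    print(f"{tested} clones tested: about {distinct:.0f} distinct, "
          f"{tested - distinct:.0f} repeats")
```

In an arrayed library the repeats disappear by construction, because each well holds a different clone.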

[Figure: a random library is plated out; individual colonies are picked to a microtitre plate to give an arrayed (gridded) library, which can then be replicated and screened.]

Fig. 6.10. Production of an arrayed, or gridded, library.
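
Keeping track of an arrayed library amounts to giving every clone a fixed address. A minimal sketch follows, assuming the common 96-well tray layout (rows A to H, columns 1 to 12); the layout and the function name are assumptions for illustration, since the text does not specify a tray format.

```python
ROWS = "ABCDEFGH"   # assumed 96-well layout: 8 rows
COLUMNS = 12        # and 12 columns


def well_address(clone_index):
    """Map a zero-based clone index to (tray number, well label)."""
    wells_per_tray = len(ROWS) * COLUMNS
    tray, offset = divmod(clone_index, wells_per_tray)
    row, column = divmod(offset, COLUMNS)
    return tray + 1, f"{ROWS[row]}{column + 1}"


for index in (0, 13, 95, 96):
    tray, well = well_address(index)
    print(f"clone {index}: tray {tray}, well {well}")
```

Because the mapping is fixed, a positive spot on a replica filter can be traced straight back to the well holding the viable clone.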

Once you have an arrayed library in a microtitre tray, it is then relatively easy to subculture the clones to trays with fresh culture medium, producing multiple identical copies of your library. You can then work your way through the library, testing each clone individually, knowing that every well contains a different clone. Alternatively, the library can be replicated to agar plates, or directly to membranes, which can then be screened to identify specific clones. Replicating such a library onto a membrane is one way of forming another type of array, consisting of spots of DNA rather than viable clones. For a larger library, for example a human gene library, you would want to screen the library at a much higher density. That is, you would want to put the spots on the filter much closer together than would be obtained by merely replicating from the microtitre tray. This needs precise positioning of the spots on the filter, which is achieved by using another robot.

The availability of genome sequence data makes it possible to produce DNA arrays without constructing a gene library, using PCR products or synthetic oligonucleotides. These use either nylon membranes (macroarrays) or glass slides (microarrays), and are especially important for analysing variations in genome content and for genome-wide analyses of transcription. It is not necessary to be able to produce your own arrays, or to own your own robots. Arrays representing genomic and cDNA libraries from a considerable range of organisms are readily available from public and commercial resource centres, as are facilities for producing such arrays from your own libraries.

An arrayed library still consists of a random set of clones. Without screening, we have no information as to the nature of the insert in each clone, or the relationship between clones. A further step in the development of the concept is to establish which clones overlap, so that we can produce a set of clones that can be arranged in order so as to cover the whole genome. This can be done by hybridization between sets of clones: if clone 1 hybridizes to clone 2, then we know that their inserts overlap and are adjacent in the genome. Because of the inherent redundancy in a random library, the number of clones required in an ordered library is much smaller, and some of the clones in the random library will not be needed. The actual number of clones needed will depend on the degree of overlap between the chosen clones, as well as the insert size.

An ordered library can be an important resource. Ordered libraries have contributed substantially to some of the genome sequencing projects; notably, the public human genome sequencing consortium used an ordered library of BAC clones. However, the work required to establish an ordered library is substantial, even for a relatively small genome such as that of a bacterium, and even using relatively large inserts such as those in a cosmid library. More recent developments in sequencing technology, and in the computer techniques for assembling sequence data from large numbers of small fragments, provide alternatives to the production of ordered libraries as strategies for genome sequencing.

[Figure: clones 1 to 4 in a random genomic library; (a) clone 1 hybridizes to clone 2; (b) clone 2 hybridizes to clone 3; (c) clone 3 hybridizes to clone 4.]

Fig. 6.11. Production of an ordered library.
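
The logic of ordering clones from pairwise hybridization results can be sketched as follows. This is a deliberately simplified illustration, not a description of how any genome project built its ordered library: it assumes that the chosen clones form a single linear chain in which each clone overlaps only its immediate neighbours, and the function and clone names are hypothetical.

```python
from collections import defaultdict


def order_clones(overlaps):
    """Arrange clones in genome order from a list of overlapping pairs.

    Assumes the overlaps describe one linear chain; raises ValueError
    if they branch, form a circle or fall into separate groups.
    """
    neighbours = defaultdict(set)
    for a, b in overlaps:
        neighbours[a].add(b)
        neighbours[b].add(a)
    if any(len(linked) > 2 for linked in neighbours.values()):
        raise ValueError("a clone overlaps more than two others")
    # The two clones at the ends of the chain have only one neighbour.
    ends = sorted(c for c, linked in neighbours.items() if len(linked) == 1)
    if len(ends) != 2:
        raise ValueError("overlaps do not form a single linear chain")
    order, previous, current = [], None, ends[0]
    while current is not None:
        order.append(current)
        onward = [c for c in neighbours[current] if c != previous]
        previous, current = current, (onward[0] if onward else None)
    if len(order) != len(neighbours):
        raise ValueError("some clones are not connected to the chain")
    return order


# Hybridization results in arbitrary order, as in Fig. 6.11.
results = [("clone3", "clone4"), ("clone1", "clone2"), ("clone2", "clone3")]
print(order_clones(results))
```

Real data are noisier (repeated sequences produce spurious hybridization, and redundant clones overlap several others), which is part of why establishing an ordered library requires so much work.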


7 FINDING THE RIGHT CLONE

SCREENING LIBRARIES WITH GENE PROBES

We have already studied how to establish a gene library. Whether it is a genomic library or a cDNA library, it is a large collection of random clones, and we now have to find a way of identifying which clone(s) carry the gene that we intend to study. This means that we need rapid screening of very large numbers of clones. This is usually done by using a nucleic acid probe (DNA or RNA), which will hybridize to the DNA sequence we are looking for in a specific clone. The principle is that the library (in the form of bacterial colonies or phage plaques) is replicated onto a filter, which is then treated to release the DNA and bind it to the filter. The filter then carries a pattern of DNA spots that replicates the position of the colonies or plaques on the original plate. The filter is then hybridized with the probe, which has first been labelled so that it can be easily detected. This allows you to detect which DNA spots hybridize to the probe, and to isolate the corresponding clones from the original plate. In this section we will also study alternative strategies, including the use of antibodies to screen an expression library.

Hybridization

Hybridization is based on the difference in stability between the covalent bonds in the nucleic acid backbone of each strand, and the much weaker hydrogen bonds that hold the two strands of the double helix together by base pairing. This arrangement allows the two strands to be separated safely, both in the cell and in the test tube, under conditions that are much too mild to pose any threat to the covalent bonds. This is referred to as denaturation of DNA, and it is reversible. The strands will readily join together again, or renature, because of the complementarity of the base pairs. In the test tube, DNA is readily denatured by heating, and the denaturation process is therefore often referred to as melting even when it is accomplished enzymatically (e.g. by a helicase) or chemically (e.g. by NaOH). During denaturation, the separation of the strands causes a drastic change in the physical properties of DNA, such as optical density. The optical density changes dramatically during a short temperature interval when the DNA melts, and then stabilizes after the strands have separated entirely. The midpoint of this temperature interval is denoted as the melting temperature (Tm).

[Figure: melting curve of DNA plotted against temperature (°C).]

Fig. 7.1. Melting (denaturation) of DNA.
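
Reading the Tm off a melting curve can be sketched numerically. The example below takes the Tm as the temperature at which the optical density is halfway between its lowest and highest values, found by linear interpolation between measurements. The data points are invented for illustration, and the sketch assumes a curve that rises with temperature, as happens when DNA absorbance increases on strand separation.

```python
def melting_temperature(temperatures, optical_densities):
    """Estimate Tm as the midpoint of the optical density transition."""
    lowest, highest = min(optical_densities), max(optical_densities)
    halfway = (lowest + highest) / 2
    points = list(zip(temperatures, optical_densities))
    # Find the pair of neighbouring measurements that brackets the midpoint.
    for (t0, od0), (t1, od1) in zip(points, points[1:]):
        if od0 <= halfway <= od1 and od1 != od0:
            return t0 + (halfway - od0) * (t1 - t0) / (od1 - od0)
    raise ValueError("no rising transition found in the data")


# Invented measurements: temperature in degrees C and relative optical density.
temps = [60, 70, 80, 85, 90, 95, 100]
ods = [1.00, 1.00, 1.02, 1.10, 1.30, 1.38, 1.40]
print(f"Estimated Tm: {melting_temperature(temps, ods):.1f} degrees C")
```

With more densely sampled data, the same midpoint could instead be located from the steepest part of the curve.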


Under physiological conditions, the Tm is usually 85-95°C. In the laboratory, other factors such as the salt concentration can be adjusted to bring the melting temperature down to a more convenient range. The Tm varies according to the base composition of the DNA because guanine-cytosine base pairs are joined by three hydrogen bonds, whereas adenine-thymine base pairs have only two. For a large DNA molecule, the base composition can therefore be estimated by measuring the Tm; conversely, if we know the base composition, we can calculate the Tm. For shorter sequences, such as the 20-30 base synthetic oligonucleotides that are commonly used as primers, other factors have to be taken into account. The bond strength between two bases (expressed as ΔG, the energy released on formation of a base pair) depends on the adjacent bases, because hydrophobic interactions between adjacent bases (stacking) also affect the stability of the pairing. Fig. 7.2 shows that the free energy of base pairing (kcal/mol) for the CG/GC doublet is not the same as for the GC/CG doublet. The negative values show that energy is released on formation of a base-paired structure; the more energy released, the greater the stability. For short oligonucleotides, calculation of the Tm therefore has to take account of the context of each base in the sequence. Although the normal base pairs (A-T and G-C) are the only forms that are fully compatible with the canonical Watson-Crick double helix, pairing of other bases can occur.

[Figure: free energy of base pairing (ΔG, kcal/mol) for four base-pair doublets; the values shown are -1.1, -0.9, -2.0 and -3.4.]

Fig. 7.2. Free energy of base pairing.

In addition to the hydrogen bonds between the bases on opposite strands, hydrophobic interactions between the bases help to maintain the double-stranded structure of DNA. A single-stranded structure, in which the bases are exposed to the aqueous environment, is unstable; pairing of the bases enables them to be removed from interaction with the surrounding water. In contrast to the hydrogen bonding, hydrophobic interactions are relatively non-specific, i.e., nucleic acid strands will tend to stick together even in the absence of specific base pairing, although the specific interactions enable a stronger association. The specificity of the interaction can be increased by the use of chemicals (such as formamide) that reduce the hydrophobic interactions. In a single-stranded nucleic acid such as RNA, removal of the bases from the surrounding water is accomplished by the formation of secondary structures in which the nucleic acid folds up on itself to form localized double-stranded regions, including structures referred to as hairpins or stem-loop structures. At room temperature, in the absence of denaturing agents, a single-stranded nucleic acid normally contains a complex set of such localized secondary structure elements.

Another major factor to be considered is the negative charge on the phosphate groups in the nucleic acid backbone. Its effect is opposite to that of hydrogen bonds and hydrophobic interactions: the strong negative charge on the DNA strands causes electrostatic repulsion that tends to push the two strands apart. In the presence of salt, this effect is counterbalanced by a cloud of counterions surrounding the molecule, neutralising the negative charge on the phosphate groups. However, if the salt concentration is decreased, any weak interactions between the strands will be disrupted by electrostatic repulsion; hence low salt conditions are used to increase the specificity of hybridization.

If two similar, but different, double-stranded DNA fragments are mixed, melted, and then left to renature, some of the strands will re-anneal with their perfectly complementary partners, but others will form hybrids between the two fragments. This would happen, for example, if you were to mix the cDNA molecules encoding the human red- and green-sensitive photopigments, or the actin genes from mouse and rat. The resulting hybrid double helix would be imperfect wherever the bases do not match. A higher number of mismatches will lead to a less stable hybrid, in other words one that would have a lower melting temperature.
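
The dependence of Tm on base composition can be illustrated with a widely used rule of thumb for short oligonucleotides, often called the Wallace rule, which allows 2°C for each A or T and 4°C for each G or C. This rule is not taken from the text above and is only a rough approximation: it ignores the nearest-neighbour (stacking) effects, salt concentration and mismatches just discussed. The function names and sequences are hypothetical.

```python
def gc_fraction(sequence):
    """Fraction of G and C bases in a DNA sequence."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def wallace_tm(oligo):
    """Rough Tm (degrees C) of a short oligonucleotide by the Wallace rule."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


# Three made-up 20-base oligonucleotides with different base compositions.
for oligo in ("A" * 20, "ATGC" * 5, "GC" * 10):
    print(f"{oligo}: GC fraction {gc_fraction(oligo):.2f}, "
          f"approximate Tm {wallace_tm(oligo)} degrees C")
```

For primer design in practice, nearest-neighbour calculations that sum the free energies of successive doublets give better estimates than this simple count.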


This proves to be very important when single strands of nucleic acids are hybridized in the laboratory. The investigator can choose conditions that are more or less forgiving of partial mismatches, depending on whether they are wanted or not. This is referred to as varying the stringency of the hybridization. Firstly, this can be done by altering the temperature: the higher the temperature, the less tolerant the conditions are of mismatches. Secondly, the salt concentration can be altered, since hybrids are more stable at higher salt concentrations. At low salt concentrations, the negative charges of the phosphate groups in the backbones cause an electrostatic repulsion between the two strands. At higher salt concentrations, the presence of positive counterions will relieve this repulsion. In the laboratory, salt concentration is usually expressed as multiples of SSC (Standard Saline Citrate; 1 x SSC is defined as 0.15 M sodium chloride and 0.015 M sodium citrate). Sometimes, formamide is added to the hybridization solution. Formamide lowers the melting temperature and is therefore used in situations where hybridization temperatures need to be kept low, such as when carrying out in situ hybridization.

[Figure: two similar DNA molecules are denatured, then mixed and annealed, giving a hybrid (heteroduplex) in addition to the original duplexes.]

Fig. 7.3. Formation of hybrid DNA between similar but non-identical DNA molecules.

In the laboratory, the basic types of hybridization used are filter hybridization, solution hybridization, and in situ hybridization. In all three of them, one nucleic acid fragment, the probe, is labelled in order to detect and locate a complementary DNA strand, the target.
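
Since wash stringency is usually quoted as a multiple of SSC, it can help to convert that multiple into actual concentrations. The sketch below uses the definition of 1 x SSC given above; the total sodium figure additionally assumes that the citrate is supplied as trisodium citrate (three sodium ions per citrate), which is an assumption not stated in the text. The function name is hypothetical.

```python
NACL_1X = 0.15      # mol/l sodium chloride in 1 x SSC
CITRATE_1X = 0.015  # mol/l sodium citrate in 1 x SSC


def ssc_composition(fold):
    """Concentrations (mol/l) in a solution of `fold` x SSC."""
    nacl = fold * NACL_1X
    citrate = fold * CITRATE_1X
    # Assumes trisodium citrate: three Na+ ions per citrate.
    sodium = nacl + 3 * citrate
    return {"nacl_M": nacl, "citrate_M": citrate, "sodium_M": sodium}


# Lower multiples of SSC mean less salt and therefore higher stringency.
for fold in (6, 2, 0.1):
    c = ssc_composition(fold)
    print(f"{fold} x SSC: NaCl {c['nacl_M']:.3f} M, "
          f"citrate {c['citrate_M']:.4f} M, total Na+ {c['sodium_M']:.4f} M")
```

A hybridization carried out at a high multiple of SSC followed by washes at a low multiple therefore moves from permissive to stringent conditions.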


Labelling Probes

A basic feature of nucleic acid hybridization is that the probe is labelled in a way that makes it possible to detect it. One of the classical labelling methods is to incorporate a radioactive isotope into the probe molecule. Isotopes that are used for this include ³²P, ³³P, ³⁵S, and ³H. A more energetic isotope, such as ³²P, gives a stronger signal but poorer resolution. This makes it useful for experiments where sub-millimetre resolution is irrelevant, such as Southern blot hybridization. At the other end of the scale, ³H gives a weaker signal and thus needs a much longer exposure time to be detectable. On the other hand, it provides excellent resolution, and can be used in experiments such as in situ hybridization, where it is important to be able to assign the probe labelling not only to specific cells, but even to specific regions of chromosomes.

The classical way of detecting radiolabelled probes is to place the probe-target hybrid on an X-ray film. Radioactive particles will expose the region of the film with which they are in contact, just as X-rays or visible light would. A more modern method for detecting binding of radioisotopes is phosphoimaging. This method needs specialized and expensive apparatus but is, on the other hand, faster, and the phosphoimaging plates can be reused, in contrast to X-ray films.

Many investigators use non-radioactive labelling methods instead of radioactive ones. In terms of safety, detection speed, and cost, these methods have advantages over radioactive ones. The labels are also unaffected by the continuous decay of radioisotopes, and are therefore more stable. There are various non-radioactive labels that can be used. Nucleotides substituted with biotin or digoxigenin can be incorporated into the probe, and then detected with specific antibodies (or, in the case of biotin, using avidin, which binds very strongly and specifically to biotin). The antibody (or avidin) that is used is itself labelled with an enzyme, such as horseradish peroxidase (HRP) or alkaline phosphatase, so it can be detected using a chromogenic substrate (i.e. a substrate that yields a coloured product when reacted with the enzyme) or a chemiluminescent substrate (where the initial reaction product is unstable and light is emitted as it breaks down). In the latter case, the emitted light will darken the X-ray film just as a radioisotope would. As with radioactive methods, specialized equipment can be used instead of X-ray film. Alternatively, the probe can be labelled directly with HRP, or a fluorescent label can be incorporated, which allows direct detection of the labelled probe. Such probes are especially useful in the technique known as fluorescent in situ hybridization, or FISH.

Steps in a Hybridization Experiment

Nucleic acid probes have a tendency to bind non-specifically to other materials on the filter, or even to the filter itself. The hybridization solution contains various blocking agents to minimize this non-specific probe binding. These may include detergents, bovine serum albumin, and non-homologous DNA. It is often advantageous to pretreat the filter with the hybridization solution without added probe (prehybridization). If the probe is double-stranded DNA, it has to be heat-denatured by boiling prior to hybridization in order to make it accessible to the target. It is then added to the target in the hybridization solution.

[Figure labels: homologous DNA; partially similar DNA; probe; hybridization; high stringency wash; probe washed off.]