English Pages 289 [271] Year 2021
Methods in Molecular Biology 2392
Chhandak Basu Editor
PCR Primer Design Third Edition
METHODS IN MOLECULAR BIOLOGY
Series Editor John M. Walker School of Life and Medical Sciences University of Hertfordshire Hatfield, Hertfordshire, UK
For further volumes: http://www.springer.com/series/7651
For over 35 years, biological scientists have come to rely on the research protocols and methodologies in the critically acclaimed Methods in Molecular Biology series. The series was the first to introduce the step-by-step protocols approach that has become the standard in all biomedical protocol publishing. Each protocol is provided in readily-reproducible step-by-step fashion, opening with an introductory overview, a list of the materials and reagents needed to complete the experiment, and followed by a detailed procedure that is supported with a helpful notes section offering tips and tricks of the trade as well as troubleshooting advice. These hallmark features were introduced by series editor Dr. John Walker and constitute the key ingredient in each and every volume of the Methods in Molecular Biology series. Tested and trusted, comprehensive and reliable, all protocols from the series are indexed in PubMed.
PCR Primer Design Third Edition
Edited by
Chhandak Basu Department of Biology, California State University, Northridge, Los Angeles, CA, USA
ISSN 1064-3745 ISSN 1940-6029 (electronic) Methods in Molecular Biology ISBN 978-1-0716-1798-4 ISBN 978-1-0716-1799-1 (eBook) https://doi.org/10.1007/978-1-0716-1799-1 © Springer Science+Business Media, LLC, part of Springer Nature 2007, 2015, 2022 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Humana imprint is published by the registered company Springer Science+Business Media, LLC, part of Springer Nature. The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.
Dedication In loving memory of my mother (Maa) Tapasi Basu
Preface
The importance of PCR was evident during the COVID-19 outbreak in 2020. Who knew a 1983 discovery by Dr. Kary Mullis (1944–2019) would save countless lives during the pandemic? In 1993, Dr. Mullis was awarded the Nobel Prize in Chemistry for discovering PCR. A confirmed negative "PCR test" (RT-PCR, to be precise) became a requirement for international travel during the pandemic. PCR has many applications, including genotyping, cloning, medical diagnostics, and agriculture, to name a few. The three major ingredients needed for PCR are template DNA, primers, and Taq polymerase. This book focuses explicitly on the design of primers for successful PCR amplification. This book contains a total of 18 chapters, and the chapters are divided into seven parts. Part I (Primer Design for Genotyping) includes seven chapters on PCR-based genotyping and genetic diversity studies. The chapter topics in this part include primer design for analysis of plant species, genotyping in C. elegans, allele-specific amplification, genetic diversity in marine species, detection of rubella, SNP-based detection in animal species, and the amplification-refractory mutation system (a PCR-based system to detect mutations and polymorphisms). Part II (Primer Design for Genome-Wide Identification of Specific Regions and Nucleic Acids) contains two chapters on primer design to identify specific motifs in nucleic acids, specifically circular RNA and AT-rich regions. Part III is about Primer Design for Multiplex PCR. Multiplex PCR can be used to amplify multiple target DNAs in a single reaction. This part contains two chapters on primer design for multiplex PCRs. The authors describe a step-by-step primer design protocol for multiplex PCR and a protocol for PLASmid TAXonomic PCR (PlasTax-PCR).
Part IV (Primer Design for qPCR) contains two chapters that focus on the use of qPCR for identification of gene copy numbers in transgenic plants and use of qPrimerDB software for qPCR primer design. Part V (Primer Design for Identification of Plant and Animal Viruses) contains two chapters. The chapters describe primer design to identify papaya umbra-like virus (plant virus) and the SARS-CoV-2 genome (animal virus). Part VI (Use of Software for Primer Design) contains a single chapter describing the use of FASTPCR software for primer design. Part VII (Primer Design for Newer PCR Approaches) includes PCR primer design for more recent technologies, including phosphate-methylated oligonucleotides as primers and pyrosequencing primer design for forensic biology applications. This book will be useful for researchers and students in various fields of molecular biology, including biotechnology, molecular genetics, and recombinant DNA technology. This book is an international collaborative effort by researchers from Oman, Belgium, India, Finland, Kazakhstan, Austria, China, Spain, Poland, Mexico, Iran, Saudi Arabia, Estonia, Sweden, Australia, and the USA. I sincerely thank all authors for their contributions.
Los Angeles, CA, USA
Chhandak Basu
Contents

Dedication
Preface
Contributors

PART I PRIMER DESIGN FOR GENOTYPING

1 The Significance of PCR Primer Design in Genetic Diversity Studies: Exemplified by Recent Research into the Genetic Structure of Marine Species
Madjid Delghandi, Marit Pedersen Delghandi, and Stephen Goddard
2 Enhancing Cohort PASA Efficiency from Lessons Assimilated by Mutant Genotyping in C. elegans
Amita Pandey, Binu Bhat, Madan L. Aggarwal, and Girdhar K. Pandey
3 Design of Oligonucleotides for Allele-Specific Amplification Based on PCR and Isothermal Techniques
Luis Antonio Tortajada-Genaro
4 Detection of Rubella Virus by Tri-Primer RT-PCR Assay and Genotyping by Fragment RT-PCR
Suji George
5 Design of Mismatch Primers to Identify and Differentiate Closely Related (Sub)Species: Application to the Authentication of Meat Products
Maria Kaltenbrunner, Rupert Hochegger, and Margit Cichna-Markl
6 Primer Design for the Analysis of Closely Related Species: Application of Noncoding mtDNA and cpDNA Sequences
Lidia Skuza
7 Designing PCR Primers for the Amplification-Refractory Mutation System
Majid Komijani, Khashayar Shahin, Esam Ibraheem Azhar, and Mohammad Bahram

PART II PRIMER DESIGN FOR GENOME-WIDE IDENTIFICATION OF SPECIFIC REGIONS

8 Validation of Circular RNAs by PCR
Aniruddha Das, Debojyoti Das, and Amaresh C. Panda
9 Primer Designing for Amplifying an AT-Rich Promoter from Arabidopsis thaliana
Pinky Dhatterwal, Sandhya Mehrotra, and Rajesh Mehrotra

PART III PRIMER DESIGN FOR MULTIPLEX PCR

10 PLASmid TAXonomic PCR (PlasTax-PCR), a Multiplex Relaxase MOB Typing to Assort Plasmids into Taxonomic Units
Raquel Cuartas, Teresa M. Coque, Fernando de la Cruz, and M. Pilar Garcillán-Barcia
11 Multiplex PCR Design for Scalable Resequencing
Darren Korbie and Matt Trau

PART IV PRIMER DESIGN FOR QPCR

12 Identification of Gene Copy Number in the Transgenic Plants by Quantitative Polymerase Chain Reaction (qPCR)
Poonam Kanwar, Soma Ghosh, Sibaji K. Sanyal, and Girdhar K. Pandey
13 qPrimerDB: A Powerful and User-Friendly Database for qPCR Primer Design
Wei Chang, Yue Niu, Mengna Yu, Tian Li, Jiana Li, and Kun Lu

PART V PRIMER DESIGN FOR IDENTIFICATION OF PLANT AND ANIMAL VIRUSES

14 PCR Primer Design for the Rapidly Evolving SARS-CoV-2 Genome
Wubin Qu, Jiangyu Li, Haoyang Cai, and Dongsheng Zhao
15 Universal Primers for Detection of Novel Plant Capsid-Less Viruses: Papaya Umbra-like Viruses as Example
Jorge H. Ramirez-Prado and Luisa A. Lopez-Ochoa

PART VI USE OF SOFTWARE FOR PRIMER DESIGN

16 A Guide to Using FASTPCR Software for PCR, In Silico PCR, and Oligonucleotide Analysis
Ruslan Kalendar

PART VII PRIMER DESIGN FOR NEWER PCR APPROACHES

17 Pyrosequencing Primer Design for Forensic Biology Applications
Kelly M. Elkins
18 Phosphate-Methylated Oligonucleotides as a Novel Primer for PCR and RT-PCR
Yu-Hsuan Chang, Meng-Wei Wu, Yi-Ju Chen, Cao-An Vu, Ching-Ya Hong, and Wen-Yih Chen

Index
Contributors

MADAN L. AGGARWAL • Analytical Science Division-Bio, Molecular Biology Laboratory, Shriram Institute for Industrial Research, Delhi, India
ESAM IBRAHEEM AZHAR • Department of Laboratory Technology, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
MOHAMMAD BAHRAM • Department of Ecology, Swedish University of Agricultural Sciences, Uppsala, Sweden; Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia
BINU BHAT • Analytical Science Division-Bio, Molecular Biology Laboratory, Shriram Institute for Industrial Research, Delhi, India
HAOYANG CAI • Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resources and Eco-Environment, College of Life Sciences, Sichuan University, Chengdu, China
WEI CHANG • College of Agronomy and Biotechnology, Southwest University, Chongqing, China
YU-HSUAN CHANG • Department of Chemical and Materials Engineering, National Central University, Jhongli, Taiwan
WEN-YIH CHEN • Department of Chemical and Materials Engineering, National Central University, Jhongli, Taiwan
YI-JU CHEN • Department of Chemical and Materials Engineering, National Central University, Jhongli, Taiwan
MARGIT CICHNA-MARKL • Department of Analytical Chemistry, Faculty of Chemistry, University of Vienna, Vienna, Austria
TERESA M. COQUE • Department of Microbiology, Ramon y Cajal University Hospital, Ramon y Cajal Health Research Institute (IRYCIS), Madrid, Spain
RAQUEL CUARTAS • Instituto de Biomedicina y Biotecnología de Cantabria (Consejo Superior de Investigaciones Científicas CSIC-Universidad de Cantabria), Santander, Spain
ANIRUDDHA DAS • Institute of Life Sciences, Bhubaneswar, Odisha, India; School of Biotechnology, KIIT University, Bhubaneswar, Odisha, India
DEBOJYOTI DAS • Institute of Life Sciences, Bhubaneswar, Odisha, India; School of Biotechnology, KIIT University, Bhubaneswar, Odisha, India
FERNANDO DE LA CRUZ • Instituto de Biomedicina y Biotecnología de Cantabria (Consejo Superior de Investigaciones Científicas CSIC-Universidad de Cantabria), Santander, Spain
MADJID DELGHANDI • Centre of Excellence in Marine Biotechnology, Sultan Qaboos University, Al-Khoud, Sultanate of Oman
MARIT PEDERSEN DELGHANDI • College of Medicine & Health Sciences, Sultan Qaboos University, Al-Khoud, Sultanate of Oman
PINKY DHATTERWAL • Department of Biological Sciences, Birla Institute of Technology & Science Pilani, Sancoale, Goa, India
KELLY M. ELKINS • TU Human Remains Identification Laboratory (THRIL), Chemistry Department, Forensic Science Program, Towson University, Towson, MD, USA
M. PILAR GARCILLÁN-BARCIA • Instituto de Biomedicina y Biotecnología de Cantabria (Consejo Superior de Investigaciones Científicas CSIC-Universidad de Cantabria), Santander, Spain
SUJI GEORGE • Diagnostic Virology Group, National Institute of Virology, Pune, Maharashtra, India
SOMA GHOSH • Department of Plant Molecular Biology, University of Delhi South Campus, New Delhi, India
STEPHEN GODDARD • Water Farmers, Hatrival, Belgium
RUPERT HOCHEGGER • Department of Molecular Biology and Microbiology, Austrian Agency for Health and Food Safety, Institute for Food Safety Vienna, Vienna, Austria
CHING-YA HONG • Department of Chemical and Materials Engineering, National Central University, Jhongli, Taiwan
RUSLAN KALENDAR • PrimerDigital Ltd, Biocentre 3, Helsinki, Finland; National Laboratory Astana, Nazarbayev University, Nur-Sultan, Kazakhstan
MARIA KALTENBRUNNER • Department of Molecular Biology and Microbiology, Austrian Agency for Health and Food Safety, Institute for Food Safety Vienna, Vienna, Austria; Department of Analytical Chemistry, Faculty of Chemistry, University of Vienna, Vienna, Austria
POONAM KANWAR • Department of Plant Molecular Biology, University of Delhi South Campus, New Delhi, India
MAJID KOMIJANI • Department of Biology, Faculty of Science, Arak University, Arak, Iran
DARREN KORBIE • Centre for Personalised Nanomedicine, Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, QLD, Australia
JIANA LI • College of Agronomy and Biotechnology, Southwest University, Chongqing, China; Engineering Research Center of South Upland Agriculture, Ministry of Education, Chongqing, China; Academy of Agricultural Sciences, Southwest University, Chongqing, China
JIANGYU LI • Information Center, Academy of Military Medical Science, Beijing, China
TIAN LI • State Key Laboratory of Silkworm Genome Biology, Southwest University, Chongqing, China; Chongqing Key Laboratory of Microsporidia Infection and Control, Southwest University, Chongqing, China
LUISA A. LOPEZ-OCHOA • Plant Biochemistry and Molecular Biology Unit, Yucatan Center for Scientific Research, A.C. (CICY), Merida, Yucatan, Mexico
KUN LU • College of Agronomy and Biotechnology, Southwest University, Chongqing, China; Engineering Research Center of South Upland Agriculture, Ministry of Education, Chongqing, China; Academy of Agricultural Sciences, Southwest University, Chongqing, China
RAJESH MEHROTRA • Department of Biological Sciences, Birla Institute of Technology & Science Pilani, Sancoale, Goa, India
SANDHYA MEHROTRA • Department of Biological Sciences, Birla Institute of Technology & Science Pilani, Sancoale, Goa, India
YUE NIU • College of Agronomy and Biotechnology, Southwest University, Chongqing, China
AMARESH C. PANDA • Institute of Life Sciences, Bhubaneswar, Odisha, India
AMITA PANDEY • Analytical Science Division-Bio, Molecular Biology Laboratory, Shriram Institute for Industrial Research, Delhi, India; Department of Plant Molecular Biology, University of Delhi South Campus, New Delhi, India
GIRDHAR K. PANDEY • Department of Plant Molecular Biology, University of Delhi South Campus, New Delhi, India
WUBIN QU • Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resources and Eco-Environment, College of Life Sciences, Sichuan University, Chengdu, China; iGeneTech Bioscience Co., Ltd, Beijing, China
JORGE H. RAMIREZ-PRADO • Biotechnology Unit, Yucatan Center for Scientific Research, A.C. (CICY), Merida, Yucatan, Mexico
SIBAJI K. SANYAL • Department of Plant Molecular Biology, University of Delhi South Campus, New Delhi, India
KHASHAYAR SHAHIN • State Key Laboratory Cultivation Base of MOST, Institute of Food Safety and Nutrition, Jiangsu Academy of Agricultural Sciences, Nanjing, People's Republic of China
LIDIA SKUZA • Institute of Biology, University of Szczecin, Szczecin, Poland; The Centre for Molecular Biology and Biotechnology, University of Szczecin, Szczecin, Poland
LUIS ANTONIO TORTAJADA-GENARO • Instituto Interuniversitario de Investigacion de Reconocimiento Molecular y Desarrollo Tecnologico (IDM), Universitat Politècnica de València, Universitat de València, Valencia, Spain; Departamento de Química, Universitat Politècnica de València, Valencia, Spain; Unidad Mixta UPV-La Fe, Nanomedicine and Sensors, Valencia, Spain
MATT TRAU • Centre for Personalised Nanomedicine, Australian Institute for Bioengineering and Nanotechnology, University of Queensland, Brisbane, QLD, Australia; School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
CAO-AN VU • Department of Chemical and Materials Engineering, National Central University, Jhongli, Taiwan
MENG-WEI WU • Department of Chemical and Materials Engineering, National Central University, Jhongli, Taiwan
MENGNA YU • College of Agronomy and Biotechnology, Southwest University, Chongqing, China
DONGSHENG ZHAO • Information Center, Academy of Military Medical Science, Beijing, China
Part I Primer Design for Genotyping

Chapter 1 The Significance of PCR Primer Design in Genetic Diversity Studies: Exemplified by Recent Research into the Genetic Structure of Marine Species
Madjid Delghandi, Marit Pedersen Delghandi, and Stephen Goddard

Abstract
Genetic markers are widely applied in the study of genetic diversity for many species. The approach incorporates Polymerase Chain Reaction (PCR) amplification of targeted sequences in the genome. Careful design of synthetic oligonucleotide primers is crucial for the overall success of a PCR experiment. Well-designed primer pairs ensure the efficiency and specificity of the amplification reaction, resulting in a high yield of the desired amplicon. Criteria such as primer sequence, length, and melting temperature (Tm) are fundamental for the selection of primers and the amplification of targeted nucleotide sequences from a DNA template. Many computational tools are available to assist with critical bioinformatics issues related to primer design. These resources allow the user to define the parameters and criteria that need to be taken into account when designing primers. Following the initial in silico selection, a primer pair should be tested experimentally for its amplification efficiency and robustness. Using examples taken from genetic diversity studies in a marine crustacean, this chapter outlines the application of PCR technology and discusses details of primer design for the development and characterization of microsatellite and SNP markers.

Key words Population genetics studies, Marine species, Molecular tools, Genetic markers, Microsatellite markers, SNP markers, Polymerase Chain Reaction, Oligonucleotide design
1 Introduction
Advances in biotechnology and the utilization of molecular markers facilitate the discovery of genetic variation among individuals, species, and higher order taxonomic groups. Generally, a marker of choice should be capable of addressing the research question while its genotyping procedure remains as simple and as low cost as possible. The most powerful and commonly used markers in the field of population genetics are microsatellites and single-nucleotide polymorphisms (SNPs) [1, 2].
Chhandak Basu (ed.), PCR Primer Design, Methods in Molecular Biology, vol. 2392, https://doi.org/10.1007/978-1-0716-1799-1_1, © Springer Science+Business Media, LLC, part of Springer Nature 2022
Fig. 1 A brief illustration of the classical approach for the development and characterization of microsatellite and SNP markers
Microsatellites are characterized by a high number of alleles at a single locus, resulting in high heterozygosity, and are easy to genotype by PCR [3–5]. Because of their mutation rates (1 × 10⁻⁵), the multiallelic microsatellites reveal more recent divergence and are well suited to generating linkage maps in marine species [6] and to the prognosis of human prostate cancer [7], monogenic diseases [8], and neurological disorders [9, 10]. SNPs are biallelic markers and are more reproducible and powerful for assessing gene flow, differentiation, admixture, migration, and connectivity in marine species [11–13]. They have also been utilized in a wide range of human disease prognostics such as lung [14] or pancreatic cancer [15] and vitamin D deficiency [16, 17]. For many species, the necessary genetic resources are lacking or have not been properly investigated. Hence, access to a wide range of functional marker types is fundamental for genomics-related studies (Fig. 1).

Classically, microsatellite markers are developed de novo. The approach involves the construction of a genomic library enriched for repeated motifs, isolation and sequencing of microsatellite-containing clones, primer design, optimization of PCR amplification for each primer pair, and a test of polymorphism on a few unrelated individuals [18–20].

SNP markers can now be detected using high-throughput technologies, i.e., next-generation sequencing (e.g., Illumina HiSeq and MiSeq platforms) or third-generation sequencing (e.g., PacBio and Nanopore technology) [1, 3, 21, 22]. These approaches have been successfully used to detect SNPs for various marine species [11–13, 23].
The essential component of the approaches highlighted above is the PCR technique. Any successful PCR-based assay relies mainly on the careful design of synthetic oligonucleotide primers. To ensure the specificity and efficiency of the amplification, key factors including primer sequence, length, and melting temperature (Tm) must be considered carefully. Primers with poor specificity tend to cause mismatches between the primers and the target genetic material, resulting in imbalanced PCR chemistry, impaired detection, and false-negative results. Designing optimal primers is even more critical for the overall success of a multiplex PCR experiment, where a set of markers is used to amplify several segments of target DNA simultaneously. Multiplexing of PCR amplification requires that several primer pairs are included in the reaction mix. Incorrect design of primers will lead to mispriming of the target DNA or the formation of primer dimers and cause nonspecific amplification. Consequently, designing primers of adequate lengths and sequences will ensure an optimal annealing temperature and provide optimal conditions for a specific and efficient multiplex amplification. This chapter describes the most frequently and widely used methods for the isolation and characterization of microsatellite and SNP markers. It also provides details of primer design, PCR amplification, and multiplexing of these markers and their utilization in comprehensive functional analysis.
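The primer properties named above (sequence, length, Tm, and the risk of primer dimers) can be screened with a few lines of code before any wet-lab work. The following Python sketch is illustrative only and is not taken from the chapter: the function names, the 5 °C Tm tolerance, and the 4-base 3′-end check are assumptions, and Tm is estimated with the simple Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a rough guide for short oligonucleotides.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gc_percent(primer: str) -> float:
    """GC content of a primer in percent."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> float:
    """Rough Tm estimate in degrees C: 2 per A/T plus 4 per G/C (Wallace rule)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2.0 * at + 4.0 * gc


def three_prime_dimer_risk(primer_a: str, primer_b: str, tail: int = 4) -> bool:
    """True if the last `tail` bases of primer_a can base-pair with primer_b.

    The 4-base window is an assumed, deliberately crude threshold.
    """
    tail_seq = primer_a.upper()[-tail:]
    return reverse_complement(tail_seq) in primer_b.upper()


def check_pair(forward: str, reverse: str, max_tm_diff: float = 5.0) -> dict:
    """Summarize basic compatibility of a primer pair (assumed 5 degree tolerance)."""
    tm_f, tm_r = wallace_tm(forward), wallace_tm(reverse)
    return {
        "tm_forward": tm_f,
        "tm_reverse": tm_r,
        "gc_forward": gc_percent(forward),
        "gc_reverse": gc_percent(reverse),
        "tm_compatible": abs(tm_f - tm_r) <= max_tm_diff,
        "dimer_risk": three_prime_dimer_risk(forward, reverse)
        or three_prime_dimer_risk(reverse, forward),
    }
```

For a multiplex, `check_pair` could be applied to every pairwise combination of primers in the mix. Dedicated tools use nearest-neighbor thermodynamics rather than the Wallace rule, so this sketch is a first-pass filter at best.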
2 Research Tasks
Marine environments are some of the most important and ecologically diverse biological systems on the planet. The current boom in intensive human intervention and exploitation of natural marine resources has a negative impact on marine biodiversity and ecosystems. To counter the impact of these interventions, an understanding of the broad-scale ecological state and distribution of biodiversity is necessary. Coupling established biodiversity assessment with advanced molecular genetic techniques makes it possible to describe and monitor marine biodiversity. Molecular markers that reveal variations in the DNA are the key tools in such studies. The selection and application of appropriate molecular markers depend on their power to address the purpose of the study. Microsatellite and SNP markers are powerful markers for large-scale biodiversity monitoring. While each marker type used alone might be limited in its effectiveness, the combined application of both greatly increases the resolution of genetic diversity in marine species. Described below are details of primer design and the application of microsatellite and SNP markers in research tasks related to marine population and biodiversity studies [6, 11, 24].
3 Sample Collection and DNA Extraction
Samples were obtained from a single walking leg of wild-caught scalloped spiny lobster (Panulirus homarus) adults and egg-carrying females or juveniles. They were preserved immediately in DMSO-salt preservative solution [25] until DNA extraction. Genomic DNA (gDNA) was extracted later using a modified CTAB (cetyltrimethylammonium bromide) protocol [26].
4 Microsatellite Markers

4.1 Isolation, Primer Design, and Amplification
Details and procedures for the development of microsatellite markers using enriched microsatellite library techniques [27], available expressed sequence tags (EST) [28], or 454 whole-genome sequencing [20] (Fig. 1) have been reported previously. These techniques require no prior genomic knowledge and facilitate genotyping across large numbers (hundreds or more) of individuals for a range of markers (hundreds to hundreds of thousands). Further, all these approaches are based on the extraction of DNA from a particular organism, followed by the amplification of particular segments of DNA. Details of primer design for the amplification and multiplexing of microsatellite markers are presented below, together with an outline of the successful application of these markers in paternity testing and in a genetic diversity study of P. homarus.

Genomic sequences from P. ornatus were obtained using a Roche 454 whole-genome sequence run, and a cross-species primer amplification strategy was used to isolate assayable microsatellite markers for the closely related P. homarus. The sequence database was mined for perfect di-, tri-, and tetra-nucleotide microsatellite repeats using iQDD [29] and MSATcommander version 0.8.2 [30], which incorporate Primer3 software for PCR primer design (parameters: product length 150–400 bp; annealing temperature 50–63 °C; GC content 20–80%). From this data mining, 370 independent sequence regions containing microsatellite repeats with possible primers were identified. Potential loci with primers were subsequently filtered based on the distance of the primers from the beginning and end of a sequence (>10 bp), the distance between the primers and the repeat motif (>10 bp), and the PCR product length (from 75 to 400 bp). The quality of the perfect di-, tri-, and tetra-nucleotide microsatellite repeats was validated by PCR amplification of 96 P. homarus gDNA samples [4]. Forty-six polymorphic microsatellites were reliably amplified across all DNA samples and were coamplified in 14 multiplexes.
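The mining and filtering steps described above can be illustrated with a short Python sketch. It is not the iQDD or MSATcommander implementation: the regular expression, the minimum of five tandem copies, the `Candidate` record, and the coordinate convention (0-based positions on the forward strand) are assumptions made for illustration. Only the thresholds (>10 bp distances and a 75–400 bp product) come from the text.

```python
import re
from dataclasses import dataclass


def find_perfect_repeats(seq: str, min_repeats: int = 5):
    """Find perfect di-, tri-, and tetra-nucleotide tandem repeats.

    Returns (start, end, motif) tuples with 0-based, end-exclusive coordinates.
    `min_repeats` is an assumed cutoff, not a value from the chapter.
    """
    seq = seq.upper()
    # Lazy 2-4 bp motif followed by at least (min_repeats - 1) further copies.
    pattern = re.compile(r"([ACGT]{2,4}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as "AAAA..."
        hits.append((match.start(), match.end(), motif))
    return hits


@dataclass
class Candidate:
    """Hypothetical record for one locus with a designed primer pair."""
    seq_len: int      # length of the source sequence
    fwd_start: int    # first base of the forward primer
    fwd_len: int
    rev_start: int    # leftmost base covered by the reverse primer
    rev_len: int
    rep_start: int    # repeat start
    rep_end: int      # repeat end (exclusive)


def passes_filters(c: Candidate, min_gap: int = 10, product=(75, 400)) -> bool:
    """Apply the distance and product-length filters described in the text."""
    fwd_end = c.fwd_start + c.fwd_len
    rev_end = c.rev_start + c.rev_len
    product_len = rev_end - c.fwd_start
    return (
        c.fwd_start > min_gap                  # primer vs. sequence start
        and c.seq_len - rev_end > min_gap      # primer vs. sequence end
        and c.rep_start - fwd_end > min_gap    # forward primer vs. repeat
        and c.rev_start - c.rep_end > min_gap  # reverse primer vs. repeat
        and product[0] <= product_len <= product[1]
    )
```

In a real pipeline the primer coordinates would come from Primer3 output; here they are supplied by hand to keep the example self-contained.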
For fluorescent detection and multiplexing, the forward primers were dye labeled (NED, VIC, or 6-FAM) (Life Science Technologies). A PIG-tail sequence (5′-GTTTCTT) [31] was included in the reverse primers. The PCR conditions for all multiplexes were optimized, and the reactions were carried out in a Veriti Thermal Cycler (Applied Biosystems Inc.; AB). Amplifications were performed in a 12.5 μl reaction volume containing 10–50 ng template, 6.25 μl 1× Type-it Multiplex PCR Master Mix (Qiagen), and primer concentrations adjusted to yield consistent and relatively even fluorescence among loci [24]. Genotyping was performed using a 3130 Genetic Analyzer (AB). Data were collected automatically and sized with GeneMapper v4.0 software (AB) using the GeneScan-500-LIZ size standard (AB).

4.2 Paternity Testing
The successful application of microsatellite markers for parentage assignment in P. homarus has been reported previously [24]. Two multiplex PCR protocols comprising seven microsatellites were developed (Fig. 2) to study the maternal assignment of 24 larvae hatched from ten potential female spiny lobsters after a mass spawning in a common tank. Exclusion-based parentage analysis unambiguously assigned 83% of the larvae (20 of 24) to a single female parent. Of the ten putative female parents, five contributed to the 20 allocated offspring, with one being the true parent of 11. This study demonstrates the usefulness of microsatellite markers for parentage analysis and their potential for application in a wide range of studies investigating population structure in relation to conservation and management planning, genetic diversity, and evolutionary relationships between Panulirus species.
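The logic of exclusion-based maternal assignment can be sketched in a few lines of Python. This is a simplified illustration, not the analysis software used in the study: the function names and the example genotypes are hypothetical, and the sketch assumes diploid genotypes recorded as allele sizes, with no null alleles, mutations, or genotyping errors.

```python
def compatible(offspring_gt, mother_gt) -> bool:
    """A candidate mother is compatible at a locus if she shares
    at least one allele with the offspring."""
    return bool(set(offspring_gt) & set(mother_gt))


def assign_mother(offspring: dict, candidates: dict) -> list:
    """Return the candidate mothers that are not excluded at any locus.

    `offspring` maps locus name -> (allele1, allele2);
    `candidates` maps mother name -> the same kind of dict.
    Loci missing from a candidate are ignored (an assumed convention).
    """
    remaining = []
    for name, genotype in candidates.items():
        if all(
            compatible(offspring[locus], genotype[locus])
            for locus in offspring
            if locus in genotype
        ):
            remaining.append(name)
    return remaining


# Hypothetical allele sizes (bp) at two loci, for illustration only.
larva = {"A": (150, 154), "B": (200, 206)}
females = {
    "F1": {"A": (150, 152), "B": (206, 210)},  # shares an allele at both loci
    "F2": {"A": (156, 158), "B": (200, 200)},  # excluded at locus A
    "F3": {"A": (154, 154), "B": (202, 204)},  # excluded at locus B
}
unexcluded = assign_mother(larva, females)
```

An assignment is unambiguous when exactly one candidate remains. If several remain, more loci are needed; if none remain, genotyping errors or an unsampled parent should be considered.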
Fig. 2 Electropherogram of genotypes from seven microsatellite markers (A to G) coamplified in two multiplexes (M1 and M2) for parentage assignment in P. homarus. Alleles were typed simultaneously using a DNA extract from one individual spiny lobster. The individual was heterozygous for all loci except locus E. The allele sizes in base pairs were calculated automatically using the commercial internal allelic ladder GeneScan-500 LIZ (AB) in a 3130xl Genetic Analyzer. *Peaks from the allelic ladder; A to G correspond to the seven microsatellites and digits to the two alleles of each marker

4.3 Genetic Diversity Studies

Microsatellite markers have been applied successfully in a wide range of genomic studies of marine species [32–34]. An example of their use for the study of a tropical spiny lobster population is given below. The study integrated microsatellite markers with a comprehensive sampling strategy to assess the genetic structure of P. homarus along the coastline of Oman. To assess the level of genetic differentiation between individuals, Discriminant Analysis of Principal Components (DAPC) was used. DAPC was carried out using the optimum number of principal components (PCs) calculated with the α-score function in adegenet [35]. To assess both broad and fine-scale population structure, a network analysis with no prior population assumptions was performed using NetView R [36]. NetView was run through the R implementation of NetView P [37] at a mutual k-nearest neighbor (k-NN) range from 10 to 40, as determined by a k-NN selection plot. To visualize the extent of relatedness between individuals within each population and the divergence among populations, a Neighbor-Joining (NJ) tree was constructed in MEGA6 [38]. The NJ tree was constructed from a 1 − proportion of shared alleles (1-psa) genetic distance matrix calculated in the R package adegenet using the propShared function [39]. The findings have a potential impact on fisheries management and aquaculture of the species. Genomic DNA from 220 P. homarus individuals was genotyped using 46 microsatellite loci (14 multiplexes) as described above. PCR amplification and multiplexing conditions were the same as outlined in Subheading 4.2. The results from this study indicated the presence of two major stocks of scalloped spiny lobster in Oman (Fig. 3). Further, the findings support regional fishery management measures and contribute to sustainable fishery management and protection of the spiny lobster stock in Oman.

Fig. 3 Population structure of P. homarus across the sampled locations in Oman. Unrooted neighbor-joining tree (a) was drawn in MEGA6 using 1-psa genetic distances. Population network (b) was constructed in the R package NetView using 46 microsatellite markers
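The 1-psa distance mentioned above can be illustrated with a small Python sketch. It uses the common definition of the proportion of shared alleles (alleles in common at each locus, averaged over loci); this is an assumption made for illustration, and adegenet's propShared may treat details such as missing data differently. The genotypes are hypothetical.

```python
from collections import Counter

# 1 - proportion of shared alleles (1-psa) between diploid multilocus genotypes.
# Assumption: no missing data; each locus is an (allele1, allele2) tuple.

def prop_shared(ind_a, ind_b):
    """Mean over loci of (number of alleles in common) / ploidy."""
    per_locus = []
    for locus_a, locus_b in zip(ind_a, ind_b):
        # Counter intersection keeps the minimum count of each allele.
        shared = sum((Counter(locus_a) & Counter(locus_b)).values())
        per_locus.append(shared / len(locus_a))
    return sum(per_locus) / len(per_locus)


def psa_distance_matrix(individuals):
    """Symmetric matrix of 1-psa distances, suitable as input for an NJ tree."""
    n = len(individuals)
    return [[1.0 - prop_shared(individuals[i], individuals[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical genotypes at two microsatellite loci.
ind1 = [(100, 102), (200, 200)]
ind2 = [(100, 100), (200, 204)]
ind3 = [(110, 112), (210, 214)]

for row in psa_distance_matrix([ind1, ind2, ind3]):
    print(row)
```

The resulting matrix is the kind of input a neighbor-joining implementation expects; tree building itself is left to dedicated software such as MEGA6.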
5 SNP Markers

High-throughput DNA sequencing technologies are widely applied for the discovery and analysis of SNP markers (Fig. 1). Details and procedures for the identification of SNP markers from EST sequences [23] and by Diversity array technology [11] have been reported earlier. These approaches include the preparation of genomic libraries from pooled genomic DNA extracts of a species of interest. Genomic library construction is a multistep process that includes specific primer design for sequencing and PCR amplification. The following section provides an example of SNP discovery utilizing the existing EST library for the Atlantic cod (Gadus morhua) and highlights the successful application of Diversity array technology for a genetic diversity study of P. homarus.
Fig. 4 Overall allele frequency distribution of 318 isolated SNPs for Atlantic cod, ranked in descending order [23]

5.1 SNP Discovery from EST Sequences
The methodological details for SNP marker discovery from EST sequences of existing cDNA libraries have been described earlier [23]. That work describes the successful discovery of 318 SNPs from 17,056 EST sequences originating from cDNA libraries of G. morhua (Fig. 4). Genotyping of SNPs was performed using the MassARRAY system from Sequenom (San Diego, USA). PCR primers and extension primers were designed using the software SpectroDESIGNER v3.0 (Sequenom). All SNP genotyping was performed according to the iPLEX protocol from Sequenom [40]. For allele separation, the Sequenom MassARRAY™ Analyzer (Autoflex mass spectrometer) was used. Genotypes were assigned in real time [41] by the MassARRAY SpectroTYPER RT v3.4 software (Sequenom) based on the mass peaks present. All results were manually inspected using the MassARRAY TyperAnalyzer v3.3 software (Sequenom). Based on this manual inspection, SNPs were classified as “failed assays” (meaning that the majority of genotypes could not be scored and/or the samples did not cluster well according to genotype), “SNPs w/all animals heterozygous”, “SNPs w/all animals homozygous”, or “polymorphic SNPs”. SNPs that were out of HWE in one or several populations were double-checked. The SNP markers reported in this work have proven to be powerful tools for genetics work and studies on the population structure of Atlantic cod [6].
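The four categories above can be mimicked with a short Python sketch. This is only an approximation under stated assumptions: in the source the classification came from manual inspection, including how well samples clustered, which a simple rule cannot reproduce. Here a "failed assay" is taken to mean that no more than half of the genotypes were scored, and all data are hypothetical.

```python
from collections import Counter

# Rule-based imitation of the manual SNP classification (illustrative only).
# A genotype is a tuple of two allele letters, or None when it could not be scored.

def classify_snp(genotypes):
    called = [g for g in genotypes if g is not None]
    if len(called) <= len(genotypes) / 2:
        return "failed assay"            # assumption: majority not scored
    heterozygous = [a != b for a, b in called]
    if all(heterozygous):
        return "all heterozygous"
    if not any(heterozygous):
        return "all homozygous"          # literal reading: no heterozygote observed
    return "polymorphic"


def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among scored genotypes (0.0 if monomorphic)."""
    alleles = Counter(a for g in genotypes if g is not None for a in g)
    if len(alleles) < 2:
        return 0.0
    return min(alleles.values()) / sum(alleles.values())


# Hypothetical genotype calls for one SNP in four animals.
snp = [("A", "G"), ("A", "A"), ("G", "G"), ("A", "A")]
print(classify_snp(snp), minor_allele_frequency(snp))
```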
5.2 SNP Discovery and Genotyping by Diversity Array Technology
Genomic DNA extracts were standardized to 50 ng/μl and sent for sequencing and genotyping using DArTseq™ technology at Diversity Arrays Technology, Canberra, Australia [42, 43]. Library preparation was completed as described [43, 44], with all P. homarus DNA samples being digested by a combination of the PstI and HpaII restriction enzymes. Multiplexed reduced-representation libraries were then sequenced on the Illumina HiSeq2500 platform for 77 cycles. To determine SNPs and the genotype of each individual, raw Illumina HiSeq2500 data were first demultiplexed into individual samples based on sample-specific barcode sequences. Demultiplexed samples were then assessed for overall sequence quality, with any fragments with an average Q-score of […] 95% similarity were identified using CD-HIT and collapsed into a single cluster or removed [46]. Further, SNPs with a call rate […] 95% were also removed. Additionally, individuals and SNPs with >20% missing data and SNPs with a Minor Allele Frequency (MAF) < 0.02 were excluded using Plink v1.07 [47]. To investigate the effect of sequencing depth, Fis and Ho were calculated for each population at different read depth (Average SNP Counts) thresholds (3, 5, 7, and 10) to reveal the degree of potential bias caused by lower call depths. Accordingly, four subsets of SNPs were generated at these sequencing depths. To detect potential genotyping artifacts, SNPs were tested for significant deviation from Hardy-Weinberg equilibrium (HWE) using Arlequin v.3.5.2.2 [48]. Any SNP loci that deviated significantly from HWE were excluded following Bonferroni correction (P < 0.000004). To assess the impact of deviation from HWE, Fis and Ho were calculated before and after the removal of significantly deviating SNPs.
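The missing-data, MAF, and HWE filters described above can be sketched in plain Python. This is not the Plink or Arlequin implementation: the HWE test below is a simple one-degree-of-freedom chi-square test used as a stand-in, and the number of tests entering the Bonferroni correction (`n_tests`) is a hypothetical parameter, since the text gives only the resulting threshold.

```python
import math

# Per-SNP quality filters (illustrative sketch, not Plink/Arlequin).
# Genotypes are coded as counts of the alternative allele: 0, 1, 2, or None if missing.

def missing_rate(genos):
    return sum(g is None for g in genos) / len(genos)


def maf(genos):
    called = [g for g in genos if g is not None]
    p = sum(called) / (2 * len(called))
    return min(p, 1.0 - p)


def hwe_chi2_p(genos):
    """P-value of a 1-df chi-square goodness-of-fit test against HWE proportions."""
    called = [g for g in genos if g is not None]
    n = len(called)
    observed = (called.count(0), called.count(1), called.count(2))
    p = (2 * observed[0] + observed[1]) / (2 * n)   # reference-allele frequency
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def keep_snp(genos, n_tests, alpha=0.05, max_missing=0.20, min_maf=0.02):
    """Apply the missing-data, MAF, and Bonferroni-corrected HWE filters in turn."""
    if missing_rate(genos) > max_missing:
        return False
    if maf(genos) < min_maf:
        return False
    return hwe_chi2_p(genos) >= alpha / n_tests


# Hypothetical SNPs typed in 100 individuals.
in_hwe = [0] * 25 + [1] * 50 + [2] * 25
all_het = [1] * 100
rare = [0] * 98 + [1] * 2
print(keep_snp(in_hwe, n_tests=12500), keep_snp(all_het, n_tests=12500))
```

With `alpha=0.05` and the hypothetical `n_tests=12500`, the per-test threshold works out to 0.000004, the same order as the cutoff quoted above.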
5.3 Genetic Diversity Studies
Spiny lobsters are among the world’s most valuable and highly priced seafood, captured and marketed in over 90 countries. Recent assessment studies show that the local spiny lobster stock is heavily exploited. For the successful fishery management of this species, it is vital to understand the population genetic structure and to delineate the boundaries of unique genetic stocks. The results of a study utilizing genome-wide single-nucleotide polymorphisms to study the genetic structure of scalloped spiny lobsters from Oman are summarized below. The findings demonstrate the successful application of genome-wide single-nucleotide polymorphism markers to shed light on the genetic structure of P. homarus populations along the Omani coast.

To assess population structuring and genetic differentiation of P. homarus populations along the Omani coastline, 180 pleopod samples were collected from nine sites. A reduced-representation sequencing GBS approach was used for SNP discovery, and 3095 highly informative markers were selected for analysis following stringent filtering (Subheading 5.2). The markers were then used to assess population differentiation and genetic structure among the collected samples. The extent of pairwise population differentiation was evaluated using Weir and Cockerham’s unbiased F-statistics [49] through Genetix v.4.05.2 [50]. To assess hierarchical levels of population structuring, an analysis of molecular variance (AMOVA) was performed using Arlequin v.3.5.2.2 [48]. In addition, the function find.clusters in the R package adegenet [39] was used to determine the optimal number of clusters with the Bayesian Information Criterion (BIC) method. To assess levels of differentiation between the obtained genetic clusters, DAPC was performed using the optimum number of PCs calculated using the α-score function in adegenet [35]. Finally, a network analysis with no prior population assumptions was performed to assess both broad and fine-scale population structure using the R package NetView [36], an R implementation of NetView P [37]. NetView was run at a k-NN range from 25 to 65, as determined by a k-NN selection plot. The findings revealed five clearly distinguished genetic clusters of P. homarus (Fig. 5) along the coastline of Oman, suggesting that spatially customized management strategies are needed for the species.
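The mutual k-nearest-neighbor idea underlying the network analysis can be shown with a minimal Python sketch. It illustrates only the graph-construction step on a precomputed distance matrix and is not the NetView code; the one-dimensional positions used to build the distances are hypothetical.

```python
# Mutual k-nearest-neighbour graph from a distance matrix (illustrative sketch).
# Two individuals are linked only if each is among the other's k nearest neighbours.

def mutual_knn_edges(dist, k):
    n = len(dist)
    neighbours = []
    for i in range(n):
        # Rank all other individuals by their distance to individual i.
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(set(order[:k]))
    return {(i, j) for i in range(n) for j in neighbours[i]
            if i < j and i in neighbours[j]}


# Hypothetical example: six individuals placed on a line in two loose groups.
positions = [0, 1, 3, 10, 11, 13]
dist = [[abs(a - b) for b in positions] for a in positions]

print(sorted(mutual_knn_edges(dist, k=1)))
print(sorted(mutual_knn_edges(dist, k=2)))
```

Increasing k adds edges within each group before any edge appears between groups, which is why the network is usually inspected over a range of k values.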
Fig. 5 Population structure of P. homarus along the sampled locations in Oman. Discriminant Analysis of Principal Components (DAPC) scatter plot using 3095 highly informative SNP markers in 180 P. homarus individuals, in the R package adegenet. Dots represent individuals

6 Notes

1. An extensive literature search should be performed to ensure the choice of the most suitable genetic markers for a specific research task.
2. Prior to the selection of suitable genetic markers, it is beneficial to look for an existing genomic library for the species of interest, as it will save time and resources.
3. For the development of microsatellites, a cross-species primer amplification strategy should be considered.
4. High-throughput sequencing is the method of choice for large-scale genetic marker development.
5. General recommendations and guidelines should be carefully applied for the design of an optimal primer.
6. Primer3 software is a convenient and user-friendly tool for primer design.
7. A reference sequence of approximately 500 bp will facilitate the design of more specific and efficient primers.
8. Ideally designed primers are located >10 bp from the beginning and end of the target sequence and produce a PCR product of 75 to 400 bp.
9. A primer size of 21–30 bp, an annealing temperature of 50–63 °C, and a GC content of 20–80% are favorable features for any primer.
10. Primer length, primer-sequence similarity, primer-annealing temperature, and the PCR product size should be taken into account before combining primers for multiplexing purposes.
11. Inclusion of a PIG-tail sequence at the 5′ end of the reverse primer (5′-GTTTCTT) will significantly reduce undesirable PCR products.
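Several of the numeric guidelines in Notes 8, 9, and 11 can be checked programmatically. The following Python sketch is a hypothetical helper, not part of Primer3; it covers primer length, GC content, product size, and PIG-tailing, and deliberately leaves out annealing temperature, which depends on the chosen Tm model and buffer conditions. The primer sequence is invented.

```python
# Checks for a few of the primer guidelines in the Notes (illustrative sketch).

PIG_TAIL = "GTTTCTT"   # PIG-tail sequence from Note 11, written 5' to 3'


def gc_content(seq):
    """GC content of a primer in percent."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def check_primer(seq, product_size):
    """Return which of the length, GC, and product-size guidelines are met."""
    return {
        "length_ok": 21 <= len(seq) <= 30,          # Note 9
        "gc_ok": 20.0 <= gc_content(seq) <= 80.0,   # Note 9
        "product_ok": 75 <= product_size <= 400,    # Note 8
    }


def add_pig_tail(reverse_primer):
    """Prepend the PIG-tail to the 5' end of a reverse primer (Note 11)."""
    return PIG_TAIL + reverse_primer


primer = "ATGCGTACGTTAGCCTAGGCTA"   # hypothetical 22-mer
print(check_primer(primer, product_size=250))
print(add_pig_tail(primer))
```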
Acknowledgments

This work was supported by the Research Council of Oman (TRC) (ORG/SQU/EBR/13/027) and by the Deanship of Research at the Sultan Qaboos University (IG/DVC/CEMB/14/01). We acknowledge all researchers, students, and members of the research group for their contributions to these projects.
References

1. Seeb LW, Templin WD, Sato S et al (2011) Single-nucleotide polymorphisms across a species’ range: implications for conservation studies of Pacific salmon. Mol Ecol Resour 11:195–217
2. Ward RD (2000) Genetics in fisheries management. Hydrobiologia 420:191–201
3. Vignal A, Milan D, SanCristobal M et al (2002) A review on SNP and other types of molecular markers and their use in animal genetics. Genet Sel Evol 34:275–305
4. Delghandi M, Afzal H, Al Hinai MSN et al (2016) Novel polymorphic microsatellite markers for Panulirus ornatus and their cross-species primer amplification in Panulirus homarus. Anim Biotechnol 27:310–314
5. Delghandi M, Goddard S, Jerry DR et al (2016) Novel genomic microsatellite markers for genetic population and diversity studies of tropical scalloped spiny lobster (Panulirus homarus) and their potential application in related Panulirus species. Genet Mol Res 15
6. Moen T, Delghandi M, Wesmajervi MS et al (2009) A SNP/microsatellite genetic linkage map of the Atlantic cod (Gadus morhua). Anim Genet 40:993–996
7. Moya L, Lai J, Hoffman A et al (2018) Association analysis of a microsatellite repeat in the TRIB1 gene with prostate cancer risk, aggressiveness and survival. Front Genet 9:428
8. Willems T, Gymrek M, Highnam G et al (2014) The landscape of human STR variation. Genome Res 24:1894–1904
9. Ishiura H, Doi K, Mitsui J et al (2018) Expansions of intronic TTTCA and TTTTA repeats in benign adult familial myoclonic epilepsy. Nat Genet 50:581–590
10. Nair RR, Tibbit C, Thompson D et al (2020) Sizing, stabilising, and cloning repeat-expansions for gene targeting constructs. Methods 191:15–22
11. Al-Breiki RD, Kjeldsen SR, Afzal H et al (2018) Genome-wide SNP analyses reveal high gene flow and signatures of local adaptation among the scalloped spiny lobster (Panulirus homarus) along the Omani coastline. BMC Genomics 19(1):690
12. Kjeldsen SR, Zenger KR, Leigh K et al (2016) Genome-wide SNP loci reveal novel insights into koala (Phascolarctos cinereus) population variability across its range. Conserv Genet 17:337–353
13. Lal MM, Southgate PC, Jerry DR et al (2017) Swept away: ocean currents and seascape features influence genetic structure across the 18,000 km Indo-Pacific distribution of a marine invertebrate, the black-lip pearl oyster Pinctada margaritifera. BMC Genomics 18:66
14. Aiello MM, Solinas C, Santoni M et al (2020) Excision Repair Cross Complementation Group 1 Single-Nucleotide Polymorphisms and Nivolumab in Advanced Non-Small Cell Lung Cancer. Front Oncol 10:1161
15. Gallerano D, Ciminati S, Grimaldi A et al (2020) Genetically driven CD39 expression shapes human tumor-infiltrating CD8+ T-cell functions. Int J Cancer 147:2597–2610
16. Sepulveda-Villegas M, Elizondo-Montemayor L, Trevino V (2020) Identification and analysis of 35 genes associated with vitamin D deficiency: a systematic review to identify genetic variants. J Steroid Biochem Mol Biol 196:105516
17. Raafat Rowida I, Eshra KA, El-Sharaby RM et al (2020) Apa1 (rs7975232) SNP in the vitamin D receptor is linked to hepatocellular carcinoma in hepatitis C virus cirrhosis. Br J Biomed Sci 77:53–57
18. Wesmajervi MS, Tafese T, Stenvik J et al (2007) Eight new microsatellite markers in Atlantic cod (Gadus morhua L.) derived from an enriched genomic library. Mol Ecol Notes 7:138–140
19. Delghandi M, Wesmajervi MS, Mennen S et al (2008) Development of twenty sequence-tagged microsatellites for the Atlantic cod (Gadus morhua L.). Conserv Genet 9:1395–1398
20. Dao HT, Todd EV, Jerry DR (2013) Characterization of polymorphic microsatellite loci for the spiny lobster Panulirus spp. and their utility to be applied to other Panulirus lobsters. Conserv Genet Resour 5:43–46
21. Coates BS, Sumerford DV, Miller NJ et al (2009) Comparative performance of single-nucleotide polymorphism and microsatellite markers for population genetic analysis. J Hered 100:556–564
22. Morse P, Kjeldsen SR, Meekan MG et al (2018) Genome-wide comparisons reveal a clinal species pattern within a holobenthic octopod-the Australian southern blue-ringed octopus, Hapalochlaena maculosa (Cephalopoda: Octopodidae). Ecol Evol 8:2253–2267
23. Moen T, Hayes B, Nilsen F et al (2008) Identification and characterisation of novel SNP markers in Atlantic cod: evidence for directional selection. BMC Genet 9:18
24. Delghandi M, Saif Nasser Al Hinai M, Afzal H et al (2017) Parentage analysis of tropical spiny lobster (Panulirus homarus) by microsatellite markers. Aquac Res 48:4718–4724
25. Dawson MN, Raskoff KA, Jacobs DK (1998) Field preservation of marine invertebrate tissue for DNA analyses. Mol Mar Biol Biotechnol 7:145–152
26. Sambrook J, Russell DW (2001) Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory Press, New York
27. Westgaard JI, Tafese T, Wesmajervi MS et al (2007) Identification and characterisation of thirteen new microsatellites for Atlantic cod (Gadus morhua L.) from a repeat-enriched library. Conserv Genet 8:749–751
28. Delghandi M, Wesmajervi MS, Mennen S et al (2009) New polymorphic di-nucleotide microsatellite markers for Atlantic cod (Gadus morhua L.). Conserv Genet 10:1037–1040
29. Meglecz E, Costedoat C, Dubut V et al (2010) QDD: a user-friendly program to select microsatellite markers and design primers from large sequencing projects. Bioinformatics 26:403–404
30. Faircloth BC (2008) MSATCOMMANDER: detection of microsatellite repeat arrays and automated, locus-specific primer design. Mol Ecol Resour 8:92–94
31. Brownstein MJ, Carpten JD, Smith JR (1996) Modulation of non-templated nucleotide addition by Taq DNA polymerase: primer modifications that facilitate genotyping. BioTechniques 20:1004–1006
32. Wesselmann M, Gonzalez-Wanguemert M, Serrao EA et al (2018) Genetic and oceanographic tools reveal high population connectivity and diversity in the endangered pen shell Pinna nobilis. Sci Rep 8:4770
33. Dao HT, Smith-Keune C, Wolanski E et al (2015) Oceanographic currents and local ecological knowledge indicate, and genetics does not refute, a contemporary pattern of larval dispersal for the ornate spiny lobster, Panulirus ornatus in the South-East Asian Archipelago. PLoS One 10:e0124568
34. Kennington WJ, Cadee SA, Berry O et al (2013) Maintenance of genetic variation and panmixia in the commercially exploited western rock lobster (Panulirus cygnus). Conserv Genet 14:115–124
35. Jombart T, Devillard S, Balloux F (2010) Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. BMC Genet 11:94
36. Neuditschko M, Khatkar MS, Raadsma HW (2012) NETVIEW: a high-definition network-visualization approach to detect fine-scale population structures from genome-wide patterns of variation. PLoS One 7:e48375
37. Steinig EJ, Neuditschko M, Khatkar MS et al (2016) NETVIEW P: a network visualization tool to unravel complex population structure using genome-wide SNPs. Mol Ecol Resour 16:216–227
38. Tamura K, Stecher G, Peterson D et al (2013) MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol 30:2725–2729
39. Jombart T (2008) Adegenet: a R package for the multivariate analysis of genetic markers. Bioinformatics 24:1403–1405
40. http://www.sequenom.com
41. Tang K, Fu DJ, Julien D et al (1999) Chip-based genotyping by mass spectrometry. Proc Natl Acad Sci U S A 96:10016–10020
42. Jaccoud D, Peng K, Feinstein D et al (2001) Diversity arrays: a solid state technology for sequence information independent genotyping. Nucleic Acids Res 29:E25
43. Kilian A, Wenzl P, Huttner E et al (2012) Diversity arrays technology: a generic genome profiling technology on open platforms. Methods Mol Biol 888:67–89
44. Sansaloni C, Petroli C, Jaccoud D, Carling J, Detering F, Grattapaglia D, Kilian A (2011) Diversity Arrays Technology (DArT) and next-generation sequencing combined: genome-wide, high-throughput, highly informative genotyping for molecular breeding of Eucalyptus. BMC Proc 5(Suppl 7):P54
45. Lind C, Kilian A, Benzie J (2017) Development of diversity arrays technology markers as a tool for rapid genomic assessment in Nile tilapia, Oreochromis niloticus. Anim Genet 48:362–364
46. Li W, Godzik A (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22:1659
47. Purcell S, Neale B, Todd-Brown K et al (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81(3):559–575
48. Excoffier L, Lischer HEL (2010) Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol Ecol Resour 10:564–567
49. Weir B, Cockerham C (1984) Estimating F statistics for the analysis of population structure. Evolution 38:1358–1370
50. Belkhir K, Borsa P, Chikhi L et al (1996) GENETIX 4.05, logiciel sous Windows TM pour la génétique des populations. Lab Génome Popul Interact CNRS Umr 5000:1996–2004
Chapter 2

Enhancing Cohort PASA Efficiency from Lessons Assimilated by Mutant Genotyping in C. elegans

Amita Pandey, Binu Bhat, Madan L. Aggarwal, and Girdhar K. Pandey

Abstract

Classical restriction fragment length polymorphism (RFLP) and sequencing are labor-intensive and expensive methods to study single base changes, whereas polymerase chain reaction amplification of specific alleles (PASA), or allele-specific polymerase chain reaction (ASPCR), is a PCR-based application that allows direct detection of any point mutation by analyzing the PCR products in an ethidium bromide-stained agarose or polyacrylamide gel. PASA is based on oligonucleotide primers containing one or more 3′ mismatches with the target DNA, making them refractory to primer extension by Thermus aquaticus DNA polymerase, which lacks 3′ to 5′ exonuclease proofreading activity; for this reason the technique is also called amplification refractory mutation system-PCR (ARMS-PCR). This technique has found application in the detection of alleles, mutations, and single-nucleotide polymorphisms (SNPs) causing genetic and infectious diseases. This chapter describes an approach of cohort PASA in the context of genotyping single and double mutant worms generated to study the process of cell migration and axon outgrowth in C. elegans. Single worm-based cohort PASA allows genotyping for identification of single base mutations; in particular, it is a convenient method to detect mutations without a visible phenotype.

Key words Polymerase chain reaction, Allele-specific PCR, Single worm PCR, C. elegans
Chhandak Basu (ed.), PCR Primer Design, Methods in Molecular Biology, vol. 2392, https://doi.org/10.1007/978-1-0716-1799-1_2, © Springer Science+Business Media, LLC, part of Springer Nature 2022

1 Introduction

DNA replication is the most critical process for the continuation of a species, transmitting the genetic information encoded in DNA from mother to daughter cells. In vivo replication is catalyzed by DNA-dependent DNA polymerases, first discovered in E. coli by Arthur Kornberg and colleagues [1]. Polymerase chain reaction (PCR), by contrast, is an in vitro process of DNA replication, developed in the early 1980s [2, 3], which uses Taq, a thermostable DNA-dependent DNA polymerase. Taq was first isolated from the thermophilic bacterium Thermus aquaticus in 1976 [4, 5]; strain YT-1 of this bacterium was first isolated from a sample taken from Mushroom Spring in Yellowstone National Park in 1967 [6]. The enzyme became famous for its use in PCR [7] and in 1989 was designated the “Molecule of the Year” [8]. Since then the basic PCR technique has been modified to address different problems encountered in the biological sciences (Table 1). However, Taq DNA polymerase lacks 3′–5′ exonuclease proofreading activity and shows low replication fidelity, which makes it suitable for routine genotyping experiments but not for applications requiring precise replication. For the latter, Pfu, isolated from Pyrococcus furiosus and possessing superior thermostability and proofreading activity, has been used [38]. Since its invention, this in vitro technique, which produces billions of copies of DNA in a couple of hours, has found application in numerous areas (Table 1).

This chapter discusses PASA/AS-PCR in greater detail by including studies conducted in the model organism C. elegans. As the name suggests, PASA is the detection of specific alleles using polymerase chain reaction. Ideally the two copies of a gene, one on each chromosome, should be identical; however, differences are observed between the two gene copies at the sequence level. Although the term allele was previously used strictly to define two forms of a gene, allelic variations have more recently been observed in noncoding regions of DNA; such allelic differences regulate phenotype by modulating gene expression [39]. PASA is a technique for distinguishing two alleles, either within the gene or in the extragenic region, using allele-specific primers, which underscores the importance of primer design. One prerequisite for PASA, as for any other PCR-based technique, is knowledge of the DNA sequence of the target gene, unlike various sequencing methods including Sanger sequencing, pyrosequencing, and next-generation sequencing (NGS), which do not require prior sequence information. This chapter primarily discusses a modified PASA technique called cohort PASA for isolating single and double mutant worms in C. elegans.
2 Advances in Primer Design Strategies

Because primers are the most important parameter for a successful and robust PASA reaction, it is appropriate to discuss primer design and its influence on PASA-based detection of allelic variation.
2.1 Enhancing Specificity and Discriminatory Potential of Allele-Specific Primers
Table 1 Modifications of the basic PCR technique and their application in solving various biological problems

| PCR technique | Principle and application | References |
|---|---|---|
| PCR-restriction fragment length polymorphism (PCR-RFLP) | PCR amplification followed by RFLP for detection of DNA polymorphism, mutations, single-nucleotide polymorphism (SNP). | [9] |
| Methylation-specific-PCR (MSP-PCR) | Assessing methylation status of any group of CpG sites within a CpG island by using sodium bisulfite, converting all unmethylated but not methylated cytosine to uracil and subsequent amplification with primers specific for methylated versus unmethylated DNA. | [10, 12] |
| Differential display-PCR (DD-PCR) | Used to identify changes in gene expression at mRNA level by reverse transcribing mRNA into cDNA to create cDNA library followed by PCR. | [13] |
| Random amplified polymorphic DNA-PCR (RAPD-PCR) | Random primers are used to amplify stretches of DNA of unknown sequences to trace the phylogeny of plant and animal species. | [13] |
| Quantitative PCR (qPCR)/real-time PCR/quantitative real-time PCR | Simultaneous amplification and product quantification of a target DNA as the process takes place in real-time, absolute quantification of initial copy number, and relative quantification to reference target. | [14–17] |
| Microchip PCR | Miniaturizes conventional PCR systems and reduces operation time and cost. | [18, 19] |
| Repetitive element sequence based-PCR (Rep-PCR) | Used as an effective method for bacterial strain typing. | [20] |
| In silico PCR | A complementary method to ensure primer specificity for an extensive range of PCR applications. | [21] |
| Gap-PCR | Detection of deletions in DNA, otherwise not detectable by sequencing. | [22] |
| Droplet digital-PCR (ddPCR) | For determining original concentration of template by analysis of droplets formed in a water-oil emulsion to determine the fraction of PCR-positive droplets in the original sample. | [23] |
| CO-amplification at lower denaturation temperature-PCR (COLD-PCR) | Selectively amplifies mutations by performing single PCR at a critical temperature at which the mutation containing DNA is preferentially melted. | [24] |
| Multiplex PCR | Detection of multiple target DNA in single reaction using multiple primer pairs. Used in molecular diagnostics and species determination. | [25] |
| PCR-Reverse Dot Blot (PCR-RDB) | Genotyping polymorphisms and identifying heterogeneity in genes by amplification of target DNA, hybridization of the biotinylated amplicon to oligonucleotide probes immobilized on a membrane, followed by color development using streptavidin-conjugated alkaline phosphatase. | [26] |
| Quantitative fluorescence-PCR (QF-PCR) | Determination of aneuploidy associated with autosomes. | [27] |
| PCR-melting profile (PCR-MP) | Used for distinguishing different lineages or varieties within a species by doing qPCR followed by melting curve analysis. | [28] |
| Nested-PCR | Used for increasing sensitivity and specificity of PCR requiring two sets of primers for amplifying low-abundant cDNA. | [29] |
| Arbitrary primer sequence-based PCR (AP-PCR) | Used in conjunction with pulsed field gel electrophoresis for molecular typing of strains causing nosocomial outbreaks. | [30] |
| PCR-based denaturing high-performance liquid chromatography (PCR-DHPLC) | Species identification by DHPLC based on length and sequence of amplicons. | [31] |
| Degenerate-PCR | Degenerate-PCR primer towards target amplification and sequencing is a useful technique when a population of organisms under investigation is evolving rapidly or is highly diverse. | [32] |
| Reverse transcription-PCR (RT-PCR) | Used for gene expression analysis. | [33] |
| PCR-Enzyme linked immunosorbent assay (PCR-ELISA) | PCR products are biotinylated and captured on streptavidin-coated microtiter plates and detected by using an antidigoxigenin Fab-peroxidase conjugate. Used for diagnostic purpose. | [34] |
| Chromatin-immunoprecipitation-PCR (ChIP-PCR) | Determine and quantitate if the ChIP actually enriched the DNA sequences that are associated with the target protein. | [35, 36] |
| Polymerase chain reaction amplification of specific alleles (PASA) or allele-specific polymerase chain reaction (ASPCR) | Detection of point mutation. | [37] |

Traditionally, PASA reactions use two allele-specific primers (ASPs) and one gene-specific primer (GSP). An ASP includes an allele-specific (AS) nucleotide at the last position of the 3′-terminus. The AS nucleotide is placed as the last base of the ASP because the 3′-OH is the site at which the polymerase adds new nucleotides; any mismatch at the 3′ end therefore abolishes complementary base pairing and stalls polymerase activity. Moreover, it has been observed that an ASP’s allelic determination is often hampered by cross-hybridization between the defined genotypic ASP and the alternative allele’s template, giving rise to nonspecific amplification. Therefore, to improve specificity, an artificial mismatch is introduced at either the penultimate (second last) or antepenultimate (third last) nucleotide at the 3′ terminus of an ASP [40].

Single-nucleotide polymorphisms (SNPs) are a cause of various disorders and are studied using restriction fragment length polymorphism (RFLP), high-resolution melting (HRM), and real-time PCR using hydrolysis probes. A study was conducted on 15 SNPs and 15 clinically relevant somatic mutations using double-mismatch allele-specific (DMAS) primers in a qPCR assay (DMAS-qPCR). The DMAS primers included a mismatch at the fourth nucleotide from the 3′ terminus [41]. The study concluded that the artificial mismatch at the fourth position enhanced the discriminatory power of the DMAS primer and that this effect was independent of the nucleotide type.

Although inclusion of artificially mismatched nucleotides does enhance the specificity and priming of ASPs, these primers cannot always accurately discriminate between different alleles, which leads to false-positive results. A study conducted to detect an oncogenic mutation, BRAF V600E, in the B-raf (BRAF) proto-oncogene using TaqMan-based qPCR assays devised two additional strategies, besides introduction of a mismatch at the penultimate nucleotide, to circumvent nonspecific amplification of nonmatched alleles. The first strategy added a competitive external allele-specific controller (CEAC) to allele-specific PCR (cAS-PCR), and the second added a referenced internal positive controller (RIPC) to cAS-PCR (rcAS-PCR) [40].
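The primer layout described above, an allele-specific base at the 3′ terminus plus an optional artificial mismatch a few bases upstream, can be sketched in Python. The template sequence, SNP position, and the substitution table used to create the mismatch are all hypothetical choices for illustration; the cited studies selected their mismatches experimentally.

```python
# Forward allele-specific primer (ASP) construction (illustrative sketch).
# Assumption: the primer is read directly from the sense strand, 5' to 3',
# and ends exactly at the SNP position.

# Arbitrary substitution used to create an artificial mismatch (hypothetical choice).
MISMATCH = {"A": "C", "C": "A", "G": "T", "T": "G"}


def design_asp(template, snp_index, allele, length=20, mismatch_offset=None):
    """Return a primer whose 3'-terminal base is `allele` at `snp_index`.

    mismatch_offset counts back from the 3' end: 1 is the penultimate base,
    2 the antepenultimate base, 3 the fourth base from the 3' terminus.
    """
    start = snp_index - length + 1
    if start < 0:
        raise ValueError("not enough upstream sequence for this primer length")
    primer = list(template[start:snp_index] + allele)
    if mismatch_offset is not None:
        pos = len(primer) - 1 - mismatch_offset
        primer[pos] = MISMATCH[primer[pos]]
    return "".join(primer)


# Hypothetical 40-nt template with a C/T SNP at index 25 (0-based).
TEMPLATE = "ATGGCTAGCTAGGCTTACGATCGATCGGATCCGATAGCTA"
wild_type_asp = design_asp(TEMPLATE, 25, "C")
mutant_asp = design_asp(TEMPLATE, 25, "T")
mutant_asp_mm = design_asp(TEMPLATE, 25, "T", mismatch_offset=1)

print(wild_type_asp)
print(mutant_asp)
print(mutant_asp_mm)
```

The two ASPs differ only in their last base, and the mismatched variant differs from the plain mutant ASP only at the penultimate position; both would be paired with a common gene-specific reverse primer.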
The CEAC plasmid shares the same binding sequences as the ASP, satisfying the requirement for the thermodynamic driving force of DNA polymerase and thereby eliminating the nonspecific amplification observed in AS-PCR. RIPC, the human leptin gene, was used in rcAS-PCR to monitor the initial amount of input sample genomic DNA (gDNA) and so avoid false-negative results [40]. Another study, for detection of the codon 600 mutation in the BRAF kinase gene, used sense and antisense allele-specific primers in conjunction with two TaqMan probes, one specific to the sense strand and one to the antisense strand [42].

2.2 Enhancing Sensitivity of PASA-Based Approach
Molecular tests for DNA mutation detection are used by clinical laboratories and physicians to better understand diseases in patients. Several probe-blocking methods have been introduced in real-time AS-PCR to block amplification of wild-type templates and to increase detection sensitivity and specificity for the mutant allele [43]. However, these methods have limited sensitivity (no better than 1%) and require complex blocker design, and thus cannot be readily adapted to different mutation assays. The AS-nonextendable primer blocker (NEPB)-PCR method amplifies only mutant allele-specific DNA, whereas wild-type (wt) DNA is blocked from amplification by a modified NEPB. This method was tested for three cancer-associated mutations, in K-Ras, B-Raf, and EGFR, with three different types of modified blockers (phosphate, inverted dT, and amino-C7), resulting in a detection limit of 0.1% [42].

2.3 Web-Based Applications for Primer Design

Various software tools are available for designing primers with greater accuracy, including http://bioinfo.biotec.or.th/WASP, www.primerxl.org, Oligo Primer Analysis Software v7 from Molecular Biology Insights (Cascade, CO), and PRIMER 3 software (http://frodo.wi.mit.edu/primer3).

3 Advances in Detection Techniques of PASA

Traditionally, qualitative detection of PASA products has been done by size separation using agarose gel electrophoresis. However, newer techniques have been used to reduce the cost and time of detection and to increase its sensitivity.
3.1 Increased Sensitivity of Detection
In a study to detect point mutations in the K-Ras gene using allele-specific PCR, a gold nanoparticle (AuNP)-DNA tag was covalently attached to the 5′-end of each primer by a nine-carbon linker to produce a sticky end. Therefore, one of the sticky ends of the PCR products was bound to gold nanoparticles, while the other sticky end was captured onto the nitrocellulose membrane of lateral flow strips. The lateral flow strip showed high sensitivity, detecting mutations in as few as 10 tumor cells [43].
3.2 Reduced Cost and Time
Allele-specific amplification combined with TaqMan probe-based real-time polymerase chain reaction (real-time AS-PCR) has been widely used for detecting genetic variants, single-nucleotide polymorphisms, and genetic mutations. TaqMan probe-based methods simplify detection and eliminate the agarose gel electrophoresis step [42]. A melting curve-based allele-specific PCR method was developed to genotype two single-nucleotide polymorphisms (SNPs) of the Apolipoprotein E (APOE) locus using the PCR Tm shift approach. In this method, the two allele-specific forward PCR primers were tagged with GC tails of different lengths, which generated two allele-specific PCR amplicons of different sizes. As the melting temperature of a PCR amplicon is size-dependent, the two sizes of allele-specific amplicon gave two distinct melting temperatures in dissociation curve analysis. Hence, this method has a high resolution in determining the SNP genotype of APOE. Further experiments showed that DNA dissolved from blood collected on Guthrie filter paper
Enhancing Cohort PASA Efficiency from Lessons Assimilated by Mutant. . .
and total blood cell lysate without DNA extraction can be used in the melting curve-based allele-specific PCR method. Thus, it was suggested to be a fast, accurate, and robust APOE genotyping method with flexible throughput, suitable for DNA templates from different preparations [44].
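The Tm shift idea described above can be illustrated with a minimal sketch. It assumes the widely used long-duplex approximation Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) − 675/N; the amplicon core, the two GC tails, and the salt concentration are hypothetical values chosen for illustration and are not taken from the cited study.

```python
import math

def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def amplicon_tm(seq, na_molar=0.05):
    """Approximate duplex Tm in deg C: 81.5 + 16.6*log10[Na+] + 0.41*%GC - 675/N."""
    n = len(seq)
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * (100 * gc_fraction(seq)) - 675 / n

# Hypothetical 100-bp amplicon core shared by both alleles (illustration only).
core = "ATGCTAGCTTAGGCATCGATTACGATCGTAGCTAGGATCCATGCATTAGC" * 2
short_tail = "GCGGGC"            # hypothetical 6-nt GC tail on one allele-specific primer
long_tail = "GCGGGCAGGGCGGC"     # hypothetical 14-nt GC tail on the other primer

tm_short = amplicon_tm(short_tail + core)
tm_long = amplicon_tm(long_tail + core)
print(f"short-tail amplicon: {tm_short:.1f} C, long-tail amplicon: {tm_long:.1f} C")
```

Under these assumptions the longer GC tail raises the predicted amplicon Tm, which is the separation that a dissociation curve would be used to resolve.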
4 Cohort PASA, a Case Study in C. elegans
In laboratory culture, C. elegans exists predominantly as a hermaphrodite; however, to study gene function and interactions between genes, males are artificially induced and crosses are performed with worms harboring different mutations. The workflow of a typical cross involving two mutations to generate a double mutant is depicted in Fig. 1. Briefly, male and hermaphrodite worms (P0) were put together on a mating plate. Individual F1 progeny obtained from the cross were placed onto separate fresh nematode growth medium (NGM)-containing plates and allowed to lay eggs comprising the F2 generation. All the F1 progeny exhibited a wild-type phenotype because they were heterozygous for the mutations; after self-fertilization and segregation they produced F2 worms that were either homozygous for the wild-type or mutant allele or heterozygous for one or both mutations. Genotyping using cohort PASA was performed on 8 gravid F2 worms picked from a single plate (Fig. 1). Once the plate homozygous for the mutation was identified, individual worms were picked and transferred onto a fresh plate, followed by a second round of genotyping to confirm the presence of the mutations (Fig. 1). Cohort PASA data were analyzed from genetic crosses conducted in C. elegans to investigate signaling modules regulating axon outgrowth and cell migration [45]. The study provided supporting data for genetic interaction between unc-53, the rac GTPases ced-10 and mig-2, and the netrin receptor unc-5 in regulating distal tip cell (DTC) migration [46]. Additionally, the interaction of known regulators of cell migration and axon outgrowth, including rpm-1 and unc-73, with unc-53 was also studied (A. Pandey et al., unpublished data). C. elegans is an excellent model organism for studying gene mutations because a mutation in a C. elegans gene generally results in a phenotype; for example, the unc-53 mutant has an Unc (uncoordinated) phenotype, which makes it convenient to identify mutant worms microscopically. However, not all gene mutations result in a visible phenotype, for example mig-2(mu28), rpm-1(ju41), ced-10(n1993), and ced-10(n3246), underscoring the need for an alternative approach for isolating mutant worms after genetic crosses. The studies conducted in C. elegans show PASA to be an excellent method for genotyping because it is convenient and sensitive in differentiating mutant and wild-type alleles in single and double mutant worms. Moreover, a single gravid worm containing 10–14 eggs,
Fig. 1 Workflow of cohort PASA: Male and hermaphrodite worms harboring mutations (m1 and m2), representing the P0 generation, were crossed. F1 progeny heterozygous for the mutations were picked and transferred to separate fresh NGM plates and allowed to lay eggs, which represented the F2 generation comprising worms homozygous for wild-type or mutant (m1 or m2) alleles and worms heterozygous for one or both mutations. A first round of cohort PASA was performed with gDNA extracted from eight worms of the F2 generation picked from an individual plate until the plate homozygous for both mutations was identified. A single worm from the plate containing the double mutant was transferred to a fresh NGM plate and a second round of cohort PASA was performed to confirm the double mutant genotype
representing a cohort of gravid F2 progeny, is used for gDNA extraction, after which a modified PASA called cohort PASA is performed. By using gravid F2 worms, this technique genotypes the F2 and F3 generations in a single step. This book chapter is based on analysis of cohort PASA data obtained from genotyping of single and double mutants using allele-specific primers to detect point mutations. The study also provides useful insights for future development of sensitive and specific genotyping using allele-specific primers (ASPs). The present study is based on analysis of six mutations in five genes (rpm-1, ced-10, mig-2, unc-73, and unc-5); sequences for the mutant (m) and wild-type (wt) alleles were obtained from WormBase (Fig. 2). The universal system of gene nomenclature followed in C. elegans is adhered to in this chapter: the gene name in italics followed by the mutation in parentheses, for example ced-10(n1993).
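The segregation step of the cross outlined in this section can be sketched as a short enumeration. This is an illustrative calculation only: it assumes two unlinked loci with simple Mendelian segregation in a self-fertilizing F1 double heterozygote, and the function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def f2_genotype_fractions():
    """Expected F2 genotype fractions from selfing an F1 that is m1/+ ; m2/+.

    Assumption: the two loci are unlinked, so the four gamete types are equally likely.
    """
    alleles = ("+", "m")
    gametes = list(product(alleles, repeat=2))   # (allele at locus 1, allele at locus 2)
    counts = Counter()
    for egg, sperm in product(gametes, repeat=2):
        locus1 = "".join(sorted((egg[0], sperm[0])))   # "++", "+m" or "mm"
        locus2 = "".join(sorted((egg[1], sperm[1])))
        counts[(locus1, locus2)] += 1
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}

def prob_at_least_one_double_mutant(n_worms=8):
    """Chance that at least one of n randomly picked F2 worms is mm ; mm."""
    p = f2_genotype_fractions()[("mm", "mm")]
    return 1 - (1 - p) ** n_worms

fractions = f2_genotype_fractions()
print("double homozygous mutant:", fractions[("mm", "mm")])
print("double heterozygote:", fractions[("+m", "+m")])
print("P(at least one double mutant among 8 F2):", float(prob_at_least_one_double_mutant(8)))
```

Under these assumptions only a small fraction of F2 worms carry both mutations in the homozygous state, which is why an allele-specific genotyping step is needed when the mutations give no visible phenotype.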
ced-10          TTGTTTGCTC CAATTTCCCC ATCTTAGAGC ACCGTACACT
ced-10(n1993)   TTGTTTGCTC CAATTTCCCC ATCTTAGAGC CCCGTACACT
ced-10          TCTTCCTGTC CAGCTGTATC CCAGAGCCCG AGATTTATCG
ced-10(n3246)   TCTTCCTGTC TAGCTGTATC CCAGAGCCCG AGATTTATCG
rpm-1           GTTCAATAGG TTGAATGCAC AATGGACACT GCATAAAACG
rpm-1(ju41)     GTTCAATAGG TTGAATGCAC AATGGACACT ACATAAAACG
mig-2           TAGGATTGTG GGATACTGCT GGACAGGAGG ATTATGATCG
mig-2(mu28)     TAGGATTGTG AGATACTGCT GGACAGGAGG ATTATGATCG
unc-5           AACTCGAAGA GCAAGCACTC CAATCACTCC ATGAACTCCA
unc-5(e53)      AACTCGAAGA GCAAGCACTC TAATCACTCC ATGAACTCCA
unc-73          ATCCCGTTCG GATTGAATAA GCTCTCGCAT TGGCTCAAGC
unc-73(rh40)    ATCCCGTTCG AATTGAATAA GCTCTCGCAT TGGCTCAAGC
Fig. 2 Target sequences of the mutations analyzed by cohort PASA: Sequences of the wild-type (wt) alleles of ced-10, rpm-1, mig-2, unc-5, and unc-73 and of the mutant (m) alleles ced-10(n1993), ced-10(n3246), rpm-1(ju41), mig-2(mu28), unc-5(e53), and unc-73(rh40) were obtained from WormBase. Bases that have undergone point mutation are highlighted in red and the region highlighted in blue shows the primer sequence. The forward primer sequence was identical to the highlighted sequence, whereas the reverse primer sequence was complementary to the highlighted sequence. All the primers included the mutation in the last base at the 3′ terminus
4.1 Mechanism Involved in Cohort PASA
Because a DNA polymerase cannot synthesize a nascent DNA strand de novo, all types of PCR use a primer, a short DNA sequence complementary to the target sequence. Additionally, complementary base pairing and hydrogen bonding between the primer and the target sequence at the 3′ terminus are absolutely essential for polymerase activity because new nucleotides are added at the 3′-OH of the primer. In the PASA technique, when a mismatch is introduced at the antepenultimate base at the 3′ terminus, the primer forms stable hydrogen bonds only when the terminal and penultimate bases are complementary to the target (Fig. 3). However, if both the terminal base and the antepenultimate base are not complementary to the target sequence, hydrogen bonding does not occur and polymerase activity is stalled, resulting in no subsequent amplification (Fig. 3).
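The rule described above can be expressed as a toy model. The sketch below uses the rpm-1 and rpm-1(ju41) sequences from Fig. 2 and the rpm-1 primers from Table 2; the function name and the all-or-nothing decision rule are simplifying assumptions made for illustration, since real priming efficiency is not binary.

```python
# rpm-1 sense-strand sequences as listed in Fig. 2
WT = "GTTCAATAGGTTGAATGCACAATGGACACTGCATAAAACG"    # rpm-1 wild type
MUT = "GTTCAATAGGTTGAATGCACAATGGACACTACATAAAACG"   # rpm-1(ju41)

# Forward primers from Table 2, written with the antepenultimate mismatch (c -> G)
ASP1 = "GGTTGAATGCACAATGGACAGTG"   # wild-type-specific, 3'-terminal G
ASP2 = "GGTTGAATGCACAATGGACAGTA"   # mutant-specific, 3'-terminal A

def predicts_amplification(primer, sense_strand):
    """Toy rule: extension is predicted only if the 3'-terminal base matches.

    A forward primer is identical to the sense strand, so its body (everything
    except the last three bases) is located by exact search, and the terminal
    base is then compared with the template at the corresponding position.
    The deliberate antepenultimate mismatch is ignored by construction.
    """
    body, terminal = primer[:-3], primer[-1]
    start = sense_strand.find(body)
    if start == -1:
        return False
    terminal_pos = start + len(primer) - 1
    return terminal_pos < len(sense_strand) and sense_strand[terminal_pos] == terminal

for name, primer in (("ASP1", ASP1), ("ASP2", ASP2)):
    for label, template in (("wt", WT), ("m", MUT)):
        print(name, "on", label, "template:", predicts_amplification(primer, template))
```

In this simplified picture each primer is predicted to extend only on its own allele, mirroring the pattern drawn in Fig. 3.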
4.2 Primer Design Strategy
The optimal primer length was taken as 18–25 bp, providing adequate specificity to bind to the target sequence and anneal at the required temperature. Shorter primer lengths are desired as they show higher binding specificity at the required annealing temperature. Besides length, primer sequence also determined the success of a cohort PASA reaction; in general, sequences with either high AT or high GC content should be avoided. Regions of the target sequence with repetitive sequences were avoided wherever possible when designing primers. A GC clamp was applied: of the five bases at the 3′ terminus, no more than three were G or C, to enhance target binding. Specificity of the primers
[Fig. 3 schematic. Panel (a), wt template: ASP1, PCR amplification; ASP2, no amplification. Panel (b), m template: ASP2, PCR amplification; ASP1, no amplification.]
Fig. 3 Mechanism of action of allele-specific primers: A primer with a mismatch at the antepenultimate base binds efficiently only to a target sequence that is complementary to its terminal nucleotide, resulting in amplification. (a) The wild-type (wt) allele binds the wt-specific ASP1 primer because of complementary base pairing at the 3′-terminal base and undergoes PCR amplification, whereas ASP2, the mutant (m) allele-specific primer, does not bind the wt allele because of mismatches at two positions at the 3′ terminus, preventing amplification. (b) The mutant (m) allele undergoes complementary base pairing at the terminal base with the ASP2 primer and undergoes PCR amplification, whereas the ASP1 primer does not bind the mutant allele because of mismatches at two positions and shows no PCR amplification
was analyzed using the WormBase BLAST/BLAT tool (https://wormbase.org/tools/blast_blat) to avoid secondary annealing leading to generation of secondary amplicons. All the primers were further analyzed with the IDT OligoAnalyzer™ tool and the NEB Tm calculator to assess secondary structure formation and annealing temperature. For genotyping to isolate single and double mutants, three primers were designed for each mutant genotype: two allele-specific primers targeting the wild-type and the mutant allele, labeled ASP1 and ASP2, respectively, and one common primer targeting the gene, labeled GSP (data not shown). A mismatch was introduced at the antepenultimate base (in parentheses) of the primers specific to the wild-type and mutant alleles (Table 2). When introducing the mismatched bases, the hydrogen bonding capacity was kept the same as that of the original base; for example, adenine was replaced with thymine (both form two hydrogen bonds) and cytosine was replaced with guanine (both form three hydrogen bonds). In general, the GC content ranged between 40% and 60% for stronger hydrogen bond formation, determining the overall stability of the primers. However, a high GC content can also result in formation of primer dimers, hindering amplification. For this study the GC content was targeted at about 50% and melting temperatures
Table 2 Genes/alleles investigated in the present study and genotyped using cohort PCR amplification of specific alleles

Gene/allele     Primer name  Primer sequence                 %GC  Tm     Size (bp)  Cal. Ann. Tm  Exp. Ann. Tm
ced-10          ASP-1        gctccaatttccccatcttaga(g/C)cA   48   69 °C  440        66 °C         65 °C
ced-10(n1993)   ASP-2        gctccaatttccccatcttaga(g/C)cC   52   70 °C
ced-10          ASP-1        ctcgggctctgggatacag(c/G)tG      64   72 °C  459        66 °C         UD*
ced-10(n3246)   ASP-2        ctcgggctctgggatacag(c/G)tA      59   70 °C
mig-2           ASP-1        atcctcctgtccagcagta(t/A)cC      55   69 °C  150        66 °C         65 °C
mig-2(mu28)     ASP-2        atcctcctgtccagcagta(t/A)cT      50   67 °C
rpm-1           ASP-1        ggttgaatgcacaatggaca(c/G)tG     48   67 °C  460        68 °C         64 °C
rpm-1(ju41)     ASP-2        ggttgaatgcacaatggaca(c/G)tA     43   65 °C
unc-5           ASP-1        tcgaagagaaagcac(t/A)cC          50   62 °C  243        66 °C         UD*
unc-5(e53)      ASP-2        tcgaagagaaagcac(t/A)cT          44   60 °C
unc-73          ASP-1        ccaatgcgagagcttattca(a/T)tC     43   64 °C  522        65 °C         UD*
unc-73(rh40)    ASP-2        ccaatgcgagagcttattca(a/T)tT     39   63 °C

* UD, undetermined
were kept at 65 °C and above. Interestingly, unlike conventional PCR, where the annealing temperature is generally 4–5 °C below the calculated Tm, the annealing temperature for cohort PASA determined by gradient cohort PASA was found to be around 65 °C, very close to the calculated Tm of the individual primers.
4.3 Genomic DNA Extraction
Genomic DNA was extracted using the single worm lysis method [47]. Briefly, for each allele, wild-type (wt) and mutant (m), single gravid worms were placed individually in the tubes of a PCR strip, each containing 10 μl worm lysis buffer. Worm lysis to release genomic DNA (gDNA) was carried out in a thermal cycler under the following conditions: 60 °C for 60 min, then 95 °C for 15 min. Thereafter, 1 μl of the worm lysate was used for PCR with two sets of primers in a total reaction volume of 20 μl.
4.4 Determining Experimental Annealing Temperature for Cohort PASA
Genomic DNA from both the wild-type and mutant animals was genotyped using both sets of primers, i.e., primers specific to the wild-type allele and the mutant allele, respectively. Initially, a gradient cohort PASA was performed for all primer pairs, taking the calculated annealing temperature as the midpoint of the gradient, as shown for rpm-1 (Fig. 4). Qualitative assessment was performed by size separation of PCR products on a 0.8% agarose gel, imaged using a gel documentation system. An amplicon of the expected size was observed for rpm-1 (Fig. 4) and for ced-10 and mig-2 (data not shown). Moreover, the experimentally obtained annealing temperature was found to be independent of the length of amplicon
Fig. 4 Gradient cohort PASA for detection of rpm-1 and rpm-1(ju41) alleles: A gradient of 60 °C to 70 °C was run in increments of 2 °C for each reaction (from left to right). The cohort PASA products were size separated using agarose gel electrophoresis and the sizes were determined using a 1 kb DNA ladder as a reference. An experimental annealing temperature of 64 °C was used for subsequent studies
amplified: for example, rpm-1, ced-10, and mig-2 had similar annealing temperatures (64–65 °C) even though the lengths of their amplicons differ (Table 2). Interestingly, for the primers targeted to the unc-73 wild-type and mutant alleles, the annealing temperature could not be determined by gradient cohort PASA. For the m allele, an amplicon of the expected size was observed only with ASP2; however, the ASP1 primers specific to the wt allele amplified a nonspecific amplicon of similar size in both wild-type and rh40 worms (Fig. 5a); therefore, these primer sets were not used for further genotyping analysis. These observations could be explained by either a very high GC content or a high Tm of the ASP1 unc-73 primers (Table 2). Similarly, while the ASP1 ced-10 primers showed amplification in wt worm lysates, the ASP2 primers did not exhibit robust amplification at any of the temperatures used in the gradient for n3246 worm lysates (Fig. 5b). Primer analysis revealed a high GC content and high Tm for both ASP1 and ASP2 primers, which could result in primer dimer formation and, subsequently, reduced or no amplification. Similarly, the experimental annealing temperature could not be determined for unc-5 (data not shown). unc-5 primer analysis revealed a low Tm and less than 50% GC content, which could contribute to the nonspecific amplification observed in the gradient cohort PASA (data not shown).
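The GC-content figures referred to above can be recomputed directly from the primer sequences printed in Table 2. The sketch below assumes that the notation "(x/Y)" means the original base x was replaced by Y in the synthesized primer; under that reading the recomputed values agree with the printed %GC column. The dictionary layout and function names are illustrative choices, not part of the original study.

```python
import re

# (target, primer name) -> (sequence notation as printed in Table 2, printed %GC)
TABLE2 = {
    ("ced-10(n1993)", "ASP-1"): ("gctccaatttccccatcttaga(g/C)cA", 48),
    ("ced-10(n1993)", "ASP-2"): ("gctccaatttccccatcttaga(g/C)cC", 52),
    ("ced-10(n3246)", "ASP-1"): ("ctcgggctctgggatacag(c/G)tG", 64),
    ("ced-10(n3246)", "ASP-2"): ("ctcgggctctgggatacag(c/G)tA", 59),
    ("mig-2(mu28)", "ASP-1"): ("atcctcctgtccagcagta(t/A)cC", 55),
    ("mig-2(mu28)", "ASP-2"): ("atcctcctgtccagcagta(t/A)cT", 50),
    ("rpm-1(ju41)", "ASP-1"): ("ggttgaatgcacaatggaca(c/G)tG", 48),
    ("rpm-1(ju41)", "ASP-2"): ("ggttgaatgcacaatggaca(c/G)tA", 43),
    ("unc-5(e53)", "ASP-1"): ("tcgaagagaaagcac(t/A)cC", 50),
    ("unc-5(e53)", "ASP-2"): ("tcgaagagaaagcac(t/A)cT", 44),
    ("unc-73(rh40)", "ASP-1"): ("ccaatgcgagagcttattca(a/T)tC", 43),
    ("unc-73(rh40)", "ASP-2"): ("ccaatgcgagagcttattca(a/T)tT", 39),
}

def primer_from_notation(notation):
    """Assumed reading: in '(x/Y)', Y replaces the original base x in the primer."""
    return re.sub(r"\([acgt]/([ACGT])\)", r"\1", notation).upper()

def gc_percent(seq):
    """Percentage of G and C bases in the primer."""
    return 100 * sum(base in "GC" for base in seq) / len(seq)

recomputed = {key: round(gc_percent(primer_from_notation(notation)))
              for key, (notation, _) in TABLE2.items()}

# Primers whose GC content falls outside the 40-60% range quoted in the text
outside_40_60 = [key for key, gc in recomputed.items() if not 40 <= gc <= 60]

for key, gc in recomputed.items():
    print(key, "recomputed %GC:", gc, "printed %GC:", TABLE2[key][1])
print("outside 40-60%:", outside_40_60)
```

A recalculation like this makes it easy to compare the printed GC values with the explanations offered for the primer sets that failed in the gradient.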
Fig. 5 Gradient cohort PASA to determine annealing temperature for unc-73 and ced-10, wt and m alleles: (a) A gradient cohort PASA was performed for the wild-type and rh40 alleles of unc-73, followed by size separation of amplified products using agarose gel electrophoresis. The upper gel shows nonspecific amplification in mutant gDNA with ASP1 primers; however, ASP2 primers showed amplification only for m gDNA (lower gel). (b) A gradient cohort PASA was performed for the wild-type and n3246 alleles of ced-10, followed by size separation of amplicons using agarose gel electrophoresis. The upper gel shows specific amplification in wt gDNA with ASP1 primers, whereas the ASP2 primers did not show robust amplification at any of the temperatures used for m gDNA (lower gel). A 1 kb ladder was used as a reference to determine amplicon size
4.5 Differentiating Homozygotes from Heterozygotes
Cohort PASA assays used in this study could differentiate between homozygotes and heterozygotes. While genotyping to obtain ced-10(n1993) worms, it was found that most of the animals used for the assay had both the wt and m alleles and were heterozygous (Fig. 6a). The ASP1 and ASP2 primers both showed amplification in the gDNA extracted from 8 gravid worms picked from a few of the plates during genotyping to isolate ced-10(n1993) mutant worms, although not all 8 worms showed amplification with both sets of primers (Fig. 6a). The cohort PASA technique was also successfully employed for isolation of double mutant worms (Fig. 6b). ced-10(n1993); mig-2(mu28) double mutants were isolated by performing cohort PASA using allele-specific primers. Five of the eight worms showed amplification with ASP2 primers targeted to ced-10 and mig-2 (Fig. 6b). Amplification was not observed for three worms, probably because of inhibition of amplification due to factors including the quality and quantity of gDNA.
5 Conclusions
This study is based on parsing of cohort PASA data obtained from genotyping of various C. elegans mutations studied for their role in axon outgrowth and cell migration ([46]; A. Pandey et al., unpublished data). PASA, also called AS-PCR, is a technique particularly useful for identification of point mutations instead of using laborious techniques such as RFLP, SNP, and sequencing. The cohort PASA technique used for this study uses a cohort of worms composed of F2 gravid adults harboring F3 eggs to isolate single and double mutant worms. The technique was successfully employed for genotyping of single and double mutants and differentiation of
Fig. 6 Differentiation of homozygotes and heterozygotes and isolation of double mutant worms using cohort PASA: (a) A gradient cohort PASA was performed on 8 gravid worms for the wild-type and mutant (n1993) alleles of ced-10, followed by size separation of amplicons using agarose gel electrophoresis. The first 8 wells show amplification with ASP1 primers specific to the wild-type allele, whereas the following 8 wells show amplification with ASP2 primers specific to the mutant allele. (b) A gradient cohort PASA was performed for isolation of the ced-10(n1993) and mig-2(mu28) double mutant, followed by size separation of amplicons using agarose gel electrophoresis. The upper gel shows amplification with n1993-specific primers only, whereas the lower gel shows amplification with mu28-specific primers only. No amplicon was observed with ASP1 primers targeted to the wt allele of either gene. Amplicon size was determined using a 1 kb ladder as a reference
homozygous from heterozygous worms using agarose gel electrophoresis as an end point analysis. After analysis of the qualitative data, this study provides some useful insights that can help enhance sensitivity and specificity and decrease the cost and time of genotyping mutations using the PASA technique. The study concludes that primers should be designed with a GC content of around 50%, supported by the nonspecific amplification or lack of amplification observed for primer sets with lower and higher GC contents, respectively. Because the primers used in this study all fell within the usually recommended length of 18–25 bp, the relationship between primer length and cohort PASA efficiency could not be assessed. Although the hydrogen bonding capacity of the mismatched base was kept the same as that of the original base, it will be interesting to investigate the effect of a mismatched base with altered hydrogen bonding capacity on cohort PASA efficiency. It was also observed that when primers contained repeats of a single base or longer motifs, amplification efficiency and specificity were affected, as seen for the ced-10(n3246)- and unc-73(rh40)-specific primers; therefore, primers with repeat sequences should be avoided. Moreover, the study supports testing primers for specificity to the target gene using a BLAST tool, which helps eliminate secondary amplicons. As a general rule, a GC clamp of 2–3 G or C bases within the terminal 5 bases at the 3′ end was observed to enhance the stability of hydrogen bond formation at the 3′ terminus, which is an absolute requirement for polymerase activity. In conclusion, cohort PASA is a convenient method for genotyping, and its specificity and sensitivity are determined by the primers designed. The study further proposes strategies to reduce time and cost and to increase the throughput of the cohort PASA technique. The first approach could be to use a SYBR green-based end point analysis system.
SYBR green binds to dsDNA and the fluorescent signal
generated can be detected in a real-time PCR machine, thereby eliminating the use of agarose gel electrophoresis to analyze results. qPCR can also be performed using amplicons of different sizes for the wild-type and mutant alleles, with melting curve analysis to identify the cohort that is homozygous for the mutant allele. Colorimetric end point analysis is another approach that eliminates the laborious agarose gel electrophoresis step.
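The design recommendations summarized in these conclusions can be collected into a simple screening function. This is a minimal sketch: the length range (18–25), the GC range (40–60%), and the GC clamp rule (2–3 G or C among the last five 3′ bases) follow the text, whereas the homopolymer threshold (more than three identical bases in a row) and the function name are assumptions added for illustration.

```python
import re

def check_primer(seq, min_len=18, max_len=25, gc_range=(40, 60), max_run=3):
    """Return a list of rule names that the primer violates (empty list if none).

    max_run is an assumed threshold: runs longer than this many identical bases
    are reported as 'repeat'. The other limits follow the chapter text.
    """
    seq = seq.upper()
    problems = []
    if not min_len <= len(seq) <= max_len:
        problems.append("length")
    gc = 100 * sum(base in "GC" for base in seq) / len(seq)
    if not gc_range[0] <= gc <= gc_range[1]:
        problems.append("gc")
    clamp = sum(base in "GC" for base in seq[-5:])   # G/C count in the last five 3' bases
    if not 2 <= clamp <= 3:
        problems.append("clamp")
    if re.search(r"(.)\1{%d,}" % max_run, seq):       # a base repeated more than max_run times
        problems.append("repeat")
    return problems

# rpm-1 ASP-1 and ced-10(n3246) ASP-1 from Table 2, plus a deliberately poor hypothetical oligo
print(check_primer("GGTTGAATGCACAATGGACAGTG"))
print(check_primer("CTCGGGCTCTGGGATACAGGTG"))
print(check_primer("ATATATATAAAAAT"))
```

A checklist of this kind only screens sequence composition; specificity against the genome and secondary structure would still need the BLAST and oligo analysis steps described in the primer design section.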
Acknowledgments
The authors extend sincere gratitude to the management of Shriram Institute for Industrial Research for providing the necessary infrastructure for completion of the manuscript (assigned number SRI-MS#20210107-01).
References
1. Lehman IR, Zimmerman SB, Adler J, Bessman MJ, Simms ES, Kornberg A (1958) Enzymatic synthesis of deoxyribonucleic acid. V. Chemical composition of enzymatically synthesized deoxyribonucleic acid. Proc Natl Acad Sci U S A 44(12):1191–1196
2. Mullis KB, Faloona FA (1987) Specific synthesis of DNA in vitro via a polymerase-catalyzed chain reaction. Methods Enzymol 155:335–350
3. Saiki RK, Scharf S, Faloona F, Mullis KB, Horn GT, Erlich HA, Arnheim N (1985) Enzymatic amplification of β-globin genomic sequences and restriction site analysis for diagnosis of sickle cell anemia. Science 230:1350–1354
4. Chien A, Edgar DB, Trela JM (1976) Deoxyribonucleic acid polymerase from the extreme thermophile Thermus aquaticus. J Bacteriol 127:1550–1557
5. Kaledin AS, Sliusarenko AG, Gorodetski SI (1980) Isolation and properties of DNA polymerase from extreme thermophilic bacteria Thermus aquaticus YT-1. Biokhimiya 45:644–651
6. Brock TD (1997) The value of basic research: discovery of Thermus aquaticus and other extreme thermophiles. Genetics 146:1207–1210
7. Saiki RK, Gelfand DH, Stoffel S, Scharf SJ, Higuchi R, Horn GT, Mullis KB, Erlich HA (1988) Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239:487–491
8. Guyer RL, Koshland DE Jr (1989) The molecule of the year. Science 246:1543–1546
9. Nichols WC, Lyons SE, Harrison JS, Cody RL, Ginsburg D (1991) Severe von Willebrand disease due to a defect at the level of von Willebrand factor mRNA expression: detection of exonic PCR-restriction fragment length polymorphism analysis. Proc Natl Acad Sci U S A 88(9):3857–3861
10. Herman JG, Graff JR, Myöhänen S, Nelkin BD, Baylin SB (1996) Methylation-specific PCR: a novel PCR assay for methylation status of CpG islands. Proc Natl Acad Sci U S A 93(18):9821–9826
11. Ramalho-Carvalho J, Henrique R, Jerónimo C (2018) Methylation-specific PCR. Methods Mol Biol 1708:447–472
12. Graf D, Fisher AG, Merkenschlager M (1997) Rational primer design greatly improves differential display-PCR (DD-PCR). Nucleic Acids Res 25(11):2239–2240
13. Brandt ME, Padhye AA, Mayer LW, Holloway BP (1998) Utility of random amplified polymorphic DNA PCR and Taqman automated detection in molecular identification of Aspergillus fumigatus. J Clin Microbiol 36(7):2057–2062
14. Mackay IM, Arden KE, Nitsche A (2002) Real-time PCR in virology. Nucleic Acids Res 30(6):1292–1305
15. Singh C, Roy-Chowdhuri S (2016) Quantitative real-time PCR: recent advances. Methods Mol Biol 1392:161–176
16. Green MR, Sambrook J (2018) Analysis and normalization of real-time polymerase chain reaction (PCR) experimental data. Cold Spring Harb Protoc 2018(10):436–453
17. Botes M, de Kwaadsteniet M, Cloete TE (2013) Application of quantitative PCR for the detection of microorganisms in water. Anal Bioanal Chem 405(1):91–108
18. Kricka LJ, Wilding P (2003) Microchip PCR. Anal Bioanal Chem 377(5):820–825
19. Koo C, Malapi-Wight M, Kim HS, Cifci OS, Vaughn-Diaz VL, Ma B, Kim S, Abdel-Raziq H, Ong K, Jo YK, Gross DC, Shim WB, Han A (2013) Development of real-time microchip PCR system for portable plant disease diagnoses. PLoS One 8(12):e82704
20. Healy M, Huong J, Bittner T, Lising M, Frye S, Raza S, Schrock R, Manry J, Renwick A, Nieto R, Woods C, Versalovic J, Lupski JR (2005) Microbial DNA typing by automated repetitive-sequence-based PCR. J Clin Microbiol 43(1):199–207
21. Yu B, Zhang C (2011) In silico PCR analysis. Methods Mol Biol 760:91–107
22. Kho SL, Chua KH, George E, Tan JA (2015) A novel gap-PCR with high resolution melting analysis for the detection of alpha-thalassaemia southeast Asian and Filipino β-thalassaemia deletion. Sci Rep 5:13937
23. Kanagal-Shamanna R (2016) Digital PCR: principles and applications. Methods Mol Biol 1392:43–50
24. Zuo Z, Jabbar KJ (2016) COLD-PCR: applications and advantages. Methods Mol Biol 1392:17–25
25. Abbs S, Yau SC, Clark S, Mathew CG, Bobrow M (1991) A convenient multiplex PCR system for the detection of dystrophin gene deletions: a comparative analysis with cDNA hybridization shows mistypings by both methods. J Med Genet 28(5):304–311
26. Kang Y, Sun P, Mao X, Dong B, Ruan G, Chen L (2019) PCR-reverse dot blot human papillomavirus genotyping as a primary screening test for cervical cancer in hospital-based cohort. J Gynecol Oncol 30(3):e29
27. Schmidt W, Jenderny J, Hecher K, Hackeltoer B-J, Kerber S, Kochhan L, Held KR (2000) Detection of aneuploidy in chromosome X, Y, 13, 18 and 21 by QF-PCR in 662 selected pregnancies at risk. Mol Hum Reprod 6(9):855–860
28. Motta FC, Born PS, Resende PC, Brown D, Siqueira MM (2019) An inexpensive and accurate reverse transcription-PCR-melting temperature analysis assay for real-time Influenza virus B lineage discrimination. J Clin Microbiol 57(12):e00602–e00619
29. Mathis A, Weber R, Kuster H, Speich R (1996) Reliable one-tube nested PCR for detection and SSCP-typing of Pneumocystis carinii. J Eukaryot Microbiol 43(5):7S
30. Bou G, Cerveró G, Domínguez MA, Quereda C, Martínez-Beltrán (2000) PCR-based DNA fingerprinting (REP-PCR, AP-PCR) and pulsed-field gel electrophoresis characterization of a nosocomial outbreak caused by imipenem- and meropenem-resistant Acinetobacter baumannii. Clin Microbiol Infect 6(12):635–643
31. Maukonen J, Saarela M (2009) Microbial communities in industrial environment. Curr Opin Microbiol 12(3):238–243
32. Kwok S, Chang SY, Sninsky JJ, Wang A (1994) A guide to the design and use of mismatched and degenerate primers. PCR Methods Appl 3(4):S39–S47
33. Tan SS, Weis JH (1992) Development of a sensitive reverse transcriptase PCR assay, RT-RPCR, utilizing rapid cycle times. Genome Res 2:137–143
34. Morata P, Queipo-Ortuño MI, Reguera JM, García-Ordoñez MA, Cárdenas A, Colmenero JD (2003) Development and evaluation of a PCR-enzyme-linked immunosorbent assay for diagnosis of human brucellosis. J Clin Microbiol 41(1):144–148
35. Kim TH, Dekker J (2018) ChIP-quantitative polymerase chain reaction (ChIP-qPCR). Cold Spring Harb Protoc 2018(5):pdb.prot082628
36. Asp P (2018) How to combine ChIP with qPCR? Methods Mol Biol 1689:29–42
37. Myakishev MV, Khripin Y, Hu S, Hamer DH (2001) High-throughput SNP genotyping by allele-specific PCR with universal energy-transfer-labeled primers. Genome Res 11:163–169
38. Lundberg KS, Shoemaker DD, Adams MW, Short JM, Sorge JA, Mathur EJ (1991) High-fidelity amplification using a thermostable DNA polymerase isolated from Pyrococcus furiosus. Gene 108(1):1–6
39. Kamitaki N, Usher CL, McCarroll SA (2018) Using droplet digital PCR to analyze allele-specific RNA expression. Methods Mol Biol 1768:401–422
40. Yang Z, Zhao N, Chen D, Wei K, Su N, Huang J-F, Xu H-Q, Duan G-J, Fu W-L, Huang Q (2017) Improved detection of BRAF V600E using allele-specific PCR coupled with external and internal controllers. Sci Rep 7(1):13817
41. Lefever S, Rihani A, Van der Meulen J, Pattyn F, Van Maerken T, Van Dorpe J, Hellemans J, Vandesompele J (2019) Cost-effective and robust genotyping using double mismatch allele-specific quantitative PCR. Sci Rep 9(1):2150
42. Szankasi P, Reading NS, Vaughn CP, Prchal JT, Bahler DW, Kelley TW (2013) A quantitative allele specific PCR test for the BRAF V600E mutation using a single heterozygous control plasmid for quantitation: a model for qPCR testing without standard curves. J Mol Diagn 15(2):248–254
43. Morlan J, Baker J, Sinicropi D (2009) Mutation detection by real-time PCR: a simple, robust and highly selective method. PLoS One 4:e4584
44. Wang H, Jiang J, Mostert B, Sieuwerts A, Martens JWM, Sleijfer S, Foekens JA, Wang Y (2013a) Allele-specific, non-extendable primer blocker PCR (AS-NEPB-PCR) for DNA mutation detection in cancer. J Mol Diagn 15(1):62–69
45. Fang X, Bal L, Han X, Wang J, Shi A, Zhang Y (2014) Ultra-sensitive biosensor for K-ras gene detection using enzyme capped gold nanoparticles conjugates for signal amplification. Anal Biochem 460:47–53
46. Chen C-H (2016) Development of a melting curve-based allele-specific PCR of Apolipoprotein E (APOE) genotyping method for genomic DNA, Guthrie blood spot and whole blood. PLoS One 11(4):e0153593
47. Pandey A, Pandey GK (2014) The UNC-53 mediated interactome: comprehensive analysis in generation of C. elegans connectome. In: Pandey A, Pandey GK (eds) SpringerBriefs in neuroscience book series (BRIEFSNEUROSCI), pp 31–84. ISBN 978-3-319-07827-4
48. Pandey A, Yadav V, Sharma A, Khurana JP, Pandey GK (2017) The unc-53 gene negatively regulates rac GTPases to inhibit unc-5 activity during Distal tip cell migrations in C. elegans. Cell Adh Migr 12(3):195–203
49. Barstead RJ (2000) Reverse genetics. In: Hope I (ed) C. elegans, a practical approach. Oxford University Press, New York, pp 97–118
Chapter 3
Design of Oligonucleotides for Allele-Specific Amplification Based on PCR and Isothermal Techniques
Luis Antonio Tortajada-Genaro
Abstract
Single-nucleotide variations have been associated with various genetic diseases, variations in drug efficacy, and differences in cancer prognosis. The detection of these changes in nucleic acid sequences from patient samples is particularly useful for accurate diagnosis, therapeutics, and disease management. Reliable allele-specific amplification is still an important challenge for molecular-based diagnostic technologies. In recent years, allele-specific primers have been designed to promote the enrichment of certain variants, based on the higher stability of primer/template duplexes. Also, several methods are based on the addition of a blocking oligonucleotide that prevents the amplification of a specific variant, so that other DNA variants can be observed. In this context, genotyping methods based on isothermal amplification techniques are increasing, especially assays intended for point-of-care applications. The correct selection of target sequences is crucial for reaching the required analytical performance in terms of reaction time, amplification yield, and selectivity. The present chapter describes the design criteria for the selection of primers and blockers for relevant PCR approaches and novel isothermal strategies. Several successful examples are provided to highlight the main design restrictions and the potential for extension to other applications.
Key words Primer design, Single-nucleotide mutations, PCR, Isothermal amplification, Allele-specific technique, DNA biosensing
Chhandak Basu (ed.), PCR Primer Design, Methods in Molecular Biology, vol. 2392, https://doi.org/10.1007/978-1-0716-1799-1_3, © Springer Science+Business Media, LLC, part of Springer Nature 2022
1 Introduction
Advances in DNA sequencing have provided deep knowledge of the genomes of several species, revealing the main genetic variations and their frequencies [1]. Sequence alterations may be inherited from the parents (germline) or acquired over the life of an individual (somatic). Changes in the nucleotide sequence may or may not cause phenotypic changes; those that do are the principal drivers of disease. In particular, mutations are crucial for all living organisms because they are permanent alterations in the sequence of genomic DNA that modify its integrity, stability, and functionality. A change in the structure and function of the encoded protein can produce catastrophic disorders and abnormalities. Mutations can result from unintentional errors during cell division, DNA repair, and replication, or from environmental factors.
Genome analyses have identified sequence variants related to disease susceptibility, treatment efficacy, and adverse drug responses [2]. Accurate genotyping of single-nucleotide variants (polymorphisms or mutations) plays an important role in the development of advanced molecular solutions, including personalized medicine. For instance, allele-specific methods allow health professionals to diagnose a disease accurately and to prescribe a treatment appropriate to each individual or to a target population. An overview of the main genotyping technologies is given in Fig. 1.
Fig. 1 Classification of allele-specific PCR-based methods
Nowadays, sequencing techniques are considered the gold standard for low-frequency allele detection in a high-throughput format. However, their high cost and long analysis times limit their widespread use. Thus, PCR-based detection of genetic markers remains of interest, especially for assays to be deployed in sustainable health systems and in point-of-care scenarios. The objective is to implement methods with a high capability for detection and quantitation that simultaneously provide good mutation sensitivity, high mutation multiplexing, fast turnaround, and low reagent and instrument cost [3].
A variety of PCR-based methods have been developed to enhance the detection of genetic variants; they can be classified into allele-enrichment and allele-suppression approaches. However, extending an existing method to a new target locus can be problematic because of the complexity of these methods. In some cases, amplification errors are caused by mispriming; primer design is one of the most crucial factors affecting the success and quality of any amplification-based method [4]. In fact, the selection of oligonucleotides for the discrimination of DNA variants is more difficult than for conventional PCR-based methods.
Hence, the oligonucleotides involved must be chosen correctly in order to develop efficient methods. In the present chapter, the criteria for finding candidate primers and blockers for allele-specific assays are reviewed.
2 General Requirements
2.1 In Silico Design
For general PCR methods, automated algorithms, software, and specialized websites are available for primer design. Some of them can take into account the presence of a polymorphism in the input sequence or in the primers, Primer3 Plus being the most popular free primer design software [5, 6]. However, few of them are useful for the specific objective of allele-selective amplification. Thus, the optimal sequences are often selected manually, using a sequence alignment program and checking the properties of the candidate sequences. The selection of oligonucleotides usually follows a common initial protocol. Once the target variation has been defined, the next step is to find potential primer regions in the corresponding gene sequences. The design starts by considering the default requirements, widely described in the literature [4, 7, 8]. The general parameters to check relate to the primers and to the amplicon: melting temperature, length, GC content, self-complementarity, primer-dimer and hairpin formation, degree of degeneracy, end stability, and end specificity. Table 1 shows the main restrictions for designing primers in any PCR technique. A general recommendation is to select short products, because they amplify more efficiently. Also, stable secondary structures, such as hairpins or the tetraplex structures associated with poly-C or poly-G regions, must be avoided because they can interfere with the annealing and extension steps. In multiplexed assays, the risk of unbalanced amplification or primer-dimers is high. Additional design
Table 1 Default design restrictions applied for the primer design
Primers
Requisite                      Parameter    Recommended interval
Selectivity                    Length       16–25 nucleotides
                               Tm           55–65 °C
High amplification efficiency  ΔTm primers
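The length and Tm intervals listed in Table 1 can be screened with a few lines of code before moving to a dedicated design tool. The following is a minimal sketch, not the chapter's own procedure: the function names are hypothetical, and the Tm estimate uses the simple GC-content approximation Tm = 64.9 + 41 × (G + C − 16.4) / N, which is an assumption made here for illustration and is cruder than the nearest-neighbor calculations used by primer design software.

```python
def estimate_tm(primer: str) -> float:
    """Rough Tm in deg C from GC content: 64.9 + 41 * (G + C - 16.4) / N.

    Assumption: a simple approximation for oligos longer than ~13 nt,
    not a nearest-neighbor thermodynamic calculation.
    """
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(primer)


def check_primer(primer: str,
                 length_range=(16, 25),     # nucleotides, from Table 1
                 tm_range=(55.0, 65.0)):    # deg C, from Table 1
    """Report whether a primer falls inside the default intervals."""
    tm = estimate_tm(primer)
    return {
        "length": len(primer),
        "tm": round(tm, 2),
        "length_ok": length_range[0] <= len(primer) <= length_range[1],
        "tm_ok": tm_range[0] <= tm <= tm_range[1],
    }


# Hypothetical 20-nt candidate with 12 G/C bases
print(check_primer("GCGTACGTACGTACGTACGC"))
```

A candidate that fails either interval can be lengthened, shortened, or shifted along the template before the remaining checks (dimers, hairpins, end stability) are run in a dedicated tool.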
11.0) between the target (sub)species and the (sub)species the target (sub)species shall be differentiated from, see Note 15.
6. If a primer/probe system results in a low Ct value (~20.0–25.0) for the target (sub)species, and the ΔCt value is between 8.0 and 11.0, try to enhance selectivity by varying the real-time PCR conditions (see Subheading 3.9).
7. If the ΔCt value is still too low, introduce a third mismatch base, as described in Subheading 3.8.
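The ΔCt thresholds used in these steps can be written as a small triage function. This is a sketch under stated assumptions: ΔCt is taken as Ct(non-target) minus Ct(target), the function name and the returned labels are hypothetical, and the handling of a target Ct outside the ~20.0–25.0 window is not specified in the protocol and is flagged separately here.

```python
def triage_primer_system(ct_target: float, ct_nontarget: float) -> str:
    """Classify a primer/probe system from its Ct values.

    Assumption: delta Ct = Ct(non-target) - Ct(target).
    """
    delta_ct = ct_nontarget - ct_target

    # The protocol expects a low Ct (~20.0-25.0) for the target (sub)species.
    # What to do outside that window is not stated, so it is only flagged.
    if not 20.0 <= ct_target <= 25.0:
        return "target Ct outside 20-25: check amplification"

    if delta_ct > 11.0:
        return "selective"                    # high delta Ct, see Note 15
    if 8.0 <= delta_ct <= 11.0:
        return "optimize PCR conditions"      # Subheading 3.9
    return "add mismatch base"                # next round of primer design


print(triage_primer_system(22.0, 34.5))  # delta Ct = 12.5
print(triage_primer_system(22.0, 31.0))  # delta Ct = 9.0
print(triage_primer_system(22.0, 27.0))  # delta Ct = 5.0
```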
Maria Kaltenbrunner et al.
3.8 Introduction of Three Mismatch Bases
If the introduction of one or two mismatch bases was not sufficient, introduce a third mismatch base and vary its type and position.
1. Select the most selective primer containing two mismatch bases.
2. Introduce the third mismatch base at position 5, 6, or 7 from the 3′ end of the primer. (In our example, primers with three mismatch bases are primer 17–19, primer 23–25, and primer 26–28, Table 2).
3. Evaluate the primers in silico (see Subheading 3.5).
4. If the primers meet the criteria in silico, order the primers and test them by running real-time PCR (see Subheading 3.6).
5. If any of the primer/probe systems results in a low Ct value (~20.0–25.0) for the target species and a high ΔCt value (>11.0) between the target (sub)species and the (sub)species the target (sub)species shall be differentiated from, see Note 15.
6. If a primer/probe system results in a low Ct value (~20.0–25.0) for the target (sub)species, and the ΔCt value is between 8.0 and 11.0, try to enhance selectivity by varying the real-time PCR conditions (see Subheading 3.9).
7. If the ΔCt value is still too low, we do not suggest introducing further mismatch bases (see Note 16). We suggest looking for a SNP at another locus in the genome.
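Step 2 can be illustrated by enumerating every candidate primer that carries one additional substitution at position 5, 6, or 7 from the 3′ end. The sketch below is illustrative only: the function name is hypothetical, the primer is assumed to be written 5′→3′ and to already contain the species-specific base and the first two mismatch bases, and position 1 is assumed to be the 3′-terminal base.

```python
BASES = "ACGT"


def third_mismatch_variants(primer: str, positions=(5, 6, 7)):
    """Return (position, new_base, variant) for each single substitution.

    Assumptions: primer is given 5'->3'; position 1 is the 3'-terminal base.
    """
    primer = primer.upper()
    if len(primer) < max(positions):
        raise ValueError("primer is shorter than the requested position")

    variants = []
    for pos in positions:
        idx = len(primer) - pos          # convert 3'-based position to index
        original = primer[idx]
        for base in BASES:
            if base != original:         # keep only true mismatches
                variant = primer[:idx] + base + primer[idx + 1:]
                variants.append((pos, base, variant))
    return variants


# Hypothetical primer used only to show the output format
for pos, base, seq in third_mismatch_variants("AAAAAAAAAAAAAAAAAAAA"):
    print(pos, base, seq)
```

Three positions with three alternative bases each give nine candidates, which can then be passed to the in silico evaluation of step 3.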
3.9 Optimization of Real-Time PCR Conditions
A number of parameters affect amplification efficiency, including the primer concentration and the concentration of magnesium ions.
1. Vary the primer concentration and the ratio of forward and reverse primer (see Notes 17 and 18).
2. Test another real-time PCR kit (see Note 13).
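Varying both the primer concentration and the forward-to-reverse ratio amounts to testing a grid of concentration pairs. The following sketch enumerates such a grid within the 50–1000 nM primer range suggested in Note 17; the intermediate concentration levels and the function name are assumptions chosen for illustration, not values from the protocol.

```python
from itertools import product

PRIMER_RANGE_NM = (50, 1000)   # primer range suggested in Note 17


def primer_grid(levels_nm=(50, 100, 200, 500, 1000)):
    """List every forward/reverse concentration pair and its ratio.

    Assumption: the intermediate levels are arbitrary illustrative values.
    """
    low, high = PRIMER_RANGE_NM
    if any(not low <= c <= high for c in levels_nm):
        raise ValueError("concentration outside the 50-1000 nM range")
    return [
        {"forward_nM": fwd, "reverse_nM": rev, "ratio": fwd / rev}
        for fwd, rev in product(levels_nm, repeat=2)
    ]


grid = primer_grid()
print(len(grid), "combinations")
print(grid[0])
```

In practice only a subset of the grid would be run, for example the pairs in which the primer carrying the species-specific base is in excess, as suggested in Note 18.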
4 Notes
1. In meat species authentication, one is frequently interested not only in qualitative but also quantitative determination of meat species. It is, for example, necessary to verify if in a sausage declared as “game sausage”, at least 38% (w/w) of the total meat content derives from game species. If the game species content in a “game sausage” is 35.
17. We suggest varying the primer and probe concentration in the range from 50 nM to 1000 nM and 50 nM to 300 nM, respectively.
18. Investigate whether the selectivity for the target species can be further enhanced by using a higher concentration of the primer containing the species-specific base and the mismatch base(s) compared to the other primer.
References
1. Vignal A, Milan D, SanCristobal M, Eggen A (2002) A review on SNP and other types of molecular markers and their use in animal genetics. Genet Sel Evol 34(3):275–305
2. Beissinger TM, Hirsch CN, Sekhon RS, Foerster JM, Johnson JM, Muttoni G, Vaillancourt B, Buell CR, Kaeppler SM, de Leon N (2013) Marker density and read depth for genotyping populations using genotyping-by-sequencing. Genetics 193(4):1073–1081
3. Gibson NJ (2006) The use of real-time PCR methods in DNA sequence variation analysis. Clin Chim Acta 363(1-2):32–47
4. Cha RS, Zarbl H, Keohavong P, Thilly WG (1992) Mismatch amplification mutation assay (MAMA): application to the c-H-ras gene. Genome Res 2(1):14–20
5. Sabui S, Dutta S, Debnath A, Ghosh A, Hamabata T, Rajendran K, Ramamurthy T, Nataro JP, Sur D, Levine MM, Chatterjee NS (2012) Real-time PCR-based mismatch amplification mutation assay for specific detection of CS6-expressing allelic variants of enterotoxigenic Escherichia coli and its application in assessing diarrheal cases and asymptomatic controls. J Clin Microbiol 50(4):1308–1312
6. Easterday WR, Van Ert MN, Zanecki S, Keim P (2005) Specific detection of Bacillus anthracis using a TaqMan® mismatch amplification mutation assay. BioTechniques 38(5):731–735
7. Kreizinger Z, Sulyok KM, Grózner D, Bekő K, Dán A, Szabó Z, Gyuranecz M (2017) Development of mismatch amplification mutation assays for the differentiation of MS1 vaccine strain from wild-type Mycoplasma synoviae and MS-H vaccine strains. PLoS One 12(4):e0175969
8. Morita M, Ohnishi M, Arakawa E, Bhuiyan NA, Nusrin S, Alam M, Siddique AK, Qadri F, Izumiya H, Nair GB, Watanabe H (2008) Development and validation of a mismatch amplification mutation PCR assay to monitor the dissemination of an emerging variant of Vibrio cholerae O1 biotype El Tor. Microbiol Immunol 52(6):314–317
9. Syverson RL, Bradeen JM (2011) A novel class of simple PCR markers with SNP-level sensitivity for mapping and haplotype characterization in Solanum species. Am J Pot Res 88(3):269–282
10. Han EH, Lee SJ, Kim MB, Shin YW, Kim YH, Lee SW (2017) Molecular marker analysis of Cynanchum wilfordii and C. auriculatum using the simple ARMS-PCR method with mismatched primers. Plant Biotechnol Rep 11(2):127–133
11. Kaltenbrunner M, Hochegger R, Cichna-Markl M (2018) Red deer (Cervus elaphus)-specific real-time PCR assay for the detection of food adulteration. Food Control 89:157–166
12. Kaltenbrunner M, Hochegger R, Cichna-Markl M (2018) Development and validation of a fallow deer (Dama dama)-specific TaqMan real-time PCR assay for the detection of food adulteration. Food Chem 243:82–90
13. Kaltenbrunner M, Hochegger R, Cichna-Markl M (2018) Sika deer (Cervus nippon)-specific real-time PCR method to detect fraudulent labelling of meat and meat products. Sci Rep 8(1):7236
14. Kaltenbrunner M, Mayer W, Kerkhoff K, Epp R, Rüggeberg H, Hochegger R, Cichna-Markl M (2019) Differentiation between wild boar and domestic pig in food by targeting two gene loci by real-time PCR. Sci Rep 9(1):9221
15. Ballin NZ (2010) Authentication of meat and meat products. Meat Sci 86(3):577–587
16. Laube I (2010) Meat. In: Popping B, Diaz-Amigo C, Hoenicke K (eds) Molecular biological and immunological techniques and applications for food chemists. John Wiley & Sons, Inc., Hoboken, New Jersey, pp 135–156
17. Broll H (2010) Quantitative Real-Time PCR. In: Popping B, Diaz-Amigo C, Hoenicke K (eds) Molecular biological and immunological techniques and applications for food chemists. John Wiley & Sons, Inc., Hoboken, New Jersey, pp 59–83
18. Kaltenbrunner M, Hochegger R, Cichna-Markl M (2018) Tetraplex real-time PCR assay for the simultaneous identification and quantification of roe deer, red deer, fallow deer and sika deer for deer meat authentication. Food Chem 269:486–494
19. Beugin M-P, Baubet E, Dufaure De Citres C, Kaerle C, Muselet L, Klein F, Queney G (2017) A set of 20 multiplexed single-nucleotide polymorphism (SNP) markers specifically selected for the identification of the wild boar (Sus scrofa scrofa) and the domestic pig (Sus scrofa domesticus). Conserv Genet Resour 9(4):671–675
20. Codex Alimentarius Austriacus, Österreichisches Lebensmittelbuch, Codexkapitel/B14/Fleisch und Fleischerzeugnisse. 2005
21. Druml B, Kaltenbrunner M, Hochegger R, Cichna-Markl M (2016) A novel reference real-time PCR assay for the relative quantification of (game) meat species in raw and heat-processed food. Food Control 70:392–400
22. Dobrovolny S, Blaschitz M, Weinmaier T, Pechatschek J, Cichna-Markl M, Indra A, Hufnagl P, Hochegger R (2019) Development of a DNA metabarcoding method for the identification of fifteen mammalian and six poultry species in food. Food Chem 272:354–361
23. Livak KJ (1999) Allelic discrimination using fluorogenic probes and the 5′ nuclease assay. Genet Anal 14(5-6):143–149
Chapter 6
Primer Design for the Analysis of Closely Related Species: Application of Noncoding mtDNA and cpDNA Sequences
Lidia Skuza
Abstract Noncoding regions of the chloroplast (cpDNA) and mitochondrial (mtDNA) genomes are commonly used in plant phylogenetic and population studies. Consensus primers, which are homologous to most coding regions but amplify variable noncoding regions, are very useful for this purpose. However, the high genetic diversity of plants poses a problem in developing molecular methods that require DNA sequences conserved between species. This chapter describes a protocol for designing PCR primers suitable for the analysis of closely related plant species. As an example, we use PCR primer design for cpDNA noncoding regions of rye (Secale).
Key words Primer design, Noncoding sequences, Related species, Species identification, mtDNA, cpDNA
Chhandak Basu (ed.), PCR Primer Design, Methods in Molecular Biology, vol. 2392, https://doi.org/10.1007/978-1-0716-1799-1_6, © Springer Science+Business Media, LLC, part of Springer Nature 2022
1 Introduction
Molecular markers are an important tool in species classification as they can directly detect gene variations. In phylogenetic studies, it is very important to select appropriate sequences, depending on the taxonomic level at which the phylogenetic reconstruction is carried out. The primary choice is highly variable, rapidly evolving sequences. The more closely related the tested subjects are, the more variable the studied region should be. Hence, the relatively slow rate of evolution of certain sequences may preclude statistically significant analyses within a family or species, while the use of slower evolving sequences to determine the relationship between phylogenetically more distant species can be very informative. Generally, coding sequences have a slower rate of evolution than noncoding sequences [1–5]. Noncoding regions accumulate more deletions/insertions or substitutions than coding regions, and may therefore be more suitable at the intergeneric or intrageneric level [6–8]. Noncoding regions of the chloroplast and mitochondrial genomes have been described as highly variable [6, 7, 9–16]. This has led to the design of “universal” primer pairs, which enable the amplification of noncoding regions separating two coding fragments in most plant species [17–20] (Table 1). The use of such consensus primers, which are homologous to most coding regions but amplify variable noncoding regions, is very useful in phylogenetic and population studies [19–21].
It has been shown that “universal” PCR primers can be effectively used to detect plant DNA from many taxonomic groups [12, 18]. However, it is often difficult to adapt one target region to all genetic varieties in different plant lines, which results in amplification failures in many species. The current work presents a reassessment of PCR primers commonly used to amplify cpDNA and mtDNA regions in plants, as well as methods for developing primers adapted to the tested material. The new primers were able to amplify the entire region in representative samples of 13 species and subspecies of the genus Secale. These primers can be used in various fields of plant research, including DNA barcoding, molecular ecology, metagenomics, and phylogenetic research.
2 Materials
2.1 PCR Evaluation
1. Primers can be ordered from different manufacturers (e.g., Genomed, IDT).
2. DNA from closely related plant species and subspecies: DNA can be isolated using various methods (see Note 1) or commercial kits (e.g., Maxwell® Tissue DNA Purification Kit (Promega)).
3. UV/Vis spectrophotometer (e.g., Nanodrop (Thermo Fisher)).
4. PCR reagents (e.g., ThermoFisher).
5. Thermocycler (e.g., BioRad).
6. Electrophoresis apparatus (e.g., TKBiotech).
2.2 Sequence Comparison and Primer Design
1. NCBI nucleotide database (https://www.ncbi.nlm.nih.gov/nuccore).
2. Sequence alignment program (e.g., MEGA7 (https://megasoftware.net/)).
3. Primer analysis (e.g., OligoAnalyzer (https://eu.idtdna.com/calc/analyzer)).
Table 1 Primer list of sequences (5′–3′) for amplification of different regions of cpDNA and mtDNA and their references used in this study
Type of DNA  Locus                                      Direction  Sequence 5′–3′           Reference
cpDNA        atpB-rbcL                                  Forward    ACATCKARTACKGGACCAATAA   Chiang et al. [17]
                                                        Reverse    AACACCAGCTTTRAATCCAA     Chiang et al. [17]
             trnT (UGU)–trnL (UAA) exon                 Forward    CATTACAAATGCGATGCTCT     Taberlet et al. [18]
                                                        Reverse    TCTACCGATTTCGCCATATC     Taberlet et al. [18]
             trnL (UAA) intron                          Forward    CGAAATCGGTAGACGCTACG     Taberlet et al. [18]
                                                        Reverse    GGGGATAGAGGGACTTGAAC     Taberlet et al. [18]
             trnD[tRNA–Asp(GUC)]-trnT[tRNA–Thr(GGU)]    Forward    ACCAATTGAACTACAATCCC     Demesure et al. [19]
                                                        Reverse    CCCTTTTAACTCAGTGGTAG     Demesure et al. [19]
mtDNA        nad1 exon B-nad1 exon C intron             Forward    GCATTACGATCTGCAGCTCA     Demesure et al. [19]
                                                        Reverse    GGAGCTCGATTAGTTTCTGC     Demesure et al. [19]
             nad4/1-2                                   Forward    CAGTGGGTTGGTCTGGTAATG    Demesure et al. [19]
                                                        Reverse    TCATATGGGCTACTGAGGAG     Demesure et al. [19]
             nad4L-orf25                                Forward    CTGTYTTTTCGCACTTAGGC     Duminil et al. [22]
                                                        Reverse    GTCCGRGGTACTATTGCTGT     Duminil et al. [22]
             rps12-1/nad3-2                             Forward    TTTCTTCTCTACCATGACGA     Duminil et al. [22]
                                                        Reverse    TGATCCYACTCGGTSTTCCT     Duminil et al. [22]
             rps12-2/nad3-1                             Forward    ACCATATTTDGATCTGCCDC     Duminil et al. [22]
                                                        Reverse    YACGATHGGATTTCTMTATG     Duminil et al. [22]
             rrn5/rrn18-1                               Forward    GAGGTCGGAATGGGATCGGG     Duminil et al. [22]
                                                        Reverse    GGGTGAAGTCGTAACAAGGT     Duminil et al. [22]
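Several primers in Table 1 contain IUPAC ambiguity codes (for example K, R, Y, S, D, H, and M), which have to be handled correctly when a sequence is reverse-complemented or when the number of distinct oligonucleotides in a degenerate primer is counted. The sketch below shows both operations; the function names are hypothetical, and the code tables follow the standard IUPAC nucleotide nomenclature.

```python
# Complement of each IUPAC nucleotide code
IUPAC_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W",
    "K": "M", "M": "K", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}

# Bases represented by each IUPAC code
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a 5'->3' sequence, ambiguity codes included."""
    return "".join(IUPAC_COMPLEMENT[base] for base in reversed(seq.upper()))


def degeneracy(seq: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    count = 1
    for base in seq.upper():
        count *= len(IUPAC_BASES[base])
    return count


forward = "ACATCKARTACKGGACCAATAA"        # atpB-rbcL forward primer, Table 1
print(degeneracy(forward))                # K, R, K -> 2 * 2 * 2 variants
print(reverse_complement("CATTACAAATGCGATGCTCT"))
```

Checking a reverse primer with `reverse_complement` before ordering guards against the common mistake of submitting the template-strand sequence as written in the alignment.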
3 Methods
3.1 DNA Isolation and PCR Evaluation
1. Isolate DNA (see Note 1).
2. Run PCR amplification of cpDNA and mtDNA using consensus primers (Table 1) (see Note 2).
3. Run PCR amplification of cpDNA in 25-μl reaction mixtures containing approximately 50–150 ng of genomic DNA template, 2.0–3.0 mM MgCl2, 0.2–1.0 mM each dNTP, 0.1–1 μM each primer, 0.1 mg BSA/ml, and 1 U of Taq DNA polymerase. The reaction conditions and components for amplification of each analyzed region are given in Table 2.
4. Run PCR amplification of mtDNA in 25-μl reaction mixtures containing approximately 75–250 ng genomic DNA template, 2.5–4.0 mM MgCl2, 0.1–0.2 mM each dNTP, 0.2–0.55 μM each primer, 0.05 mg BSA/ml, and 1–1.5 U Taq DNA polymerase. The reaction conditions and components for amplification of each analyzed region are given in Table 3.
5. Run electrophoresis of the obtained products on a 1.5% (m/v) agarose gel in 1× TBE buffer (89 mM Tris, 89 mM boric acid, 2 mM Na2EDTA, pH 8.3).
6. Sequence PCR products.
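The final concentrations in step 3 translate into pipetting volumes through the dilution relation V(stock) = C(final) × V(total) / C(stock). The sketch below works this out for one 25-μl cpDNA reaction. All stock concentrations, the specific final values chosen within the stated ranges, and the function names are assumptions made for illustration; the reaction buffer supplied with the polymerase is not listed in the step and would also be taken from the water volume.

```python
def volume_ul(final_conc: float, stock_conc: float, total_ul: float) -> float:
    """Volume of stock needed: C_final * V_total / C_stock (same units)."""
    return final_conc * total_ul / stock_conc


def cpdna_mix(total_ul: float = 25.0) -> dict:
    """Volumes (ul) for one cpDNA reaction, with hypothetical stocks."""
    mix = {
        "MgCl2": volume_ul(2.5, 25.0, total_ul),          # 2.5 mM from 25 mM
        "dNTPs": volume_ul(0.2, 10.0, total_ul),          # 0.2 mM each from 10 mM each
        "forward primer": volume_ul(0.5, 10.0, total_ul), # 0.5 uM from 10 uM
        "reverse primer": volume_ul(0.5, 10.0, total_ul), # 0.5 uM from 10 uM
        "BSA": volume_ul(0.1, 10.0, total_ul),            # 0.1 mg/ml from 10 mg/ml
        "Taq polymerase": 1.0 / 5.0,                      # 1 U from 5 U/ul
        "template DNA": 100.0 / 50.0,                     # 100 ng from 50 ng/ul
    }
    # Water makes up the remainder (buffer, if used, comes out of this).
    mix["water"] = total_ul - sum(mix.values())
    return mix


for component, volume in cpdna_mix().items():
    print(f"{component:15s} {volume:6.2f} ul")
```

For several samples, the volumes of all components except the template can be multiplied by the number of reactions plus a small excess to prepare a master mix.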
3.2 In Silico Evaluation
Build multiple sequence alignments.
1. Align sequences according to the instructions for the chosen sequence alignment software (e.g., MEGA7) (see Note 3). An example of a multiple sequence alignment is shown in Fig. 1.
2. Select “Edit”, “Select all”, then “Alignment” and “Align by ClustalW”.
3. Select the most conserved regions and use them to design the primers (see Note 4).
4. Copy an 18–35-nucleotide segment from the 5′ end of the consensus sequence and an 18–35-nucleotide segment from the 3′ end into a text file. Be sure to select a fragment that contains consensus sequences and adjacent variable sequences. These sequences will be assessed for potential use as 5′ and 3′ primers (see Note 5).
5. PCR primers evaluated in this example are shown in Fig. 2.
6. Evaluate potential primers using the free web analysis tool OligoAnalyzer 3.1 (https://eu.idtdna.com/calc/analyzer). First paste the 5′ primer into the program window. Default settings are used for all calculations. Select “Analyze”. Record the results for the 5′ primer length, melting temperature, and GC content in the text file. To adjust the melting temperature (if necessary), remove or add one or more bases at the end of the primer (see Note 6).
7. Test primers for secondary structures and for the formation of homodimers and heterodimers. Perform “hairpin”, “self-dimer”, and “hetero-dimer” analyses on the 5′ primer. Record the results (see Note 7).
8. If the primers meet the in silico criteria, order them and test them by performing PCR.
Fig. 1 An example of a multiple sequence alignment
Fig. 2 PCR primers evaluated in this example
Table 2 Thermocycling conditions for PCR amplification of cpDNA noncoding (intron) regions [21]
Regions: atpB-rbcL; trnT (UGU)-trnL (UAA) 5′ exon; trnL (UAA) intron; trnD [tRNA–Asp(GUC)]-trnT [tRNA–Thr(GGU)]. Where more than one value is listed, the value depends on the region.
Phase of PCR          Temperature (°C)   Time
Initial denaturation  94                 3 or 4 min
Denaturation          92 or 94           45 s or 1 min
Primer annealing      53.5, 55, or 62    45 s or 1 min
Primer extension      72                 2 min
Final extension       72                 10 min
Number of cycles      30 or 35
Table 3 Thermocycling conditions for PCR amplification of mtDNA noncoding (intron) regions [21]
Regions: nad1 exon B-nad1 exon C intron; nad4/1-2; nad4L-orf25; rps12-1/nad3(2); rps12-1/nad3(1); rrn5/rrn18-1. Where more than one value is listed, the value depends on the region.
Phase of PCR          Temperature (°C)   Time
Initial denaturation  94                 1 or 12 min
Denaturation          94                 45 s
Primer annealing      51.5 or 52         45 s
Primer extension      72                 1 min
Final extension       72                 10 min
Number of cycles      40
3.3 Optimization of PCR Conditions
A number of parameters affect amplification efficiency, including primer and magnesium ion concentrations.
1. Change primer concentration and forward to reverse primer ratio or DNA concentration (see Note 8).
2. Test another polymerase or commercial PCR kit (see Note 9).
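The selection of conserved primer-binding regions from a multiple sequence alignment (Subheading 3.2) can also be approximated programmatically. The following is a minimal sketch, assuming the aligned sequences have been exported as equal-length strings with "-" for gaps; the function name and the toy sequences are hypothetical, and the default window of 18 nucleotides is the lower bound of the 18–35-nucleotide segment length used in the protocol.

```python
def conserved_windows(aligned, window=18, max_variable=0):
    """Return (start, sequence) for windows conserved across all rows.

    A column counts as variable if it contains a gap or more than one base.
    Assumption: all aligned sequences have the same length.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    variable = [len(set(col)) > 1 or "-" in col for col in zip(*aligned)]

    hits = []
    for start in range(length - window + 1):
        if sum(variable[start:start + window]) <= max_variable:
            hits.append((start, aligned[0][start:start + window]))
    return hits


# Toy alignment: three 30-nt sequences differing only at index 20
base = "ACGTTGCAAGCTAGCTAGGATCCGATCGAT"
alignment = [base, base[:20] + "C" + base[21:], base]
for start, seq in conserved_windows(alignment):
    print(start, seq)
```

Windows reported by such a scan are only candidates; each one still needs the melting temperature, GC content, and dimer checks described above.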
4 Notes
1. Fresh coleoptiles and leaves (etiolated), collected approximately 5–7 days after sowing on sterile plates with blotting paper, are best suited for DNA extraction from plant tissues. The highest isolate concentration can be obtained using traditional methods (e.g., CTAB). The best purity is obtained using commercial kits (e.g., DNeasy Plant Kit or Wizard® Genomic DNA, Promega).
2. Concentration was adjusted to ~50 ng/μl for the amplification of noncoding (intron) regions.
3. Sequence alignment must be performed for each region separately.
4. Select conserved regions, to be used as primer binding sites, that flank the variable regions commonly used in phylogenetics and population genetics studies [12].
5. A common mistake during reverse primer design is ordering the reverse primer in the 5′–3′ direction without converting the nucleotides into the reverse complement.
6. The use of higher primer melting temperatures will promote more specific primer binding to the template. If Tm is too low (52–53 °C), more bases can be added to the primer (preferably more C or G).
7. Hairpin melting temperature should be significantly lower than the primer annealing temperature, so that the hairpin is denatured during PCR.
8. 0.2 or 1.0 μM each primer and DNA amounts of 50–150 ng can be used. 0.1 mg/ml BSA can optionally be used.
9. In this example, Taq polymerase was used.
References
1. Gielly L, Yuan YM, Kupfer P, Taberlet P (1996) Phylogenetic use of noncoding regions in the genus Gentiana L.: chloroplast trnL (UAA) intron versus nuclear ribosomal internal transcribed spacer sequences. Mol Phylogenet Evol 5:460–466
2. Buckler IES, Holtsford TP (1996) Zea systematics: ribosomal ITS evidence. Mol Biol Evol 13:612–622
3. Kelchner S (2000) The evolution of non-coding chloroplast DNA and its application in plant systematics. Ann Mo Bot Gard 87(4):482–498. https://doi.org/10.2307/2666142
4. Lockton S, Gaut BS (2005) Plant conserved non-coding sequences and paralogue evolution. Trends Genet 21(1):60–65. https://doi.org/10.1016/j.tig.2004.11.013
5. Van de Velde J, Van Bel M, Vaneechoutte D, Vandepoele K (2016) A collection of conserved non-coding sequences to study gene regulation in flowering plants. Plant Physiol 171(4):2586–2598. https://doi.org/10.1104/pp.16.00821
6. Palmer JD, Jansen RK, Michaels HJ, Chase MW, Manhart JR (1988) Chloroplast DNA variation and plant phylogeny. Ann Missouri Bot Gard 75(4):1180–1206. Available from: http://www.jstor.org/stable/2399279
7. Clegg MT, Gaut BS, Learn GH, Morton BR (1994) Rates and patterns of chloroplast DNA evolution. Proc Natl Acad Sci U S A 91(15):6795–6801. Available from: http://www.pnas.org/content/91/15/6795.abstract
8. Wang X-R, Tsumura Y, Yoshimaru H, Nagasaka K, Szmidt AE (1999) Phylogenetic relationships of Eurasian pines (Pinus, Pinaceae) based on chloroplast rbcL, matK, rpl20-rps18 spacer, and trnV intron sequences. Am J Bot 86(12):1742–1753. https://doi.org/10.2307/2656672
9. Ogihara Y, Terachi T, Sasakuma T (1992) Structural analysis of length mutations in a hot-spot region of wheat chloroplast DNAs. Curr Genet 22:251–258
10. Chase MW, Fay MF (2009) Barcoding of plants and fungi. Science 325(5941):682–683. Available from: http://science.sciencemag.org/content/325/5941/682.abstract
11. Ford CS, Ayres KL, Toomey N, Haider N, Van Alphen SJ, Kelly LJ et al (2009) Selection of candidate coding DNA barcoding regions for use on land plants. Bot J Linn Soc 159(1):1–11. https://doi.org/10.1111/j.1095-8339.2008.00938.x
12. Hollingsworth PM, Graham SW, Little DP (2011) Choosing and using a plant DNA barcode. PLoS One 6(5):e19254. Available from: https://pubmed.ncbi.nlm.nih.gov/21637336
13. Suo Z, Zhang C, Zheng Y, He L, Jin X, Hou B et al (2012) Revealing genetic diversity of tree peonies at micro-evolution level with hypervariable chloroplast markers and floral traits. Plant Cell Rep 31(12):2199–2213. https://doi.org/10.1007/s00299-012-1330-0
14. Dong W, Xu C, Li D, Jin X, Li R, Lu Q et al (2016) Comparative analysis of the complete chloroplast genome sequences in psammophytic Haloxylon species (Amaranthaceae). PeerJ 4:e2699. https://doi.org/10.7717/peerj.2699
15. Wang M, Xie X, Yan B, Yan X, Luo J, Liu Y et al (2018) The completed chloroplast genome of Ostrya trichocarpa. Conserv Genet Resour 10(3):579–581. https://doi.org/10.1007/s12686-017-0869-z
16. Xu C, Dong W, Li W, Lu Y, Xie X, Jin X et al (2017) Comparative analysis of six Lagerstroemia complete chloroplast genomes. Front Plant Sci 8:15
17. Chiang TY, Schaal BA, Peng C (1998) Universal primers for amplification and sequencing a noncoding spacer between the atpB and rbcL genes of chloroplast DNA. Bot Bull Acad Sin 39:245–250
18. Taberlet P, Gielly L, Pautou G, Bouvet J (1991) Universal primers for amplification of the three noncoding regions of chloroplast DNA. Plant Mol Biol 17:1105–1109
19. Demesure B, Sodzi N, Petit RJ (1995) A set of universal primers for amplification of polymorphic noncoding regions of mitochondrial and chloroplast DNA in plants. Mol Ecol 4:129–131
20. Dumolin-Lapegue S, Pemonge MH, Petit RJ (1997) An enlarged set of consensus primers for the study of organelle DNA in plants. Mol Ecol 6(4):393–397
21. Skuza L, Szućko I, Filip E, Strzała T (2019) Genetic diversity and relationship between cultivated, weedy and wild rye species as revealed by chloroplast and mitochondrial DNA non-coding regions analysis. PLoS One 14(2):e0213023. https://doi.org/10.1371/journal.pone.0213023
22. Duminil J, Pemonge MH, Petit RJ (2002) A set of 35 consensus primer pairs amplifying genes and introns of plant mitochondrial DNA. Mol Ecol Notes 2:428–430
Chapter 7
Designing PCR Primers for the Amplification-Refractory Mutation System
Majid Komijani, Khashayar Shahin, Esam Ibraheem Azhar, and Mohammad Bahram
Abstract Recent developments in genetic research indicate that intraspecific genetic variability exists in many groups of organisms. These variations, which result in a variety of genotypes and phenotypes within a population, are called polymorphisms. Mutations can alter an organism’s phenotype in different ways and affect its fitness, for example, by altering disease susceptibility or resistance. Therefore, the detection of point mutations in different genes of a population is of particular importance. The amplification-refractory mutation system technique is a PCR-based method to detect single nucleotide polymorphisms in the genome. High repeatability, low cost, high accessibility, and no need for sophisticated technology are the main advantages of the ARMS-PCR technique compared with other available methods such as PCR-RFLP. This chapter describes the design and analysis of primers for the ARMS-PCR technique.
Key words Primer design, Molecular biology, ARMS-PCR
Chhandak Basu (ed.), PCR Primer Design, Methods in Molecular Biology, vol. 2392, https://doi.org/10.1007/978-1-0716-1799-1_7, © Springer Science+Business Media, LLC, part of Springer Nature 2022
1 Introduction
Individuals within a community may show up to 99.9% similarity in their DNA sequences; however, even such fine genetic variations (e.g., single nucleotide polymorphisms, SNPs) can have a major impact on population structure and function [1]. SNPs are differences resulting from the substitution of single nucleotides through point mutation. They are the most common type of genomic variability and occur on average about once per 1000 bp in the genome [1]. The study of SNPs is important because some SNPs reflect the health and fitness of individuals [2]. The existing methods to study SNPs have some limitations. For example, Sanger sequencing cannot rapidly screen large numbers of mutations, next-generation sequencing (NGS) is still expensive, and PCR-RFLP is time consuming and nonspecific [1, 3, 4]. The ARMS-PCR (Amplification Refractory Mutation System-PCR) technique is an easy and rapid method for detecting point mutations and polymorphisms, and for determining whether a gene locus is heterozygous or homozygous (Newton et al. 1989) [2, 5]. Newton et al. introduced a PCR assay for detecting SNPs in which the allele-specific primers carry additional mismatched nucleotides in the 3′ region of the primer to prevent amplification of the non-target allele [3]. Unlike other techniques used to detect alleles, the ARMS-PCR method does not require the use of restriction enzymes or sequencing of the PCR products [2]. In this technique, primers specific to the target sequences are needed. These primers strictly amplify the target region present in the samples, and other regions remain non-amplified. Thus, the presence or absence of PCR products indicates the presence or absence of the target allele. In the ARMS-PCR method, the SNP is detected from the size of the PCR amplicon by gel electrophoresis. As such, this method is simple, rapid, and affordable.
Figure 1 shows the normal and mutant alleles of a hypothetical gene. In this gene, the normal allele contains nucleotide G, and the mutant allele contains nucleotide A. In such conditions, the two alleles can be distinguished by the ARMS-PCR method, which requires two pairs of primers to be designed.
Fig. 1 The normal and mutant allele of a hypothetical gene
One pair of primers, referred to here as outer forward and outer reverse, must be designed such that the region of interest is located between them. In other words, these two primers are common to both alleles (Fig. 2) [6, 7]. In designing these primers, care must be taken that the studied mutation is not located within the primers. It is also critical to position the outer forward and outer reverse primers such that, if the distance from the studied mutation to the outer forward primer is α and its distance to the outer reverse primer is β, then α divided by β or β divided by α is greater than 1.5 (Fig. 3).
The target region can be homozygote for the normal allele, homozygote for the mutant allele, or heterozygote. The size of the
Designing PCR Primers for the Amplification-Refractory Mutation System
Fig. 2 Position of outer forward and outer reverse primers in normal and mutant alleles and the amplification of the same fragment in both allele types and gel electrophoresis results
Fig. 3 The PCR product generated by outer forward and outer reverse primers in homozygote for the normal allele, homozygote for the mutant allele, and heterozygote states
PCR product generated by the outer forward and outer reverse primers is the same in all three of the above cases. Therefore, this fragment is used as a control and should be present in all samples [6, 7]. In addition to the above primers, two other primers, named inner forward and inner reverse, are required. These primers need to be designed such that the inner reverse primer is specific to the normal allele and the inner forward primer is specific to the mutant allele [6, 7]. The inner reverse primer, which is specific to the normal allele (here allele G), has at its 3′ end the base complementary to the normal nucleotide (here nucleotide C) and therefore cannot bind to the mutant allele. On the other hand, the inner forward primer, which is specific to the mutant allele (here allele A), has at its 3′ end the mutant nucleotide (here T) and, thus, cannot bind to the normal allele (Fig. 4) [6, 7].
Fig. 4 Position of inner forward and inner reverse primers in normal and mutant alleles and the amplification of the same fragment in both allele types
Fig. 5 Position of inner and outer primers in normal and mutant alleles and gel electrophoresis results
If multiplex PCR is performed with all of the above primers, one of the results seen in Fig. 5 will be observed.
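The band patterns expected from the multiplex reaction can be turned into a simple genotype call. The sketch below assumes the primer roles described above (outer pair gives the control band, outer forward with inner reverse gives the normal-allele band, inner forward with outer reverse gives the mutant-allele band); the function name and the returned labels are hypothetical.

```python
def call_genotype(control_band: bool, normal_band: bool, mutant_band: bool) -> str:
    """Interpret a tetra-primer ARMS-PCR gel lane.

    control_band: product of outer forward + outer reverse primers
    normal_band:  product of outer forward + inner reverse primers
    mutant_band:  product of inner forward + outer reverse primers
    """
    # The control fragment should be present in every valid sample.
    if not control_band:
        return "invalid: control band missing"
    if normal_band and mutant_band:
        return "heterozygous"
    if normal_band:
        return "homozygous normal"
    if mutant_band:
        return "homozygous mutant"
    return "invalid: no allele-specific band"


print(call_genotype(True, True, False))
print(call_genotype(True, True, True))
```

A lane without the control band is flagged rather than genotyped, since the text treats that fragment as a control that should be present in all samples.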
2
Methods
To design a primer, it is necessary to first determine the sequence of the gene and the SNP in question. The gene sequence can be obtained from databases such as https://www.ncbi.nlm.nih.gov/ or https://asia.ensembl.org/index.html. If needed, the SNP in question can be determined by searching for
the SNP name in the box via the following link: https://www.ncbi.nlm.nih.gov/snp/?term=.
Primer design can be performed using the website http://primer1.soton.ac.uk/primer1.html, which is accessible free of charge. The sequence in question must be selected as described in the introduction section and entered into the source sequence box on this website (up to 1000 bases). Then, in the next box, the position of the SNP from the start of the sequence must be specified. It is recommended to insert the nucleotide of the normal allele in the allele 1 box and the nucleotide of the mutant allele in the allele 2 box. The rest of the options can be modified depending on the test conditions. However, it is recommended not to select a number less than 100 for the minimum (inner) product size, which may outperform the optimum primer size. In addition, we recommend a maximum primer size of 22 nucleotides and a maximum primer melting temperature (Tm) of less than 70 °C. It should be noted that the values of maximum complementarity and maximum 3′ complementarity are defined as 8 and 3 by default, respectively. Both of these primer features correspond strongly to the probability of secondary structure formation in the primer, i.e., lower values facilitate setting up the PCR process. After making the desired changes, the Pick Primers option should be selected (Fig. 6). After selecting this option, the results for the designed primers will be displayed. Each result consists of four primers: forward inner primer, reverse inner primer, forward outer primer, and reverse outer primer. In addition, the nucleotide sequence, position, and melting temperature of the primers, as well as the length of the PCR product fragments for each state, are specified. To increase specificity and reduce error in the PCR process, the third nucleotide from the 3′ end of the inner primers is usually designed by the software as a mismatch. This ensures that only the target region is amplified; thus, care must be taken that this “false” error is not corrected by the user.
2.1 Examination of Primers Using Integrated DNA Technologies (IDT) Website
The designed primers need to be examined for the formation of self-dimers, hetero-dimers, and hairpins. The Integrated DNA Technologies website, which is free of charge, can be used for this purpose. After entering the site at https://eu.idtdna.com/pages/tools/oligoanalyzer, each of the primers should be entered separately in the sequence box, and the analysis option must be selected. It should be noted that a more negative Gibbs free energy (ΔG) indicates a more spontaneous reaction, that is, a more stable secondary structure. Therefore, the ΔG values obtained from the analysis of self-dimers, hetero-dimers, and hairpins should ideally approach zero or be positive (Fig. 7).
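Before submitting primers to a thermodynamic tool, a rough pre-screen for dimer-prone pairs can be scripted. The sketch below is only a crude proxy and does not compute ΔG: it reports the longest contiguous stretch over which two primers could base-pair in antiparallel orientation. The function names and the example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_complementary_run(primer_a: str, primer_b: str) -> int:
    """Longest contiguous stretch of primer_a that can pair with primer_b.

    A stretch of primer_a pairs with primer_b when it also occurs in the
    reverse complement of primer_b, so this is a longest-common-substring
    search between primer_a and reverse_complement(primer_b).
    """
    a = primer_a.upper()
    b_rc = reverse_complement(primer_b)
    best = 0
    prev = [0] * (len(b_rc) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b_rc) + 1)
        for j in range(1, len(b_rc) + 1):
            if a[i - 1] == b_rc[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


# Hetero-dimer check on two made-up sequences
print(longest_complementary_run("AAAACCCC", "GGGGTTTT"))
# Self-dimer check: compare a primer with itself
print(longest_complementary_run("ACGT", "ACGT"))
```

Long runs flag candidates worth a closer look; the ΔG values reported by the analysis tool remain the criterion described in the text.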
Fig. 6 An overview of the website http://primer1.soton.ac.uk/primer1.html and the changeable options on this site
2.2 Examination of Specificity of the Designed Primers
Once the primers have been designed, their sensitivity and specificity must be analyzed with Primer-BLAST to ensure that the designed primers amplify only the fragment in question. Both of these features can be assessed using either ProbeMatch (https://rdp.cme.msu.edu/probematch) or the NCBI website (https://www.ncbi.nlm.nih.gov/). Some points should be noted when running Primer-BLAST. First, the specificity of the outer forward and outer reverse, outer forward and inner reverse, and inner forward and outer reverse primer pairs should be studied separately. Second, nonspecific products of different lengths may be displayed in the Primer-BLAST results. In such cases, if there is a mismatch at the 3′ end, the probability of the
Fig. 7 Analysis of primers using the OligoAnalyzer tool on the Integrated DNA Technologies (IDT) website
amplification of these fragments will be lower. An increased number of such mismatches at the 3′ end further reduces the probability that these fragments are amplified. In addition, since the third nucleotide from the 3′ end of the inner primers is designed as a mismatch by the software, such fragments are not actually amplified and do not disrupt the results. In terms of wet-lab specificity, one may rely on the ratio between amplified targeted and non-targeted regions in the final products to further assess the performance of the tested primer sets.
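When reviewing nonspecific hits, the number of mismatches near the 3′ end of a primer can be counted programmatically. The sketch below assumes that the off-target binding site has been written in the same orientation and length as the primer; the function name, the 5-base window, and the sequences are illustrative assumptions.

```python
def three_prime_mismatches(primer: str, site: str, window: int = 5) -> int:
    """Count mismatches within the last `window` bases (the 3' end).

    primer: primer sequence, written 5'->3'
    site:   candidate binding site, written in the primer's orientation
            and with the same length as the primer
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    p_tail = primer.upper()[-window:]
    s_tail = site.upper()[-window:]
    return sum(1 for x, y in zip(p_tail, s_tail) if x != y)


# Made-up primer and off-target site differing at one 3'-proximal base
print(three_prime_mismatches("ACGTACGTAC", "ACGTACGTTC"))
```

Following the reasoning above, off-target sites with more 3′-proximal mismatches would be expected to be less likely to amplify.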
References
1. Shastry BS (2002) SNP alleles in human disease and evolution. J Hum Genet 47(11):561–566
2. Bai R-K, Wong L-JC (2004) Detection and quantification of heteroplasmic mutant mitochondrial DNA by real-time amplification refractory mutation system quantitative PCR analysis: a single-step approach. Clin Chem 50(6):996–1001
3. Matsuda K (2017) PCR-based detection methods for single-nucleotide polymorphism or mutation: real-time PCR and its substantial contribution toward technological refinement. In: Advances in clinical chemistry, vol 80. Elsevier, Amsterdam, pp 45–72
4. Yang L, Ijaz I, Cheng J, Wei C, Tan X, Khan MA et al (2018) Evaluation of amplification refractory mutation system (ARMS) technique for quick and accurate prenatal gene diagnosis of CHM variant in choroideremia. Appl Clin Genet 11:1
5. Makanga JO, Christianto A, Inazu T (2015) Allele-specific real-time polymerase chain reaction as a tool for urate transporter 1 mutation detection. In: PCR primer design. Springer, New York, pp 117–125
6. Medrano RFV, de Oliveira CA (2014) Guidelines for the tetra-primer ARMS–PCR technique development. Mol Biotechnol 56(7):599–608
7. Ye S, Dhillon S, Ke X, Collins AR, Day IN (2001) An efficient procedure for genotyping single nucleotide polymorphisms. Nucleic Acids Res 29(17):e88
Part II Primer Design for Genome-Wide Identification of Specific Regions
Chapter 8
Validation of Circular RNAs by PCR
Aniruddha Das, Debojyoti Das, and Amaresh C. Panda
Abstract
High-throughput RNA-sequencing (RNA-seq) technologies combined with novel bioinformatic algorithms have uncovered a large class of covalently closed single-stranded RNA molecules called circular RNAs (circRNAs). Although RNA-seq has identified more than a million circRNAs, only a handful of them have been validated with other techniques, including northern blotting, gel-trap electrophoresis, exonuclease treatment assays, and polymerase chain reaction (PCR). Reverse transcription (RT) of total RNA followed by PCR amplification is the most widely used technique for validating circRNAs identified in RNA-seq. RT-PCR is a highly reproducible, sensitive, and quantitative method for the detection and quantitation of circRNAs. This chapter details the basic guidelines for designing suitable primers for PCR amplification and validation of circRNAs.
Key words Divergent primer, Full-length primer, Rolling circle amplification, PCR, Sanger sequencing
1
Introduction
The advent of RNA-sequencing (RNA-seq) technologies and novel computational pipelines has led to the discovery of tens of thousands of circular RNAs (circRNAs) in various organisms, including humans [1–4]. CircRNAs are a large family of covalently closed single-stranded RNA molecules found to be ubiquitously expressed and conserved during evolution. Their size ranges from less than 100 nt to several thousand nucleotides [3, 4]. CircRNAs are generated from pre-mRNAs by a head-to-tail splicing mechanism known as backsplicing [5, 6]. CircRNAs are resistant to exonucleases and are very stable compared with linear RNAs because they lack 5′ and 3′ ends [7–9]. CircRNAs have been shown to be involved in disease development by regulating various critical events in cells. Recent evidence suggests a role in gene regulation by acting as sponges for RNA-binding proteins and microRNAs [10]. The unique nonlinear backsplice junction sequence serves as the key for detecting circRNAs in RNA-seq data [11, 12]. However, the backsplice junctions identified in the RNA-seq may come from
Chhandak Basu (ed.), PCR Primer Design, Methods in Molecular Biology, vol. 2392, https://doi.org/10.1007/978-1-0716-1799-1_8, © Springer Science+Business Media, LLC, part of Springer Nature 2022
Aniruddha Das et al.
reverse transcriptase template switching, trans-splicing, and backsplicing. Validation of true circRNAs and their actual spliced sequences is an essential step for their functional characterization. Further validation of circRNAs can be achieved by various techniques, including exonuclease assay [13], in situ hybridization [14], northern blotting [15], and RT-PCR [16]. Since RT-PCR is a highly sensitive and versatile method, it has been widely adopted as the second most used technique for identifying and quantifying circRNAs after RNA-seq. However, detection and quantification of circRNAs require an accurate protocol for designing special primers and performing semiquantitative/quantitative (q)PCR. Unlike linear RNAs, circRNA detection requires outward-facing divergent primers that allow specific amplification of the backsplice junction sequence of the target circRNA [16]. Here, we provide a detailed protocol for divergent primer design and the basic PCR technique followed by Sanger sequencing to validate the backsplice junction sequence obtained from RNA-seq data. Additionally, we describe the detailed method for RNase R treatment and RT-qPCR using divergent primers for validating the circular nature of the RNA molecule containing the backsplice junction sequence. However, a single pre-mRNA can generate multiple circRNAs containing the same backsplice junction but different exon/intron combinations due to alternative splicing occurring within the circRNA during backsplicing. Additionally, the sequence of a circRNA matches that of its linear RNA counterpart, making it difficult to identify the mature spliced sequence of the circRNA from RNA-seq data. To overcome this issue, we also describe the circRNA-rolling circle amplification (circRNA-RCA) method, which enables the identification of the full-length sequence of circRNAs by Sanger sequencing after PCR amplification [17].
Since the function of circRNA depends on its sequence, identification of actual spliced sequence is a crucial step for predicting their biological function. Together, this chapter discusses two important PCR methods to validate the backsplice junction and full-length spliced sequence of circRNAs, which will accelerate the functional characterization of circRNAs.
2
Materials
2.1 Divergent and Full-Length Primer Design
1. Desktop system with a web browser such as Google Chrome or Mozilla Firefox.
2. Databases for retrieving mature circRNA sequences: circBase, CircInteractome, UCSC genome browser, etc.
3. Web tools for divergent primer design: NCBI Primer-BLAST, Primer3, and CircInteractome.
PCR Validation of Circular RNAs
2.2 RNase R Treatment and RT-qPCR
1. 1.5 ml microcentrifuge tubes.
2. 0.2 ml PCR tubes.
3. PureLink RNA mini Kit (Thermo Fisher Scientific).
4. Total RNA purified from cells.
5. RNase R (20 U/μl; Lucigen).
6. Random hexamer (50 μM; Thermo Fisher Scientific).
7. 100 mM dNTPs (dATP, dCTP, dGTP, dTTP).
8. Ribolock RNase inhibitor (40 U/μl; Thermo Fisher Scientific).
9. Maxima H-minus reverse transcriptase (200 U/μl; Thermo Fisher Scientific).
10. Maxima reverse transcriptase (200 U/μl; Thermo Fisher Scientific).
11. Nuclease-free water.
12. Primer stocks (100 μM).
13. 2× PowerUp SYBR® Green master mix (Thermo Fisher Scientific).
2.3 Purification and Sanger Sequencing of PCR Products Amplified with Divergent and Full-Length Primers
1. Nuclease-free water.
2. Primer stocks (100 μM).
3. 0.2 ml PCR tubes.
4. 2× DreamTaq PCR master mix (Thermo Fisher Scientific).
5. DNA loading dye.
6. 1 kb Plus DNA Ladder (Thermo Fisher Scientific).
7. PureLink Quick Gel Extraction Kit (Thermo Fisher Scientific).
8. 10× TBE Buffer.
9. Agarose (HiMedia).
10. SYBR Gold gel stain (Thermo Fisher Scientific).
11. HiPurA PCR purification kit (HiMedia).
3
Method
All reagents should be nuclease-free. Wear gloves at all times and prepare all reactions in a PCR workstation to avoid RNase contamination during the assay.
3.1 CircRNA PCR with Divergent Primers
3.1.1 Designing Divergent Primers
1. Get the mature spliced sequence for the circRNA of interest from RNA-seq data or from any circRNA database that provides the circRNA sequence. The mature sequence of a circRNA can be retrieved from the UCSC genome browser (https://genome.ucsc.edu/) by joining all the exon sequences present between the backsplice site genomic coordinates (see Note 1).
Fig. 1 Schematic representation of backsplicing generating circRNA from the pre-mRNA (top). Schematic showing the design of forward (F) and reverse (R) divergent primers used for validation and quantification of circRNAs (bottom). Divergent primer pair targeting one backsplice site may amplify multiple circRNAs coming from the same gene. Hashtag (#) represents unintended amplification of circRNA_2 with divergent primers targeting circRNA_1
2. Make a circRNA-PCR amplicon template of 200 nt that includes the backsplice junction sequence by joining the last 100 nt of the 3′ end of the sequence to the 5′ end of the first 100 nt.
3. A circRNA with a length of less than 200 nt can be divided into two halves. Add the 5′ half to the end of the 3′ half to prepare the backsplice junction PCR template for designing divergent primers.
4. Use the NCBI Primer-BLAST (https://www.ncbi.nlm.nih.gov/tools/primer-blast/) or Primer3 (http://bioinfo.ut.ee/primer3-0.4.0/) [18] web tool to design PCR primers for this 200-nucleotide-long circRNA junction sequence template, selecting a PCR product size ranging from 120 to 160 nucleotides (Fig. 1) (see Note 2).
5. Design primers with a length of ~20 nt and a Tm of 58–60 °C.
6. Divergent primers for human circRNAs with circBase IDs can be designed with the CircInteractome web tool (https://circinteractome.nia.nih.gov/) [19].
3.1.2 RNase R Treatment and cDNA Synthesis
1. For RNase R digestion, prepare a 20 μl reaction containing 2–5 μg of total RNA, 2 μl of 10× RNase R buffer, 1 μl of RNase R, and 0.5 μl of Ribolock, and adjust the volume with nuclease-free water. Prepare a control reaction without RNase R enzyme (see Note 3).
2. Mix the reaction by tapping and centrifuge for a few seconds to settle the reaction at the bottom of the tube.
3. Incubate the control and RNase R reaction tubes at 37 °C for 30 min.
4. Purify the control and RNase R treated RNAs using the PureLink RNA isolation kit and elute in 40 μl of nuclease-free water.
5. Take 1 μg of total RNA or an equal volume of control and RNase R treated RNA in a 1.5 ml microcentrifuge tube for cDNA synthesis (see Note 4).
6. Prepare a 20 μl cDNA synthesis reaction by adding 1 μl of random hexamer (50 μM), 0.5 μl of Ribolock (40 U/μl), 1 μl of dNTP mix (10 mM of dATP, dTTP, dGTP, and dCTP), 1 μl of Maxima reverse transcriptase enzyme (200 U/μl), and 4 μl of 5× Maxima RT buffer to the above RNA, and adjust the volume with nuclease-free water (see Note 5).
7. Mix the reaction by tapping a few times, followed by a short spin to bring the reaction to the bottom of the tube.
8. Incubate the cDNA synthesis reaction for 10 min at 25 °C followed by 60 min at 50 °C (see Notes 4 and 6).
9. Incubate the reaction at 85 °C for 5 min to inactivate the RT enzyme, followed by a quick chill on ice.
10. Add 500 μl of nuclease-free water to dilute the cDNA (see Note 7).
11. Use the prepared cDNA in the PCR or store at −20 °C until further use.
3.1.3 Identification of circRNA by PCR with Divergent Primer
1. Thaw the cDNA, 2× DreamTaq PCR master mix, and 100 μM primer stocks at room temperature.
2. Vortex all solutions for a few seconds and keep on ice.
3. Take 10 μl each of the forward and reverse primers from the 100 μM stock solutions and add 980 μl of nuclease-free water to prepare the 1 μM working solution of divergent primer mix. Vortex the primer mix thoroughly, followed by a short spin.
4. Prepare a 20 μl PCR reaction containing 10 μl of 2× DreamTaq PCR master mix, 5 μl of diluted cDNA, and 5 μl of 1 μM divergent primer mix. Vortex the PCR tube, followed by a short spin (see Note 8).
5. Set up the PCR on a thermal cycler with an initial step of 95 °C for 2 min followed by 40 cycles of 95 °C for 2 s and 60 °C for 5 s (see Note 9).
6. Store the PCR product at −20 °C or use it immediately for agarose gel analysis.
7. Prepare a 2% agarose gel in 1× TBE supplemented with 1× SYBR Gold.
Fig. 2 Example circRNA backsplice junction sequence PCR amplified with divergent primers and visualized on 2% agarose gel stained with 1X SYBR Gold (left). Representative Sanger sequencing data of the circRNA PCR product showing the backsplice junction sequence (right). The red arrowhead represents the circRNA backsplice junction
8. Resolve the PCR product on the above agarose gel, followed by an analysis of the length of the PCR product on a transilluminator. The PCR reaction should show a single product of the expected size (Fig. 2) (see Notes 9 and 10).
9. Purify the PCR product using the PCR purification kit or gel extraction kit following the manufacturer’s instructions (Fig. 2).
10. Confirm the amplification of the circRNA backsplice junction sequence by Sanger sequencing using one of the divergent primers (Fig. 2).
3.1.4 Validation of Circularity by Quantitative (Q)PCR
1. Thaw the cDNA prepared from control and RNase R treated RNA, PowerUp SYBR Green master mix (2×), and the working solution of 1 μM divergent primer mix.
2. Set up the quantitative real-time PCR of 20 μl in a 96-well plate using 5 μl of cDNA, 5 μl of 1 μM divergent primer mix, and 10 μl of 2× PowerUp SYBR Green master mix (see Note 11).
3. Seal the plate and vortex for a few seconds to mix the reaction evenly, followed by a short spin to settle the reaction at the bottom of the well.
4. Set up the RT-qPCR on the QuantStudio 6 real-time PCR machine with an initial step of 95 °C for 2 min followed by 40 cycles of 95 °C for 2 s and 60 °C for 5 s (see Notes 9 and 12).
5. Check the relative expression levels of linear RNAs (GAPDH or ACTB mRNA) and circRNAs of interest in the RNase R treated sample compared with the control treatment sample using the comparative delta-CT method [20] (see Note 13).
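The comparison in step 5 can be sketched numerically. The snippet below assumes equal input volumes of control and RNase R treated samples and roughly one doubling per cycle, so that the level remaining after RNase R is 2 raised to the negative Ct difference. The function name and all Ct values are hypothetical and are not taken from the chapter.

```python
def fraction_remaining(ct_control: float, ct_rnase_r: float) -> float:
    """Relative level after RNase R versus control: 2 ** -(Ct_treated - Ct_control).

    Assumes about 100% amplification efficiency (one doubling per cycle).
    """
    delta_ct = ct_rnase_r - ct_control
    return 2.0 ** -delta_ct


# Hypothetical Ct values for illustration only
linear = fraction_remaining(ct_control=18.0, ct_rnase_r=25.0)    # e.g. GAPDH mRNA
circular = fraction_remaining(ct_control=26.0, ct_rnase_r=26.5)  # e.g. a circRNA

print(f"linear RNA remaining:   {linear:.4f}")
print(f"circular RNA remaining: {circular:.4f}")
```

A linear transcript is expected to drop sharply after RNase R treatment, whereas a genuine circRNA should remain close to its control level (see Note 13).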
3.2 CircRNA PCR with Full-Length Primers
The PCR with divergent primers can only validate the backsplice junction sequence, while the mature circRNA sequence is derived computationally from the transcriptomic data. However, multiexonic circRNAs can have multiple circRNA splice variants containing different combinations of exons/introns with the same backsplice junction. Here, we provide a detailed protocol to design full-length PCR primers and perform circRNA-RCA, which identifies the actual mature sequence of a circRNA and its alternatively spliced isoforms.
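For reference, the junction template used for divergent primer design (Subheading 3.1.1, steps 2 and 3) can be sketched in Python. This is a minimal illustration under stated assumptions: the input is the mature spliced circRNA sequence written 5′ to 3′, and the function name `junction_template` and the toy sequences are hypothetical.

```python
def junction_template(circ_seq: str, flank: int = 100) -> tuple[str, int]:
    """Build the backsplice junction template for divergent primer design.

    Returns the template and the index at which the backsplice junction
    falls (number of bases contributed by the 3' end of the circRNA).
    """
    seq = circ_seq.upper()
    if len(seq) < 2 * flank:
        # Short circRNA: split into halves and place the 5' half
        # after the 3' half.
        half = len(seq) // 2
        return seq[half:] + seq[:half], len(seq) - half
    # Otherwise join the last `flank` nt (3' end) to the first `flank` nt.
    return seq[-flank:] + seq[:flank], flank


# Toy sequences; real inputs would be mature circRNA sequences.
print(junction_template("AAAAACCCCC"))                 # short circRNA, halves swapped
print(junction_template("AAAAACCCCCGGGGG", flank=3))   # flanks of 3 nt for illustration
```

The template can then be pasted into a primer design tool, with the junction index used to confirm that the amplicon spans the backsplice site.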
3.2.1 Full-Length circRNA Primer Design
1. Get the mature sequence for the circRNA of interest from the UCSC genome browser, RNA-seq data, or any of the circRNA databases, including CircNet, CircInteractome, etc. (see Note 1).
2. Take the 10 nt from the 3′ end of the mature spliced circRNA sequence and add it to the 5′ end.
3. The first 20 nucleotides of the 5′ end of the above sequence, containing the circRNA junction, will be the full-length forward (fl-F) primer. Make the reverse complementary sequence of the last 20 nucleotides from the 3′ end of the mature sequence and consider that as the full-length reverse (fl-R) primer (Fig. 3).
4. Although the optimal length of the primers is around 20 nt, the length can vary so that the primer Tm is between 58 and 60 °C.
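Steps 2 and 3 above can be sketched as a short Python function. This is an illustration only: it assumes the mature circRNA sequence is given 5′ to 3′, uses the 10 nt overlap and 20 nt primer length from the steps as defaults, and does not adjust primer length for Tm (step 4). The function names and the toy sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def full_length_primers(circ_seq: str, overlap: int = 10,
                        primer_len: int = 20) -> tuple[str, str]:
    """Derive fl-F and fl-R primers from a mature circRNA sequence."""
    seq = circ_seq.upper()
    if len(seq) < primer_len:
        raise ValueError("sequence is shorter than the primer length")
    # Step 2: move a copy of the last `overlap` nt to the 5' end.
    extended = seq[-overlap:] + seq
    # Step 3: fl-F spans the junction; fl-R is the reverse complement
    # of the last `primer_len` nt of the mature sequence.
    fl_f = extended[:primer_len]
    fl_r = reverse_complement(seq[-primer_len:])
    return fl_f, fl_r


# Toy 40-nt sequence for illustration
toy = "A" * 10 + "C" * 10 + "G" * 10 + "T" * 10
print(full_length_primers(toy))
```

The Tm of each candidate would still need to be checked in a primer analysis tool and the primer lengthened or shortened as described in step 4.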
Fig. 3 Schematic representation of the biogenesis of two circRNA splice variants with same backsplice site and different exon combinations (top). The full-length forward primer is placed on the backsplice site while the full-length reverse primer is placed upstream of the forward primer. PCR with full-length PCR primers can amplify circRNA splice variants with different internal sequences (bottom)
3.2.2 Rolling Circle Reverse Transcription to Generate Full-Length cDNA
1. Take an equal volume of RNA from the control and RNase R treated samples in a 1.5 ml microcentrifuge tube for full-length cDNA synthesis.
2. Prepare a 20 μl cDNA synthesis reaction by adding 1 μl of random hexamer (50 μM), 0.5 μl of Ribolock, 1 μl of dNTP mix (10 mM), 4 μl of 5× Maxima reverse transcriptase buffer, and 1 μl of Maxima H minus reverse transcriptase enzyme to the above RNA (see Note 14).
3. Mix the reaction thoroughly by tapping the tubes a few times, followed by a short spin.
4. Incubate the reaction for 10 min at 25 °C followed by 60 min at 50 °C for cDNA synthesis (see Note 6).
5. Add 1 μl of RNase H and incubate the cDNA reaction for 15 min at 37 °C.
6. Incubate the reaction at 85 °C for 5 min to inactivate the reverse transcriptase, followed by a quick chill on ice.
7. Add 500 μl of nuclease-free water to dilute the full-length cDNA stock for immediate use or store at −20 °C (see Note 7).
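The volume budget of the 20 μl cDNA synthesis reaction in step 2 can be checked with a short calculation. In the sketch below, the reagent volumes are those listed in step 2, while the RNA volume (10 μl) is a hypothetical value chosen for illustration, and the function name is an assumption.

```python
def water_to_add(total_ul: float, components_ul: dict[str, float]) -> float:
    """Volume of nuclease-free water needed to reach the total volume."""
    used = sum(components_ul.values())
    remaining = total_ul - used
    if remaining < 0:
        raise ValueError("components exceed the total reaction volume")
    return remaining


reaction = {
    "random hexamer (50 uM)": 1.0,
    "Ribolock": 0.5,
    "dNTP mix (10 mM)": 1.0,
    "5x RT buffer": 4.0,
    "reverse transcriptase": 1.0,
    "RNA (hypothetical volume)": 10.0,
}

print(water_to_add(20.0, reaction))  # water needed for a 20 ul reaction
```

With these volumes the reagents take up 7.5 μl, leaving 12.5 μl to be shared between the RNA and nuclease-free water.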
3.2.3 CircRNA PCR Using Full-Length Primers
1. Thaw the full-length cDNA, 2× DreamTaq PCR master mix, and full-length forward and reverse primer stock (100 μM) solutions at room temperature.
2. Prepare the 1 μM full-length primer mix by adding 10 μl each of the fl-F and fl-R primers from the 100 μM stock solutions to 980 μl of nuclease-free water. Vortex the primer mix thoroughly, followed by a short spin.
3. Prepare a 20 μl PCR reaction in a PCR tube using 5 μl of full-length cDNA prepared with RNase H-minus RT, 5 μl of 1 μM full-length primer mix, and 10 μl of 2× DreamTaq PCR master mix.
4. Seal the PCR tube and vortex it for a few seconds to thoroughly mix the reaction, followed by a short spin.
5. Set up the full-length circRNA PCR in a thermal cycler with the following cycling conditions: 95 °C for 2 min, then 40 cycles of 95 °C for 5 s, 58 °C for 20 s, and 72 °C for 60 s (see Note 15).
6. Store the PCR products at −20 °C or use them immediately for agarose gel analysis.
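The primer dilutions in steps 2 and 3 follow the usual C1·V1 = C2·V2 relationship. The sketch below assumes that 10 μl of each 100 μM primer stock goes into the mix (10 + 10 + 980 = 1000 μl total), which is what yields 1 μM of each primer; the function name is hypothetical.

```python
def final_concentration(stock_conc: float, stock_volume: float,
                        total_volume: float) -> float:
    """Concentration after dilution, from C1 * V1 = C2 * V2."""
    return stock_conc * stock_volume / total_volume


# Working mix: 10 ul of each 100 uM primer in a total of 1000 ul
working_um = final_concentration(100.0, 10.0, 10.0 + 10.0 + 980.0)

# PCR: 5 ul of the working mix in a 20 ul reaction
in_reaction_um = final_concentration(working_um, 5.0, 20.0)

print(working_um)      # 1.0 (uM of each primer in the working mix)
print(in_reaction_um)  # 0.25 (uM of each primer in the PCR)
```

The same function applies to the master mix: 10 μl of a 2× mix in 20 μl gives a 1× final concentration.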
3.2.4 Identification of Full-Length Spliced Sequence of circRNA
1. Prepare a 2% agarose gel containing 1× SYBR Gold and 1× TBE buffer (see Notes 16 and 17).
2. Mix the full-length PCR product with DNA loading dye and resolve on an agarose gel along with a DNA ladder.
Fig. 4 circRNA PCR using full-length primers. (a) Schematic representation of rolling circle cDNA synthesis of circRNA with H minus reverse transcriptase. PCR amplification of specific circRNA using the above cDNA and full-length primers results in amplification of full-length circRNA (X 1) or doublet (X 2) of the target circRNA. Hashtag represents the amplification of circRNA splice variant. (b) Example full-length PCR products amplified using fl-F and fl-R primers were resolved on a 2% agarose gels stained with SYBR Gold. Two PCR products represent circRNA splice variants with same backsplice junction sequence
3. Visualize the gel on a transilluminator to analyze the size of the amplified PCR products for the specific circRNA and its splice variants.
4. Amplification of multiple PCR products other than the expected circRNA size warrants further sequencing of each amplified product (see Note 18).
5. As shown in Fig. 4, purify both gel bands using the PureLink Quick gel extraction kit following the manufacturer’s instructions (see Note 19).
6. Quantify the PCR products isolated from the gel (see Note 20).
7. Sequence each of the PCR products with the full-length forward or reverse primer using the Sanger sequencing protocol.
8. Sanger sequencing will identify the altered use of exons/introns in the circRNA splice variants during backsplicing.
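When sizing the bands in step 3, it can help to flag which bands are close to one or two copies of the expected circRNA length (the X 1 and X 2 products in Fig. 4) and which are not. The sketch below is a rough aid only: the 10% tolerance, the maximum of three copies, the function name, and the example sizes are all assumptions, and every band still needs to be sequenced as described in Note 18.

```python
def classify_band(observed_bp: int, circ_length_bp: int,
                  tolerance: float = 0.10, max_copies: int = 3) -> str:
    """Label a gel band relative to multiples of the expected circRNA length."""
    for copies in range(1, max_copies + 1):
        expected = copies * circ_length_bp
        if abs(observed_bp - expected) <= tolerance * expected:
            return f"~{copies}x full length"
    return "other size: candidate splice variant, sequence to confirm"


# Hypothetical band sizes for a circRNA with an expected length of 400 nt
for band in (410, 820, 300):
    print(band, classify_band(band, 400))
```

Band sizes estimated from an agarose gel are approximate, so the labels should be read as a guide for choosing bands to purify, not as an identification.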
4
Notes
1. The circRNA spliced sequence is predicted with the assumption that all the exons between the backsplice sites are included in the circRNA. Please consider the strand information when the sequences are obtained from the genome. Please
remember that multiexonic circRNAs can have different isoforms depending on the exons and/or introns included during circRNA biogenesis.
2. Divergent primers should not overlap the backsplice junction site.
3. RNase R treatment is not recommended for quantitation of circRNA expression by RT-qPCR. However, to examine the circularity of the target backsplice junction, RNase R treatment followed by qPCR must be performed. Additionally, preheating the RNA at 70 °C for 5 min before treating it with RNase R may improve the degradation of linear RNAs with secondary structures.
4. The reverse transcription reaction can also be prepared in a PCR tube or PCR plate. For PCR tubes/plates, perform the reverse transcription reaction on a thermal cycler with a heated lid at 105 °C.
5. Any reverse transcriptase or cDNA synthesis kit can be used to prepare cDNA for circRNA analysis. Since circRNAs lack a poly(A) tail, the cDNA must be prepared with random hexamers only.
6. Incubation at 50 °C may cause evaporation of the reaction in the tube. The reaction tube may be centrifuged for a few seconds to bring the reaction to the bottom of the tube after 60 min of incubation at 50 °C.
7. The volume of water used to dilute the prepared cDNA depends on the initial amount of RNA taken for cDNA synthesis and the abundance of the target circRNA.
8. In this protocol, we used DreamTaq DNA polymerase as an example. Any Taq polymerase or PCR master mix can be used for PCR.
9. The time given for each PCR step is for reference only. These times can be modified depending on the PCR amplicon size and the Taq polymerase used in the PCR. However, the duration of the PCR extension step is crucial for specific amplification of the target circRNA. It is highly recommended to use two-step PCR or qPCR with a combined annealing and extension time of 5 s, which allows specific amplification of the target circRNA backsplice junction without the amplification of other circRNAs. As shown in Fig. 1, a longer annealing and extension time may allow nonspecific amplification of other circRNAs generated from the same gene.
10. Since most circRNAs are less abundant than the linear mRNAs, it is highly recommended to include a No-RT and a water control in the PCR.
11. Any SYBR Green master mix can be used for qPCR.
12. Since the same gene can generate multiple circRNAs, the amplification of a specific circRNA for the first time in PCR and RT-qPCR must be validated by gel electrophoresis and melt curve analysis. Any primer set amplifying more than one product should not be used for RT-qPCR. RT-qPCR reactions should be performed in three biological replicates, with two technical replicates for each biological replicate.
13. The linear RNAs are expected to be degraded by RNase R treatment without affecting the circRNAs, which lack a free 3′ end.
14. Other RNase H-minus reverse transcriptases can be used in place of the RNase H-minus Maxima RT enzyme for full-length cDNA synthesis.
15. The extension time for the full-length PCR can be changed depending on the length of the circRNA spliced sequence and the Taq polymerase used for PCR. We recommend an extension time of 1 min per kb of PCR amplicon.
16. Agarose gels can be stained with ethidium bromide, SYBR Safe, or any other DNA staining dye.
17. The percentage of the gel can be changed depending on the length of the circRNA.
18. All the PCR products amplified using full-length primers must be sequence-verified to find the actual sequence of the circRNAs. Multiple bands in circRNA-RCA do not always mean splice variants. The synthesis of long tandem repeats of full-length cDNA with RNase H-minus RT may result in multiple rounds of circRNA amplification.
19. Preheating the elution buffer or nuclease-free water at 65 °C may increase the yield of purified PCR product.
20. If the PCR product concentration is too low for Sanger sequencing, the specific band can be PCR amplified again with the full-length primers, followed by sequencing of the purified PCR product.
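Two of the numerical recommendations in these notes, the extension time of 1 min per kb (Note 15) and the replicate scheme for RT-qPCR (Note 12), can be written as small helper functions. The function names, the example amplicon lengths, and the example plate layout are hypothetical.

```python
def extension_time_s(amplicon_bp: int, seconds_per_kb: float = 60.0) -> float:
    """Extension time in seconds at 1 min per kb of amplicon (Note 15)."""
    return amplicon_bp / 1000 * seconds_per_kb


def qpcr_wells(n_targets: int, n_conditions: int,
               biological_reps: int = 3, technical_reps: int = 2) -> int:
    """Number of wells for the replicate scheme in Note 12."""
    return n_targets * n_conditions * biological_reps * technical_reps


print(extension_time_s(1000))  # 60.0
print(extension_time_s(500))   # 30.0
# e.g. 2 targets, control and RNase R treated samples
print(qpcr_wells(n_targets=2, n_conditions=2))  # 24
```

No-RT and water controls (Note 10) are not counted here and would need additional wells.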
Acknowledgments
This work was supported by the DBT/Wellcome Trust India Alliance Fellowship [grant number IA/I/18/2/504017] awarded to Amaresh Panda and intramural support from the Institute of Life Sciences, DBT, India. Aniruddha Das and Debojyoti Das are supported by the University Grant Commission of India.
Conflicts of interest: The authors declare no conflict of interest.
References
1. Salzman J, Gawad C, Wang PL, Lacayo N, Brown PO (2012) Circular RNAs are the predominant transcript isoform from hundreds of human genes in diverse cell types. PLoS One 7(2):e30733. https://doi.org/10.1371/journal.pone.0030733
2. Ji P, Wu W, Chen S, Zheng Y, Zhou L, Zhang J, Cheng H, Yan J, Zhang S, Yang P, Zhao F (2019) Expanded expression landscape and prioritization of circular RNAs in mammals. Cell Rep 26(12):3444–3460.e3445. https://doi.org/10.1016/j.celrep.2019.02.078
3. Glazar P, Papavasileiou P, Rajewsky N (2014) circBase: a database for circular RNAs. RNA 20(11):1666–1670. https://doi.org/10.1261/rna.043687.113
4. Vromman M, Vandesompele J, Volders PJ (2020) Closing the circle: current state and perspectives of circular RNA databases. Brief Bioinform. https://doi.org/10.1093/bib/bbz175
5. Zhang Y, Xue W, Li X, Zhang J, Chen S, Zhang JL, Yang L, Chen LL (2016) The biogenesis of nascent circular RNAs. Cell Rep 15(3):611–624. https://doi.org/10.1016/j.celrep.2016.03.058
6. Chen LL, Yang L (2015) Regulation of circRNA biogenesis. RNA Biol 12(4):381–388. https://doi.org/10.1080/15476286.2015.1020271
7. Hansen TB, Jensen TI, Clausen BH, Bramsen JB, Finsen B, Damgaard CK, Kjems J (2013) Natural RNA circles function as efficient microRNA sponges. Nature 495(7441):384–388. https://doi.org/10.1038/nature11993
8. Jeck WR, Sorrentino JA, Wang K, Slevin MK, Burd CE, Liu J, Marzluff WF, Sharpless NE (2013) Circular RNAs are abundant, conserved, and associated with ALU repeats. RNA 19(2):141–157. https://doi.org/10.1261/rna.035667.112
9. Memczak S, Jens M, Elefsinioti A, Torti F, Krueger J, Rybak A, Maier L, Mackowiak SD, Gregersen LH, Munschauer M, Loewer A, Ziebold U, Landthaler M, Kocks C, le Noble F, Rajewsky N (2013) Circular RNAs are a large class of animal RNAs with regulatory potency. Nature 495(7441):333–338. https://doi.org/10.1038/nature11928
10. Panda AC, Grammatikakis I, Munk R, Gorospe M, Abdelmohsen K (2017) Emerging roles and context of circular RNAs. Wiley Interdiscip Rev RNA 8(2). https://doi.org/10.1002/wrna.1386
11. Jeck WR, Sharpless NE (2014) Detecting and characterizing circular RNAs. Nat Biotechnol 32(5):453–461. https://doi.org/10.1038/nbt.2890
12. Szabo L, Salzman J (2016) Detecting circular RNAs: bioinformatic and experimental challenges. Nat Rev Genet 17(11):679–692. https://doi.org/10.1038/nrg.2016.114
13. Suzuki H, Zuo Y, Wang J, Zhang MQ, Malhotra A, Mayeda A (2006) Characterization of RNase R-digested cellular RNA source that consists of lariat and circular RNAs from pre-mRNA splicing. Nucleic Acids Res 34(8):e63. https://doi.org/10.1093/nar/gkl151
14. Zirkel A, Papantonis A (2018) Detecting circular RNAs by RNA fluorescence in situ hybridization. Methods Mol Biol 1724:69–75. https://doi.org/10.1007/978-1-4939-7562-4_6
15. Schneider T, Schreiner S, Preusser C, Bindereif A, Rossbach O (2018) Northern blot analysis of circular RNAs. Methods Mol Biol 1724:119–133. https://doi.org/10.1007/978-1-4939-7562-4_10
16. Panda AC, Gorospe M (2018) Detection and analysis of circular RNAs by RT-PCR. Bio Protoc 8(6). https://doi.org/10.21769/BioProtoc.2775
17. Das A, Rout PK, Gorospe M, Panda AC (2019) Rolling circle cDNA synthesis uncovers circular RNA splice variants. Int J Mol Sci 20(16). https://doi.org/10.3390/ijms20163988
18. Untergasser A, Cutcutache I, Koressaar T, Ye J, Faircloth BC, Remm M, Rozen SG (2012) Primer3--new capabilities and interfaces. Nucleic Acids Res 40(15):e115. https://doi.org/10.1093/nar/gks596
19. Panda AC, Dudekula DB, Abdelmohsen K, Gorospe M (2018) Analysis of circular RNAs using the web tool CircInteractome. Methods Mol Biol 1724:43–56. https://doi.org/10.1007/978-1-4939-7562-4_4
20. Schmittgen TD, Livak KJ (2008) Analyzing real-time PCR data by the comparative C(T) method. Nat Protoc 3(6):1101–1108. https://doi.org/10.1038/nprot.2008.73
Chapter 9

Primer Designing for Amplifying an AT-Rich Promoter from Arabidopsis thaliana

Pinky Dhatterwal, Sandhya Mehrotra, and Rajesh Mehrotra

Abstract

The aim of the present study is to optimize the PCR conditions required to amplify the promoter sequence of an amino acid transporter that has an AT-rich base composition and a high number of tandem repeats. The study also covers the key parameters to keep in mind while designing primers. Results show that successful amplification can be achieved by performing a two-step PCR at a lower extension temperature of 65 °C for an increased extension period of 1.5 min/kb, with a MgCl2 concentration ranging from 2.5 to 3.0 mM. The results also suggest that a DNA concentration of around 25–30 ng/μl was essential to achieve this amplification.

Key words Primer3, Promoter, AT-rich, Tandem repeats, Arabidopsis thaliana, PCR
1 Introduction

PCR has been one of the indispensable techniques in molecular biology since its discovery by Kary Mullis in 1985 [1]. PCR is a method for the exponential in vitro amplification of a specific DNA segment by a DNA polymerase and is highly reliable because of its sensitivity, accuracy, and speed [2, 3]. It is widely used to study gene expression and to detect genetic variations in medical diagnostics, forensic investigations, and agricultural biotechnology. Good primer design is a key step for a successful PCR. In this chapter, we focus on key points for primer designing and standardization of PCR conditions to amplify an AT-rich promoter region (1781 bp) of an amino acid transporter (AT2G40420) from Arabidopsis thaliana with a high number of tandem repeats. Plant promoter regions are generally difficult to amplify by PCR as they are highly AT-rich and sometimes contain tandem repetitive DNA sequences [4, 5]. Tandem repeats represent
Chhandak Basu (ed.), PCR Primer Design, Methods in Molecular Biology, vol. 2392, https://doi.org/10.1007/978-1-0716-1799-1_9, © Springer Science+Business Media, LLC, part of Springer Nature 2022
two or more copies of short segments of DNA repeatedly occurring head-to-tail within the coding and regulatory regions [6]. The problem with these templates is that they need lower annealing and extension temperatures, which results in the amplification of undesired products [7, 8].

In silico analysis of the promoter sequence (AT2G40420) revealed that it possesses many important cis-acting regulatory elements, such as light-responsive, auxin-responsive, salicylic acid-responsive, and abscisic acid-responsive elements, along with 14 copies of an ACGT motif [9]. Studies suggest that cis-elements with an ACGT core sequence respond to light, anaerobiosis, and phytohormones such as abscisic acid, jasmonic acid, salicylic acid, and auxin [10, 11]. Furthermore, Zou et al. (2011) concluded that around 19.6% of the total pCREs (putative cis-regulatory elements) identified in the promoter regions of abiotic stress-responsive genes have ACGT as a core sequence [12]. Therefore, characterizing the response of this promoter sequence to abiotic stress conditions could reveal features that are useful for generating transgenic plants with high stress tolerance. A suitable promoter is needed to achieve the desired expression levels of a transgene [13].

In the study, the promoter sequence (AT2G40420, 1781 bp) was amplified from the Arabidopsis thaliana genome. However, the sequence is 65.2% AT-rich and has 15 copies of a 28-base-long tandem repeat [14], which makes it difficult to amplify by PCR (Fig. 1). These tandem repeat sequences have a binding site for bZIP (basic leucine zipper) transcription factors (TFs). Reports suggest that tandem repeats possessing binding sites for transcription factors in the promoter regions can affect the transcriptional
Fig. 1 Effects of MgCl2 concentration on PCR amplification at an extension temperature of 65 °C. Lane M: 10 kb DNA ladder; lane 1: 1.5 mM MgCl2; lane 2: 2 mM MgCl2; lane 3: 2.5 mM MgCl2; lane 5: 3 mM MgCl2; lane 6: 3.5 mM MgCl2; lane 7: no-template negative control
rate of a gene [15]. To check the effect of all these TF binding sites localized in tandem repeats on the downstream gene expression, isolation of the promoter sequence with all the copies of tandem repeats was highly desirable.
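The two template properties highlighted above, AT content and the number of head-to-tail repeat copies, can be estimated from any sequence string. The following is a minimal sketch; the function names and the toy sequence are illustrative assumptions, and the actual AT2G40420 promoter sequence is not reproduced here.

```python
import re


def at_percent(seq: str) -> float:
    """Percentage of A and T bases in a DNA sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)


def max_tandem_copies(seq: str, unit: str) -> int:
    """Largest number of consecutive head-to-tail copies of `unit` in `seq`."""
    seq, unit = seq.upper(), unit.upper()
    # Each match is a maximal run of back-to-back copies of the repeat unit.
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)


# Toy sequence (hypothetical): three tandem copies of an 8-base unit with GC flanks.
toy = "GGC" + "ATTACGTA" * 3 + "CCG"
print(at_percent(toy), max_tandem_copies(toy, "ATTACGTA"))
```

For a real promoter, the repeat unit would be supplied from a repeat-finding tool or database annotation rather than chosen by hand.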
2 Materials

2.1 Plant Material and Growth Conditions

Arabidopsis thaliana ecotype Col-0 was used for the experiment. The seeds were surface sterilized using 70% ethanol and 0.4% sodium hypochlorite solution. Sterile seeds were sown in pots filled with perlite and kept at 4 °C for 3 days for stratification. After 3 days, the pots were shifted to a growth chamber set at 22 °C and 70% humidity with a 16 h light/8 h dark photoperiod.

3 Methods

3.1 Guidelines for Primer Designing

3.1.1 Primer Length

The specificity of a PCR depends mainly on the primer length and annealing temperature [16]. Primers with a length of 18–24 nucleotides are considered optimal. Primers shorter than 18 nucleotides are not recommended, especially when working with complex templates such as genomic DNA, because shorter primers anneal faster and therefore less specifically. However, when working with cDNA, the primer length can be reduced (to less than 18 nucleotides), as the chances of nonspecific interactions between the primer and template are relatively low [17]. Regarding the upper limit, primers longer than 30 bases are rarely used. The longer the primer, the slower the rate at which it hybridizes to the template DNA, which results in a significant decrease in the amplified product. The efficiency of the PCR is calculated from the amount of amplified product and is reduced if the primers used in the reaction are too long [18].

3.1.2 GC Content and GC Clamp

To ensure stable binding of the primers to the target sequence, a GC content of 40–60% is recommended. G or C bases should be present within the last five bases at the 3′ end of the primers to promote specific primer binding, as these bases form stronger hydrogen bonds [19]. This is known as a GC clamp. However, runs of more than three G or C bases should be avoided, as they may lead to primer-dimer formation. Also, repetition of a single base (e.g., AAAAA or CCCCC) or dinucleotide (e.g., GCGCGCGCGC or ATATATATAT) more than four times should be avoided [20].
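The length and composition guidelines above can be turned into a quick screening routine. The following is a minimal sketch under the thresholds stated in the text (18–24 nt, 40–60% GC, a G or C in the last five 3′ bases, no G/C run longer than three, no base or dinucleotide repeated more than four times); the function names are illustrative assumptions and do not correspond to any primer-design package.

```python
import re


def gc_percent(primer: str) -> float:
    """GC content of a primer as a percentage."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def has_gc_clamp(primer: str) -> bool:
    """True if at least one G or C lies within the last five bases (3' end)."""
    return any(base in "GC" for base in primer.upper()[-5:])


def longest_gc_run(primer: str) -> int:
    """Length of the longest uninterrupted stretch of G/C bases."""
    runs = re.findall(r"[GC]+", primer.upper())
    return max((len(run) for run in runs), default=0)


def has_long_repeat(primer: str) -> bool:
    """True if a single base or a dinucleotide is repeated more than four times."""
    primer = primer.upper()
    single = re.search(r"([ACGT])\1{4,}", primer)       # e.g. AAAAA
    dinuc = re.search(r"([ACGT]{2})\1{4,}", primer)     # e.g. GCGCGCGCGC
    return bool(single or dinuc)


def check_primer(primer: str) -> dict:
    """Apply the guideline checks and report each one as True (pass) or False."""
    return {
        "length_18_to_24": 18 <= len(primer) <= 24,
        "gc_40_to_60": 40.0 <= gc_percent(primer) <= 60.0,
        "gc_clamp": has_gc_clamp(primer),
        "no_gc_run_over_3": longest_gc_run(primer) <= 3,
        "no_long_repeat": not has_long_repeat(primer),
    }


# The two primers listed in Table 1 of this chapter.
forward = "CCTACTAGTTCGTGATACTG"
reverse = "CGAACGATTCCTTCATCACG"
for name, seq in (("AT2G40420F", forward), ("AT2G40420R", reverse)):
    print(name, check_primer(seq))
```

Such a screen only flags obvious violations; it does not replace the secondary-structure and specificity checks described in the following subheadings.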
3.1.3 Melting Temperature (Tm)
Melting temperature (Tm) is the temperature at which half of the primers are bound to their target sequence. The primers should have a Tm between 55 °C and 65 °C, and the difference between their Tms should not be more than 5 °C (see Note 1). Tm depends on the base composition and can be roughly calculated using the formula [21]:

$$T_m = 4(G + C) + 2(A + T)$$

where G, C, A, and T are the numbers of the respective bases in the primer and Tm is given in °C. However, the Tm of primers is now usually calculated using the nearest-neighbor method [22], which is more accurate because it considers the primer sequence and the oligonucleotide and monovalent cation concentrations rather than just the base composition.
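As a worked illustration of the rough base-composition formula, the sketch below applies it to the two primers of Table 1. The function name is an assumption made for this example. The values in Table 1 (52.05 and 57.02 °C) come from the design software and therefore differ from this approximation.

```python
def wallace_tm(primer: str) -> int:
    """Rough Tm in degrees C from base counts: 4(G + C) + 2(A + T)."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    at = primer.count("A") + primer.count("T")
    return 4 * gc + 2 * at


forward = "CCTACTAGTTCGTGATACTG"   # AT2G40420F: 9 G/C and 11 A/T bases
reverse = "CGAACGATTCCTTCATCACG"   # AT2G40420R: 10 G/C and 10 A/T bases

tm_f = wallace_tm(forward)          # 4*9 + 2*11
tm_r = wallace_tm(reverse)          # 4*10 + 2*10
print(tm_f, tm_r, abs(tm_f - tm_r))
```

With this approximation the pair differs by 2 °C, within the 5 °C limit given above.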
3.1.4 Annealing Temperature
Specific annealing of the primer to the template DNA occurs only within a narrow range of annealing temperatures. If the temperature is too low, the chances of amplifying nonspecific products are high. On the other hand, if the temperature is too high, the yield of the desired product is reduced because of reduced primer-template hybridization [23]. Hence, the ideal annealing temperature should be in a range that enables primer-template hybridization and prevents the amplification of undesired products. The annealing temperature is determined from the melting temperatures; usually, a temperature 1–5 °C below the Tm is used (see Note 2). However, the optimal annealing temperature can be calculated more accurately using the formula [24]:

$$T_a = 0.3\,T_m(\mathrm{primer}) + 0.7\,T_m(\mathrm{product}) - 14.9$$

where Tm(primer) is the Tm of the less stable primer-template pair and Tm(product) is the Tm of the PCR product.

Moreover, with high-fidelity DNA polymerases such as Phusion, Platinum® Pfx, and Q5® High-Fidelity, the optimal annealing temperatures tend to be higher than with other PCR polymerases such as Taq-based polymerases. The annealing temperatures for high-fidelity polymerases can be determined using the Thermo Scientific Tm calculator. When primers with annealing temperatures ≥72 °C are used, a two-step PCR protocol (combining the annealing and extension steps) is recommended. A two-step PCR is also suggested if the Tm of the primers is higher than 65 °C [25].
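The two ways of choosing an annealing temperature described above can be compared numerically. The sketch below is illustrative only: the function names are assumptions, and the product Tm of 80 °C is a hypothetical placeholder, since the Tm of the 1781 bp amplicon is not reported in this chapter.

```python
def optimal_annealing_temp(tm_primer: float, tm_product: float) -> float:
    """Ta = 0.3 * Tm(primer) + 0.7 * Tm(product) - 14.9, all in degrees C."""
    return 0.3 * tm_primer + 0.7 * tm_product - 14.9


def rule_of_thumb_range(tm_primer: float) -> tuple:
    """Annealing window 1-5 degrees C below the primer Tm, as (low, high)."""
    return (tm_primer - 5.0, tm_primer - 1.0)


tm_primer = 52.05      # less stable primer of the pair (AT2G40420F, Table 1)
tm_product = 80.0      # hypothetical product Tm, for illustration only

print(round(optimal_annealing_temp(tm_primer, tm_product), 2))
print(rule_of_thumb_range(tm_primer))
```

Either estimate is only a starting point; Note 2 describes how the value is refined empirically with a gradient PCR.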
3.1.5 Secondary Structures
The presence of inter-primer homology (primer pairs possessing complementary sequences) and intra-primer homology (more than three bases that are complementary within a primer, leading to a hairpin structure) needs to be avoided while designing the primers [17], as this might lead to self-dimer and primer-dimer
Table 1 Forward and reverse primer specifications for AT2G40420 promoter sequence

| Primer | Primer sequence (5′ to 3′) | Tm (°C) | GC% | Product size |
|---|---|---|---|---|
| AT2G40420F | CCTACTAGTTCGTGATACTG | 52.05 | 45.00 | 1781 bp |
| AT2G40420R | CGAACGATTCCTTCATCACG | 57.02 | 50.00 | |
formation instead of annealing to the desired DNA template, resulting in low or no product yield. Freely available software such as Primer3 (https://bioinfo.ut.ee/primer3-0.4.0/) and the OligoAnalyzer™ Tool (https://www.idtdna.com/calc/analyzer) can be used to screen for potential primer-dimer and intramolecular hairpin formation.

3.1.6 Specificity Check
The last step of primer designing is to check primer specificity. This can be done with NCBI Primer-BLAST (https://www.ncbi.nlm.nih.gov/tools/primer-blast/) against the genome of interest to ensure that the primers bind to the desired genomic region.
3.2 Primer Designing for Promoter Region
Primers were designed to amplify a 1781 bp promoter region of an amino acid transporter (AT2G40420) (Table 1) using the Primer3 program [26], and their specificity was ensured by running Primer-BLAST (www.ncbi.nlm.nih.gov/tools/primer-blast/) against the Arabidopsis genome. Further, the Integrated DNA Technologies SciTools OligoAnalyzer tool (https://www.idtdna.com/calc/analyzer) was used to look for the presence of any secondary structures and primer dimers.
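Before a primer pair is submitted to a dedicated tool, complementarity within and between the primers can be screened crudely by looking for the longest stretch of one primer that could base-pair with the other. The sketch below is a simplified illustration and not the algorithm used by OligoAnalyzer or Primer3: it ignores thermodynamics, mismatches, and the position of the stretch relative to the 3′ end, and the function names are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def longest_complementary_stretch(a: str, b: str) -> int:
    """Length of the longest contiguous stretch of `a` that can pair with `b`.

    A stretch of `a` pairs antiparallel with `b` exactly when it also occurs
    in the reverse complement of `b`, so this is a longest-common-substring
    search between `a` and reverse_complement(b).
    """
    a, target = a.upper(), reverse_complement(b)
    best = 0
    prev = [0] * (len(target) + 1)
    for x in a:
        cur = [0] * (len(target) + 1)
        for j, y in enumerate(target, start=1):
            if x == y:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


forward = "CCTACTAGTTCGTGATACTG"   # AT2G40420F
reverse = "CGAACGATTCCTTCATCACG"   # AT2G40420R

print("forward/reverse:", longest_complementary_stretch(forward, reverse))
print("forward self:   ", longest_complementary_stretch(forward, forward))
print("reverse self:   ", longest_complementary_stretch(reverse, reverse))
```

Longer stretches, particularly ones involving the 3′ ends, are the ones worth examining more closely in a thermodynamics-aware tool.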
3.3 Genomic DNA Isolation
Genomic DNA was isolated from Arabidopsis thaliana leaves using the Qiagen DNeasy Plant Mini Kit (Murray and Thompson 1980) according to the manufacturer's recommendations, yielding 3–30 μg of high-quality DNA per sample (see Note 3). DNA integrity was confirmed by electrophoresis on a 0.8% agarose gel at 80 V for 30 min.
3.4 PCR Conditions

Each 20 μl PCR reaction contained 2 μl of genomic DNA (50 ng), 4 μl of 5× Phusion HF buffer, 0.4 μl of 10 mM dNTPs, 0.8 μl each of 10 μM forward and reverse primers, 0.2 μl of Phusion DNA polymerase (2 U/μl), and MgCl2 at concentrations ranging from 1.5 to 3.5 mM (see Note 4). All the reagents were procured from Thermo Fisher Scientific (Catalog number: F530S, Waltham, MA, USA), and MB-grade nuclease-free water was from Himedia (Catalog number: ML024). A two-step PCR was carried out using the Applied Biosystems® Veriti® 96-Well Thermal Cycler (Catalog number: 4375786, Foster
City, CA, USA) with the following conditions: initial denaturation at 98 °C for 1.5 min, followed by 35 cycles of denaturation at 98 °C for 30 s and extension at 60 °C/65 °C/68 °C/72 °C for 3 min, and a final extension at 60 °C/65 °C/68 °C/72 °C for 7 min (see Note 5). PCR reactions for each extension temperature with varying MgCl2 concentrations were performed separately and in triplicate. PCR products were checked by electrophoresis in a 1% (w/v) agarose gel at 80 V for 30 min. Results show that successful amplification can be achieved by performing a two-step PCR at a lower extension temperature of 65 °C for an increased extension period of 1.5 min/kb, with a MgCl2 concentration ranging from 2.5 to 3.0 mM. The results also suggest that a DNA concentration of about 25–30 ng/μl was essential to achieve this amplification.

3.5 Amplicon Sequence Analyses
The QIAquick Gel Extraction Kit (Qiagen, Catalog number: 28704) was used to purify the PCR products. The purified PCR product, along with the primers used for amplification, was then sent for sequencing to verify the specificity of the amplified product. Amplicon specificity was confirmed by comparing the sequencing results with the reference sequence of the amino acid transporter promoter region deposited in the TAIR database (https://www.arabidopsis.org) [27].

4 Notes

1. Primer melting temperature. If the Tm of a primer is very low, try to increase the primer length by a few bases or select a portion of the sequence with a higher GC content.

2. Gradient PCR to determine the optimum annealing temperature. Generally, an annealing temperature 5 °C below the primer melting temperature (Tm) is used for the PCR. However, in most cases it needs to be tested empirically, which can be done with a gradient PCR. The annealing temperature gradient should start 5–10 °C below the annealing temperature given by the Tm calculator and can be increased up to the extension temperature (two-step PCR). With the gradient PCR, not only the annealing temperature but also other factors such as the concentrations of MgCl2, buffer, and primers can be optimized.

3. Quality and concentration of template DNA. The DNA template should be pure and homogeneous, and its amount should be around 50–60 ng for genomic DNA
and 10–20 ng for cDNA or purified plasmid per 20 μl of reaction volume.

4. Concentration of magnesium ions. The magnesium ion concentration greatly influences the PCR, as DNA polymerase requires Mg2+ ions for its proper functioning [28, 29]. Therefore, to achieve maximal PCR yield, the MgCl2 concentration needs to be optimized. A high Mg2+ ion concentration can hinder the reaction by preventing proper melting of the template DNA and can also promote nonspecific binding of primers, whereas a low Mg2+ ion concentration can adversely affect the product yield. For this reason, MgCl2 concentrations of 1.5, 2.0, 2.5, 3.0, and 3.5 mM were tried. The desired amplicon yield was obtained at a 3.0 mM MgCl2 concentration (Fig. 1).

5. Extension temperature. For successful amplification, the extension time and temperature need to be carefully optimized. Xin-Zhuan Su et al. (1996) reported that reduced extension temperatures are needed to amplify AT-rich DNA [30]. In the present study, a two-step PCR (denaturation and amplification) was performed at four different extension temperatures (60, 65, 68, and 72 °C) with the extension time increased from the usual 1 to 1.5 min/kb. Successful amplification was achieved at an extension temperature of 65 °C, with 2.5 mM MgCl2 yielding a faint band and 3 mM MgCl2 yielding an intense band (Fig. 1). No results were obtained at the other extension temperatures (60, 68, and 72 °C) at any of the five MgCl2 concentrations tested (data not shown).
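The optimization described in Notes 3–5 amounts to a small grid of conditions plus two simple calculations. The sketch below lays them out using only figures stated in this chapter (1781 bp amplicon, 1.5 min/kb, five MgCl2 concentrations, four extension temperatures, triplicates, 2 μl of template); the variable and function names are assumptions made for illustration.

```python
import math
from itertools import product

AMPLICON_BP = 1781
EXTENSION_MIN_PER_KB = 1.5
MGCL2_MM = [1.5, 2.0, 2.5, 3.0, 3.5]
EXTENSION_TEMPS_C = [60, 65, 68, 72]
REPLICATES = 3


def extension_minutes(amplicon_bp: int, min_per_kb: float) -> float:
    """Extension time in minutes for an amplicon at a given rate per kb."""
    return amplicon_bp / 1000 * min_per_kb


def template_mass_ng(volume_ul: float, conc_ng_per_ul: float) -> float:
    """Mass of template DNA added to the reaction."""
    return volume_ul * conc_ng_per_ul


# 1.781 kb at 1.5 min/kb is about 2.67 min, rounded up to a whole minute.
print(math.ceil(extension_minutes(AMPLICON_BP, EXTENSION_MIN_PER_KB)))

# Every extension temperature is combined with every MgCl2 concentration.
conditions = list(product(EXTENSION_TEMPS_C, MGCL2_MM))
print(len(conditions), "conditions,", len(conditions) * REPLICATES, "reactions")

# 2 ul of a 25 ng/ul stock gives the 50 ng of template used per reaction.
print(template_mass_ng(2, 25))
```

Rounding the calculated extension time up to the next whole minute reproduces the 3 min extension step used in the cycling program.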
Acknowledgement

The authors are grateful to BITS Pilani, K. K. Birla Goa Campus, Goa, India, for providing infrastructural and logistic support. P.D. is thankful to BITS Pilani and CSIR for financial assistance. This study was supported by SERB project EMR/2016/002470 sanctioned by the Government of India to S.M.

References

1. Garibyan L, Avashia N (2013) Polymerase chain reaction. J Invest Dermatol 133:3. https://doi.org/10.1038/jid.2013.1
2. Coleman WB, Tsongalis GJ (2006) The polymerase chain reaction. In: Coleman WB, Tsongalis GJ (eds) Molecular diagnostics for the clinical laboratorian. Humana Press Inc., Totowa, NJ, pp 47–55
3. Obradovic J, Jurisic V, Tosic N, Mrdjanovic J, Perin B, Pavlovic S et al (2013) Optimization of PCR conditions for amplification of GC-rich EGFR promoter sequence. J Clin Lab Anal 27:487–493. https://doi.org/10.1002/jcla.21632
4. Sahdev S, Saini S, Tiwari P, Saxena S, Saini KS (2007) Amplification of GC-rich genes by following a combination strategy of primer design, enhancers and modified PCR cycle conditions. Mol Cell Probes 21:303–307. https://doi.org/10.1016/j.mcp.2007.03.004
5. Gemayel R, Cho J, Boeynaems S, Verstrepen KJ (2012) Beyond junk-variable tandem repeats as facilitators of rapid evolution of regulatory and coding sequences. Genes 3:461–480. https://doi.org/10.3390/genes3030461
6. Quilez J, Guilmatre A, Garg P, Highnam G, Gymrek M, Erlich Y et al (2016) Polymorphic tandem repeats within gene promoters act as modifiers of gene expression and DNA methylation in humans. Nucleic Acids Res 44:3750–3762. https://doi.org/10.1093/nar/gkw219
7. Hommelsheim CM, Frantzeskakis L, Huang M, Ülker B (2014) PCR amplification of repetitive DNA: a limitation to genome editing technologies and many other applications. Sci Rep 4:5052. https://doi.org/10.1038/srep05052
8. Kennedy S, Oswald N (2011) PCR troubleshooting and optimization: the essential guide. Caister Academic Press, Norfolk
9. Lescot M, Déhais P, Thijs G, Marchal K, Moreau Y, Van de Peer Y, Rouzé P, Rombauts S (2002) PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences. Nucleic Acids Res 30:325–327. https://doi.org/10.1093/nar/30.1.325
10. Mehrotra R, Yadav A, Bhalothia P, Karan R, Mehrotra S (2012) Evidence for directed evolution of larger size motif in Arabidopsis thaliana genome. Sci World J:1–5. https://doi.org/10.1100/2012/983528
11. Mehrotra R, Sethi S, Zutshi I, Bhalothia P, Mehrotra S (2013) Patterns and evolution of ACGT repeat cis-element landscape across four plant genomes. BMC Genomics 14:203. https://doi.org/10.1186/1471-2164-14-203
12. Zou C, Sun K, Mackaluso JD, Seddon AE, Jin R, Thomashow MF, Shiu SH (2011) Cis-regulatory code of stress-responsive transcription in Arabidopsis thaliana. Proc Natl Acad Sci U S A 108:14992–14997. https://doi.org/10.1073/pnas.1103202108
13. Potenza C, Aleman L, Sengupta-Gopalan C (2004) Invited review: targeting transgene expression in research, agricultural, and environmental applications: promoters used in plant transformation. In Vitro Cell Dev Biol Plant 40:1–22. https://doi.org/10.1079/IVP2003477
14. Chow CN, Zheng HQ, Wu NY, Chien CH, Huang HD, Lee TY et al (2016) PlantPAN 2.0: an update of plant promoter analysis navigator for reconstructing transcriptional regulatory networks in plants. Nucleic Acids Res 44:D1154–D1160. https://doi.org/10.1093/nar/gkv1035
15. Vinces MD, Legendre M, Caldara M, Hagihara M, Verstrepen KJ (2009) Unstable tandem repeats in promoters confer transcriptional evolvability. Science 324:1213–1216. https://doi.org/10.1126/science.1170097
16. Wu DY, Ugozzoli L, Pal BK, Qian J, Wallace RB (1991) The effect of temperature and oligonucleotide primer length on the specificity and efficiency of amplification by the polymerase chain reaction. DNA Cell Biol 10:233–238. https://doi.org/10.1089/dna.1991.10.233
17. Dieffenbach CW, Lowe TM, Dveksler GS (1993) General concepts for PCR primer design. PCR Methods Appl 3:S30–S37. https://doi.org/10.1101/gr.3.3.s30
18. Rybicki EP (2001) PCR primer design and reaction optimisation. In: Coyne VE, James MD, Reid SJ, Rybicki EP (eds) Molecular biology techniques manual. Innis, pp 39–45
19. Chuang LY, Cheng YH, Yang CH (2013) Specific primer design for the polymerase chain reaction. Biotechnol Lett 35:1541–1549. https://doi.org/10.1007/s10529-013-1249-8
20. Lorenz TC (2012) Polymerase chain reaction: basic protocol plus troubleshooting and optimization strategies. J Vis Exp 63:e3998. https://doi.org/10.3791/3998
21. Suggs SV, Hirose T, Myake EH, Kawashima MJ, Johnson KI, Wallace RB (1981) In: Brown DD (ed) ICN-UCLA symposium for developmental biology using purified gene, vol 23. Academic Press, New York, pp 683–693
22. SantaLucia J Jr (1998) A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc Natl Acad Sci U S A 95:1460–1465. https://doi.org/10.1073/pnas.95.4.1460
23. Brown TA (2010) Gene cloning & DNA analysis an introduction, 6th edn. Blackwell Publishing, Hoboken, NJ
24. Rychlik W, Spencer WJ, Rhoads RE (1990) Optimization of the annealing temperature for DNA amplification in vitro. Nucleic Acids Res 18:6409–6412. https://doi.org/10.1093/nar/18.21.6409
25. Jin HX, Seo SB, Lee HY, Cho S, Ge J, King J, Budowle B, Lee SD (2014) Differences of PCR efficiency between two-step PCR and standard three-step PCR protocols in short tandem repeat amplification. Aust J Forensic Sci 46:80–90. https://doi.org/10.1080/00450618.2013.788681
26. Untergasser A, Cutcutache I, Koressaar T, Ye J, Faircloth BC, Remm M et al (2012) Primer3--new capabilities and interfaces. Nucleic Acids Res 40:e115–e115. https://doi.org/10.1093/nar/gks596
27. Berardini TZ, Reiser L, Li D, Mezheritsky Y, Muller R, Strait E et al (2015) The Arabidopsis information resource: making and mining the "gold standard" annotated reference plant genome. Genesis 53:474–485. https://doi.org/10.1002/dvg.22877
28. Ekman S (1999) PCR optimization and troubleshooting, with special reference to the amplification of ribosomal DNA in lichenized fungi. Lichenologist 31:517–531. https://doi.org/10.1006/lich.1999.0226
29. Cao Y, Zheng Y, Fang B (2004) Optimization of polymerase chain reaction-amplified conditions using the uniform design method. J Chem Technol Biotechnol 79:910–913. https://doi.org/10.1002/jctb.1078
30. Su XZ, Wu Y, Sifri CD, Wellems TE (1996) Reduced extension temperatures required for PCR amplification of extremely A+T-rich DNA. Nucleic Acids Res 24:1574–1575. https://doi.org/10.1093/nar/24.8.1574
Part III Primer Design for Multiplex PCR. Multiplex
Chapter 10

PLASmid TAXonomic PCR (PlasTax-PCR), a Multiplex Relaxase MOB Typing to Assort Plasmids into Taxonomic Units

Raquel Cuartas, Teresa M. Coque, Fernando de la Cruz, and M. Pilar Garcillán-Barcia

Abstract

Plasmids transmissible by conjugation are responsible for disseminating antibiotic-resistance genes, making plasmid detection relevant for pathogen tracking. We describe the use of a multiplex PCR method for the experimental identification of specific plasmid taxonomic units (PTUs) of transmissible plasmids. The PCR primers were designed to target conserved segments of the relaxase MOB gene of PTUs encoding adaptive traits for enterobacteria (antimicrobial resistance, virulence, and metabolism). In this way, PlasTax-PCR detects the presence of these plasmids and allows their direct assignation to a PTU.

Key words Plasmid taxonomic units, Bacterial conjugation, Horizontal gene transfer, Relaxase, MOB family, Plasmid typing
1 Introduction

Plasmids are critical vehicles in disseminating antimicrobial resistance (AMR) [1]. Thus, their detection and classification are crucial for molecular epidemiology to track AMR beyond the boundaries of specific bacterial clones. Plasmids transmissible by conjugation have a distinctive characteristic: they encode a MOB relaxase that recognizes a cognate plasmid sequence, the nic site within the origin of transfer (oriT) [2]. Nine MOB relaxase classes are currently described [3], but just five of them comprise more than 95% of the conjugative relaxases present in plasmids hosted in the order Enterobacterales [4]. To detect transmissible plasmids, a PCR-based method, Degenerate-Primer MOB typing (DPMT), was developed [5, 6] (Fig. 1). Applied to collections of clinical and environmental isolates of enterobacteria [8–12], DPMT detected plasmids with relaxase sequences identical or nonidentical
Chhandak Basu (ed.), PCR Primer Design, Methods in Molecular Biology, vol. 2392, https://doi.org/10.1007/978-1-0716-1799-1_10, © Springer Science+Business Media, LLC, part of Springer Nature 2022
Fig. 1 DPMT and PlasTax-PCR scheme concepts. Phylogenetic tree representations of the five relaxase MOB classes prevalent in Enterobacterales are schematized. Families into each MOB class with members in Enterobacterales are represented by colored triangles. Those identified by the DPMT Scheme [6, 7] are indicated in the left panel. At the right panel, the same families are depicted, but only the tips contain colored triangles corresponding to the PTUs included in the PlasTax-PCR scheme
to those previously known and classified them into broad MOB families. Plasmids have been recently assorted into taxonomic units (PTUs), which gather members with a common genomic backbone
[4]. One thousand seven hundred and seventy of the 2535 plasmids hosted in the order Enterobacterales were included in 83 PTUs. Of these, 55 PTUs did not correspond to any known incompatibility (Inc) group. Nevertheless, 50 of these 83 PTUs were MOB+ according to MOBscan [3]. Moreover, each PTU was characterized by a single relaxase MOB type, while replication functions within a given PTU showed considerable variation [4]. These facts endorse the use of MOB relaxase sequences to classify transmissible plasmids into their corresponding PTUs.

Here, we describe a PCR-based method, Plasmid Taxonomic PCR (PlasTax-PCR), to detect plasmid relaxases from PTUs circulating in clinical enterobacteria. This method specifically targets the MOB relaxases corresponding to 19 PTUs and those of two additional groups without PTU assignment (IncT and ColA-like colicin plasmids) (Fig. 1). For each group, the coding sequences of the N-terminal relaxase domain (5′ 900 nucleotides) were aligned, and conserved, specific blocks were chosen to design the primers. As proof of principle, this method was applied to detect transmissible plasmids in a series of E. coli ST131 clinical isolates, a polyclonal cluster able to harbor a high variability of plasmids (Fig. 2).
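The idea of picking primer sites from conserved blocks of an alignment can be sketched as follows. This is an illustrative simplification and not the authors' procedure: it only finds gap-free columns that are identical in all aligned sequences of one group and does not test whether a block is specific to that group. The function name, the `min_len` parameter, and the toy alignment are assumptions.

```python
def conserved_blocks(aligned: list, min_len: int = 18) -> list:
    """Return (start, end) column ranges (0-based, end exclusive) that are
    identical and gap-free across all aligned sequences and at least
    `min_len` columns long."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    blocks, start = [], None
    for i in range(length):
        column = {seq[i].upper() for seq in aligned}
        conserved = len(column) == 1 and "-" not in column
        if conserved and start is None:
            start = i                      # a conserved block opens here
        elif not conserved and start is not None:
            if i - start >= min_len:
                blocks.append((start, i))  # block closes before column i
            start = None
    if start is not None and length - start >= min_len:
        blocks.append((start, length))
    return blocks


# Toy alignment (hypothetical sequences): columns 0-9 and 11-13 are conserved.
toy_alignment = [
    "ATGCCGTACGATTT",
    "ATGCCGTACGTTTT",
    "ATGCCGTACG-TTT",
]
print(conserved_blocks(toy_alignment, min_len=5))
```

In practice, candidate blocks would also be compared against the relaxases of other groups so that the resulting primers remain specific to a single PTU.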
2 Materials

2.1 Solutions (See Notes 1 and 2)

1. InstaGene™ Matrix (Bio-Rad).
2. DNA polymerase and its corresponding reaction buffer, e.g., BioTaq™ DNA polymerase (Bioline) (see Note 3).
3. 50 mM MgCl2.
4. 10 mM dNTP.
5. PCR-grade H2O (see Note 4).
6. 100 μM solutions for each primer. PlasTax-PCR primers are listed in Table 1 (see Notes 5 and 6).
7. Agarose basic, DNase free.
8. 10× TBE (Tris/Borate/EDTA) buffer, pH 8.3.
9. Intercalating agent, e.g., SafeView™ Classic (ABM) (5 μL stock/100 mL gel).
10. Ladder for DNA electrophoresis, e.g., HyperLadder™ 1 kb (Bioline).
11. DNA loading buffer: 30% (v/v) glycerol, 0.25% (w/v) bromophenol blue.

2.2 Equipment

12. UV-visible spectrophotometer, e.g., NanoDrop™ 2000 (Thermo Scientific).
13. A thermocycler to carry out the PCR reactions.
Fig. 2 Analysis of PlasTax-PCR reactions. For each multiplex, a 2% agarose gel is shown. M: HyperLadder™ 1Kb. Positive controls (see Table 1): individual plasmids, indicated above the corresponding lane, and a mix of the individual plasmids in the same PCR sample (mix C+). Negative control: no DNA. Samples: PCRs from clinical E. coli ST131 isolates, whose genomes are fully sequenced (strains FV9873, E35BA, E2022, E61BA, and FV13998 from [14]) or not (E. coli isolates from [15, 16])
14. An agarose electrophoresis system.
15. A gel imaging system to visualize fluorescent nucleic acid stains.
3 Methods

3.1 Preparation of the PlasTax-PCR DNA Template

Total DNA was extracted from Escherichia coli cultures bearing the plasmids listed in the second column of Table 1 using InstaGene™ Matrix (Bio-Rad) and following the manufacturer's recommendations (see Note 7).

1. Centrifuge 50 μL of a saturated bacterial culture at 13,800 rcf for 1 min.
Table 1 Targets, primers, and conditions for PlasTax-PCR

| Multiplex | MOB family/PTU/plasmid prototype | Forward primer (5′ → 3′) | Reverse primer (5′ → 3′) | Amplicon size (bp) | Tann (°C) |
|---|---|---|---|---|---|
| 1 | MOBP11/PTU-P1/R751 | CTGGCGACCCAGCACACGA | GTAGTCGCGCTCCAGGGCC | 297 | 62 |
| 1 | MOBP13/PTU-L/M/pCTX-M3 | AAAATCATTGAGGGCCGAAGGG | CATACTGGTGATCGGACATGCCC | 436 | 62 |
| 1 | MOBP12/PTU-I1/R64 | CATGAAGGACGGCTGCGAATG | ATTTTCCGTCCATCTGAGACAGGTACA | 629 | 62 |
| 1 | MOBP3/PTU-X1/pOLA52 | AGTCATCAGAAAATGGTCGTAAGTCAGCT | CTTTCCCGCGAACATAAGTCCC | 823 | 62 |
| 1 | 16S rRNA (a) | CTCCTACGGGAGGCAGCAG | GACGGGCGGTGTGTRCA | 1034 | 62 |
| 2 | MOBF11/PTU-W/R388 | CATAGGGCGGGCTGCAAGC | GTGGCCCTCGCCGATATTCC | 169 | 62 |
| 2 | MOBH12/PTU-C/pIP1202 | GTATGCCGCGTTTGTTCACCTG | GCGCCACCGGATAAAGTAACG | 340 | 62 |
| 2 | MOBF12/PTU-FE/R100 (b) | GTTGAGTTTCAGCGTTGTTAAATCGG | AGCACGTTCTCGGAAAATCCG | 567 | 62 |
| 2 | MOBH11/PTU-HI2/R478 | ATCTCAGGAGAATGATGCAACCTCTG | AAGACATACCGGGTTTAGGATTCGC | 712 | 62 |
| 2 | 16S rRNA (a) | CTCCTACGGGAGGCAGCAG | GACGGGCGGTGTGTRCA | 1034 | 62 |
| 3 | MOBP51/-/ColA, pCK02 | GGGGGCGAAACATCCGACT | ACCGTTCGTGCTCGCTGAAT | 142 | 62 |
| 3 | MOBC11/PTU-E4/CloDF13 | CGCACCGGCTGGCCG | TGACCTGCGTCGCCCGG | 270 | 62 |
| 3 | MOBQ41/PTU-E10/pIGWZ12 | CCCCTGCCTGGTGTACGAACC | TATGAACGATGGTCTATCTCTTCCTGATAAC | 394 | 62 |
| 3 | 16S rRNA (a) | CTCCTACGGGAGGCAGCAG | GACGGGCGGTGTGTRCA | 1034 | 62 |
| 4 | MOBP12/PTU-B/O/K/Z/R387 | CGGACAAAGCTGTTTTTTCCCGTA | GGTCCTGCAGCCTCCCTGACC | 600 | 64 |
| 4 | MOBH11/PTU-HI1A/R27 | TGAGCAGTCTATTCTTTCGTTGTACAGAGAC | TCCTGGCTCCAGGAAAGCCAG | 253 | 64 |
| 4 | MOBF12/PTU-FK1 – PTU-FK2/pKpQIL (c) | GGATAAGGCGCTGTTTACGGAACTG | GGCGCCCATGTTAATGTTTCACTC | 405 | 64 |
| 4 | 16S rRNA (a) | CTCCTACGGGAGGCAGCAG | GACGGGCGGTGTGTRCA | 1034 | 64 |
| 5 | MOBQ4/PTU-E20/pIGMS5 | CCCGTGAGATTTCCCGCGAG | CTTTCCTGCGAATCCCGTTTCC | 176 | 65 |
| 5 | MOBF11/PTU-N1/R46 | CTTGATATAACCACGATTACCCGCC | CTTCACGCACAGCAGCGGC | 355 | 65 |
| 5 | MOBH12/-/Rts1 | AGCTGAATTGGGTTTTCCGGC | CCGCTCAATCTTTTCAGTTTCGG | 460 | 65 |
| 5 | 16S rRNA (a) | CTCCTACGGGAGGCAGCAG | GACGGGCGGTGTGTRCA | 1034 | 65 |
| 6 | MOBP51/PTU-E1/ColE1 | AAAAGGCCGTCAGGATGTGATTCA | GCGTCTCTGATTTTCGTCTCGTTTG | 280 | 64 |
| 6 | MOBF12/PTU-FS/pSLT | GGCGGCAACAAACACCGC | AACCTTCTCTTTCAGCACCGCG | 411 | 64 |
| 6 | MOBF12/PTU-FY/pMT | CGGACTCAGGACGGGGCG | CCACTCGGCCATGCGCTG | 630 | 64 |
| 6 | 16S rRNA (a) | CTCCTACGGGAGGCAGCAG | GACGGGCGGTGTGTRCA | 1034 | 64 |
| 7 | MOBF12/PTU-FE/F (b) | GCTGGGCAGCATGGGAGAAC | CAATCTGATTAGCGTACACATTCTCAATG | 509 | 62 |
| 7 | 16S rRNA (a) | CTCCTACGGGAGGCAGCAG | GACGGGCGGTGTGTRCA | 1034 | 62 |

(a) 16S rDNA primers are 357F and 1391R, described by [13]
(b) PTU-FE presents low conservation in the gene repertoire of its members [4], and thus two different primer pairs were designed to target this group
(c) This is the only case in which a PlasTax-PCR primer pair targets plasmids from two different PTUs. Plasmids from the related group PTU-FK3 are not targeted
PLASmid TAXonomic PCR (PlasTax-PCR), a Multiplex Relaxase MOB Typing. . .
133
2. Add a volume of 200 μL of InstaGene™ Matrix to the pellet. 3. Incubate at 56 C for 15–30 min. 4. Vortex at high speed for 10 s. 5. Incubate at 100 C for 8 min. 6. Repeat step 4. 7. Centrifuge at 13,800 rcf for 3 minutes and recover the supernatant. 8. Repeat step 7 to eliminate traces of the matrix. 9. Quantify the DNA (see Note 8). 10. Store the supernatant at 20 C (see Note 9). 3.2 PlasTax-PCR for Enterobacterial Plasmids
1. Preparation of the reaction mixture for the multiplex PCRs. For a 50 μL reaction, add 100 ng of genomic DNA (see Notes 8 and 10), 5 μL of 10× reaction buffer, 1.5 μL of 50 mM MgCl2 (final concentration 1.5 mM), 1 μL of 10 mM dNTP (final concentration 0.2 mM), 0.5 μL of 100 μM primers (final concentration 1 μM), 1 U of BioTaq polymerase (Bioline), and ddH2O up to 50 μL (see Note 4).
2. PCR running (see Note 11). The amplification program consists of an initial denaturation step of 5 min at 94 °C, followed by 30 cycles of 30 s at 94 °C, 30 s at the annealing temperature (see Table 1), and 30 s at 72 °C, and a final extension of 7 min at 72 °C.
3. Visualization of the amplicons. Load 10 μL of the PCR reaction and 2 μL of the DNA loading buffer onto a 2% agarose gel containing an intercalating agent. Load at least one lane of DNA ladder in each gel (see Note 12). Separate by electrophoresis at room temperature at 100 V for 40 min. Visualize under ultraviolet light (see Note 13).
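The final concentrations quoted in step 1 follow from the dilution rule C1 × V1 = C2 × V2. The short Python sketch below re-derives them and scales the per-reaction volumes to a master mix. It is only an illustration: the function names, the 10% pipetting overage, and the use of the 20 ng/μL template stock from Note 8 are assumptions made for this example, not part of the published protocol.

```python
# Illustrative sketch only: helper names and the 10% overage are assumptions.

REACTION_VOLUME_UL = 50.0

# component -> (stock concentration, volume per 50 uL reaction in uL, unit)
COMPONENTS = {
    "MgCl2": (50.0, 1.5, "mM"),
    "dNTP": (10.0, 1.0, "mM"),
    "primer": (100.0, 0.5, "uM"),
}


def final_concentration(stock, volume_ul, total_ul=REACTION_VOLUME_UL):
    """Dilution rule C1 * V1 = C2 * V2, solved for the final concentration C2."""
    return stock * volume_ul / total_ul


def template_volume_ul(mass_ng=100.0, stock_ng_per_ul=20.0):
    """Volume of template needed to deliver mass_ng from a stock (see Note 8)."""
    return mass_ng / stock_ng_per_ul


def master_mix(n_reactions, overage=0.10):
    """Scale per-reaction volumes to n reactions plus an assumed pipetting overage."""
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, (_, vol, _) in COMPONENTS.items()}


if __name__ == "__main__":
    for name, (stock, vol, unit) in COMPONENTS.items():
        print(f"{name}: {final_concentration(stock, vol):.2f} {unit} final")
    print(f"template: {template_volume_ul():.1f} uL per reaction")
    print(master_mix(10))
```

Primers are kept as a single entry here because the text gives one volume per primer; in a real multiplex each primer would be added separately.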
4 Notes

1. Pipette all reagents and samples using filter tips. The preparation of the reactions before the DNA amplification should ideally be carried out in a clean area (pre-PCR) to reduce the chance of contamination.
2. Aliquot reagents to avoid multiple freeze-thaw cycles and the contamination of master stocks.
3. There is no need to use extremely low error-rate DNA polymerases if the amplicons are not going to be sequenced.
4. The use of commercially available nuclease-free water is preferred.
5. A thorough primer design process is recommended. A nucleotide multiple sequence alignment of the relaxase gene portion encoding the N-terminal relaxase domain is the basis for detecting high-identity stretches for primer design. It is useful to design several primer pairs to detect a single PTU, rendering amplicons of different sizes, in order to ease their combination in multiplex reactions. Sequences that produce intra- or inter-oligonucleotide secondary structures should be avoided. The melting temperatures (Tm) of the primers should be as close to each other as possible, favoring Tm > 60 °C to avoid nonspecific targets. The selected primer pair should perform well in both the individual and the multiplex PCR reaction.
6. The individual amplicons included in a multiplex reaction should differ in size, ideally by 150–200 bp, so that they can be clearly distinguished in a 2% agarose gel. Take this into account when designing the primers.
7. A purification method rendering PCR-quality template DNA is advised, instead of picking a bacterial colony and adding it directly to the PCR master mix.
8. We prepared aliquots of template DNA at 20 ng/μL in order to add the same volume to each PCR reaction.
9. DNAs extracted using InstaGene Matrix can be stored at −20 °C for a month without degradation.
10. When a mix of template DNAs is used in the reaction, 100 ng of each DNA sample is added.

Table 2 PlasTax-PCR efficiency for each targeted PTU

Multiplex | PTU | Number of PTU members (RefSeq84) (a) | Number of targeted PTU members (b)
1 | P1 | 32 | 29
1 | L/M | 58 | 56
1 | I1 | 120 | 119
1 | X1 | 49 | 24
2 | W | 4 | 4
2 | C | 92 | 87
2 | FE | 198 | 52
2 | HI2 | 33 | 29
3 | B/O/K/Z | 20 | 14
3 | HI1A | 8 | 8
3 | FK1 - FK2 | 68/32 | 37/29
4 | E4 | 40 | 39
4 | E10 | 23 | 21
5 | E20 | 9 | 8
5 | N1 | 60 | 60
6 | E1 | 51 | 45
6 | FS | 56 | 43
6 | FY | 38 | 10
7 | FE | 198 | 94

(a) Data are taken from Supplementary Table S4 [4]
(b) Number of plasmids that are potentially targeted by the corresponding primers. Take into account that not all plasmids from a PTU contain a MOB relaxase. Plasmids are considered targeted if the primers potentially anneal to their sequence producing a fragment with the size indicated in Table 1. Hybridization mismatches are allowed, except if they occur in the 12 nucleotides at the 3′ end of the primers, in which case the plasmid was ruled out. According to these criteria, a list of plasmids that can be used as positive controls is provided in Note 11

11. Positive controls for each reaction should be used in parallel. A list of GenBank accession numbers of plasmids that can be targeted by each primer pair and can thus be used as positive controls, according to the criteria explained in the footnote to Table 2, is provided:
For multiplex 1:
PTU-P1: NC_013176.1, NC_005088.1, NC_017908.2, NC_019263.1, NC_019264.1, NC_019283.1, NC_016968.1, NC_016978.1, NC_004956.1, NC_006830.1, NC_021077.1, NC_019312.1, NC_001735.4, NC_024998.1, NC_007353.2, NC_010935.1, NC_019320.1, NZ_CP017760.1, NC_020994.1, NZ_CP015373.1, NZ_CP009797.1, NZ_CP014846.1, NZ_CP019238.1, NC_014911.1, NC_008766.1, NC_014641.1, NC_007337.1, NZ_CP021650.1, NC_008385.1.
PTU-L/M: NC_004464.2, NC_005246.1, NC_011641.1, NC_019063.1, NC_019154.1, NC_019344.1, NC_019346.1, NC_019368.1, NC_019889.1, NC_021078.1, NC_021488.1, NC_023027.1, NC_024997.1, NC_025134.1, NZ_CP007733.1, NZ_CP009852.1, NZ_CP009857.1, NZ_CP010365.1, NZ_CP011593.1, NZ_CP011599.1, NZ_CP011609.1, NZ_CP011614.1, NZ_CP011632.1, NZ_CP011640.1, NZ_CP014298.1, NZ_CP014698.2, NZ_CP015071.1, NZ_CP015075.2, NZ_CP016927.1, NZ_CP017282.1, NZ_CP017288.1, NZ_CP017853.1, NZ_CP017932.1, NZ_CP017936.1, NZ_CP018315.1, NZ_CP018342.1, NZ_CP018449.1, NZ_CP018452.1, NZ_CP018461.1, NZ_CP018669.1, NZ_CP018690.1, NZ_CP018700.1, NZ_CP018706.1, NZ_CP018712.1, NZ_CP018717.1, NZ_CP018723.1, NZ_CP018736.1, NZ_CP018974.1, NZ_CP019841.1, NZ_CP020844.1, NZ_CP021742.1, NZ_CP022147.1, NZ_CP022150.1, NZ_CP022153.1, NZ_CP022826.1, NZ_KX118608.1.
PTU-I1: NC_013120.1, NC_014383.1, NC_019044.1, NC_019043.1, NC_019061.1, NC_019097.1, NC_002122.1, NC_022885.1, NC_023326.1, NC_024980.1, NC_023915.1, NC_025140.1, NC_025144.1, NC_025147.1, NC_025180.1, NC_025198.1, NC_025142.1, NC_025143.1, NC_025176.1, NC_024975.1, NC_024976.1, NC_024977.1, NC_024979.1, NC_024978.1, NC_024955.2, NC_019123.1, NC_019131.1, NC_019137.1, NC_019099.1, NC_005014.1, NC_015965.1, NC_019111.1, NC_023899.1, NC_023900.1, NC_019104.1, NC_023329.1, NC_022267.1, NC_023275.1, NC_023276.1, NC_023290.1, NC_020991.1, NC_022742.1, NC_018659.1, NZ_CP019220.1, NC_011419.1, NC_016904.1, NC_017637.1, NC_017665.1, NC_017642.1, NZ_CP009580.1, NZ_CP006641.1, NZ_CP010317.1, NZ_CP012627.1, NZ_CP015161.1, NZ_CP015916.1, NZ_CP015996.1, NZ_CP015838.1, NZ_CP018116.1, NZ_CP018122.1, NZ_CP018110.1, NZ_CP010130.1, NZ_CP010233.1, NZ_CP019215.1, NC_017675.1, NC_011081.1, NC_017718.1, NC_011077.1, NC_021811.1, NC_021813.2, NZ_CP009566.1, NZ_CP012039.1, NZ_CP012835.1, NZ_LN890525.1, NZ_CP012923.1, NZ_CP012936.1, NZ_CP013224.1, NZ_CP012929.1, NZ_CP013221.1, NZ_CP014662.1, NZ_CP016516.1, NZ_CP016520.1, NZ_CP016533.1, NZ_CP016522.1, NZ_CP016568.1, NZ_CP016572.1, NZ_CP016585.1, NZ_CP016387.1, NZ_CP016407.1, NZ_CP016411.1, NZ_CP016413.1, NZ_CP016409.1, NZ_CP019205.1, NZ_CP019207.1, NZ_CP010142.1, NZ_CP019173.1, NZ_CP018946.1, NZ_CP018975.1, NZ_CP018993.1, NZ_CP014972.2, NZ_CP014622.1, NZ_KX443694.1, NZ_LT838199.1, NC_032100.1, NZ_CP018774.2, NZ_CP021208.1, NZ_CP021533.1, NZ_CP021693.1, NZ_CP021739.1, NZ_CP021841.1, NZ_CP021845.1, NZ_CP021882.1, NZ_CP018625.1, NZ_CP020494.1, NZ_CP019691.1, NZ_CP022456.1, NZ_CP014096.1, NZ_CP010831.1, NZ_CP014494.1, NZ_CP007651.1. 
PTU-X1: NC_010378.1, NC_010421.1, NC_010422.1, NC_010860.1, NC_011204.1, NC_011739.1, NC_013503.1, NC_015472.1, NC_016036.1, NC_019013.1, NC_019046.1, NC_019067.1, NC_019088.1, NC_019096.1, NC_019106.1, NC_019256.1, NC_024961.1, NZ_CP011431.1, NZ_CP012734.1, NZ_CP014974.1, NZ_CP019180.1, NZ_CP020088.1, NZ_CP020341.1, NZ_CP020836.1.
For multiplex 2: PTU-W: NC_010643.1, NC_009982.1, NC_010716.1, NC_028464.1. PTU-C: NC_008612.1, NC_008613.1, NC_009139.1, NC_009140.1, NC_012690.1, NC_012692.1, NC_012693.1, NC_016974.1, NC_016976.1, NC_017645.1, NC_018994.1, NC_019045.2, NC_019065.1, NC_019066.1, NC_019069.1, NC_019107.1, NC_019116.1, NC_019118.1, NC_019121.1, NC_019153.1, NC_019158.1, NC_019375.1, NC_019380.1, NC_020180.1, NC_021667.1, NC_021815.1, NC_022372.1, NC_022377.1, NC_022522.2, NC_022652.1, NC_023291.1, NC_023898.1, NC_023908.1, NZ_CP003998.1, NZ_CP006661.1, NZ_CP007486.1, NZ_CP007636.1, NZ_CP008790.1, NZ_CP009409.2, NZ_CP009560.1, NZ_CP009562.1, NZ_CP009564.1, NZ_CP009567.1, NZ_CP009570.1, NZ_CP009868.1, NZ_CP010373.2, NZ_CP010391.1, NZ_CP011429.1, NZ_CP011540.1, NZ_CP011622.1, NZ_CP012682.1, NZ_CP013324.1, NZ_CP014295.1, NZ_CP014658.1, NZ_CP014775.1, NZ_CP014978.1, NZ_CP015139.1, NZ_CP015394.1, NZ_CP015835.1, NZ_CP016013.1, NZ_CP016036.1, NZ_CP017055.1, NZ_CP017987.1, NZ_CP018318.1, NZ_CP018689.1, NZ_CP018698.1, NZ_CP018704.1, NZ_CP018710.1, NZ_CP018716.1, NZ_CP018722.1, NZ_CP018817.1, NZ_CP018956.1, NZ_CP019441.1, NZ_CP020049.1, NZ_CP020056.1, NZ_CP021206.1, NZ_CP021551.1, NZ_CP021709.1, NZ_CP021719.1, NZ_CP021835.1, NZ_CP021853.1, NZ_CP021936.1, NZ_CP021952.1, NZ_CP021956.1, NZ_CP022126.1, NZ_CP022359.1, NZ_LT904892.1. 
PTU-FE: NC_019424.1, NC_019095.1, NC_019090.1, NC_019072.1, NC_019071.1, NC_019057.1, NC_018998.1, NC_017630.1, NC_016039.1, NC_013727.1, NC_013655.1, NC_013542.1, NC_013175.1, NC_011812.1, NC_011749.1, NC_009133.1, NC_007941.1, NC_005327.1, NC_002134.1, NZ_CP018125.1, NZ_CP018119.1, NZ_CP018113.1, NZ_CP018107.1, NZ_CP014496.1, NZ_CP017287.1, NZ_CP015072.1, NZ_CP014523.1, NC_008460.1, NZ_CP020340.1, NZ_CP014493.1, NZ_CP012139.1, NZ_CP021938.1, NZ_CP021871.1, NZ_CP021180.1, NZ_CP020117.1, NZ_CP019009.1, NZ_CP018982.1, NZ_CP018954.1, NZ_CP018952.1, NZ_CP016035.1, NZ_CP015140.1, NZ_CP015077.1, NZ_CP015070.1, NZ_CP010882.1, NZ_CP009860.1, NZ_CP009579.1, NZ_CP008715.1, NC_013951.1,
NC_025177.1, NZ_CP021690.1, NZ_CP014498.1, NZ_CP010239.1. PTU-HI2: NC_005211.1, NC_009838.1, NC_010870.1, NC_012555.1, NC_012556.1, NC_019114.1, NC_024983.1, NZ_CP008825.1, NZ_CP008899.1, NZ_CP008906.1, NZ_CP011062.1, NZ_CP011601.1, NZ_CP012170.1, NZ_CP012931.1, NZ_CP015833.1, NZ_CP016526.1, NZ_CP016764.1, NZ_CP016838.1, NZ_CP019214.1, NZ_CP019443.1, NZ_CP019559.1, NZ_CP019647.1, NZ_CP020493.1, NZ_CP021177.1, NZ_CP021209.1, NZ_CP022165.1, NZ_CP022533.1, NZ_CP022696.1, NZ_CP023143.1. For multiplex 3: PTU-B/O/K/Z: NZ_CP023144.1, NZ_CP015141.1, NZ_CP013024.1, NZ_CP009107.1, NC_025138.1, NC_022996.1, NC_022992.1, NC_022371.1, NC_018995.1, NC_014843.1, NC_007365.1, NZ_CP005999.1, NC_011754.1, NZ_CP018772.1. PTU-HI1A: NC_002305.1, NC_003384.1, NC_009981.1, NC_013365.1, NC_016825.1, NC_023289.2, NZ_CP022495.1, NZ_LT904879.1. PTU-FK1 - FK2: NC_019390.1, NC_024992.1, NC_020132.1, NC_021654.1, NC_009649.1, NC_022078.1, NZ_CP007729.1, NZ_CP008800.1, NZ_CP008829.1, NZ_CP008930.1, NZ_CP009777.1, NZ_CP010393.1, NZ_CP010574.1, NZ_CP011577.1, NZ_CP011977.1, NZ_CP011990.1, NZ_CP015386.1, NZ_CP015823.1, NZ_CP018355.1, NZ_CP018365.1, NZ_CP018424.1, NZ_CP018430.1, NZ_CP018434.1, NZ_CP018441.1, NZ_CP018460.1, NZ_CP018693.1, NZ_CP019773.1, NZ_CP020072.1, NZ_CP020109.1, NZ_CP020838.1, NZ_CP020842.1, NZ_CP021540.1, NZ_CP021544.1, NZ_CP021713.1, NZ_CP021752.1, NZ_CP021834.1, NZ_CP022145.1, NC_009650.1, NZ_CP015395.1, NZ_LT216438.1, NZ_CP022698.1, NZ_CP022693.1, NZ_CP019774.1, NZ_CP018992.1, NZ_CP018989.1, NZ_CP015824.1, NZ_CP014765.1, NZ_CP014669.1, NZ_CP014650.1, NZ_CP011991.1, NZ_CP011986.1, NZ_CP009875.1, NZ_CP009115.1, NZ_CP008833.1, NZ_CP008830.1, NZ_CP007730.1, NC_025187.1, NC_025167.1, NC_025166.1, NC_023906.1, NC_023905.1, NC_023904.1, NC_023903.1, NC_019165.1, NC_019155.1, NC_014016.1. For multiplex 4: MOBP51/-/ColA, pCK02: NC_009794.1, NC_001373.1, NC_016151.1.
PTU-E4: NC_002119.1, NC_009793.1, NC_018953.1, NC_019156.1, NC_019159.1, NC_020182.1, NC_021666.1, NC_022083.1, NZ_CP006928.1, NZ_CP007728.1, NZ_CP008828.1, NZ_CP008832.1, NZ_CP009772.1, NZ_CP009778.1, NZ_CP009873.1, NZ_CP009877.1, NZ_CP010394.1, NZ_CP011979.1, NZ_CP011982.1, NZ_CP011984.1, NZ_CP011988.1, NZ_CP011993.1, NZ_CP014648.1, NZ_CP015384.1, NZ_CP015391.1, NZ_CP018353.1, NZ_CP018425.1, NZ_CP018431.1, NZ_CP018433.1, NZ_CP018445.1, NZ_CP019776.1, NZ_CP020112.1, NZ_CP020840.1, NZ_CP021543.1, NZ_CP021717.1, NZ_CP021779.1, NZ_CP021837.1, NZ_CP022576.1, NZ_LT216440.1. PTU-E10: NZ_CP016041.1, NZ_CP019024.1, NZ_CP018996.1, NZ_CP018972.1, NZ_CP018942.1, NC_019134.1, NZ_CP011432.1, NZ_CP011140.1, NZ_CP018980.1, NZ_CP019648.1, NZ_CP015142.1, NZ_CP012629.1, NC_010885.1, NZ_CP018966.1, NZ_CP018208.1, NZ_CP010878.1, NZ_CP006634.1, NC_011411.1, NZ_HG941720.1, NC_010486.1, NZ_CP018941.1. For multiplex 5: PTU-E20: NC_010883.1, NC_011977.1, NC_012882.1, NC_020412.1, NC_022585.1, NC_023325.1, NZ_CP014098.1, NC_013367.1. PTU-N1: NC_003292.1, NC_007682.3, NC_009131.1, NC_009132.1, NC_009980.1, NC_011383.1, NC_011385.1, NC_011617.1, NC_014208.1, NC_014231.1, NC_014368.1, NC_015599.1, NC_019033.1, NC_019082.1, NC_019087.1, NC_019098.1, NC_019124.1, NC_019888.1, NC_020086.1, NC_020088.1, NC_021622.1, NC_021660.2, NC_021664.2, NC_022374.1, NC_022375.1, NC_023909.1, NC_023910.1, NC_024967.1, NC_024974.1, NC_025019.1, NC_025183.1, NC_025186.1, NC_032101.1, NZ_CP008901.1, NZ_CP008908.1, NZ_CP009853.1, NZ_CP009858.1, NZ_CP009862.1, NZ_CP009864.1, NZ_CP009867.1, NZ_CP009874.1, NZ_CP009881.1, NZ_CP011589.1, NZ_CP014524.1, NZ_CP017725.1, NZ_CP018442.1, NZ_CP018945.1, NZ_CP018959.1, NZ_CP018963.1, NZ_CP018977.1, NZ_CP019006.1, NZ_CP019026.1, NZ_CP020059.1, NZ_CP020119.1, NZ_CP021899.1, NZ_KX062091.1, NZ_KX154765.1, NZ_KX276209.1, NZ_KX397572.1, NZ_LT838197.1. MOBH12/-/Rts1: NC_003905.1. For multiplex 6:
PTU-E1: NC_001371.1, NC_002809.1, NC_003079.1, NC_003457.1, NC_005019.1, NC_005970.1, NC_008488.1, NC_009791.1, NC_010485.1, NC_010672.1, NC_011214.1, NC_011407.1, NC_011799.1, NC_013363.1, NC_014235.1, NC_017321.1, NC_017636.1, NC_017654.1, NC_017655.1, NC_017661.1, NC_017662.1, NC_017721.1, NC_018997.1, NC_019076.1, NC_019078.1, NC_019102.1, NC_019136.1, NC_019250.1, NC_019357.1, NC_019982.1, NC_020251.1, NC_025004.1, NC_025026.1, NC_025178.1, NZ_CP010175.1, NZ_CP012628.1, NZ_CP014198.2, NZ_CP015917.1, NZ_CP016039.1, NZ_CP016513.1, NZ_CP016519.1, NZ_CP016584.1, NZ_CP017845.1, NZ_CP018943.1, NZ_CP019025.1. PTU-FS: NC_002638.1, NC_003277.2, NC_006855.1, NC_007208.1, NC_012124.1, NC_013437.1, NC_014476.2, NC_016855.1, NC_016858.1, NC_016861.1, NC_016864.1, NC_017054.1, NC_017720.1, NC_019001.1, NC_019108.1, NC_019109.1, NC_019112.1, NC_022570.1, NZ_AP014566.1, NZ_CP007489.1, NZ_CP007582.1, NZ_CP008745.1, NZ_CP012345.1, NZ_CP012348.1, NZ_CP013721.1, NZ_CP014050.1, NZ_CP014357.1, NZ_CP014359.1, NZ_CP014537.1, NZ_CP014577.1, NZ_CP014968.1, NZ_CP014970.1, NZ_CP014973.1, NZ_CP014976.1, NZ_CP014980.1, NZ_CP015158.1, NZ_CP016390.1, NZ_CP017618.1, NZ_CP017729.1, NZ_CP020923.1, NZ_CP022137.1, NZ_LN999012.1, NZ_LT855377.1. PTU-FY: NC_006323.1, NC_009378.1, NZ_CP006746.1, NZ_CP006749.1, NZ_CP006752.1, NZ_CP006756.1, NZ_CP006760.1, NZ_CP006779.1, NZ_CP009714.1, NZ_CP010248.1. For multiplex 7: PTU-FE: NZ_LT838198.1, NZ_CP017726.1, NZ_CP021847.1, NZ_CP021843.1, NZ_CP018994.1, NZ_CP014489.1, NZ_CP012636.1, NC_025106.1, NC_019089.1, NZ_CP010223.1, NZ_CP010215.1, NZ_CP019007.1, NZ_CP018455.1, NZ_CP017981.1, NZ_CP017633.1, NZ_CP014273.1, NZ_CP014271.1, NZ_CP011496.1, NZ_CP010158.1, NC_025175.1, NC_025139.1, NC_024956.1, NC_019073.1, NC_004998.1, NC_002483.1, NZ_CP012113.1, NZ_CP010316.1, NZ_CP010192.1, NC_013362.1, NC_011747.1, NZ_CP013026.1, NC_010409.1, NZ_CP015914.1, NZ_CP012683.1, NZ_CP012626.1, NZ_CP012928.1, NZ_CP011916.1, NZ_CP006001.1,
NC_017627.1, NC_006671.1, NZ_CP021289.1, NZ_CP018990.1, NZ_CP010372.1, NC_020271.1, NC_010488.1, NZ_CP010181.1, NZ_CP011064.1, NZ_CP013832.1, NC_019037.1, NZ_CP012632.1, NC_019117.1, NZ_CP019909.1, NZ_CP015239.1, NZ_CP006635.1, NC_017659.1, NC_010720.1, NC_018966.1, NC_018954.1, NC_014382.1, NC_011076.1, NZ_CP021880.1, NZ_CP021733.1, NZ_CP021204.1, NZ_CP019028.1, NZ_CP018958.1, NZ_CP016498.1, NZ_CP010232.1, NC_023315.1, NC_014615.1, NC_010558.1, NZ_CP011019.1, NZ_CP010184.1, NZ_CP019561.1, NZ_CP021737.1, NZ_CP018978.1, NZ_CP013833.1, NZ_CP011493.1, NZ_CP010138.1, NC_014384.1, NC_007675.1, NZ_CP013027.1, NC_019122.1, NZ_CP022166.1, NZ_CP020934.1, NZ_CP019018.1, NZ_CP018775.2, NZ_CP010141.1, NZ_CP010123.1, NZ_CP005931.1, NC_025179.1, NC_012944.1, NC_009837.1, NZ_CP009167.1, NC_017640.1.
12. For gels with more than ten lanes, include the DNA ladder in several lanes at different positions.
13. The amplicons obtained in a multiplex PCR can be confirmed by repeating the corresponding simplex PCR. Its product can be purified and sequenced by standard Sanger sequencing to verify the result.
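Two of the design rules stated above lend themselves to a quick programmatic check: the amplicon size spacing of Note 6 and the 3′-end mismatch criterion in the footnote to Table 2. The Python sketch below illustrates both. It is not the authors' pipeline. The function names and the candidate amplicon sizes are hypothetical, the 150 bp threshold is the lower bound of the range given in Note 6, and the target site is assumed to be supplied in the same 5′→3′ orientation and length as the primer, with no handling of degenerate bases such as R.

```python
# Illustrative sketch only: names and example values are hypothetical.

def close_pairs(sizes_bp, min_gap_bp=150):
    """Return neighbouring amplicon sizes that differ by less than min_gap_bp."""
    ordered = sorted(sizes_bp)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a < min_gap_bp]


def is_targeted(primer, site, protected_3prime=12):
    """Apply the Table 2 footnote rule: mismatches are tolerated anywhere
    except in the last `protected_3prime` nucleotides at the primer 3' end.

    Assumes `site` is given in the primer's own 5'->3' orientation and has
    the same length; degenerate bases are compared literally.
    """
    primer, site = primer.upper(), site.upper()
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return primer[-protected_3prime:] == site[-protected_3prime:]


if __name__ == "__main__":
    candidate_sizes = [180, 350, 420, 1034]  # hypothetical multiplex design
    print("pairs that may be hard to resolve:", close_pairs(candidate_sizes))

    primer = "CTGGCGACCCAGCACACGA"           # PTU-P1 forward primer from Table 1
    site_5p_mismatch = "ATGGCGACCCAGCACACGA"  # hypothetical site, mismatch at 5' end
    site_3p_mismatch = "CTGGCGACCCAGCACACGT"  # hypothetical site, mismatch at 3' end
    print(is_targeted(primer, site_5p_mismatch), is_targeted(primer, site_3p_mismatch))
```

A check like `close_pairs` is only a first filter; whether two bands are actually resolved still depends on the gel and the ladder used.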
Acknowledgments

This work was financed by the Spanish Ministry of Science and Innovation (PID2020-117923GB-I00) to FdlC and MPG-B, and by the European Commission, the Joint Programming Initiative in Antimicrobial Resistance and the Ministry of Science and Innovation (UE/STARCS AC16/00039, UE/ST131TS AC16/00043, and PI18/01942) to TMC.

References

1. de la Cruz F, Davies J (2000) Horizontal gene transfer and the origin of species: lessons from bacteria. Trends Microbiol 8(3):128–133. http://www.ncbi.nlm.nih.gov/pubmed/10707066
2. Garcillán-Barcia MP, Francia MV, de la Cruz F (2009) The diversity of conjugative relaxases and its application in plasmid classification. FEMS Microbiol Rev 33(3):657–687. http://www.ncbi.nlm.nih.gov/pubmed/19396961
3. Garcillán-Barcia MP, Redondo-Salvo S, Vielva L, de la Cruz F (2020) MOBscan: automated annotation of MOB relaxases. In: de la Cruz F (ed) Horizontal gene transfer: methods and protocols, Methods in molecular biology. Science+Business Media, LLC, part of Springer Nature, New York, NY, pp 295–308
4. Redondo-Salvo S, Fernández-López R, Ruiz R, Vielva L, de Toro M, Rocha EPC et al (2020) Pathways for horizontal gene transfer in bacteria revealed by a global map of their plasmids. Nat Commun 11(1):3602. http://www.ncbi.nlm.nih.gov/pubmed/32681114
5. Garcillán-Barcia MP, Alvarado A, de la Cruz F (2011) Identification of bacterial plasmids based on mobility and plasmid population biology. FEMS Microbiol Rev 35(5):936–956. http://www.ncbi.nlm.nih.gov/pubmed/21711366
6. Alvarado A, Garcillán-Barcia MP, de la Cruz F (2012) A degenerate primer MOB typing (DPMT) method to classify gammaproteobacterial plasmids in clinical and environmental settings. PLoS One 7(7):e40438. http://www.ncbi.nlm.nih.gov/pubmed/22792321
7. Villa L, Carattoli A (2020) Plasmid typing and classification. Springer, New York, pp 309–321. https://doi.org/10.1007/978-1-4939-9877-7_22
8. Valverde A, Cantón R, Garcillán-Barcia MP, Novais A, Galán JC, Alvarado A et al (2009) Spread of bla(CTX-M-14) is driven mainly by IncK plasmids disseminated among Escherichia coli phylogroups A, B1, and D in Spain. Antimicrob Agents Chemother 53(12):5204–5212. http://www.ncbi.nlm.nih.gov/pubmed/19786598
9. Mata C, Miró E, Alvarado A, Garcillán-Barcia MP, Toleman M, Walsh TR et al (2012) Plasmid typing and genetic context of AmpC β-lactamases in Enterobacteriaceae lacking inducible chromosomal ampC genes: findings from a Spanish hospital 1999-2007. J Antimicrob Chemother 67(1):115–122. http://www.ncbi.nlm.nih.gov/pubmed/21980067
10. Coelho A, Piedra-Carrasco N, Bartolomé R, Quintero-Zarate JN, Larrosa N, Cornejo-Sánchez T et al (2012) Role of IncHI2 plasmids harbouring blaVIM-1, blaCTX-M-9, aac(6')Ib and qnrA genes in the spread of multiresistant Enterobacter cloacae and Klebsiella pneumoniae strains in different units at Hospital Vall d'Hebron, Barcelona, Spain. Int J Antimicrob Agents 39(6):514–517. http://www.ncbi.nlm.nih.gov/pubmed/22481058
11. Garcillán-Barcia MP, Ruiz del Castillo B, Alvarado A, de la Cruz F, Martínez-Martínez L (2015) Degenerate primer MOB typing of multiresistant clinical isolates of E. coli uncovers new plasmid backbones. Plasmid 77:17–27. http://www.ncbi.nlm.nih.gov/pu
12. Adelowo OO, Caucci S, Banjo OA, Nnanna OC, Awotipe EO, Peters FB et al (2018) Extended Spectrum Beta-lactamase (ESBL)-producing bacteria isolated from hospital wastewaters, rivers and aquaculture sources in Nigeria. Environ Sci Pollut Res 25(3):2744–2755. https://doi.org/10.1007/s11356-017-0686-7
13. Turner S, Pryer KM, Miao VPW, Palmer JD (1999) Investigating deep phylogenetic relationships among cyanobacteria and plastids by small subunit rRNA sequence analysis. J Eukaryot Microbiol 46(4):327–338. https://doi.org/10.1111/j.1550-7408.1999.tb04612.x
14. Lanza VF, de Toro M, Garcillán-Barcia MP, Mora A, Blanco J, Coque TM et al (2014) Plasmid flux in Escherichia coli ST131 sublineages, analyzed by plasmid constellation network (PLACNET), a new method for plasmid reconstruction from whole genome sequences. PLoS Genet 10(12):e1004766
15. Coque TM, Novais Â, Carattoli A, Poirel L, Pitout J, Peixe L et al (2008) Dissemination of clonally related Escherichia coli strains expressing extended-spectrum β-lactamase CTX-M-15. Emerg Infect Dis 14(2):195–200. http://wwwnc.cdc.gov/eid/article/14/2/07-0350_article.htm
16. Novais A, Pires J, Ferreira H, Costa L, Montenegro C, Vuotto C et al (2012) Characterization of globally spread Escherichia coli ST131 isolates (1991 to 2010). Antimicrob Agents Chemother 56(7):3973–3976. http://www.ncbi.nlm.nih.gov/pubmed/22491693
Chapter 11
Multiplex PCR Design for Scalable Resequencing
Darren Korbie and Matt Trau

Abstract
While conventional PCR applications typically focus on a single PCR assay per reaction, multiplex PCR applications are a convenient and scalable solution that is becoming more routine. Multiplex methods can be applied to virtually any DNA template source (e.g., plant or human DNA, FFPE DNA isolated from clinical samples, bisulfite-converted DNA for DNA methylation analysis) and offer a cheap, convenient, and scalable solution for experiments that require characterization and analysis of multiple genomic regions. This method details the procedures to successfully design, screen, and prepare multiplex amplicon libraries, as well as supporting instructions on how to prepare these libraries for sequencing on Illumina, Ion Torrent, and Oxford Nanopore platforms. The flexibility of assay design means that custom multiplex panels can range in size from two assays up to a few hundred amplicons or more. Notably, the method described here is also amenable to whatever PCR buffer system the user prefers, making the system adaptable to the needs and preferences of the end user.

Key words PCR, Multiplex PCR, Sequencing, Resequencing, NGS, DNA methylation, SNP screening, Bisulfite DNA
Chhandak Basu (ed.), PCR Primer Design, Methods in Molecular Biology, vol. 2392, https://doi.org/10.1007/978-1-0716-1799-1_11, © Springer Science+Business Media, LLC, part of Springer Nature 2022

1 Introduction

The polymerase chain reaction was first described in 1985 [1] as a method by which a region of DNA could be exponentially amplified 220,000-fold in a day. While the technique was rapidly adopted by molecular biology labs as a convenient and cheap way to characterize DNA, its initial application focused exclusively on the use of a single pair of DNA primers to amplify a single region of interest; if multiple different targets were under investigation, they would be run individually in separate reactions. This restriction to single-amplicon reactions was initially driven by a variety of practical limiting factors. For example, the cost of oligonucleotide synthesis for primer manufacture could be prohibitive for many labs, and effective solutions for PCR-specific problems such as the production of primer dimers and off-target amplification were not well established, both of which could render expensive DNA oligonucleotides unusable.

Since that time, advances in primer design, PCR formulations, and the assembly of whole genomes have all worked to increase the fidelity of PCR applications and reduce their overall cost. This price reduction now allows multiplex panels of dozens [2], hundreds [3], and even tens of thousands of amplicons [4] to be concurrently amplified in a single reaction, thereby offering a convenient, flexible, and scalable method for profiling DNA from any source. These multiplex reactions have now been utilized for screening DNA variants [5], analysis of bisulfite-converted DNA for DNA methylation studies [6], gene expression profiling [4], and quantitative PCR applications such as qPCR [7] and ddPCR [8].

These advances are reflected in the current scientific marketplace, as many molecular biology vendors now offer multiplex amplification solutions. At the time of writing, the AmpliSeq methodology developed by Thermo Fisher Scientific has achieved some of the densest reactions, with over 20,000 different primer pairs in a single pool. However, other molecular biology vendors have also developed similar methods to amplify large numbers of primers within a single tube. One shared feature of these different commercial solutions is that they each employ slightly different methods to maintain specificity and remove unwanted amplification artifacts, typically based on proprietary buffer and enzyme systems. While these unique formulations are effective, they can also rapidly inflate the per-reaction price to a point where it is not cost-effective, particularly for projects with small numbers of regions but a large number of samples to screen. This method will therefore take users through the steps to design and implement their own cheap, effective multiplex PCR panels, using a methodology that has been routinely used for multiple different applications and samples [2, 9–12].
The standard process for designing and implementing multiplex assays begins with an assessment, using PrimerROC [12] (http://www.primer-dimer.com/roc/), of the dimer score at which primer dimers occur. Once this is determined, individual PCR assays are designed using PrimerSuite [10] (http://www.primer-suite.com/) and ordered as individual oligonucleotides, followed by QC of the individual assays and of the multiplex pool. Once QC is finished, the final library pools are prepared using ligation-mediated PCR (Fig. 1), a second PCR step that uses the universal fusion sequence at the 5′ end of every primer to add platform-specific sequencing adaptors and sample barcodes, resulting in a sequencing-ready pool of samples (see Note 10 for a more detailed description of ligation and barcoding). The methodology described has been routinely used for multiple applications [2–5] and can be applied to small assay pools (for example, two to four primer pairs for multiplexing qPCR or droplet-digital PCR assays [7, 8]), but it scales up easily, and we have successfully validated and run larger pools of 50–100 primer pairs in a single reaction for multiple different applications. By running multiple large pools at the same time, users can easily scale up to screen hundreds of amplicons simultaneously and adapt the protocol to suit their sequencing platform of choice.

[Figure 1. Panel (a) labels: 1st Stage PCR, Forward gene-specific primer 1, Reverse gene-specific primer 1, Fusion Sequence CS1, Fusion Sequence CS2, 2nd Stage PCR (Barcoding PCR), Platform-specific primer 1, Platform-specific primer 2. Panel (b) labels: 1st Stage PCR, Forward gene-specific primer 1, Reverse gene-specific primer 1, End Repair, Ligation, Platform-specific adaptors.]

Fig. 1 Conceptual illustration of the difference between fusion primers (a) and the conventional PCR primer workflow (b). Fusion primers have an additional sequence of DNA appended to their 5′ end, which is shared by all primers in the reaction. The incorporation of the fusion sequence (referred to as the CS1 and CS2 sequences within this method) allows the sequencing adaptors and library barcodes to be incorporated using ligation-mediated PCR. This is in contrast to the standard PCR workflow (b), which requires separate enzymatic end-repair and ligation steps to prepare the sample for sequencing
2 Materials

This method describes how to construct libraries using PCR primers with universal consensus sequences appended to their 5′ end, followed by a ligation-mediated PCR step to add sequencing adaptors. However, multiplex resequencing can also be performed using non-fusion primers and enzymatic ligation; see Note 1 and Fig. 1 for a more detailed description of the key differences between the two.
2.1 PCR Primer Ordering

1. PCR primers can be ordered from whichever supplier the user prefers. Our lab has had success with Integrated DNA Technologies (IDT), but any vendor will suffice.
2. This method employs fusion primers and requires that the following sequences be appended to the 5′ ends of all forward and reverse primers. See Note 2 for more background information.
Critical note: All forward primers must have the CS1 sequence appended to their 5′ end, and all reverse primers must have the CS2 sequence appended to their 5′ end.
CS1 (14-bp forward fusion sequence): GACATGGTTCTACA
CS2 (14-bp reverse fusion sequence): CAGAGACTTGGTCT
For example, given two gene-specific primers where the forward primer is AAACC and the reverse primer is GGGTT, the oligos to order would be:
Forward: 5′-GACATGGTTCTACAAAACC-3′
Reverse: 5′-CAGAGACTTGGTCTGGGTT-3′
3. For most projects, ordering the smallest synthesis scale (25 nmole) provides more than enough primer, and it is recommended that users begin with the smallest scale for screening and optimization purposes.
4. If fewer than eight primer pairs are being ordered, then ordering in tube format is appropriate. However, if more than eight primer pairs are being ordered, it is convenient (and frequently cheaper) to have them delivered in plate format, as this expedites the resuspension and initial primer screen. See Note 3 if ordering in plate format.
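When many primer pairs are ordered, prepending the fusion sequences by hand is error-prone. The following Python sketch shows one way to build the oligos to order from gene-specific primers, using the CS1 and CS2 sequences given in step 2. The function name and the basic A/C/G/T validation are assumptions made for this illustration.

```python
# Illustrative sketch only: the helper name and validation are assumptions.

CS1 = "GACATGGTTCTACA"  # forward fusion sequence, as given in the text
CS2 = "CAGAGACTTGGTCT"  # reverse fusion sequence, as given in the text


def fusion_primers(forward, reverse):
    """Return (forward, reverse) oligos with CS1/CS2 appended to their 5' ends."""
    oligos = []
    for tag, primer in ((CS1, forward), (CS2, reverse)):
        seq = primer.replace(" ", "").upper()
        if not seq or set(seq) - set("ACGT"):
            raise ValueError(f"unexpected characters in primer: {primer!r}")
        oligos.append(tag + seq)
    return tuple(oligos)


if __name__ == "__main__":
    fwd, rev = fusion_primers("AAACC", "GGGTT")
    print(f"Forward: 5'-{fwd}-3'")
    print(f"Reverse: 5'-{rev}-3'")
```

Run on the worked example from the text (AAACC and GGGTT), this reproduces the two oligos listed in step 2.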
2.2 PCR Reagents

While users can employ whichever PCR buffer system they are comfortable with, a hot-start polymerase is critical for successful multiplex amplification. The following two PCR buffer systems have been previously validated for this method.

1. High-fidelity polymerase: Phusion Green Hot Start II High-Fidelity DNA Polymerase (2 U/μL), F537L, Thermo Fisher Scientific, USA. Suitable for standard DNA and variant/SNP screening applications.
2. Standard Taq polymerase: Promega GoTaq Flexi Hotstart, M5005, Promega, USA. Suitable for bisulfite DNA methylation analysis.
3. 10 mM dNTP solution mix: New England Biolabs (NEB), USA, cat # N0447L.
4. Exonuclease I: New England Biolabs (NEB), USA, cat # M0293L.
5. Library adaptors compatible with the CS1 and CS2 fusion sequences: Access Array Barcode Library for Illumina Sequencers—384, Single Direction, Fluidigm, USA: Product # 100-4876. If users wish to use Ion Torrent or Oxford Nanopore systems, see Notes 1 and 10.
6. Low Tris-EDTA (LTE) buffer: 10 mM Tris–HCl (pH 8.0), 0.1 mM EDTA.
7. AMPure XP beads: Beckman Coulter, USA: Cat# A63880. See Note 4.
8. Qubit HS Fluorometric DNA measurement kit: Q32851, Thermo Fisher Scientific, USA.
3 Methods
3.1 Primer Design and Resuspension
The method described here details how to construct libraries using PCR primers with universal consensus sequences appended to their 5′ end, followed by a ligation-mediated PCR step to add sequencing adaptors. However, an alternative method for multiplex resequencing can also be performed using non-fusion primers and enzymatic ligation; see Note 1 and Fig. 1 for a more detailed description of the key differences between the two (Fig. 2).
1. Prior to primer design, users should employ PrimerROC [6] (http://www.primer-dimer.com/roc/) to determine the dimer score threshold at which primer-dimer artifacts occur within their PCR buffer system.
2. Go to PrimerSuite (http://www.primer-suite.com/) [5] and paste or upload a FASTA file containing the regions of interest. PrimerSuite accommodates assay design for standard genomic
Fig. 2 Suggested layout for primer pairs when ordering in a plate format. For ease of use it is recommended that users order oligos arrayed so that forward and reverse primers are in alternating columns. This simplifies the process of combining the forward and reverse pairs when performing screening and pooling
DNA and bisulfite DNA methylation assays, and contains a variety of parameters that the user can tune to their design specifications. The default options PrimerSuite uses are generally a good starting point for most applications; however, if using a PCR buffer system different from those recommended in this method, the Minimum Dimer Score Cutoff should be changed to the value determined using PrimerROC in step 1. The current version of PrimerSuite is restricted to 15 DNA regions. If designing assays for more than 15 regions, multiple multiplex pools will need to be designed.
3. Once the primers have arrived, resuspend all oligos to the same concentration using low Tris-EDTA buffer (LTE). A standard concentration to resuspend to is 100 μM. The volume of LTE to add to achieve a final concentration of 100 μM is calculated as: total nmole of oligonucleotide × 10 = volume of LTE (in μl) to add for 100 μM final concentration. Example: 9.7 nmole of oligo × 10 = 97 μl of LTE for a final concentration of 100 μM. After adding LTE, it is recommended that the tubes be vortexed and then heated for 10 min at 37 °C to ensure the oligos are fully dissolved, as the multiplex reaction can be sensitive to differences in primer concentration.
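Fig. 3 Primer dilution schema for preparing the 50 μM and 2 μM working solutions. Equal volumes (50 μl each) of the 100 μM forward (FP1) and reverse (RP1) primer stocks are combined to give the 50 μM mixed stock; 2 μl of the mixed stock is then added to 48 μl of LTE to give the 2 μM screening stock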
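The resuspension arithmetic and the dilution schema of Fig. 3 both follow from the fact that 1 μM equals 1 pmol/μl, together with the dilution formula C1V1 = C2V2. A minimal sketch is given below; the function names are illustrative assumptions, while the numbers (9.7 nmole, 50 μl + 50 μl, 2 μl + 48 μl) are those used in the text.

```python
def lte_volume_ul(nmol, target_um=100.0):
    """Volume of LTE (in ul) that brings `nmol` of oligo to `target_um`.

    1 uM is 1 pmol/ul, so volume = nmol * 1000 / target concentration.
    At the default 100 uM this is the "nmole x 10" rule from the text.
    """
    return nmol * 1000.0 / target_um


def diluted_conc(stock_conc, stock_vol, diluent_vol):
    """Concentration after mixing stock with diluent (C1*V1 = C2*V2)."""
    return stock_conc * stock_vol / (stock_vol + diluent_vol)


# LTE volume for the 9.7 nmole example at 100 uM.
print(lte_volume_ul(9.7))

# Mixing 50 ul forward with 50 ul reverse: each primer is diluted by the other.
print(diluted_conc(100.0, 50.0, 50.0))

# 2 ul of the 50 uM primer-pair stock into 48 ul of LTE.
print(diluted_conc(50.0, 2.0, 48.0))
```

When oligos arrive with different yields, calling `lte_volume_ul` per well keeps every stock at the same concentration, which the method relies on for pooling.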
4. Make a 50 μM primer-pair stock by combining equal volumes of the 100 μM forward and reverse primers into a single tube. For example, combining 50 μl of forward-primer1 with 50 μl of reverse-primer1 gives a total of 100 μl, with each primer at 50 μM individually (Fig. 3).
5. Perform an additional dilution of the 50 μM primer-pair stock to 2 μM, which is required to perform the initial singleplex screen. Adding 2 μl of the 50 μM primer-pair stock to 48 μl of LTE is recommended (Fig. 3).
3.2 Singleplex Screening and Quality Control
After the primer pairs have been resuspended and working stocks have been made, the assays are first run individually in singleplex reactions for preliminary QC, as outlined in the following section. 1. Using the 2 μM primer pair stocks, perform a screening reaction using the PCR conditions below to check the amplification
Fig. 4 Representative gel image showing expected PCR products for the singleplex screen. (+) PCR with template; (−) no-template control. The primer pairs on the left and right can both be included in the final multiplex pool. The primer pair in the middle gives a strong dimer band in both the template and no-template controls, and should be excluded from the multiplex pool
fidelity of the primer pair, and to assess whether individual assays form primer dimers. Ensure that no-template controls are run for all assays, as they assist in determining the tendency of the assay to form primer dimers (Fig. 4). Note: the following reaction conditions will work for both Phusion Green (Thermo Fisher) and GoTaq Flexi Green (Promega). Alternatively, the PCR assays can be screened using your buffer system of choice.
Component                                 Volume         Final concentration   Notes
Water                                     Add to 25 μl
5× green buffer                           5 μl           1×                    Both Phusion and GoTaq are supplied at 5× concentration
10 mM dNTPs                               0.5 μl         200 μM each
2 μM forward/reverse primer pair mix      2.5 μl         200 nM each
Template DNA                              μl                                   5–50 ng input recommended. See Note 5
Hot start DNA polymerase                  0.25 μl        0.02 U/μl
25 mM MgCl2                               6 μl           6 mM                  See Note 6

PCR cycling conditions
95 °C 7 min
35 cycles of:
95 °C 20 s
56 °C 30 s
62 °C 30 s
72 °C 90 s
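The final concentrations listed in the singleplex reaction table can be cross-checked from the stock concentrations and volumes. The sketch below is a bookkeeping aid under stated assumptions: the dictionary layout and function names are illustrative, the 1 μl template volume in the usage line is hypothetical (the table leaves it open), and the polymerase is omitted from the concentration check because its stock activity differs between vendors.

```python
TOTAL_UL = 25.0

# component -> (stock concentration, volume in ul, unit of the stock)
MIX = {
    "5x green buffer": (5.0, 5.0, "x"),
    "dNTPs (each)": (10.0, 0.5, "mM"),
    "primer pair mix (each primer)": (2.0, 2.5, "uM"),
    "MgCl2": (25.0, 6.0, "mM"),
}


def final_conc(stock, vol, total=TOTAL_UL):
    """Final concentration of a component, in the same unit as its stock."""
    return stock * vol / total


def water_volume(template_ul, polymerase_ul=0.25):
    """Water needed to bring the reaction to TOTAL_UL."""
    used = sum(vol for _, vol, _ in MIX.values()) + template_ul + polymerase_ul
    return TOTAL_UL - used


for name, (stock, vol, unit) in MIX.items():
    print(f"{name}: {final_conc(stock, vol):g} {unit}")

# Hypothetical case: 1 ul of template DNA.
print(f"water: {water_volume(1.0):g} ul")
```

The same check applied to MgCl2 gives a millimolar, not micromolar, final concentration, in agreement with Note 6.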
The above cycling conditions are sufficient for the majority of multiplex reactions (i.e., amplicons less than 200 bp in size, and fewer than 20 amplicons in a single pool). For denser pools and/or larger amplicons, it may be beneficial to increase the extension time to ensure equal amplification of all products. Perform QC by running the PCR products out using DNA gel electrophoresis. Assays which give strong dimers with relatively little amplicon of the expected size should be excluded from future use. However, PCR assays which amplify well but produce a small amount of dimer can still be utilized in the multiplex reaction (Fig. 4).
3.3 Multiplex Screening and QC
1. Combine equal volumes of all primer pair assays which passed the singleplex QC to make the final multiplex pool.
2. Based on the pooling, work out the concentration of each individual primer within the pool. For example, if 20 μl from each of 14 different primer pairs were combined to create a multiplex pool, and the starting concentration of each primer pair stock was 50 μM, the final concentration of each individual primer would be 3.57 μM. This can be calculated in the following way: 14 primer pairs × 20 μl each (at 50 μM) = 280 μl total volume. Using the standard dilution formula C1V1 = C2V2: (50 μM)(20 μl) = C2 (280 μl). C2 = 3.57 μM.
3. Perform a PCR screen using the same conditions as outlined in the singleplex screening reaction above, with the following changes.
(a) The total volume of primers put into the PCR reaction needs to be adjusted so that the final concentration of each individual primer within the reaction is 200 nM. Using the example calculation detailed in step 2 above, this would be: C1V1 = C2V2. (0.2 μM final primer concentration)(25 μl final PCR reaction) = (3.57 μM initial primer concentration) V2.
Table 1 Recommended cycles for multiplex amplification

# of PCR assays in multiplex   # of PCR cycles to perform for QC   # of PCR cycles to perform for library preparation
2                              34                                  31
4                              33                                  30
8                              32                                  29
16                             31                                  28
32                             30                                  27
64                             29                                  26
V2 = 1.4 μl of multiplex primer pool to add to the 25 μl PCR reaction.
(b) The total number of PCR cycles needs to be reduced, since product formation now occurs at a faster rate, in proportion to the number of assays in the pool. To estimate the number of cycles to use, refer to Table 1.
4. Assess multiplex quality by running the PCR products out using DNA gel electrophoresis. A single strong band of the expected size should be observed, with minimal-to-no primer dimer, similar to the first panel in Fig. 4.
3.4 Multiplex Amplification of Target Samples and Library Preparation
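The pooling arithmetic of Sect. 3.3 (steps 2 and 3) and the cycle numbers of Table 1 can be combined into a small planning helper. The sketch below is illustrative only: the function names are hypothetical, and rounding a pool size up to the next row of Table 1 (which yields the lower cycle number) is an assumption made here, not a rule stated in the method.

```python
def pooled_primer_conc(n_pairs, stock_um=50.0):
    """Concentration of each primer after pooling equal volumes of n pair stocks."""
    return stock_um / n_pairs


def pool_volume_ul(pool_um, final_um=0.2, reaction_ul=25.0):
    """Pool volume giving `final_um` of each primer in the reaction (C1V1 = C2V2)."""
    return final_um * reaction_ul / pool_um


# Table 1 from the text: pool size -> (QC cycles, library-preparation cycles).
TABLE_1 = {2: (34, 31), 4: (33, 30), 8: (32, 29),
           16: (31, 28), 32: (30, 27), 64: (29, 26)}


def recommended_cycles(n_assays):
    """Look up Table 1, rounding up to the next tabulated pool size (assumption)."""
    for size in sorted(TABLE_1):
        if n_assays <= size:
            return TABLE_1[size]
    raise ValueError("pool size is beyond the range of Table 1")


# The 14-pair example used in the text.
conc = pooled_primer_conc(14)
print(f"each primer in the pool: {conc:.2f} uM")
print(f"pool volume per 25 ul reaction: {pool_volume_ul(conc):.1f} ul")
print("QC / library cycles:", recommended_cycles(14))
```

Because equal volumes are pooled, the per-primer concentration depends only on the number of pairs, not on the volume taken from each stock.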
1. Once the multiplex pool has been confirmed to work, the actual target samples can be amplified. PCR conditions remain identical, except that the total number of PCR cycles should be further reduced as outlined in Table 1. See Note 7.
2. Add 2 μl of Exonuclease I to each multiplex pool reaction to remove unincorporated primers. Incubate at 37 °C for 15 min.
3. Clean up the PCR reaction using AMPure XP beads, or a similar SPRI bead system, following the vendor's recommended protocol (see Note 4). Based on the size of the amplicons produced, the following ratio of beads to PCR reaction is recommended. Amplicons equal to or smaller than 150 bp in size: use 1.5 volumes of beads. Amplicons larger than 150 bp in size: use 1.2 volumes of beads.
4. Resuspend the sample in 20 μl of LTE and incubate at 37 °C for 5 min to fully elute the DNA from the beads.
5. Prepare the barcoding PCR reaction according to the following recipe. Note: Only the Phusion polymerase is used for this step to ensure high-quality sequencing data.
Component                                                  Volume         Final concentration   Notes
Water                                                      Add to 30 μl
5× Phusion Green                                           5 μl           1×
10 mM dNTPs                                                0.5 μl         200 μM each
2 μM forward/reverse Access Array barcoding primer         5 μl           333 nM each           See Notes 1 and 10 for sequencing on other platforms
Purified amplicons from step 4                             10 μl
Phusion hot start DNA polymerase                           0.25 μl        0.02 U/μl
25 mM MgCl2                                                7.2 μl         6 mM
6. Perform the barcoding PCR using the following cycling conditions.
Barcoding PCR
94 °C 5 min
9 cycles of:
94 °C 20 s
45 °C 30 s
72 °C 120 s
7. Perform QC by running the PCR products out using DNA gel electrophoresis. The expected size of the final library is roughly 150 basepairs larger than the original amplicon size, and if the process has been successful, a single strong band should be observed, although frequently a smaller dimer band around 150 bp can be seen. If too much or too little product is observed, the total PCR cycles may have to be adjusted up or down. If the predominant product is less than 200 bp in size, then primer-dimer artifacts have dominated the barcoding PCR reaction. In this case, refer to Note 8 for troubleshooting. 8. Once the final libraries are confirmed to have worked, the individual samples can be pooled together for sequencing. If the total amount of product as visualized by gel electrophoresis is roughly equivalent between all samples, then equal volumes of every sample can be combined together. If the samples vary in their total yield, see Note 9 for pooling recommendations. Once the libraries are pooled in this way, sequencing on the platform of choice can be performed. In determining the best sequencing configuration, users should consult with their local genomic service provider.
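The equal-mass pooling referred to in step 8 (and described in Note 9) reduces to a simple proportion once each sub-pool has been quantified. The sketch below assumes, as the note implies, that libraries within a sub-pool contribute roughly equally because they were grouped by similar yield; the function name, the Qubit readings and the library counts are all hypothetical.

```python
def pool_volumes_ul(pools, ng_per_library):
    """Volume to take from each sub-pool for an equal mass of every library.

    `pools` maps a sub-pool name to (concentration in ng/ul, number of
    libraries it contains). A sub-pool holding n libraries must supply
    n * ng_per_library in total, so volume = n * ng_per_library / conc.
    """
    volumes = {}
    for name, (conc, n_libraries) in pools.items():
        if conc <= 0 or n_libraries <= 0:
            raise ValueError(f"invalid entry for sub-pool {name!r}")
        volumes[name] = ng_per_library * n_libraries / conc
    return volumes


# Hypothetical fluorometric readings after SPRI cleanup.
pools = {
    "high-yield": (20.0, 30),  # 20 ng/ul, 30 libraries
    "low-yield": (4.0, 18),    # 4 ng/ul, 18 libraries
}

for name, vol in pool_volumes_ul(pools, ng_per_library=1.0).items():
    print(f"{name}: take {vol:.2f} ul")
```

The requested mass per library should be chosen so that no computed volume exceeds what is available after resuspension of the sub-pool.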
4 Notes
1. The choice of whether to use primers with 5′ fusion sequences versus non-fusion primers has several implications. The following considerations may assist users in determining which is best to use.
(a) For projects that have a large number of samples (i.e., 100 samples or more) with a small number of regions for analysis (i.e., fewer than 20 regions being interrogated), using fusion primers is recommended as this allows up to 384 samples to be batched together on a single sequencing run, which is the most cost-effective option.
(b) For projects with a large number of regions which require relatively dense multiplex reactions per pool (i.e., more than 50 primer pairs in a single reaction), non-fusion primers may be preferable, as the inclusion of the fusion sequence increases the likelihood that primer-dimer artifacts will occur [6].
If users wish to use a ligation method for their library construction, some additional components are required. A separate set of index adaptors that suit the sequencing platform of choice will need to be ordered directly from the platform vendor, e.g., Illumina, Ion Torrent, Oxford Nanopore, or others. Molecular biology reagents for end repair and ligation must also be purchased to perform the ligation. NEB's Ultra II End Prep and Ultra II Ligation Module is one system users can refer to.
2. The fusion sequences employed were originally developed by Fluidigm for their Access Array platform, and were originally 22 bases long. However, the sequences provided for the method described in this chapter are 14 bp in length. Both versions are shown below.
Fluidigm's original CS1: ACACTGACGACATGGTTCTACA
Truncated 14 bp CS1: GACATGGTTCTACA
Fluidigm's original CS2: TACGGTAGCAGAGACTTGGTCT
Truncated 14 bp CS2: CAGAGACTTGGTCT
The 14 bp truncated version is preferred since shorter fusion sequences generally result in less dimer formation during PCR [6]. As well, 14 bp was empirically determined to be the smallest length that could be used which still functions well using ligation-mediated PCR to incorporate the sequencing adaptors into the amplicon library.
3. When more than eight primer pairs are being ordered, it is cheaper and easier to order them in a 96-well plate format. If ordering in a plate format, users should attempt to order in such a way that the forward and reverse primers are laid out in alternating columns, as illustrated in Fig. 2. Doing so allows the user to rapidly combine the primer stocks into forward + reverse primer pair aliquots using an 8-channel multichannel pipette. When ordering in plate format, users should also choose the synthesis option wherein the total nmole in every well is standardized to the same quantity, as this greatly simplifies oligo resuspension and pooling.
4. Although Agencourt AMPure XP beads are listed in this method, any type of Solid Phase Reversible Immobilization (SPRI) bead-based DNA purification can be used. If using a different SPRI system than AMPure XP, some minor optimization of the ratio of beads to PCR reaction may be required.
5. The method is flexible in the amount of input template to use, having previously worked well with as little as 5 ng of input DNA. However, putting more template into the reaction (up to 50 ng) can be beneficial and can help to reduce primer-dimer formation. If possible, users can screen a variety of template input amounts to determine the best quantity to use.
6. This recommended workflow uses a 6 mM final concentration of Mg2+ in the PCR conditions. In general, we have found that the increased Mg2+ concentration does not increase off-target amplification, but does greatly enhance product formation. If users are having difficulty with off-target amplification, then a lower concentration of Mg2+ can be used.
7. The total number of PCR cycles is further reduced when amplifying actual target samples for two reasons. The first is to ensure that the reaction does not enter a non-exponential product formation phase, which can be problematic and occurs at higher cycle numbers.
Second, differences in efficiency between high- and low-performing assays are magnified if too many PCR cycles occur. The functional outcome of this is that read coverage in the final sequencing reaction can be dominated by just a few highly performing primer pairs, and the easiest way to control for this is to reduce the total number of PCR cycles. 8. The final product produced in the barcoding PCR reaction should be approximately 150 bp larger than the original amplicon size, and should be readily visible when run on a gel. However, if the dominant product is 200 bp or less, then the exonuclease I treatment and SPRI bead cleanup were not
effective in completely removing the residual unincorporated multiplex PCR primers. If this is the case, additional exonuclease I can be put into the reaction and the digest allowed to proceed for longer, and/or an additional SPRI bead cleanup may have to be performed.
9. Frequently some samples will produce more library than others due to variation in sample input and/or quality. In such cases, it is recommended to pool samples together based on their relative product yield first, then combine equal masses of each library into a final pool. For example, pool equal volumes of the high-yield samples into one tube, and the low-yield samples into a second tube. After pooling in this manner, each pool is then cleaned with SPRI beads and resuspended in 30 μl LTE. Next, measure the concentration of each pooled sample using a fluorometric method such as Qubit HS (High Sensitivity). Finally, based on the concentration of the samples, combine the two pools so that there is an equal mass of each library in the final pool. Performing the pooling this way ensures that low-yield samples are not titrated out in the final sequencing reaction by the high-yield samples, and helps to normalize read coverage.
10. The process of preparing amplicon pools for sequencing requires an additional step which adds the platform-specific DNA sequences that enable sequencing, as detailed in Fig. 1. For example, Illumina utilizes DNA sequences referred to as P5 and P7 which are specific to their platform; Ion Torrent utilizes the A and P1 sequences. Sequencing adaptors are therefore specific and unique to each platform. The sequencing adaptors also contain an extra segment of DNA referred to as the barcode.
The barcode is a short string of DNA bases, typically 8–10 bp in length, which is unique for every sample; in this way multiple samples can be combined together on a single sequencing run, and after the barcode is sequenced, the amplicons that relate to the original starting sample can be determined. Currently, Fluidigm only directly sells barcode adaptors to be used with the CS fusion sequences and Illumina sequencing platforms. However, the multiplex method can easily be adapted to other platforms. If the user wishes to use Oxford Nanopore to sequence, the CS1 and CS2 sequences should be replaced with the following sequences instead:
Forward primer 5′-TTTCTGTTGGTGCTGATATTGC-[your primer sequence]-3′
Reverse primer 5′-ACTTGCCTGTCGCTCTATCTTC-[your primer sequence]-3′
Users are also referred to Oxford Nanopore's Four-primer PCR workflow (Documents SQK-PSK004 or SQK-PBK004). If the user wishes to use Ion Torrent platforms to sequence, the CS1 and CS2 sequences can still be used, but additional HPLC-purified PCR barcode primers must be ordered, since (at the time of this protocol) no commercial set can be purchased. Note: Substitute the DNA barcode of choice into the A-barcode X-CS1 primer, and order sufficient barcode primers to ensure every sample of interest can be uniquely barcoded.
Order the following at 100 nmol synthesis scale, HPLC purified:
A-barcode X-CS1: CCATCTCATCCCTGCGTGTCTCCGACTCAG[barcode]GATACACTGACGACATGGTTCTACA
Order the following at 100 nmol synthesis scale, HPLC purified:
P1-CS2: CCTCTCTATGGGCAGTCGGTGATTACGGTAGCAGAGACTTGGTCT
For example, if the IonXpress 001 and 002 barcodes were selected, the following sequences would be used (barcode sequence shown in square brackets):
A-IonXpress001-CS1: 5′-CCATCTCATCCCTGCGTGTCTCCGACTCAG[CTAAGGTAAC]GATACACTGACGACATGGTTCTACA
A-IonXpress002-CS1: 5′-CCATCTCATCCCTGCGTGTCTCCGACTCAG[TAAGGAGAAC]GATACACTGACGACATGGTTCTACA

References
1. Saiki RK, Scharf S, Faloona F, Mullis KB, Horn GT, Erlich HA, Arnheim N (1985) Enzymatic amplification of beta-globin genomic sequences and restriction site analysis for diagnosis of sickle cell anemia. Science 230:1350–1354
2. Korbie D, Lin E, Wall D, Nair SS, Stirzaker C, Clark SJ, Trau M (2015) Multiplex bisulfite PCR resequencing of clinical FFPE DNA. Clin Epigenetics 7:28
3. Magor GW, Tallack MR, Klose NM, Taylor D, Korbie D, Mollee P, Trau M, Perkins AC (2016) Rapid molecular profiling of myeloproliferative neoplasms using targeted exon resequencing of 86 genes involved in JAK-STAT signaling and epigenetic regulation. J Mol Diagn 18:707–718
4. Li W, Turner A, Aggarwal P, Matter A, Storvick E, Arnett DK, Broeckel U (2015) Comprehensive evaluation of AmpliSeq transcriptome, a novel targeted whole transcriptome RNA sequencing methodology for global gene expression analysis. BMC Genomics 16:1–13
5. Schmid P, Pinder SE, Wheatley D, Macaskill J, Zammit C, Hu J, Price R, Bundred N, Hadad S, Shia A et al (2016) Phase II randomized preoperative window-of-opportunity study of the PI3K inhibitor pictilisib plus anastrozole compared with anastrozole alone in patients with estrogen receptor-positive breast cancer. J Clin Oncol 34:1987
6. Stirzaker C, Zotenko E, Song JZ, Qu W, Nair SS, Locke WJ, Stone A, Armstrong NJ, Robinson MD, Dobrovic A et al (2015) Methylome sequencing in triple-negative breast cancer reveals distinct methylation clusters with prognostic value. Nat Commun 6:1–11
7. Al-Tebrineh J, Pearson LA, Yasar SA, Neilan BA (2012) A multiplex qPCR targeting hepato- and neurotoxigenic cyanobacteria of global significance. Harmful Algae 15:19–25
8. Oscorbin I, Kechin A, Boyarskikh U, Filipenko M (2019) Multiplex ddPCR assay for screening copy number variations in BRCA1 gene. Breast Cancer Res Treat 178:545–555
9. Stone A, Zotenko E, Locke WJ, Korbie D, Millar EK, Pidsley R, Stirzaker C, Graham P, Trau M, Musgrove EA et al (2015) DNA methylation of oestrogen-regulated enhancers defines endocrine sensitivity in breast cancer. Nat Commun 6:1–9
10. Lu J, Johnston A, Berichon P, Ru K, Korbie D, Trau M (2017) PrimerSuite: a high-throughput web-based primer design program for multiplex bisulfite PCR. Sci Rep 7:1–12
11. Lam D, Luu P-L, Song JZ, Qu W, Risbridger GP, Lawrence MG, Lu J, Trau M, Korbie D, Clark SJ et al (2020) Comprehensive evaluation of targeted multiplex bisulphite PCR sequencing for validation of DNA methylation biomarker panels. Clin Epigenetics 12:1–16
12. Johnston AD, Lu J, Ru K, Korbie D, Trau M (2019) PrimerROC: accurate condition-independent dimer prediction using ROC analysis. Sci Rep 9:1–14
Part IV Primer Design for qPCR
Chapter 12
Identification of Gene Copy Number in the Transgenic Plants by Quantitative Polymerase Chain Reaction (qPCR)
Poonam Kanwar, Soma Ghosh, Sibaji K. Sanyal, and Girdhar K. Pandey
Abstract
Transgenic events are defined as the insertion of exogenous DNA into the genome through genetic transformation. Transformation is a powerful means of improving crop plants and of understanding gene function. Multiple DNA insertion events may occur at one or several chromosomal locations. One of the important tasks, after validating the transformation of transgenic plants, is the identification of single-copy transgenic lines, i.e., lines in which the exogenous DNA fragment is inserted at only a single locus in the genome. Southern blot hybridization is a convincing and reliable method for estimating copy number in transgenic lines, but it is a cumbersome and time-consuming process. Another well-known method is quantitative polymerase chain reaction (qPCR), a simple and rapid method to identify copy number from a population of independent transgenic lines. In comparison to Southern hybridization, qPCR is simpler to perform, requires less DNA and less time, and does not require any labeled probes. This method utilizes specific primers to amplify target transgenes and endogenous reference genes. Designing an appropriate and specific primer pair is a crucial part of estimating gene copy number. In this chapter, we illustrate a detailed methodology for identifying the gene copy number of transgenic plants.
Key words qPCR, SYBR Green, Primer, Gene copy number, Transgenic plant
1 Introduction
Transgenic plants are extensively used to understand gene function(s) and also for other biotechnological studies. Introduction of exogenous DNA into the plant genome leads to the generation of transgenic plants, and the number of copies of exogenous DNA inserted into the genome of the transgenic plant is defined as the transgene copy number. The effectiveness of any transgenic line depends on the copy number. Multiple transgene copies can lead to higher expression of the gene but can also result in transgene silencing. Therefore, transgene copy number (TCN) determination is usually an essential part of transgene studies and gene function analysis.
Poonam Kanwar and Soma Ghosh contributed equally. Chhandak Basu (ed.), PCR Primer Design, Methods in Molecular Biology, vol. 2392, https://doi.org/10.1007/978-1-0716-1799-1_12, © Springer Science+Business Media, LLC, part of Springer Nature 2022
The conventional method for TCN determination is Southern blot hybridization. Southern blot hybridization-based TCN determination is costly and time-consuming, and requires microgram quantities of high-quality DNA. A robust alternative to Southern blot analysis is quantitative PCR (qPCR), which allows high-throughput determination of TCN [1, 2]. qPCR is a simple modification of PCR that quantifies the amount of target DNA by introducing fluorescent or intercalating dyes to detect PCR product as it accumulates in real time during PCR cycles. It gives absolute or relative quantification of any DNA, including chromosomal DNA, mitochondrial DNA (mtDNA), chloroplast DNA (ctDNA/chDNA), or cDNA generated by reverse transcription of RNA. It has emerged as the method of choice for fast, affordable, and efficient estimation of copy number [3, 4]. Two methods are used for quantification of copy number via qPCR: the external standard curve-based method and the ΔCt method involving an internal reference gene [5]. In this chapter, we discuss the methodology for estimating transgene copy number based on the internal reference gene method. In this method, qPCR involves amplification of a test locus with unknown copy number and a reference locus with known copy number. Primer design is a crucial part of precise and accurate estimation of transgene copy number with qPCR. Here, the details of the methodology and statistical model are discussed for the estimation of transgene copy number based on a standard curve generated for the endogenous reference gene. Data quality control for a robust and reproducible transgene copy number estimate is discussed at the end of this chapter.
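To make the ΔCt idea concrete, the sketch below shows the generic relation between a Ct difference and a copy-number ratio. It is a textbook-style illustration under explicit assumptions, not the statistical model developed later in this chapter: both amplicons are assumed to amplify with the same efficiency (perfect doubling by default), the function name and its parameters are hypothetical, and the Ct values are invented for the example.

```python
def copy_number_delta_ct(ct_target, ct_reference,
                         reference_copies=1.0, efficiency=1.0):
    """Estimate target copies from the Ct difference to a reference locus.

    Assumes the target and reference amplicons share the same efficiency
    (1.0 means the product doubles every cycle). Each cycle by which the
    target crosses the threshold earlier than the reference corresponds
    to one further fold of (1 + efficiency) in starting copies.
    """
    fold_per_cycle = 1.0 + efficiency
    return reference_copies * fold_per_cycle ** (ct_reference - ct_target)


# Invented Ct values: same Ct as a single-copy reference, then one cycle earlier.
print(copy_number_delta_ct(ct_target=24.0, ct_reference=24.0))
print(copy_number_delta_ct(ct_target=23.0, ct_reference=24.0))
```

In practice the efficiencies of the two primer pairs are measured rather than assumed, which is why a dilution series is run for each pair.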
2 Materials
2.1 Plant Material
CIPK9-overexpressing transgenic plants developed in Arabidopsis thaliana Columbia-0 (Col-0). Wild-type (WT) Col-0 plants were used as control.
2.2 Material for Genomic DNA Isolations
Fifteen-day-old Arabidopsis seedlings grown in half-strength Murashige and Skoog (MS) medium.
2.3 Genomic DNA Isolation
DNA isolation buffer (5 mM Tris–HCl at pH 8.0, 100 mM NaCl, 50 mM EDTA, 10% SDS, 10 mM β-mercaptoethanol), Tris-saturated phenol, chloroform:isoamyl alcohol (24:1), isopropanol, 70% ethanol.
2.4 Genomic DNA PCR
10X Taq buffer, 2.5 mM MgCl2, 2.5 mM dNTPs, 0.2 pmol primers, Taq polymerase, double-distilled milli-Q water (ddMQ), genomic DNA and PeQSTAR Thermocycler.
2.5 Primer Design
Reference nucleotide sequence: Genomic DNA sequence of 4-hydroxyphenylpyruvate dioxygenase (4HPPD) (AT1G06570), and hygromycin phosphotransferase (hptII) from the pCAMBIA1300 vector.
Sequence analysis tools: The Arabidopsis Information Resource (TAIR); www.arabidopsis.org.
Primer design tools: Primer designing tool, National Center for Biotechnology Information (NCBI); www.ncbi.nlm.nih.gov.
2.6 Genomic DNA qPCR
Biological samples: Genomic DNA in different serial dilutions.
Chemicals: SYBR Green I master mix (Roche Molecular Biochemicals), Milli-Q water, primers for 4HPPD and hptII.
Instrumentation: Nanodrop (Thermo Fisher Scientific NANODROPC), qPCR-compatible 96-well plates (Agilent), Agilent adhesive plate seals, Agilent AriaMx qPCR cycler instrument, a centrifuge with a rotor adapted for microtiter plates (AOSHENG mini-plate centrifuge).
3 Methods
3.1 Plant Material
1. CIPK9-overexpressing Arabidopsis plants were generated using a binary vector derived from the pCAMBIA1300 vector, with expression under the control of the 2X CaMV 35S promoter and NOS terminator. A standard transformation protocol was used to transform Col-0 to generate the transgenic plants.
2. Three transgenic CIPK9-overexpressing lines, numbered #2, #5, and #9, were selected for the analysis of TCN in this study.
3.2 Isolation of Genomic DNA of Arabidopsis Seedlings
1. Seeds of CIPK9-overexpressing transgenic lines and Col-0 were plated on 1/2 MS medium containing 1% sucrose, vernalized for 2 days at 4 °C in the dark, and subsequently grown at 22 °C under a 16 h light/8 h dark cycle. Genomic DNA was extracted after 15 days from both transgenic and Col-0 seedlings.
2. Seedling tissue (0.01–0.1 g) was homogenized in a 1.5 ml micro-centrifuge tube. 500 μl of DNA isolation buffer was added to the homogenized tissue. This was then incubated at room temperature for 15 min, followed by spinning at 10,625 × g for 2 min at 4 °C, and then the supernatant was transferred to a new micro-centrifuge tube. Finally, 200 μl Tris-saturated phenol (Basic) was added to the supernatant, mixed well by vortexing for 1 min, and incubated at room temperature for 2 min.
3. After this, an equal volume of chloroform:isoamyl alcohol (24:1) was added and mixed very gently (to avoid shearing of the DNA) by inverting the tube until the phases were
Fig. 1 2 μg of gDNA was loaded on a 0.8% agarose gel to assess the quality of isolated gDNA of Col-0 and transgenic lines
completely mixed, and then centrifuged at 10,625 × g for 3 min at 4 °C. The upper aqueous phase was transferred to a new tube.
4. To this, 2 volumes of isopropanol were added and mixed gently until the DNA precipitated. The tube was incubated on ice for 15–20 min to aid the DNA precipitation. The tube was then centrifuged at 12,470 × g for 10 min at 4 °C. The DNA precipitate was washed two times with 70% ethanol.
5. The DNA pellet was dried at 37 °C for 10–20 min. It was then resuspended in Milli-Q water with RNase A (10 mg/ml), and the tube was kept at 37 °C for 1 h for digestion of RNA.
6. The concentration of DNA was determined by measuring the absorbance at 260 nm using a Nanodrop. These gDNA samples can be stored at 4 °C for the short term or at −20 or −80 °C for the long term. Figure 1 shows 2 μg of gDNA (overexpression lines of CIPK9 along with Col-0) loaded on a 0.8% agarose gel for determining the quality of the gDNA.
3.3 Genomic DNA PCR
1. For quantification of copy number from a transgene, we selected the hptII gene in the pCAMBIA-based binary vector. The hptII gene sequence was taken from the reference nucleotide sequence (NCBI), and primers were designed for normal PCR based on the sequence. Forward 5′-GCCTGAACTCACCGCGACG-3′ and reverse 5′-CTCATCGAGAGCCTGCGCG-3′ primers of hptII were used for genomic DNA PCR to detect the transgene in the putative transgenic plants, and Col-0 was used as a negative control. The amplified product was analyzed on a 0.8% agarose gel. Amplification of the ACTIN gene was used as a control for PCR (Fig. 2). The ACTIN gene PCR amplification was performed using the primer pair 5′-ATGGCTGAGGCTGATGATATT-3′ and 5′-TTAGAAACATTTTCTGTGAAC-3′ (see Note 1).
Fig. 2 Genomic PCR of the transgenic lines of CIPK9 along with Col-0, performed with primers for hptII. ACTIN PCR was performed for all the samples as a loading control
2. The following reaction mixture was used for the PCR reaction: 10× Taq buffer, 1.5 μl; 2.5 mM MgCl2, 2 μl; 2.5 mM dNTP, 1.5 μl; 0.2 pmol forward primer, 0.7 μl; 0.2 pmol reverse primer, 0.7 μl; Taq polymerase, 0.1 μl (0.5–1.0 U/50 μl of reaction mix); target DNA, 100 ng (genomic DNA); ddH2O to a final volume of 15 μl.
3. PCR reactions were carried out using a PeQSTAR Thermocycler according to the following protocol: initial denaturation at 94 °C for 4 min, 1 cycle; denaturation at 94 °C for 30 s, annealing at 58–65 °C for 30 s, and extension at 72 °C for 1 min, 27–40 cycles; final extension at 72 °C for 5 min, 1 cycle. Based on our results, CIPK9 transgenic lines #2 and #9 were selected for the detection of copy number, since line #5 did not yield an amplicon with the hptII primers.
3.4 Primer Design for TCN Analysis
1. For quantification of the copy number of a transgene, we again selected the hptII gene in the pCAMBIA1300 vector. 4HPPD was used as the reference gene (internal control) since it is a single-copy gene in the genome of Arabidopsis [6–8]. The full-length genomic DNA sequence of 4HPPD was searched, analyzed, and downloaded from the public online database TAIR (see Notes 2 and 3).
Poonam Kanwar et al.
Fig. 3 Screenshot of the NCBI primer designing tool. (a) Screenshot showing different options for manual modification of various important primer parameters to obtain an optimum combination. Various primer parameters such as Tm, GC content, primer length, and amplicon size can be modified according to the requirement of user. (b) Screenshot showing display of the primer search outcome in the NCBI primer designing tool
2. Primer designing tools of the NCBI were used, and the following important factors were considered for designing qPCR primers: GC content of 50–60%, amplicon of 75–100 bp is ideal since short PCR products are typically amplified with higher efficiency than longer ones. Also, the primer length should be 18–24 bp for qPCR analysis (Fig. 3). 3. It is important to assess the specificity of the primer. Therefore, the BLAST tool available with NCBI and TAIR was used to align the primers with a selected reference sequence and
another similar sequence. This step ensures that the respective primer(s) bind only to the desired region on the reference sequence and not to any other nonspecific sequence in the genome.
4. The 4HPPD gene primer pair used in this study is 5′-CGGCTCTTGTCGTTCCTTCT-3′ and 5′-TGGAGAAAGCTGACTCTGCG-3′. The hptII primer pair used in this study is 5′-CCTGACCTATTGCATCTCCCG-3′ and 5′-CCTCCGCGACCGGTTGTA-3′.
3.5 qPCR for TCN Analysis
1. DNA from each sample was diluted to the following concentrations: 100 ng/μl, 10 ng/μl, 1 ng/μl, 100 pg/μl, 10 pg/μl, 1 pg/μl, 0.1 pg/μl, and 0.01 pg/μl.
2. The total reaction volume for each sample is 10 μl and contains 1 μl of genomic DNA, 10 pmol of each primer, and 1 μl of SYBR Green I master mix.
3. A master mix containing all of the components was prepared and distributed in 96-well plates, and the plates were sealed. The sealed plates were then briefly centrifuged at 500 × g at room temperature to ensure that all liquid was at the bottom of the wells (see Note 4).
4. qPCR was performed using the following protocol: 95 °C for 10 min, followed by 35 cycles of 95 °C for 10 s and 55 °C for 30 s, with a fluorescence detection step at 55 °C (automated by the qPCR machine).
5. Genomic qPCR was performed with each primer pair (hptII and 4HPPD) on all the dilutions of the CIPK9 transgenic lines and Col-0 (see Notes 4 and 5).
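The tenfold dilution series in step 1 can be written down programmatically, which helps when the concentrations are later converted to logarithms for the standard curve. The following is a minimal sketch, assuming concentrations are expressed in pg/μl (a unit choice made here for illustration) and that each dilution is prepared as 1 part stock plus 9 parts diluent; the function names are hypothetical.

```python
import math

def tenfold_series(start_pg_per_ul, steps):
    """Return `steps` concentrations, each tenfold lower than the previous one."""
    return [start_pg_per_ul / 10 ** i for i in range(steps)]

def tenfold_volumes(final_volume_ul):
    """Volumes (stock, diluent) for one 1:10 dilution of a given final volume."""
    stock = final_volume_ul / 10
    return stock, final_volume_ul - stock

# 100 ng/ul equals 100,000 pg/ul; eight levels reach 0.01 pg/ul.
series = tenfold_series(100_000.0, 8)
for conc in series:
    print(f"{conc:>10g} pg/ul   log10 = {math.log10(conc):.1f}")

stock_ul, diluent_ul = tenfold_volumes(50.0)  # hypothetical 50 ul per dilution
print(f"per step: {stock_ul:g} ul stock + {diluent_ul:g} ul diluent")
```

Because 1 μl of each dilution goes into the reaction, the mass of template per reaction has the same numeric value as the concentration per microliter.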
3.6 Quantification of Copy Number
1. qPCR was done using either 4HPPD primers or hptII primers, with three biological repeats and three technical repeats, for two independent CIPK9 overexpressing Arabidopsis plants (Line #2 and Line #9).
2. There was no change in the Ct values at the lower dilutions (ranging from 10 to 0.01 pg/μl); therefore, we used the Ct values of four serial dilutions (ranging from 100 ng/μl to 100 pg/μl) to obtain the standard curves for each gene (for each line). The correlation coefficients of the standard curves were in the range between 0.96 and 0.99. The standard curves are shown in Fig. 4 (see Notes 6 and 7).
3. qPCR efficiency was determined using the slope and intercept values from the standard curve. In our case, we obtained an efficiency of more than 90% for each primer set. For calculation, we used the qPCR efficiency calculator from Thermo Scientific
[Fig. 4 panels a–d: "Standard Curve 4HPPD Line #2," "Standard Curve hptII Line #2," "Standard Curve 4HPPD Line #9," and "Standard Curve hptII Line #9"; x-axis: log DNA concentration per μl; y-axis: Ct. Fitted lines shown in the panels (first two for Line #2, last two for Line #9): y = −3.8484x + 35.585, R² = 0.9934; y = −3.0832x + 33.254, R² = 0.9643; y = −3.2463x + 33.115, R² = 0.9667; y = −3.2648x + 33.137, R² = 0.9959.]
Fig. 4 Standard curves generated through the Ct values of Line #2 and Line #9. (a) and (b) Standard curves of Ct value for 4HPPD and hptII for CIPK9 overexpressing transgenic Line #2. Different serial dilutions of genomic DNA (100 ng, 10 ng, 1 ng, and 100 pg). (c, d) Standard curves of Ct value for 4HPPD and hptII for CIPK9 overexpressing transgenic Line #9. Different serial dilutions of genomic DNA (100 ng, 10 ng, 1 ng, and 100 pg)
(www.thermofisher.com/in/en/home/brands/thermo-scientific/molecular-biology/molecular-biology-resources-library/thermo-scientific-web-tools/qPCR-efficiency-calculator.html).
4. For calculation of copy number, we followed the protocol from Weng et al. [9]. The slope and intercept values derived from the standard curve were used, together with the average Ct values for the four dilutions. All these values were substituted into the equation
$$\frac{X}{R} = 10^{\left(\frac{C_x - I_x}{S_x} - \frac{C_r - I_r}{S_r}\right)}$$
Cx and Cr are average Ct values of hptII and 4HPPD for a particular line, Ix and Ir are intercepts for hptII and 4HPPD, respectively, and Sx and Sr are slopes for hptII and 4HPPD, respectively. The value of X/R is doubled to obtain the copy number [9]. Following this formula, we found that Line #2 in our experiment has 1 copy of CIPK9 and Line #9 has 2 copies of CIPK9.
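The copy number calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' script: the function names are hypothetical, and the Ct, intercept, and slope values in the example are invented to show how the formula behaves. The doubling step follows the text; it is presumably needed because the single-copy reference gene is present on both homologous chromosomes.

```python
import math

def target_to_reference_ratio(c_x, i_x, s_x, c_r, i_r, s_r):
    """X/R = 10 ** ((Cx - Ix)/Sx - (Cr - Ir)/Sr), as in the equation above."""
    return 10 ** ((c_x - i_x) / s_x - (c_r - i_r) / s_r)

def transgene_copy_number(c_x, i_x, s_x, c_r, i_r, s_r):
    """Copy number is twice the target-to-reference ratio."""
    return 2 * target_to_reference_ratio(c_x, i_x, s_x, c_r, i_r, s_r)

# Hypothetical inputs: identical standard curves with a perfect-doubling slope.
PERFECT_SLOPE = -1 / math.log10(2)  # about -3.32

# Target and reference reach threshold at the same cycle.
same_ct = transgene_copy_number(25.0, 33.0, PERFECT_SLOPE, 25.0, 33.0, PERFECT_SLOPE)

# Target reaches threshold one cycle later than the reference.
one_cycle_later = transgene_copy_number(26.0, 33.0, PERFECT_SLOPE, 25.0, 33.0, PERFECT_SLOPE)

print(round(same_ct, 2), round(one_cycle_later, 2))
```

With these invented inputs, equal Ct values give two copies, and a one-cycle delay of the target halves the ratio and gives one copy. Real estimates are rarely whole numbers and are usually rounded to the nearest integer.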
4
Notes
1. DNA quality is an important factor for TCN analysis. It should therefore be checked using a conventional PCR reaction with primers for any housekeeping gene to ascertain the yield.
2. Designing primers is a crucial part of the TCN analysis. It may be challenging to design qPCR primers when the target DNA sequence is not unique within the genome. The reference gene should be present as only a single copy in the genome. We used the 4HPPD gene in our analysis.
3. Since CIPK9 is also an endogenous gene, primers for CIPK9 itself would not have allowed us to identify the variation in copy number. We therefore used primers for hptII (present in the binary vector) to quantify the copy number in the CIPK9 overexpressing transgenic lines.
4. qPCR requires biological and technical triplicates for each run. It is also important to seal the PCR plate tightly, because evaporation in a well may result in the loss of signal in that well. It is also necessary to check visually whether evaporation occurred in any of the wells, to rule it out as the cause of failed amplification. Other possible reasons for failed PCR are nonoptimal PCR conditions due to primers or reaction components.
5. Contamination while executing the qPCR reactions might result in a positive signal in the negative control. In this case, remaking all critical components for the qPCR reaction will
help in resolving the problem. The specificity of the assay can be assessed by melting curve analysis (which needs to be performed in each run). Nonspecific amplification can be detected by the presence of melting temperature peaks either above or below the specific one.
6. The slope of the standard curve should be in the range of −3.0 to −3.4. Also, the regression coefficient should be in the range of 0.95–0.99. The PCR efficiency should be between 90% and 110%. These factors should be carefully calculated before copy number calculation. If they are not in range, careful optimization is to be performed.
7. We used four dilutions. Using higher-order dilutions (diluting the DNA further) will flatten the curve and, as a result, will affect the properties of the standard curve.
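The quality checks in Note 6 can be reproduced from raw Ct values with an ordinary least-squares fit. The sketch below is a hedged illustration rather than the calculator used in this chapter: it assumes the commonly used relation E = 10^(−1/slope) − 1 between slope and amplification efficiency, and the Ct values are hypothetical.

```python
def fit_standard_curve(log_conc, ct):
    """Least-squares fit of Ct against log10(concentration).

    Returns (slope, intercept, r_squared).
    """
    n = len(log_conc)
    mean_x = sum(log_conc) / n
    mean_y = sum(ct) / n
    sxx = sum((x - mean_x) ** 2 for x in log_conc)
    syy = sum((y - mean_y) ** 2 for y in ct)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_conc, ct))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

def amplification_efficiency(slope):
    """Efficiency as a fraction; 1.0 means perfect doubling in every cycle."""
    return 10 ** (-1.0 / slope) - 1.0

# Hypothetical Ct values for four tenfold dilutions (not data from this chapter).
log_conc = [2.0, 3.0, 4.0, 5.0]
ct = [26.4, 23.1, 19.7, 16.4]

slope, intercept, r_squared = fit_standard_curve(log_conc, ct)
efficiency = amplification_efficiency(slope)
print(f"slope={slope:.2f} intercept={intercept:.2f} "
      f"R2={r_squared:.4f} efficiency={efficiency:.1%}")
```

Under this relation, an efficiency window of 90–110% corresponds to slopes of roughly −3.6 to −3.1, so a slope check and an efficiency check are two views of the same quantity.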
5
Conclusion 1. We have described here a quantitative real-time PCR assay in Arabidopsis for estimation of exogenous hptII copy numbers by comparison with the endogenous reference gene coding for 4HPPD. 2. Using this method is a definite advantage over the standard Southern blot analysis. This can be done by comparing the Ct signals of the target gene and an endogenous control gene. 3. This method is easier to handle, highly reliable, and useful in the identification of copy numbers in transgenic lines.
Acknowledgement Research work in G.K.P.’s lab is partially supported by grants from the University of Delhi (IoE/FRP grant), Department of Biotechnology (DBT), Department of Science and Technology (DST), and Council of Scientific and Industrial Research (CSIR), India. S.G. acknowledges CSIR for Junior Research Fellowship.
References
1. Bubner B, Baldwin IT (2004) Use of real-time PCR for determining copy number and zygosity in transgenic plants. Plant Cell Rep 23:263–271
2. Song P, Cai CQ, Skokut M et al (2002) Quantitative real-time PCR as a screening tool for estimating transgene copy number in WHISKERS™-derived transgenic maize. Plant Cell Rep 20:948–954. https://doi.org/10.1007/s00299-001-0432-x
3. Mason G, Provero P, Vaira AM, Accotto GP (2002) Estimating the number of integrations in transformed plants by quantitative real-time PCR. BMC Biotechnol 2:20. https://doi.org/10.1186/1472-6750-2-20
4. Schmidt MA, Parrott WA (2001) Quantitative detection of transgenes in soybean [Glycine max (L.) Merrill] and peanut (Arachis hypogaea L.) by real-time polymerase chain reaction. Plant Cell Rep 20:422–428. https://doi.org/10.1007/s002990100326
5. Yuan JS, Burris J, Stewart NR et al (2007) Statistical tools for transgene copy number estimation based on real-time PCR. BMC Bioinformatics. https://doi.org/10.1186/1471-2105-8-S7-S6
6. Garcia I, Rodgers M, Pepin R et al (1999) Characterization and subcellular compartmentation of recombinant 4-hydroxyphenylpyruvate dioxygenase from arabidopsis in transgenic tobacco. Plant Physiol 119:1507–1516. https://doi.org/10.1104/pp.119.4.1507
7. Honda M, Muramoto Y, Kuzuguchi T et al (2002) Determination of gene copy number and genotype of transgenic Arabidopsis thaliana by competitive PCR. J Exp Bot 53:1515–1520. https://doi.org/10.1093/jxb/53.373.1515
8. Kihara T, Zhao CR, Kobayashi Y et al (2006) Simple identification of transgenic Arabidopsis plants carrying a single copy of the integrated gene. Biosci Biotechnol Biochem 70:1780–1783. https://doi.org/10.1271/bbb.50687
9. Weng H, Pan A, Yang L et al (2004) Estimating number of transgene copies in transgenic rapeseed by real-time PCR assay with HMG I/Y as an endogenous reference gene. Plant Mol Biol Report 22:289–300. https://doi.org/10.1007/BF02773139
Chapter 13
qPrimerDB: A Powerful and User-Friendly Database for qPCR Primer Design
Wei Chang, Yue Niu, Mengna Yu, Tian Li, Jiana Li, and Kun Lu
Abstract Real-time quantitative polymerase chain reaction (qPCR) is a powerful tool for analyzing and quantifying gene expression, and primer design is its first and most important step. To improve the efficiency and effectiveness of primer design, we built qPrimerDB, a database of gene-specific qPCR primers for multiple species, designed and validated with a thermodynamics-based workflow. In this chapter, we explain the working principle of the database and describe its use step by step with examples. The qPrimerDB database is publicly accessible at http://biodb.swu.edu.cn/qprimerdb and will be routinely updated.
Key words qPrimerDB, Quantitative real-time polymerase chain reaction, Free online database, Primer design, High-level efficiency
1
Introduction Real-time quantitative polymerase chain reaction (qPCR) has become one of the most powerful tools for molecular genetic studies and, since 1993, has been widely used in clinical and biological fields to quantify gene expression, including that of weakly expressed genes [1, 2]. A fluorophore is added to a classical PCR reaction, and the fluorescence signal is monitored throughout the reaction. The initial concentration of the target gene is then quantified through the Ct (cycle threshold) value, which has a linear relationship with the logarithm of the initial copy number of the template. Two methods are currently the most popular for detecting the fluorescence signal in qPCR. One is nonspecific detection, independent of the target sequence, using fluorescent dyes such as SYBR Green; the other uses sequence-specific fluorescent oligonucleotide probes such as TaqMan probes or molecular beacons [3]. With either method, however, the suitability of the designed primers and probes is one of the most essential factors for successful qPCR, since the specificity of the qPCR is closely related
Chhandak Basu (ed.), PCR Primer Design, Methods in Molecular Biology, vol. 2392, https://doi.org/10.1007/978-1-0716-1799-1_13, © Springer Science+Business Media, LLC, part of Springer Nature 2022
to the annealing of primers to their complementary targets [4], and there are many genes with similar sequences, especially in polyploid species. Moreover, previous studies have found that transcription levels vary greatly among different gene regions in several eukaryotes [5–7]. To ensure accurate and reliable qPCR results, primers need to be designed strictly and efficiently; thus, various computational methods, programs, and databases have been developed. Several local or Web-based programs based on Primer3 [8], an important program for primer design, have been developed for batch primer design, including BatchPrimer3 [9], QuantPrime [10], PCRTiler [11], PRIMEGENS [12], and PrimerMapper [13]. The primers designed by these tools lack sufficient specificity because they are evaluated only by the Nucleotide Basic Local Alignment Search Tool (BLASTN) or sequence similarity searching. To address this challenge, programs with thermodynamics-based specificity checking were published, such as MFEprimer-2.0 [14] and the MapReduce-based method MRPrimer [15]. As a result, many qPCR primer databases were created, such as PrimerBank [16], MRPrimerW [17], and GETPrime 2.0 [18]. However, they contain only a few important species. In our study, we provide researchers with qPrimerDB, the most comprehensive, uniform qPCR primer design database to date. We developed it through an automatic gene-specific qPCR primer design and thermodynamics-based validation workflow. Furthermore, the qPrimerDB database provides precomputed primer pairs spanning 147 important organisms, covering as many sequenced genomes as possible, such as human, mouse, yeast, rice, and zebrafish. Our database contains 3,331,426 of the best primer pairs for each gene, based on primer pair coverage, as well as 47,760,359 alternative gene-specific primer pairs. A user-friendly interface allows convenient batch download and rapid retrieval of qPCR primers for selected genes. The primers provided by this database for 66 randomly selected genes showed specific and accurate results in qPCR assays and gel electrophoresis. Next, we introduce the use and functions of this website in detail.
2
Materials
2.1 Computer and Websites
Please prepare a computer with Internet access for this procedure. The main website used in this procedure is qPrimerDB (http://biodb.swu.edu.cn/qprimerdb); other websites that may be needed as a supplement include NCBI (http://www.ncbi.nlm.nih.gov/).
2.2 DNA Sequences and Gene ID
DNA sequences and gene IDs representing the target gene(s) of interest can come from your own collection or be downloaded from NCBI or from a site dedicated to the target species. Sequences should be in FASTA format.
3
Methods The qPrimerDB website is a database for qPCR primers. The main page of the website comprises eight functional sections (Fig. 1). On the “Home” page, users can view the site’s basic functions and the latest news; for instance, the qPrimerDB website has been updated to V1.2 since 2018, adding 304 new organisms for a total of 516 organisms (see Note 1). In the next section of the navigation bar, “Browse,” all organisms are classified into four groups, “Favorites,” “Animals,” “Plants,” and “Others,” to help users find target organisms more quickly. In the “Tools” section, we embedded the BLAST program. The “Download” section helps users download the target primer sequences in batches (see Note 2). By selecting “Documents,” users can find the manual, pipelines for primer design and database implementation, statistics for each organism, and related resources (see Note 3). In the “Help” section, the main purpose is to help users solve
Fig. 1 Browse interface on the “Home” page. A search box is provided in the upper right corner of every page to enable convenient searching of keywords of interest. Functional sections and information summary are picked out in the red box which provides different applications
problems encountered while using the website and to let us improve the website to make it more user-friendly (see Note 4). A “search” tool is available in the upper right corner of each page; users enter gene names, primer IDs, or keywords of interest to retrieve the target primer sequences. Links to common biological databases are available by clicking “BioDBP” (see Note 5). In this protocol, we use the Arabidopsis thaliana gene AT1G72390 as a demonstration to explain step by step how to obtain qPCR primers and, when appropriate, show optimal parameters.
3.1 Accessing Website
Use your computer or laptop to access the qPrimerDB website via the Internet: http://biodb.swu.edu.cn/qprimerdb.
3.2 Primer Search
There are two ways to find the target qPCR primers: entering a gene ID to search for the primers of interest (Fig. 2), or entering a nucleotide sequence manually (or uploading it in FASTA format) to search by BLAST (Fig. 4). The choice depends on the user’s resources and format.
3.2.1 Search by Gene ID
A search box is provided in the upper right corner of every page to enable convenient searching for keywords of interest. Option 1: click “organisms,” select “Arabidopsis thaliana,” and enter AT1G72390 into the input box next to “organisms” (see Notes 6 and 7); then click the green button next to the input box and wait for the results (Fig. 2a). When the page loads, users get the best qPCR primer(s) for AT1G72390 provided by the website, Fprimer: GGCTGAAGATTTTCTCTTAGCG and Rprimer: ACTGTTGCATATCGTTTGCAG (Fig. 2b, see Note 8). Option 2: browse to the organism of interest from the “home” page by sequentially clicking “Plants,” “Eudicotyledons,” and “Arabidopsis thaliana” (Fig. 2c), enter AT1G72390 into the input box in the upper right corner of the page (Fig. 2d), and wait for the page to load the same result (Fig. 2e). Clicking the blue button under primerID in Fig. 2b, or primerID A.thaliana.005900v1 in Fig. 2e, presents detailed information for the primers (Fig. 3a). If the user has other requirements for the primers, clicking the blue button under GeneID in Fig. 2b, or GeneID AT1G72390 in Fig. 2e, lists all primers in table format (Fig. 3b).
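Once a primer pair has been retrieved, a quick local check of its basic properties can catch copy-and-paste errors before ordering. The sketch below is an illustration only: it takes the two AT1G72390 primer sequences quoted above (with the line-break spaces removed), reports length and GC content, and compares the length with the 18–28 nt window suggested in Note 10. It does not compute melting temperature, because qPrimerDB reports thermodynamics-based values that a simple rule of thumb would not reproduce. The function names are hypothetical.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

def length_in_range(seq, shortest=18, longest=28):
    """True if the primer length lies within the suggested window (nt)."""
    return shortest <= len(seq) <= longest

primers = {
    "Fprimer": "GGCTGAAGATTTTCTCTTAGCG",
    "Rprimer": "ACTGTTGCATATCGTTTGCAG",
}

for name, seq in primers.items():
    print(f"{name}: {len(seq)} nt, GC {gc_percent(seq):.1f}%, "
          f"length ok: {length_in_range(seq)}")
```

Both primers fall inside the suggested length window (22 and 21 nt), with GC contents of about 45% and 43%.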
3.2.2 Search by Sequence BLAST
When users have the nucleotide sequence but are not sure about the gene ID, click the “Tool” button in the navigation bar and select BLAST (Fig. 4a). Click the “1. Database” menu and select “Arabidopsis thaliana.” In terms of parameter setting, the E-value is a widely
Fig. 2 Screenshots of the navigation bar and browse module in qPrimerDB. (a) Use the search box to find the primers. For example, input target gene “AT1G72390” in search box and choose “organisms,” “Eudicotyledons,” and “Arabidopsis thaliana.” (b) Best qPCR primer(s) provided by the website after clicking the blue button in (a). (c) Users can browse the organism of interest, for example, by sequentially clicking “Plants,” “Eudicotyledons,” and “Arabidopsis thaliana.” (d) Example of A. thaliana primers in table format. Both the record number per page and the order of each column can be adjusted, as needed. Users can input target gene ID in search box for a more precise lookup. (e) After precise lookup, qPrimerDB will provide the user with the best primer(s)
Fig. 3 qPrimerDB primer details page. (a) Detailed information for primer ID A.thaliana.005900v1 is presented in three sections: Gene Description (gene ID, organism, gene description, and a blue button “All primers for AT1G72390”); Primer Pair Description (primer pair ID and level, amplicon location, amplicon size, amplicon GC content, number of exons spanned, PPC); and Primer Pair Sequences (Tm values of primers, primer sequences, primer length and amplicon and template coding/mRNA sequences). (b) Example of all primers for AT1G72390 in table format. After clicking the blue button in (a), all primers will be listed in table format
accepted measure for assessing potential biological relationships, and E-values below 0.01 normally suggest homologous sequences, so the default parameters can be used if you have no other requirements. The last step is to enter the sequence manually, or upload the nucleotide sequence in FASTA format, and click the “submit” button to search by BLAST (Fig. 4b).
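Because the BLAST form expects FASTA input, it can help to validate a sequence locally before pasting or uploading it. The following is a minimal sketch of a FASTA parser and nucleotide check; it is independent of qPrimerDB, the record name and sequence are hypothetical, and the accepted alphabet (A, C, G, T, U, N) is an assumption made for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping record name to sequence."""
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("header line without a record name")
            name = fields[0]
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[name].append(line.upper())
    return {key: "".join(chunks) for key, chunks in records.items()}

def is_nucleotide(seq, alphabet=frozenset("ACGTUN")):
    """True if the sequence is non-empty and uses only the allowed letters."""
    return bool(seq) and set(seq) <= alphabet

example = """>query1 hypothetical example record
ACGTacgtTTGA
NNACGT
"""
parsed = parse_fasta(example)
for record, seq in parsed.items():
    print(record, len(seq), "valid" if is_nucleotide(seq) else "invalid")
```

A record that fails the check, for example a protein sequence pasted by mistake, is better fixed before submission than diagnosed from an empty BLAST result.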
Fig. 4 Screenshots of search primers by sequence BLAST. (a) BLAST function in navigation bar “Tool” button. (b) BLAST page requires users to select the Database (Animals, Other_eukaryotes, Plants), set the parameters (Program, Expect, Word size and Output Format), and submit the target sequence (or load local sequence file in FASTA format) of the primers to be designed as well, then clicking submit. (c) Detailed information for Blast results
After the AT1G72390 sequence is submitted, the website displays the BLAST results, and users can click the blue “get primers” button at the bottom of the page (Fig. 4c); the page then jumps to the view shown in Fig. 3 (see Note 9).
3.3 Checking Primers
Detailed information for primerID AT1G72390.1 0-299 is presented in three sections: Gene Description (gene ID, organism, gene description); Primer Pair Description (primer pair ID and level, amplicon location, amplicon size, amplicon GC content, number of exons spanned, primer pair coverage); and Primer Pair Sequences (primer sequences, primer length, Tm values of primers, and amplicon sequence) (Fig. 3a, see Note 10). Users with other requirements for qPCR primer design can click the blue button “All primers for AT1G72390” to get more primers of interest (Fig. 3b, see Note 11).
4
Notes
1. The qPrimerDB database will be updated regularly based on user feedback, and nonspecific primers will be removed.
2. In qPrimerDB, all qPCR primers and the best primers designed for each organism are compressed into two separate zip files. If the Internet connection is unstable, users can download the two primer files from the “Downloads” page and then pick gene-specific primers on a personal computer at any time.
3. The primer pairs for each gene are divided into three levels based on PPC and the binding stability of the binding site; the details can be found under “Pipeline” in “Documents.”
4. On the Help page, users can browse frequently asked questions (FAQs) and answers. The Help menu also has a Contact Us tab, where users can submit comments or suggestions about the database.
5. The BioDB Platform is a collaborative project developed by scientists from the fields of Molecular Biology, Genomics, Microbiology, Bioinformatics, and Computer Science.
6. Please send us a feedback form if qPrimerDB does not cover the organisms of your interest, and we will add the gene-specific qPCR primer pairs for you.
7. qRT-PCR primers for some genes may not be included in our database for the time being; please design your own primers according to Note 9. You are also welcome to email us to improve the quality of our database.
8. The website also provides support for multiple genome versions of some important organisms, such as cotton; please choose the version you need.
9. For different annotation files of the same species that use different gene names, users can submit a FASTA file to the BLAST function to convert gene IDs.
10. The suggested parameters for qRT-PCR primer design for all template fragments are: amplicon size 80–300 bp; amplicon GC content 40–60%, with an optimal GC content of 50%; primer length 18–28 nt, with an optimal length of 22 nt; and melting temperature (Tm) 58–64 °C, with an optimal Tm of 60 °C and a maximum Tm difference, per primer pair, of less than 3 °C.
11. Users are invited to submit their qPCR detection results (from melting curve analysis and/or gel electrophoresis) to the database designer, especially the specificity of the experimentally examined primer pairs. Such information will be used for further improving qPrimerDB.
References
1. Higuchi R, Fockler C, Dollinger G et al (1993) Kinetic PCR analysis: real-time monitoring of DNA amplification reactions. Nat Biotechnol 11:1026–1030
2. Lu K, Li T, He J, Chang W et al (2018) qPrimerDB: a thermodynamics-based gene-specific qPCR primer database for 147 organisms. Nucleic Acids Res 46(D1):D1229–D1236
3. Rodríguez-Lázaro D, Hernández M (2013) Real time PCR in food science: introduction. Curr Issues Mol Biol 15:25–38
4. Rosadas C, Cabral-Castro MJ, Vicente AC et al (2013) Validation of a quantitative real-time PCR assay for HTLV-1 proviral load in peripheral blood mononuclear cells. J Virol Methods 193:536–541
5. Arhondakis S, Clay O, Bernardi G (2008) GC level and expression of human coding sequences. Biochem Biophys Res Commun 367:542–545
6. Sémon M, Mouchiroud D, Duret L (2005) Relationship between gene expression and GC-content in mammals: statistical significance and biological relevance. Hum Mol Genet 14:421–427
7. Rao YS, Chai XW, Wang ZF, Nie QH, Zhang XQ (2013) Impact of GC content on gene expression pattern in chicken. Genet Sel Evol 45:9
8. Untergasser A, Cutcutache I, Koressaar T, Ye J, Faircloth BC, Remm M, Rozen SG (2012) Primer3 – new capabilities and interfaces. Nucleic Acids Res 40:e115
9. You FM, Huo N, Gu Y, Luo M, Ma Y, Hane D, Lazo GR, Dvorak J, Anderson OD (2008) BatchPrimer3: a high throughput web application for PCR and sequencing primer design. BMC Bioinformatics 9:253
10. Arvidsson S, Kwasniewski M, Riano-Pachon DM, Mueller-Roeber B (2008) QuantPrime – a flexible tool for reliable high-throughput primer design for quantitative PCR. BMC Bioinformatics 9:465
11. Gervais AL, Marques M, Gaudreau L (2010) PCRTiler: automated design of tiled and specific PCR primer pairs. Nucleic Acids Res 38:W308–W312
12. Kushwaha G, Srivastava GP, Xu D (2015) PRIMEGENSw3: a web-based tool for high-throughput primer and probe design. Methods Mol Biol 1275:181–199
13. O’Halloran DM (2016) PrimerMapper: high throughput primer design and graphical assembly for PCR and SNP detection. Sci Rep 6:20631
14. Qu W, Zhou Y, Zhang Y, Lu Y, Wang X, Zhao D, Yang Y, Zhang C (2012) MFEprimer-2.0: a fast thermodynamics-based program for checking PCR primer specificity. Nucleic Acids Res 40:205–208
15. Kim H, Kang NN, Chon KW, Kim S, Lee NH, Koo JH, Kim MS (2015) MRPrimer: a MapReduce-based method for the thorough design of valid and ranked primers for PCR. Nucleic Acids Res 43:1–10
16. Wang X, Spandidos A, Wang H, Seed B (2012) PrimerBank: a PCR primer database for quantitative gene expression analysis, 2012 update. Nucleic Acids Res 40:D1144–D1149
17. Kim H, Kang N, An K, Koo J, Kim MS (2016) MRPrimerW: a tool for rapid design of valid high-quality primers for multiple target qPCR experiments. Nucleic Acids Res 44:W259–W266
18. David FPA, Rougemont J, Deplancke B (2017) GETPrime 2.0: gene- and transcript-specific qPCR primers for 13 species including polymorphisms. Nucleic Acids Res 45:D56–D60
Part V Primer Design for Identification of Plant and Animal Viruses
Chapter 14
PCR Primer Design for the Rapidly Evolving SARS-CoV-2 Genome
Wubin Qu, Jiangyu Li, Haoyang Cai, and Dongsheng Zhao
Abstract Real-time quantitative PCR is currently the most widely used method for identifying the human pathogen severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Because the SARS-CoV-2 genome evolves rapidly, novel mutations in the primer binding sites can cause PCR to fail. Therefore, in addition to starting from a well-designed primer set, the primers need to be updated and evaluated regularly to ensure that the rapidly evolving genome can still be amplified. In this protocol, we (1) use assembled genome sequences in the SARS-CoV-2 database to identify and characterize indels and point mutations; (2) design primers that avoid the mutation sites; (3) check the coverage of the primers against the daily updated SARS-CoV-2 database; and (4) redesign the primers if novel mutations are found in the primer binding sites. Although this protocol takes SARS-CoV-2 as an example, it is suitable for other species whose genomes accumulate mutations over time.
Key words SARS-CoV-2, Primer Design, Variants
1
Introduction SARS-CoV-2 is an RNA virus with a limited proofreading capability for correcting replication errors [1], and it evolves continuously through new mutations [2, 3]. If these mutations are located in the binding regions of PCR primers, PCR-based methods [5] will fail [4]. Generally, multiple sequence alignment methods [6] are used to find conserved regions, and primers are then designed within them. This approach is effective in most cases. However, SARS-CoV-2 is a rapidly evolving virus that accumulates mutations continually, which makes multiple sequence alignment very challenging. As of December 13, 2020, there were 30,645 complete SARS-CoV-2 genome sequences in the NCBI Virus database [7], which is updated daily. With the advance of next-generation sequencing (NGS) technology, in addition to obtaining the full genome sequence of SARS-CoV-2, there are many free or open-source bioinformatics software tools
Chhandak Basu (ed.), PCR Primer Design, Methods in Molecular Biology, vol. 2392, https://doi.org/10.1007/978-1-0716-1799-1_14, © Springer Science+Business Media, LLC, part of Springer Nature 2022
available to identify genome variants [8, 9]. For example, MicroGMT [3] is a Python-based package that takes either raw sequence reads or assembled genome sequences as input and compares them against database sequences to identify and characterize small indels and point mutations in microbial genomes. With the help of MicroGMT, we can use one of the sequences as a reference genome and compare the other sequences with it. The positions where they disagree are point mutations or indels. The regions of the reference genome without point mutations or indels are then suitable for primer design. In the following sections, we (1) use assembled genome sequences in the SARS-CoV-2 database to identify and characterize indels and point mutations; (2) design primers that avoid the mutation sites; (3) check the coverage of the primers against the daily updated SARS-CoV-2 database; and (4) redesign the primers if novel mutations are found in the primer binding sites. Although this protocol takes SARS-CoV-2 as an example, it is suitable for other species whose genomes accumulate mutations over time.
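The idea of designing primers only in regions of the reference genome that carry no known variants can be illustrated with a small interval calculation. The sketch below is not part of MicroGMT or Primer3; it assumes the variant positions have already been extracted as 1-based reference coordinates (with every position spanned by an indel listed), and the function name, genome length, and positions are hypothetical.

```python
def mutation_free_regions(genome_length, variant_positions, min_length):
    """Return 1-based inclusive intervals with no variants, at least min_length long."""
    regions = []
    start = 1  # first position of the current variant-free stretch
    for pos in sorted(set(variant_positions)):
        if pos < 1 or pos > genome_length:
            continue  # ignore coordinates outside the reference
        if pos - start >= min_length:
            regions.append((start, pos - 1))
        start = pos + 1
    if genome_length - start + 1 >= min_length:
        regions.append((start, genome_length))
    return regions

# Hypothetical toy genome of 100 bases with three variant positions.
candidate_regions = mutation_free_regions(100, [10, 50, 52], min_length=20)
print(candidate_regions)
```

The minimum length would in practice be chosen to fit the intended amplicon plus both primers, and the resulting intervals can then be handed to a primer design program as the allowed target regions.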
2
Materials This protocol uses (1) MicroGMT [3] to identify point mutations and indels; (2) Primer3 [10, 11] for primer design; and (3) the local command-line version of MFEprimer-3.0 [12] for primer evaluation. The operating system is Linux (see Note 1), with a minimum disk size of 40 GB and a minimum memory of 8 GB. Create a directory named “SARS-CoV-2” as the working directory for this protocol, with two subdirectories, “bin” and “data.” Python 3 (https://www.python.org/) is also required.
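The working directory layout described above can be created, and the free disk space checked, with a few lines of Python. This is a minimal sketch using only the standard library; the function names are hypothetical, and the 40 GB threshold is taken from the text.

```python
import shutil
from pathlib import Path

def create_workspace(root):
    """Create the working directory with its 'bin' and 'data' subdirectories."""
    root = Path(root)
    for sub in ("bin", "data"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root

def free_disk_gb(path):
    """Free space, in gigabytes, on the file system that holds `path`."""
    return shutil.disk_usage(path).free / 1e9

def has_enough_disk(path, required_gb=40):
    """True if at least `required_gb` gigabytes are free at `path`."""
    return free_disk_gb(path) >= required_gb
```

For example, calling `create_workspace("SARS-CoV-2")` in the current directory and then `has_enough_disk("SARS-CoV-2")` reproduces the setup step and reports whether the disk requirement is met. Memory is not checked here because the standard library offers no portable call for it.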
2.1 Prepare SARS-CoV-2, Human, and Influenza Virus Genome Databases
1. Download and prepare the SARS-CoV-2 genome database in FASTA format (see Note 2). Here, we download the sequences from the National Center for Biotechnology Information (NCBI) Virus database [7]. Visit the website “https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/”, click “Search by virus,” enter “Severe acute respiratory syndrome coronavirus 2,” and select the option “complete” for “Nucleotide Completeness” in the left panel. Then, click the “Download” button, select “Nucleotide” for step 1, “Download All Records” for step 2, and “Use default” for step 3. Rename the downloaded file to a name like “SARS-CoV-2-20200927-17111.fasta,” which indicates that it contains 17,111 sequences and was downloaded on September 27, 2020. Put the file “SARS-CoV-2-20200927-17111.fasta” into the “data” directory.
2. Download the reference genome of SARS-CoV-2 in FASTA format from https://www.ncbi.nlm.nih.gov/nuccore/
PCR Primer Design for SARS-CoV-2
NC_045512.2. Also place this file in the “data” directory; the commands below refer to it as “data/NC_045512.fa”.
3. Download the human genome database (hg19) from UCSC [13] at http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz and unzip it into the “data” directory.
4. Download the influenza virus database [14] from https://ftp.ncbi.nih.gov/genomes/INFLUENZA/; the commands below refer to it as “data/influenza.fna”.
2.2 Install MicroGMT, Primer3, MFEprimer-3.0, and Other Software
1. Download the source code of MicroGMT from https://github.com/qunfengdong/MicroGMT/archive/master.zip, put the file into the “bin” directory, and unzip it.
2. MicroGMT requires the minimap2 [9] software for sequence alignment. Change the working directory to “bin” and run the command “curl -L https://github.com/lh3/minimap2/releases/download/v2.17/minimap2-2.17_x64-linux.tar.bz2 | tar -jxvf - ./minimap2-2.17_x64-linux/minimap2”; minimap2 will then be downloaded and installed into the “bin/minimap2-2.17_x64-linux/” directory.
3. MicroGMT also requires bcftools [15] for calling variants. Download the source code from http://www.htslib.org/download/ and install it by following the instructions on the same page.
4. Install the latest version of MFEprimer-3.0 from https://www.mfeprimer.com/mfeprimer-3.1/#2-command-line-version.
5. Install the latest version of Primer3 from https://sourceforge.net/projects/primer3/.
6. Install the “seqkit” [16] command for sequence manipulation from https://github.com/shenwei356/seqkit/releases.
7. Install the “rush” command for running parallel jobs from https://github.com/shenwei356/rush.
8. Install the “bedtools” [17, 18] command for manipulating BED files from https://bedtools.readthedocs.io/en/latest/content/installation.html.
9. Install the “blat” [19] command for sequence alignment from http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/blat/.
10. Finally, add the paths of minimap2 and bcftools to $PATH. If errors arise during installation, Bioconda [20] (see Note 3) can be used to install these bioinformatics tools easily.
3 Methods
Change into the working directory “SARS-CoV-2” and create a subdirectory named “variants” in it to store the mutation files. The directory tree should look like Fig. 1.
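The directory layout described so far can be sketched in Python as follows. This is a minimal illustration that assumes only the directory names given in the text (“SARS-CoV-2,” “bin,” “data,” and “variants”); the function name create_layout is hypothetical, and the demonstration runs in a temporary directory so that nothing is written to the real working location.

```python
from pathlib import Path
import tempfile


def create_layout(base: Path) -> Path:
    """Create SARS-CoV-2/ with the bin, data, and variants subdirectories."""
    work = base / "SARS-CoV-2"
    for sub in ("bin", "data", "variants"):
        (work / sub).mkdir(parents=True, exist_ok=True)
    return work


# Demonstration in a throwaway directory; pass Path.cwd() to create it for real.
with tempfile.TemporaryDirectory() as tmp:
    demo = create_layout(Path(tmp))
    print(sorted(p.name for p in demo.iterdir()))
```

The “design” directory used in Subheading 3.2 is created later in the protocol and is therefore left out here.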
3.1 Identify the Mutations Compared to the Reference Genome
1. In the working directory, run the command “python3 bin/MicroGMT-master/sequence_to_vcf.py -r data/NC_045512.fa -i assembly -fs data/SARS-CoV-2-20200927-17111.fasta -o variants”. This command generates 17,111 files with the suffix “.vcf”. Each VCF (see Note 4) file [21] contains the variants of one record compared to the reference genome. Figure 2 shows one of the VCF files. Taking the line in the red box as an example, the record “MW035376.1” has base “T” at position 241, while the reference genome NC_045512 has base “C” at the same position, indicating a point mutation at this position. Primers should not be selected from this region; at the very least, the 3′ end of the primers should avoid it.
Fig. 1 Working directory structure
Fig. 2 VCF file example
2. Merge all VCF files into one with the command “find variants -name "*.vcf" | rush 'grep -v "#" {} >> all.variants.vcf'”. The error messages can be ignored; they mean that no variants were found for a certain record.
3. Convert the VCF format to BED format with the command “awk '!/#/' all.variants.vcf | awk '{if(length($4) > length($5)) print $1"\t"($2-1)"\t"($2+length($4)-1); else print $1"\t"($2-1)"\t"($2+length($5)-1)}' > all.variants.bed”.
4. Sort and merge the variant file in BED format with the command “cat all.variants.bed | bedtools sort | bedtools merge > all.variants.merged.bed”.
3.2 Design Primers
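Primer design starts from the merged variant intervals produced in steps 3 and 4 of Subheading 3.1. To make the coordinate arithmetic of those steps explicit, the following Python sketch mirrors the awk conversion (1-based VCF position to 0-based, half-open BED interval, using the longer of REF and ALT) and the sort-and-merge step. It is an illustration only, not a replacement for bedtools; the function names are hypothetical, and the example lines are assumptions, with only the C-to-T change at position 241 taken from the text.

```python
def vcf_line_to_bed(line):
    """Convert one tab-separated VCF data line to a (chrom, start, end) BED interval."""
    chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
    span = max(len(ref), len(alt))  # longer of REF and ALT, as in the awk command
    start = int(pos) - 1            # VCF is 1-based, BED is 0-based
    return chrom, start, start + span


def merge_intervals(intervals):
    """Sort intervals and merge those that overlap or are directly adjacent."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]


# Example lines: the first reflects the point mutation described in step 1
# of Subheading 3.1; the other two are made up for illustration.
lines = [
    "NC_045512\t241\t.\tC\tT",
    "NC_045512\t242\t.\tGTT\tG",
    "NC_045512\t3037\t.\tC\tT",
]
print(merge_intervals(vcf_line_to_bed(l) for l in lines))
```

In this example, the point mutation at position 241 and the hypothetical deletion starting at position 242 are adjacent, so they collapse into a single interval to be masked.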
1. Create a “design” directory in the working directory “SARS-CoV-2.”
2. Mask the reference genome sequence with the variants file: bedtools maskfasta -soft -fi data/NC_045512.fa -bed all.variants.merged.bed -fo data/NC_045512.softmask.fa.
3. Merge the multiple sequence lines into one line: awk '/^>/ {printf("\n%s\n",$0);next;} {printf("%s",$0);} END {printf("\n");}' < data/NC_045512.softmask.fa > data/NC_045512.softmask.one-line.fa.txt.
4. Change into the “design” directory and copy the Primer3 example input file into the current directory: cp ../bin/primer3-2.4.0/example input.txt.
5. Edit the “input.txt” file: add “PRIMER_LOWERCASE_MASKING=1” (see Note 5) and “PRIMER_THERMODYNAMIC_PARAMETERS_PATH=../bin/primer3-2.4.0/src/primer3_config/”, and set “PRIMER_NUM_RETURN=100” and “PRIMER_PRODUCT_SIZE_RANGE=150-200” (see Note 6). Also replace the sample sequence for the tag “SEQUENCE_TEMPLATE=” with the sequence in the file “../data/NC_045512.softmask.one-line.fa.txt”. Leave the other parameters at their default values.
6. Run Primer3 to design primers: “../bin/primer3-2.4.0/src/primer3_core < input.txt > output.txt”.
7. Manually review the primer file “output.txt” and select the primer pairs whose 3′ ends (see Note 7) contain no lowercase bases. In our test of this protocol, the 3′ ends of primer pair 9 (red box in Fig. 3) have no lowercase letters. The sequences of primer pair 9 are: forward GAAGTggGTttTGTCGTGCC, reverse TCaGCAGCCAAAACACAAGC.
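The edits to “input.txt” in step 5 can also be sketched programmatically. The following Python snippet assembles a Primer3 input record in Boulder-IO format (one TAG=VALUE pair per line, with a lone “=” ending the record) containing only the tags named in step 5. It is an illustration under assumptions: the function name and the SEQUENCE_ID value are hypothetical, and the example file shipped with Primer3 contains further tags that the protocol leaves unchanged.

```python
def primer3_record(template, config_path="../bin/primer3-2.4.0/src/primer3_config/"):
    """Return a Boulder-IO record with the settings listed in step 5."""
    tags = [
        ("SEQUENCE_ID", "NC_045512_softmask"),  # hypothetical label
        ("SEQUENCE_TEMPLATE", template),        # soft-masked, single-line sequence
        ("PRIMER_LOWERCASE_MASKING", "1"),
        ("PRIMER_THERMODYNAMIC_PARAMETERS_PATH", config_path),
        ("PRIMER_NUM_RETURN", "100"),
        ("PRIMER_PRODUCT_SIZE_RANGE", "150-200"),
    ]
    lines = [f"{tag}={value}" for tag, value in tags]
    lines.append("=")  # a lone "=" terminates the record
    return "\n".join(lines) + "\n"


# A short made-up template; in the protocol this is the sequence line from
# ../data/NC_045512.softmask.one-line.fa.txt.
print(primer3_record("ACGTACGTacgtACGTACGT"))
```

Writing the returned text to “input.txt” would give a file that can be passed to primer3_core as in step 6, assuming the thermodynamic parameter path matches the local installation.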
Fig. 3 Primer3 output
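The manual review in step 7 can be supported by a small check that flags primers with soft-masked (lowercase) bases near the 3′ end. The sketch below is illustrative: the function name is hypothetical, and the default window of five bases follows the recommendation in Note 7. The two sequences are those of primer pair 9 as given in step 7.

```python
def three_prime_is_clean(primer, n=5):
    """Return True if the last n bases (the 3' end) contain no lowercase base."""
    tail = primer[-n:]
    return not any(base.islower() for base in tail)


forward = "GAAGTggGTttTGTCGTGCC"
reverse = "TCaGCAGCCAAAACACAAGC"

for name, seq in (("forward", forward), ("reverse", reverse)):
    print(name, three_prime_is_clean(seq))
```

Both primers pass with the five-base window. Widening the window (for example, n=10 for the forward primer) would start to include the masked positions further upstream, which illustrates why the check is framed around the 3′ end rather than the whole primer.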
Fig. 4 Primers in FASTA format
3.3 Evaluate Primers
We use MFEprimer to analyze whether this primer pair can amplify all the genome sequences in the SARS-CoV-2 database and whether it can also amplify the human genome or influenza viruses; the latter databases are called background databases (see Note 8).
1. Prepare the primers in FASTA format as shown in Fig. 4.
2. Index the SARS-CoV-2 database: mfeprimer index -i data/SARS-CoV-2-20200927-17111.fasta.
3. Index the human genome database: mfeprimer index -i data/hg19.fa.
4. Index the influenza virus database: mfeprimer index -i data/influenza.fna.
5. Run MFEprimer to check the coverage of this primer pair on the SARS-CoV-2 database: mfeprimer -d data/SARS-CoV-2-20200927-17111.fasta -i p.fa -S 300 -t 55 --virus -o p.mfe.txt. Figure 5 shows that this primer pair covers 99.87% of the sequences in the database SARS-CoV-2-20200927-17111.fasta and misses 23 sequences. The IDs of the missed sequences are automatically stored in the file “p.mfe.txt.virus.Failed.txt”.
6. Check why this primer pair failed on the missed records. First, copy and paste the first amplicon sequence in FASTA format from the file “p.mfe.txt” into a file named “amp.fa” (Fig. 6). Second, run the command “cat data/SARS-CoV-2-20200927-17111.fasta | seqkit grep -n -f p.mfe.txt.virus.Failed.txt > p.mfe.txt.virus.Failed.fa” to extract the missed sequences and save them as the file “p.mfe.txt.virus.Failed.fa.” Third, run the command “blat p.mfe.txt.virus.Failed.fa amp.fa amp.fa.blat.txt -out=blast” to perform the sequence alignment. Figure 7 shows that the missed sequences have N bases in the primer binding regions (shown in the red circle).
7. Run MFEprimer to check the specificity of the primers against the human and influenza virus databases with the command: mfeprimer -d data/hg19.fa -d data/influenza.fna -i p.fa -S 300 -t 55 -o p-against-human-influenza.mfe.txt. The result (Fig. 8) shows that no nonspecific amplicons are found in these two databases.
Fig. 5 Primer coverage analysis against SARS-CoV-2 database downloaded on September 27, 2020
Fig. 6 First amplicon sequence in FASTA format
Fig. 7 Sequence alignment result shows that the missed sequences have N bases in primer binding regions shown in the red circle
3.4 Re-evaluate Primers When the Database Grows
To simulate the evolution of the SARS-CoV-2 genome, we used the database downloaded on November 11, 2020, which contains 26,456 complete SARS-CoV-2 genome sequences. If a new mutation occurs at a primer binding site, the primer will not be able to amplify the variant sequences, so the coverage rate will decrease. Index the new database as in step 2 of Subheading 3.3, and then run the command “mfeprimer -d data/SARS-CoV-2-20201111-26456.fa -S 300 -t 55 -i p.fa -o p.SARS-CoV-2-20201111.mfe.txt --virus”. Figure 9 shows that the number of sequences has risen to 26,456 and that the coverage rate is still above 99%. However, 105 records are now missed. Repeat step 6 in Subheading 3.3 to check the reason for the failures. If they are caused not by N bases but by new mutations, the primers may need to be redesigned. The redesign process starts with identifying the mutations (Subheading 3.1) in the newly added sequences and ends with this step.
Fig. 8 Primer specificity analysis against human and influenza virus database
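The coverage figures quoted above follow directly from the numbers of total and missed sequences, and the decision rule for redesign (N bases versus new mutations) can be expressed as a simple comparison. The Python sketch below is an illustration under assumptions: the function names are hypothetical, failure_reason expects the primer and the aligned binding-site sequence on the same strand and of equal length, and the example binding sites are made up.

```python
def coverage_percent(total, missed):
    """Percentage of database sequences that the primer pair is predicted to amplify."""
    return 100.0 * (total - missed) / total


def failure_reason(primer, site):
    """Classify a binding site as 'match', 'N bases', or 'mutation'."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    primer, site = primer.upper(), site.upper()
    if "N" in site:
        return "N bases"   # sequencing gap; does not by itself call for a redesign
    if primer != site:
        return "mutation"  # candidate reason to redesign the primer
    return "match"


print(round(coverage_percent(17111, 23), 2))   # database of September 27, 2020
print(round(coverage_percent(26456, 105), 2))  # database of November 11, 2020

forward = "GAAGTGGGTTTTGTCGTGCC"
print(failure_reason(forward, "GAAGTGGGTTTTGTNNNNCC"))  # made-up site with N bases
print(failure_reason(forward, "GAAGTGGGTTTTGTCGTGCT"))  # made-up site with a 3' change
```

With the numbers reported in Figs. 5 and 9, this gives about 99.87% and 99.60%, consistent with the statement that coverage stays above 99% although the number of missed records grows.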
Fig. 9 Primer coverage analysis against SARS-CoV-2 database downloaded on November 11, 2020
4 Notes
1. Linux is an open-source Unix-like operating system. Distributions such as CentOS and Ubuntu are popular in the bioinformatics field.
2. FASTA format is a text-based format for representing nucleotide sequences, in which bases are represented by single-letter codes. A sequence in FASTA format begins with a single-line description whose first character is “>”, followed by lines of sequence data. This format can be easily manipulated with programming languages such as Python (https://www.python.org/) and Go (https://golang.org/).
3. Bioconda (https://bioconda.github.io/) is a channel for the conda (https://conda.io/en/latest/index.html) package manager specializing in bioinformatics software. For example, bcftools can be installed from Bioconda simply by running the command: conda install -c bioconda bcftools (https://anaconda.org/bioconda/bcftools).
4. The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions, and structural variants, together with rich annotations.
5. The Primer3 option PRIMER_LOWERCASE_MASKING rejects candidate primers that have a lowercase letter exactly at the 3′ end.
6. SARS-CoV-2 is an RNA virus, and its RNA is prone to degradation, so the amplified product should not exceed 200 bp.
7. A lowercase base marks a position where a mutation was identified. Mutations in the primer binding sites can cause PCR to fail, and a mutation at the 3′ end is much worse than one at the 5′ end. Therefore, try to choose primers that have no lowercase bases. If that is not possible, choose primers that have no lowercase bases in at least the last five bases.
8. The background database refers to DNA sequences other than the target DNA. For example, if we design primers for SARS-CoV-2, the human genomic DNA in the sample is the background database.
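As a concrete illustration of Note 2, the following Python sketch parses FASTA text into (header, sequence) pairs and joins wrapped sequence lines, which is essentially what the awk one-liner in step 3 of Subheading 3.2 does. The function name is hypothetical, the example records are made up, and no external library is assumed.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []  # drop the leading ">"
        else:
            chunks.append(line)            # sequence lines are concatenated
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = ">seq1 made-up record\nACGTAC\ngtacGT\n>seq2\nNNNN\n"
for header, seq in read_fasta(example):
    print(header, len(seq), seq)
```

Because the case of each base is preserved, soft-masked (lowercase) positions survive the parsing and remain visible to Primer3.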
Acknowledgments
This work was supported by the Research Foundation of iGeneTech [2019SX001, 2020SX001].
References
1. Denison MR, Graham RL, Donaldson EF, Eckerle LD, Baric RS (2011) Coronaviruses: an RNA proofreading machine regulates replication fidelity and diversity. RNA Biol 8(2):270–279. https://doi.org/10.4161/rna.8.2.15013
2. Islam MR, Hoque MN, Rahman MS, Alam A, Akther M, Puspo JA, Akter S, Sultana M, Crandall KA, Hossain MA (2020) Genome-wide analysis of SARS-CoV-2 virus strains circulating worldwide implicates heterogeneity. Sci Rep 10(1):14004. https://doi.org/10.1038/s41598-020-70812-6
3. Xing Y, Li X, Gao X, Dong Q (2020) MicroGMT: a mutation tracker for SARS-CoV-2 and other microbial genome sequences. Front Microbiol 11:1502. https://doi.org/10.3389/fmicb.2020.01502
4. Qu W, Zhang C (2015) Selecting specific PCR primers with MFEprimer. Methods Mol Biol 1275:201–213. https://doi.org/10.1007/978-1-4939-2365-6_15
5. Jalandra R, Yadav AK, Verma D, Dalal N, Sharma M, Singh R, Kumar A, Solanki PR (2020) Strategies and perspectives to develop SARS-CoV-2 detection methods and diagnostics. Biomed Pharmacother 129:110446. https://doi.org/10.1016/j.biopha.2020.110446
6. Nagy A, Jirinec T, Cernikova L, Jirincova H, Havlickova M (2015) Large-scale nucleotide sequence alignment and sequence variability assessment to identify the evolutionarily highly conserved regions for universal screening PCR assay design: an example of influenza A virus. Methods Mol Biol 1275:57–72. https://doi.org/10.1007/978-1-4939-2365-6_4
7. Hatcher EL, Zhdanov SA, Bao Y, Blinkova O, Nawrocki EP, Ostapchuck Y, Schaffer AA, Brister JR (2017) Virus variation resource: improved response to emergent viral outbreaks. Nucleic Acids Res 45(D1):D482–D490. https://doi.org/10.1093/nar/gkw1065
8. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25(14):1754–1760. https://doi.org/10.1093/bioinformatics/btp324
9. Li H (2018) Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34(18):3094–3100. https://doi.org/10.1093/bioinformatics/bty191
10. Untergasser A, Cutcutache I, Koressaar T, Ye J, Faircloth BC, Remm M, Rozen SG (2012) Primer3: new capabilities and interfaces. Nucleic Acids Res 40(15):e115. https://doi.org/10.1093/nar/gks596
11. Untergasser A, Nijveen H, Rao X, Bisseling T, Geurts R, Leunissen JAM (2007) Primer3Plus, an enhanced web interface to Primer3. Nucleic Acids Res 35:W71–W74. https://doi.org/10.1093/nar/gkm306
12. Wang K, Li H, Xu Y, Shao Q, Yi J, Wang R, Cai W, Hang X, Zhang C, Cai H, Qu W (2019) MFEprimer-3.0: quality control for PCR primers. Nucleic Acids Res 47(W1):W610–W613. https://doi.org/10.1093/nar/gkz351
13. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D (2002) The human genome browser at UCSC. Genome Res 12(6):996–1006. https://doi.org/10.1101/gr.229102
14. Bao Y, Bolotov P, Dernovoy D, Kiryutin B, Zaslavsky L, Tatusova T, Ostell J, Lipman D (2008) The influenza virus resource at the National Center for Biotechnology Information. J Virol 82(2):596–601. https://doi.org/10.1128/JVI.02005-07
15. Li H (2011) A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27(21):2987–2993. https://doi.org/10.1093/bioinformatics/btr509
16. Shen W, Le S, Li Y, Hu F (2016) SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS One 11(10):e0163962. https://doi.org/10.1371/journal.pone.0163962
17. Quinlan AR, Hall IM (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26(6):841–842. https://doi.org/10.1093/bioinformatics/btq033
18. Quinlan AR (2014) BEDTools: The Swiss-Army tool for genome feature analysis. Curr Protoc Bioinformatics 47:11-12–11-34. https://doi.org/10.1002/0471250953.bi1112s47
19. Kent WJ (2002) BLAT: the BLAST-like alignment tool. Genome Res 12(4):656–664. https://doi.org/10.1101/gr.229202
20. Gruning B, Dale R, Sjodin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Koster J, Bioconda T (2018) Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods 15(7):475–476. https://doi.org/10.1038/s41592-018-0046-7
21. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R, Genomes Project Analysis G (2011) The variant call format and VCFtools. Bioinformatics 27(15):2156–2158. https://doi.org/10.1093/bioinformatics/btr330
Chapter 15
Universal Primers for Detection of Novel Plant Capsid-Less Viruses: Papaya Umbra-like Viruses as Example
Jorge H. Ramirez-Prado and Luisa A. Lopez-Ochoa
Abstract
For diagnosis of positive-sense single-stranded RNA viruses, primers are usually designed against the sequence encoding the capsid proteins, since structural proteins are more conserved. This chapter focuses on the design of primers for a group of novel viruses lacking a capsid, known as papaya umbra-like viruses (unassigned genus) and associated with Papaya Sticky Disease, which represent a threat to papaya production. Based on sequence alignments of a region encoding the RNA-dependent RNA Polymerase, universal primers to detect all the known viruses from four countries are proposed. The forward universal primer can be used in combination with clade- and subclade-specific primers for rapid virus identification. We walk the reader through downloading sequences from nucleotide databases, performing sequence alignments, and constructing phylogenetic trees to identify conserved and variable regions as valid primer targets; we also show how to design and analyze the primers.
Key words RdRP, RT-PCR, Papaya umbra-like viruses, Papaya Sticky Disease (PSD), Meleira disease, Virus detection
Abbreviations
bp          Base pair
BLAST       Basic Local Alignment Search Tool
CDS         Coding sequence
dsRNA       Double-stranded RNA
Indels      Insertion–deletions
kb          Kilobases
kcal/mol    Kilocalorie per mole
ML          Maximum Likelihood
NCBI        National Center for Biotechnology Information
NGS         Next-Generation Sequencing
NJ          Neighbor-Joining
PMeV        Papaya meleira virus
PMeV-1      Papaya meleira virus 1
PMeV-2      Papaya meleira virus 2
PMeV-Mx     Papaya meleira virus-Mexican variant
PpVQ        Papaya virus Q
PRSV-P      Papaya ringspot virus type P
PSD         Papaya Sticky Disease
PCR         Polymerase chain reaction
qPCR        Quantitative PCR
RdRP        RNA-dependent RNA Polymerase
(+) RNA     Positive-sense single-stranded RNA
RT-PCR      Reverse transcription polymerase chain reaction
Tm          Melting temperature
Chhandak Basu (ed.), PCR Primer Design, Methods in Molecular Biology, vol. 2392, https://doi.org/10.1007/978-1-0716-1799-1_15, © Springer Science+Business Media, LLC, part of Springer Nature 2022
1 Introduction
When new disease symptoms, or variations in the symptoms of a known disease, are first discovered on a crop, it is common to use polymerase chain reaction (PCR) to identify the putative pathogen. PCR is an economical, easy, fast, and reliable method for the detection of viruses and other plant pathogens. With the development of novel technologies such as next-generation sequencing (NGS) applied to metagenomics and metaviromics, the discovery of new viruses has accelerated in recent years [1, 2]. Most known plant viruses have positive-sense single-stranded RNA genomes and are associated with diseases or defined symptoms [1, 3]. Positive-sense single-stranded RNA viruses ((+) RNA viruses) produce double-stranded RNA (dsRNA) during replication. dsRNA can be purified from infected plants and sequenced, allowing the identification of viruses [2, 3]. After sequence assembly and comparison with known viral sequences in databases such as GenBank, novel viruses are discovered [4]. Next, their etiology is studied by infectivity assays and by confirming the presence of the virus with reverse transcription PCR (RT-PCR). The characteristics of the primers and the expected amplicon size depend on the method used for detection. This chapter focuses on end-point PCR because it allows faster optimization, is cheap, and is sensitive enough for most plant viruses. However, to detect low levels of viral RNA (e.g., in insect transmission studies), quantitative PCR (qPCR), also called real-time PCR, should be used. The precision and effectiveness of virus detection by PCR depend on the inter- and intragenic variability of the virus [4]. Because primer binding is key to the success of this method, diagnosis of virus variants might be affected if genetic variation occurs at the target site.
For virus identification purposes, it is desirable to have a battery of primers to (1) identify viruses at the species level, (2) discriminate between virus variants or strains, and, whenever possible, (3) detect viruses at a higher hierarchical level (e.g., genus or family). Therefore, sequence alignments and phylogenetic analysis must identify conserved and unique regions in the target sequences for primer design [4]. Most plant viruses encode capsid proteins to enclose and protect their nucleic acids; because of their structural function, their sequences remain highly conserved. Proteins involved in viral replication are also among the most conserved [1, 3]. Therefore, for capsid-less (+) RNA viruses, such as papaya umbra-like viruses, the RNA-dependent RNA Polymerase (RdRP) is an ideal target for diagnosis.
Papaya (Carica papaya L.) is a tropical crop whose fruit is marketed worldwide and consumed for its taste and nutritional quality. The most important and widespread viral disease of papaya is caused by Papaya ringspot virus type P (PRSV-P), a (+) RNA virus [5]. Another viral disease that has recently gained worldwide relevance [6–8] is ‘Papaya Sticky Disease’ (PSD), or ‘papaya meleira’ in Portuguese. It was first reported in Brazil in the 1980s and is characterized by the spontaneous exudation of latex from fruits, which turns black upon oxidation. A dsRNA virus called Papaya meleira virus (PMeV) was first proposed as the causal agent of PSD [9]; its 8.7 kilobase (kb) genome is related to those of totiviruses [10]. In 2008, PSD symptoms were found in Mexico [11]; however, primers against the 629-nucleotide sequence of PMeV [12] failed to detect the disease in Mexico [13]. In 2015, a partial viral sequence of 1154 nucleotides, encoding a protein with 42% identity to the RdRP of umbraviruses, was identified in papaya plants in Mexico showing PSD symptoms [13]. Two sets of specific primers targeting this sequence also allowed the detection of a virus in a plant from Brazil with meleira disease. Since the sequences of the amplicons (173 and 491 bp) were highly similar in both countries, the virus found in Mexico was named Papaya meleira virus-Mexican variant (PMeV-Mx) [13]. In 2015, a 4285-nucleotide partial sequence of PMeV-Mx was released in GenBank under accession number KF214786.1 (our unpublished results).
Also in 2015, an umbra-like virus was reported in Ecuador in papaya plants showing variations of the symptoms produced by PRSV-P; the new virus was named Papaya virus Q (PpVQ), and it was found in association with PRSV-P [14]. In 2016, an umbra-like virus related to PpVQ and PMeV-Mx was also reported in Brazil in papaya plants displaying PSD symptoms, in synergistic association with PMeV. The new umbra-like virus was named Papaya meleira virus 2 (PMeV-2), while PMeV was renamed PMeV-1 [15]. PMeV-2 RNA was found inside purified PMeV-1 particles, suggesting that trans-encapsidation of the umbra-like virus genome takes place in nature for transmission by insect vectors [15]. In contrast, PMeV-Mx is insect-transmitted and produces PSD symptoms in papaya plants in the absence of a PMeV-1-related virus [6]; PMeV-Mx is also seed-transmitted [16]. In 2018, three partial sequences with similarity to PMeV-2, from papaya plants with PSD symptoms in Colombia, were uploaded to GenBank under accession numbers MG570380.1, MG570381.1, and MG570382.1 (unpublished). In 2019, a new umbra-like virus was found by NGS in Australia in papaya plants displaying PSD symptoms [8]; although its sequence has not yet been published, the report states that the virus is also seed-transmitted. In summary, several papaya umbra-like viruses have been identified and associated with PSD in four countries: Brazil, Mexico, Colombia, and Australia; two of them, in Australia and Mexico, show seed transmission [8, 13], while the virus found in Ecuador (PpVQ) is not associated with PSD. More studies are required to understand how papaya umbra-like viruses such as PMeV-2 and PMeV-Mx produce PSD and why others, like PpVQ, do not. The development of universal primers for these capsid-less viruses will contribute to the study of virus genetic variation and enable fast diagnosis in places where PSD has not yet been reported.
The strategy described here aims to design universal primers able to detect all known papaya umbra-like viruses, as well as clade-specific primers, by targeting conserved and variable regions of the RdRP gene. The method outline is as follows:
1. Sequence download.
2. FASTA file editing.
3. Multiple sequence alignment.
4. Identification of conserved/variable regions.
5. Phylogenetic inference for grouping.
6. Primer design and analysis.
7. Checking for specificity and cross-reactivity of primers: in silico PCR.
2 Materials
2.1 Software Tools and Web-Based Applications
Computer equipment: All bioinformatics procedures described in this methodology can be carried out on most modern 64-bit desktop or laptop computers. The most CPU- and RAM-intensive parts of the methodology are executed on open online servers, offloading the computational burden from the user’s equipment.
Online servers: National Center for Biotechnology Information (NCBI) GenBank (https://www.ncbi.nlm.nih.gov/nucleotide/) [17]; NCBI BLAST (https://blast.ncbi.nlm.nih.gov/Blast.cgi) [18]; MAFFT online version 7 (https://mafft.cbrc.jp/alignment/server/) [19].
Oligo Analyzer Tool is a simple and easy-to-use tool for determining important primer properties such as melting temperature (Tm), which is commonly used as a guide for choosing the annealing temperature, GC content, primer dimers, primer–primer compatibility, and primer loops. It also allows checking primers for multiplexing. It can be freely downloaded from the Web page https://oligo-analyzer.software.informer.com/download/. For the reverse primer, use the ‘Reverse complement tool’ at https://www.bioinformatics.org/sms/rev_comp.html.
The Primer-BLAST application at https://www.ncbi.nlm.nih.gov/tools/primer-blast/ [20] is a toolbox for designing primers that includes an ‘in silico PCR’ tool linked to a specificity test, in which our universal primers can be aligned against the plant host genome (Carica papaya) to look for nonspecific amplifications.
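For readers who prefer to script the reverse-complement step rather than use the web tool, the operation is small enough to sketch in Python. The snippet below is an illustration, not part of the listed tools: the function names are hypothetical, the example sequence is made up, and only the four standard bases are complemented (other characters, such as N, pass through unchanged). A simple GC content calculation is included because it is one of the primer properties mentioned above.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, preserving case."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_content(seq):
    """Return the percentage of G and C bases in a sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


example = "ATGCCGTTAGCA"  # made-up sequence for illustration
print(reverse_complement(example))
print(round(gc_content(example), 1))
```

A reverse primer is written 5′ to 3′ on the complementary strand, so the region chosen on the sense strand of the alignment has to be passed through reverse_complement before ordering or analysis.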
3 Methods
PCR primers designed from a single DNA sequence will yield an amplification fragment when that DNA sequence is used as a template, but their specificity, or lack of it, with other templates can only be predicted by determining the position of the primers relative to the conserved and variable regions of the template sequence. This can be achieved by aligning it to DNA from closely or distantly related strains. In this regard, the first step in designing efficient PCR primers is to obtain sequences related to our DNA of interest; the primers are designed once the conserved and variable regions have been pinpointed.
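The idea of locating primers relative to conserved and variable regions can be illustrated with a toy example. The Python sketch below assumes that the sequences have already been aligned to the same length (with “-” for gaps); it marks each alignment column as conserved when all sequences share the same base and then lists the start positions of fully conserved windows of a given size. The function names and the three short sequences are made up for illustration; real alignments would come from a dedicated alignment tool.

```python
def conserved_columns(aligned):
    """Return one boolean per column: True if all sequences share the same base."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    flags = []
    for column in zip(*aligned):
        bases = {b.upper() for b in column}
        flags.append(len(bases) == 1 and "-" not in bases)
    return flags


def conserved_windows(flags, size):
    """Return 0-based start positions of windows of `size` conserved columns."""
    return [i for i in range(len(flags) - size + 1) if all(flags[i:i + size])]


alignment = [
    "ACGTACGTAC",
    "ACGTTCGTAC",
    "ACGTACG-AC",
]
flags = conserved_columns(alignment)
print(flags)
print(conserved_windows(flags, 4))
```

In this toy alignment, only the first four columns form a fully conserved window of length 4; a universal primer would be placed in such a window, whereas a strain-specific primer would deliberately overlap a variable column.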
3.1 Sequence Download
For the purposes of our working example, as explained above, we will search for nucleotide sequences of the RdRP gene of the papaya meleira umbra-like viruses. There are several online databases and repositories for genomic data (i.e., DNA, RNA, proteins, and genomes). Three of the most used are GenBank at the NCBI (https://www.ncbi.nlm.nih.gov/nucleotide/), EMBL-EBI at the European Molecular Biology Laboratory (https://www.ebi.ac.uk/), and the DNA Data Bank of Japan, DDBJ (https://www.ddbj.nig.ac.jp/index-e.html). They share their data through the International Nucleotide Sequence Database Collaboration (http://www.insdc.org) [21], meaning that a search at any one of them will retrieve data deposited at any of the others. For the sake of simplicity, this methodology centers on retrieving data from GenBank at NCBI. There are three main ways of finding sequences in GenBank: by accession numbers, keywords, or homology. Any of these methods, or a combination of them, can be used to gather the relevant sequences. It should be noted that sequence records in these online databases can be very heterogeneous with regard to their length, their coverage of the gene of interest, and the annotation of other genes, ORFs, or hypothetical proteins present. In our working example, we will use sequences deposited in GenBank for papaya umbra-like viruses isolated from Mexico [5, 13], Ecuador [14], Brazil [15], and Colombia (three accessions; unpublished). The completeness of the 5′ and 3′ ends varies between records, as does the number of genes annotated. Records from Mexico and Brazil include the annotation for a hypothetical protein called ORF1 upstream of RdRP. Records from Colombia are truncated at the 3′ end of the RdRP (missing about 364 nucleotides). The seven records include noncoding regions in addition to the annotated genes. All these features of the records should be taken into consideration when downloading and processing the sequences for the subsequent analysis.
3.1.1 Searching and Downloading by Accession Numbers
The most straightforward way of finding genomic data is through one or several accession numbers. These are unique identifiers for each DNA, RNA, or protein sequence available in the database. Accession numbers can be gathered from publications that present or use sequence data. The GenBank accession numbers for the seven isolates of papaya umbra-like viruses in our practical case are as follows: Mexico (KF214786, MN203218), Ecuador (KP165407), Brazil (KT921785), and Colombia (MG570380, MG570381, MG570382). To retrieve just the coding sequences (CDS) of the RdRP genes, follow these steps.
1. Access the GenBank database at https://www.ncbi.nlm.nih.gov/nucleotide/.
2. In the search box, type or copy the following list of accession numbers without quotes: ‘KF214786, MN203218, KP165407, KT921785, MG570380, MG570381, MG570382’ (see Note 1). A list of the seven sequence records should be displayed. For each record, the following information is shown: title; molecule properties: length (in base pairs), topology (circular/linear), and type (DNA/RNA); accession and GenInfo Identifier (GI) numbers; and links to the associated encoded proteins and taxonomy data. Below each record are links to display the data in three different formats: GenBank, FASTA (see Note 2), or a graphical representation. Note the different lengths of the records.
3. There are several ways to download the sequences linked to each record. The easiest and most reliable way, and one compatible with downstream analysis (see Note 3), is to use the ‘Send to:’ menu at the top right. Activate its drop-down menu by clicking the arrow (downward-pointing triangle). Select ‘Coding sequences’ (see Note 4). Select ‘FASTA Nucleotide’ as the required format. Click the ‘Create File’ button to download the sequences of the seven records (see Note 5).
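The same records can, in principle, be requested programmatically. The Python sketch below is an alternative to the web menus described in the steps above, not the procedure of this chapter: it only builds a request URL for the NCBI E-utilities efetch service, assuming the “fasta_cds_na” return type for coding sequences in nucleotide FASTA format. The function name is hypothetical, and no network request is made; the URL would still need to be opened with a browser or a download tool, and the output should be checked against the file obtained through ‘Send to:’.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Accession numbers of the seven records used in this example.
ACCESSIONS = ["KF214786", "MN203218", "KP165407", "KT921785",
              "MG570380", "MG570381", "MG570382"]


def efetch_cds_url(accessions):
    """Build an efetch URL requesting the CDS features as nucleotide FASTA."""
    params = {
        "db": "nuccore",
        "id": ",".join(accessions),
        "rettype": "fasta_cds_na",  # assumed return type for CDS nucleotide FASTA
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


print(efetch_cds_url(ACCESSIONS))
```

As with the web download, the resulting file would contain every annotated CDS of each record, so the ORF1 entries still have to be removed as described in Subheading 3.2.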
3.1.2 Searching and Downloading by Keywords
Without accession numbers, relevant sequence records can be retrieved by typing keywords into the search box. Keywords can be any word contained in any of the multiple fields that make up the record. Usually, the name of the virus is enough to retrieve significant results, but sometimes special combinations are better suited. For example, the keyword combination ‘Papaya Meleira virus’ retrieves around 20 records but does not include the Ecuador record (KP165407), because the title of that record does not contain the word ‘Meleira’ and is instead labeled ‘umbra virus.’ The keyword combination ‘Papaya Meleira Umbra virus’ retrieves zero records because no single record contains all four words (the search expects to match every word). To overcome this, a keyword combination using ‘Boolean operators’ (AND, OR, NOT) can be used (see Note 6).
1. Access the GenBank database at https://www.ncbi.nlm.nih.gov/nucleotide/.
2. Use the following keyword combination (without quotes but with the parentheses): ‘(papaya AND virus) AND (meleira OR umbra).’ A list of records should be displayed.
3. As mentioned previously, many records are only small fragments. To order the records from longest to shortest, open the ‘Sort by’ drop-down menu at the top center and select ‘Sequence Length.’
4. For consistency with the rest of the example, we will select the same seven records as in Subheading 3.1.1: KF214786.1, MN203218.1, KP165407.1, KT921785.1, MG570380.1, MG570381.1, MG570382.1. Check the tick boxes of the corresponding records.
5. As above, to download the sequences, use the ‘Send to:’ menu at the top right. Activate its drop-down menu. Select ‘Coding sequences.’ Select ‘FASTA Nucleotide’ as the required format. Click the ‘Create File’ button to download the selected sequences.
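The logic of the Boolean query in step 2 can be illustrated with a few lines of Python. This is only a sketch of how AND and OR combine, using simple case-insensitive substring matching; the actual GenBank search indexes many record fields and is more elaborate. The function name and the three record titles are made up for illustration.

```python
def matches(title):
    """Evaluate (papaya AND virus) AND (meleira OR umbra) on one title."""
    text = title.lower()
    return ("papaya" in text and "virus" in text) and ("meleira" in text or "umbra" in text)


# Hypothetical titles, written to resemble the cases discussed in the text.
titles = [
    "Papaya meleira virus isolate A, partial genome",
    "Papaya umbra virus isolate B, complete genome",
    "Papaya ringspot virus isolate C, coat protein gene",
]
for title in titles:
    print(matches(title), title)
```

The first two titles satisfy the query through different branches of the OR clause, which is why the combined query retrieves both the ‘meleira’ and the ‘umbra’ records, while the third title is excluded.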
3.1.3 Searching and Downloading by Homology
If a viral sequence of interest is already known, it can be used as a reference (the query) to identify closely related sequences by a homology search of the database using the Basic Local Alignment Search Tool (BLAST) at NCBI. For the purpose of this example, we will use the KF214786.1 record as our query.
1. Access the BLAST tool at https://blast.ncbi.nlm.nih.gov/Blast.cgi.
2. Select the ‘Nucleotide BLAST’ version (see Note 7).
3. In the first section (Enter Query Sequence), copy/paste or type the accession number KF214786.1 in the box (see Note 8).
Jorge H. Ramirez-Prado and Luisa A. Lopez-Ochoa
4. In the ‘Query subrange’ fields, enter ‘1231’ in ‘From’ and ‘2436’ in ‘To’ (both numbers without quotes) (see Note 9).
5. In the second section (Choose Search Set), to speed up the process, the search can be limited to records from viruses. To do so, type ‘viruses’ (without quotes) in the ‘Organism’ box (see Note 10).
6. In the third section (Program Selection), in the ‘Optimize for’ list of options, select ‘Somewhat similar sequences (blastn)’ (see Note 11).
7. For the present case, we will use the default values for the rest of the parameters (see Note 12).
8. Click the blue BLAST button.
9. The results are presented in four tabs: a table of ‘Descriptions,’ a ‘Graphic Summary,’ the ‘Alignments,’ and a ‘Taxonomy’ report (see Note 13).
10. In the table of the ‘Descriptions’ tab, deselect all results and then select those corresponding to the example list of accessions provided: KF214786.1, MN203218.1, KP165407.1, KT921785.1, MG570380.1, MG570381.1, and MG570382.1.
11. Retrieve the selected sequences using the ‘Download’ drop-down menu located at the top of the table. Select the option ‘FASTA (aligned sequences)’ (see Note 14).
3.2 FASTA File Editing
As explained before, for the design of the diagnostic primers, only the coding sequence (CDS) of the RNA-dependent RNA polymerase (RdRP) will be used as the target of PCR. The FASTA-formatted files obtained through Subheadings 3.1.1 and 3.1.2 contain, for some strains, the ORF1 CDS in addition to the RdRP CDS. Before generating the multiple sequence alignment, the FASTA file must be manually edited to remove the ORF1 CDS and retain only the RdRP CDS.
1. Open the file downloaded in Subheading 3.1.1 or Subheading 3.1.2 using a suitable text editor (see Note 15), e.g., Notepad (MS Windows), TextEdit (Mac OS), or Text Editor (Linux).
2. Identify header lines that contain the words ‘RNA-dependent RNA polymerase’ and/or ‘RdRP.’ There should be seven such lines. These are the sequences that we want to retain (see Note 2 as a guide).
3. Identify headers that do not match the above description. Delete these headers and the associated sequence lines.
4. Save the file and exit.
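The manual editing steps above can also be sketched as a short script, which may be convenient when many records are involved. This is an illustrative sketch rather than the authors' procedure: the function names are hypothetical, and the filter simply keeps records whose header contains ‘RNA-dependent RNA polymerase’ or ‘RdRP’ (case-insensitive), which assumes that the headers are annotated consistently.

```python
def parse_fasta(text):
    """Split multi-sequence FASTA text into (header, sequence) pairs.

    The header is everything after '>' on the header line; sequence
    lines are joined into a single string.
    """
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def keep_rdrp(records, markers=("rna-dependent rna polymerase", "rdrp")):
    """Keep only records whose header mentions the RdRP."""
    return [(h, s) for h, s in records
            if any(m in h.lower() for m in markers)]


def to_fasta(records, width=70):
    """Write records back as FASTA text with fixed-width sequence lines."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

A typical use would be to read the downloaded file as text, pass it through `parse_fasta` and `keep_rdrp`, check that seven records remain, and write the output of `to_fasta` to a new plain-text file.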
3.3 Multiple Sequence Alignment
Both multiple sequence alignment construction and phylogeny inference are computationally taxing steps, with the phylogeny being the more demanding of the two. For multiple sequence alignment, many algorithms are available, implemented in different software packages or accessible through online server applications. The Clustal [22] and MUSCLE [23] algorithms are very popular options because both are fast and use little CPU and RAM, but both perform poorly when dealing with highly variable regions and/or long stretches of insertions–deletions (indels), which is often the case with viral sequences. On the other hand, algorithms such as MAFFT (E-INS-i option) [19], Probalign [24], and PRANK [25] are better suited for these kinds of datasets but become computationally expensive very quickly as sequences are added. In our experience, MAFFT is a good compromise between alignment accuracy in highly variable regions and the resources required. To minimize the computational burden, and as an operating system-independent solution, the following steps are carried out on a publicly available online server running the MAFFT algorithm.
1. Access the MAFFT alignment server at https://mafft.cbrc.jp/alignment/server/.
2. As input, load the FASTA file edited in the previous section (Subheading 3.2): in the ‘Input’ section, use the ‘Choose File’ button to select and upload the corresponding file.
3. In the next section, the defaults can be used with the following considerations.
‘Direction of nucleotide sequences’:
Option 1: ‘Same as input’ (default). Use this option if using a FASTA file created as detailed in Subheading 3.1.1 or Subheading 3.1.2 (with the ‘Coding sequences’ option selected at download). All sequences should be in the correct direction of transcription (see Note 4).
Option 2: ‘Adjust direction according to the first sequence (accurate enough for most cases).’ Use this option if the FASTA file was created following the instructions in Subheading 3.1.3 (NCBI BLAST results), since some of the sequences could be reversed (see Note 14). This option should also be used if the FASTA file was created by using the ‘Complete Record’ option when downloading the sequences from the NCBI nucleotide database.
‘Job name’: Although optional, it is good practice to fill this in to keep track of multiple experiments.
‘Notify when finished’: For small sets of sequences, results will be displayed quickly, but it is still good practice to add an email address to receive the results link in case the browser quits.
4. For typical cases, most of the defaults for the ‘Advanced settings’ can be used, with the following exception: ‘Strategy’: MAFFT offers several alignment strategies. For highly variable viral sequences, the ‘E-INS-i’ method is the most appropriate.
5. Click on the ‘Submit’ button. Wait time will depend on the number/size of the sequences and the load on the server.
6. Results are displayed in the CLUSTAL alignment format.
7. The ‘Clustal format’ and ‘FASTA format’ links at the top can be used to download the alignment in the indicated formats. The ‘FASTA’ format is widely accepted by other bioinformatics applications. The ‘Clustal’ format is more human-readable.
3.4 Identification of Conserved/Variable Regions
Alignments in the Clustal format are useful for visually inspecting and identifying conserved/variable regions. Sequences are (usually) divided into lines 60 nucleotides long. A bottom line is added indicating the level of conservation: asterisks (*) below columns indicate 100% identity. Periods (.) indicate a conserved change (purine to purine or pyrimidine to pyrimidine substitutions).
1. Open the multiple sequence alignment in Clustal format in a suitable text editor application (see Notes 3, 15, and 16).
2. To find appropriate target regions for the design of diagnostic primers, we need to find sufficiently long regions (at least 18 nucleotides) of 100% (or almost 100%) identical positions. These can be identified by long stretches of uninterrupted asterisks (*). Keep in mind that some identical regions could be split across two lines (end of one and start of the following).
3. If sufficiently long 100% identical regions are not present, stretches of identical nucleotides interrupted by a very small number of either conserved substitutions (best) or mismatches can be used, as explained before: one or two conserved substitutions (.) near the 5′ end or the middle of the primer may be acceptable, or even useful for designing more general degenerate primers. At this point, three possible scenarios exist:
A. Sufficiently long identical/conserved target regions are identified for at least one forward and one reverse primer (more than one target region for each is ideal).
B. A sufficiently long identical/conserved target region is identified for only one of the two primers (either forward or reverse).
C. No sufficiently long identical/conserved target regions are found for either of the PCR diagnostic primers.
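As a complement to the visual inspection, the search for uninterrupted stretches of identical columns can be sketched in a few lines. This is a minimal sketch under stated assumptions: the aligned sequences are supplied as equal-length strings (for example, read from the ‘FASTA format’ alignment), gaps are written as ‘-’, and only fully identical, gap-free columns count (the equivalent of asterisks). Conserved substitutions (periods) are not considered. The function name and the toy sequences in the test are hypothetical.

```python
def conserved_runs(aligned, min_len=18):
    """Find runs of fully identical, gap-free alignment columns.

    aligned  -- list of aligned sequences (equal-length strings)
    min_len  -- shortest run to report (18 nt, as suggested in step 2)

    Returns a list of (start, end, sequence) tuples, where start and end
    are 1-based alignment column numbers, both inclusive.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")

    runs, start = [], None
    # Iterate one position past the end so that a final run is flushed.
    for i in range(length + 1):
        identical = False
        if i < length:
            column = {s[i].upper() for s in aligned}
            identical = len(column) == 1 and "-" not in column
        if identical:
            if start is None:
                start = i                 # a new run begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start + 1, i, aligned[0][start:i].upper()))
            start = None
    return runs
```

Each reported run is a candidate target region for scenarios A and B; an empty result for the whole alignment corresponds to scenario C. Note that the positions are alignment columns, so they must be converted to sequence coordinates when the alignment contains gaps upstream of the run.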
4. For scenarios A and B, copy the identified candidate regions to a text file, noting their positions in the sequence. For scenarios B and C, to be able to find appropriate regions, we will need to divide the sequences into less variable (more closely related) groups.
3.5 Phylogenetic Inference for Grouping
As stated above, phylogenetic inference is a very computationally intensive bioinformatic step. For taxonomic analysis, lineage divergence, and/or very detailed determinations of evolutionary history, a precise phylogenetic inference should be carried out. As with multiple sequence alignment, phylogenetic distance methods like Neighbor-Joining (NJ) [26] or UPGMA [27] are quite fast but, again, are not optimal for alignments containing highly variable regions and/or multiple large indels. Maximum Likelihood (ML) methods are more appropriate for these cases, and tools like PAML [28] or PhyML [29] are good implementations. Prior to the phylogeny construction, the best model and parameters for the inference must be tested. This step can take as long as (or longer than) the actual phylogeny reconstruction. Bootstrapping (calculation of confidence levels for internal nodes) is even more computationally demanding. However, if our only aim is to divide the divergent sequences into groups of more closely related sequences to help pinpoint conserved target regions, such a fine analysis is not required, and a distance method such as NJ is suitable. For this, we will use the phylogenetic tools associated with the MAFFT server used before.
1. Go to the MAFFT alignment server at https://mafft.cbrc.jp/alignment/server/ and repeat steps 1–5 of Subheading 3.3 (multiple sequence alignment).
2. On the results page, click the ‘Phylogenetic tree’ button.
3. In the ‘Settings,’ select ‘NJ Conserved sites’ as the ‘Method’ (see Note 17) and ‘Jukes-Cantor’ [30] as the ‘Substitution model’ (for our purposes, there is no need for ‘Bootstrap’ calculation).
4. Click the ‘Go!’ button.
5. On the results page, there are many options to display the tree. Use the ‘View tree on Phylo.io’ button. The tree should be presented in a new window.
6. Visually inspect the distribution of sequences in the different clades and determine the most likely groupings accordingly.
7. Create as many copies of the FASTA file edited in Subheading 3.2 as the number of groupings determined in the previous step. Rename the file copies accordingly.
8. Manually edit each FASTA file, deleting the appropriate sequences to keep only those relevant to the intended group.
9. Using these new FASTA files, repeat for each one the steps of Subheadings 3.3 and 3.4 to find suitable target regions for the design of primers.
3.6 Primer Design and Analysis
If several conserved/unique regions are found after multiple sequence alignment, for end-point PCR, select an amplicon length from 200 bp to 1 kb, ideally close to 500 bp (see Note 18). Apply the regular rules for primer parameters (see Note 19). For this example, to design the universal Forward primer, a 23-nucleotide stretch of 100% identity among all sequences, located at position 360 from the start codon (ATG), was selected (Fig. 1, left). A second conserved region, 300 nucleotides downstream from the first and spanning 20 nucleotides, was selected for the Reverse primer (Fig. 1, right). The first segment exhibits 100% identity among all accessions, while the second is identical in six of the seven accessions, the exception being PMeV-2 from Brazil (accession KT921785), which has only two mismatches (see Note 20). Thus, for the universal Forward primer, the sequence can be used as it is, whereas for the universal Reverse primer it is necessary to include two degenerate positions, as shown in Fig. 1 (see Notes 20 and 21). Because the rest of the alignment, apart from these two conserved regions, revealed heterogeneity (not shown), the phylogeny of the RdRP gene was inferred (Fig. 2) in order to select sequences for ‘group-specific’ primers. Perform primer analysis as follows: Install ‘Oligo Analyzer’ on your computer, open it, and select the primer tab (Fig. 3). To design the universal Forward primer, copy the 23-nucleotide sequence (5′ GAGAAACCTGTCTACTCTTGTTT 3′) of the first conserved region in the alignment (Fig. 1, left) and paste it into the first Primer box. Name the primer (optional) in the primer name box. We recommend replacing the ‘Default parameters’ in ‘Oligo
[Fig. 1 (alignment, left panel). Universal Forward primer region: GAG AAA CCT GTC TAC TCT TG..., translated E K P V Y S C, starting at position 354 and identical (***) in all seven accessions: ECU KP165407; COL MG570380, MG570382, MG570381; MEX MN203218, KF214786; BRA KT921785.]
) followed immediately by the sequence identifier. The sequence can be split into multiple lines. Here is a typical header from GenBank followed by the first three lines of its associated sequence:
>KF214786.1 Papaya meleira virus hypothetical protein and RNA-dependent RNA polymerase genes, complete cds
ATGAACATTTCGAACATTCCCGTGGGACGTCTCACCTCCCATGCCGGTTTTAATTTGTTGAAGCTGGCAA
GCAAGCTTGGAAAGCAAAATCCTCCTTCCAAGTTTGCGGGTGGGCGAAGGCCAAATCCTGGTGAGTCTGG
CACCTCACACGGTGGGCCTTCTTCATCATCTCGACCTTCCCGAAGGCGAGGTAAGTTTGCCCTTAGGAGG
Although the header line, when wrapped, can extend over more than one line, it must not be broken by a hard return. The actual sequence identifier ends at the first blank space (in this case, KF214786.1 is the identifier). Anything after the first blank space but before a return is considered human-readable metadata. If a file contains more than one sequence (a multisequence FASTA file), each sequence must have a unique identifier up to the first blank space. Everything after the ‘first return’ and up to the next ‘greater than’ sign (or the end of the file) is considered the sequence. The sequence can be contained in a single line or divided into lines of arbitrary size.
3. The automatically generated file is in a simple text format. Copying and pasting a FASTA sequence (header and sequence lines) into an MS Word document or similar word processing software will produce unusable files. Although the format may look like a FASTA file, these applications include hidden code that makes the files unsuitable for use with any bioinformatics application, unless an option to save as ‘simple text’ is available.
4. The ‘Coding sequences’ option will automatically select the fragments of sequence corresponding to the coding portion(s) of the annotated gene(s) present in the record. For double-stranded DNA sequences, if an annotated gene is transcribed from the complementary strand, this option will retrieve the appropriate reverse-complement sequence.
5. If all records are left unchecked, every result will be formatted and downloaded. This is especially useful when retrieving more than 20 sequences. By default, only the first 20 records are shown on the first page. Accession lists longer than this will be split into multiple pages. If all tick boxes are left unchecked, everything will be downloaded, even if it is on a different page.
6. Boolean operators are the words (all in capitals) AND, OR, NOT. By default, the search algorithm implicitly includes the AND Boolean operator after each keyword. This is a strict operator that is true only if all keywords are matched. The OR Boolean operator is true if either the keyword before it or the one right after it is matched. Combinations can thus be created by grouping keywords with parentheses and the necessary Boolean operators. For example, the records recovered by the keyword combination ‘(papaya AND virus) AND (meleira OR umbra)’ must include three keywords: both the words papaya and virus, plus another keyword that can be either meleira or umbra. Conversely, the NOT operator will discard any record that contains the immediately following keyword (or combination of keywords). More information can be found at https://www.nlm.nih.gov/pubs/techbull/ja97/ja97_pubmed.html.
7. When doing a homology search on a database through BLAST, there are four options of algorithms to use, depending on the reference sequence used (query) and the database to be searched. For the purpose of primer design, we need to recover nucleotide sequences, so the nucleotide database should be searched. If our query is an mRNA or coding sequence, then nucleotide BLAST (blastn) should be used. Starting with a protein sequence, it is also possible to search the nucleotide database by means of the tblastn algorithm, which compares the amino acids of the query against a translated version of the nucleotide database.
8. The Query box can accept accession numbers, GenInfo Identifiers (GI), or copied and pasted FASTA sequences (even without a header line, although this is not advisable). Nucleotide sequences may contain numbers (which will be ignored) and the letter ‘N’ for ambiguities, but not any of the other ambiguity codes (e.g., R and Y).
9. The Query subrange can be used to limit the search to a defined portion of the query provided. In this case, the CDS for the RdRP gene is located at nucleotides 1231 to 2436 of the record (both inclusive).
10. While typing, suggestions will appear. Multiple names for the same organism or group, with small variations (different capitalization or pluralization, for example, ‘viruses,’ ‘Viruses,’ or ‘Vira’), may be suggested. As long as the accompanying taxid number is the same, it makes no difference which one is selected (for viruses, taxid:10239).
11. Three variations of the blastn algorithm are available: megablast, discontiguous megablast, and blastn. The first two options tend to be too strict and are useful when very little variation is expected among the sequences. ‘Optimize for Somewhat similar sequences (blastn)’ is more permissive of accumulated mutations, which is the case for viruses. A more detailed explanation can be found in this guide: https://www.ncbi.nlm.nih.gov/blast/BLAST_guide.pdf.
12. All BLAST algorithms have a number of parameters that can be fine-tuned depending on the particular characteristics of the query and/or the search required. An explanation of all of the parameters is beyond the scope of this chapter, but NCBI maintains many online resources that can be consulted to better understand their capabilities. Some useful links are provided (this is not an exhaustive list):
https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs
https://www.ncbi.nlm.nih.gov/books/NBK1734/
13. An explanation of all the data displayed in the results is also beyond the scope of this chapter, but the reader is directed to this handy introductory handout: https://ftp.ncbi.nlm.nih.gov/pub/factsheets/HowTo_NewBLAST.pdf.
14. In Subheadings 3.1.1 and 3.1.2, to recover only the coding sequences from the records, we selected that option from the ‘Send to:’ menu (see Note 4). The tool to download records from a BLAST results page does not have this option, but a similar result can be obtained with the ‘FASTA (aligned sequences)’ option. Since our query was limited to the subrange that encompasses the RdRP CDS, using the ‘FASTA (aligned sequences)’ option will download only the sequences that matched that fragment of the original record. However, note that some results may be shorter than expected if variability at the ends is too high. Also, while the ‘Coding sequences’ option of the ‘Send to:’ menu will obtain the CDS in the right translation direction (reverse-complementing the sequence if necessary), the ‘FASTA (aligned sequences)’ option will not, which should be taken into consideration when making multiple sequence alignments.
15. As mentioned in Note 3, FASTA files should be edited and manipulated using text editors that open and save files in simple text format. Some text editors also have the option to save in ‘rich text format,’ which must also be avoided.
16. The visual inspection of a multiple sequence alignment in Clustal format relies on the proper alignment of the characters displayed in the file. To accomplish this, a monospaced font (also called fixed-pitch, fixed-width, or nonproportional font) must be used. Some examples of monospaced fonts available on most systems are Courier, Courier New, Lucida Console, Monaco, and Consolas. Typical default document fonts like Arial, Times New Roman, Calibri, and Helvetica are not monospaced and will display the alignment file incorrectly. If the alignment looks wrong, select all (Ctrl+A) and change the font accordingly.
17. Sequences from Colombia are truncated at the 3′ end and are missing about 364 nucleotides. The 5′ end also differs for some of the sequences aligned. These differences produce gaps at both ends of the alignment. If these gaps are used during the phylogenetic inference, the algorithm considers each one an evolutionary insertion or deletion event, wrongly inflating the evolutionary distances between the sequences. Selecting the ‘NJ Conserved sites’ option will trim all the gaps from the alignment and only consider columns with nucleotides present in all of the records.
18. Visualization of amplicons smaller than 200 bp on agarose gels can be hampered by an excess of primers in the reaction tube: when there is no amplification (e.g., in the negative controls), a spot of primers can be seen, masking the absence or presence of bands. Use an agarose concentration appropriate to the amplicon size. Amplicons larger than 1 kb require more thermocycler time, which also slows the diagnosis process.
19. Primer length and sequence determine the melting temperature (Tm), which in turn determines PCR success. The Tm should not differ by more than 5 °C between the Forward and Reverse primers, and a difference of 2 °C or less is ideal. A primer length of 18–20 nucleotides is recommended, with a Tm from 54 °C to 60 °C, although Tms closer to 60 °C are ideal to reduce the possibility of self-annealing (loop/hairpin formation) and primer dimers (homo- and heterodimers).
20. Please note that the sequence in Fig. 3 is shown only to represent the primer position at the conserved region; the actual universal Reverse primer sequence has to be ‘reverse-complemented,’ since it will be part of the complementary strand. To get the reverse complement, copy the sequence and paste it into the box of the ‘Reverse complement tool’ at https://www.bioinformatics.org/sms/rev_comp.html, make sure that the ‘reverse-complement’ tab is selected, and press submit. Also, note that in virus diagnosis it is common to use the Reverse primer for first-strand cDNA synthesis; this is done to reduce background, but it can also generate false negatives (if the primer does not recognize that particular strain of virus) or loss of sensitivity (when a degenerate primer is used). In our experience, diagnosis works very well using random hexamers for first-strand synthesis followed by PCR with specific primers; this cDNA can also be used in PCR with host gene primers (e.g., the ‘actin’ gene in papaya) as a positive control [13].
21. Keep in mind that combinations of the degenerate bases will be produced when the primer is synthesized; therefore, variations in the calculated Tm should be expected. As stated before, a universal primer would allow detecting virus variants that might not yet have been reported; however, the diagnostic sensitivity could be reduced.
22. To adjust the parameters in ‘Oligo Analyzer,’ go to the ‘Options’ tab and fill in the boxes with the following values: 3′-tail length ‘7,’ Salt concentration ‘50’ mM, DNA Conc. ‘500’ pM; then click the Save button.
23. By default, the PCR product is expected to have a length between 70 and 1000 bp. The optimal Tm for the primers is, by default, between 57 °C and 63 °C. If the primers designed in Subheading 3.7 are outside these default values of product length and/or Tm, the values should be changed accordingly.
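Notes 19–21 can be illustrated with a short sketch covering three operations: reverse-complementing a primer that may contain degenerate (IUPAC) bases, listing the individual sequences that a degenerate primer represents, and obtaining a rough Tm estimate for each of them. The sketch rests on clearly stated assumptions: all function names are hypothetical, and the Tm uses the simple Wallace rule of thumb, Tm = 2(A+T) + 4(G+C), which is a crude approximation for short oligonucleotides. It is not the calculation performed by ‘Oligo Analyzer,’ so its values should not be compared directly with the ranges given in Note 19.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of every IUPAC code (e.g., R = A/G pairs with Y = T/C).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a primer, degenerate codes included."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def expand_degenerate(seq):
    """List every plain sequence represented by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in seq.upper()))]


def wallace_tm(seq):
    """Rule-of-thumb Tm in degrees C: 2*(A+T) + 4*(G+C). Plain bases only."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def tm_range(seq):
    """Lowest and highest rule-of-thumb Tm among the primer variants."""
    tms = [wallace_tm(v) for v in expand_degenerate(seq)]
    return min(tms), max(tms)


if __name__ == "__main__":
    forward = "GAGAAACCTGTCTACTCTTGTTT"   # universal Forward primer (Subheading 3.6)
    print(reverse_complement(forward))
    print(wallace_tm(forward))
    print(expand_degenerate("ARY"), tm_range("ARY"))  # toy degenerate example
```

The spread returned by `tm_range` gives a feel for the Tm variation mentioned in Note 21: a primer with two two-fold degenerate positions is synthesized as a mixture of four sequences, each with its own Tm.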
Acknowledgments
This work was funded by the CONACYT research grant A1-S19850 to L.L.O.
References
1. Dolja VV, Koonin EV (2018) Metagenomics reshapes the concepts of RNA virus evolution by revealing extensive horizontal virus transfer. Virus Res 244:36–52. https://doi.org/10.1016/j.virusres.2017.10.020
2. Roossinck MJ, Martin DJ, Roumagnac P (2015) Plant virus metagenomics: advances in virus discovery. Phytopathology 105(6):716–727
3. Dolja VV, Krupovic M, Koonin EV (2020) Annu Rev Phytopathol 58:23–53. https://doi.org/10.1146/annurev-phyto-030320-041346
4. Rubio L, Galipienso L, Ferriol I (2020) Detection of plant viruses and disease management: relevance of genetic diversity and evolution. Front Plant Sci 11:1092. https://doi.org/10.3389/fpls.2020.01092
5. Alcalá-Briseño RI, Casarrubias-Castillo K, López-Ley D et al (2020) Network analysis of the Papaya Orchard Virome from two agroecological regions of Chiapas, Mexico. mSystems 5:e00423-19. https://doi.org/10.1128/mSystems.00423-19
6. Garcia-Camara I, Tapia-Tussell R, Magaña-Alvarez A et al (2019) Empoasca papayae (Hemiptera: Cicadellidae)-mediated transmission of Papaya meleira virus-Mexican variant in Mexico. Plant Dis 103:2015–2023
7. Sá-Antunes TF, Maurastoni M, Madroñero LJ (2020) Battle of three: the curious case of papaya sticky disease. Plant Dis 104:2754–2763. https://doi.org/10.1094/PDIS-12-19-2622-FE
8. Campbell P (2018) New test to offer early detection of papaya sticky disease. Papaya Press. https://australianpapaya.com.au/website/wp-content/uploads/2018/05/PAPAYAPRESS-MAY.pdf
9. Maciel-Zambolim E, Kunieda-Alonso S, Matsuoka K, De Carvalho M, Zerbini F (2003) Purification and some properties of Papaya meleira virus, a novel virus infecting papayas in Brazil. Plant Pathol 52:389–394
10. Abreu EFM, Daltro CB, Nogueira EOPL et al (2015) Sequence and genome organization of papaya meleira virus infecting papaya in Brazil. Arch Virol 160:3143–3147
11. Perez-Brito D, Tapia-Tussell R, Cortes-Velazquez A et al (2012) First report of papaya meleira virus (PMeV) in Mexico. Afr J Biotechnol 11:13564–13570
12. Abreu PMV, Piccin JG, Rodrigues SP et al (2012) Molecular diagnosis of Papaya meleira virus (PMeV) from leaf samples of Carica papaya L. using conventional and real-time RT-PCR. J Virol Methods 180:11–17
13. Zamudio-Moreno E, Ramirez-Prado J, Moreno-Valenzuela O et al (2015) Early diagnosis of a Mexican variant of Papaya meleira virus (PMeV-Mx) by RT-PCR. Genet Mol Res 14:1145–1154
14. Quito-Avila DF, Alvarez RA, Ibarra MA et al (2015) Detection and partial genome sequence of a new umbra-like virus of papaya discovered in Ecuador. Eur J Plant Pathol 143:199–204
15. Sa Antunes TFS, Amaral RJV, Ventura JA et al (2016) The dsRNA virus papaya meleira virus and an ssRNA virus are associated with papaya sticky disease. PLoS One 11:e01552
16. Tapia-Tussell R, Magaña-Alvarez A, Cortes-Velazquez A et al (2015) Seed transmission of Papaya meleira virus in papaya (Carica papaya) cv. Maradol. Plant Pathol 64:272–275
17. NCBI Resource Coordinators (2016) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 44:D7–D19. https://doi.org/10.1093/nar/gkv1290
18. Zhang Z, Schwartz S, Wagner L et al (2000) A greedy algorithm for aligning DNA sequences. J Comput Biol 7:203–214. https://doi.org/10.1089/10665270050081478
19. Katoh K, Rozewicki J, Yamada KD (2019) MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization. Brief Bioinform 20:1160–1166. https://doi.org/10.1093/bib/bbx108
20. Ye J, Coulouris G, Zaretskaya I et al (2012) Primer-BLAST: a tool to design target-specific primers for polymerase chain reaction. BMC Bioinform 13(1):134. https://doi.org/10.1186/1471-2105-13-134
21. Karsch-Mizrachi I, Takagi T, Cochrane G, International Nucleotide Sequence Database Collaboration (2018) The international nucleotide sequence database collaboration. Nucleic Acids Res 46:D48–D51. https://doi.org/10.1093/nar/gkx1097
22. Sievers F, Higgins DG (2021) The clustal omega multiple alignment package. In: Katoh K (ed) Multiple sequence alignment. Methods in molecular biology, vol 2231. Humana Press, New York, NY, pp 3–16. https://doi.org/10.1007/978-1-0716-1036-7_1
23. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792–1797. https://doi.org/10.1093/nar/gkh340
24. Roshan U, Livesay DR (2006) Probalign: multiple sequence alignment using partition function posterior probabilities. Bioinformat 22:2715–2721. https://doi.org/10.1093/bioinformatics/btl472
25. Löytynoja A (2014) Phylogeny-aware alignment with PRANK. In: Russel D (ed) Multiple sequence alignment methods. Methods in molecular biology (Methods and protocols), vol 1079. Humana Press, Totowa, NJ, pp 155–170. https://doi.org/10.1007/978-1-62703-646-7_10
26. Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425. https://doi.org/10.1093/oxfordjournals.molbev.a040454
27. Sokal RR, Michener CD (1958) A statistical method for evaluating systematic relationships. Univ Kans Sci Bull 38:1409–1438
28. Yang Z (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 8:1586–1591. https://doi.org/10.1093/molbev/msm088
29. Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O (2010) New algorithms and methods to estimate maximum likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 59:307–321. https://doi.org/10.1093/sysbio/syq010
30. Jukes TH, Cantor CR, Munro HN, Allison JB (1969) Evolution of protein molecules. In: Mammalian protein metabolism. Academic, New York
Part VI Use of Software for Primer Design
Chapter 16
A Guide to Using FASTPCR Software for PCR, In Silico PCR, and Oligonucleotide Analysis
Ruslan Kalendar
Abstract
The FastPCR software is an integrated tool environment for PCR primer and probe design and for prediction of oligonucleotide properties. The software provides comprehensive tools for designing primers for most PCR and prospective applications, including standard, multiplex, long-distance, inverse, real-time with TaqMan probe, Xtreme Chain Reaction (XCR), group-specific, overlap extension PCR for multifragment assembly cloning, and isothermal amplification (Loop-mediated Isothermal Amplification). A program is available to design specific oligonucleotide sets for long sequence assembly by ligase chain reaction and to design multiplexes of overlapping and nonoverlapping DNA amplicons that tile across a region(s) of interest for targeted next-generation sequencing, and competitive allele-specific PCR (KASP)-based genotyping assays for single-nucleotide polymorphisms and insertions and deletions at specific loci, among other features. The in silico PCR primer or probe search includes comprehensive analyses of individual primers and primer pairs. FastPCR includes various bioinformatics tools for analysis and searching of sequences, restriction endonuclease (types I, II, and III) analysis, and pattern searching. The program also supports the assembly of a set of contiguous sequences, consensus sequence generation, and sequence similarity and conservancy analysis. FastPCR performs efficient and complete detection of various repeat types with visual display. FastPCR allows for sequence file batch processing, which is essential for automation. The software is available for download at https://primerdigital.com/fastpcr.html, and an online version is available at https://primerdigital.com/tools/pcr.html.
Key words PCR primer and probe design, PCR design software, Nucleic acid amplification technologies, Loop-mediated isothermal amplification, In silico PCR primer or probe search, Bioinformatic genome analysis
1
Introduction The polymerase chain reaction (PCR) is a fundamental technique in genetic engineering and is the most important practical molecular technique for the research laboratory. The principle of this technique has been further applied in several other simple or complex approaches, known as nucleic acid amplification technologies (NAAT) [1, 2]. The utility of PCR is dependent on identifying
Chhandak Basu (ed.), PCR Primer Design, Methods in Molecular Biology, vol. 2392, https://doi.org/10.1007/978-1-0716-1799-1_16, © Springer Science+Business Media, LLC, part of Springer Nature 2022
specific primer sequences and designing PCR-efficient primer(s). Primer design is a critical step in all types of DNA amplification methods and is necessary to ensure specific and efficient amplification of a target sequence [3, 4]. Although many online tools and commercial bioinformatic software programs are currently available, primer design for DNA amplification methods is still not as convenient and practical as it might be for routine use [5]. A variety of thermocycling and isothermal techniques currently exist for amplification of nucleic acids [6–8]. Thermocycling techniques, such as PCR, use temperature cycling to drive repeated cycles of DNA synthesis, so that large amounts of new DNA are synthesized in proportion to the original amount of template DNA. A number of isothermal techniques have also been developed that do not rely on thermal cycling to control the amplification reaction. One such method is Loop-mediated Isothermal Amplification (LAMP) [6, 9], in which the template DNA is mixed with oligonucleotide primers and a polymerase with high strand displacement activity. The mixture is kept at a constant temperature of 60–75 °C. Other methods of isothermal DNA amplification [10, 11] likewise depend on the strand displacement activity of specific DNA polymerases. Such methods include strand displacement amplification (SDA), helicase-dependent amplification (HDA), recombinase polymerase amplification (RPA), rolling circle amplification (RCA), transcription-mediated amplification (TMA), self-sustained sequence replication (3SR), and nucleic acid sequence-based amplification (NASBA) [2]. Each of these targeted amplification strategies requires the use of oligonucleotide primers. The amplification process results in an exponential accumulation of amplicons that include the oligonucleotide primers at their 5′ ends and contain newly synthesized copies of the sequences located between the primers.
The length and composition of the primer will depend on many factors, including melting temperature (Tm) and source and composition of the primer. For example, depending on the complexity of the sequence, the primer usually contains 12–50 nucleotides, although it may also contain more or fewer nucleotides. Targeted sequencing of genomic regions is a widespread practice in biomedical research. There are several methods for targeted sequencing, each of which is suitable for a specific task. Targeted enrichment based on PCR is a key step, as different targeting regions can be freely combined or excluded by the selection of PCR primers. The main difficulty in enrichment based on PCR is to ensure representative amplification of all targeted regions and uniform coverage of subsequent sequencing as a result. This requires careful primer design, which considers conditions such as uniformity of primer length, comparable thermodynamic
PCR Primer and Probe Design and Oligonucleotide Assembly and Analysis
parameters, and similar melting temperatures. Combining several primers in the same reaction inevitably creates the possibility of undesirable interactions between primers, which negatively affect amplification itself and, therefore, the coverage of the sequence. Thus, an important optimization step is the allocation of primers to various pre-amplification and multiplex PCR reactions to avoid unwanted primer interactions. The key to every oligonucleotide-based assay is the specific binding between the oligonucleotides and their target DNA. The specificity of oligonucleotides is of great importance in nucleic acid technologies such as DNA amplification and detection [5]. Therefore, accurate determination of the melting temperature of both complementary DNA duplexes and duplexes with mismatches is necessary to optimize the design of primers and probes in silico using machine learning based on an extensive experimental dataset of DNA/DNA duplexes. The adaptation of PCR for different applications has made it necessary to develop new criteria for PCR primer and probe design to cover uses such as multiplex PCR, microarray analysis, Insulated Isothermal PCR (iiPCR) (https://www.genereach.com/index.php?func=technology) [12], Xtreme Chain Reaction (XCR) (https://fluoresentric.com/principal-of-xcr/), NanoString multiplex analysis for the nCounter platform (https://www.nanostring.com/scientific-content/technology-overview/ncounter-technology), Next-Generation Targeted Sequencing (NGS) based on PCR amplification, and related methods that require the use of mixtures of multiple oligonucleotides (primers or probes) in one tube. In developing FastPCR and the online Java Web tools [13–16], our aim was to create practical and easy-to-use software for routine manipulation and analysis of sequences for most PCR applications.
The parameters adopted are based on our experimental data for efficient PCR and are translated into algorithms to design combinations of primer pairs for optimal amplification.
2
Software, General Information The FastPCR software (https://primerdigital.com/fastpcr.html) can be used with any version of Microsoft Windows. The online version of FastPCR software (https://primerdigital.com/tools/pcr.html) is written in Java with NetBeans IDE (Apache) and requires the Java Runtime Environment (the Java SE 8 Platform) (https://www.oracle.com/java/technologies/javase-downloads.html). The online version can be used with any operating system (a 64-bit OS is preferred).
3
The Interface
3.1 Inputs to FastPCR
The software contains menus, toolbars, a ribbon, and three text editors. The ribbon is designed to help the user quickly find the commands required to complete a task. Commands are organized in logical groups, which are collected under tabs (Fig. 1). Each tab relates to a type of activity, such as “PCR Primer Design,” “in silico PCR,” or “Primer Test.” Getting started with a basic project in FastPCR is as easy as opening a new or existing text file, performing copy–paste, or simply typing. There are three independent text editors under different tabs: “General Sequence(s),” “Additional sequence(s) or predesigned primers (probes) list,” and “Result report.” The first two text editors are used for loading sequences for analysis; the “General Sequence(s)” text editor is designed for working with the project sequences. The “Additional sequence(s) or predesigned primers (probes) list” text editor can be used for special and additional sequences, such as predesigned primers, multiple query sequences, or numerical input.
Fig. 1 FastPCR sequence editor and user interface
3.2
Program Output
FastPCR automatically generates results in tabulated format in the third text editor, “Result report.” This allows transfer to a Microsoft Excel sheet via copy–paste. Alternatively, the output can be saved as an .xls or .rtf file, which is compatible with Excel or Open Office. The primer design output comprises a list of primers, a set of primer pair sequences with their theoretical PCR products, and, for multiplex PCR, the result of the calculation of multiple PCR primers for the given target sequences. In addition, the output shows the optimal annealing temperature (Ta) for each primer pair, the size of each PCR product, and complete information for each designed primer and for the multiplex PCR product set.
3.3
Sequence Entry
The sequence data file is prepared using a text editor (Notepad++, WordPad, etc.) and saved in ASCII as text/plain format or Rich Text Format (.rtf). The program accepts either a single sequence or multiple separate DNA sequences in FASTA (http://blast.ncbi.nlm.nih.gov/blast/fasta.shtml), tabulated format (two or three columns from an Excel sheet or table), EMBL (https://www.ebi.ac.uk/services) [17], MEGA [18], GenBank and MSF (https://www.ncbi.nlm.nih.gov/books/NBK44863/), or simple alignment formats. In tabulated format, the software reads the first two columns as the sequence name and the sequence. With three or four columns, the software interprets the entries as primer pairs and a probe. The template length is not limited. The FastPCR clipboard allows the user to copy text or tables from Microsoft Office documents, Excel worksheets, or other programs and to paste them into another Office document. It is important that all target sequences are prepared in the same format. The user can type or import from file(s) into the “General Sequence(s)” or “Additional sequence(s) or predesigned primers (probes) list” editors. When a file is opened in FastPCR, users have several options for how the file opens. The user can open the original file as read-only for work in the text editors, or open it to memory without loading it into the text editors, which allows the user to open larger file(s) of up to 300 Mb. To work with files in a directory, the user points to any file in this directory, selects a list of files, or directs the program through a special menu to use all the current files in this directory. The program will open each file while executing the task without loading it into the text editor. Additionally, the user can open all files from the selected folder, and the program will join all files in the text editor.
For example, this feature can be applied for converting all files from a selected folder to a single file of the list of FASTA sequences. Alternatively, this feature allows splitting FASTA sequences to individual files in the selected folder.
The program accepts either a single sequence or multiple separate DNA sequences in FASTA (http://blast.ncbi.nlm.nih.gov/blastcgihelp.shtml) or text file format (tab delimited). The FASTA format is preferred; it is simply the raw sequence preceded by a definition line. The definition line begins with a “>” sign that can optionally be followed immediately (with no intervening space) by a sequence name of any length and any number of words. There can be many sequences listed in the same file. The format requires that a new sequence always starts with a new “>” symbol. Degenerate DNA sequences are accepted as IUPAC code (https://www.qmul.ac.uk/sbcs/iubmb/misc/naseq.html), which is an extended vocabulary of 11 letters that allows the description of ambiguous DNA code. Each letter represents a combination of one or several nucleotides: M = (A/C), R = (A/G), W = (A/T), S = (G/C), Y = (C/T), K = (G/T), V = (A/G/C), H = (A/C/T), D = (A/G/T), B = (C/G/T), N = (A/G/C/T), U = T, and I (inosine). The user can type or import data from file(s) into the “General Sequence(s)” or “Additional sequence(s) or predesigned primers (probes) list” editors. In FastPCR software, users have several options for how to open a file when starting the program. The user can open the original file as read-only to work with the text editors or open a file to memory without loading it into the text editors, which allows the user to open larger file(s). For genome analysis, the user can open all files from a selected folder, and the program will open each file while executing the task without loading it into a text editor. When a sequence file is open, the FastPCR software displays information about the opened sequences and the sequence formats. The information status bar shows the number of sequences, total sequence length (in nucleotides), nucleotide composition, and the purine, pyrimidine, and GC content.
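The FASTA layout and the IUPAC ambiguity codes described above can be illustrated with a short Python sketch. It is not FastPCR code: the function names `parse_fasta` and `expand_degenerate` are hypothetical, and inosine (I) is deliberately left out of the lookup table because the text does not state how it should be expanded.

```python
from itertools import product

# IUPAC codes as listed in the text; U is treated as T.
# Inosine (I) is omitted on purpose (its expansion is not specified here).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}


def parse_fasta(text):
    """Return a list of (name, sequence) tuples from FASTA-formatted text."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new ">" line always starts a new record.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def expand_degenerate(seq):
    """List every unambiguous sequence encoded by a degenerate oligonucleotide."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in seq.upper()))]
```

For example, `expand_degenerate("GGRY")` yields four sequences, because R and Y each stand for two bases. The number of variants grows multiplicatively with each ambiguous position, so full expansion is only practical for oligonucleotides with few degenerate sites.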
When saving a file from the current text editor, the user clicks the desired file format to save in, such as Rich Text Format (.rtf), Excel worksheet (.xls), or text/plain format (.txt).
4
The PCR Primers or Probe Design Analyze Options
4.1 PCR Primer Design Generalities
Primer design is one of the key steps for successful PCR. For PCR applications, primers are usually 18–35 bases in length and should be designed such that they have complete sequence identity to the desired target fragment to be amplified. The parameters, controllable either by the user or automatically, are primer length (12–500 nt), melting temperature (Tm) for short primers (calculated by nearest neighbor thermodynamic parameters), the theoretical primer PCR efficiency (quality, in %) value, primer GC content, 3′ end melting temperature, preferable 3′ terminal
Table 1 Default primer design selection criteria

Criteria                                      Default
Length (nt)                                   20–23
Tm range (°C) (a)                             53–55
Tm 12 bases at 3′-end (a)                     25–40
3′-end composition (5′-nnn-3′)                swh, ssw, wsh, sww, www
Sequence linguistic complexity (LC, %) (b)    >75
Sequence quality (PQ, %)                      >75

(a) Nearest neighbor thermodynamic parameters, SantaLucia [20]
(b) Sequence linguistic complexity measurement was performed using the alphabet-capacity l-gram method
nucleotide sequence composition in degenerate formulae, and added sequence tags at the 5′ termini. The other main parameters used for primer selection are the general nucleotide structure of the primer, such as linguistic complexity (nucleotide arrangement and composition, in %); specificity; the melting temperature of the entire primer and the melting temperature at the 3′ and 5′ termini; self-complementarity; and secondary (nonspecific) binding. The software can dynamically optimize the primer length for the entered parameters. All PCR primer (probe) design parameters are flexible and changeable according to the specifics of the analyzed sequence and task. Primer pairs are analyzed for cross-hybridization, specificity of both primers, and, optionally, for similar melting temperatures. Primers with balanced melting temperatures (within 1–3 °C of each other) and balanced thermodynamic free energy (dG) (within 1–3 kcal/mol of each other) are desirable but not mandatory. The default primer design selection criteria are shown in Table 1. It is possible to use predesigned primers or probes, or, alternatively, predesigned primers can act as references for the design of new primers. The program accepts a list of predesigned oligonucleotide sequences and checks the compatibility of each primer with a newly designed primer or probe. We consider the GC content parameter for primer evaluation to be outdated and unnecessary. The basic thermodynamic parameters of a primer (the thermodynamic free energy and the melting temperature) already include GC content indirectly [19]. Therefore, this parameter was removed from the user's control.
4.2 Melting Temperature Calculation
The melting temperature (Tm) is defined as the temperature at which half the DNA strands are in a double-helical state and half are in the “random-coil” state. The Tm for short oligonucleotides with normal or degenerate (mixed) nucleotide combinations is calculated in the default setting using nearest neighbor thermodynamic parameters [20, 21]. The Tm is calculated using a formula
based on nearest neighbor thermodynamic theory with unified dS and dH parameters:

$$T_m\,(^{\circ}\mathrm{C}) = \frac{dH}{dS + R \ln\left(\frac{c}{f}\right) + 0.368\,(L-1)\ln\left([\mathrm{K}^{+}]\right)} - 273.15$$
where dH is the enthalpy of helix formation (cal/mol), dS is the entropy of helix formation (cal/(K·mol)), R is the molar gas constant (1.987 cal/(K·mol)), c is the nucleic acid molar concentration (250 × 10⁻⁹ M), [K+] is the salt molar concentration (default value 0.05 M), L is the oligonucleotide length, and f = 4 when the two strands are different and f = 1 when self-hybridization takes place. The Tm for mixed bases is calculated by averaging the nearest neighbor thermodynamic parameters (enthalpy and entropy values) at each mixed site; the extinction coefficient is similarly predicted by averaging the nearest neighbor values at mixed sites. Mismatched pairs can also be taken into account, since parameters are available for DNA/DNA duplexes with mismatches and for dangling ends (unmatched terminal nucleotides) [22]. The Tm for primer (probe) self- or cross-dimers and for in silico PCR experiments with oligonucleotides that have mismatches to the target is calculated using values for the thermodynamic parameters for a nucleic acid duplex.
4.3 Primer Quality (Virtual PCR Efficiency) Determination
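The melting-temperature formula of Subheading 4.2 underlies the Tm values used throughout this subheading. As a minimal sketch, it can be evaluated in Python once dH and dS have been summed over the nearest-neighbor steps of a duplex. The sketch assumes that the salt term 0.368(L − 1)ln[K+] is an entropy correction and therefore sits in the denominator, as in the unified nearest-neighbor model. The function name, argument names, and the example dH and dS values are illustrative assumptions, not FastPCR internals.

```python
import math

R = 1.987  # molar gas constant, cal/(K*mol)


def melting_temperature(dH, dS, length, c=250e-9, k_conc=0.05,
                        self_hybridizing=False):
    """Tm in degrees Celsius from summed nearest-neighbor parameters.

    dH     -- enthalpy of helix formation, cal/mol (negative for a stable duplex)
    dS     -- entropy of helix formation, cal/(K*mol)
    length -- oligonucleotide length L in nucleotides
    c      -- nucleic acid molar concentration (default 250 nM, as in the text)
    k_conc -- monovalent salt molar concentration (default 0.05 M, as in the text)
    """
    f = 1 if self_hybridizing else 4
    denominator = (dS
                   + R * math.log(c / f)
                   + 0.368 * (length - 1) * math.log(k_conc))
    return dH / denominator - 273.15


# Illustrative (made-up) totals for a 20-mer: dH = -150 kcal/mol, dS = -400 cal/(K*mol)
example_tm = melting_temperature(-150000.0, -400.0, 20)
```

With these made-up totals the result is roughly 57 °C. Raising the salt concentration makes the logarithmic term less negative and therefore raises the calculated Tm, which matches the expected stabilizing effect of monovalent cations.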
The primer nucleotide composition and the Tm of the 12 bases at the 3′-terminal position of the primers are important factors for PCR efficiency. The composition of the sequence at the 3′ terminus is important; primers with two terminal C/G bases are recommended for increased PCR efficiency [23]. Nucleotide residues C and G form a strong pairing structure in the duplex DNA strands. Stability at the 3′ end in primer–template complexes will improve polymerization efficiency. More than 3 G's or C's should be avoided in the last six bases at the 3′ end of the primer. The Tm of the last 12 bases at the 3′ end of the hybridizing part of the primer should preferably not exceed 42 °C; the Tm of the last 12 bases at the 5′ end of the hybridizing part of the primer should preferably be at least 42 °C. Polynucleotide stretches should be avoided; runs of three or more guanine residues in particular can cause problems due to intermolecular stacking. Primers with Tm in the range of 55–60 °C generally produce the best results. We specify an abstract parameter called Primer Quality (PQ) that can help estimate the efficiency of primers for PCR. PQ is calculated as the lowest value of the following parameters: total sequence complexity and the Tm of the whole primer and of its 3′ and 5′ termini. Self-complementarity, which gives rise to potentially stable dimer and hairpin structures, and long runs of a single base reduce the final value. PQ is intended to describe the likelihood of PCR success for each primer; this value varies from 100% (best primer) to 0% (worst primer).
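Two of the 3′-end rules above are simple enough to check directly. The following Python sketch is a hypothetical helper, not FastPCR's PQ calculation: it only flags primers with more than three G/C bases among the last six bases and primers containing a run of three or more guanines. The function name and the threshold arguments are assumptions made for illustration.

```python
import re


def three_prime_warnings(primer, max_gc_last6=3, max_g_run=2):
    """Return a list of warnings for two composition rules from the text."""
    primer = primer.upper()
    warnings = []

    # Rule 1: no more than max_gc_last6 G/C bases in the last six 3' bases.
    gc_last6 = sum(base in "GC" for base in primer[-6:])
    if gc_last6 > max_gc_last6:
        warnings.append(f"{gc_last6} G/C bases in the last six 3' bases")

    # Rule 2: no guanine run longer than max_g_run anywhere in the primer.
    if re.search("G{%d,}" % (max_g_run + 1), primer):
        warnings.append(f"guanine run longer than {max_g_run}")

    return warnings
```

An empty list means that neither rule is violated; it does not imply a high PQ, since PQ also depends on linguistic complexity, Tm values, and self-complementarity.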
To meet multiplexing demands, the program can select the best primers within an optimal temperature range, which allows the design of qualified primers or probes for any target sequence with any GC and repeat content. PQ values ≥80 allow for rapid choice of the best PCR primer pair combination. No adverse effects on the reproducibility of PCR amplification due to modification of the reaction buffer, the thermostable polymerase chosen, or variations in annealing temperature have been observed using primers with high PQ.
4.4 Hairpin (Loop) and Dimer Formation
Primer dimers involving one or two sequences may occur in a PCR reaction. The FastPCR tool screens out intra- and inter-oligonucleotide interactions before generating a primer list and primer pair candidates. It is very important for PCR efficiency that the production of stable and inhibitory dimers is avoided. In particular, complementarity at the 3′ ends of primers, from where the polymerase will extend, should be avoided. Stable primer dimer formation is very effective at inhibiting PCR, since the dimers formed are amplified efficiently and compete with the intended target. Primer dimer prediction is based on analysis of ungapped local alignment and the stability of both the 3′ end and the central part of the primers. Primers will be rejected when they have the potential to form stable dimers based on nucleotide composition and the presence of at least 5 G/C bases at the 3′ end. The tools calculate the Tm for primer dimers with mismatches for pure, mixed, or modified (inosine, uridine, or locked nucleic acid) bases using averaged nearest neighbor thermodynamic parameters provided for DNA/DNA duplexes [20, 21, 24]. In addition to Watson–Crick base pairing, a variety of other hydrogen bonding configurations are possible that FastPCR can detect, such as G-quadruplexes or G-T base pairs. Mismatch stability is as follows (in order of decreasing stability): G-C > A-T > G·G > G·T ≥ G·A > T·T ≥ A·A > T·C ≥ A·C ≥ C·C. Guanine is the most universal base, since it forms the strongest base pair and the strongest mismatches. On the other hand, cytosine is the most discriminating base, since it forms the strongest pair and the three weakest mismatches [25]. Therefore, the tools also assess stable guanine mismatches (G-G, G-T, and G-A). Guanine-rich nucleic acid sequences can fold into four-stranded DNA structures that contain stacks of G quartets.
These quadruplexes can be formed by the intermolecular association of two or four DNA molecules, by dimerization of sequences that contain two G bases, or by the intramolecular folding of a single strand containing four blocks of G. FastPCR predicts the presence of putative G-quadruplexes in primer sequences. Intramolecular guanine-quadruplex-forming sequences are detected according to the formula d(G3+ N1–7 G3+ N1–7 G3+ N1–7 G3+), where N is any nucleotide base (including guanine) [22, 26]. The gap sequences (N1–7) may have varying lengths, and a relatively stable quadruplex structure may still be
formed with a loop seven bases long. However, increasing the length of the gap generally leads to a decrease in structure stability. It is also possible for one of the gaps to be zero length when there are long polyguanine tracts of >6 bases.
4.5 Secondary Nonspecific Binding Test; Alternative Amplification
5
Oligonucleotide specificity is one of the most critical factors for PCR efficiency. Optimal primers should hybridize only to the target sequence, particularly when complex genomic DNA is used as the template. Amplification problems can arise when primers anneal to interspersed repetitive sequences or to homologous sequences [27, 28]. Alternative and unexpected product amplification can also occur when primers are complementary to such repeats or to homologous sequences. This is unlikely when primers have been designed using specific DNA sequences. However, the generation of interspersed inverted repetitive sequences is exploited in the Random Amplified Polymorphic DNA (RAPD) [29, 30] and the Inter-Simple Sequence Repeat (ISSR) methods [31, 32]. Similar PCR-based DNA fingerprinting techniques, such as Inter-Retrotransposon Amplification Polymorphism (IRAP), Retrotransposon-Microsatellite Amplification Polymorphism (REMAP) [33, 34], Inter-MITE Polymorphism (IMP) [35], Inter-SINE Amplification Polymorphism (ISAP) [36], and Inter-Primer Binding Site (iPBS) amplification polymorphism [37], have exploited these highly abundant interspersed repeats as the target [38–42]. However, primers complementary to interspersed repetitive sequences may produce many nonspecific bands in a single-primer amplification and compromise the performance of targeted PCRs [42, 43]. A homology search of the primer sequence, for example, using ‘blastn’ against all sequences in GenBank, will determine whether the primer is likely to interact with interspersed repeats. By default, FastPCR performs a nonspecific binding test for each given sequence. Additionally, the software allows this test to be performed against a reference sequence or sequences (e.g., BAC, YAC) or against a database of choice. Primers that bind to more than one location on the current sequences will be rejected. Although the nonspecific primer binding test is performed by default for all primers, the user has full control over this function [42].
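The idea of rejecting primers that bind to more than one location can be illustrated with exact string matching on both strands. This is a deliberately simplified Python sketch with hypothetical function names. The text does not describe FastPCR's own test at this level of detail, and a realistic check would also tolerate mismatches and weigh 3′-end stability.

```python
def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_binding_sites(primer, template):
    """Count exact primer matches on either strand of the template."""
    primer, template = primer.upper(), template.upper()
    count = 0
    # A set avoids counting a palindromic primer twice at the same site.
    for target in {primer, reverse_complement(primer)}:
        start = template.find(target)
        while start != -1:
            count += 1
            start = template.find(target, start + 1)  # allow overlapping hits
    return count


def is_specific(primer, template):
    """True when the primer has exactly one exact binding site."""
    return count_binding_sites(primer, template) == 1
```

A primer that occurs once on the given strand and once as its reverse complement is counted twice, because it could prime synthesis from both locations.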
Methods FastPCR provides various execution features once the input files are selected or the sequence is copy-pasted to the General Sequence(s) text editor. Figure 1 shows a design from a user’s perspective for primer design.
Fig. 2 Result of the multiplex set for the mPCR design, shown in the Result report
5.1 Execution of the Selected Task
The user selects the ribbon tab for the required task; the program performs only the selected task. To execute the current task, the user either presses F5 or clicks the corresponding toolbar button (Fig. 2). Once the task is complete, the result is shown in the Result report text editor. Figure 2 shows a sample result window.
5.2 PCR Primer Design Options
To perform PCR analysis, select PCR -> PCR options from the menu bar. The PCR Primer or Probe Design Options dialog box appears (Fig. 3). The PCR Primer or Probe Design Options dialog box allows the user to designate basic parameters for the PCR reaction and the primers that are generated. The “PCR Primer or Probe Design Options” dialog contains various execution options for selection of PCR type and the most important PCR parameters. Figure 3 shows “PCR Primers or Probe Design Options” on the option panel. Once the user selects any attribute, the option attribute value field shows the default attributes value, which can then be modified. “PCR Primers or Probe Design Options” affects all sequences.
Fig. 3 “PCR Primers or Probes Design Options” window
For individual PCR primer design options for each sequence, the user can type special commands in the header of the sequence. Typically, the user does not need commands to manage PCR primer design; they are optional and intended only for advanced tasks. The user can type the help command ‘/?’ in the text editor, and the software replaces it with the default global parameters for primer design:
// -ln21-25 -tm54-56 -3tm25-40 -q75 -lc75 -npr5000 -c5[nn] -c3[swh ssw wsh sww www] -dmr=1 (20-5000) //
where
‘-ln21-25’ sets the range of primer length (21–25 bases);
‘-tm54-56’ sets the range of primer Tm (54–56 °C);
‘-3tm25-40’ sets the range of primer Tm at the 3′ end (25–40 °C);
‘-npr5000’ sets the maximum number of primers designed for each target (5000);
‘-dmr=1’ sets the sensitivity of primer dimer detection: the default value is 1 (the strictest criterion); the higher the value, the lower the detection sensitivity; with a value of 0, the program does not check for primer dimers; a value of 2 is optimal, as the most realistic;
‘(20-5000)’ sets the boundaries of the PCR amplicon size (here, 20–5000 base pairs); an exact size is also allowed;
‘-c5[nn]’ indicates that the primer has no specific sequence pattern at the 5′ end;
‘-c3[swh ssw wsh sww www]’ restricts primer 3′ ends to these patterns, with three bases per pattern.
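To make the structure of the option string concrete, the sketch below parses the default parameter line shown above into a Python dictionary. It is a hypothetical reader written for illustration only, not FastPCR's parser. It handles just the options that appear in the text, and it stores ‘-q’ and ‘-lc’ as plain integers without interpreting them, since the text does not explain them (they presumably correspond to the PQ and LC thresholds in Table 1).

```python
import re


def parse_primer_options(header):
    """Parse a FastPCR-style option string into a dict (illustrative only)."""
    opts = {}

    # Range options: -ln21-25, -tm54-56, -3tm25-40
    for key, low, high in re.findall(r"-(ln|tm|3tm)(\d+)-(\d+)", header):
        opts[key] = (int(low), int(high))

    # Single-value options: -q75, -lc75, -npr5000, -dmr=1
    for key, value in re.findall(r"-(q|lc|npr|dmr)\s*=?\s*(\d+)", header):
        opts[key] = int(value)

    # End-pattern options: -c5[nn], -c3[swh ssw wsh sww www]
    for end, patterns in re.findall(r"-c([35])\s*\[([^\]]*)\]", header):
        opts["c" + end] = patterns.split()

    # Amplicon size boundaries: (20-5000)
    size = re.search(r"\((\d+)-(\d+)\)", header)
    if size:
        opts["size"] = (int(size.group(1)), int(size.group(2)))

    return opts


defaults = parse_primer_options(
    "-ln21-25 -tm54-56 -3tm25-40 -q75 -lc75 -npr5000 "
    "-c5[nn] -c3[swh ssw wsh sww www] -dmr=1 (20-5000)"
)
```

Here `defaults["ln"]` is the tuple `(21, 25)` and `defaults["c3"]` is the list of five 3′-end patterns. Keeping ranges as tuples and patterns as lists makes it straightforward to compare candidate primers against the parsed limits.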
Adding non-template DNA sequences to primer ends: a sequence is added to the 5′ end with the command ‘-5e[NN]’ or to the 3′ end with the command ‘-3e[NN]’, where ‘NN’ is a sequence of one or more bases. For example, ‘-F5e[CGACG] -R5e[TTTTTT]’ adds the sequence ‘CGACG’ to forward primers and the sequence ‘TTTTTT’ to reverse primers at their 5′ ends. For tasks in which several allelic variants must be specified with individual added tail sequences, the forward slash character “/” is used as a separator between tail variants. For example, for biallelic scoring of single-nucleotide polymorphisms (SNPs) and insertions and deletions (Indels) at specific loci in competitive allele-specific PCR (KASP) genotyping assays (https://www.biosearchtech.com/support/education/kasp-genotyping-reagents), two (up to four) variants of the added tail sequences are given as follows: -p5e[GAAGGTGACCAAGTTCATGCT/GAAGGTCGGAGTCAACGGATT].
5.3 Examples for Primer Selection Region
The user can specify, individually for each sequence, multiple locations for both forward and reverse primer design using ‘[’ and ‘]’ inside the sequence. The software allows multiple independent locations for both forward and reverse primer design inside each sequence, and PCR design is performed independently for different targets. Multiplex PCR design can be performed simultaneously within a single sequence with multiple amplicons, for different sequences, or for combinations of both, such as all possible combinations of ‘(’ and ‘)’ inside the sequence(s). By default, the software designs primers within the entire sequence length. The excluded region list denotes locations where primers and probes must not bind. Multiple excluded regions may be defined per sequence. This feature can be used to avoid unwanted regions (such as introns and SNPs). For example, to exclude bases 500–1000 from target selection, the following command should be used: -exclude500-1000. An alternative is to mark the start and end of the excluded region with two ‘/’ signs (this can be done multiple times):
>example
[gtcccgagaacctgagtatgcatcacccggatcgcttcttcc/gggaggtgttggggg/ctatctcggtgttttctgactgcttggcttccgcgagtcattgccatgctagcgta]
[attgcaataaccggagcgagatgatgcacc/ccccc/ccttgacaagcgccaataccacgcactattaagagtaaaaaaaa]
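The inline markup in the example above can be turned into coordinates with a few lines of Python. The sketch assumes one particular reading of the notation: ‘[’ and ‘]’ delimit a primer design region, and each pair of ‘/’ signs delimits an excluded segment. The function name and the 0-based, end-exclusive coordinate convention are assumptions made for illustration; FastPCR's own handling may differ.

```python
def parse_annotated(seq):
    """Split an annotated sequence into plain sequence and coordinate lists.

    Returns (plain_sequence, target_regions, excluded_regions), where regions
    are (start, end) tuples in 0-based, end-exclusive coordinates on the
    plain sequence.
    """
    plain = []
    targets, excluded = [], []
    target_start = excl_start = None
    for ch in seq:
        if ch == "[":
            target_start = len(plain)
        elif ch == "]":
            if target_start is not None:
                targets.append((target_start, len(plain)))
                target_start = None
        elif ch == "/":
            if excl_start is None:
                excl_start = len(plain)       # first slash opens the region
            else:
                excluded.append((excl_start, len(plain)))
                excl_start = None             # second slash closes it
        elif not ch.isspace():
            plain.append(ch)
    return "".join(plain), targets, excluded
```

For the toy input `"aa[cc/gg/tt]aa"` the function returns the plain sequence `"aaccggttaa"`, one target region `(2, 8)`, and one excluded region `(4, 6)`.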
5.4 PCR Primer Design
The PCR primer design algorithm generates a set of primers with a high likelihood of success in any amplification protocol. All PCR primers designed by FastPCR can be used for PCR, isothermal amplification, or sequencing experiments. The program can generate either long oligonucleotides or PCR primers for the amplification of gene-specific DNA fragments of user-defined length. FastPCR provides a flexible approach to designing primers for many applications and for linear and circular sequences. It checks whether primers or probes have secondary binding sites in the input sequences that may yield additional PCR products. The selection of the optimal target region for the design of long oligonucleotides is performed in the same way as for PCR primers. The basic parameters in primer design are also used as a measure of oligonucleotide quality, and the thermodynamic stability of the 3′ and 5′ terminal bases is evaluated. Primer pairs are suggested, and selection of the best pairs is possible. The user can vary the product size or design primer pairs for the whole sequence without specifying parameters by using default or predesigned parameters. The predesigned parameters are specified for different situations. These include sequences with low GC content, long-distance PCR, XCR, TaqMan probe design, microarray or degenerate sequences, or manual input. For thermodynamic compatibility of primers, the user can specify one of two options. The default is primer pair compatibility based on similar melting temperatures for both primers (Primer design with Tm priority), while the second option is based on similar Gibbs free energy (Primer design with dG priority). Designing primers with the same dG yields more efficient primer pairs, because matching Tm values is a less accurate approach than matching dG values. Results show the sorted list of best primer candidates and all compatible primer pairs that are optimal for PCR.
The program generates primer pairs (and probes, or both primers with probe) from the input sequences and shows the optimal annealing temperature for each primer pair and the sizes of PCR products together with information for each designed primer. Results are generated by the program and show the suggested primers and primer pairs in tabulated format for Excel or Open Office. The spreadsheets show the following properties: automatically generated primer name, primer sequence, sequence location, direction, length, melting temperature, GC content, molecular weight, molar extinction coefficient, linguistic complexity, and PQ. For compatible primer pairs, the annealing temperature and PCR product size with the Tm of the PCR product are also provided.
5.5 Multiplex PCR Primer Design
Multiplex PCR is an approach commonly used to amplify several DNA target regions in a single reaction. The simultaneous amplification of many targets reduces the number of reactions required; multiplex PCR thus increases throughput efficiency. The design of
multiplex PCR assays is based on the nonrecursion fast method, with the software performing checks on product size compatibility, thermodynamic compatibility, and cross-dimer interaction for all primers. To achieve uniform amplification of the targets, the primers must be designed to bind with equal efficiencies (thermodynamic compatibility) to their targets. FastPCR can quickly design several sets of multiplex PCR primers for all the input sequences, multiplex targets, or both within each sequence. In practical terms, designing primers with almost identical Gibbs free energy and Tm and with optimal annealing temperatures (Ta) is preferable. The Tm of the PCR products is also important as these are related to annealing temperature values. The Tm of a PCR product directly depends on its GC content and length; short products are more efficiently amplified at lower PCR annealing temperatures than long products (3000 bp, 60–72 °C). The optimal annealing temperature for PCR is calculated directly as the value for the primer with the lowest Tm ($T_m^{\min}$) and taking into account the length of the PCR fragment ($L$): $T_a\,(^{\circ}\mathrm{C}) = T_m^{\min} + \ln L$ [14]. For most multiplex PCRs, there is usually a small variation (up to 3–5 °C) between the optimal annealing temperatures of all primer pairs and PCR products. The annealing temperature must be optimal to maximize the likelihood of amplifying the target genomic sequences while minimizing the risk of nonspecific amplification. Further improvements can be achieved by selecting the optimal set of primers that maximize the range of common Gibbs free energy. An alternative way to design compatible multiplex PCR primer pairs is to use predesigned primers as a compatible reference list for the design of new primers. The user can also select input options for the PCR products, such as the minimum product size differences between the amplicons. The user can set primer design conditions either individually for each given sequence or use common values.
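The annealing temperature formula above, together with the checks on annealing temperature spread and product size difference, can be sketched as follows. This is a minimal illustration and not FastPCR code; the function names and the default thresholds (5 °C spread, 10 bp size difference) are assumptions chosen to mirror the values discussed in this section.

```python
import math

def annealing_temperature(tm_forward, tm_reverse, product_length):
    """Ta (deg C) = lowest primer Tm + ln(L), following the formula in the text."""
    if product_length <= 0:
        raise ValueError("product_length must be positive")
    return min(tm_forward, tm_reverse) + math.log(product_length)

def multiplex_compatible(assays, max_ta_spread=5.0, min_size_diff=10):
    """Check a hypothetical multiplex set.

    assays: list of (tm_forward, tm_reverse, product_length) tuples.
    Returns True when all Ta values lie within max_ta_spread of each other
    and every pair of product sizes differs by at least min_size_diff bp.
    """
    tas = [annealing_temperature(*a) for a in assays]
    sizes = sorted(a[2] for a in assays)
    ta_ok = max(tas) - min(tas) <= max_ta_spread
    size_ok = all(b - a >= min_size_diff for a, b in zip(sizes, sizes[1:]))
    return ta_ok and size_ok

if __name__ == "__main__":
    print(round(annealing_temperature(58.0, 60.0, 500), 2))
    print(multiplex_compatible([(58.0, 60.0, 200), (59.0, 61.0, 350)]))
```

Because the length enters only through its natural logarithm, a tenfold increase in product length raises the calculated Ta by about 2.3 °C.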
The individual setting has a higher priority for PCR primer or probe design than do the general settings. The results include primers for individual sequences, primers compatible together, product sizes, and annealing temperatures. As clear differentiation of the products is dependent on using compatible primer pairs in the single reactions, the program recovers all potential variants of primer combinations for analyses of the chosen DNA regions and provides (in tabular form) their compatibility with information including primer dimers, cross-hybridization, product size overlaps, and similar alternative primer pairs based on Tm. The user may choose the alternative compatible primer pair combinations that provide the desired product sizes. The user can select predesigned primer pairs from a target for their desired type of PCR reaction by changing the filtering
conditions as mentioned above. For example, a conventional multiplex PCR requires differently sized (at least by 10 bp) amplicons for a set of target genes, so the value for the minimum size difference between PCR products can be selected. In addition to the need to avoid amplicons of identical size, multiplex PCR must also minimize the generation of primer dimers and secondary products, which becomes more difficult with increasing numbers of primers in a reaction. To avoid the problem of nonspecific amplification, FastPCR allows the selection of primer pairs that are most likely to produce only the amplicons of the target sequences by choosing sequences that avoid repeats or other motifs. The input can be either a single sequence with a minimum of two internal tasks or many sequences with or without internal tasks. Most parameters on the interface are self-explanatory. Optionally, the user is asked to provide the sequence and select the parameters for oligonucleotide design. On the PCR Primer Design tab, the user chooses Multiplex PCR and sets the limit for multiplex PCR-compatible combinations of primer pair sets (default three sets), the minimal difference between multiplex PCR products (default is 0, which ignores this parameter), and the maximal difference between the Ta of multiplex PCR products (default is 0). After specifying inputs and PCR primer design options, the user can execute the PCR primer design task. Once the primer set design is complete, the result will appear in two Result text editors: PCR primer design result and Multiplexes PCR-compatible pair primers. Figure 2 shows the access to PCR primer design output. The PCR primer design result window displays the individual PCR primer design data, including the primer list and compatible primer pairs for all sequences and their internal tasks for which primers are found. 
The second Multiplexes PCR-compatible pair primers window collects the final search result that is presented as a list of the sets of the compatible primer pairs for multiplex PCR.
5.6 In Silico PCR
Predicting the hybridization of primers and probe to targeted annealing sites is the only way to evaluate PCR products [4]. The last 10–12 bases at the 3′ end of a primer are critical for binding stability; single mismatches can significantly reduce PCR efficiency, and the effect increases with proximity to the 3′ end. FastPCR allows simultaneous testing of a single primer or a set of primers designed for multiplex target sequences or genomes. For in silico PCR, a quick gapless alignment for detection of primer locations on the target sequence is performed by analyses on both strands using a hash index of 7-, 9-, or 12-mers (containing up to one mismatch within) and by calculating the local similarity and the Tm for the primer sequence. The parameters can be set to allow different degrees of mismatches at the 3′ end of the primers. The parameters
for quick alignment may be set; the range is 0–5 mismatches (default 1 mismatch) at the 3′ end of the primer. The program can also handle degenerate primer or probe sequences, including those with 5′ or 3′ tail sequences. Probable PCR products of linear and circular templates can be found using standard or inverse PCR and multiplex PCR or bisulfite-treated DNA sequence. This in silico tool is useful for quickly analyzing primers or probes against target sequences, for determining primer location, orientation, and binding efficiency, and for calculating Tm and Ta in PCR. The user must input the list of preexisting primers into a second Additional sequence(s) or pre-designed primers (probes) list text editor. The number of preexisting primers is not limited; it can be as many as the user requires. The target sequences can be either multiple separate DNA sequences or files opened from a specified folder. The user must specify a directory for input for in silico PCR against the whole genome(s) or a list of chromosomes. The program performs a consistent, file-by-file check of each file for the DNA sequence positions of the primers. In the In silico PCR tab, the user can execute the search task with F5 or, alternatively, can first specify search options such as stringency and PCR product detection options. For the stringency options, the user can specify the number of mismatches that are allowed at the primer 3′ end. Once the search is complete, the result will appear in the In silico PCR Result text editor. The In silico PCR Result text editor reports the primer specificity (locations, including target position, similarity, and Tm), a summary of primer pairs in relation to the PCR template, and detailed information on each primer pair, its length, and Ta. Target-specific primers are shown if found, and the actual targets are listed with detailed alignments between primers and targets (Fig. 4).
5.7 Primer(s) Analyses
Individual primers and primer sets can be evaluated using FastPCR software. The software calculates the primer Tm using default or other formulae for normal and degenerate nucleotide combinations, GC content, extinction coefficient, unit conversion (nmol per OD), mass (μg per OD), molecular weight, linguistic complexity, and primer PCR efficiency. Users can select either DNA or RNA primers with normal or degenerate oligonucleotides or modifications with different labels (e.g., inosine, uridine, or fluorescent dyes). The tools allow the choice of either nearest-neighbor thermodynamic parameters or nonthermodynamic Tm calculation formulae. For locked nucleic acid (LNA) modifications, the four symbols dA = E, dC = F, dG = J, and dT = L are used. Both programs perform analyses as the user types, which allows the user to view the results immediately on screen. The program can also calculate the volume of solvent required to attain a specific concentration from known mass (mg), OD, or moles of dry oligonucleotide. All primers are analyzed for intra- and inter-primer interactions to form dimers. Primer(s) can efficiently hybridize using the 5′ end or middle of the sequences.
Fig. 4 Detailed result of in silico PCR
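A few of the simpler primer properties mentioned above can be sketched in code. The example below is a minimal illustration and not FastPCR's implementation: it uses the Wallace rule, Tm = 2(A+T) + 4(G+C), as one well-known nonthermodynamic Tm formula suitable only for short oligonucleotides, and a deliberately crude 3′-end dimer check whose window of four bases is an assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def gc_content(seq):
    """GC content in percent."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

def wallace_tm(seq):
    """Wallace rule Tm (deg C); a rough estimate for short oligonucleotides only."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

def three_prime_dimer(primer_a, primer_b, window=4):
    """True if the last `window` bases of primer_a are fully complementary
    to some stretch of primer_b (a crude screen; window size is illustrative)."""
    tail = primer_a.upper()[-window:]
    return reverse_complement(tail) in primer_b.upper()

if __name__ == "__main__":
    primer = "AGTGGGTGAGGTTGTGAAATGT"  # forward primer from Fig. 3 of the next chapter
    print(round(gc_content(primer), 1), wallace_tm(primer))
    print(three_prime_dimer(primer, primer))
```

Passing the same primer twice checks for self-dimers; passing two different primers checks for cross-dimers in that orientation only, so both orders should be tested.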
6 Availability
The FastPCR software is available for download at https://primerdigital.com/fastpcr.html, and an online version is available at https://primerdigital.com/tools/pcr.html. YouTube tutorial videos are available at https://www.youtube.com/user/primerdigital. The program manual and files for installation are available on the Internet at https://primerdigital.com/fastpcr/. The online version of the FastPCR software requires the Java Runtime Environment (the Java SE 8 Platform) (https://www.oracle.com/java/technologies/javase-downloads.html) or the OpenWebStart software (https://openwebstart.com/download/), an open-source reimplementation of the Java Web Start technology for the latest Java SE 11 (LTS) or Java SE 16. Users should add the URL (https://primerdigital.com/) of this application to the Exception Site List (https://www.java.com/en/download/help/exception_sitelist.html), which is located under the Security tab of the Java Control Panel (https://www.java.com/en/download/help/appsecuritydialogs.html). Users can download the self-signed certificate file (https://primerdigital.com/j/primerdigital.cer) and import it to “Signer CA” (Certificate Authority) from the Java Control Panel. Finally, users should set “Security Level” to “High” under the Security tab of the Java Control Panel (as shown here: https://primerdigital.com/image/primerdigital_certificate_big.png).
Acknowledgments
This work was supported by the company PrimerDigital Ltd. (Helsinki, Finland) and partly by the Science Committee of the Ministry of Education and Science of the Republic of Kazakhstan (Grant No. AP08855353). The authors wish to thank Derek Ho (The University of Helsinki Language Centre) for editing and proofreading of the manuscript.
References
1. Yasukawa K, Yanagihara I, Fujiwara S (2020) Alteration of enzymes and their application to nucleic acid amplification (Review). Int J Mol Med 46(5):1633–1643. https://doi.org/10.3892/ijmm.2020.4726
2. Gill P, Ghaemi A (2008) Nucleic acid isothermal amplification technologies: a review. Nucleosides Nucleotides Nucleic Acids 27(3):224–243. https://doi.org/10.1080/15257770701845204
3. Bekaert M, Teeling EC (2008) UniPrime: a workflow-based platform for improved universal primer design. Nucleic Acids Res 36(10):e56. https://doi.org/10.1093/nar/gkn191
4. Ye J, Coulouris G, Zaretskaya I, Cutcutache I, Rozen S, Madden TL (2012) Primer-BLAST: a tool to design target-specific primers for polymerase chain reaction. BMC Bioinformatics 13:134. https://doi.org/10.1186/1471-2105-13-134
5. Guo J, Starr D, Guo H, Wren J (2020) Classification and review of free PCR primer design software. Bioinformatics 36(22-23):5263–5268. https://doi.org/10.1093/bioinformatics/btaa910
6. Shirato K (2019) Detecting amplicons of loop-mediated isothermal amplification. Microbiol Immunol 63(10):407–412. https://doi.org/10.1111/1348-0421.12734
7. Mayboroda O, Katakis I, O’Sullivan CK (2018) Multiplexed isothermal nucleic acid amplification. Anal Biochem 545:20–30. https://doi.org/10.1016/j.ab.2018.01.005
8. Kim J, Easley CJ (2011) Isothermal DNA amplification in bioanalysis: strategies and applications. Bioanalysis 3(2):227–239. https://doi.org/10.4155/bio.10.172
9. Tomita N, Mori Y, Kanda H, Notomi T (2008) Loop-mediated isothermal amplification (LAMP) of gene sequences and simple visual detection of products. Nat Protoc 3(5):877–882. https://doi.org/10.1038/nprot.2008.57
10. James A, Macdonald J (2015) Recombinase polymerase amplification: emergence as a critical molecular technology for rapid, low-resource diagnostics. Expert Rev Mol Diagn 15(11):1475–1489. https://doi.org/10.1586/14737159.2015.1090877
11. Qian J, Boswell SA, Chidley C, Lu Z-x, Pettit ME, Gaudio BL, Fajnzylber JM, Ingram RT, Ward RH, Li JZ, Springer M (2020) An enhanced isothermal amplification assay for viral detection. Nat Commun 11(1):5920. https://doi.org/10.1038/s41467-020-19258-y
12. Qiu J, Tsai Y-L, Wang H-TT, Chang H-FG, Tsai C-F, Lin C-K, Teng P-H, Su C, Jeng C-C, Lee P-Y (2012) Development of TaqMan probe-based insulated isothermal PCR (iiPCR) for sensitive and specific on-site pathogen detection. PLoS One 7(9). https://doi.org/10.1371/journal.pone.0045278
13. Kalendar R, Khassenov B, Ramanculov E, Samuilova O, Ivanov KI (2017) FastPCR: an in silico tool for fast primer and probe design and advanced sequence analysis. Genomics 109(3-4):312–319. https://doi.org/10.1016/j.ygeno.2017.05.005
14. Kalendar R, Lee D, Schulman AH (2011) Java web tools for PCR, in silico PCR, and oligonucleotide assembly and analysis. Genomics 98(2):137–144. https://doi.org/10.1016/j.ygeno.2011.04.009
15. Kalendar R, Muterko A, Shamekova M, Zhambakin K (2017) In silico PCR tools for a fast primer, probe, and advanced searching. Methods Mol Biol 1620:1–31. https://doi.org/10.1007/978-1-4939-7060-5_1
16. Kalendar R, Lee D, Schulman AH (2014) FastPCR software for PCR, in silico PCR, and oligonucleotide assembly and analysis. Methods Mol Biol 1116:271–302. https://doi.org/10.1007/978-1-62703-764-8_18
17. Madeira F, Park YM, Lee J, Buso N, Gur T, Madhusoodanan N, Basutkar P, Tivey ARN, Potter SC, Finn RD, Lopez R (2019) The EMBL-EBI search and sequence analysis tools APIs in 2019. Nucleic Acids Res 47(W1):W636–W641. https://doi.org/10.1093/nar/gkz268
18. Kumar S, Stecher G, Li M, Knyaz C, Tamura K (2018) MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol Biol Evol 35(6):1547–1549. https://doi.org/10.1093/molbev/msy096
19. Benita Y, Oosting RS, Lok MC, Wise MJ, Humphery-Smith I (2003) Regionalized GC content of template DNA as a predictor of PCR success. Nucleic Acids Res 31(16):e99. https://doi.org/10.1093/nar/gng101
20. SantaLucia J (1998) A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc Natl Acad Sci USA 95(4):1460–1465. https://doi.org/10.1073/pnas.95.4.1460
21. Allawi HT, SantaLucia J Jr (1997) Thermodynamics and NMR of internal G·T mismatches in DNA. Biochemistry 36(34):10581–10594. https://doi.org/10.1021/bi962590c
22. Guedin A, Gros J, Alberti P, Mergny JL (2010) How long is too long? Effects of loop size on G-quadruplex stability. Nucleic Acids Res 38(21):7858–7868. https://doi.org/10.1093/nar/gkq639
23. Gilson MK, Given JA, Bush BL, McCammon JA (1997) The statistical-thermodynamic basis for computation of binding affinities: a critical review. Biophys J 72(3):1047–1069. https://doi.org/10.1016/S0006-3495(97)78756-3
24. Watkins NE Jr, SantaLucia J Jr (2005) Nearest-neighbor thermodynamics of deoxyinosine pairs in DNA duplexes. Nucleic Acids Res 33(19):6258–6267. https://doi.org/10.1093/nar/gki918
25. SantaLucia J Jr, Hicks D (2004) The thermodynamics of DNA structural motifs. Annu Rev Biophys Biomol Struct 33:415–440. https://doi.org/10.1146/annurev.biophys.32.110601.141800
26. Todd AK, Johnston M, Neidle S (2005) Highly prevalent putative quadruplex sequence motifs in human DNA. Nucleic Acids Res 33(9):2901–2907. https://doi.org/10.1093/nar/gki553
27. Jurka J (1998) Repeats in genomic DNA: mining and meaning. Curr Opin Struct Biol 8(3):333–337. https://doi.org/10.1016/s0959-440x(98)80067-5
28. Kalendar R, Raskina O, Belyayev A, Schulman AH (2020) Long tandem arrays of cassandra retroelements and their role in genome dynamics in plants. Int J Mol Sci 21(8):2931. https://doi.org/10.3390/ijms21082931
29. Welsh J, McClelland M (1990) Fingerprinting genomes using PCR with arbitrary primers. Nucleic Acids Res 18(24):7213–7218. https://doi.org/10.1093/nar/18.24.7213
30. Williams JGK, Kubelik AR, Livak KJ, Rafalski JA, Tingey SV (1990) DNA polymorphisms amplified by arbitrary primers are useful as genetic markers. Nucleic Acids Res 18(22):6531–6535. https://doi.org/10.1093/nar/18.22.6531
31. Sivolap IM, Kalendar RN, Chebotar SV (1994) The genetic polymorphism of cereals demonstrated by PCR with random primers. Cytol Genet 28(6):54–61. https://pubmed.ncbi.nlm.nih.gov/7701604/
32. Zietkiewicz E, Rafalski A, Labuda D (1994) Genome fingerprinting by simple sequence repeat (SSR)-anchored polymerase chain reaction amplification. Genomics 20(2):176–183. https://doi.org/10.1006/geno.1994.1151
33. Kalendar R, Schulman A (2006) IRAP and REMAP for retrotransposon-based genotyping and fingerprinting. Nat Protoc 1(5):2478–2484. https://doi.org/10.1038/nprot.2006.377
34. Kalendar R, Grob T, Regina M, Suoniemi A, Schulman AH (1999) IRAP and REMAP: two new retrotransposon-based DNA fingerprinting techniques. Theor Appl Genet 98(5):704–711. https://doi.org/10.1007/s001220051124
35. Chang RY, O’Donoughue LS, Bureau TE (2001) Inter-MITE polymorphisms (IMP): a high throughput transposon-based genome mapping and fingerprinting approach. Theor Appl Genet 102(5):773–781. https://doi.org/10.1007/s001220051709
36. Seibt KM, Wenke T, Wollrab C, Junghans H, Muders K, Dehmer KJ, Diekmann K, Schmidt T (2012) Development and application of SINE-based markers for genotyping of potato varieties. Theor Appl Genet 125(1):185–196. https://doi.org/10.1007/s00122-012-1825-7
37. Kalendar R, Antonius K, Smykal P, Schulman AH (2010) iPBS: a universal method for DNA fingerprinting and retrotransposon isolation. Theor Appl Genet 121(8):1419–1430. https://doi.org/10.1007/s00122-010-1398-2
38. Kalendar R, Amenov A, Daniyarov A (2019) Use of retrotransposon-derived genetic markers to analyse genomic variability in plants. Funct Plant Biol 46(1):15–29. https://doi.org/10.1071/fp18098
39. Kalendar R, Muterko A, Boronnikova S (2021) Retrotransposable elements: DNA fingerprinting and the assessment of genetic diversity. Methods Mol Biol 2222:263–286. https://doi.org/10.1007/978-1-0716-0997-2_15
40. Kalendar R, Schulman AH (2014) Transposon-based tagging: IRAP, REMAP, and iPBS. Methods Mol Biol 1115:233–255. https://doi.org/10.1007/978-1-62703-767-9_12
41. Hosid E, Brodsky L, Kalendar R, Raskina O, Belyayev A (2012) Diversity of long terminal repeat retrotransposon genome distribution in natural populations of the wild diploid wheat Aegilops speltoides. Genetics 190(1):263–412. https://doi.org/10.1534/genetics.111.134643
42. Kalendar R, Kospanova D, Schulman A (2021) Transposon-based tagging in silico using FastPCR software. Methods Mol Biol 2250:245–256. https://doi.org/10.1007/978-1-0716-1134-0_23
43. Kalendar R, Shustov AV, Seppänen MM, Schulman AH, Stoddard FL (2019) Palindromic sequence-targeted (PST) PCR: a rapid and efficient method for high-throughput gene characterization and genome walking. Sci Rep 9(1):17707. https://doi.org/10.1038/s41598-019-54168-0
Part VII Primer Design for Newer PCR Approaches
Chapter 17
Pyrosequencing Primer Design for Forensic Biology Applications
Kelly M. Elkins
Abstract
The polymerase chain reaction (PCR) is used to copy DNA in vitro for a variety of applications including amplifying a target DNA, mutating a base, adding tags, and sequencing by synthesis applications. Next-generation sequencing (NGS) is a DNA sequencing technology that has been applied to screening cancer and tissue variants, deep sequencing, and gene expression analysis, and more recently, it has been applied to DNA typing for human identification, estimating age, and detecting and differentiating body fluids. Body fluids are normally identified using color tests, microscopy, and immunochromatographic assays. Pyrosequencing is an NGS approach that has been applied to body fluid analysis. The pyrosequencing assays can detect one or several mixed body fluids by analysis of their tissue-specific differentially methylated regions (tDMRs). Here, the process of designing pyrosequencing primers for forensic biology applications is described.
Key words Molecular biology, Polymerase chain reaction (PCR), Primer, Next-generation sequencing (NGS), Pyrosequencing, Sequencing by synthesis (SBS), Methylated DNA, Body fluid analysis
1 Introduction
Pyrosequencing is a sequencing by synthesis (SBS) next-generation sequencing (NGS) technique that was developed by Mostafa Ronaghi and coworkers in 1996 [1]. It has been applied to identifying DNA single-nucleotide polymorphisms (SNPs), mutations, unknown sequence variants, and insertion–deletions (InDels) [2, 3]. It has also been used to analyze DNA extracted from blood to predict age [4, 5] and body fluids to detect and identify them in forensic biology assays [6–11]. Pyrosequencing is a reasonably cheap and fast method for sequencing short segments of DNA using primers designed to be specific for the target region directly upstream of the variable region [2, 3]. In pyrosequencing, light emission signals base addition in the sequencing reaction [2, 3]. Prior to pyrosequencing, a target region of the DNA is amplified with polymerase chain reaction
Chhandak Basu (ed.), PCR Primer Design, Methods in Molecular Biology, vol. 2392, https://doi.org/10.1007/978-1-0716-1799-1_17, © Springer Science+Business Media, LLC, part of Springer Nature 2022
(PCR) primers, one of which is biotinylated [3]. The amplicons are introduced to magnetic streptavidin-coated beads, where they are immobilized through the formation of a biotin–streptavidin complex [3]. Next, the double-stranded DNA fragments are denatured in a basic solution resulting in single strands [3]. Following wash and neutralization steps, a pyrosequencing primer is added. It anneals to the complementary location in the immobilized single-stranded DNA leading to the formation of a double-stranded stretch [3]. The DNA polymerase enzyme attaches to the double-stranded region formed by the primer binding [2, 3]. In the pyrosequencing reaction, each deoxyribonucleotide triphosphate (dNTP) base (dCTP, dGTP, dTTP, and deoxyadenosine-5′-(α-thio)-triphosphate (dATPαS) instead of dATP) is dispensed one at a time in a predetermined pattern [2, 3]. When the dispensed base complements the next base in the single strand, it hydrogen bonds to its complement in the immobilized strand next to the base at the 3′ end of the primer [2, 3]. DNA polymerase attaches the base covalently to the primer, and pyrophosphate (PPi) is released in the reaction [2, 3]. Incorporation of the base starts an enzyme cascade [2, 3]. The ATP sulfurylase enzyme uses the PPi to form ATP in the presence of adenylyl sulfate [2, 3]. The luciferase enzyme uses the ATP produced (but not dATPαS) to produce light [2, 3]. The light and the resulting pyrogram peak signal that the complementary base was incorporated, indicating that the introduced base is next in the sequence [2, 3]. If no light is produced, the base was not incorporated; the pyrogram will register a flat line for that dispensation event [3]. The light is detected using a photomultiplier tube (PMT) or a charge-coupled device (CCD) camera, and a peak is output in the form of a pyrogram in the instrument software [3]. If the same base is incorporated twice consecutively in the sequence, a double-height peak will register in the pyrogram [3]. 
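The relationship between the dispensation order and the pyrogram peaks can be illustrated with a small simulation. This is an idealized sketch, not instrument software: it assumes complete incorporation, peak height exactly proportional to the number of bases incorporated, and no background signal. The function name and the example sequence are hypothetical.

```python
def simulate_pyrogram(to_synthesize, dispensation):
    """Return a list of (dispensed_base, peak_height) tuples.

    to_synthesize: bases added to the primer's 3' end, in order of synthesis.
    dispensation: order in which nucleotides are dispensed, one per event.
    A height of 0 is a flat line; a height of 2 is a double-height peak.
    """
    to_synthesize = to_synthesize.upper()
    position = 0
    peaks = []
    for base in dispensation.upper():
        height = 0
        # Incorporate every consecutive occurrence of the dispensed base.
        while position < len(to_synthesize) and to_synthesize[position] == base:
            height += 1
            position += 1
        peaks.append((base, height))
    return peaks

if __name__ == "__main__":
    for base, height in simulate_pyrogram("GGATC", "ACGTACGTAC"):
        print(base, "#" * height)
```

In this example the two consecutive G bases produce a single peak of height 2 on the first G dispensation, and each dispensation that does not match the next template position gives a height of 0.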
Before the next base is dispensed, residual ATP and dNTP are degraded by apyrase and washed away [3]. Pyrosequencing can sequence and quantitate base incorporation at each site in the target region up to approximately 140 bases [3]. Upon sequencing the human genome in 2001, an additional variable and hereditary layer on the DNA sequence called epigenetic information was found [12]. Scientists observed that genomic DNA can be modified through chemical modification of selected cytosine nitrogenous bases [12]. Attaching a methyl group at the 5-carbon of cytosine forms 5-methylcytosine, or methylated cytosine [12, 13]. DNA methylation modifications influence gene function in eukaryotes, including tissue-specific gene regulation, aging, carcinogenesis, and X chromosome inactivation [13]. Methylation in promoter regions typically acts to repress gene transcription but, in some cases, promotes transcription [12]. Methylation is dynamic and can also be caused by the environment and disease [12]. Methylated cytosines tend to be found
in cluster regions called CpG islands in which methylated cytosines are followed by guanines in the 5′ to 3′ direction [12]. Methylated CpG islands account for approximately 2% of the human genome [13] but 7% of CpG islands, depending upon how they are defined [12]. Pyrosequencing can be used to probe DNA methylation levels at CpG sites [3]. To differentiate between unmethylated and methylated cytosine, the extracted DNA can be treated with bisulfite [13]. At low pH, bisulfite converts unmethylated cytosine to uracil [13]. In PCR, the uracil is copied as thymine. To analyze the methylation pattern in CpG loci, cytosine and thymine will both be dispensed in the dispensation sequence, and the incorporation of one or both will be detected [3]. Tissue-specific differentially methylated regions (tDMRs) have been investigated for forensic applications [14–17]. Tissue samples and mixtures can be partially methylated. The pyrogram can be used to quantify the extent of methylation (percentage) at that site based upon the light produced [3, 18]. Pyrosequencing assays have been designed using epigenetic methylation markers for body fluids, including semen, saliva, blood, and vaginal fluid [6–11]. The BCAS4 marker has been found to be specific for saliva, ZC3H12D has been found to be specific for semen, cg06379435 has been found to be specific for blood, and PFN3A and VE_8 have been found to be specific for vaginal epithelial cells [7, 8, 10, 11]. Assays can be developed to include up to four different sequencing primers [3]; a multiplex assay for semen, blood, saliva, and vaginal cells has been developed [10]. For the assay to work, pyrosequencing primers need to be specific for the region upstream of the CpG, SNP, or de novo target region to be sequenced [3]. Although researchers may use previously designed primers, this limits their targets to the loci covered by the kit or product that is commercially available. 
Designing new primers affords researchers the flexibility to analyze any target and go beyond what is published or commercially available and is the key to sequencing new loci. The primers can be designed manually or using software such as the commercially available PyroMark™ (Qiagen, Hilden, Germany) software. The primers are selected for a variety of characteristics including length, formation of primer dimers, and if they overlap with a known SNP or CpG site. The software can save time, and programs will offer numerous candidate assays. To probe multiple body fluids, a primer multiplex can be designed and optimized experimentally to sequence the target loci in samples of interest. In this chapter, the process of designing pyrosequencing primers is described. A human body fluid tDMR target will be the example in this protocol.
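The bisulfite conversion and the methylation percentage described in this introduction can be sketched as follows. This is a simplified illustration under stated assumptions: conversion is treated as complete, the sequence is read as it appears after PCR (uracil already copied as thymine), and the methylation percentage is taken as the C signal divided by the sum of the C and T signals at a CpG site. The function names are hypothetical.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Return the sequence as read after bisulfite treatment and PCR.

    Unmethylated C becomes T; C at any 0-based index listed in
    methylated_positions is protected and stays C.
    """
    seq = seq.upper()
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )

def percent_methylation(c_signal, t_signal):
    """Methylation (%) at one CpG site from the C and T peak signals."""
    total = c_signal + t_signal
    if total <= 0:
        raise ValueError("no signal at this site")
    return 100.0 * c_signal / total

if __name__ == "__main__":
    # Only the C at index 1 (part of a CpG) is methylated in this toy example.
    print(bisulfite_convert("ACGTCG", methylated_positions={1}))
    print(percent_methylation(30.0, 70.0))
```

For a mixture in which 30% of the template molecules are methylated at a site, the idealized C and T signals would be in a 30:70 ratio, giving 30% methylation.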
2 Materials
2.1 Websites and Software
1. A computer or device connected to the Internet is required to download the target sequence unless it is already available locally. Websites that are sources of genome and SNP data include NCBI Genome (https://www.ncbi.nlm.nih.gov/genome/), Ensembl (https://uswest.ensembl.org/index.html), SNPedia (https://www.snpedia.com/index.php/SNPedia), and the UCSC Genome Browser (https://www.genome.ucsc.edu/cgi-bin/hgGateway) (see Note 1).
2. The NCBI Basic Local Alignment Search Tool (BLAST) Standard Nucleotide BLAST (BLASTn) Web page (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome) can be used to assess primer specificity (see Note 1).
3. The user has the option of designing pyrosequencing primers manually or using software. If software such as PyroMark™ is preferred, the user needs to obtain and install it before beginning primer design. For manual design, the OligoAnalyzer™ Tool Web-based software (https://www.idtdna.com/calc/analyzer) has features for evaluating primer length, hairpin, and primer dimer formation (see Notes 1 and 2).
4. The Web-based UCSC In silico PCR tool, https://genome.ucsc.edu/cgi-bin/hgPcr, can be used to perform in silico PCR using the designed PCR primer set (see Note 1).
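The principle behind the in silico PCR mentioned in item 4 can be sketched in a few lines. This is a deliberately simplified illustration and does not reflect how the UCSC tool or any other program is implemented: it assumes exact primer matches, a single template strand given 5′ to 3′, and reports only the first product found. The function names and the toy template are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def in_silico_pcr(template, forward, reverse):
    """Return the predicted amplicon, or None if either primer has no exact site.

    The forward primer is searched on the given strand; the reverse primer
    binds the opposite strand, so its reverse complement is searched
    downstream of the forward primer site.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]

if __name__ == "__main__":
    toy_template = "TTT" + "ACGTACG" + "GGGGCCC" + "AATTTGCA" + "TT"
    product = in_silico_pcr(toy_template, "ACGTACG", "TGCAAATT")
    print(product, len(product))
```

For bisulfite-based assays, the search would have to be run against the converted sequence rather than the genomic sequence, because the primers are designed for the converted template.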
2.2 Obtaining a DNA Sequence for the Region to be Sequenced
A target region DNA sequence is needed for primer design. To obtain the sequence, follow the steps below. If the sequence has already been obtained, proceed to Subheading 3.1. The Homo sapiens breast carcinoma-amplified sequence 4 (BCAS4) locus will serve as the example, and the UCSC Genome Browser will be used to download the sequence of the gene region and upstream and downstream regions.
1. Open the UCSC Genome Browser. From the front page, select “Genome Browser.” Next, select the Human Assembly “Dec. 2013 GRCh38/hg38,” and in the Position/Search Term box, input “AL031680.20” and select the “Go” button. This browser loads GENCODE Genes, NCBI RefSeq genes, and ENSEMBL annotation sets. The GENCODE annotation is intended to reflect the protein-coding and functional features of genes in the genome [19].
2. Scroll down to “Regulation” and select “CpG Islands.” On the next page, toggle Display “hide” to “show” and select “submit” and “refresh” (see Note 3). The “CpG Islands” header is now visible in green. The number (“92”) indicates how many CpG islands have been characterized. Right-click on the central green “CpG Islands” header. In the Configure CpG Islands box that appears, click the “+” to expand “All tracks in this collection” and select “unmasked CpG.” Change the display mode to “pack” and select “ok”; “CpG 27” appears (Fig. 1). Alternatively, one can scroll down to “Regulation,” select the “CpG Island” heading, select “pack” next to “unmasked CpG” there, and select “submit” to display “CpG 27.”
3. To get the DNA sequence for the CpG islands, right-click on the green CpG: 27 number and select “Get DNA for CpG: 27.” Under “Sequence Retrieval Region Options,” more bases can be added upstream and downstream of the CpG island region. I added 100 bases upstream and downstream (see Note 4).
4. Select sequence formatting options “All upper case” and “to lower case” (defaults). Select “get DNA.” The CpG region is shown in the box (see Note 5). Figure 2 shows the retrieved text sequence for the BCAS4 target. The sequence is the upper strand in the 5′ to 3′ direction.
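As an alternative to the interactive steps above, the same region might be retrieved programmatically. The sketch below only builds a request URL and does not contact any server. It assumes the UCSC Genome Browser REST API endpoint `getData/sequence` at `api.genome.ucsc.edu` with 0-based, half-open coordinates; this assumption should be checked against the current UCSC API documentation before use. The coordinates are those of the padded range in the Fig. 2 header, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of the UCSC REST API; verify against the UCSC documentation.
UCSC_SEQUENCE_ENDPOINT = "https://api.genome.ucsc.edu/getData/sequence"

def ucsc_sequence_url(genome, chrom, start_1based, end_1based):
    """Build a sequence request URL from 1-based inclusive coordinates.

    The API is assumed to use 0-based, half-open coordinates, so only the
    start coordinate is shifted by one.
    """
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("invalid coordinates")
    params = {
        "genome": genome,
        "chrom": chrom,
        "start": start_1based - 1,
        "end": end_1based,
    }
    return UCSC_SEQUENCE_ENDPOINT + "?" + urlencode(params)

if __name__ == "__main__":
    # Padded BCAS4 CpG island range taken from the Fig. 2 FASTA header.
    print(ucsc_sequence_url("hg38", "chr20", 50794228, 50794759))
```

The sequence returned by such a request should be compared with the sequence obtained through the browser before it is used for primer design.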
3 Methods
3.1 Design of Pyrosequencing Primers
The example will focus on the use of the Qiagen PyroMark® Assay Design software [20, 21] but also introduce manual primer design. You will need the locus of interest obtained from Subheading 2 or previously obtained to design the primers.
1. Open the PyroMark® Assay Design software on your computer. To begin a new assay, select “New” in the “File” menu.
2. To create an assay for tDMRs, select the Assay Type “Methylation Analysis (CpG).” In the “Sequence editor” window box that reads “Upper Strand (5′ to 3′),” paste the DNA sequence obtained from the UCSC Genome Browser (or other source) by right-clicking and selecting “Copy Entered Sequence” or using the “Import” function to import a FASTA file. If the amplification primer sequences, amplicon length, and amplicon sequence are known, they can be entered into the window.
3. For the CpG assay, select the box to display the “Converted Sequence” to show the sequence after bisulfite conversion; it will be more T-rich (see Note 6).
4. Next, highlight the target region by clicking and dragging the mouse over the sequence. Right-click on the sequence and select “Target Region -> Set Target Region.”
Fig. 1 BCAS4 CpG islands shown in green in the UCSC Genome Browser
>hg38_dna range=chr20:50794228-50794759 5'pad=100 3'pad=100 strand=+ repeatMasking=none
AGGGTCTATCTAGGCCGGCCTCCGAGGGCATGGAGGGAGTGGGTGCGGTT
GTGAAATGTAGTGCGCTCAATAGTTTCCTGGTGAAGTTTATTTTAAAATC
CGCACCGAAGAGGAAGACGAGGACCGTCACACTCGGCCTTCCCTAAATTC
CAGGACCCTCCGCCCGATGCAAACTAGATGCTTTAGTAGGATGGGAACGG
GTGGGGGGCGGGCGGCTTTGGGCTTCCTCTAAGCTAGCGCCTCTCTAACC
CGGACGCCCGTTAGAATCACCCGGGGAGTTTTAGAAACTACCGATGCCCA
AGCCCCACTCCGAAGGATTCCAACTTAATCGGCCTGGTGCGAGGCCTGGC
TTCCGGGCTTTTAAAAGCTTCCCGGGGATTCTATTTTACGGCCGGGTCCG
GGGCCGGGAGCCTGTACTCTACCGGGATTCCGATGGGGAGGGGTGGCTTG
CCCCAATAGTTCTCAAATTTAGCTTTGGGTCAACAATCTGGTTGGAACCA
CCTAACAAACATCAAAAGATCCTGATGCCCAG
Fig. 2 BCAS4 CpG island-containing sequence retrieved with the UCSC Genome Browser
Forward PCR Primer      AGTGGGTGAGGTTGTGAAATGT
Reverse PCR Primer      CCCATCCTACTAAAACATCTAATT
Pyrosequencing Primer   AGTTTAATAGTTTTTTGGTG
Fig. 3 The published, tested primers for BCAS4 are shown [10]. They vary slightly from the database sequence because they reflect the thymines that replaced the unmethylated cytosines following bisulfite treatment and PCR. Variations from the NCBI sequence are highlighted in bold
5. Select the blue “Play” button. The generated primers and assays are displayed in the box (see Note 7). The primer sets include the forward and reverse primers and the sequencing primer. The “Graphic View” displays the primer set binding (see Note 8). The sequencing primer should be positioned directly upstream of the variable position (if it is known), which in this example is the CpG island region. During bisulfite conversion, unmethylated “C” bases are converted to “U” bases, which are then copied as “T” bases during PCR amplification. The primer with “C” will bind the complementary base “G” at the methylated CpG site, and the primer with “A” will bind the complement “T” from the unmethylated sequence posttreatment (see Note 9).
6. A “Pyrosequencing Assay Design Analysis Report” can be generated, which contains additional assay information. The biotin tag is displayed as a circle on the primer to which it will be attached. Save this file as it contains the “Sequence to Analyze” (see Note 10).
7. Once the assay is selected, the PyroMark™ assay build for pyrosequencing is complete (see Note 11). The pyrosequencing primer and PCR primers designed using the PyroMark Assay Design software version 2.0 for the BCAS4 example [10] are shown in Fig. 3. The locations in the NCBI sequence are shown in Fig. 4.
Target NCBI GenBank Nucleotide Sequence (Accession AL031680.20)
        Forward PCR Primer Region        Pyrosequencing Primer region
76221   GGAGGGAGTG GGTGCGGTTG TGAAATGTAG TGCGCTCAAT AGTTTAATGG TGAAGTTTAT
        CCTCCCTCAC CCACGCCAAC ACTTTACATC ACGCGAGTTA TCAAAGGACC ACTTCAAATA
        Pyrosequencing Region
76261   TTTAAAATCC GCACCGAAGA GGAAGACGAG GACCGTCACA CTCGGCCTTC CCTAAATTCC
        AAATTTTAGG CGTGGCTTCT CCTTCTGCTC CTGGCAGTGT GAGCCGGAAG GGATTTAAGG
76321   AGGACCCTCC GCCCGATGCA AACTAGATGC TTTAGTAGGA TGGGAACGGG TGGGGGGCGG
        TCCTGGGAGG CGGGCTACGT TTGATCTACG AAATCATCCT ACCCTTGCCC ACCCCCCGCC
        Reverse PCR Primer Region*
Fig. 4 Genome sequence for the NCBI GenBank Accession Number AL031680.20 for the BCAS4 locus region on human chromosome 20 with bold and boxed PCR primers and underlined pyrosequencing primer region upstream of the tDMR region. The pyrosequencing assay region is italicized, and the tDMRs are bold italicized
8. Alternatively, the pyrosequencing and PCR primers can be designed manually. As described above, the target amplicon must first be amplified with a set of PCR primers in which one of the primers is biotinylated, so these must be designed. The pyrosequencing primer should be positioned directly upstream of the variable position, or in this example, the CpG islands. Locate the region of the 5′ (upper) strand directly before (upstream of) the CpG region of interest. The pyrosequencing primer should anneal to a region upstream of the CpG loci of interest but within the region amplified by the PCR primers, and it should also have a low propensity for primer dimer formation with the PCR primers [22–25]. Consider the sequence after treatment with bisulfite. Unmethylated “C” bases in the top strand will be converted to “T” bases and need an “A” base in the reverse primer to be complementary. Unmethylated “C” bases in the bottom strand will be converted to “T” bases and need a complementary “A” base in the forward primer. Design the PCR and pyrosequencing primers for each target. The primers should have a length of 18–30 bases and have a low propensity for primer dimer formation [22–25]. To aid in the analysis of these, the OligoAnalyzer™ Tool can be used. After pasting the primer sequence into the OligoAnalyzer™ Tool Web-based software (https://www.idtdna.com/calc/analyzer), the user can select the “Analyze” button to obtain the primer length and other features, “Hairpin” to compute any hairpin formation and melt temperatures, “Self-Dimer” to compute primer dimer formation, and “Hetero-Dimer” to compute primer dimer formation with other primer sequences [22–25] (see Notes 12 and 13).
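The bisulfite reasoning in step 8 can be illustrated with a short Python sketch. It is not the PyroMark algorithm; the function names (`bisulfite_convert`, `reverse_complement`, `mismatch_positions`) are hypothetical, and the sketch assumes that every non-CpG cytosine is unmethylated and fully converted. The input is the top-strand genomic sequence under the reverse primer, taken from the end of the amplicon in Fig. 5 (without its final base, since the published reverse primer in Fig. 3 is one base shorter than the one in Fig. 5).

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]


def bisulfite_convert(top_strand: str, cpg_methylated: bool = True) -> str:
    """Simulate bisulfite conversion followed by PCR on one strand.

    Non-CpG cytosines are read as T. CpG cytosines are kept as C when
    cpg_methylated is True and read as T otherwise. A C at the very end
    of the input cannot be checked for a following G and is treated as
    non-CpG.
    """
    seq = top_strand.upper()
    converted = []
    for i, base in enumerate(seq):
        in_cpg = base == "C" and i + 1 < len(seq) and seq[i + 1] == "G"
        if base == "C" and not (in_cpg and cpg_methylated):
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def mismatch_positions(a: str, b: str) -> list:
    """1-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Top-strand genomic sequence under the reverse primer (from Fig. 5).
TOP = "AACTAGATGCTTTAGTAGGATGGG"

converted_top = bisulfite_convert(TOP)
reverse_primer = reverse_complement(converted_top)
print(reverse_primer)  # CCCATCCTACTAAAACATCTAATT, the reverse primer in Fig. 3

# Positions where the bisulfite-specific primer differs from the primer
# that would match the untreated genome.
print(mismatch_positions(reverse_complement(TOP), reverse_primer))  # [15, 22]
```

The two reported positions are the “G” to “A” changes in the reverse primer that pair with the “T” bases created from unmethylated “C” bases in the top strand. The sketch does not attempt to reproduce how the published forward and pyrosequencing primers handle a CpG site inside their binding regions.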
>chr20:50794265+50794423 159bp AGTGGGTGCGGTTGTGAAATGT TCCCATCCTACTAAAGCATCTAGTT
AGTGGGTGCGGTTGTGAAATGTagtgcgctcaatagtttcctggtgaagt
ttattttaaaatccgcaccgaagaggaagacgaggaccgtcacactcggc
cttccctaaattccaggaccctccgcccgatgcaAACTAGATGCTTTAGT
AGGATGGGA
Fig. 5 Sample in silico PCR results using the BCAS4 primers
3.2 Evaluating Nonspecific Priming by NCBI BLAST Nucleotide
1. Primer specificity can be checked by pasting the primer sequences from Fig. 3 into the box in the NCBI BLASTn webpage (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome). There is an option to input the organism name, in this case, Homo sapiens for the Search Set. Use the default parameters for the other options. Select the “BLAST” button at the bottom to begin the computation. It may take a few minutes for the results to be displayed in the browser.
2. The results include descriptions and links to the aligned regions. The first two hits for the BCAS4 forward primer were for chromosome 20, the known location of BCAS4. There was also a 100% coverage hit on chromosome 15 for the third hit and hits with less coverage in other regions of the genome.
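When several primers are to be checked, it can be convenient to paste them into the BLASTn query box as one multi-sequence FASTA block rather than one at a time. The Python sketch below formats the three primers from Fig. 3 in this way; the record names and the helper `to_fasta` are hypothetical labels chosen for illustration.

```python
# Primers from Fig. 3; the record names are illustrative labels only.
PRIMERS = {
    "BCAS4_forward": "AGTGGGTGAGGTTGTGAAATGT",
    "BCAS4_reverse": "CCCATCCTACTAAAACATCTAATT",
    "BCAS4_pyroseq": "AGTTTAATAGTTTTTTGGTG",
}


def to_fasta(primers: dict) -> str:
    """Format named primer sequences as a multi-record FASTA string."""
    lines = []
    for name, seq in primers.items():
        seq = seq.upper()
        # Reject ambiguity codes and typos before submitting a query.
        if not seq or set(seq) - set("ACGT"):
            raise ValueError(f"{name}: only A, C, G, and T are allowed")
        lines.append(f">{name}")
        lines.append(seq)
    return "\n".join(lines) + "\n"


print(to_fasta(PRIMERS), end="")
```

Each record then appears as a separate query in the results, so hits for the forward, reverse, and pyrosequencing primers can be compared side by side.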
3.3 Performing In Silico PCR
1. Open the browser for the UCSC In silico PCR tool (https://genome.ucsc.edu/cgi-bin/hgPcr).
2. Copy and paste the forward and reverse PCR primer sequences (in the 5′ to 3′ orientation) into the appropriately labeled boxes. Use the default settings, which select the same assembly (human genome, Dec. 2013 assembly) used to build the primers. Select “submit.”
3. The output indicates the amplicon will be 159 base pairs and amplifies the desired target on chromosome 20 (Fig. 5).
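The 159-base-pair result can be reproduced locally on the retrieved sequence as a cross-check. The Python sketch below is a deliberately simplified stand-in for the UCSC tool: it searches a single template for exact matches only, whereas the web tool searches the whole genome. The function name `in_silico_pcr` is hypothetical; the template is the first 200 bases of Fig. 2, and the primers are those shown in the header of Fig. 5.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def in_silico_pcr(template: str, forward: str, reverse: str):
    """Return the predicted amplicon, or None if either primer has no site.

    Simplification: exact matches on the given strand only, using the
    first forward-primer site and the first reverse-primer site after it.
    """
    template = template.upper()
    forward = forward.upper()
    reverse_site = reverse_complement(reverse.upper())
    start = template.find(forward)
    if start == -1:
        return None
    site = template.find(reverse_site, start + len(forward))
    if site == -1:
        return None
    return template[start:site + len(reverse_site)]


# First 200 bases of the BCAS4 sequence in Fig. 2.
TEMPLATE = (
    "AGGGTCTATCTAGGCCGGCCTCCGAGGGCATGGAGGGAGTGGGTGCGGTT"
    "GTGAAATGTAGTGCGCTCAATAGTTTCCTGGTGAAGTTTATTTTAAAATC"
    "CGCACCGAAGAGGAAGACGAGGACCGTCACACTCGGCCTTCCCTAAATTC"
    "CAGGACCCTCCGCCCGATGCAAACTAGATGCTTTAGTAGGATGGGAACGG"
)
FORWARD = "AGTGGGTGCGGTTGTGAAATGT"
REVERSE = "TCCCATCCTACTAAAGCATCTAGTT"

amplicon = in_silico_pcr(TEMPLATE, FORWARD, REVERSE)
print(len(amplicon))  # 159
```

Note that these are the unconverted genomic versions of the primers; the bisulfite-specific primers in Fig. 3 would not match the untreated template exactly.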
3.4 Obtaining Pyrosequencing Primer Reagents
1. Primers may be purchased commercially from a variety of manufacturers including Qiagen and Integrated DNA Technologies (IDT). The assay can be purchased as a PyroMark™ Custom Assay from Qiagen (see Notes 14 and 15).
2. Upon receipt, the Qiagen custom primer sets must be reconstituted by adding 550 μL TE4 to each (see Note 16). Two microliters (2 μL) of each sequencing primer (4 μM concentration) are used in each sequencing reaction.
3. The target loci must first be bisulfite-treated and amplified using the biotinylated PCR primer set. The PyroMark™ PCR Kit can be used for 25 μL reactions. PCR amplification of the
target region can be assessed by gel electrophoresis before pyrosequencing (see Note 17). The Qiagen tissue ID protocol can be used for preparing the sample(s) and performing the pyrosequencing [3, 26] (see Note 18).
4. For analysis after pyrosequencing, the percent methylation for each CpG site is automatically calculated by the PyroMark® Q24 software version 2.0.6 and is displayed as a pyrogram.
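For readers who want to see what a per-site percent methylation value represents, the Python sketch below computes it from the two signals at one CpG position. It is an illustration only and assumes the commonly used ratio of the “C” signal to the combined “C” and “T” signals; the internal calculation of the PyroMark® software is not described here, and the function name and example peak heights are hypothetical.

```python
def percent_methylation(c_signal: float, t_signal: float) -> float:
    """Percent methylation at one CpG site from C and T peak signals.

    Assumption: methylation is estimated as C / (C + T) x 100, where C
    reflects methylated (unconverted) and T unmethylated (converted)
    template at that position.
    """
    if c_signal < 0 or t_signal < 0:
        raise ValueError("signals must be non-negative")
    total = c_signal + t_signal
    if total == 0:
        raise ValueError("no signal at this position")
    return 100.0 * c_signal / total


# Hypothetical peak heights for three CpG sites, as (C, T) pairs.
sites = [(30.0, 70.0), (10.0, 30.0), (0.0, 50.0)]
for c, t in sites:
    print(round(percent_methylation(c, t), 1))
```

Computing the value by hand for one or two sites can help confirm that exported peak data and the percentages reported by the instrument software refer to the same positions.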
4
Notes
1. These links were active at the time of this writing.
2. A free user account is required for use.
3. Toggling the CpG display shows the CpG islands that are otherwise not visible in the masked display.
4. It is recommended to add more bases upstream and downstream of the CpG islands to have more flexibility in designing the primers.
5. Save the sequence locally by copying and pasting it into a Note or Word document. The nitrogenous bases found in DNA are designated using IUPAC codes A, T, G, C, N, R, and Y, referring to adenine, thymine, guanine, cytosine, any base (A/T/G/C), purine (G/A), and pyrimidine (T/C), respectively. For best performance, avoid regions in which the base is not defined (e.g., N, R for purine, and Y for pyrimidine) when designing primers.
6. The software converts the unmethylated cytosine (C) bases to thymine (T) and colors them red, and it highlights the methylated cytosines at CpG sites in bold.
7. The software generates over 100 different assays in ranked order on a 100-point scale. The primers are scored on several criteria, including the potential for mispriming, primer length, and propensity for primer dimer formation. In general, the higher the primers’ score, the better they should perform in the PCR and sequencing reactions. A primer set with a score of 70 or above typically performs well.
8. Do not select a primer set for use if the primers bind over a CpG site as it may be methylated or unmethylated. If a CpG site must be included, the primers can be produced to have both potential variants, such as a “C” and an “A” at that position.
9. Bisulfite conversion is harsh and will fragment DNA. Keeping the PCR amplicon size small (