Genetic Mapping and Marker Assisted Selection: Basics, Practice and Benefits 9811529485, 9789811529481

The first edition of this book, Genetic Mapping and Marker Assisted Selection: Basics, Practice and Benefits, was widely

English Pages 521 [516] Year 2020

Table of contents :
Preface (Second Edition)
Contents
About the Author
1: Genetic Mapping and Marker-Assisted Selection: Setting the Background
1.1 Setting the Background
1.2 Trends in Agricultural Product Demand: Functional Foods and Value-Added Products
1.3 Evolving New Crop Cultivars: Promises of Conventional Plant Breeding Programs
1.3.1 Achievements
1.3.2 Conventional Plant Breeding: Obstacles and Challenges
1.3.3 Germplasm Exchange: International Laws and Governance
1.3.4 Biotech Crops and Biosafety Issues
1.3.4.1 Evolving New Cultivars Using Tools of Biotechnology
1.3.4.2 Plant Tissue Culture-Based Approaches
1.3.4.3 Genetic Engineering and Transgenic Plants
1.3.4.4 Biosafety and Other Issues Related to Transgenic Crops
1.4 Alternative Approach: MAS
1.5 Scope of Genetic Mapping and Marker-Assisted Selection
1.6 Need for This Book: What Can Be Expected?
Critical Thinking Questions
Bibliography
Literature Cited
Additional Readings
2: Germplasm Characterization: Utilizing the Underexploited Resources
2.1 Types of Plant Germplasm: Natural Versus Man-Made
2.1.1 Conservation of Naturally Prevailing Plant Germplasm
2.1.1.1 In Situ Conservation
2.1.1.2 Ex Situ Conservation
2.1.2 Conservation of Man-Made Plant Germplasm
2.1.2.1 Crop-Specific Germplasm Repository
2.1.2.2 Mutagenized Population
2.1.2.3 Global Germplasm Resources
2.2 Germplasm Characterization: Phenotyping for Morphological and Agronomic Characters
2.2.1 Conventional Methods of Phenotyping: Biotic and Abiotic Stress Resistance, Yield, and Quality Traits
2.2.2 Recent Developments in Phenomics and Way Forward
2.2.3 Case Study in Rice Germplasm Characterization for Drought Resistance: Formation of the Fundamental Requirements
2.2.4 Traits Useful for Germplasm Characterization in Rice
2.3 Allele Mining
2.3.1 Allele Mining: Basic Considerations
2.3.2 Insertional Mutagenesis
2.3.3 Genome Editing Tools and Induced Variations
2.3.4 TILLING, EcoTILLING, and Self-EcoTILLING
2.3.5 Mutant-Assisted Gene Identification and Characterization (MAGIC)
2.3.6 Allele Mining: Challenges and Troubleshooting Perspectives
2.4 Genetic Diversity and Clustering
2.4.1 Software for Genetic Diversity Analysis
2.4.2 Principle Behind the Genetic Diversity Analysis
2.4.3 Principle of Measuring Goodness-of-Fit of a Classification
2.5 Issues in Genetic Diversity Analysis Using Molecular Markers
2.5.1 Co-dominant Markers and Similarity Measures
2.5.2 Dominant Markers and Similarity Measures
2.6 Diversity and Phylogenetic Tree: Importance in Mapping Population Development
2.7 DNA Barcoding and Its Utilization in Germplasm Exploitation
2.8 Parental Selection
Critical Thinking Questions
Bibliography
Literature Cited
Further Reading
3: Mapping Population Development
3.1 Mapping Population and Its Importance in Genetic Mapping
3.2 Selfing and Crossing Techniques in Crop Plants
3.3 F2 Progenies
3.4 F2-Derived F3 (F2:3) Populations
3.5 F2 Intermating Populations or Immortalized F2 Populations
3.6 DH Lines
3.7 BC Progenies
3.8 Introgression Library (IL) Progenies
3.9 Recombinant Inbred Lines (RILs)
3.10 NILs, Exotic Libraries, and Advanced Backcross Populations
3.11 Backcross Inbred Lines
3.12 Advanced Intercross Lines
3.13 Recurrent Selection Backcross Population
3.14 Chromosomal Segment Substitution Lines
3.15 Interconnected Mapping Populations
3.16 Four-Way Cross Populations
3.17 Multi-cross Populations
3.17.1 Nested Association Mapping Populations
3.17.2 Multiparent Advanced Generation Intercross (MAGIC) Populations: MAGIC, MAGIC Plus, and MAGIC Global
3.18 Mapping Populations for Self- and Cross-Pollinated Species
3.19 Mapping Populations for Autopolyploid Species
3.20 Natural Populations
3.21 Chromosome-Specific Genetic Stocks for Linkage Mapping
3.22 Populations for Bulk Segregant Analysis
3.23 MutMap Populations
3.24 Combining Markers and Populations
3.25 Characterization of Mapping Populations
3.26 Choice of Mapping Populations
3.27 Challenges in Mapping Population Development and Solutions to These Challenges
Critical Thinking Questions
Bibliography
Literature Cited
Further Reading
4: Genotyping of Mapping Population
4.1 Markers and Its Importance
4.2 Morphological Markers
4.3 Biochemical Markers or Isozymes
4.4 Genome Structure and Organization
4.5 Classification of Molecular Markers: Classical and Updated Types
4.6 Hybridization-Based Markers
4.6.1 Restriction Fragment Length Polymorphism (RFLP)
4.6.2 Diversity Array Technology (DArT)
4.6.3 Single Feature Polymorphism (SFP)
4.6.4 Other Types of Microarray-Based Molecular Markers (Tagged Microarray Assay)
4.7 Arbitrarily Primed PCR-Based Markers
4.7.1 Random Amplified Polymorphic DNA (RAPD)
4.7.2 Arbitrarily Primed Polymerase Chain Reaction (AP-PCR) and DNA Amplification Fingerprinting (DAF)
4.7.3 Amplified Fragment Length Polymorphism (AFLP)
4.8 Sequence-Specific PCR-Based Markers
4.9 Microsatellite-Based Marker Technique
4.9.1 Random Amplified Microsatellite Polymorphism (RAMP)
4.9.2 Inter-simple Sequence Repeats (ISSR)
4.10 SNPs: Novel Methods to Detect, Genotyping Procedures, and Their Utility in Genetic Mapping and MAS
4.10.1 SNP Identification Methods
4.10.2 Genotyping by Sequencing and SNPs
4.10.3 Challenges in SNP Markers
4.11 Sequence Characterized Amplified Regions (SCAR)
4.12 Cleaved Amplified Polymorphic Sequences (CAPS) and dCAPS
4.13 Randomly Amplified Microsatellite Polymorphisms (RAMP)
4.14 Sequence-Related Amplified Polymorphism (SRAP)
4.15 Target Region Amplification Polymorphism (TRAP)
4.16 Start Codon-Targeted Polymorphism (SCoT)
4.17 CAAT Box-Derived Polymorphism (CBDP)
4.18 Conserved DNA-Derived Polymorphism
4.19 Conserved Region Amplification Polymorphism (CoRAP)
4.20 Intron-Targeting Polymorphism (ITP)
4.21 Single-Strand Conformation Polymorphism (SSCP)
4.22 Transposable Elements (TE)-Based Molecular Markers
4.22.1 Retrotransposon-Based Molecular Markers
4.22.1.1 Inter-retrotransposon Amplified Polymorphism (IRAP) and Retrotransposon–Microsatellite Amplified Polymorphism (REMAP)
4.22.1.2 Sequence-Specific Amplification Polymorphism (S-SAP)
4.22.1.3 Retrotransposon-Based Insertion Polymorphism (RBIP)
4.22.1.4 Transposable Display (TD)
4.22.1.5 Inter-MITE Polymorphism (IMP)
4.23 Intron-Targeted Intron–Exon Splice Conjunction (IT-ISJ) Marker
4.24 Restriction Site-Associated DNA (RAD) Markers
4.25 RNA-Based Molecular Markers
4.25.1 cDNA–AFLP
4.25.2 RNA Fingerprinting by Arbitrarily Primed PCR (RAP–PCR)
4.25.3 cDNA–SSCP
4.26 Role of “Omics” in Molecular Marker Development
4.27 Selection of Marker Technology
4.28 Marker Genotyping and Scoring
4.29 Analyzing the Genotype Score: Chi-Square Test
4.30 χ2 Test to Analyze the Segregation Ratio Using the Program AntMap
4.31 Other Applications of Molecular Markers
Critical Thinking Questions
Bibliography
Literature Cited
Further Reading
5: Linkage Map Construction
5.1 Introduction to Genome Maps: Linkage, Cytogenetic, and Physical Maps
5.2 Basics of Genetic/Linkage Mapping: Mendelian Ratios, Meiosis, Crossing-Over, and Partial Linkage
5.3 Mapping Function and Genetic Distance Calculation: Methods and Procedures
5.4 Mapping of Genetic Markers: Genetic Consideration, General Procedure, and Validation
5.5 Testing of Linkage: LOD Scores, Threshold, Comparison, and Confirmation
5.6 Fine-Tuning the Linkage: Grouping, Ordering, and Spacing
5.7 Methods to Detect and Avoid Sources of Error
5.8 Chromosomal Assignment
5.9 Allopolyploidy and Autopolyploidy
5.10 Bridging Linkage Maps to Develop Unified Linkage Maps
5.11 High-Resolution Mapping and Complete Map
5.12 Comparative Mapping
5.13 Merging Linkage Maps to Cytogenetic Maps and Physical Maps: Genetic Considerations
Critical Thinking Questions
Bibliography
Literature Cited
Further Reading
6: Phenotyping
6.1 Phenomics
6.2 Forward and Reverse Phenomics
6.3 Advances in Phenomics
6.4 Phenotyping Versus QTL Mapping
6.5 Need for Precise Phenotyping
6.6 Phenotyping for Biotic Stress
6.6.1 Explaining the Concept with Case Studies
6.7 Phenotyping for Abiotic Stress
6.7.1 Explaining the Concept with Case Studies
6.8 Heritability of Phenotypes
6.9 Statistical Analysis of Phenotypic Data
6.9.1 Simple Statistics
6.9.2 Heritability Estimation
6.9.3 Correlation Analysis
6.10 Phenome-Wide Analysis in This Genomics Era: PheWAS Versus GWAS
Critical Thinking Questions
Bibliography
Literature Cited
Further Reading
7: QTL Analysis
7.1 QTL: A Prelude
7.2 History of QTL Analysis
7.3 QTL: Methods and Types
7.4 Single-Marker Analysis (SMA): Principle, Methods, and Procedure
7.5 Interval Mapping: Principle, Methods, and Procedure
7.6 Multiple QTLs: Principle, Methods, and Procedure
7.6.1 Composite Interval Mapping
7.6.1.1 Multiple Trait Mapping
7.6.1.2 Testing for Linked QTL Versus Pleiotropic QTL
7.6.2 Multiple Interval Mapping (MIM) or Multiple QTL Mapping
7.7 Statistical Significance
7.8 Permutation Testing
7.9 Bootstrapping
7.10 Permutation Versus Bootstrapping and Other Methods
7.11 QTL x QTL Interaction: Impact of Epistasis
7.12 QTL x Environment Interaction
7.13 Congruence of QTL: Across the Environments and Across the Genetic Backgrounds Are the Key in MAS
7.14 Meta-QTL Analysis
7.15 Concluding Remarks on QTL Methods
7.16 Alternatives in Classical QTL Mapping: Understanding and Practicing Different Strategies
7.16.1 Bulked Segregant Analysis and Selective Genotyping: Basics, Genetic Considerations, and Procedures
7.16.2 Genomics-Assisted Breeding: Basics, Genetic Considerations, and Procedures
7.16.2.1 Array Mapping: Basics, Genetic Considerations, and Procedures
7.16.2.2 Association Mapping: Basics, Genetic Considerations, and Procedures
7.16.2.3 Genome-Wide Association Study (GWAS)
7.16.2.4 Nested Association Mapping: Basics, Genetic Considerations, and Procedures
7.16.2.5 EcoTILLING: Basics, Genetic Considerations, and Procedures
7.17 Challenges and Troubleshooting in QTL Mapping
7.17.1 Confronts with Mapping Populations
7.17.2 Markers and Its Implications
7.17.3 Segregation Distortion
7.17.4 Phenotyping
7.17.5 Statistical Issues
7.17.6 Practical Utility
7.18 Way Forward to Incorporate QTL Studies into Regular Crop Breeding Program
Critical Thinking Questions
Bibliography
Literature Cited
Further Reading
Key References for QTL Mapping Software (Box 7.1)
8: Fine Mapping
8.1 Need for Fine Mapping or High-Resolution Mapping
8.2 Types of Molecular Markers Suitable for Fine Mapping
8.3 Conversion of Identified Marker into Breeder-Friendly Marker
8.4 Physical Mapping and Its Role in Fine Mapping
8.5 Comparative Mapping
8.6 Genetical Genomics/eQTL Mapping
8.7 Map-Based Cloning
8.7.1 Map-Based Cloning: Explaining with a Case Study
8.8 Validation of QTLs
8.9 Testing the Markers in Related Germplasm Accessions
Critical Thinking Questions
Bibliography
Literature Cited
Further Reading
9: Marker-Assisted Selection (MAS)
9.1 Prelude on MAS
9.2 Why Should We Use MAS in Plant Breeding Program?
9.3 What Prevents the Use of MAS in Routine Breeding Program?
9.4 Prerequisites for an Efficient Marker-Assisted Selection Program
9.4.1 High-Throughput DNA Extraction
9.4.2 Marker Technology and Its Genetic Features
9.4.3 Foreground Selection
9.4.4 Background Selection
9.4.5 Genetic Maps
9.4.6 Selection of QTLs for MAS
9.4.7 Knowledge of Associations and Validation Between Molecular Markers and Trait of Interest
9.4.8 Efficient Data Management System
9.5 Procedure for a Generalized MAS Program for Selection
9.6 Single Large-Scale MAS: Principles and Procedures
9.7 Pedigree MAS: Principles and Procedures
9.8 Marker-Assisted Backcross Breeding (MABB): Principles and Procedures
9.9 Gene Pyramiding or Stacking: Principles and Procedures
9.10 Accelerated Methods of Gene Pyramiding: Principles and Procedures
9.11 Marker-Assisted Recurrent Selection (MARS): Principles and Procedures
9.12 Advanced Backcross (AB)-QTL Analysis: Principles and Procedures
9.13 Mapping as You Go (MAYG): Principles and Procedures
9.14 Breeding by Design: Principles and Procedures
9.15 Combined MAS: Principles and Applications
9.16 Application of MAS in Germplasm Storage, Evaluation, and Use
9.17 Resources for MAS on the Web
9.18 Final Considerations and Future Perspectives
Bibliography
Literature Cited
Further Reading
10: Success Stories in MAS
10.1 Status of MAS in Current Plant Breeding Programs
10.2 Varieties Released Through MAS
10.3 Hybrids Developed Through MAS
10.4 MAS in Major Crops: Tomato
10.5 MAS in Major Crops: Maize
10.6 MAS in Major Crops: Wheat
10.7 MAS in Major Crops: Rice
10.8 MAS in Major Crops: Barley
10.9 MAS in Major Crops: Cotton
10.10 MAS in Major Crops: Soybean
10.11 MAS in Multinational Companies
10.12 Contrasting Stories
10.13 Conclusions and Future Prospects
Critical Thinking Questions
Bibliography
Literature Cited
Further Readings
11: Toward Genetically Improved Crop Plants: Roles of ‘Omics in MAS
11.1 Leveraging ‘Omics and Other Molecular Breeding Platforms
11.2 Comparisons of Techniques in Molecular, Biochemical, and Physiological Studies and Its Integration into MAS
11.3 Prelude on Molecular Techniques
11.4 Expression Profiling
11.5 cDNA Library Construction
11.6 Differential Display and Representational Difference Analysis
11.7 Subtractive Hybridization
11.7.1 Preparation of Driver and Tester
11.7.2 Hybridization
11.7.3 Subtraction
11.7.4 Isolation of Target Sequences
11.8 Microarray
11.8.1 Types of DNA Chips and Their Production
11.8.2 Oligonucleotide-Based Chips
11.8.3 DNA-Based Chips or cDNA Arrays
11.8.4 Hybridization and Detection Methods
11.8.5 DNA Sequencing by Hybridization
11.8.6 Single Nucleotide Polymorphisms and Point Mutations
11.8.7 Functional Genomics
11.8.8 Reverse Genetics
11.8.9 Diagnostics and Genetic Mapping
11.8.10 Genomic Mismatch Scanning
11.8.11 DNA Chips and Agriculture
11.8.12 Proteomics
11.9 Nucleic Acid Sequencing
11.9.1 Second-Generation DNA Sequencing
11.9.2 454 Pyrosequencing
11.9.3 Illumina Genome Analyzer
11.9.4 AB SOLiD
11.9.4.1 HeliScope
11.9.5 Microchip-Based Electrophoretic Sequencing
11.9.6 Sequencing by Hybridization
11.9.7 Sequencing in Real Time
11.9.8 Targeted Capture of Genomic Subsets
11.9.9 Handling and Storage of Sequence Information
11.9.10 Predicting Function from Sequence
11.9.11 Homology Searches
11.9.12 Other Sequence Comparison Strategies
11.10 Serial Analysis of Gene Expression (SAGE)
11.11 cDNA-AFLP
11.11.1 Applications
11.12 RFLP-Coupled Domain-Directed Differential Display (RC4D)
11.13 Gene Tagging by Insertional Mutagenesis
11.13.1 T-DNA Tag
11.13.2 Transposon Tags
11.14 Posttranscriptional Gene Silencing
11.15 MicroRNAs
11.16 Biochemical Techniques
11.17 Proteomics
11.17.1 Why Proteomics?
11.17.2 Types of Proteomics
11.17.2.1 Protein Expression Proteomics
11.17.2.2 Structural Proteomics
11.17.2.3 Functional Proteomics
11.17.2.4 Protein Analysis
11.17.2.5 One- and Two-Dimensional Gel Electrophoresis
11.17.2.6 Alternatives to Electrophoresis in Proteomics
11.17.3 Acquisition of Protein Structure Information
11.17.3.1 Edman Sequencing
11.17.3.2 Mass Spectrometry
11.17.4 Types of Mass Spectrometers
11.17.4.1 Peptide Fragmentation
11.17.4.2 De Novo Peptide Sequence Information
11.17.4.3 Uninterrupted MS/MS Data Searching
11.17.4.4 Proteomics Approach to Protein Phosphorylation
11.17.4.5 Phosphoprotein Enrichment
11.17.4.6 Phosphorylation Site Determination by Edman Degradation
11.17.4.7 Phosphorylation Site Determination by Mass Spectrometry
11.17.4.8 Metabolite Profiling Technologies
11.17.4.9 Physiological Techniques
11.17.4.10 Near-Infrared (NIR) Spectroscopy
11.17.4.11 Canopy Spectral Reflectance (SR) and Infrared Thermography (IRT)
11.17.4.12 Estimation of Compatible Solutes
11.18 Genomics-Assisted Breeding
11.19 Functional Markers
11.20 Comparative Genomics
11.21 Identification of Novel Molecular Networks and Construction of New Metabolic Pathway
11.22 Bioinformatics for MAS
Critical Thinking Questions
Bibliography
Literature Cited
Further Reading
12: Forthcoming Perspectives in MAS
12.1 Prelude on Future of MAS
12.2 MAS in Orphan Crops
12.3 MAS in Horticultural and Forestry Crops
12.3.1 MAS in Vegetables
12.3.2 MAS in Fruit Crops
12.3.3 MAS in Ornamental Crops
12.3.4 MAS in Medicinal and Aromatic Crops
12.3.5 MAS in Landscaping Plants
12.3.6 Forestry Crops
12.4 MAS in Developing Countries
12.5 Community Efforts in Developing Countries and Their Implications in MAS
12.6 Field and Laboratory Infrastructure Improvement
12.7 Genetic Mapping and MAS: Lessons Learned and Concluding Remarks
Critical Thinking Questions
Bibliography
Literature Cited
Further Reading

Author / Uploaded
N. Manikanda Boopathi

Genetic Mapping and Marker Assisted Selection: Basics, Practice and Benefits, Second Edition

N. Manikanda Boopathi, Plant Biotechnology, Tamil Nadu Agricultural University, Coimbatore, Tamil Nadu, India

ISBN 978-981-15-2948-1
ISBN 978-981-15-2949-8 (eBook)
https://doi.org/10.1007/978-981-15-2949-8

© Springer Nature Singapore Pte Ltd. 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd. The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721, Singapore

Preface (Second Edition)

Genetic improvement of available crop cultivars still plays a key role in plant breeding despite the continuous change in climatic conditions. Recent advances in marker technologies have been shown to make this hard task more accessible and achievable. However, researchers should carefully consider their breeding objectives and budgets before embracing next-generation breeding strategies such as genetic mapping and marker-assisted selection. They should also weigh the relative strengths and weaknesses of these recent methods against the conventional approaches, which have proven their worth on several occasions during the past century. Hence, the final decision on designing an appropriate strategy to introduce desirable genetic variation into an elite cultivar of a given crop should consider several factors, such as the breeding objective, accessibility of the marker class, phenomics infrastructure, time available to release the variety, and, more importantly, the financial plan. This book focuses on these points and should help researchers make an informed and sound decision by equipping them to resolve virtually any question that arises while making it.

With a heart full of emotion, I tenderly dedicate this book to my son, Sri Ezhilalan Boopathi, and my wife, Mrs. Sri Devi, who sacrificed a year and a half of their quality time during the revision of the second edition.

Coimbatore, India
30 November 2019

N. Manikanda Boopathi

Contents

1 Genetic Mapping and Marker-Assisted Selection: Setting the Background
1.1 Setting the Background
1.2 Trends in Agricultural Product Demand: Functional Foods and Value-Added Products
1.3 Evolving New Crop Cultivars: Promises of Conventional Plant Breeding Programs
1.3.1 Achievements
1.3.2 Conventional Plant Breeding: Obstacles and Challenges
1.3.3 Germplasm Exchange: International Laws and Governance
1.3.4 Biotech Crops and Biosafety Issues
1.4 Alternative Approach: MAS
1.5 Scope of Genetic Mapping and Marker-Assisted Selection
1.6 Need for This Book: What Can Be Expected?
Critical Thinking Questions
Bibliography
2 Germplasm Characterization: Utilizing the Underexploited Resources
2.1 Types of Plant Germplasm: Natural Versus Man-Made
2.1.1 Conservation of Naturally Prevailing Plant Germplasm
2.1.2 Conservation of Man-Made Plant Germplasm
2.2 Germplasm Characterization: Phenotyping for Morphological and Agronomic Characters
2.2.1 Conventional Methods of Phenotyping: Biotic and Abiotic Stress Resistance, Yield, and Quality Traits
2.2.2 Recent Developments in Phenomics and Way Forward

2.2.3 Case Study in Rice Germplasm Characterization for Drought Resistance: Formation of the Fundamental Requirements
2.2.4 Traits Useful for Germplasm Characterization in Rice
2.3 Allele Mining
2.3.1 Allele Mining: Basic Considerations
2.3.2 Insertional Mutagenesis
2.3.3 Genome Editing Tools and Induced Variations
2.3.4 TILLING, EcoTILLING, and Self-EcoTILLING
2.3.5 Mutant-Assisted Gene Identification and Characterization (MAGIC)
2.3.6 Allele Mining: Challenges and Troubleshooting Perspectives
2.4 Genetic Diversity and Clustering
2.4.1 Software for Genetic Diversity Analysis
2.4.2 Principle Behind the Genetic Diversity Analysis
2.4.3 Principle of Measuring Goodness-of-Fit of a Classification
2.5 Issues in Genetic Diversity Analysis Using Molecular Markers
2.5.1 Co-dominant Markers and Similarity Measures
2.5.2 Dominant Markers and Similarity Measures
2.6 Diversity and Phylogenetic Tree: Importance in Mapping Population Development
2.7 DNA Barcoding and Its Utilization in Germplasm Exploitation
2.8 Parental Selection
Critical Thinking Questions
Bibliography
3 Mapping Population Development
3.1 Mapping Population and Its Importance in Genetic Mapping
3.2 Selfing and Crossing Techniques in Crop Plants
3.3 F2 Progenies
3.4 F2-Derived F3 (F2:3) Populations
3.5 F2 Intermating Populations or Immortalized F2 Populations
3.6 DH Lines
3.7 BC Progenies
3.8 Introgression Library (IL) Progenies
3.9 Recombinant Inbred Lines (RILs)
3.10 NILs, Exotic Libraries, and Advanced Backcross Populations
3.11 Backcross Inbred Lines
3.12 Advanced Intercross Lines
3.13 Recurrent Selection Backcross Population
3.14 Chromosomal Segment Substitution Lines

3.15 Interconnected Mapping Populations
3.16 Four-Way Cross Populations
3.17 Multi-cross Populations
3.17.1 Nested Association Mapping Populations
3.17.2 Multiparent Advanced Generation Intercross (MAGIC) Populations: MAGIC, MAGIC Plus, and MAGIC Global
3.18 Mapping Populations for Self- and Cross-Pollinated Species
3.19 Mapping Populations for Autopolyploid Species
3.20 Natural Populations
3.21 Chromosome-Specific Genetic Stocks for Linkage Mapping
3.22 Populations for Bulk Segregant Analysis
3.23 MutMap Populations
3.24 Combining Markers and Populations
3.25 Characterization of Mapping Populations
3.26 Choice of Mapping Populations
3.27 Challenges in Mapping Population Development and Solutions to These Challenges
Critical Thinking Questions
Bibliography
4 Genotyping of Mapping Population
4.1 Markers and Its Importance
4.2 Morphological Markers
4.3 Biochemical Markers or Isozymes
4.4 Genome Structure and Organization
4.5 Classification of Molecular Markers: Classical and Updated Types
4.6 Hybridization-Based Markers
4.6.1 Restriction Fragment Length Polymorphism (RFLP)
4.6.2 Diversity Array Technology (DArT)
4.6.3 Single Feature Polymorphism (SFP)
4.6.4 Other Types of Microarray-Based Molecular Markers (Tagged Microarray Assay)
4.7 Arbitrarily Primed PCR-Based Markers
4.7.1 Random Amplified Polymorphic DNA (RAPD)
4.7.2 Arbitrarily Primed Polymerase Chain Reaction (AP-PCR) and DNA Amplification Fingerprinting (DAF)
4.7.3 Amplified Fragment Length Polymorphism (AFLP)
4.8 Sequence-Specific PCR-Based Markers
4.9 Microsatellite-Based Marker Technique
4.9.1 Random Amplified Microsatellite Polymorphism (RAMP)
4.9.2 Inter-simple Sequence Repeats (ISSR)

4.10 SNPs: Novel Methods to Detect, Genotyping Procedures, and Their Utility in Genetic Mapping and MAS
4.10.1 SNP Identification Methods
4.10.2 Genotyping by Sequencing and SNPs
4.10.3 Challenges in SNP Markers
4.11 Sequence Characterized Amplified Regions (SCAR)
4.12 Cleaved Amplified Polymorphic Sequences (CAPS) and dCAPS
4.13 Randomly Amplified Microsatellite Polymorphisms (RAMP)
4.14 Sequence-Related Amplified Polymorphism (SRAP)
4.15 Target Region Amplification Polymorphism (TRAP)
4.16 Start Codon-Targeted Polymorphism (SCoT)
4.17 CAAT Box-Derived Polymorphism (CBDP)
4.18 Conserved DNA-Derived Polymorphism
4.19 Conserved Region Amplification Polymorphism (CoRAP)
4.20 Intron-Targeting Polymorphism (ITP)
4.21 Single-Strand Conformation Polymorphism (SSCP)
4.22 Transposable Elements (TE)-Based Molecular Markers
4.22.1 Retrotransposon-Based Molecular Markers
4.23 Intron-Targeted Intron–Exon Splice Conjunction (IT-ISJ) Marker
4.24 Restriction Site-Associated DNA (RAD) Markers
4.25 RNA-Based Molecular Markers
4.25.1 cDNA–AFLP
4.25.2 RNA Fingerprinting by Arbitrarily Primed PCR (RAP–PCR)
4.25.3 cDNA–SSCP
4.26 Role of “Omics” in Molecular Marker Development
4.27 Selection of Marker Technology
4.28 Marker Genotyping and Scoring
4.29 Analyzing the Genotype Score: Chi-Square Test
4.30 χ2 Test to Analyze the Segregation Ratio Using the Program AntMap
4.31 Other Applications of Molecular Markers
Critical Thinking Questions
Bibliography
5 Linkage Map Construction
5.1 Introduction to Genome Maps: Linkage, Cytogenetic, and Physical Maps
5.2 Basics of Genetic/Linkage Mapping: Mendelian Ratios, Meiosis, Crossing-Over, and Partial Linkage
5.3 Mapping Function and Genetic Distance Calculation: Methods and Procedures
5.4 Mapping of Genetic Markers: Genetic Consideration, General Procedure, and Validation

5.5 Testing of Linkage: LOD Scores, Threshold, Comparison, and Confirmation
5.6 Fine-Tuning the Linkage: Grouping, Ordering, and Spacing
5.7 Methods to Detect and Avoid Sources of Error
5.8 Chromosomal Assignment
5.9 Allopolyploidy and Autopolyploidy
5.10 Bridging Linkage Maps to Develop Unified Linkage Maps
5.11 High-Resolution Mapping and Complete Map
5.12 Comparative Mapping
5.13 Merging Linkage Maps to Cytogenetic Maps and Physical Maps: Genetic Considerations
Critical Thinking Questions
Bibliography
6 Phenotyping
6.1 Phenomics
6.2 Forward and Reverse Phenomics
6.3 Advances in Phenomics
6.4 Phenotyping Versus QTL Mapping
6.5 Need for Precise Phenotyping
6.6 Phenotyping for Biotic Stress
6.6.1 Explaining the Concept with Case Studies
6.7 Phenotyping for Abiotic Stress
6.7.1 Explaining the Concept with Case Studies
6.8 Heritability of Phenotypes
6.9 Statistical Analysis of Phenotypic Data
6.9.1 Simple Statistics
6.9.2 Heritability Estimation
6.9.3 Correlation Analysis
6.10 Phenome-Wide Analysis in This Genomics Era: PheWAS Versus GWAS
Critical Thinking Questions
Bibliography
7 QTL Analysis
7.1 QTL: A Prelude
7.2 History of QTL Analysis
7.3 QTL: Methods and Types
7.4 Single-Marker Analysis (SMA): Principle, Methods, and Procedure
7.5 Interval Mapping: Principle, Methods, and Procedure
7.6 Multiple QTLs: Principle, Methods, and Procedure
7.6.1 Composite Interval Mapping
7.6.2 Multiple Interval Mapping (MIM) or Multiple QTL Mapping
7.7 Statistical Significance
7.8 Permutation Testing

7.9 Bootstrapping
7.10 Permutation Versus Bootstrapping and Other Methods
7.11 QTL x QTL Interaction: Impact of Epistasis
7.12 QTL x Environment Interaction
7.13 Congruence of QTL: Across the Environments and Across the Genetic Backgrounds Are the Key in MAS
7.14 Meta-QTL Analysis
7.15 Concluding Remarks on QTL Methods
7.16 Alternatives in Classical QTL Mapping: Understanding and Practicing Different Strategies
7.16.1 Bulked Segregant Analysis and Selective Genotyping: Basics, Genetic Considerations, and Procedures
7.16.2 Genomics-Assisted Breeding: Basics, Genetic Considerations, and Procedures
7.17 Challenges and Troubleshooting in QTL Mapping
7.17.1 Confronts with Mapping Populations
7.17.2 Markers and Its Implications
7.17.3 Segregation Distortion
7.17.4 Phenotyping
7.17.5 Statistical Issues
7.17.6 Practical Utility
7.18 Way Forward to Incorporate QTL Studies into Regular Crop Breeding Program
Critical Thinking Questions
Bibliography
8 Fine Mapping
8.1 Need for Fine Mapping or High-Resolution Mapping
8.2 Types of Molecular Markers Suitable for Fine Mapping
8.3 Conversion of Identified Marker into Breeder-Friendly Marker
8.4 Physical Mapping and Its Role in Fine Mapping
8.5 Comparative Mapping
8.6 Genetical Genomics/eQTL Mapping
8.7 Map-Based Cloning
8.7.1 Map-Based Cloning: Explaining with a Case Study
8.8 Validation of QTLs
8.9 Testing the Markers in Related Germplasm Accessions
Critical Thinking Questions
Bibliography
9 Marker-Assisted Selection (MAS)
9.1 Prelude on MAS
9.2 Why Should We Use MAS in Plant Breeding Program?
9.3 What Prevents the Use of MAS in Routine Breeding Program?


9.4 Prerequisites for an Efficient Marker-Assisted Selection Program
9.4.1 High-Throughput DNA Extraction
9.4.2 Marker Technology and Its Genetic Features
9.4.3 Foreground Selection
9.4.4 Background Selection
9.4.5 Genetic Maps
9.4.6 Selection of QTLs for MAS
9.4.7 Knowledge of Associations and Validation Between Molecular Markers and Trait of Interest
9.4.8 Efficient Data Management System
9.5 Procedure for a Generalized MAS Program for Selection
9.6 Single Large-Scale MAS: Principles and Procedures
9.7 Pedigree MAS: Principles and Procedures
9.8 Marker-Assisted Backcross Breeding (MABB): Principles and Procedures
9.9 Gene Pyramiding or Stacking: Principles and Procedures
9.10 Accelerated Methods of Gene Pyramiding: Principles and Procedures
9.11 Marker-Assisted Recurrent Selection (MARS): Principles and Procedures
9.12 Advanced Backcross (AB)-QTL Analysis: Principles and Procedures
9.13 Mapping as You Go (MAYG): Principles and Procedures
9.14 Breeding by Design: Principles and Procedures
9.15 Combined MAS: Principles and Applications
9.16 Application of MAS in Germplasm Storage, Evaluation, and Use
9.17 Resources for MAS on the Web
9.18 Final Considerations and Future Perspectives
Bibliography
10: Success Stories in MAS
10.1 Status of MAS in Current Plant Breeding Programs
10.2 Varieties Released Through MAS
10.3 Hybrids Developed Through MAS
10.4 MAS in Major Crops: Tomato
10.5 MAS in Major Crops: Maize
10.6 MAS in Major Crops: Wheat
10.7 MAS in Major Crops: Rice
10.8 MAS in Major Crops: Barley
10.9 MAS in Major Crops: Cotton
10.10 MAS in Major Crops: Soybean
10.11 MAS in Multinational Companies
10.12 Contrasting Stories
10.13 Conclusions and Future Prospects


Critical Thinking Questions
Bibliography
11: Toward Genetically Improved Crop Plants: Roles of ‘Omics in MAS
11.1 Leveraging ‘Omics and Other Molecular Breeding Platforms
11.2 Comparisons of Techniques in Molecular, Biochemical, and Physiological Studies and Its Integration into MAS
11.3 Prelude on Molecular Techniques
11.4 Expression Profiling
11.5 cDNA Library Construction
11.6 Differential Display and Representational Difference Analysis
11.7 Subtractive Hybridization
11.7.1 Preparation of Driver and Tester
11.7.2 Hybridization
11.7.3 Subtraction
11.7.4 Isolation of Target Sequences
11.8 Microarray
11.8.1 Types of DNA Chips and Their Production
11.8.2 Oligonucleotide-Based Chips
11.8.3 DNA-Based Chips or cDNA Arrays
11.8.4 Hybridization and Detection Methods
11.8.5 DNA Sequencing by Hybridization
11.8.6 Single Nucleotide Polymorphisms and Point Mutations
11.8.7 Functional Genomics
11.8.8 Reverse Genetics
11.8.9 Diagnostics and Genetic Mapping
11.8.10 Genomic Mismatch Scanning
11.8.11 DNA Chips and Agriculture
11.8.12 Proteomics
11.9 Nucleic Acid Sequencing
11.9.1 Second-Generation DNA Sequencing
11.9.2 454 Pyrosequencing
11.9.3 Illumina Genome Analyzer
11.9.4 AB SOLiD
11.9.5 Microchip-Based Electrophoretic Sequencing
11.9.6 Sequencing by Hybridization
11.9.7 Sequencing in Real Time
11.9.8 Targeted Capture of Genomic Subsets
11.9.9 Handling and Storage of Sequence Information
11.9.10 Predicting Function from Sequence
11.9.11 Homology Searches
11.9.12 Other Sequence Comparison Strategies


11.10 Serial Analysis of Gene Expression (SAGE)
11.11 cDNA-AFLP
11.11.1 Applications
11.12 RFLP-Coupled Domain-Directed Differential Display (RC4D)
11.13 Gene Tagging by Insertional Mutagenesis
11.13.1 T-DNA Tag
11.13.2 Transposon Tags
11.14 Posttranscriptional Gene Silencing
11.15 MicroRNAs
11.16 Biochemical Techniques
11.17 Proteomics
11.17.1 Why Proteomics?
11.17.2 Types of Proteomics
11.17.3 Acquisition of Protein Structure Information
11.17.4 Types of Mass Spectrometers
11.18 Genomics-Assisted Breeding
11.19 Functional Markers
11.20 Comparative Genomics
11.21 Identification of Novel Molecular Networks and Construction of New Metabolic Pathway
11.22 Bioinformatics for MAS
Critical Thinking Questions
Bibliography
12: Forthcoming Perspectives in MAS
12.1 Prelude on Future of MAS
12.2 MAS in Orphan Crops
12.3 MAS in Horticultural and Forestry Crops
12.3.1 MAS in Vegetables
12.3.2 MAS in Fruit Crops
12.3.3 MAS in Ornamental Crops
12.3.4 MAS in Medicinal and Aromatic Crops
12.3.5 MAS in Landscaping Plants
12.3.6 Forestry Crops
12.4 MAS in Developing Countries
12.5 Community Efforts in Developing Countries and Their Implications in MAS
12.6 Field and Laboratory Infrastructure Improvement
12.7 Genetic Mapping and MAS: Lessons Learned and Concluding Remarks
Critical Thinking Questions
Bibliography

About the Author

N. Manikanda Boopathi is presently working as Professor of Biotechnology in the Department of Plant Biotechnology, CPMB&B, Tamil Nadu Agricultural University (TNAU), Coimbatore, India. An agricultural graduate, he completed his master’s and doctoral studies in Plant Biotechnology at TNAU and was trained at the International Rice Research Institute in the Philippines. He has handled more than 25 courses for UG and PG students at his university and has delivered invited lectures and demonstrations at several institutions both in India and abroad. His scientific work has been recognized on several occasions and has brought him laurels and awards. He has rich experience in QTL mapping and marker-assisted selection in rice, cotton, mung bean, tomato, and chillies. He has successfully completed several national and international research projects and is currently working in countrywide and worldwide network projects that address problems of biotic and abiotic stresses.


1: Genetic Mapping and Marker-Assisted Selection: Setting the Background

Contents
1.1 Setting the Background
1.2 Trends in Agricultural Product Demand: Functional Foods and Value-Added Products
1.3 Evolving New Crop Cultivars: Promises of Conventional Plant Breeding Programs
1.3.1 Achievements
1.3.2 Conventional Plant Breeding: Obstacles and Challenges
1.3.3 Germplasm Exchange: International Laws and Governance
1.3.4 Biotech Crops and Biosafety Issues
1.4 Alternative Approach: MAS
1.5 Scope of Genetic Mapping and Marker-Assisted Selection
1.6 Need for This Book: What Can Be Expected?
Critical Thinking Questions
Bibliography

1.1 Setting the Background

Living in the modern world demands a nutritious and hygienic food basket on everyone’s table in order to reach and maintain an appropriate weight, reduce the risk of chronic diseases, and promote an overall healthy life. Diet and health are closely related. Thus, crops now need to be enhanced with increased levels of important biologically active substances for improved nutrition, which will increase the body’s resistance to illnesses, and undesirable food components need to be removed. On the other hand, increased climate variability, frequent extreme weather events, and new variants of pathogens and pests often threaten sustainable food productivity across seasons, even in favorable environments. Therefore, efficient and rapid breeding strategies that evolve novel cultivars, which have

© Springer Nature Singapore Pte Ltd. 2020 N. M. Boopathi, Genetic Mapping and Marker Assisted Selection, https://doi.org/10.1007/978-981-15-2949-8_1


tolerance to climatic change and disease resistance combined with good agronomic traits, can potentially improve crop productivity besides meeting future demands. Therefore, the major objective of today’s breeding programs is designed around a strategic theme: agricultural and horticultural crop production needs not only to be enhanced under changing climatic conditions but also to be nutritious, to meet the elite demands of the ever-growing global population.

1.2 Trends in Agricultural Product Demand: Functional Foods and Value-Added Products

Though a mere increase in the productivity of major grains and vegetables is important to marginal farmers in general, a significant number of informed and elite consumers prefer a variety of grains, fruits, and vegetables that have low caloric value but offer different colors, healthy starches, good fats, rich fibers, lean proteins, and useful phytochemicals such as antioxidants and vitamins. The Gen-Next, or next-generation, consumers are very cautious about their choice of food, its preparation, value addition, and shelf life. Further, agricultural product demand in this new era also focuses on food that has therapeutic value and several health benefits. This is not a new concept: Hippocrates, the father of medicine, outlined the idea some 2500 years ago as “Let food be thy medicine and medicine be thy food.” However, this philosophy lost significance during the nineteenth century owing to the advent of modern drug therapy. With the return to healthy living during the last 25 years, knowledge of the physiologically active components in plant foods has developed rapidly. In addition, with a growing health-conscious population, changes in food regulations, numerous technological advances, and a marketplace open to the introduction of health-promoting products, a new trend of preferring healthy food, referred to as functional food, has emerged. All foods could be called functional to some extent since they provide taste, aroma, and nutritive value. However, in today’s scenario, functional foods are considered to be those whole, fortified, enriched, or enhanced foods that provide health benefits beyond the provision of essential nutrients (e.g., vitamins and minerals) when they are consumed at appropriate levels on a regular basis. Examples of functional foods include fruits and vegetables, whole grains, soy, milk, enhanced foods and beverages, and some dietary supplements.
Of late, consumers habitually prefer to buy more “ready-to-eat” or “ready-to-cook” food, whereas farmers generally produce and market raw agricultural commodities. Hence, it is now imperative to produce final agricultural products that are amenable to processing into value-added or ready-to-serve products. Value addition may include any one of the following: a change in the physical state or form of the product (such as milling wheat into flour or making strawberries into jam), the production of a product in a manner that enhances its value (such as organically produced products), or the physical segregation of an agricultural commodity or


product in a manner that results in the enhancement of the value of that commodity or product (such as enhanced shelf life without any synthetic preservatives). This means that a breeding program has to be precisely designed to meet today’s market needs, such as cultivars that are suitable for organic cultivation and less input responsive, and final agricultural raw products that require minimal processing time and effort during the value addition process.

1.3 Evolving New Crop Cultivars: Promises of Conventional Plant Breeding Programs

Whenever a requirement arises for new cultivars that meet the demands of farmers and end users, breeders usually work for more than one and a half decades on the given task. In particular, genetic improvement of productivity and attainment of greater stability have long been recognized as major breeding objectives in all crops. This has been demonstrated successfully on several occasions by expediting different breeding steps: (1) creation of genetic variation (identified from the breeder’s own germplasm collection, introduced from other countries, derived from progenies of hybridization/cross-pollination made within and between crop species, or produced through mutagenesis), (2) selection within that genetic variation, and (3) evaluation of selected lines in the different target populations of environments. Plant introduction, a frequently used conventional breeding method, involves collecting and evaluating genotypes from different parts of, or outside, the country and identifying desirable genotypes adapted to the regional environment with high productivity or with desired specific trait(s) such as biotic and/or abiotic stress resistance. Hence, the nature of the introduced material, generally from locations with similar soil characteristics and climatic conditions, largely decides the success of this breeding strategy. Generally, introduction occurs through the exchange of material with fellow plant breeders, exploration of areas showing rich variation of the species, or access to genetic resources from international institutes/organizations that hold rich germplasm repositories for the given crop. The success of plant introduction therefore depends on two important aspects, viz., domestication and acclimatization. Domestication is the process of bringing a wild species under cultivation by making it suitable for a new environment, whereas acclimatization is the ability of a crop to become adapted to new climatic and edaphic conditions.
Thus, plant introduction is a continuous process, yet it is clearly the cheapest and fastest way of developing novel cultivars. If the introduced material is a pure line and shows good adaptability, it can be released directly for farmers’ adoption. This is called primary introduction, and examples include dwarf wheat varieties such as Sonora-64 and Lerma Rojo and dwarf rice varieties such as Taichung Native-1 and IR-8. But if the material is a mixture (such as a landrace) or segregating material, it is used as a donor line with a specific desirable trait in the breeding program, and this is referred to as secondary introduction. For example, varieties such as Kalyan


Sona and Sonalika of wheat were selected from material introduced from CIMMYT, Mexico. Besides plant introduction, the breeding methods that have proved successful in self-pollinated species are mass selection, pure line selection, hybridization (with segregating generations handled by the pedigree method, the bulk method, or the backcross method), and development of hybrid varieties. In mass selection, seeds are collected from desirable individuals in a population (usually a few dozen to a few hundred), and the next generation is sown from the stock of mixed seed. This procedure, sometimes referred to as phenotypic selection, is based on how each individual looks. Mass selection has been widely used to improve landraces (crop accessions that have been passed down from one generation of farmers to the next over long periods) and is common in horticultural crops. For example, the coriander variety Arka Isha was developed through mass selection from an exotic introduction from Japan, IIHR ACC. No.19528. It is a multi-cut type with bushy plants, broad leaves, good aroma, good keeping quality (21 days under refrigeration), and a high vitamin C content. However, when a large collection of regional landraces or ecotypes is available in a given environment, such accessions can be purified and evaluated more systematically by repeated selfing and replicated trials across different environments, followed by selection of the line with desirable traits. This process is often referred to as pure line selection; several varieties developed using this strategy are available in different crops, and they are outlined in Table 1.1.
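The purifying effect of repeated selfing mentioned above can be illustrated with a small calculation. The sketch below assumes the standard expectation that each generation of selfing halves the proportion of heterozygous loci; this expectation and the function name are illustrative assumptions rather than material from the text.

```python
def heterozygosity_after_selfing(generations, initial=1.0):
    """Expected fraction of heterozygous loci after repeated selfing.

    Assumes (not stated in the text) that every generation of selfing
    halves the heterozygosity present in the previous generation.
    """
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return initial * 0.5 ** generations


# Print the expected progress toward a pure line over six selfing generations,
# starting from a fully heterozygous plant (initial = 1.0).
for g in range(7):
    het = heterozygosity_after_selfing(g)
    print(f"S{g}: heterozygous {het:.4f}, homozygous {1 - het:.4f}")
```

Under this assumption, about six generations of selfing are enough to bring the expected heterozygosity below 2%, which is one way to see why pure line selection relies on repeated selfing before evaluation.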
Though the above two conventional breeding strategies have contributed significantly to evolving new crop cultivars, breeders found that the existing natural variation had been exhausted, and hence they often ended up with little success in improving biotic and abiotic stress resistance in crop plants using these two methods. Hybridization, an alternative to the above two methods, was introduced as a novel breeding method with the main focus of combining desirable traits from two or more parents into a single cultivar. Appropriate selection of the parents is the key to success in a hybridization program. Briefly, if the aim is to replace an existing variety with a superior one, the existing variety, which is adapted to the local environment, should logically be selected as one parent, and a line that supplements the first parent with the desirable trait should be the second parent. On the other hand, if the objective is to create variation for the desired traits or to broaden the genetic base, then diverse parents should be chosen. In other words, parents are selected from different germplasm resources based on the requirement. To put the selection of the two parents on a rational and scientific basis, biometrical approaches are employed to analyze the diversity and combining ability of the selected parents. Hybridization involves transferring pollen from the male flower of one parent to the female flower of the other. In many crops, crossing is a tedious job, and the success of artificial hybridization ranges from 5% to 25% depending upon the weather,


Table 1.1 Selected examples of crop cultivars that represent their development through different conventional breeding strategies Name of the conventional breeding approach Plant introduction

Name of the variety G109-1

Characteristic feature Bruchid resistant

Rice

TN-1, IR-8

Dwarf lines

Wheat

Sonora-64, Lerma Rojo, Kalyan Sona, Sonalika All the new-age varieties

Dwarf and yield characteristics

Crop Chickpea (Cicer arietinum)

Sugarcane

Mass selection

Maize

–

– –

– –

–

–

–

–

Corn

Improved lines

Enhanced variability for heading date Yield improvement

Forest trees

22 slash pine clones Paired maize gene pools

Improved paper quality Resistant to ear rots and leaf blight

Tobacco

Reciprocal recurrent selection

–

Oat

Sorghum Rice

Barley

Recurrent selection

–

Derived from noble cane variety of Saccharum officinarum –

Jaunpur local, Tinpakhia, Basri Bajra-207, Bajri-28-15, Bichpuri local, Pusa moti RSI, T22 M-351, Vidisa 60-1, Patni 6, Aispuri, BP 53 C-251, C-50, K-12 NP28, NP63, NP70 Mutant lines

Pearl millet

Pure line selection

Superior agronomic characters

Maize

Remarks Introduced to India from Turkey Facilitated by IRRI CIMMYT, Mexico

At CIMMYT, Mexico

At CIMMYT, Mexico (continued)


Table 1.1 (continued)

Name of the conventional breeding approach | Crop | Name of the variety | Characteristic feature | Remarks
Mutation breeding | Rice | Binasail, Iratom-24 and Binadhan-6 | – | Released in Bangladesh
 | Rice | GINES | Grows well in salty conditions | Released in Cuba
 | Rice | Jiahezazhan and Jiafuzhan | Plant hopper and blast resistance | Released in China
 | Mung bean | Binamoog-5 | – | Released in Bangladesh
 | Tomato | Maybel | Drought resistance | Released in Cuba
 | Cotton | MA-9 cotton | The world’s first mutant cotton, released in 1948; drought tolerance, high yielding | Released in India
 | Cotton | Lumian number 1 cotton | – | Released in China
 | Soybean | Henong series | – | Released in China
 | Pear | Osa gold | Disease resistance | Released in Japan
 | Banana | Albeely | Better quality, high yield, and better stand | Released in Sudan
 | Peppermint | Murray Mitcham | Verticillium wilt tolerance | Released in the USA
Clonal selection | Jasmine (Jathimalli or Pitchi) | CO 2 | Induced mutation of vegetative cutting | Released by TNAU, India
 | Barleria | CO1 | Early flowering | Released by TNAU, India
 | Chrysanthemum | CO2 | Purple color (Rhodamine purple–29) | Released by TNAU, India
 | Gerbera | Yercaud–1 | Vase life of 7 days | Released by TNAU, India

particularly temperature, wind speed, and humidity, besides the parental genotypes involved in the cross combination. Since the majority of the flowers of cultivated crops are small and delicate, emasculation and pollination generally damage the floral parts, thus reducing the success rate. Further, the timing of pollination and fertilization, the time taken by the pollen grains to germinate after pollination and to


reach the ovary base, the functional properties of stigmatic cells, etc. also decide the success rate of hybridization. Hybridization may involve single crosses, three-way crosses, four-way crosses, or multiple crosses depending upon the objective of the program. Single crosses are effectively and widely used in several crops to develop new cultivars by crossing two parents and selecting recombinants that are superior to both parents used in the cross. Three-way crosses involve three different parents (each holding a particular desirable trait) with the aim of combining the traits in a new cultivar. In the first step, an F1 is derived from two parents of choice, and then the third parent is crossed with the F1 to develop a three-way cross. These crosses provide added prospects for better gene interaction. Three-way crosses can be managed by pedigree or bulk-breeding methods. The main point to note is that the progenies of three-way crosses are more variable and have a wider genetic base than those of single crosses, though it takes more time to isolate uniform progenies. In contrast, more than three parents are involved in multiple crosses (e.g., four-way crosses, which involve four genetically different parents), and these parents are crossed in various ways. These crosses are used to broaden the genetic base and to increase the opportunities for recombination. The segregating generations can be managed by following pedigree or bulk-breeding methods. Generally, cultivars derived from multiple crosses are expected to be adapted to a wider range of environments. It is always desirable to attempt well-planned crosses based on the diversity and combining ability of the parents. It is generally recommended that poorly performing F1 crosses be rejected at an early stage, which leaves a few promising hybrids to handle. In addition, the performance of F2 populations can also be taken as a criterion for their further advancement.
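The crossing schemes described above differ in how much each parent is expected to contribute to the progeny. The following sketch, a hypothetical helper that is not part of the original text, represents a cross as a nested pair of parents and assumes that each side of a cross contributes half of the nuclear genome on average, before any selection is applied.

```python
from fractions import Fraction


def parental_contribution(pedigree, share=Fraction(1)):
    """Expected genome share of each founder parent in a cross.

    A pedigree is either a parent name (str) or a pair (female, male) whose
    members are themselves pedigrees. The half-and-half split per cross is a
    textbook expectation assumed here, not a figure taken from the chapter.
    """
    if isinstance(pedigree, str):
        return {pedigree: share}
    result = {}
    for side in pedigree:
        for name, frac in parental_contribution(side, share / 2).items():
            result[name] = result.get(name, Fraction(0)) + frac
    return result


single_cross = ("A", "B")
three_way_cross = (("A", "B"), "C")        # F1 of A x B crossed with C
four_way_cross = (("A", "B"), ("C", "D"))  # F1 of A x B crossed with F1 of C x D

for label, cross in [("single", single_cross),
                     ("three-way", three_way_cross),
                     ("four-way", four_way_cross)]:
    shares = parental_contribution(cross)
    print(label, {name: str(frac) for name, frac in shares.items()})
```

In this representation the third parent of a three-way cross is expected to contribute half of the genome while the first two parents contribute a quarter each, so the order in which parents enter the scheme is itself a design choice.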
Usually, F1 hybrids showing enhanced heterosis and less inbreeding depression are the ideal ones to pursue, since additive gene effects seem to govern the expression of traits in such populations, and these can be fixed by simple selection. On the other hand, the worth of early generation selection is often debated, and it is not recommended in some cases. For example, yield is a complex trait and generally has low heritability. Therefore, selection for yield per se, particularly in the early segregating generations, may not give good results. However, indirect selection for component traits related to productivity (such as number of grains or seeds, number of panicles or pods, seed size, seed weight, etc.) has been found to be useful. Further, the pedigree method of selection is the most commonly used method of handling segregating populations in self-pollinated crops. However, there are various reports for and against the pedigree method. For example, the pedigree method of breeding for high yield in chickpea was extensively employed by one group of breeders, whereas other researchers found this method less suitable and advocated the bulk method or the single-seed descent method for efficient selection in the same crop. Hence, the consensus is that the handling of segregating populations largely depends on the objective of the breeding program, and for chickpea breeding it can be summarized as follows: pedigree method for resistance to biotic stresses; bulk-pedigree method for traits such as drought tolerance and winter hardiness;


modified bulk method for abiotic stresses, seed size, earliness, plant type, etc.; and backcross method for interspecific hybridization, with limited backcrossing for introgression and resistance breeding that involves wild species. Recurrent selection is a variation of progeny selection, with the difference that the selected progenies are allowed to intercross in all possible combinations through open pollination to provide all kinds of recombination. The method can thus be defined as reselection generation after generation, with interbreeding of the selected progenies to enhance genetic recombination. There are four main types of recurrent selection: (a) simple recurrent selection, (b) recurrent selection for general combining ability, (c) recurrent selection for specific combining ability, and (d) reciprocal recurrent selection. Readers are referred to a plant breeding textbook for the detailed procedures of these methods (Allard 1960; Poehlman 1987). Another selection method is clonal selection. A clone may be defined as a group of plants derived from a single plant by vegetative propagation. Normal stems, runners, suckers, stolons, tubers, rhizomes, bulbs, and roots or root cuttings can be used to raise clones. The plants of a clone are genetically uniform and stable in nature. However, variability can be induced through mutation, and it can be propagated vegetatively. Clonal selection is the selection of desirable clones from a mixed population of vegetatively propagated plants. As with self-pollinated crops, there are several major methods for breeding cross-pollinated species: (1) mass selection, (2) development of hybrid varieties, and (3) development of synthetic varieties. Since cross-pollinated species are naturally hybrid (heterozygous) for many traits and lose vigor as they become purebred or pure line (homozygous), a goal of each of these three breeding methods is to preserve or restore heterozygosity.
A synthetic variety is developed by intercrossing a number of genotypes of known superior combining ability, i.e., genotypes that are known to give superior hybrid performance when crossed in all combinations. (By contrast, a variety developed by mass selection is made up of genotypes bulked together without having undergone preliminary testing to determine their performance in hybrid combination.) Synthetic varieties are known for their hybrid vigor and for their ability to produce usable seed for succeeding seasons. Because of these advantages, synthetic varieties have become increasingly favored in the growing of many species, such as the forage crops, in which expense prohibits the development or use of hybrid varieties. In order to further increase novel genetic variability, mutation breeding has been introduced as a fruitful strategy in plant breeding programs. The concept of mutation, which can be traced back to 1900, was put forward by de Vries in Die Mutationstheorie in 1901. The oldest description of spontaneous mutants appeared around 300 BC in China, where they were described, especially in cereal crops, in an ancient book, Lulan. However, the first induced mutations were discovered in the fruit fly (Drosophila melanogaster) by Muller in 1927. After the discovery of X-ray-induced mutations, Stadler demonstrated their application in barley (Hordeum vulgare L.) in 1928, and since then induced mutation has been successfully exploited by many scientists all over the world. After the establishment of the Division of Nuclear Techniques in Food and Agriculture at the Food and Agriculture Organization/International Atomic Energy Agency (FAO/


IAEA) and several national facilities such as the Bhabha Atomic Research Center in India, plant breeders have continuously used this technique to increase the genetic variation in crop plants. Both kinds of mutagens (agents that induce mutations), viz., (i) physical radiation and (ii) chemical mutagens, are highly beneficial for creating genetic variability in a crop species, and recently mutagenesis has received great attention for its use in promising new techniques known as “targeting induced local lesions in genomes” (TILLING) and “EcoTILLING” (described in Sect. 2.3.4). Most crop mutations are recessive, and they can be selected in the second generation, M2. Dominant mutations, by contrast, occur at very low frequencies, and they can be selected in M1. However, selection for polygenic traits should begin in the individual plant progenies of M3. On the other hand, it is also argued that beneficial mutations occur at very low frequencies (~0.1%). Further, mutation procedures reduce germination, growth rate, vigor, and pollen and ovule fertility in the treated crop plants. In addition, mutations are induced at random and might occur in any gene(s), and the same gene(s) in a crop species may be mutated again (referred to as recurrent mutation). It is also important to note that mutations can have pleiotropic effects because a mutation in one gene can cause variation in the expression of other closely linked gene(s). Therefore, precise planning of a mutation breeding program, with a focus on the choice of variety, mutagen, and dose, is a prerequisite. The variety selected for mutagenesis should be one of the best recently released varieties, well adapted and high yielding but deficient in one or two traits. An optimum dose can be determined with a preliminary treatment. Overdoses of mutagens are lethal to many plants, while underdoses produce a low mutation frequency.
Although air-dried seeds are the most frequently used material, pollen grains are also used, besides vegetative propagating materials or organs. Irradiation of pollen grains can be beneficial since it can avoid pre-fertilization and post-fertilization problems, especially in intraspecific hybridizations. Treatment of interspecific and intraspecific hybrids can produce translocations. Chimeras are not induced when pollen is irradiated. Thus, mutation breeding not only creates variability in a crop species but also shortens the time taken to develop cultivars. It is estimated that the average time from the beginning of mutation treatment to the release of a mutant cultivar is approximately 9 years, whereas it is 18 years for cultivars arising from crossing programs. Moreover, mutations are induced in both qualitative and quantitative characters in a short time by altering alleles of known and unknown genes, besides modifying or even breaking undesirable linkages. Furthermore, mutations are one of the most important sources of evolution and are helpful in studying the genetics of the novel characters that they create.

1.3.1 Achievements

This section provides examples of success stories in conventional plant breeding, and Table 1.1 describes selected examples of crop varieties that were


released in different crops. Plant introduction and the other conventional breeding methods described in the above section have been used very successfully in developing crop cultivars. For example, in India, mass selection has been useful in developing improved varieties in cross-pollinated crops like maize, pearl millet, and mustard and in often cross-pollinated species like cotton and sorghum.

1.3.2 Conventional Plant Breeding: Obstacles and Challenges

Despite the significant contributions of conventional breeding strategies, breeders have found it difficult to use them efficiently in developing novel crop cultivars in the following situations:

1. Improving traits that are not present in the germplasm of the target species (e.g., there is no source of resistance to papaya ring spot virus in papaya germplasm). Apart from a few stress-resistance characteristics, desirable sources of resistance have generally not been found in cultivated collections but only in wild relatives, which are cross-incompatible with the crop.
2. Improving traits that are difficult to handle through conventional phenotypic selection, because they are expensive or time-consuming to measure or have low penetrance or complex inheritance.
3. Improving traits whose selection depends on specific environments or developmental stages for expression of the target phenotype (e.g., a virus disease develops at a particular phenological stage and in particular hotspots: Mungbean Yellow Mosaic Virus (MYMV) symptoms are more prominent during summer at the vegetative stage in South India).
4. Maintaining recessive alleles during backcrossing, or speeding up backcross breeding in general.
5. Pyramiding multiple monogenic traits (such as pest and disease resistances or quality traits) or several component traits of a single target trait with complex inheritance (such as escape, avoidance, and tolerance traits that favor drought tolerance).
6. Coping with the very low frequency of desirable mutations (~0.1%) in mutation breeding. Success of mutation breeding depends on the methodology, effective screening techniques, and the size of the population grown in M1 and successive generations. The larger the M1 population, the greater the chance of selecting desirable mutants. On the other hand, screening large populations takes considerable time, labor, and other resources.
Further, some mutations involve pleiotropic effects due to linked gene(s), other mutations, chromosomal aberrations, and deletions. Such mutants often have to be backcrossed to their parents or to adapted varieties. Thus, conventional breeding methods, though significant and productive in their own right, also restrict the chance of better recombination because of larger linkage blocks associated with rapid progress toward homozygosity and low


genetic variability. Further, negative correlations among yield components and high genotype × environment interactions prevent full exploitation of genetic variability for complex traits such as yield. It is also important to note that in self-pollinated crops, conventional methods such as pedigree or bulk selection, which handle the segregating populations, do not provide any opportunity for continued reshuffling of genes. Hence, any unfavorable associations observed in an early segregating generation such as F2 are liable to persist through the later filial generations. Breeders can alter such associations by resorting to approaches like biparental mating in the segregating populations, but this is a labor-intensive and time-consuming strategy. Therefore, it is the right time to re-evaluate general breeding procedures and introduce novel methods that overcome the above shortcomings.
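The trade-off noted in item 6 above (desirable mutations at roughly 0.1%, and larger M1 populations giving a better chance of success) can be illustrated with a small calculation. The sketch below is not from the book: it assumes, purely for illustration, that every screened plant is an independent trial carrying a desirable mutation with the same probability, and the function names are hypothetical.

```python
import math


def prob_at_least_one_mutant(freq: float, n_plants: int) -> float:
    """Chance that at least one of n_plants carries a desirable mutation,
    assuming each plant is an independent trial with probability freq."""
    return 1.0 - (1.0 - freq) ** n_plants


def plants_to_screen(freq: float, confidence: float = 0.95) -> int:
    """Smallest population size that gives at least `confidence` chance of
    recovering one or more desirable mutants (same independence assumption)."""
    if not (0.0 < freq < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("freq and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - freq))


if __name__ == "__main__":
    freq = 0.001  # ~0.1%, the frequency quoted in the text
    for n in (500, 1000, 3000, 5000):
        p = prob_at_least_one_mutant(freq, n)
        print(f"{n:>5} plants -> P(at least one desirable mutant) = {p:.3f}")
    print("Plants to screen for 95% confidence:", plants_to_screen(freq))
```

Under these simplifying assumptions, roughly 3,000 plants must be screened for a 95% chance of recovering a single desirable mutant, which is why screening cost dominates the planning of a mutation breeding program.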

1.3.3 Germplasm Exchange: International Laws and Governance

There is an additional bottleneck in conventional breeding approaches besides those outlined above. The donors for target traits often lie in national or, more often, international germplasm repositories, and obtaining such donors across countries needs elaborate and time-consuming efforts. For example, in India, a requisition for the introduction of a new crop plant or new varieties, or for donors of desirable traits, should be submitted to the National Bureau of Plant Genetic Resources (NBPGR) within the country or to the International Bureau of Plant Genetic Resources (IBPGR). The material may be obtained on an exchange basis from friendly countries either directly or through FAO, or it can be purchased or obtained as a free gift from individuals or organizations. On receipt of the material, entry inspection is carried out by the country for contaminants and for the presence of insects, diseases, and nematodes. The materials are treated with insecticides, fungicides, or nematicides and then released to the user. The general objective of all “quarantine and regulatory” measures is to prevent pests and diseases from entering the country as well as to check their further spread. On the other hand, quarantine greatly delays the incorporation of such desirable breeding materials into the regional breeding program. Another demerit associated with plant introduction is the introduction of weeds. Weeds such as Argemone (prickly poppies), Eichhornia (water hyacinth), and Lantana were introduced from other countries along with crop plants. However, in most cases, the introduction of weeds, diseases, and insect pests occurred during a period when quarantine was almost nonexistent.
Thus, in short, although conventional breeding has shown its potential in developing a large array of varieties and hybrids that meet the demands of the food, feed, and shelter sectors, it is now felt that such methods are not precise enough and do not progress well in further improving complex desirable traits such as biotic and abiotic stress resistance, yield, and quality. Hence, modern breeders look for tools that supplement their breeding efficiency by reducing the time required to release a cultivar or hybrid with minimum effort but high precision.


1.3.4 Biotech Crops and Biosafety Issues

1.3.4.1 Evolving New Cultivars Using Tools of Biotechnology

In the search for novel tools to increase genetic variation among crop plants, it is now generally accepted that the tools and methods of biotechnology can greatly supplement the efficiency of plant breeding in generating novel cultivars with enhanced genetic variation. They are described hereunder.

1.3.4.2 Plant Tissue Culture-Based Approaches

Plants usually reproduce by forming seeds through sexual reproduction. During sexual reproduction, the genomes of both parents are combined in new and unpredictable ways, creating unique plants. This unpredictability is a problem for plant breeders, as it can take several years to breed a plant with desirable traits. However, researchers have developed methods of growing exact copies of a mother plant without sexual reproduction through a technique called “tissue culture.” Plant tissue culture is the culturing of plant cells, tissues, or organs on artificially formulated nutrient media under aseptic conditions. It is seen as an important technology for developing countries for the production of disease-free, high-quality planting material and the rapid production of many uniform plants. Besides high-throughput production of true-to-type seedlings through micropropagation, plant tissue culture has also made it possible to alter the makeup of plant genomes through strategies such as somaclonal variation, protoplast fusion, embryo rescue, and doubled haploid (DH) plant production. Novel desirable genetic variation has been documented in several crops, and examples are presented in Table 1.2. For example, with DH production systems, homozygosity is achieved in one generation, and hence the breeder can eliminate the numerous cycles of inbreeding necessary to achieve practical levels of homozygosity by conventional methods.
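To see why doubled haploids save so many generations, it helps to recall the standard expectation for selfing: at a locus that is heterozygous in the F1, the expected heterozygosity halves with every generation of self-pollination. The following sketch is an illustration under that textbook assumption (strict selfing, no selection, F1 heterozygous at the loci considered); the function names are hypothetical and not taken from the book.

```python
def expected_homozygosity(selfing_generations: int) -> float:
    """Expected fraction of initially heterozygous loci that have become
    homozygous after the given number of selfing generations.
    The F1 corresponds to 0 selfing generations, the F2 to 1, and so on."""
    if selfing_generations < 0:
        raise ValueError("selfing_generations must be non-negative")
    return 1.0 - 0.5 ** selfing_generations


def selfings_needed(target: float) -> int:
    """Fewest selfing generations whose expected homozygosity reaches target."""
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    n = 0
    while expected_homozygosity(n) < target:
        n += 1
    return n


if __name__ == "__main__":
    for n in range(1, 8):
        # n selfing generations after the F1 give the F(n+1) generation
        print(f"F{n + 1}: expected homozygosity = {expected_homozygosity(n):.4f}")
    print("Selfing generations needed for 99%:", selfings_needed(0.99))
```

By this expectation, conventional inbreeding needs seven selfing generations (the F8) to exceed 99% homozygosity, whereas a DH line is, as stated above, homozygous after a single step.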
Somaclonal variation can pose a serious problem in any micropropagation program, where it is highly desirable to produce true-to-type plant material. On the other hand, somaclonal variation has provided breeders with a new and alternative tool for obtaining genetic variability relatively rapidly, and without any sophisticated technology, in crops that are either difficult to breed or have a narrow genetic base. In vitro-induced somaclonal variation has its own merits:

1. It is cheaper than other methods of genetic manipulation.
2. Tissue culture systems are available for more plant species than can at present be manipulated by somatic hybridization and transformation.
3. It is not necessary to have identified the genetic basis of the trait, or indeed, in the case of transformation, to have isolated and cloned it.
4. Novel variants have been reported among somaclones, and evidence indicates that both the frequency and distribution of genetic recombination events can be altered by passage through tissue culture.


Table 1.2 Selected examples of novel crop cultivars developed using different strategies of plant tissue culture

Strategy | Crop | Cultivar or hybrid produced | Remarks
Somaclonal variation | Apple | Rootstock Malling 7 | Resistance to white root rot
Somaclonal variation | Anthurium sp. | Orange Hot | Derived from Red Hot clone
Somaclonal variation | Banana | TC1-229 | Semidwarf and resistant to Fusarium wilt
Somaclonal variation | Banana | TC2-425 | Resistant to Fusarium oxysporum f. sp. cubense (Foc) race 4; bunch 40% heavier than cv. Formosana
Somaclonal variation | Banana | CUDBT-B1 | Reduced height and early flowering
Somaclonal variation | Aglaonema | Moonlight Bay, Diamond Bay, and Emerald Bay | -
Somaclonal variation | Capsicum | Bell sweet | Yellow fruited
Somaclonal variation | Blackberry | Lincoln Logan | Thornless variety
Protoplast fusion (somatic hybridizations and cybridizations) | Citrus | Rootstocks | Resistant to biotic and abiotic constraints, with increased yield and fruit quality
Protoplast fusion (somatic hybridizations and cybridizations) | Brassica | Brassica napus + Camelina sativa hybrids | Increased linolenic acid content compared to the B. napus partner
Doubled haploid production | Tomato | Marglobe | First DH cultivar, by Morrison (1932)
Doubled haploid production | Rapeseed | Maris Haplona | By Thompson (1972)
Doubled haploid production | Wheat | Lillian, AC Andrew | Popular in Canada during 2007
Doubled haploid production | Barley | Mingo | In 1980
Embryo rescue | Coconut | Macapuno | Mass propagation

This implies that variation may be generated at locations in the genome other than those accessible to conventional and mutation breeding. Somaclonal variation has been most successful in crops with limited genetic systems (e.g., apomicts, vegetatively propagated materials) and/or narrow genetic bases. In ornamental plants, for instance, the exploitation of in vitro-generated variability has become part of the routine breeding practice of many commercial enterprises. Krishna et al. (2016) have listed a wide array of in vitro selections for desirable traits and some commercially exploited varieties developed through somaclonal variation in different horticultural crops. It should be noted, however, that somaclonal variants can become part of plant breeding only when they are heritable and genetically stable. Only a limited number of promising varieties have so far been released using somaclonal variation. This is perhaps due to the unpredictability of somaclones and of the response of crop plants to tissue culture protocols.


Another application of plant tissue culture is the rescue of hybrid embryos, and the technique has become a routine aid to plant breeders in raising rare hybrids that normally fail because of post-zygotic sexual incompatibility. However, the success of this technique depends on isolating the embryo without injury, formulating a suitable nutrient medium, and inducing continued embryogenic growth and seedling formation. Somatic hybridization (SH) using protoplast fusion has been regarded as an alternative and promising tool to produce symmetrical and asymmetrical polyploid somatic hybrids in many agricultural crops. SH can facilitate conventional breeding by providing novel lines for use as elite breeding materials in crosses for both scion and rootstock improvement. Further, SH can overcome problems associated with sexual hybridization, viz., sexual incompatibility, nucellar embryogenesis, and male/female sterility. Successful exploitation of SH mainly comes from the transfer of genes from related species that confer resistance to biotic and abiotic stresses in several horticultural crops, viz., citrus, potato, brinjal, tomato, mango, avocado, banana, strawberry, pear, cherry, etc. However, certain limitations of SH restrict its use over sexual hybridization: the introduction of large amounts of exogenous genetic material along with the genes of interest may induce genetic imbalance, and unfavorable traits may appear (e.g., fruits with irregular and thick skin).

1.3.4.3 Genetic Engineering and Transgenic Plants

Evolving plant varieties that express novel agronomic characteristics is the ultimate goal of plant breeders. However, with conventional plant breeding, there is little or no guarantee of finding any particular gene combination among the millions of crosses generated. The breeder may meet one or both of the following problems: (1) undesirable genes can be transferred along with desirable genes, and (2) while one desirable gene is gained, another is lost because the genes of both parents are mixed together and reassembled more or less randomly in the offspring. Further, in the majority of instances, the gene of interest may not be present in the target species or in related wild species that are crossable with it. These problems limit the improvements that plant breeders intend to achieve. On the other hand, genetic engineering allows the direct transfer of one or just a few genes of interest between closely or distantly related (even completely different) organisms to obtain the desired agronomic trait. A variety of techniques and strategies have been explored to exploit this tool to generate transgenic crop plants (see below). Not all genetic engineering techniques involve inserting DNA from other organisms. Plants can also be modified by removing or switching off particular genes of their own using genetic engineering as well as recently developed genome-editing tools. Genome editing (also called gene editing) is a group of technologies that allow genetic material to be added, removed, or altered at particular locations in the genome. Several genome-editing technologies have been developed, and the most popular one is known as CRISPR-Cas9, which is short for clustered regularly interspaced short palindromic repeats and CRISPR-associated protein 9. The


CRISPR-Cas9 system is the tool of choice because it is faster, cheaper, more accurate, and more efficient than other existing genome-editing methods. The first transgenic plant produced was an antibiotic-resistant tobacco plant (Fraley et al. 1983). In 2013, the leaders of the three research teams that first applied genetic engineering to crops, Robert Fraley, Marc Van Montagu, and Mary-Dell Chilton, were awarded the World Food Prize for improving the “quality, quantity, or availability” of food in the world. The first field trials occurred in France and the USA in 1986, using tobacco plants engineered for herbicide resistance (James 1996). In 1987, Plant Genetic Systems (Ghent, Belgium), founded by Marc Van Montagu and Jeff Schell, was the first company to genetically engineer insect-resistant (tobacco) plants by incorporating genes that produced insecticidal proteins from Bacillus thuringiensis (Bt) (Vaeck et al. 1987). The People’s Republic of China was the first country to allow commercialized transgenic plants, introducing a virus-resistant tobacco in 1992; however, it was withdrawn in 1997 (Conner et al. 2003). The first genetically modified crop approved for sale in the USA, in 1994, was the Flavr Savr tomato. It had a longer shelf life because it took longer to soften after ripening (Bruening and Lyons 2000), but it is no longer on the market. Since then, several transgenic plants with improved agronomically and nutritionally important traits have been released for cultivation. Since GM crops were first introduced to global agriculture in 1996, Clive James has published annual reports on the global status of commercialized genetically modified (GM) crops as well as special reports on individual GM crops, the technology used to develop such GM crops, gene resources, etc., from the International Service for the Acquisition of Agri-biotech Applications (ISAAA), and all can be freely accessed at http://www.isaaa.org.
A total of 70 countries adopted biotech crops in 2018, the 23rd year of continuous biotech crop adoption, according to the Global Status of Commercialized Biotech/GM Crops in 2018 (ISAAA Brief 54) released by the International Service for the Acquisition of Agri-biotech Applications (ISAAA; http://www.isaaa.org/resources/publications/briefs/54/default.asp; accessed on 23rd November 2019). In total, 26 countries (21 developing and 5 industrialized countries) planted 191.7 million hectares of biotech crops, which added 1.9 million hectares to the record plantings of 2017. The continuous adoption of biotech crops by farmers worldwide indicates that biotech crops continue to help meet the global challenges of hunger, malnutrition, and climate change. Among the transgenic traits that have been modified in crops, herbicide resistance is the most widely employed. Glyphosate is a nonselective, broad-spectrum foliar herbicide with no soil residual activity that has been used for >30 years to manage over 300 weed species including annual, perennial, and biennial herbaceous grass, sedge, and broadleaf weeds as well as unwanted woody brush and trees. Glyphosate strongly competes with the substrate phosphoenolpyruvate


(PEP) at the 5-enolpyruvylshikimate-3-phosphate synthase (EPSPS) enzyme-binding site in the chloroplast, resulting in the inhibition of the shikimate pathway. Products of the shikimate pathway include the essential aromatic amino acids tryptophan, tyrosine, and phenylalanine and other important plant metabolic products. The relatively slow mode of action and the physicochemical characteristics of glyphosate result in its translocation throughout the plant and accumulation at the vital growing points before phytotoxicity occurs. Two transgenes coding for a glyphosate-insensitive EPSPS, (i) the cp4 epsps gene from Agrobacterium tumefaciens strain CP4 and (ii) the mutated zm-2mepsps from corn (Zea mays L.), were used to develop several transgenic crops. Herbicide resistance traits are now used on >80% of the estimated 134 million hectares of transgenic crops grown annually in 24 countries with a single trait, CP4 EPSPS.

1.3.4.4 Biosafety and Other Issues Related to Transgenic Crops

Despite the claim that transgenic crops hold great promise for meeting the greatest challenges of conventional breeding strategies, like all new technologies, they also pose certain risks. This is mainly because transgenic crops carry new gene combinations that are not found in nature and may have harmful effects on human and animal health, the environment, nontarget species, and biological diversity, besides their impact on socioeconomic and ethical values. For example, though herbicide-resistant crops have been widely cultivated commercially, they have not proved to be a sustainable tool for a number of reasons, particularly the higher cost of herbicides and their more restrictive application timings. More importantly, the evolution of herbicide resistance in weeds has been faster and more widespread than expected. Recognizing the need for biosafety in transgenic research and development activities, an international multilateral agreement on biosafety referred to as “the Cartagena Protocol on Biosafety (CPB)” has been adopted by 167 parties, including 165 United Nations countries, Niue, and the European Union. In general, biosafety describes the principles, procedures, and policies to be adopted to ensure environmental and personal safety. The Protocol entered into force on 11 September 2003, with the objectives (i) to set up procedures for the safe trans-boundary movement of living modified organisms and (ii) to harmonize principles and methodology for risk assessment and establish a mechanism for information sharing through the Biosafety Clearing House (BCH). Accordingly, transgenic research on genetically modified organisms (GMOs) requires prior approval from the appropriate regulatory authorities of the country, and the following guidelines are mandatory to minimize biosafety issues.
The primary regulatory body at research institute level is the Institutional Biosafety Committee (IBSC) or its equivalent body consisting of experts from different relevant disciplines. The IBSC ensures existence of the basic biosafety equipment required as per the safety level of the experiments to be conducted. There has been increasing awareness among the researchers, producers, and users of GMOs, administrators, policy makers, environmentalists, and general


public about biosafety all over the world. Transgenic crops are not inherently toxic, nor are they inherently likely to proliferate in the environment. However, specific crops may be harmful because of the novel combinations of traits they possess. This means that the concerns associated with the use of GMOs can differ greatly depending on the particular gene–organism combination, and therefore a case-by-case approach is required for risk assessment and management. Genetic Use Restriction Technologies (“GURTs”) are an ongoing topic of discussion under the Convention on Biological Diversity. The current focus of this discussion is whether and how GURTs may impact indigenous peoples, local communities, and smallholder farmers. Despite the careful considerations that go into the release of transgenic crops, they are not accepted in several countries, and it is now mandatory to label crop produce as GMO or non-GMO for it to reach national and international markets. For example, in countries like India, GMOs (either regular or gene edited) are still not approved for commercial cultivation, except Bt cotton, and all transgenic research trials are conducted under stringent guidelines provided by the regulatory authorities of the Government of India.

1.4 Alternative Approach: MAS

Since health and environmental safety tests are mandatory before transgenic crops can be released, the whole process of commercialization is extremely laborious and expensive. On the other hand, universities and public institutions operate on low budgets, and they look for a biotechnology tool that supplements their regular conventional breeding program and increases the efficiency of breeding efforts. In this context, introgression of mapped target traits into an elite parent through marker-assisted selection is considered an affordable alternative breeding program.

1.5 Scope of Genetic Mapping and Marker-Assisted Selection

Accelerating genetic gain through selection during the breeding process is the ultimate objective of a breeder seeking long-term improvement in crop productivity. Selection normally involves assessing a breeding population for one or more traits in field or glasshouse trials (e.g., agronomic traits, disease resistance, or abiotic stress tolerance) or with chemical tests (e.g., grain quality). The goal of plant breeding is to gather more desirable combinations of genes in a novel elite line. Selection of superior plants involves visual assessment for agronomic traits or resistance to stresses in replicated field trials, as well as laboratory tests for quality or other traits. The entire process involves considerable time (5–10 years for elite lines to be identified) and expensive efforts. The size and structure of a plant population is another vital consideration for a breeding program. Usually, larger population size is required in order to identify


specific gene combinations if a large number of genes segregate in the given population. Therefore, a typical breeding program grows hundreds or even thousands of populations and many thousands or millions of individual plants. A breeder’s task would thus be easier if there were a tool that reduced the complexity of selection. Marker-assisted selection (MAS) uses landmarks or selection indices based on DNA (or other macromolecules such as proteins or RNA; explained in detail in Chap. 4) for selecting superior traits and thus favors rapid identification of elite breeding materials. MAS therefore complements traditional selective breeding methods for rapid genetic gains. This book is designed to elaborate the application of MAS in crop plants. The large number of quantitative trait loci (QTL) mapping studies using DNA markers in diverse crop species has provided an abundance of DNA marker–trait associations that lead to efficient MAS. The detection of QTLs controlling traits is possible through genetic linkage analysis, which is based on the principle of genetic recombination during meiosis (explained in detail in Chap. 5). The main considerations for the use of markers in MAS instead of other selection indices are reliability, an affordable technical procedure for the marker assay, and cost. Besides identifying QTLs, markers have also been employed in other aspects of breeding such as cultivar identity, assessment of genetic diversity and parent selection, and confirmation of hybrids (explained in detail in Chap. 4). Traditionally, these tasks have been done by visual selection and by analyzing data on morphological characteristics. Further, DNA markers have been used to define heterotic groups that can be used to successfully exploit heterosis (hybrid vigor) for hybrid crop production, especially in maize and sorghum.
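The remark above that larger populations are needed when many genes segregate can be made concrete with a simple Mendelian calculation. The sketch below is an illustration, not a procedure from the book: it assumes an F2 population, independently assorting (unlinked) loci, and a target plant that is homozygous for the favorable allele at every locus, so the target genotype has frequency (1/4)^k for k loci. The function names are hypothetical.

```python
import math


def target_genotype_frequency(n_loci: int) -> float:
    """Expected F2 frequency of plants homozygous for the favorable allele
    at every one of n_loci unlinked loci (1/4 per locus)."""
    if n_loci < 1:
        raise ValueError("n_loci must be at least 1")
    return 0.25 ** n_loci


def f2_plants_needed(n_loci: int, confidence: float = 0.95) -> int:
    """Smallest F2 population giving at least `confidence` chance of
    containing one or more plants with the target genotype."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    q = target_genotype_frequency(n_loci)
    # log1p(-q) computes log(1 - q) accurately even when q is tiny
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-q))


if __name__ == "__main__":
    for k in (1, 2, 3, 5, 10):
        print(f"{k:>2} loci -> at least {f2_plants_needed(k):,} F2 plants")
```

Under these assumptions the required population grows about fourfold with every additional locus, and ten loci already call for more than three million F2 plants, which is consistent with the scale of the breeding programs described above.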
Markers have also been used to identify shifts in allele frequencies within the genome and to monitor specific alleles or haplotypes, which can be used to design appropriate breeding strategies for genetic improvement of the given trait. Eventually, marker data on genomic regions under selection can be used to improve breeding lines with specific allele combinations using MAS schemes such as marker-assisted backcrossing or early generation selection. An outstanding example of successful MAS is the development of an improved version of the pearl millet hybrid HHB 67 that possesses downy mildew resistance. This book highlights the step-by-step procedure from parental selection to marker-assisted selection and provides examples in which MAS has successfully complemented breeding programs. This book also outlines reasons for the limited impact of MAS on plant breeding so far and suggests strategies that help to harvest the potential of MAS. Further, it should be noted that for complex traits with low heritability that are strongly affected by the environment, MAS faces the same problems that conventional breeders do and, therefore, has had limited impact. For such cases, this book provides the careful procedure to be followed to make MAS a successful program. Although QTL mapping has many potential deliverables, it is considered to be a basic research process, and results are typically


published in scientific journals. However, for plant breeding, the final “product” is a new variety. This book also highlights the importance of MAS for future breeding programs and why MAS is inevitable, especially for orphan crops and for nations that have limited resources. Besides, this book proposes troubleshooting measures for the challenges frequently faced by scientific workers involved in MAS.

1.6 Need for This Book: What Can Be Expected?

Poor understanding of the genetics and molecular biology of marker classes, the involvement of statistical procedures in mapping and analysis, difficulty in discriminating spurious from genuine associations between markers and target traits, and a personal preference for fancy biotechnological tools leave young researchers confused about genetic mapping and marker-assisted selection in crops. To attract young minds, and also to update experienced personnel, in this fascinating area of crop breeding, marker-assisted selection (popularly called MAS), which is going to rule future crop breeding programs, this book is written with the following pedagogy:

1. Explaining every detail of the techniques using a step-by-step procedure that covers types, scoring, analysis, and interpretation.
2. Simple line drawings to understand and reproduce the concepts.
3. Important information on history, online resources, novel concepts, etc., in boxes.
4. Highlighting, defining, and explaining subject-specific terminologies.
5. Hyperlinking the concepts across the chapters.
6. Critical thinking questions and additional reading resources.
7. Troubleshooting measures and effective alternatives, if the resources are limited.
8. Special emphasis on reviewing MAS in selected major agricultural and horticultural crops as case studies.

I sincerely believe that treating every chapter along the above lines will give end users fundamental knowledge and confidence and equip them to breed plants efficiently for a better future by taking informed decisions.

Critical Thinking Questions

1. Why should plant materials brought in from other countries not be introduced directly to the places that need them?
2. Selection of appropriate selection indices is the key to a successful plant breeding program. Justify.


3. Why is an informed choice of DNA marker class important in MAS?
4. Why does a transgenic crop gradually lose its resistance to a pest over a period of time?
5. Can mutation breeding be used to precisely modify a phenotype? Why?

Bibliography

Literature Cited

Allard (ed) (1960) The principles of plant breeding. Wiley, New York
Bruening G, Lyons JM (2000) The case of the FLAVR SAVR tomato. Calif Agric 54(4):6–7
Conner AJ, Glare TR, Nap JP (2003) The release of genetically modified crops into the environment. Part II. Overview of ecological risk assessment. Plant J 33(1):19–46
Fraley RT et al (1983) Expression of bacterial genes in plant cells. Proc Natl Acad Sci U S A 80(15):4803–4807
ISAAA (2018) Global status of commercialized biotech/GM crops: 2017. ISAAA brief no. 54. ISAAA, Ithaca
James C (1996) Global review of the field testing and commercialization of transgenic plants: 1986 to 1995. ISAAA, Ithaca
Krishna H, Alizadeh M, Singh D, Singh U, Chauhan N, Eftekhari M, Sadh RK (2016) Somaclonal variations and their applications in horticultural crops improvement. 3 Biotech 6(1):54
Poehlman JM (1987) Breeding field crops. Springer, Dordrecht. https://doi.org/10.1007/978-94-015-7271-2
Vaeck M et al (1987) Transgenic plants protected from insect attack. Nature 328(6125):33–37

Additional Readings

Kammar V, Nitin KS (2019) Molecular marker-assisted selection of plant genes for insect resistance. In: Experimental techniques in host-plant resistance. Springer, Singapore, pp 267–273
Kumar A, Mallick S (2019) Marker-Assisted Selection (MAS) in Rice Biotechnology Research in India. Perspect Glob Dev Technol 18(3):286–307
Gupta PK, Kulwal PL, Jaiswal V (2019) Association mapping in plants in the post-GWAS genomics era. Adv Genet 104:75–154

2 Germplasm Characterization: Utilizing the Underexploited Resources

Contents
2.1 Types of Plant Germplasm: Natural Versus Man-Made
2.1.1 Conservation of Naturally Prevailing Plant Germplasm
2.1.2 Conservation of Man-Made Plant Germplasm
2.2 Germplasm Characterization: Phenotyping for Morphological and Agronomic Characters
2.2.1 Conventional Methods of Phenotyping: Biotic and Abiotic Stress Resistance, Yield, and Quality Traits
2.2.2 Recent Developments in Phenomics and Way Forward
2.2.3 Case Study in Rice Germplasm Characterization for Drought Resistance: Formation of the Fundamental Requirements
2.2.4 Traits Useful for Germplasm Characterization in Rice
2.3 Allele Mining
2.3.1 Allele Mining: Basic Considerations
2.3.2 Insertional Mutagenesis
2.3.3 Genome Editing Tools and Induced Variations
2.3.4 TILLING, EcoTILLING, and Self-EcoTILLING
2.3.5 Mutant-Assisted Gene Identification and Characterization (MAGIC)
2.3.6 Allele Mining: Challenges and Troubleshooting Perspectives
2.4 Genetic Diversity and Clustering
2.4.1 Software for Genetic Diversity Analysis
2.4.2 Principle Behind the Genetic Diversity Analysis
2.4.3 Principle of Measuring Goodness-of-Fit of a Classification
2.5 Issues in Genetic Diversity Analysis Using Molecular Markers
2.5.1 Co-dominant Markers and Similarity Measures
2.5.2 Dominant Markers and Similarity Measures
2.6 Diversity and Phylogenetic Tree: Importance in Mapping Population Development
2.7 DNA Barcoding and Its Utilization in Germplasm Exploitation
2.8 Parental Selection
Critical Thinking Questions
Bibliography

© Springer Nature Singapore Pte Ltd. 2020 N. M. Boopathi, Genetic Mapping and Marker Assisted Selection, https://doi.org/10.1007/978-981-15-2949-8_2


In a given geographical region, farmers tend to cultivate only a small set of crop varieties over long periods of time. Modern plant breeding programs have also created a severe genetic bottleneck. As a result, reduction in genetic diversity is widespread among crop plants, and it is considered detrimental to future farming. This is because continuous use of the same cultivars usually leads to at least (i) widespread occurrence of pests and diseases and (ii) loss of landraces and wild species of the given crop plants (otherwise referred to as genetic erosion). It is also regularly reported that less than 5% of the biodiversity known to exist in the world is utilized in agriculture, especially in the case of self-pollinated crops.

Owing to ever-increasing population growth and the continuous shrinking of farmland, farmers are forced to cultivate crop plants across a wide range of latitudes and longitudes. Moreover, existing germplasm may not have the potential to meet the food and fodder requirements of a population that is estimated to reach 9 billion by 2050. This calls for crop plants that can tolerate variations in light, temperature, water, and nutrients, as well as the new pests and diseases that challenge crop production in these environments. Conventional breeding approaches have contributed considerably to the genetic improvement of crops. However, only a few genetically improved lines are available to meet such challenges. The main limitation that prevents further progress through conventional breeding methods is the lack of adequate genetic, biochemical, and molecular knowledge on the expression of traits that are beneficial to humans. Most of the agronomically and economically important traits are quantitative in nature and have complex inheritance.

Thanks to developments in nucleic acid characterization and manipulation, it is now possible to genetically analyze such quantitative traits using quantitative trait loci (QTL) mapping and marker-assisted selection (MAS). Advances in molecular marker technologies have opened the door to new techniques for the construction and screening of breeding populations, which ultimately increase the efficiency of selection and accelerate the rate of genetic gain. A marker can either be located within the gene of interest or be linked to a gene determining a trait of interest. Thus, MAS can be defined as selection for a trait based on the genotype at associated markers rather than on the phenotype of the trait. This book is designed to describe the basics of genetic and QTL mapping using molecular markers and the practice of MAS in crop plants.

The first vital step in MAS is the characterization of germplasm. Traditional collections, exotic accessions, and the wild species of crop plants, which are maintained in germplasm banks, possess excellent tolerance to the biotic and abiotic stresses that are prevalent in existing and new crop production environments. Such germplasm collections provide potential resources for future crop improvement programs designed to cope with many biotic and abiotic stresses. Hence, it is important to characterize and understand the genetic variation that exists in germplasm for its effective utilization in crop breeding programs using MAS. Characterization of germplasm facilitates the identification and selection of beneficial genes or alleles in related wild species and landraces via MAS. It involves screening each entry for morphological and agronomic characters using a standard descriptor list. As many characteristics as possible should be recorded using coded qualitative scores. Further, gathering passport data (such as country, site, and location of collection) permits the selection of germplasm on a geographical basis. In addition, a range of molecular markers (e.g., isozymes, RAPD, AFLP, and microsatellites) are also used for the classification of germplasm, and these data are useful for more detailed genetic diversity analysis.

Thus, screening thousands of accessions for pest and disease resistance and for tolerance to different biotic and abiotic stresses, together with systematic studies of the wild species and molecular studies of genetic diversity, provides data on species taxonomy and genetic relationships. Based on this information, a core set of germplasm entries can be selected. Knowledge of the genetic diversity and relationships among the elite breeding materials constituting the germplasm can have a significant impact on the selection of parents in a crop improvement program. Selection of parents is also imperative for success in QTL mapping (see below).

Owing to recent advances in quantitative genetics and genomic technologies, and to the realization that transgenic technology may not boost plant yields as substantially or as quickly as expected, the focus is now on understanding and using natural variation for crop improvement. Therefore, understanding the range of crop plants that hold a treasure of agronomically and nutritionally important trait variation is a prerequisite for proceeding further. This section provides details on different types of crop germplasm and their utility in capturing such variation using QTL mapping.
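As an illustration of how marker data can feed into the classification of germplasm, the following minimal sketch computes a pairwise similarity between accessions from presence/absence band scores of a dominant marker system. The accession names, the band scores, and the choice of the Jaccard coefficient are assumptions made for this example only; they are not data or a method prescribed by the text.

```python
from itertools import combinations

# Hypothetical band scores (1 = band present, 0 = band absent) for four
# accessions at eight dominant marker loci. Illustrative values only.
marker_scores = {
    "Acc1": [1, 0, 1, 1, 0, 1, 0, 1],
    "Acc2": [1, 0, 1, 0, 0, 1, 0, 1],
    "Acc3": [0, 1, 0, 1, 1, 0, 1, 0],
    "Acc4": [0, 1, 0, 1, 1, 0, 0, 0],
}


def jaccard_similarity(a, b):
    """Bands shared by both accessions divided by bands present in either."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return shared / either if either else 0.0


def pairwise_similarity(scores):
    """Return {(accession_1, accession_2): similarity} for every pair."""
    return {
        (p, q): jaccard_similarity(scores[p], scores[q])
        for p, q in combinations(sorted(scores), 2)
    }


similarity = pairwise_similarity(marker_scores)
for pair, value in sorted(similarity.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 3))
```

Pairs with a high coefficient share most of their bands and would be grouped together, whereas pairs with a low coefficient are candidates for representing distinct parts of the collection.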

2.1

Types of Plant Germplasm: Natural Versus Man-Made

Plant germplasm refers to living plant tissue from which new plants can be developed. It can be a seed or another part of the plant, such as a leaf, a piece of stem, pollen, or even just a few cells that can be developed into a whole plant. Germplasm contains the stored information for a species' genetic makeup, a valuable natural resource of plant diversity that can be retrieved and used at a later date. Germplasm is maintained in different formats, usually by following rigorous protocols, since plant germplasm is the reservoir for future breeding programs. The aim of germplasm collection is to obtain living physical units with reproductive ability, represented by samples that contain the genetic composition of the population of a given species of interest. Every plant germplasm collection is normally defined by three key elements: (a) area of coverage (where to collect?); (b) target species, its relatives, and the populations of interest (what to collect?); and (c) sampling strategy (which and how many units to collect?). It is suggested that collecting 50–100 different plants per population from as many areas as possible would be appropriate, though the difficulty of defining how many and which populations to include in the sampling remains unresolved. The distribution of diversity among landraces is not random; it is structured along multiple axes, such as geographic, genetic, and cultural ones, and therefore calls for systematic sampling.


A core collection is a small sample of the original collection that captures the spectrum of its genetic variability. The design of a core collection aims to ensure the retention of the genes or gene combinations that are present in an ex situ collection. Defining which populations to conserve and which to characterize is crucial when working with limited financial and human resources. Clearly, establishing the smallest number of samples that best represents the diversity of landraces conserved in situ/on farm is a key component in the design of more efficient strategies for the collection and subsequent conservation of ex situ collections. Both naturally existing and artificially created plant species are maintained using different germplasm conservation strategies, and they are outlined below.
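The idea of a core collection, a small subset that retains as much of the variability as possible, can be sketched with a simple greedy rule. The example below is only one possible heuristic (a "maximin" selection that repeatedly adds the accession farthest from those already chosen); the accession labels, the distance values, and the function names are hypothetical, and any pairwise genetic distance could be supplied in their place.

```python
# Hypothetical pairwise genetic distances among five accessions.
pair_distances = {
    ("A", "B"): 0.10, ("A", "C"): 0.80, ("A", "D"): 0.70, ("A", "E"): 0.50,
    ("B", "C"): 0.75, ("B", "D"): 0.72, ("B", "E"): 0.45,
    ("C", "D"): 0.15, ("C", "E"): 0.60,
    ("D", "E"): 0.55,
}


def to_table(pairs):
    """Expand {(a, b): d} into a symmetric nested lookup table[a][b] = d."""
    table = {}
    for (a, b), d in pairs.items():
        table.setdefault(a, {})[b] = d
        table.setdefault(b, {})[a] = d
    return table


def select_core(distances, core_size):
    """Greedy maximin choice of `core_size` accessions (core_size >= 2)."""
    if core_size < 2:
        raise ValueError("core_size must be at least 2")
    names = sorted(distances)
    # Seed the core with the two most distant accessions.
    first, second = max(
        ((a, b) for i, a in enumerate(names) for b in names[i + 1:]),
        key=lambda pair: distances[pair[0]][pair[1]],
    )
    core = [first, second]
    while len(core) < min(core_size, len(names)):
        remaining = [n for n in names if n not in core]
        # Add the accession whose nearest core member is farthest away.
        best = max(remaining, key=lambda n: min(distances[n][c] for c in core))
        core.append(best)
    return core


distances = to_table(pair_distances)
core = select_core(distances, 3)
print(core)
```

With these illustrative distances, the near-duplicates (B is close to A, and D is close to C) are left out of the core, which is the behavior one would want when resources allow only a few accessions to be conserved or characterized.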

2.1.1 Conservation of Naturally Prevailing Plant Germplasm

It is now widely recognized that it is imperative not only to manage and conserve plant germplasm but also to restore degraded plant ecosystems. Conservation of naturally existing plant germplasm is carried out in two ways: in situ conservation and ex situ conservation.

2.1.1.1 In Situ Conservation

In situ conservation generally focuses on the conservation of genetic resources in natural populations of plants (e.g., forest genetic resources in natural populations of tree species). It is the process of protecting an endangered plant species in its natural environment, either by protecting the habitat itself or by protecting the species from predators. This is done by declaring the specific area a protected area. It is also applied to the conservation of agricultural biodiversity in agroforestry by farmers, especially those using unconventional farming practices. Several types of natural habitats fall under these conservation strategies, including national parks, wildlife sanctuaries, and biosphere reserves. A national park is a well-marked and well-circumscribed area that is strictly reserved for the betterment of wildlife; human activities such as cultivation (including forestry) and grazing are not permitted, and no private ownership rights are allowed. In national parks, the emphasis is only on the preservation of a single plant species. A biosphere reserve is a special category of protected area in which the human population also forms a part of the system, and it has three parts: core, buffer, and transition zones. Biosphere reserves are sometimes regarded as living laboratories for testing and demonstrating the integrated management of land, water, and biodiversity. Major advantages of in situ conservation include:

(a) The targeted flora and fauna live in natural habitats without human interference.
(b) The life cycles of the organisms and their evolution progress in a natural way.
(c) In situ conservation provides the required green cover and its associated benefits to our environment.
(d) It is less expensive and easy to manage.
(e) The interests of the indigenous people are also protected.

2.1.1.2 Ex Situ Conservation

Ex situ conservation is the preservation of plant diversity outside its natural habitats. It involves the conservation of genetic resources, including wild and cultivated species, by employing a diverse body of techniques and facilities. Such strategies include the establishment of botanical gardens and of conservation banks for genes, DNA, pollen, seeds, seedlings, and tissue culture products. Cold storage is the most commonly used strategy: seeds are kept under controlled temperature and humidity for long periods, and this is the easiest way to store germplasm that remains viable for extended durations. Cryopreservation, which is considered an efficient technology for the preservation of living material, is also used for such storage. This type of conservation is done at very low temperature (−196 °C) in liquid nitrogen. The metabolic activities of the organisms are suspended at low temperature and can later be revived by slowly exposing them to room temperature. The products of plant tissue culture are also cryopreserved. For example, long-term culture of excised roots and shoots of horticultural crops, especially floral crops, has been shown to be successful for storage of at least 20 years. Meristem culture is very popular in plant propagation as it is a virus- and disease-free method of multiplication. Studies have shown that meristem tips can be stored at ultra-low temperature.

In the case of animals, a unique strategy called long-term captive breeding is followed. It is the process of breeding animals outside their natural environment under restricted conditions in farms, zoos, or other closed facilities. Humans control the choice of the individual animals that form the captive breeding population and of the mating partners within that population. Thus, this method generally involves the capture, maintenance, and long-term captive breeding of individuals of endangered species that have lost their habitat permanently or whose habitat presents highly unfavorable conditions. The foremost applications of ex situ conservation strategies are:

(a) It is useful for species with declining populations.
(b) Endangered animals on the verge of extinction are successfully bred.
(c) Threatened species are bred in captivity and then released into their natural habitats.
(d) Ex situ centers offer the possibility of observing wild animals, which is otherwise not possible.
(e) It is extremely useful for conducting research and scientific work on different species.

The basic differences between in situ and ex situ conservation strategies are outlined in Table 2.1.


Table 2.1 Major differences between in situ and ex situ conservation

S. no. | In situ conservation | Ex situ conservation
1 | It is conservation of endangered species in their natural habitats | It is conservation of endangered species outside their natural habitats
2 | The endangered species are protected from predators | The endangered species are protected from all adverse factors
3 | The depleting resources are augmented | They are kept under human supervision and provided all the essentials
4 | The population recovers in natural environment | Offspring produced in captive breeding are released in natural habitat for acclimatization

2.1.2 Conservation of Man-Made Plant Germplasm

Besides natural plant germplasm, human activities in the genetic improvement of crop cultivars also give rise to novel plant accessions, and these deserve conservation as they possess new traits or phenotypes. Efficient strategies, specific to the crop and to the method by which the germplasm was developed, have been established to conserve such germplasm, and they are explained below.

2.1.2.1 Crop-Specific Germplasm Repository

Over the last six or seven decades, increases in crop yields have been attributable to genetic improvements, which have also led to varieties with better nutritive value and greater pest, disease, and stress resistance. The genes necessary for this crop improvement are contained in a broad array of plant materials, which includes pre-breeding materials, cultivars, and improved landraces. For example, rice has been cultivated in Asia since ancient times, and for generations farmers have maintained thousands of different varieties and landraces. These landraces, together with the 22 pantropical wild species of Oryza, are the genetic base for the rice breeding that sustains productivity. Besides the landrace varieties and wild species already mentioned, the genetic resources of rice also encompass natural hybrids, commercial and obsolete varieties, breeding lines, and a range of different genetic stocks.

Most countries in Asia, especially China, India, Thailand, and Japan, maintain large collections of rice germplasm. In Africa, there are significant collections in Nigeria and Madagascar, while in Latin America, the largest collections are in Brazil, Peru, Cuba, and Ecuador. All these collections conserve landrace varieties as well as breeding materials. Four centers of the Consultative Group on International Agricultural Research (CGIAR), viz., the International Rice Research Institute (IRRI) in the Philippines, the West Africa Rice Development Association (WARDA) in Côte d’Ivoire, the International Institute for Tropical Agriculture (IITA) in Nigeria (on behalf of WARDA), and the International Center for Tropical Agriculture (CIAT) in Colombia, also maintain rice collections. IRRI holds the largest collection; it is also the most genetically diverse and complete rice collection in the world. Although the WARDA/IITA and CIAT collections do have some specific regional representation, they duplicate the germplasm conserved at IRRI; in addition, specific breeding materials are developed at these centers, and these are also conserved.

Similarly, crop-specific germplasm is also maintained for other crops such as maize (CIMMYT Maize Germplasm Bank, https://maize.org/4600-2/), wheat (https://www.k-state.edu/wgrc/), sorghum (https://www.icrisat.org/tag/germplasm/), tomato (Tomato Genetics Resource Center, https://tgrc.ucdavis.edu/), vegetable and fruit crops (https://iihr.res.in/division-plant-genetic-resources), etc.

2.1.2.2 Mutagenized Population

Mutagenesis is the process that involves sudden heritable change(s) in the genetic information of an organism, induced by chemical, physical, or biological agents. Mutation breeding employs three types of mutagenesis, collectively termed induced mutagenesis, in which mutations occur as a result of:

1. Irradiation (gamma rays, X-rays, ion beam, etc.)
2. Treatment with chemical mutagens and site-directed mutagenesis, which is the process of creating a mutation at a defined site in a DNA molecule
3. Insertion mutagenesis, which is due to DNA insertions, either through genetic transformation and insertion of T-DNA or activation of transposable elements or through genome editing approaches

The plant progenies that arise from mutagenesis are referred to as a mutagenized population. Such a population encompasses several genetic variations, and their corresponding traits, when compared with the wild type that was originally used to generate it. Maintenance of such mutagenized populations has great prospects in plant breeding, since the success stories of crop varieties derived from mutation breeding around the world highlight its potential as a flexible and practicable approach. Achievements in crop improvement through mutation breeding have resulted in two major outcomes: improved varieties that are directly used for commercial cultivation and new genetic stocks with improved characters or with better combining ability of traits. These traits include increased yield, enhanced nutritional quality, resistance to pests and diseases, early maturity, drought and salt tolerance, etc. Although the development of new cultivars has been the primary objective of mutation breeding, the genetic stocks developed can have numerous applications in plant breeding, such as serving as a donor parent in conventional breeding programs or as a parent in hybrid breeding programs.

Apart from these, mutation research also has a very different objective, i.e., the mapping of genes. Identifying a gene by knocking down its phenotypic expression through induced mutagenesis is a major component of research in molecular genetics and genomics today. Therefore, maintaining such large sets of mutagenized populations in crops has several implications for modern plant breeding.


For example, the FAO/IAEA Mutant Variety Database or MVD (https://mvd.iaea.org/) collects information on plant mutant varieties (cultivars) released officially or commercially worldwide. The information provided includes the mutagen and dose used, the characters improved, and agronomic data where available. The purpose of the database is to demonstrate the significance of mutation breeding as an efficient tool for preserving and enhancing global food security, to serve as a platform for breeders to showcase their varieties to a global audience, and to stimulate germplasm transfer for cultivation, breeding, or genomics studies. Besides, individual laboratories around the world also maintain databases on their mutagenized populations. Examples of such resources include the GABI-Kat population of T-DNA mutagenized Arabidopsis thaliana lines with sequence-characterized insertion sites and TOMATOMA: A Novel Tomato Mutant Database Distributing Micro-Tom Mutant Collections (http://tomatoma.nbrp.jp/index.jsp).

2.1.2.3 Global Germplasm Resources

The Germplasm Collecting Missions Database of the European Cooperative Program for Plant Genetic Resources (http://www.ecpgr.cgiar.org/resources/germplasm-databases/list-of-germplasm-databases/) lists the following:

1. International multi-crop germplasm databases (Crop Wild Relatives Global Portal, SINGER, PGR Forum, GENESYS, Mansfeld’s World Database of Agriculture and Horticultural Crops, WIEWS, EU plant variety database)
2. National multi-crop germplasm databases (Australian Plant Genetic Resource Information Service (AusPGRIS); Austria, National Inventory of Austria; Bulgaria, National Seed Genebank; Czech Republic, Information System on Plant Genetic Resources (EVIGEZ); France, BRG-collections de ressources génétiques végétales; Germany, BIG-Flora, Zentralstelle für Agrardokumentation und -information (ZADI); Germany, Federal Research Centre for Cultivated Plants-Julius Kühn Institute; Germany, Leibniz Institute of Plant Genetics and Crop Plant Research (IPK); Italy, CRA-Consiglio per la Ricerca e Sperimentazione in Agricoltura; Israel, The Harold and Adele Lieberman Germplasm Bank, Institute for Cereal Crops Improvement (ICCI), Tel Aviv University, The George S. Wise Faculty of Life Sciences; New Zealand, Arable crop genebank and online database, New Zealand Institute for Crop and Food Research; Russian Federation, N.I. Vavilov All-Russian Scientific Research Institute of Plant Industry (VIR); Spain, INIA-Centro de Recursos Fitogenéticos-Genebank; Sweden, Stored material at the Nordic Genebank; Switzerland, Conservation of PGRFA-Swiss National Database; the Netherlands, Centre for Genetic Resources (CGN); USA, National Plant Germplasm System)

Besides, several mutant genetic resources are also available. For example, publicly available resources for finding a variety of tomato mutants include LycoTILL for Red Setter (http://www.agrobios.it/tilling/), “Genes That Make Tomatoes” for M82 (http://zamir.sgn.cornell.edu/mutants/), and TOMATOMA for Micro-Tom (http://tomatoma.nbrp.jp/indexAction.do).


Such global resources provide information on all germplasm-collecting missions in the world and provide data on species name (as identified by the collector), the number of samples for each species, when the collecting mission took place, the country of collection, and whether the species was wild or cultivated. The database also maintains coded names of each institute that received the collected germplasm.

2.2

Germplasm Characterization: Phenotyping for Morphological and Agronomic Characters

2.2.1 Conventional Methods of Phenotyping: Biotic and Abiotic Stress Resistance, Yield, and Quality Traits

The most salient hurdle to the effective utilization of germplasm in the development of improved crop cultivars is the difficulty of accurately phenotyping the germplasm. Combining precise phenotyping of germplasm with dissection of the genetic and functional basis of yield and other agronomically and/or economically important traits under various biotic and abiotic stresses would offer unprecedented ways to characterize crop germplasm. Thus, precise phenotyping is the first key step, and its successful completion would go a long way toward better germplasm characterization. To this end, it is imperative to know the factors that affect the quality of phenotypic data and to define the nomenclature and mechanisms of crop productivity under different climatic and stress conditions. All these limiting factors should be addressed adequately for the target crop and trait. There is no general procedure that fits all crops and all target traits; it varies from crop to crop (and even within a species) and from trait to trait. As an example, a detailed phenotyping procedure for characterizing rice germplasm for one of the most important abiotic stresses, drought, is elucidated in the subsequent sections. However, many of the concepts presented here are equally useful for drought resistance screening in other crops.

2.2.2 Recent Developments in Phenomics and Way Forward

Phenotyping is necessary to improve selection efficiency and the reproducibility of results in modern plant breeding programs. Accurately characterizing hundreds of plant breeding materials at the same time is still challenging, and this phenotyping bottleneck hampers progress in improving the efficiency of breeding. Therefore, reliable, automatic, and high-throughput phenotyping technologies that help breeders define better strategies for the genetic improvement of crop plants are urgently needed.

During the past decade, plant phenomics has evolved from an emerging niche into a thriving research field. Phenomics is defined as the gathering of multidimensional phenotypic data at multiple levels, from the cell, organ, and plant levels to the population level. Zhao et al. (2019) provided a detailed account of recent developments in this area and outlined the challenges and prospects of crop phenomics, with suggestions for developing new methods of mining genes associated with important agronomic traits and for new intelligent solutions for precision breeding. It is necessary to build a multi-scale, multidimensional, and trans-regional crop phenotyping big database for the precise identification of crop phenotypes. It is also important to formulate functional and structural plant modelling systems based on phenomics and to develop bioinformatics technologies that integrate genomes and phenotypes (discussed in Sects. 11.21 and 11.22). In particular, developing phenotyping strategies for crop morphology, structure, and physiology from the cell to the whole plant, using high-throughput, high-resolution phenotyping platforms that may combine laser optics and serial imaging with three-dimensional image reconstruction and quantification, would have huge implications for the characterization of crop germplasm.

To this end, the International Plant Phenotyping Network (IPPN) was initiated in 2016 by integrating the world's major plant phenotyping centers. The concept of next-generation phenotyping has also put forward novel and efficient strategies to characterize crop germplasm and to link phenomics efficiently with high-resolution linkage mapping, genome-wide association studies, genomics models, etc. (Cobb et al. 2013). However, it should be noted that lowering the cost of platforms would significantly increase the scope of phenotypic research and advance the rapid expansion of phenotype–genotype analysis for complex traits. This is because most of the advanced phenomics platforms have high construction, operating, and maintenance costs, and consequently most academic and research institutes do not have access to these facilities. Around the world, only a few institutes have installed fully automated, high-throughput platforms (http://plantphenomics.hzau.edu.cn/, http://bri.caas.net.cn/, http://www.kenfeng.com/, and http://www.zealquest.com/).

In recent years, manned aircraft and unmanned aerial vehicle remote sensing platforms (UAV-RSPs) have become high-throughput tools for crop phenotyping in the field, as they meet the demands of spatial, spectral, and temporal resolution. For example, thermal sensors fitted to manned aircraft have been used to measure canopy temperature. The sensors carried by UAV-RSPs typically include digital cameras, infrared thermal imagers, light detection and ranging (LIDAR), multispectral cameras, and hyperspectral sensors. These are applied to canopy surface modelling and crop biomass estimation based on visible imaging; monitoring of crop physiological status, such as chlorophyll fluorescence and N levels, based on visible–near-infrared spectroscopy and high-resolution hyperspectral imaging; detection of plant water status based on thermal imaging; and analysis of fine-scale geometric traits of crops based on LIDAR point clouds (Zhao et al. 2019). Thus, combining ground-based and aerial platforms for phenotyping offers flexibility. For example, a tractor-based proximal crop sensing platform can be combined with a UAV-based platform and employed to characterize complex traits.

Further, combining artificial intelligence with crop phenotype databases will be more useful for characterizing large collections of germplasm accessions, and the utility of such integration will be increasingly realized when it is combined with the genomic data used for QTL analysis (explained in Chap. 7). Therefore, we urgently need to integrate multi-domain, multi-level, and multi-scale phenotypic information with the latest achievements of artificial intelligence in deep learning, data fusion, hybrid intelligence, and swarm intelligence in order to develop big data management procedures that support data integration and global sharing.
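To make the sensor-based phenotyping described in this section more concrete, the sketch below summarizes multispectral reflectance per field plot using the normalized difference vegetation index, NDVI = (NIR − Red) / (NIR + Red). The text does not name any particular index, so the use of NDVI is an assumption chosen for illustration, and the plot names and reflectance values are hypothetical.

```python
from statistics import mean


def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    return (nir - red) / total if total else 0.0


# Hypothetical per-pixel reflectance (0 to 1) extracted for two field plots.
plots = {
    "plot_01": {"nir": [0.50, 0.60, 0.55], "red": [0.10, 0.10, 0.05]},
    "plot_02": {"nir": [0.30, 0.35, 0.25], "red": [0.20, 0.15, 0.25]},
}


def plot_mean_ndvi(pixels):
    """Average the pixel-level index over all pixels of a plot."""
    return mean(ndvi(n, r) for n, r in zip(pixels["nir"], pixels["red"]))


plot_ndvi = {name: plot_mean_ndvi(pixels) for name, pixels in plots.items()}
for name, value in plot_ndvi.items():
    print(name, round(value, 3))
```

A plot-level summary of this kind is the sort of derived phenotype that could be stored alongside accession identifiers in a phenotype database; a real pipeline would additionally need radiometric calibration and plot segmentation, which are outside the scope of this sketch.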

2.2.3 C ase Study in Rice Germplasm Characterization for Drought Resistance: Formation of the Fundamental Requirements It has long been realized that release of rice cultivars with enhanced resistance to drought conditions and with high yield stability is essential to ensure food security in the twenty-first century due to frequent occurrence and rigorousness of water stress around the world. Hence, we need to genetically tailor new cultivars that can withstand drought and its other closely related environmental constraints such as high temperature, salinity, and nutrient deficiency. In the past, traditional breeding strategies have shown several promising achievements. However, the progress has shown to be slow in several occasions mainly due to lack of knowledge on drought resistance mechanisms and their appropriate screening methods and strategies; poor heritability of traits under water stress in the field; lack of comprehensive interpretation of results at molecular, biochemical, physiological, genetical, and agronomical perspectives; etc. Hence, before proceeding further, it is important to set the scene on long-term and short-term objectives. As stated earlier, first we should describe the nomenclature and mechanism of expression of target trait. In general, the term “drought” is referred to in agriculture as a condition in which the amount of water available via rainfall and/or irrigation is insufficient to meet the transpiration needs of the crop. Plants adapt different mechanisms to withstand and mitigate the negative effects of such water deficit. There are traits that (i) help plants to survive under drought stress and (ii) mitigate yield losses in crops when exposed to a water stress. Therefore, it is essential to judge the overall phenotypic value of a given germplasm accession in terms of yield under water stress in the given environment. 
In other words, the knowledge generated by any drought-related study should address its impact on yield and its component traits, either directly or indirectly. Several comprehensive reviews, dedicated volumes, and book chapters have addressed the mechanisms underlying drought resistance and the breeding strategies that can improve yield under water stress (please see further readings). Provided below is a synopsis of this knowledge and its application in characterizing rice germplasm for drought resistance in a laboratory that has minimum facilities. To begin well, the first critical step is to define the environment to which the breeding program is targeted (sometimes referred to as the target population of environments). Each crop is grown in a complex set of socio-physical and biological environments, and there is no one single environment even on the same farm. The identification and characterization of a target environment are facilitated by the use of historic weather records, past cropping patterns, etc. Simulation models can also be used to describe the target environment in terms of the frequency of occurrence of water stress and the soil moisture profile. This helps to short-list the type (e.g., early/mid/terminal water stress), severity (e.g., mild/moderate/severe), and duration (e.g., short/long duration) of water stress in the given environment. It also helps to describe other associated stresses such as high temperature, dry winds at high speed, nutrient deficiency, etc. Another key point in characterizing the germplasm within the given environment is the observation of genotype-by-environment interactions on the expression of yield traits. This observation may include additional environmental factors such as rainfall pattern; maximum and minimum temperature; relative humidity; soil physical (e.g., texture), chemical (e.g., presence of heavy metals or other toxic elements), and biological factors (e.g., beneficial and harmful microbial load); diseases (e.g., foliar diseases); pests/beneficial insects (e.g., pollinators); and parasites. Thus, it is nearly impossible to find a single environment that represents the target population of environments. An ideal strategy would be to phenotype for drought tolerance and yield stability across a broad range of sites within the given environment, with at least three replications in a Latin square design. The Latin square design effectively takes care of field heterogeneity. During the past decades, it has been repeatedly shown in several crops that multi-environment trials are instrumental in increasing yield potential under drought.
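The Latin square layout mentioned above can be sketched in a few lines of Python. This is only an illustrative sketch, not a procedure taken from this chapter: the function name and entry labels are hypothetical, and it assumes a small set of entries, because a Latin square needs as many rows and columns as there are entries. Each entry then appears exactly once in every row and every column, which is how the design absorbs field gradients in two directions.

```python
import random

def latin_square_layout(entries, seed=None):
    """Return a randomized t x t Latin square for t entries (hypothetical helper)."""
    rng = random.Random(seed)
    t = len(entries)
    # Start from a cyclic square: row i is the entry list shifted by i positions.
    square = [[entries[(i + j) % t] for j in range(t)] for i in range(t)]
    # Shuffling whole rows and whole columns keeps the Latin property intact.
    rng.shuffle(square)
    cols = list(range(t))
    rng.shuffle(cols)
    return [[row[c] for c in cols] for row in square]

# Illustrative field plan for four made-up accessions.
layout = latin_square_layout(["G1", "G2", "G3", "G4"], seed=1)
for row in layout:
    print(" ".join(row))
```

For the hundreds of accessions typical of germplasm screening, a plain Latin square is impractical, and other designs that control field variation would normally be considered instead.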
Thus, it is essential to define the set of environments, fields, and seasons in which the given germplasm entry is expected to do well before beginning the genetic mapping and MAS. Further, recent developments in DNA sequencing strategies provide a novel way of utilizing the available germplasm. For example, completion of resequencing of 3000 rice genebank accessions (http://iric.irri.org/resources/3000-genomes-project) has revealed the allelic diversity of multiple rice genomes. The sequence data provides a “digital library” of the 3000 accessions, thus presenting opportunities to identify natural genetic variations in rice. The published SNP database has enabled the community to identify rice germplasm carrying favorable alleles simply by data mining. However, the use of landraces for breeding is often hampered by linkage drag. To make use of the diversity, an efficient mating design is needed to break the undesirable linkages and to convert landraces to breeding-ready genetic resources. Multi-parent Advanced Generation Inter-Cross (MAGIC) is such a breeding method to produce highly recombined germplasm, and it is explained in Sect. 3.17.2.

2.2.4 Traits Useful for Germplasm Characterization in Rice

Considering that what farmers ultimately harvest in rice is grain, it is vital to interpret the cause and effect relationships (usually with correlation studies) between morpho-physio-agronomical traits and grain yield (or other economic traits in the case of other crops) under drought conditions. It should be noted that the sign and magnitude of this relationship are not universal and can change widely according to the frequency, timing, and intensity of water stress periods. Thus, traits with potential for characterizing rice germplasm for improved yield under water-limited conditions should be genetically (i.e., causally) correlated with yield and should preferably have higher heritability than yield (see Sect. 6.9.2 for heritability calculation). The presence of sufficient genetic variability and the lack of yield penalties under favorable conditions are considered additional features of these traits. Ideally, measurement of such trait(s) must be non-destructive (i.e., use a small number of plants or plant samples), rapid (e.g., without lengthy procedures to calibrate sensors to individual plants), accurate, and inexpensive, and it should reflect the long-term ecophysiological performance of the crop. Such traits should be cheaper and easier to measure than grain yield under stress. The reader can now appreciate the difficulty of identifying such a trait, since no single trait satisfies all of the above requirements. Very often, experiments are lost to pest or erratic weather damage before the final yield is recorded. In such conditions, these traits are useful. Based on peer-reviewed literature, careful testing under different experimental procedures, and personal experience, the following traits are listed as potential candidates for characterizing rice germplasm. As a caution, it should be noted that this list is not final and that these traits are not suitable for all water-limited environments. Readers are requested to finalize the traits based on the target environment, breeding objective, etc.
However, the concept and procedure of characterizing plant germplasm described here are the same for all plants. Sampling bias can be avoided by ensuring that random, representative plants are selected for trait measurement in each plot. It is highlighted again that the secondary traits (other than grain yield) should always be associated with yield (good statistical correlations); this is essential for drawing any final conclusion on germplasm characterization.

Early Vigor

Several physiological and biochemical studies have shown that selecting germplasm accessions that show early and vigorous establishment leaves stored soil water available for later developmental stages, when soil moisture becomes progressively exhausted and increasingly limiting for yield. On the other hand, excessively vigorous leaf development could cause early depletion of soil moisture. Thus, the optimal degree of vigor should be selected; besides genetic potential, it also depends on the characteristics of the given environment. Keeping all this in mind, each accession of the rice germplasm should be screened to count the number of days required to germinate and to develop a particular leaf area under field conditions.
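The requirement that a secondary trait be statistically associated with yield can be checked with a simple correlation. The sketch below computes a Pearson coefficient from plot means; the function name and all numbers are hypothetical, and it gives only a phenotypic correlation, which is weaker evidence than the genetic correlation called for in the text.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of trait values."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two lists of equal length with at least 3 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up accession means under water stress (illustration only).
leaf_rolling_score = [1, 3, 5, 7, 9]
grain_yield_t_ha = [3.1, 2.8, 2.0, 1.6, 0.9]
print(round(pearson_r(leaf_rolling_score, grain_yield_t_ha), 2))
```

Because the sign and size of such a correlation change with the timing and intensity of stress, the same calculation would need to be repeated for each trial environment.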


Flowering Time

Another critical factor that optimizes adaptation (and produces better yield) under low water availability is flowering time. It has been established in almost all crops that there is a positive association between yield and flowering time across different levels of water availability. Days to 50% flowering can be phenotyped quite easily and effectively under both irrigated control and water-stressed experimental conditions, and it can be used as a valuable trait in a drought tolerance breeding program. Flowering delay (= days to flowering under stress conditions minus days to flowering under irrigated control) could serve as a potential additional trait alongside days to 50% flowering.

Chlorophyll Concentration, Leaf Rolling, and Leaf Drying

The traits that have been phenotyped to indirectly estimate photosynthetic potential (a critical element that decides final yield) are chlorophyll concentration, leaf rolling, and leaf drying, all of which are interconnected. Total chlorophyll, its individual components, and the chlorophyll stability index can be measured under both normal and water-stressed conditions. Similarly, leaf rolling and leaf drying need to be scored following standard procedures, essentially around mid-day.

Grain Yield

The main objective of a drought tolerance breeding program is to develop a variety that produces higher yield than currently available varieties in the given environment under the types of drought stress that occur most frequently. Further, if water stress does not occur in some years, that variety should also produce high yields in the absence of stress. Thus, from the farmers’ viewpoint, a drought-tolerant variety is one that produces higher yield than other cultivars under drought stress and a sustainable yield under normal conditions. Hence, all the protocols and strategies that focus on breeding for drought tolerance should be designed in this light.
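The flowering delay defined under Flowering Time above is a per-accession difference between two trials. A minimal sketch, using hypothetical accession names and made-up values for days to 50% flowering, is:

```python
def flowering_delay(days_stress, days_control):
    """Days to 50% flowering under stress minus days under irrigated control."""
    return {
        accession: days_stress[accession] - days_control[accession]
        for accession in days_stress
        if accession in days_control  # skip accessions missing from the control trial
    }

# Illustrative values only, not measured data.
stress = {"Acc-01": 92, "Acc-02": 88, "Acc-03": 101}
control = {"Acc-01": 85, "Acc-02": 86, "Acc-03": 90}
print(flowering_delay(stress, control))
```

A small delay alongside a good yield under stress would be the pattern of interest; the delay alone says nothing about yield.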
To increase the efficiency of direct selection for yield, it is essential to ensure that (i) the testing environment truly represents the target environments, (ii) large numbers of germplasm entries (usually >500) are screened in order to increase the selection intensity, (iii) drought stress is managed uniformly across trials, sites, and seasons with reasonable levels of replication (it has been noticed that increasing the number of locations is more effective than increasing the number of replications within a location), and (iv) the best experimental design is used to address field variation. The traits mentioned above are very far from being exhaustive. Therefore, the use of these and other traits as selection criteria for yield should be exercised cautiously and only after defining the target environment. Irrespective of the procedures used and the experimental designs employed, each phenotyping score might have a specific story, and hence, results should be interpreted accordingly in characterizing the germplasm. The availability of a good record of meteorological parameters (rainfall, temperatures, wind, evapotranspiration, light intensity, and relative humidity) allows meaningful interpretation of the results. Collection of meaningful phenotypic data in field experiments greatly depends on the experimental design, heterogeneity of experimental conditions between and within experimental units, size of the experimental unit and number of replicates, number of sampled plants within each experimental unit, and genotype × environment × management interactions. Further variations due to phenology (duration of each developmental phase) and other environmental stresses should also be considered while evaluating the germplasm. Poor attention to these factors may lead to erroneous conclusions, particularly when interpreting cause and effect relationships between yield and drought tolerance traits.
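One way to see why adding locations tends to help more than adding replications is the entry-mean heritability expression commonly used in quantitative genetics, H = Vg / (Vg + Vge/e + Verr/(r·e)), where e is the number of environments and r the number of replications per environment. This expression is not given in this section, and the variance components below are invented purely for illustration.

```python
def entry_mean_heritability(var_g, var_ge, var_err, n_env, n_rep):
    """Broad-sense heritability on an entry-mean basis (standard textbook form)."""
    return var_g / (var_g + var_ge / n_env + var_err / (n_env * n_rep))

# Hypothetical variance components: genotype, genotype x environment, residual.
vg, vge, verr = 1.0, 2.0, 4.0

base = entry_mean_heritability(vg, vge, verr, n_env=2, n_rep=2)
more_reps = entry_mean_heritability(vg, vge, verr, n_env=2, n_rep=4)
more_sites = entry_mean_heritability(vg, vge, verr, n_env=4, n_rep=2)
print(round(base, 3), round(more_reps, 3), round(more_sites, 3))
```

Extra replications shrink only the residual term, whereas extra locations shrink both the genotype-by-environment term and the residual term, so the gain is larger whenever genotype-by-environment variance is appreciable.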

2.3 Allele Mining

2.3.1 Allele Mining: Basic Considerations

Allele mining refers to the identification of naturally occurring allelic variation at agronomically important genetic loci (i.e., genes or regulatory genomic regions). This can be performed using a variety of approaches including mutant screening, QTL and AB-QTL analysis, association mapping, genome-wide surveys for the signature of artificial selection, etc. (each method is described in detail in subsequent chapters). Though several methods have been described, the adaptive variation and valuable traits present in the germplasm have yet to be extracted and exploited efficiently. For example, several traditional and improved cultivars from drought-prone areas have some tolerance to reproductive stage drought stress, but they have rarely been used in molecular breeding programs. A more extensive survey of this germplasm may lead to the identification of new germplasm entries carrying superior alleles for agronomic and economic crop traits. Such unique alleles can be integrated into molecular crop breeding programs that aim to combat pests and diseases; to promote yield, quality, or nutritional properties; or to improve abiotic stress tolerance. Thus, a successful allele mining procedure is highly dependent on the use of diverse germplasm collections, especially those rich in wild species. This is because the majority of allelic variation at the gene(s) of interest is largely assumed to occur in the wild relatives of a crop (i.e., not in the cultivated crop varieties) due to the unavoidable loss of variation during the domestication process. Several efforts have been made to identify useful new alleles present in the wild gene pool of almost all crop plants. Despite those efforts, unfortunately, entire germplasm collections have not yet been efficiently characterized for their novel phenotypes due to several challenges, including lack of resources for evaluating huge collections.
Alternatively, core collections of germplasm have been proposed as material for allele mining. A core collection is a representative subset of the complete germplasm collection that has been optimized to contain maximal diversity in a minimal number of accessions.


Thus, while maintaining maximum allelic diversity at loci controlling traits of interest, core collections help in the integration of novel useful alleles into molecular or conventional breeding programs by reducing the number of accessions. This will lead to the development of broad and diversified elite breeding lines with superior yield and enhanced adaptation to diverse environments. The best core collections can be constituted by assembling a wide range of evidence on diversity and subsequently sampling those accessions that are representative of this diversity. One such simple generic factor is geographic origin. Conventional accessions from different parts of the world have usually had an independent history of domestication for thousands of years and are therefore likely to show differences across the genome. Construction of such a core collection can reveal at least the majority of new alleles in a relatively small number of accessions. On the other hand, one key point to remember is that even a carefully constructed core collection will not reveal the complete list of alleles in all possible combinations. Hence, it is essential to screen the whole germplasm. Once cheaper and faster technologies for allele mining are developed, this effort will no longer be a titanic task. To this end, large-scale genome sequencing projects and functional genomics efforts on several major food crops provide a directory of all the genes in the given crop with their functions. Though this information has been generated using the reference crop cultivar or accession, it can also be extended to other varieties/species for the purpose of allele mining. This is possible because of genome synteny and the conservation of gene sequences among species.
Several approaches have been designed to isolate novel alleles from related species and genera using this sequence information, which would give direct access to key alleles conferring resistance to biotic stresses, tolerance to abiotic stresses, greater nutrient use efficiency, enhanced yield, and improved quality and nutrition. One such technique, which employs a simple polymerase chain reaction (PCR) strategy to isolate useful alleles from rice germplasm, is given in Box 2.1 as an example.
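The core collection idea described in this section (maximal diversity in a minimal number of accessions) can be illustrated with a greedy selection that repeatedly adds the accession contributing the most alleles not yet captured. This is a simplified sketch under stated assumptions, not the method used by any particular genebank: the function name, the accession names, and the allele calls are all hypothetical.

```python
def greedy_core_collection(alleles_by_accession, max_size):
    """Pick accessions that each add the most not-yet-captured alleles.

    alleles_by_accession maps an accession name to a set of (locus, allele) pairs.
    """
    remaining = dict(alleles_by_accession)
    captured = set()
    core = []
    while remaining and len(core) < max_size:
        # Ties are resolved alphabetically so that the result is reproducible.
        best = max(sorted(remaining), key=lambda a: len(remaining[a] - captured))
        if not remaining[best] - captured:
            break  # no remaining accession adds a new allele
        core.append(best)
        captured |= remaining.pop(best)
    return core, captured

# Made-up genotypes at two loci for four accessions.
genotypes = {
    "A": {("L1", "a"), ("L2", "x")},
    "B": {("L1", "a"), ("L2", "y")},
    "C": {("L1", "b"), ("L2", "x")},
    "D": {("L1", "a"), ("L2", "x")},
}
core, captured = greedy_core_collection(genotypes, max_size=3)
print(core, len(captured))
```

In this toy case accession D duplicates A at both loci and is left out, which mirrors the point that redundant accessions add no information. As the text cautions, such a subset captures alleles but not every combination of alleles.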

2.3.2 Insertional Mutagenesis

Alternative methods for the identification of novel alleles are briefly outlined in this and the following sections. Insertional mutagenesis, using either transposable elements or T-DNAs from Agrobacterium tumefaciens, offers an excellent experimental strategy to investigate the genetic basis of plant growth, metabolism, and development. In particular, the linkage of an insertion element with a mutant phenotype of interest greatly facilitates the isolation of the wild-type gene. This strategy is further improved by incorporating promoter traps or enhancer traps into the insertion elements, which can act as functional tags of regulatory elements associated with genes in the host genome. This can further improve the efficiency of screening for target mutant phenotypes and may provide valuable markers for further analysis.


Box 2.1 Rapid and Inexpensive Strategy for Allele Mining in Rice

There are >124,000 germplasm accessions/entries deposited at the International Rice Genebank, Philippines. Each genotype has an estimated ~50,000 genes. Every gene has an unknown number of alleles, and each allele may change the way the rice adapts, grows, looks, or tastes. Hence, understanding the function of each allele is of utmost importance for future rice breeding. The publicly available rice genome sequence database and the physical map location of each rice gene (refer, e.g., to the International Rice Genome Sequencing Project (IRGSP) home page at http://rgp.dna.affrc.go.jp/IRGSP/download.html or Gramene at http://www.gramene.org/resources/) form the base for allele mining. The first step in allele mining is deciding which part of the genome should be explored. In other words, allele mining can be conducted on specific genes that are involved in the particular mechanism of phenotypic trait expression. Usually allelic differences (also called allelic polymorphism) result from differences in intron and exon sequences or in the regulatory regions of the given gene. For example, the genes involved in abiotic stress tolerance (such as genes coding for heat shock proteins, transcription factors, late embryogenesis abundant proteins, etc.) can be fished out from the genome sequence, and primers that specifically flank the conserved genic regions can be designed. Primer3 is the most frequently used freely available online software (http://frodo.wi.mit.edu/) for primer design. We need to paste the target sequence in FASTA format in the box provided, and by clicking the “PICK PRIMER” button, we can obtain appropriate primers that flank the target sequence. Since the selected genes are members of a multi-gene family, the members may share conserved genic sequences. In general, members of a multi-gene family are dispersed around the genome or may remain as tandem repeats within a single genetic locus.
Thus, these primers can be used in PCR-based allele mining, which provides an opportunity to test the evolutionary range across cultivated rice and its relatives. To increase the efficiency of identifying polymorphic alleles, it is better to design primers in the 5′ or 3′ untranslated regions of the selected genes, since these DNA sequences have been shown to vary more among members of a multi-gene family than coding sequences do. Thus, it is important to target the conserved genic sequence while retaining the genetic variation. Once the candidate gene(s) have been selected, new alleles of the candidate gene(s) should be discovered in the germplasm collection. The search should not start with the first accession and work through the whole collection; such an effort would be inefficient because the second accession might be similar to the first at the given loci, in which case analyzing it would not yield any additional information. Instead, we need to employ a subset of highly distinctive accessions, viz., a core collection (see above).


The PCR product amplified using primers designed on the above principle represents either the entire allele or a functional component of the allele (depending on the primer design strategy employed). If it is a component of the gene, the full-length gene should be amplified with the same strategy explained above. The identified polymorphic allele needs to be sequenced, and at the end of this experiment, we can identify, isolate, and characterize novel alleles of genes that are candidates for the target trait (in this case, abiotic stress tolerance). Since we have data from field-based phenotyping of the given rice germplasm, we can group those accessions that have similar alleles and tolerance levels, and it is then simple to proceed to characterizing the key biochemical and physiological mechanisms of tolerance using functional genomics tools. The strategy that associates alleles or genomic regions with a given phenotype using linkage disequilibrium or association mapping is described separately in detail (see Chap. 7). Briefly, association mapping assumes that an allele responsible for the expression of a phenotype, along with the markers that flank the allelic locus, will be inherited as a block. Hence, the use of such flanking markers, or of the allelic sequence itself as a marker, will predict the performance of progeny that express the favorable phenotype. Thus, upon complete characterization of these alleles, a molecular backcross breeding strategy can be employed to transfer the useful allele into an elite variety. Developing new combinations of useful alleles from different genes in different accessions will lead to the breeding of novel varieties that meet farmers’ and consumers’ needs.
However, this technique has some drawbacks: (i) lack of specificity during primer annealing may lead to the amplification of non-specific PCR products, (ii) PCR will usually not be successful for distantly related genera due to poor conservation of primer sequences, and (iii) when the length of the gene sequence is beyond the limit of PCR, it is difficult to proceed to complete allelic characterization using this strategy; alternatively, PCR walking would be useful in mining such alleles.

Insertional mutagenesis is performed by using a DNA sequence (T-DNA, transposon, or retrotransposon) to mutate and tag the gene, which can then be studied by locating the mutated site with the tag as an identifier. Furthermore, a PCR-based approach, site-selected insertion (SSI), is used to detect mutations in known genes by insertional inactivation. In this technique, two primers are designed, one specific for sequences in the target gene and another for sequences in the transposon, so that amplification products are observed only for genes carrying an insertion. This technique was successfully used to detect the transpositions of Ds1 elements that occurred in the polygalacturonase (PG) and dihydroflavonol 4-reductase (DFR) genes.
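The logic of site-selected insertion (one gene-specific primer plus one transposon-specific primer, with a product only when the element sits in the gene) can be mimicked in silico. The sketch below is a toy model with invented sequences and a hypothetical helper name; it requires exact primer matches and checks only one strand orientation, which real PCR does not.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def amplicon_length(template, forward_primer, reverse_primer):
    """Length of the product if both primers anneal facing each other, else None."""
    start = template.find(forward_primer)
    if start == -1:
        return None
    # The reverse primer binds the opposite strand, so its reverse complement
    # must occur on the given strand, downstream of the forward primer.
    site = reverse_complement(reverse_primer)
    end = template.find(site, start + len(forward_primer))
    if end == -1:
        return None
    return end + len(site) - start

# Invented sequences for illustration only.
wild_type = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
element = "CAGGTTACGATCCGA"                        # stand-in for a transposon end
tagged = wild_type[:20] + element + wild_type[20:]  # insertion allele

gene_primer = "ATGGCCATTGTA"                          # gene-specific primer
element_primer = reverse_complement(element[-12:])    # transposon-specific primer

print(amplicon_length(wild_type, gene_primer, element_primer))  # no product
print(amplicon_length(tagged, gene_primer, element_primer))     # product present
```

Only the tagged allele gives a product, which is the readout SSI relies on when screening pooled DNA from a mutagenized population.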


Antisense technology is used to “knock down” genes in order to study their function. It is performed in three different ways: the first is by binding a single-stranded antisense (complementary) nucleic acid sequence to the target sense mRNA to block its translation, while the second is by binding catalytically active oligonucleotides such as ribozymes, which degrade specific RNA sequences. The third method is RNA interference (RNAi), in which small interfering RNA (siRNA) directs the cleavage of RNA through formation of the RNA-induced silencing complex (RISC).

2.3.3 Genome Editing Tools and Induced Variations

The limited availability of naturally occurring beneficial alleles restricts the effectiveness of conventional crop breeding, although new alleles that do not occur naturally can be generated by random mutagenesis using physical, chemical, and biological means (see above). Physical and chemical mutagenesis typically generates a large number of random mutations throughout the genome, along with rare chromosomal rearrangements, and has produced over 3000 commercial varieties of food crops; however, the initial mutagenesis must be followed by the screening of large populations to identify mutants with desirable properties. Thus, this process is time-consuming and labor-intensive, especially for polyploid crops. As an alternative to imprecise random mutagenesis, genome editing based on sequence-specific engineered endonucleases (SSNs) has recently emerged as a powerful tool to rapidly modify plant genomes in a precise and predictable way (Hua et al. 2019). A number of genome editing technologies have been developed, including mega-nucleases or homing endonucleases (HEs), zinc finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and the type II clustered regularly interspaced short palindromic repeat (CRISPR)/CRISPR-associated protein 9 (Cas9) endonuclease. These genome editing systems generate targeted DNA double-strand breaks (DSBs) in the genome, which are primarily repaired by either the non-homologous end-joining (NHEJ) pathway or the homology-directed repair (HDR) pathway. The NHEJ pathway is error-prone and typically introduces small InDels at the targeted site; it is therefore normally exploited to introduce frame-shift mutations at specific genomic loci. The HDR pathway is a template-directed repair process that can be used along with an exogenous repair template to insert a custom sequence into the genome or to replace an existing genomic sequence.
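Whether a small InDel left by NHEJ repair shifts the reading frame depends only on whether its net length is a multiple of three. The short sketch below makes that rule explicit; the function name is hypothetical, and it assumes the InDel lies entirely within a coding sequence and does not disturb a splice site.

```python
def indel_effect(net_length_change):
    """Classify an InDel inside a coding sequence by its net length change in bp.

    Positive values are insertions and negative values are deletions.
    """
    if net_length_change == 0:
        return "no length change"
    # Codons are read in triplets, so only multiples of 3 keep the frame.
    return "in-frame" if net_length_change % 3 == 0 else "frame-shift"

for change in (-1, 1, -3, 2, 6):
    print(change, indel_effect(change))
```

This is why edited lines are usually sequenced at the target site: an in-frame deletion may leave a partly functional protein, whereas a frame-shift typically disrupts it.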
Apart from the direct applications of introducing genetic mutations and performing gene replacement, genome editing technology can also be used to modulate gene expression levels and modify the epigenome. When combined with conventional breeding, genome editing technology can accelerate the introduction of desired traits and greatly reduce costs. In addition, the genetic elements required for genome editing can be removed from the genome through genetic crosses or by segregation in the progeny, which makes genome-edited plants free of exogenous DNA. Since the first reports of successful application of genome editing technology in plants, research institutions and biotechnology companies worldwide have focused on its application for crop genetic improvement. To date, genome editing has mostly been applied to improve crop yield, quality, and stress resistance, but innovative applications are continually emerging, as described by Hua et al. (2019).

2.3.4 TILLING, EcoTILLING, and Self-EcoTILLING

It is also worth mentioning here the role of EcoTILLING in allele mining. A variant of “targeting induced local lesions in genomes (TILLING),” known as EcoTILLING, was developed to identify multiple types of polymorphisms in germplasm collections or breeding materials (Comai et al. 2004). EcoTILLING allows characterization of natural alleles at a specific locus across several germplasm entries in a rapid and affordable way, and it is briefly explained here. Like TILLING, EcoTILLING relies on the enzymatic cleavage of heteroduplexed DNA (formed due to a single nucleotide mismatch in sequence between the reference and test genotypes) with a single-strand-specific nuclease (e.g., Cel-1, mung bean nuclease, S1 nuclease, etc.) under specific conditions, followed by detection on Li-Cor genotypers (Li-Cor, USA). At a point mutation, the nuclease cleaves the heteroduplex to produce two cleaved products whose sizes add up to the size of the full-length product. The presence, type, and location of the point mutation or SNP are confirmed by sequencing the amplicon from the test genotype that carries the mutation. Although TILLING and EcoTILLING were proposed as cost-effective approaches for haplotyping and SNP discovery, these techniques require considerable sophistication and involve several steps: making DNA pools of the reference and test genotypes, establishing specific conditions for efficient cleavage by the nuclease, detection in polyacrylamide gels using a Li-Cor genotyper, and confirmation through sequencing. Alternatively, a rapid and cost-effective method for detecting novel allelic variants of a known candidate gene on agarose gels, and its utility in candidate gene mapping, has been described (Raghavan et al. 2007). We can anticipate that the cost of EcoTILLING can be significantly reduced by the adoption of such innovative strategies for allele mining.
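The fragment sizes expected after nuclease cleavage of a heteroduplex follow directly from the position of the mismatch: the two products together account for the full-length amplicon. The sketch below is illustrative only; the function name is hypothetical, the numbers are made up, and it assumes a single mismatch with the cut placed immediately 3′ of the mismatched base.

```python
def cleavage_fragments(amplicon_length, mismatch_position):
    """Sizes of the two products from one cut at a mismatch (1-based position).

    Assumes a single mismatch and a cut immediately 3' of the mismatched base.
    """
    if not 1 <= mismatch_position < amplicon_length:
        raise ValueError("mismatch must lie inside the amplicon")
    left = mismatch_position
    right = amplicon_length - mismatch_position
    return left, right

left, right = cleavage_fragments(amplicon_length=1000, mismatch_position=350)
print(left, right, left + right)
```

Read in reverse, the two band sizes on the gel locate the SNP approximately, which is then confirmed by sequencing as described above.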
Recently, next-generation sequencing (sometimes coupled with degradome sequencing) has further simplified such efforts by providing the full-length sequence of the target genes (see below). As another strategy, Wang et al. (2008) reported a modified EcoTILLING method for the discovery of mutations in a multi-gene family, which they named “Self-EcoTILLING,” using the allo-tetraploid Monochoria vaginalis ALS multi-gene family as an example. The mutations could be detected by TILLING of PCR products obtained with primers specific to both Als1 and Als3, without the experimental step of mixing reference and query DNA. Either of the two co-amplified loci could serve as reference DNA for the other. Thus, they demonstrated with this example that Self-EcoTILLING is a fast, reliable, and economical technique for detecting single nucleotide mutations in polyploid plants containing a multi-gene family.

Another major approach for the identification of sequence polymorphisms for a given gene in naturally occurring populations is sequencing-based allele mining. It involves amplification of alleles in diverse genotypes through PCR followed by identification of nucleotide variation by DNA sequencing. Sequencing-based allele mining helps to analyze individuals for haplotype structure and diversity in order to inform genetic association studies in plants. Unlike EcoTILLING, sequencing-based allele mining does not require much sophisticated equipment or involve tedious steps, but it involves huge costs of sequencing. A comparison between these procedures is given in Table 2.2.

Table 2.2 Characteristics of the two major approaches used in allele mining

| Features/characteristics | EcoTILLING | Sequence-based allele mining |
| --- | --- | --- |
| Technical expertise | Requires high technical expertise, from DNA pooling to detection of cleavage of heteroduplexes | Requires less expertise, with direct sequencing of PCR products |
| Complexity | More | Less |
| Efficiency | Less efficient due to high chances of false positives, non-specific cleavage, and chances of non-detection in DNA pools | Highly efficient since it involves one-step direct identification of sequence variations |
| DNA pooling | Required to increase the efficiency | Not required |
| Utility | Proposed as effective in detection of SNPs rather than InDels | Effective in detection of any type of nucleotide polymorphism |
| Cost per data point | Comparatively high, as it needs a confirmatory sequencing step | Comparatively low |
| Time | Requires more time, especially for sample preparation | Comparatively less time is required |
| Throughput | Associated complexity reduces the throughput, and only a few samples can be processed | Throughput and sample size increase with massively parallel sequencing platforms |
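The core of sequencing-based allele mining, once amplicons from different accessions have been sequenced and aligned, is to find the variable positions and group accessions by haplotype. The sketch below assumes the sequences are already aligned to equal length; the function names, accession names, and sequences are invented. An alignment gap character, if present, is simply treated as one more state, so InDels are picked up as well.

```python
from collections import defaultdict

def polymorphic_sites(aligned):
    """1-based positions at which the aligned sequences are not all identical."""
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [i + 1 for i in range(length) if len({s[i] for s in seqs}) > 1]

def haplotype_groups(aligned):
    """Group accessions by their bases at the polymorphic sites."""
    sites = polymorphic_sites(aligned)
    groups = defaultdict(list)
    for accession, seq in aligned.items():
        haplotype = "".join(seq[i - 1] for i in sites)
        groups[haplotype].append(accession)
    return dict(groups)

# Toy alignment of one amplicon from four hypothetical accessions.
alignment = {
    "Acc-01": "ATGCCGTA",
    "Acc-02": "ATGCCGTA",
    "Acc-03": "ATGTCGTA",
    "Acc-04": "ATGTCGAA",
}
print(polymorphic_sites(alignment))
print(haplotype_groups(alignment))
```

The resulting haplotype groups are what would then be compared with the phenotypic data in an association analysis.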

2.3.5 Mutant-Assisted Gene Identification and Characterization (MAGIC)

Mutant-Assisted Gene Identification and Characterization, abbreviated as MAGIC and proposed by Johal et al. (2008), is a gene-centered approach that uses Mendelian mutants or other genetic variants in a trait of interest as reporters to identify novel genes and variants for that trait. Readers should note that this is a molecular technique for characterizing a mutant; the same abbreviation, MAGIC, is also used for Multi-parent Advanced Generation Inter-Cross, a type of mapping population that is described in detail in Sect. 3.17.2.

Johal et al. (2008) coined the name MAGIC for a broad and deliberate mutant enhancer–suppressor analysis based on natural variation. MAGIC identifies and characterizes genetic modifiers and QTLs in highly diverse germplasm, and it is proposed as a method of gene discovery that surveys useful variation for a given trait in diverse germplasm. The approach can be used with both dominant and recessive mutations and can detect all classes of modifiers/QTLs: dominant and recessive, major and minor, additive and non-additive, and epistatic and hypostatic alleles.

The protocol for MAGIC differs slightly depending on whether the initial "reporter" mutation is recessive, dominant, or partially dominant. If the mutation is recessive, the mutant is crossed with diverse germplasm lines, and the resulting F1 hybrids are self-pollinated to generate F2 populations. About 50 plants from each of these F2 populations are then phenotyped for transgressive changes in the expression of the reporter mutant. The diverse parent of any F2 population in which the phenotype of the mutant has been either enhanced or suppressed is thus identified as carrying genes/QTLs relevant to the trait of interest, with the initial mutation serving as the readout for these QTLs/modifiers. With 50 F2 progeny, enough phenotypic variation should be present to detect enhanced and/or suppressed classes of transgressive changes in the mutant phenotype. Either of two routes can be followed at this point, depending on what resources have been developed. For most systems, more progeny from any appropriate F2 population can be planted to identify at least 100 plants with the mutant phenotype. These mutants can then be phenotyped using a quantitative measure of the overall severity of the mutant phenotype and then genotyped. The resulting data from these mutant progenies can be assessed for QTLs to identify and localize relevant loci. Even though this approach requires extensive genotyping, the genotyping is done only on lines that are already shown to segregate for modifiers of interest, and it should produce data sufficient to narrow the position of the modifier to a manageable region.

Extreme F2 mutant segregants with enhanced or suppressed phenotypes are also of interest because they have pyramided most of the modifier/QTL alleles that affect the trait of interest either negatively or positively. In fact, this idea has been used to enhance the performance of two maize mutants, shrunken2 (sh2) and opaque2 (o2), for commercial production of super-sweet corn and high-quality protein maize, respectively (Prasanna et al. 2001; Tracy 1997).

A major application of MAGIC is that it provides a simple genetic strategy to quickly survey diverse and alien germplasm for useful genes or alleles in a directed fashion. In addition, the approach has tremendous potential for helping to define the genes and genetic networks underlying any trait quickly, and it underscores the value both of defining diverse inbred germplasm and of developing and characterizing RILs from that germplasm. The genetic resources to do both these things are, at least for many crop and model species, readily available. Once they are in hand, analysis becomes as straightforward as phenotyping a trait of interest and then taking advantage of a commonly available genotyping data set.
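The F2 numbers quoted above (about 50 plants for the initial screen and at least 100 mutant-phenotype plants for mapping) can be put in context with a short binomial calculation. The sketch below is only illustrative: it assumes a single recessive reporter mutation segregating 3:1 with full penetrance, so that one quarter of the F2 plants show the mutant phenotype, and the 95% confidence level is an arbitrary choice, not a value from the MAGIC protocol.

```python
from math import comb

def prob_at_least(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), via the complement of the lower tail."""
    lower_tail = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))
    return 1.0 - lower_tail

def min_f2_plants(k_mutants=100, p_mutant=0.25, confidence=0.95):
    """Smallest F2 population size that yields at least k_mutants mutant
    plants with the requested probability (assumed 3:1 segregation)."""
    n = k_mutants
    while prob_at_least(k_mutants, n, p_mutant) < confidence:
        n += 1
    return n

# Expected number of mutant-phenotype plants in the initial 50-plant screen
expected_in_screen = 50 * 0.25

n_needed = min_f2_plants()
print(f"Expected mutants among 50 F2 plants: {expected_in_screen}")
print(f"F2 plants to grow for at least 100 mutants (95% confidence): {n_needed}")
```

For a dominant reporter mutation, or where penetrance is incomplete, the value of `p_mutant` would have to be changed accordingly.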

On the other hand, the method raises a question: can a single mutation affecting a trait of interest unveil all the QTLs that might impact the same trait? It is unlikely that MAGIC will reveal entire genetic networks in one experiment; however, new variants that emerge from a MAGIC screen can be used as the starting material for additional MAGIC screens, which offers a rapid way to fill in gaps in genetic networks. The mutants needed to conduct MAGIC screens can come directly from the large mutant collections available for many crop species. Additional transposon mutagenesis and TILLING projects that are underway for many crop species will serve as an excellent source of more such mutants, especially when they are in uniform, well-characterized backgrounds. Transgenic dominant negative mutant lines (such as RNAi-mediated transgenic plants) or overexpression mutants will also be particularly valuable starting material, revealing QTLs for the trait of interest in a single generation, just as dominant genetic mutations do. MAGIC can also be used on cytoplasmic traits by identifying modifiers/QTLs of these traits that are nuclear genes.

2.3.6 Allele Mining: Challenges and Troubleshooting Perspectives

The available evidence stresses that the vast plant germplasm needs to be revisited for novel alleles, both to further enhance the genetic potential of crop cultivars for several agronomic traits and to categorize the germplasm entries for their conservation. Novel allelic diversity exists in crop species owing to mutations, especially in the coding and/or regulatory regions that govern agronomically important traits. Although the propagation of these altered/novel alleles (through either natural or artificial selection) depends on their role in the plant's adaptation, such alleles of key genes are likely to be preserved in many related wild species, landraces, and cultivated varieties of a crop species. Therefore, exploring the natural variation in important genes is indispensable in modern crop improvement programs.

The prime challenge in designing an efficient procedure for allele mining is the construction of core or mini-core germplasm collections by selection of appropriate genotypes. Handling the entire germplasm is a tedious task, be it for conventional plant breeding or for allele mining, and hence requires sampling strategies that narrow it down to a manageable size while maintaining variability. Development of core and mini-core collections out of the entire collection has been proposed to simplify the conservation of germplasm resources and to utilize the existing variation in gene banks effectively. A core collection is a subset of accessions from the entire collection that captures most of the available genetic diversity of the species. For each crop, the first step consists of gathering information on the various existing collections (the composite set) and applying a simple rationale for extracting a representative sample (the core sample). For defining core collections, several parameters pertaining to

geographical, agronomical, and botanical descriptors have been used. The size of the core sample depends heavily on the global amount of resources available in the collections and may range from several hundred accessions to a maximum of 3000 for the most important crops. Generally, 10% of the crop accessions constitute the core collection representing the variability of the entire collection. However, when the whole collection is so large that a core collection (i.e., 10% of the entire collection) becomes unmanageable, a mini-core (core of core) collection (10% of the core, or 1% of the entire collection) may be constructed. Mini-core collections are developed by using qualitative parameters to develop trait-specific subsets and by characterizing the genetic diversity of each core sample using molecular markers to reveal the structure of its diversity (refer to Sect. 2.4 for details). Molecular characterization of mini-core and trait-specific subsets was suggested to further reveal the genetic usefulness of the germplasm accessions in allele mining. The data generated from morphological and molecular characterization studies could be used to define the genetic structure of the global composite collection and, finally, to select a reference sample of approximately 300 accessions (depending on the size and genetic diversity of the core collection of that particular crop) representing the maximum diversity for the identification and isolation of allelic variants of candidate genes associated with beneficial traits. It may therefore be concluded that the mini-core collections or the reference sets (subsets of the mini-core) could constitute the starting material for dissecting the naturally occurring allelic variation at candidate genes through allele mining.

Another major challenge is that the majority of studies focus on the identification of SNPs or InDels (refer to Sect. 4.10 for details) in the coding sequences or exons of the gene. However, several studies have indicated that nucleotide changes in noncoding regions, including the promoter, 5′ UTR, introns, and 3′ UTR, also have a significant effect on transcript synthesis and accumulation, which in turn alters trait expression. Regulatory regions such as promoter elements play a key role in gene regulation, and changes in their sequences can dramatically influence gene expression, resulting in variable trait expression; mining for sequence variation in these regions is referred to as promoter mining. Such differential gene expression is usually due to sequence polymorphism in the binding sites of cis-acting/trans-acting regulatory elements present in the upstream region. In addition, in silico promoter mining should focus on identifying nucleotide variation in transcription factor binding motifs (TFBMs), the number/frequency and location of regulatory element binding sites in promoter regions, and the corresponding transcription factors that show specific expression patterns in response to constitutive, developmental, tissue-specific, hormonal, and environmental regulation. Such efforts will pave the way for informed decisions while designing a robust molecular breeding program.

A key procedure in allele mining is the identification of polymorphism by comparative analysis of sequences of various genotypes. Several software tools are available for handling the complex nucleotide data, prediction of putative functional

or structural components of complex macromolecules, prediction of transcription factor binding sites, identification of sequence polymorphisms, and prediction of the amino acid changes that are responsible for changes in encoded protein structure and/or function. Besides, consensus on the size of the region to be mined for promoter and regulatory elements, and on the characterization of regulatory elements (cis- and trans-acting factors, environmental effects on expression, etc.), should also be taken care of while performing allele mining. A brief account of the bioinformatics tools used for allele mining analysis is given in Table 2.3. It should be noted that each algorithm is developed with a different set of default parameters, and users should be aware of the parameters that best suit their objectives. Further, computational predictions depend on pre-designed algorithms that are largely developed from the pre-characterized sequences available in the databases. Hence, such predictions should always be validated through systematic wet-lab experiments.

Despite these challenges, allele mining has made significant contributions in incorporating novel alleles from germplasm resources into elite cultivars. For example, allele mining is extensively used in the genetic improvement of rice (Box 2.2). Similarly, Shi et al. (2008) successfully developed allele-specific molecular markers based on the functional nucleotide polymorphisms identified in the badh2 gene (which is responsible for rice fragrance) for use in marker-assisted backcross breeding (MABC). If the trait is controlled by a single allelic difference, as in this case, MABC can be executed. On the other hand, the majority of agronomically important traits are controlled by a suite of superior alleles, and hence they have to be dealt with using a different strategy, which is explained in detail in Chap. 9.
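The comparative step described in this section, identifying polymorphism across the sequences of several genotypes, can be illustrated with a minimal sketch. It assumes the alleles have already been aligned to equal length (for example, with one of the alignment tools listed in Table 2.3) and that gaps are written as "-". The function name and the toy sequences are hypothetical and do not come from any of the cited tools or data sets.

```python
def find_polymorphisms(aligned):
    """Scan aligned allele sequences column by column.

    aligned: dict mapping genotype name -> aligned sequence ('-' marks a gap).
    Returns a list of (position, kind, column), where position is 1-based,
    kind is 'SNP' or 'InDel', and column maps each genotype to its state.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    sites = []
    for pos in range(lengths.pop()):
        column = {name: seq[pos].upper() for name, seq in aligned.items()}
        states = set(column.values())
        if len(states) < 2:
            continue  # monomorphic column
        kind = "InDel" if "-" in states else "SNP"
        sites.append((pos + 1, kind, column))
    return sites

# Hypothetical aligned alleles of one candidate gene in four genotypes
alleles = {
    "ref":  "ATGGCTAGCTTAGC",
    "acc1": "ATGGCTAGCTTAGC",
    "acc2": "ATGACTAGCTTAGC",
    "acc3": "ATGGCTAG--TAGC",
}
for position, kind, column in find_polymorphisms(alleles):
    print(position, kind, column)
```

A real analysis would also need to merge adjacent gap columns into a single InDel event and to check whether a change falls in a coding or a regulatory region; as noted above, any such prediction still requires wet-lab validation.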

2.4 Genetic Diversity and Clustering

The study of genetic diversity in germplasm is usually a collective process by which variation among individuals or groups of individuals is investigated. Several methods and strategies are available to study germplasm in terms of genetic diversity, which is essential for revealing the genetic relationships among the germplasm entries. A precise and unbiased estimate of genetic relationship depends on the sampling strategy, the data sets used (and their pros and cons), the choice of genetic distance measure, and the clustering procedures or other multivariate methods and their influence on the estimated relationships. Thus, careful combination of these features and use of appropriate statistical programs and strategies are key to these data analyses (refer to Mohammadi and Prasanna 2003 for further details). In general, the data comprise numerical measurements and combinations of different types of variables. Further, pedigree data, passport data, morphological data, biochemical data, storage protein data, and more recently DNA-based marker data

Table 2.3 Selected list of bioinformatics tools useful for allele mining

Name of the resource | Application | URL
PowerCore | Supports development of core and mini-core collections | http://genebank.rda.go.kr/powercore/
PLACE (plant cis-acting regulatory DNA elements) | Database of transcription factor (TF) binding motifs | http://www.dna.affrc.go.jp/PLACE/index.html
PlantCARE | Another plant cis-acting regulatory element database | http://bioinformatics.psb.ugent.be/webtools/plantcare/html
TRANSFAC | Database of TF and TF binding motifs | http://www.generegulation.com/pub/programs.html
JASPAR | Transcription factor binding site database | http://jaspar.genereg.net/
W-AlignACE | Motif discovery tool | http://www1.spms.ntu.edu.sg/~chenxin/WAlignACE/
MEME (Multiple EM for Motif Elicitation) | Motif discovery tool | http://meme.nbcr.net/meme4_1/cgibin/meme.cgi
PlantProm DB | Plant promoter database | http://mendel.cs.rhul.ac.uk/mendel.php?topic=plantprom
MatInspector | Prediction of TFBS and promoter analysis | http://www.genomatix.de/products/MatInspector/
EPD | Eukaryotic promoter database | http://www.epd.isb-sib.ch/
TRRD | Regulatory region database (description of regulatory elements and TFBS) | http://wwwmgs.bionet.nsc.ru/mgs/gnw/trrd/
ooTFD | Object-oriented transcription factor database | http://www.ifti.org/ootfd/
AGRIS | TF and RE databases | http://arabidopsis.med.ohio-state.edu
Primer3 | Primer design | http://frodo.wi.mit.edu/primer3/
BioEdit | Nucleotide sequence analysis | www.mbio.ncsu.edu/BioEdit/BioEdit.html
BLASTN and ClustalW2 | Sequence alignment | www.ebi.ac.uk/Tools/clustralw/
GenBank | Gene sequences in target and other crops | http://www.ncbi.nlm.nih.gov/genbank/
CODDLE | Codons Optimized to Discover Deleterious LEsions; designed for use in TILLING analysis | http://www.proweb.org/coddle/coddle_help.html
PARSESNP | Project Aligned Related Sequences and Evaluate SNPs; can be used to display the locations of the polymorphisms in a gene/genes in a graphical format | http://www.proweb.org/parsesnp/
FastPCR | Software for PCR primer or probe design and in silico PCR, oligonucleotide assembly and analyses, and alignment and repeat searching for a single sequence or for comparisons of two sequences | http://en.bio-soft.net/pcr/FastPCR.html
DnaSP | DNA Sequence Polymorphism analysis, to identify SNPs and InDels, and phylogeny analysis based on the dissimilarity of DNA sequences | http://www.ub.edu/dnasp
MEGA | Molecular Evolutionary Genetics Analysis and sequence alignment. An integrated tool for conducting sequence alignment, inferring phylogenetic trees, estimating divergence times, mining online databases, estimating rates of molecular evolution, inferring ancestral sequences, and testing evolutionary hypotheses | http://www.megasoftware.net/
FrameWorker | A complex software tool that allows users to extract a common framework of elements from a set of DNA sequences. Motif discovery tool | http://www.genomatix.de/online_help/help_gems/FrameWorker.html
DiAlign TF: Multiple alignment plus TF sites | DiAlign TF displays transcription factor (TF) binding site matches within a multiple alignment. Database of promoter analysis | http://www.genomatix.de/online_help/help_dialign/dialign_TF.html

Box 2.2 Employing Advances in Allele Mining in Germplasm Characterization for Rice Genetic Improvement

The main aim of this box is to highlight the contribution of genome sequencing of core rice germplasm accessions (comprising 3010 accessions) to the exploration of rice diversity for crop improvement programs. The information provided here is an excerpt from Leung et al. (2015), which illustrated the extraction of disease resistance traits from the 3000 sequenced rice germplasm accessions. Through a collaboration among the Chinese Academy of Agricultural Sciences, the Beijing Genomics Institute-Shenzhen (BGI-Shenzhen), and IRRI, 3000 germplasm accessions have been sequenced at an average depth of 14X. The sequence data of the 3000 genomes were aligned with the reference genome Nipponbare. This resulted in the identification of approximately 20 million SNPs. The large amount of data is organized into a SNP-Seek

database (www.oryzasnp.org/iric-portal). The SNP-Seek database provides a user-friendly resource to explore the genetic diversity of a large collection of germplasm accessions. Through this database, one can query by SNP haplotypes, germplasm accession names, passport data, and basic agronomic data. The challenge ahead is associating phenotypes with the sequenced accessions. To facilitate global collaboration and effective use of the new genomic resources, the International Rice Informatics Consortium (IRIC; http://iric.irri.org) provided one such comprehensive database in 2013. The objectives of the IRIC are to (1) organize available genotyping, phenotyping, expression, and other available data for rice germplasm into a linked, consistent, and reliable source of information for the global research community; (2) provide user-friendly access to browse, search, and analyze the data through a single portal; and (3) support information sharing, public awareness, and capacity building. Using these resources, several novel disease resistance genes were identified from the 3000 genomes and employed in the breeding program. They are briefly described hereunder.

Rice blast caused by Magnaporthe oryzae is a perpetual problem of rice production. As of now, more than 25 blast resistance genes (Pi) have been documented using different cloning methods. These genes are distributed within 16 loci, suggesting that Pi genes from different donors tend to be located within the same genomic locus, for example, Pi2/Pi9/Piz-t, Pi5/Pii, and Pik/Pi1/Pikm/Pikh/Pik. Except for pi21, Pid2, and Pi54, most Pi genes exclusively encode proteins or their variants containing nucleotide binding site (NBS) and leucine-rich repeat (LRR) domains, suggesting that the NBS-LRR gene family constitutes the main reservoir of Pi genes in rice. Most of these NBS-LRR gene loci consist of multiple gene members, representing a complex genomic structure commonly conserved in other plant species. It is worth noting that the Pi alleles in the same locus from different donor plants are located either in the same genomic position (orthologs) or in different positions (paralogs). By comparing the differences in sequence and structure in resistant and susceptible haplotypes, blast R-gene loci can be grouped into two types. A type I locus refers to one in which high sequence similarity and conserved genomic organization are maintained between resistant and susceptible haplotypes. For this type, the differentiation between R and S alleles is primarily caused by localized mutations, including nucleotide substitutions, small insertions/deletions (InDels), and insertion of transposable elements. In contrast, a type II locus refers to one in which genomic organization or sequence similarity or both in resistant haplotypes are significantly different from that

in susceptible haplotypes. It is therefore more feasible to develop R-gene-specific markers to distinguish functional from non-functional alleles for a type II locus than for a type I locus. Analyses of cultivar-specific sequences that are not in the reference genome revealed the presence of NBS-LRR-coding sequences in some of the 3000 genomes, suggesting that type II R-gene loci are frequently distributed in rice genomes. Functional characterization of these novel type II R-gene loci could help to identify more functional R genes. Therefore, an understanding of the evolutionary differentiation of resistance genes is important for exploring additional diversity in the rice gene pool.

On the other hand, such analysis also found some rare alleles in the rice germplasm. For example, examination of the 3000 rice accessions for the SNP associated with rice yellow mottle virus (RYMV) resistance in Gigante showed that none of the 3000 O. sativa accessions carry the SNP associated with RYMV resistance, suggesting that RYMV resistance is an extremely rare trait in O. sativa. All resistance alleles of RYMV1 and RYMV2 were found in several accessions of O. glaberrima, except one resistance allele of RYMV1 from the indica rice cultivar Gigante. Similarly, among the 12 pvr2-eIF4E resistance alleles sequenced in the pepper gene pool (pvr2 codes for eukaryotic translation initiation factor 4E (eIF4E), which acts as an essential determinant in the outcome of potyvirus infection), 3 were shown to have a complementary effect with pvr6-eIF(iso)4E for resistance (Rubio et al. 2009). Two amino acid changes were exclusively shared by these three alleles and were systematically associated with a second amino acid change, suggesting that these substitutions are associated with resistance expression. The availability of new resistant allele combinations increases the possibility for the durable deployment of resistance against this pepper virus, which is prevalent in Africa.
are being used to reliably estimate genetic relationships in crop plants. The selection of data sets depends on the objective of the experiment, the level of resolution required, the availability of resources and infrastructure facilities, and operational, cost, and time constraints; each data set provides a specific type of information. With molecular data, the genetic distance (or similarity or relationship) among individuals of a given germplasm is usually calculated as a quantitative measure that differentiates two individuals at the sequence or allele-frequency level. Although a wide range of genetic distance measures is available, the choice among them is in practice largely decided by the software tool employed for the analysis. Among these measures, the modified Rogers' distance (GDMR) is the most frequently used.
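As an illustration of how such a measure is computed, the modified Rogers' distance between two entries is commonly written as the square root of the summed squared differences in allele frequencies over all alleles and loci, divided by twice the number of loci. The sketch below follows that definition; the function name, the data layout (one dictionary of allele frequencies per locus), and the example genotypes are assumptions made for this illustration and are not taken from any particular software package.

```python
from math import sqrt

def modified_rogers_distance(freqs_a, freqs_b):
    """Modified Rogers' distance between two entries.

    freqs_a, freqs_b: lists with one dict per locus, mapping allele name to
    its frequency in that entry (frequencies at a locus sum to 1).
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both entries must be scored at the same loci")
    n_loci = len(freqs_a)
    total = 0.0
    for locus_a, locus_b in zip(freqs_a, freqs_b):
        for allele in set(locus_a) | set(locus_b):
            diff = locus_a.get(allele, 0.0) - locus_b.get(allele, 0.0)
            total += diff * diff
    return sqrt(total / (2 * n_loci))

# Two hypothetical inbred lines scored at three co-dominant marker loci
line_1 = [{"a": 1.0}, {"b": 1.0}, {"a": 1.0}]
line_2 = [{"a": 1.0}, {"a": 1.0}, {"b": 1.0}]
print(round(modified_rogers_distance(line_1, line_2), 4))
```

With this scaling, the distance is 0 for two entries with identical allele frequencies and 1 for two inbred lines that share no allele at any locus.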

There are several constraints in using such data for the analysis of genetic distance. One frequently occurring problem arises with molecular marker data. When certain genotypes show no amplification for some marker alleles, it is often difficult to decide whether the lack of amplification is due to null alleles or to failure of the molecular experiment. In such cases (i.e., when the null status of a genotype at a specific marker locus is uncertain), the data point should be treated as missing during genetic distance measurement; otherwise, it will lead to erroneous inference. It should also be noted that the use of dominant versus co-dominant markers can influence genetic distance measurements owing to unknown statistical distributions. To circumvent this limitation, several alternatives, including the bootstrapping method, have been proposed in certain statistical software.

When a scientist wishes to use more than one genetic distance measure to analyze a data set, it is essential to understand the correspondence between the matrices derived from those measures. The well-known Mantel test can be used to test this correspondence reliably, and it has been widely followed in crop plants wherever different data sets were used for genetic diversity analysis. Resampling techniques such as bootstrapping and jackknifing are used predominantly in recent publications, particularly where marker data are applied to genetic diversity analysis. In particular, resampling techniques have provided useful measures for finding the smallest set of markers that can give an accurate assessment of genetic relationships among the germplasm entries. The latest versions of the statistical programs used in genetic diversity analysis have these features. Interpreting the results of resampling is also simple. For example, a simple rule of thumb is that internal tree branches with >70% bootstrap support are likely to be correct at the 95% probability level.

As the sample size of the germplasm increases, it becomes important to classify and order the genetic variability among germplasm entries using established multivariate statistical algorithms such as cluster analysis, principal component analysis, principal coordinate analysis, and multidimensional scaling. Multivariate analytical techniques simultaneously analyze multiple measurements on each individual of the germplasm and can analyze genetic diversity irrespective of the data set (i.e., morphological, biochemical, or molecular data can be used). This book focuses only on the clustering method; the salient statistical methodologies and other considerations for this method are outlined in Box 2.3.
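The Mantel test mentioned in this section compares two distance matrices computed on the same set of entries. A minimal permutation version is sketched below as an illustration only: it correlates the upper triangles of the two matrices and then repeatedly shuffles the rows and columns of one matrix together to build a null distribution. The function names, the number of permutations, and the example matrices are assumptions for this sketch and not the procedure of any specific program.

```python
import random
from math import sqrt

def upper_triangle(matrix):
    """Return the above-diagonal elements of a square matrix as a flat list."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Product moment correlation between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mantel_test(dist_a, dist_b, permutations=999, seed=0):
    """Return (observed correlation, one-sided permutation p-value)."""
    rng = random.Random(seed)
    n = len(dist_a)
    vec_a = upper_triangle(dist_a)
    observed = pearson(vec_a, upper_triangle(dist_b))
    at_least_as_large = 1  # the observed arrangement counts as one permutation
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        # permute rows and columns of dist_b with the same ordering
        permuted = [[dist_b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(vec_a, upper_triangle(permuted)) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / (permutations + 1)

# Hypothetical example: five accessions placed on a line give the marker
# distances; the "morphological" distances are simply twice as large.
positions = [0, 1, 2, 4, 7]
marker_dist = [[abs(a - b) for b in positions] for a in positions]
morph_dist = [[2.0 * d for d in row] for row in marker_dist]

observed_r, p_value = mantel_test(marker_dist, morph_dist)
print(round(observed_r, 3), p_value)
```

Because whole rows and columns are permuted together, the dependence among the distances within each matrix is preserved, which is why an ordinary significance test of the correlation is not appropriate here.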

2.4.1 Software for Genetic Diversity Analysis

Numerous software programs are available for assessing genetic diversity, such as Arlequin, DnaSP, PowerMarker, MEGA2, PAUP, TFPGA, GDA, GENEPOP, NTSYS-pc, Structure, GeneStrut, POPGENE, MacClade, PHYLIP, SITES, ClustalW, and MALIGN. Most are freely available through the Internet. Many perform similar tasks, with the main differences being in the user interface, type of data input and output, and platform. Thus, choosing which to use depends heavily on individual preferences.

Box 2.3 Cluster Analysis

Cluster analysis refers to mathematically grouping (or clustering) the individuals of the germplasm based on their similar characteristics. Thus, individuals within a cluster show high internal homogeneity, and individuals in different clusters exhibit high external heterogeneity. Broadly, there are two types of clustering strategies. Distance-based methods use a pairwise distance matrix, which leads to a graphical representation such as a tree or dendrogram. Model-based methods use parametric models, in which inferences on each cluster and on the relationships among clusters are obtained by maximum likelihood or Bayesian methods. The model-based methods have been established as innovative and useful because of the constraints of the distance-based methods with respect to multi-locus genotypic data. However, at present the distance-based methods are most frequently used, and the step-by-step procedure for clustering analysis using this method is explained hereunder. Hierarchical and non-hierarchical methods are commonly used in distance-based clustering analysis, and hierarchical clustering methods are most commonly employed in the analysis of genetic diversity in crop plants. Hierarchical methods proceed either by a series of successive mergers (the agglomerative hierarchical method) or by successive divisions of a group of individuals. In the agglomerative method, the most similar individuals are grouped first, and these initial groups are then merged according to their similarities. Among the various agglomerative hierarchical methods, the unweighted pair group method using arithmetic averages (UPGMA) is the most commonly adopted clustering algorithm, followed by Ward's minimum variance method. Non-hierarchical clustering procedures do not involve the construction of a dendrogram and can be carried out using statistical software such as SAS or SPSS.
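
The agglomerative procedure described above can be sketched in a few lines. The code below is a minimal, unoptimized illustration of UPGMA on a small distance matrix: at each step the two closest clusters are merged, and the distance from the merged cluster to every other cluster is the arithmetic mean over all member pairs. It also records the height at which each pair of entries is first joined (the cophenetic distance) and correlates these heights with the input distances. The function names and the example matrix are hypothetical, and dedicated software should be preferred for real data.

```python
from itertools import combinations
from math import sqrt

def upgma(dist):
    """Cluster a symmetric distance matrix (list of lists) by UPGMA.

    Returns (merges, coph): merges lists (members_a, members_b, height) in
    merge order; coph is the matrix of cophenetic distances.
    """
    n = len(dist)
    clusters = {i: [i] for i in range(n)}            # cluster id -> members
    d = {(i, j): dist[i][j] for i, j in combinations(range(n), 2)}
    coph = [[0.0] * n for _ in range(n)]
    merges = []
    next_id = n
    while len(clusters) > 1:
        (a, b), height = min(d.items(), key=lambda item: item[1])
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = height
        merges.append((clusters[a], clusters[b], height))
        size_a, size_b = len(clusters[a]), len(clusters[b])
        new_d = {}
        for c in clusters:
            if c in (a, b):
                continue
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            # size-weighted mean equals the average over all member pairs
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {key: value for key, value in d.items() if a not in key and b not in key}
        d.update(new_d)
        merged = clusters.pop(a) + clusters.pop(b)
        clusters[next_id] = merged
        next_id += 1
    return merges, coph

def cophenetic_correlation(dist, coph):
    """Product moment correlation between input and cophenetic distances."""
    pairs = list(combinations(range(len(dist)), 2))
    x = [dist[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    sxx = sum((p - mean_x) ** 2 for p in x)
    syy = sum((q - mean_y) ** 2 for q in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical distances among four germplasm entries
dist = [[0, 2, 6, 10],
        [2, 0, 6, 10],
        [6, 6, 0, 8],
        [10, 10, 8, 0]]
merges, coph = upgma(dist)
r = cophenetic_correlation(dist, coph)
for members_a, members_b, height in merges:
    print(members_a, members_b, round(height, 3))
print("cophenetic correlation:", round(r, 3))
```
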
Non-hierarchical clustering, however, is not usually followed in crops, primarily owing to the lack of prior information about the optimal number of clusters required for accurate assignment of individuals. Among the different types of clustering methods (such as UPGMA, UPGMC (unweighted pair group method using centroids), single linkage, complete linkage, and median), UPGMA dendrograms have been used extensively in published reports since they provide consistency in grouping germplasm entries with relationships computed from different data types. However, despite the advantages of UPGMA, a single clustering method might not be effective in uncovering genetic relationships, and it is desirable to analyze the congruence among results obtained by different clustering procedures. The efficiency of different clustering algorithms can be estimated by calculating the cophenetic correlation coefficient. It is a product moment correlation

coefficient measuring agreement between the dissimilarity–similarity indicated by a phenogram–dendrogram as the output of the analysis and the distance–similarity matrix as the input of cluster analysis. Using this coefficient value, the degree of fit of the dendrogram can be subjectively fixed as 0.9 ≤ r, very good fit; 0.8 ≤ r

800 expressed sequence tag (EST)-containing BAC clones were sequenced to provide seed points from which to continue the whole-genome sequencing effort. Sites of potential sequence polymorphism within the initial BAC sequence data were used to facilitate merger of the genetic and physical maps, while the resulting chromosome assignments are being used to guide the distribution of BACs to sequencing centers.

A major focus of the genetic mapping effort is short tandem repeats, also known as simple sequence repeats (SSRs) or microsatellites (see Chap. 4). These repetitive sequences consist of direct tandem repeats of short (1–10 bp) nucleotide motifs. Unequal recombination between SSRs and slip-mispairing during DNA replication result in polymorphism rates that tend to be much greater than those observed for non-repetitive DNA sequences. The high rate of mutation combined with low selection coefficients on variant alleles results in extreme allelic diversity at microsatellite loci. Identification of SSRs in DNA sequence databases can be automated by use of public software programs, such as SSRIT (https://archive.gramene.org/db/markers/ssrtool). Moreover, because SSR alleles are typically co-dominant and their polymorphisms can be scored either in a simple agarose gel format or in high-throughput capillary arrays, they are frequently the molecular marker of choice for construction of genetic maps. Estimates suggest that 1–5% of plant ESTs contain SSRs longer than 18 nucleotides. Thus, development of EST–SSR markers has become commonplace in a wide variety of plant species.
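
To make the idea of automated SSR identification concrete, the sketch below scans a sequence for perfect tandem repeats of short motifs with a regular expression. It is not the SSRIT implementation; the function name, the motif length limit (taken from the 1–10 bp range given above), the minimum total length of 18 nucleotides, and the example sequence are assumptions for illustration.

```python
import re

def find_ssrs(sequence, max_motif_len=10, min_total_len=18):
    """Find perfect tandem repeats of short motifs.

    Returns a list of (start, motif, repeat_count), with start 1-based.
    The lazy quantifier makes the regex try the shortest motif first, so
    'ATATAT' is reported as (AT)3 rather than (ATAT)... fragments.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1+" % max_motif_len)
    hits = []
    for match in pattern.finditer(sequence.upper()):
        motif = match.group(1)
        total_len = len(match.group(0))
        if total_len >= min_total_len:
            hits.append((match.start() + 1, motif, total_len // len(motif)))
    return hits

# Hypothetical sequence carrying an (AT)10 repeat flanked by unique bases
example = "GGC" + "AT" * 10 + "CCG"
print(find_ssrs(example))
```

Imperfect or compound repeats are not detected by this simple pattern, and a short repeat that is skipped for being under the length threshold can mask a longer repeat overlapping it, so the output should be treated as a first-pass screen only.
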
SSRs are even more abundant in the noncoding regions of genomic sequences, providing a rich source of genetic markers to map sequenced genome regions. In rice, for example, genomic SSR markers identified from BAC sequences provided immediate links between genetic-, physical-, and sequence-based maps.

Box 5.1 Linkage Map Construction Using MAPMAKER/EXP

Data file preparation

The following is an excerpt from the MAPMAKER/EXP tutorial. The very first line of your raw data file should read like

    data type xxxx

where xxxx is one of the allowed data types:

    f2 intercross
    f2 backcross
    f3 self
    ri self
    ri sib

The second line of the raw file should contain a list of three numbers, separated by spaces, such as

    46 362 2

The first of these values indicates the number of progeny for which data are included in the file (in this case, 46). The second indicates the number of genetic loci for which data are supplied (362). The third indicates the number of quantitative traits in the data set (here 2, although this may be zero, of course). Additional information may be optionally supplied at the end of this line. In particular, you may specify the coding scheme you use for genotypes. By default, the codes used for F2 backcross (a.k.a. BC1) data are:

    'A'  Homozygote for the recurrent parent genotype.
    'H'  Heterozygote.
    '-'  Missing data for the individual at this locus.

For F2 intercross data, the default codes are:

    'A'  Homozygote for the allele from parental strain a of this locus.
    'B'  Homozygote for the allele from parental strain b of this locus.
    'H'  Heterozygote carrying both alleles a and b.
    'C'  Not a homozygote for allele a (either bb or ab genotype).
    'D'  Not a homozygote for allele b (either aa or ab genotype).
    '-'  Missing data for the individual at this locus.

For RI data, the default codes are:

    'A'  Homozygote for parental genotype a.
    'B'  Homozygote for parental genotype b.
    '-'  Missing data for the individual (or line) at this locus.

Also by default, MAPMAKER will match genotype characters in a case-insensitive manner (that is, “a” and “A” indicate the same genotype). However, you can tell MAPMAKER to use whatever conventions you like, so long as you use the same conventions for the entire data file. First off, if you follow the numbers on the second line with the word “case,” then MAPMAKER will match genotype characters in a case-sensitive manner (that is, “a” and “A” can be used to indicate different genotypes). For example,

    46 362 2 case

If you do not wish to use case-sensitive genotypes, do not include the word “case.”

To specify the coding scheme itself, include at the end of the above line the word “symbols” followed by the coding scheme you wish to use, defined in terms of the default coding scheme above. For example, if you wish to use the following scheme with an RI data set:

    '1'  Homozygote for parental genotype a.
    '2'  Homozygote for parental genotype b.
    '0'  Missing data for the individual (or line) at this locus.

then you would use a second line like

    46 362 2 symbols 1=A 2=B 0=-

Note that when interpreting this line, MAPMAKER is in fact quite finicky about spaces and case distinctions (in order to keep MAPMAKER from ever misunderstanding exactly what you mean). In particular, NO SPACES should surround the “=” signs. To use with a backcross data set the scheme:

    'a'  Homozygote for parental genotype a.
    'A'  Heterozygote.
    '-'  Missing data for the individual (or line) at this locus.

you should use a line like

    46 362 2 case symbols a=A A=H
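The header conventions described above can be checked mechanically before a file is handed to MAPMAKER. The following is a minimal Python sketch, assuming only the syntax quoted in this excerpt; the helper names `parse_header` and `to_default_codes` are hypothetical and are not part of MAPMAKER.

```python
def parse_header(line):
    """Parse the second line of a raw file, e.g. '46 362 2 symbols 1=A 2=B 0=-'.

    Illustrative only: it handles the optional words 'case' and 'symbols'
    as described in the text and nothing else.
    """
    tokens = line.split()
    n_progeny, n_loci, n_traits = (int(t) for t in tokens[:3])
    case_sensitive = False
    symbols = {}
    seen_symbols = False
    for tok in tokens[3:]:
        if tok == "case" and not seen_symbols:
            case_sensitive = True
        elif tok == "symbols":
            seen_symbols = True
        elif seen_symbols and len(tok) == 3 and tok[1] == "=":
            symbols[tok[0]] = tok[2]          # user symbol -> default code
        else:
            # Covers '1 = A': no spaces may surround the '=' sign.
            raise ValueError(f"unexpected token {tok!r} in header line")
    return n_progeny, n_loci, n_traits, case_sensitive, symbols


def to_default_codes(genotypes, symbols, case_sensitive=False):
    """Rewrite a genotype string into the default codes; unmapped symbols pass through."""
    if not case_sensitive:
        symbols = {k.upper(): v for k, v in symbols.items()}
        genotypes = genotypes.upper()
    return "".join(symbols.get(ch, ch) for ch in genotypes)


if __name__ == "__main__":
    *_, case, sym = parse_header("46 362 2 symbols 1=A 2=B 0=-")
    print(to_default_codes("1201", sym, case))
```

Running the two header lines from the text through such a check before analysis is one way to catch a stray space around “=” early.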

The main restriction on coding schemes is that the only allowed symbols are letters, numbers, and the characters “-” and “+.” After the first two header lines, the raw file should then present the genetic locus data, in the following simple format: for each locus, you list (1) the name of the locus, preceded by an asterisk (“*”); (2) one or more spaces (or tabs etc.); and (3) the genotypic data for all individuals, in order. For example,

    *locus1 BA-HHHAAABBB-HHAA

would provide data for a locus named “locus1” with individual #1 having the B genotype, individual #2 having the A genotype, and so forth. Data for each new locus should begin on a new line (with blank lines allowed), although the genetic data for any one locus may be “broken” by any number of spaces, tabs, and line breaks. This means that, among other things, tab-delimited text files (such as those often exported by spreadsheet programs) will work well, for example,

    *L2  B  A  -  H  H  H  A  A  A  B  B  -  B  H


There is a system-dependent maximum line length, although it is fairly large (at least 1000 characters, where a tab counts as one character). Locus names should be kept to at most eight characters, and must be limited to alphabetic and numeric characters, along with the underscore character (“_”) and periods (“.”). No other characters are allowed (although any dashes (“-”) in locus names will be converted to underscores). Locus names must start with an alphabetic character (so that they are not confused with locus numbers in MAPMAKER sequences). Any quantitative trait data should come after the genetic locus data. These data follow a similar format, except that the trait values for each individual must be separated by at least one space, tab, or line break. A dash (“-”) alone indicates missing data. For example,

    *weight 6.3 7.7 8.0 6.2 8.6 - 7.5 9.0 5.5 - - 8.4 7.7 7.4 6.9 -

would correspond to a trait named “weight,” for which individual #1 has a value of 6.3, individual #2 has a value of 7.7, and so on. The sixth individual is missing data for this trait (and will be ignored for all analyses involving these trait data). As for the genotypes, a new trait should begin on a new line, and line breaks are allowed. Tab-delimited text files work well here too. Traits may also be specified as functions of other existing trait data. For example,

    *weight1 6.3 7.7 8.0 6.2 8.6 6.9 7.5 9.0
    *weight2 6.7 7.9 7.5 6.8 8.0 7.3 7.5 9.5
    *mean= (weight1 + weight2)/2

The format of these equations is described under the “make trait” command. Such traits must be included in the number of traits indicated on the file’s second line. Note that genetic maps (particularly for MAPMAKER/QTL) are no longer included in the raw file, as they were with MAPMAKER v 2.0. Instead, use a “.prep” initialization file, described in the MAPMAKER manual. Finally, note that comments may be inserted on any line starting with a number sign character (“#”). An example of a complete raw file follows:

    data type f2 intercross
    20 5 2
    # tiny data set for practical class demonstration
    *locus1 BBBHH-AAABBBHHH-AABA
    *locus2 AB-ABHABHAB-ABHABHBH
    *locus3 ABBAHHHBHABHABHBBHH
    # Locus3 may be mis-scored in individual 12!
    *locus4 ABHABAAAHAB-ABHABHHB
    *locus5 ABHABHAA-ABHABHAHHHB
    *trait1 6.3 7.7 8.0 6.2 8.8 6.2 8.7 9.0 5.2 6.8 7.2 7.1 4.1 7.6 6.5 8.3 5.4 7.3 8.1 7.5
    *trait2 5.5 5.5 5.5 4.5 4.5 4.5 5.5 5.5 4.5 4.5 4.5 3.5 3.5 5.2 3.5 6.8 3.5 7.2 7.1
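To make the raw-file layout concrete, the sketch below reads a small file of this kind into Python data structures. It is an illustration under stated assumptions, not MAPMAKER's own parser: the function name `parse_raw` is hypothetical, comments are assumed to run from “#” to the end of a line, the genotype symbols are the default ones, and derived traits written as equations are not handled.

```python
SAMPLE = """\
data type f2 intercross
4 2 1
# tiny made-up example
*m1 AB-H
*m2 A B
H H
*wt 1.5 - 2.0 3.0
"""


def parse_raw(text):
    """Return (cross_type, loci, traits) from MAPMAKER-style raw text."""
    # Drop comments and blank lines.
    lines = [ln.split("#", 1)[0].strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln]
    if not lines[0].lower().startswith("data type"):
        raise ValueError("first line must start with 'data type'")
    cross = lines[0][len("data type"):].strip()
    n_progeny, n_loci, n_traits = (int(t) for t in lines[1].split()[:3])

    # Each '*name' starts a record; later lines continue the current record.
    records = []
    for ln in lines[2:]:
        if ln.startswith("*"):
            parts = ln[1:].split()
            records.append((parts[0], parts[1:]))
        elif records:
            records[-1][1].extend(ln.split())
        else:
            raise ValueError("data found before the first '*name'")
    if len(records) != n_loci + n_traits:
        raise ValueError("number of records does not match the header")

    # Loci come first, traits after them, as the text requires.
    loci = {name: "".join(tok) for name, tok in records[:n_loci]}
    traits = {
        name: [None if t == "-" else float(t) for t in tok]
        for name, tok in records[n_loci:]
    }
    for name, values in list(loci.items()) + list(traits.items()):
        if len(values) != n_progeny:
            raise ValueError(f"{name}: expected {n_progeny} values, got {len(values)}")
    return cross, loci, traits


if __name__ == "__main__":
    print(parse_raw(SAMPLE))
```

The length check at the end is the useful part in practice: a locus with one genotype too few or too many is reported by name.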

The MAPMAKER Data: How to Prepare It and What Does It Look Like?

For example, if 500 recombinant inbred lines were scored for 200 SSR markers that were polymorphic between the parents A and B used in recombinant inbred line development, the data file can be prepared in a Microsoft Office Excel sheet in the following format:

    data type ri self
    500      200  0
    *ssr1    A  A  B  B  A  B  A  B  … scoring up to 500th RILs
    *ssr2    B  B  B  -  A  B  A  B  … scoring up to 500th RILs
    .
    .
    .
    *ssr200  A  A  -  A  A  B  B  B  … scoring up to 500th RILs
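The same layout can also be produced as plain text without a spreadsheet. The sketch below is one possible way to do this in Python; `write_raw` is a hypothetical helper, and the marker names and genotype strings are made up for illustration.

```python
def write_raw(cross, genotypes, n_traits=0):
    """Build raw-file text from a dict of marker name -> genotype string.

    All genotype strings must have the same length (one code per progeny).
    """
    lengths = {len(g) for g in genotypes.values()}
    if len(lengths) != 1:
        raise ValueError("all markers must be scored on the same number of progeny")
    n_progeny = lengths.pop()
    lines = [f"data type {cross}", f"{n_progeny} {len(genotypes)} {n_traits}"]
    for name, codes in genotypes.items():
        lines.append(f"*{name} {codes}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up scores for four lines and two markers.
    text = write_raw("ri self", {"ssr1": "AABB", "ssr2": "BB-A"})
    print(text, end="")
```

The returned text can be saved directly under a name ending in .raw (for example with `pathlib.Path("RIL.raw").write_text(text)`), which avoids the manual change of file extension.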

Once the data file has been prepared in Excel as described above, save it as a *.txt file (file type “Text (Tab delimited)”). Open the folder containing this *.txt file, and change the file extension to *.raw.

Important notes:
1. The “*” stands for a file name of your choice. For example, the file name for the above data is RIL. If you cannot see the file extension, open Folder Options, click the “View” tab, and uncheck “Hide extensions for known file types.” The extension then becomes visible in the folder, and you can change the extension alone (i.e., RIL.txt is changed to RIL.raw).

Running MAPMAKER

Precisely how you should start MAPMAKER depends on your computer. It should be noted that MAPMAKER downloaded from http://www.broad.mit.edu/ftp/distribution/software/mapmaker3/ can be installed only on Windows XP or earlier versions of Windows. It is not supported by later operating systems such as Windows Vista and Windows 7. Open the MAPMAKER folder, and double-click the MAPMAKER icon to get to the command prompt. When MAPMAKER starts running, you will first see its start-up banner and a prompt “1>” for the first command.


Commands that should be typed into MAPMAKER are shown in the procedure below in bold italics (after the numbered prompt), while MAPMAKER output is presented in regular type. The first step in almost every MAPMAKER session is to load a data file for analysis. If you are starting out an analysis on a new data set, or if you have modified the raw data in an existing data set, you will do this using MAPMAKER’s “prepare data” command. If instead you are resuming an analysis of a particular (unmodified) data set, you may use the “load data” command, which preserves many of the results from your previous session. If you are just starting out, use MAPMAKER’s “prepare data” command to load the data file “RIL.raw.” From this file, MAPMAKER extracts (1) the type of cross, number of markers, and number of scored progeny and (2) the genotype for each marker in each individual (if available).

Other information may be present in the data files, such as quantitative trait data and pre-computed linkage results. These issues will be addressed later. Before performing any analyses of the data set, first instruct MAPMAKER to save a transcript of this session in a text file for later reference. Using the “photo” command, a transcript named “RIL.out” is started. Note that if the file already exists, MAPMAKER appends new output to this file. The two commands are shown below as they appear in the DOS window.

    ************************************************************************
    *                            MAPMAKER/EXP                              *
    *                           (version 3.0b)                             *
    *                                                                      *
    ************************************************************************
    Type 'help' for help.
    Type 'about' for general information.

    1> prepare RIL.raw
    preparing data from 'RIL.raw'... ri self data (500 individuals, 200 loci)... ok
    saving genotype data in file 'RIL.data'... ok

    2> photo RIL.out
    'photo' is on: file is 'RIL.out'

Finding Linkage Groups by Two-Point Linkage

Begin the linkage map construction by performing a classical “two-point,” or pairwise, linkage analysis of the data set. First, we need to tell MAPMAKER which loci we wish to consider in our two-point analysis. We do this using MAPMAKER’s “sequence” command (“seq” will also work). When you type something like

    3> sequence 1 2 3

MAPMAKER is told which loci (and, in some cases, which orders of those loci) any following analysis commands should consider (e.g., SSR1, SSR2, SSR3). Since almost all of MAPMAKER’s analysis functions use the “current sequence” to indicate which loci they should consider, you will find that the “sequence” command must be entered before performing almost any analysis function. The sequence of loci in use remains unchanged until you again type the “sequence” command to change it. In this two-point analysis, we want to examine all the loci in our sample data set. Thus, we now type into MAPMAKER:

    3> sequence 1 2 3 4 5 6 7 8 9 10 11 12 13

or, equivalently,

    3> sequence all

MAPMAKER gives each marker in the data file its own number; it does not work with names such as “SSR1.” If at any point you want to see the real name of a marker, use the “translate” command after specifying the “sequence” of those markers (e.g., seq 1 2 3, then translate or tra). Note that for two-point analysis, the order in which the loci are listed is unimportant. Alternatively, if you know the chromosomal location of each marker, you can specify only the marker numbers belonging to a given chromosome in the sequence command, so that only those markers are tested for membership in a single linkage group. For example, if SSR1 to SSR5 belong to chromosome 1, then the command to be used is

    3> sequence 1 2 3 4 5

However, there are 200 markers in this data file, and suppose we do not know the chromosomal position of each marker. In that case, there are too many markers to work with at once, since examining all possible orders of all these markers would take a long time. The next step is to instruct the program to divide the markers in the sequence into linkage groups; for this, type MAPMAKER’s “group” command. To determine whether any two markers are linked, MAPMAKER calculates the maximum likelihood distance and corresponding LOD score between the two markers: if the LOD score is greater than some threshold, and if the distance is less than some other threshold, then the markers will be considered linked. By default, the LOD threshold is 3.0, and the distance threshold is 80 Haldane cM. For the purpose of finding linkage groups, MAPMAKER considers linkage transitive. That is, if marker A is linked to marker B, and if B is linked to C, then A, B, and C will be included in the same linkage group. The 200-marker data set would be too complicated to use for illustration, so the example below uses a simple data set containing 13 markers. As the output shows, MAPMAKER has divided this 13-marker data set into two linkage groups, named “group1” and “group2,” and a list of unlinked markers (if there are no unlinked markers in the data set, this list will not appear).

    4> group
    Linkage Groups at min LOD 3.00, max Distance 80.0
    group1= 1 2 3 5 7
    group2= 4 6 8 9 10 11 12
    unlinked 13
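The grouping rule described above (a LOD threshold, a distance threshold, and transitive linkage) can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not MAPMAKER's implementation: recombinants are counted directly as mismatching codes between two markers, as for backcross-type data with two genotype classes, the LOD score compares the estimated recombination fraction with 0.5, and distances use the Haldane map function. The names `two_point`, `haldane_cm`, and `group` are hypothetical.

```python
import math


def two_point(g1, g2):
    """Return (recombination fraction, LOD) for two genotype strings."""
    pairs = [(a, b) for a, b in zip(g1, g2) if "-" not in (a, b)]
    n = len(pairs)
    k = sum(a != b for a, b in pairs)            # apparent recombinants
    if n == 0 or 2 * k >= n:
        return 0.5, 0.0                          # no evidence of linkage
    theta = k / n
    lod = n * math.log10(2) + (n - k) * math.log10(1 - theta)
    if k:
        lod += k * math.log10(theta)
    return theta, lod


def haldane_cm(theta):
    """Haldane map distance in centiMorgans."""
    return math.inf if theta >= 0.5 else -50.0 * math.log(1 - 2 * theta)


def group(markers, min_lod=3.0, max_cm=80.0):
    """Transitive grouping: linked pairs are merged with a union-find."""
    names = list(markers)
    parent = {m: m for m in names}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            theta, lod = two_point(markers[a], markers[b])
            if lod > min_lod and haldane_cm(theta) < max_cm:
                parent[find(a)] = find(b)

    groups = {}
    for m in names:
        groups.setdefault(find(m), []).append(m)
    return list(groups.values())


if __name__ == "__main__":
    demo = {
        "m1": "A" * 10 + "B" * 10,
        "m2": "B" + "A" * 9 + "B" * 10,   # one apparent recombinant vs m1
        "m3": "AB" * 10,                  # independent of m1
    }
    print(group(demo))
```

Because linkage is treated as transitive, two markers that are not directly linked can still end up in the same group through an intermediate marker.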

Exploring Map Orders by Hand

To determine the most likely order of markers within a linkage group, we could imagine using the following simple procedure: for each possible order of that group, we calculate the maximum likelihood map (e.g., the distances between all markers given the data) and the corresponding map’s likelihood. We then compare these likelihoods and choose the most likely order as the answer. This type of exhaustive analysis may be performed using MAPMAKER’s “compare” command. In practice, however, this sort of “exhaustive” analysis is not practical for even medium-sized groups: a group of N markers has N!/2 possible orders, a number which becomes unwieldy (for most computers) when N gets to be between 6 and 10. In practice, one needs to order subsets of the linkage group and then overlap those subsets, mapping any remaining markers relative to those already mapped, a process which is illustrated in the next section. In the above example, since “group1” consists of markers 1, 2, 3, 5, and 7, it is small enough to perform the fully exhaustive analysis. To do this, we first change MAPMAKER’s sequence to “{1 2 3 5 7}”. Here, the “{}” indicate that the order of the markers contained within them is unknown and thus that all possible orders need to be considered. We then type the “compare” command, instructing MAPMAKER to compute the maximum likelihood map for each specified order of markers and to report the orders sorted by the likelihoods of their maps. Please note the bracket type, as other brackets have different meanings: [ ] mean that the markers within are at the same locus (so order doesn’t matter), and < > mean that the order within is known but not the orientation of the group itself (it could be the inverse order).

    5> sequence {1 2 3 5 7}
    sequence #2= {1 2 3 5 7}
    6> compare
    Best 20 orders:
     1:  1 3 2 5 7   Like:   0.00
     2:  3 1 2 5 7   Like:  -6.00
     3:  5 7 2 3 1   Like: -20.20
     4:  5 7 2 1 3   Like: -26.26
     5:  2 5 7 3 1   Like: -27.25
     6:  2 5 7 1 3   Like: -28.39
     7:  2 3 1 5 7   Like: -28.85
     8:  5 2 3 1 7   Like: -32.33
     9:  2 1 3 5 7   Like: -34.12
    10:  5 7 1 3 2   Like: -35.55
    11:  5 2 1 3 7   Like: -37.61
    12:  1 3 5 2 7   Like: -37.76
    13:  3 1 5 2 7   Like: -39.09
    14:  5 7 3 1 2   Like: -40.38
    15:  1 3 5 7 2   Like: -40.87
    16:  3 1 5 7 2   Like: -41.55
    17:  5 2 7 3 1   Like: -43.67
    18:  5 2 7 1 3   Like: -44.78
    19:  5 1 3 2 7   Like: -47.63
    20:  2 5 3 1 7   Like: -52.28
    order1 is set
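The N!/2 count mentioned above (an order and its reverse describe the same map) is easy to verify, and it shows why exhaustive comparison stops being practical so quickly. The Python sketch below is only an illustration; `distinct_orders` and `n_orders` are hypothetical helper names.

```python
from itertools import permutations
from math import factorial


def distinct_orders(markers):
    """Enumerate marker orders, keeping one member of each order/reverse pair."""
    return [p for p in permutations(markers) if p <= p[::-1]]


def n_orders(n):
    """Number of distinct orders of n markers (n >= 2)."""
    return factorial(n) // 2


if __name__ == "__main__":
    print(len(distinct_orders([1, 2, 3, 5, 7])))   # orders examined for group1
    for n in range(3, 11):
        print(n, n_orders(n))
```

For the five markers of group1 the enumeration yields 60 orders, while ten markers already give 1,814,400.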

Note that while MAPMAKER examines all 5!/2 possible orders, by default only the 20 most likely ones are reported. For each of these 20 orders, MAPMAKER displays the log-likelihood of that order relative to the best likelihood found. Thus the best order “1 3 2 5 7” is indicated as having a relative log-likelihood of 0.0. The second best order “3 1 2 5 7” is significantly less likely than the best, having a relative log-likelihood of -6.0. In other words, the best order of this group is supported by an odds ratio of roughly 1,000,000:1 (10 to the 6th power to one) over any other order. We consider this good evidence that the first order is the right order.

Displaying a Genetic Map

When we used the “compare” command previously, MAPMAKER calculated the map distances and log-likelihood for each of the 60 orders we were considering. The “compare” command, however, only reports the relative log-likelihoods and afterward forgets the map distances. To actually display the genetic distances, we must instead use the “map” command. Like “compare,” the “map” command instructs MAPMAKER to calculate the maximum likelihood map of each order specified by the current sequence. If the current sequence specifies more than one order (e.g., the sequence “{1 2 3 5 7}” specifies 60 orders), then the maps for all specified orders will be calculated and displayed. Because we found one order of this group to be much more likely than any other, we probably only care to see the map distances for this single order. First, we set MAPMAKER’s sequence, putting the markers in their best order and doing away with the set brackets. Next, we simply type “map” to display this order’s maximum likelihood map. As you can see, the distances between neighboring markers are displayed.
Note, however, that these distances may be considerably different from the “two-point” distances between those markers: this is because MAPMAKER’s so-called multipoint analysis facility can take into account much more information, such as flanking marker genotypes and some amount of missing data. This is precisely the reason that we use multipoint analysis rather than two-point analysis to order markers: because more data are taken into account, you have a smaller chance of making a mistake.

    7> sequence 1 3 2 5 7
    sequence #3= 1 3 2 5 7
    8> map
    =====================================================================
      Markers        Distance
       1  SSR1         4.2 cM
       3  SSR3        15.0 cM
       2  SSR2        11.9 cM
       5  SSR5        12.2 cM
       7  SSR7      ----------
                      43.2 cM   5 markers   log-likelihood= -424.94
    =====================================================================

Mapping a Slightly Larger Group

As we mentioned earlier, exhaustive analyses of large linkage groups are not practical. Instead, to find a map order of a larger group, we need to find a subset of markers on which we can perform an exhaustive “compare” analysis. Thus, to map group2 (in the above example), we could pick a subset of its seven markers at random, although we might do better if we pick markers which are likely to be ordered with high likelihood. Generally, this is true for sets of markers which (i) have as little missing data as possible and (ii) do not include many closely spaced markers. To quickly see how much data is available for the markers in the given group, we set MAPMAKER’s “sequence” appropriately and use MAPMAKER’s “list loci” command. MAPMAKER prints a list of loci, showing each marker by both its MAPMAKER-assigned number and its name in the data file. In the example below, for each marker, MAPMAKER prints the number of informative progeny (out of the 500 in the data set) and the type of scoring. In this case all loci have been scored using “co-dominant” markers (e.g., SSR genotypes in RILs), although clearly markers 4 and 6 are the least informative. To also look for markers which may be too close, we use MAPMAKER’s “lod table” command. MAPMAKER prints both the distance and LOD score between all pairs of markers in the current sequence. Fortunately, the closest pair is separated by over 6.0 cM, a distance which should almost always be resolvable in a data set with so many informative meioses. Given the results of these two analyses, a good subset to try might be

    8 9 10 11 12

Note that the above two tests could have been automatically performed using MAPMAKER’s “suggest subset” command.

    9> sequence 4 6 8 9 10 11 12
    sequence #4= 4 6 8 9 10 11 12
    10> list loci

     Num  Name    Genotypes     Linkage Group
      4   SSR4    273 codom     group2
      6   SSR6    275 codom     group2
      8   SSR8    306 codom     group2
      9   SSR9    327 codom     group2
     10   SSR10   297 codom     group2
     11   SSR11   324 codom     group2
     12   SSR12   319 codom     group2

    11> lod table
    Bottom number is LOD score, top number is centiMorgan distance:

             4       6       8       9      10      11
     6     63.1
            3.33
     8     16.8    56.0
           39.06    4.33
     9     56.3    17.8    54.8
            6.77   36.70    7.68
    10    106.3    27.7      -     43.3
            0.89   22.51           15.08
    11     14.9    74.0     6.3    65.4
           43.78    2.20   80.87    5.76
    12     28.2    43.1    18.4    24.1    89.1    30.1
           22.24    9.13   39.84   32.39    2.22   23.90
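The two criteria used above (plenty of informative progeny, no very closely spaced pairs) can be turned into a simple greedy selection. The Python sketch below is a hypothetical illustration of that reasoning and is not the algorithm behind MAPMAKER's “suggest subset” command. The counts and distances are copied from the “list loci” and “lod table” output; pairs with no distance in the table are assumed to be far apart, and the 5 cM minimum spacing is an arbitrary choice.

```python
# Informative progeny per marker (from "list loci").
INFORMATIVE = {4: 273, 6: 275, 8: 306, 9: 327, 10: 297, 11: 324, 12: 319}

# Two-point distances in cM (from "lod table"); missing pairs are omitted.
_PAIRS = {
    (4, 6): 63.1, (4, 8): 16.8, (6, 8): 56.0, (4, 9): 56.3, (6, 9): 17.8,
    (8, 9): 54.8, (4, 10): 106.3, (6, 10): 27.7, (9, 10): 43.3,
    (4, 11): 14.9, (6, 11): 74.0, (8, 11): 6.3, (9, 11): 65.4,
    (4, 12): 28.2, (6, 12): 43.1, (8, 12): 18.4, (9, 12): 24.1,
    (10, 12): 89.1, (11, 12): 30.1,
}
DIST_CM = {frozenset(pair): cm for pair, cm in _PAIRS.items()}


def pick_subset(informative, dist_cm, size=5, min_cm=5.0):
    """Greedily take the most informative markers that are not too close together."""
    chosen = []
    for marker in sorted(informative, key=informative.get, reverse=True):
        far_enough = all(
            dist_cm.get(frozenset((marker, other)), float("inf")) >= min_cm
            for other in chosen
        )
        if far_enough:
            chosen.append(marker)
        if len(chosen) == size:
            break
    return sorted(chosen)


if __name__ == "__main__":
    print(pick_subset(INFORMATIVE, DIST_CM))
```

With these settings the sketch selects markers 8, 9, 10, 11, and 12, the same subset suggested in the text; raising `min_cm` to 10 drops marker 8 (6.3 cM from marker 11) in favor of marker 6.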

As we did with the small linkage group, we now change MAPMAKER’s sequence to specify the subset we wish to test and then type the “compare” command. This time, the results are even more conclusive, with order1 much more likely than any other. The sequence of commands to be used here is

    sequence {8 9 10 11 12}
    compare
    sequence order1
    map

Note that this time we use a special shortcut, “order1,” instead of typing out the marker order reported as order1. This shows that the markers to be analyzed can be specified to the “sequence” command in either way. To determine the map position of the remaining two markers in group2, we will use the following procedure: starting with the known order of five markers, we will place the other two (one at a time) into every interval in this order and then recalculate the maximum likelihood map of each resulting six-marker order. In this analysis, MAPMAKER recalculates all recombination fractions for all intervals in each map (not just the ones involving the newly placed markers). This function is performed by MAPMAKER’s “try” command. In its output, MAPMAKER again displays the relative log-likelihood of each position for the inserted markers. A relative log-likelihood of 0 indicates the best position, while the negative log-likelihoods indicate the odds against placement in each other interval.

    12> sequence {8 9 10 11 12}
    sequence #5= {8 9 10 11 12}
    13> compare
    Best 20 orders:
     1:  11 8 12 9 10   Like:   0.00
     2:  10 11 8 12 9   Like: -14.57
     3:  8 11 12 9 10   Like: -15.23
     4:  10 9 11 8 12   Like: -27.20
     5:  11 8 12 10 9   Like: -29.97
     6:  10 8 11 12 9   Like: -30.14
     7:  9 10 11 8 12   Like: -32.23
     8:  8 11 10 9 12   Like: -39.80
     9:  10 9 8 11 12   Like: -39.91
    10:  9 11 8 12 10   Like: -40.05
    11:  11 8 10 9 12   Like: -40.25
    12:  11 8 9 12 10   Like: -44.73
    13:  8 11 12 10 9   Like: -45.21
    14:  10 11 8 9 12   Like: -46.57
    15:  8 11 9 12 10   Like: -47.46
    16:  9 10 8 11 12   Like: -47.94
    17:  10 8 11 9 12   Like: -49.61
    18:  8 11 10 12 9   Like: -52.71
    19:  9 8 11 12 10   Like: -52.74
    20:  11 8 10 12 9   Like: -53.07
    order1 is set
    14> sequence order1
    sequence #6= order1
    15> try 4 6

                  4         6
            ---------------------
            |    0.00    -42.68 |
      11    |                   |
            |  -35.57   -118.6  |
       8    |                   |
            |  -19.65    -70.19 |
      12    |                   |
            |  -46.80    -28.09 |
       9    |                   |
            |  -51.35      0.00 |
      10    |                   |
            |  -43.40    -21.09 |
            |-------------------|
      INF   |  -44.66    -45.03 |
            ---------------------
      BEST    -619.33   -612.03
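The “try” table can be read as a ranking of intervals. The Python sketch below uses the column for marker 6 copied from the output above and reports the best interval, the runner-up, and the odds between them, taking the values as base-10 log-likelihood differences (the reading used in this box). The function name `best_placement` and the interval labels are hypothetical.

```python
import math

# Relative log-likelihoods for marker 6, top to bottom of the "try" table.
MARKER_6 = [
    ("before 11", -42.68),
    ("11-8", -118.6),
    ("8-12", -70.19),
    ("12-9", -28.09),
    ("9-10", 0.00),
    ("after 10", -21.09),
    ("INF (unlinked)", -45.03),
]


def best_placement(rel_loglik):
    """Return (best interval, runner-up interval, odds of best over runner-up)."""
    ranked = sorted(rel_loglik, key=lambda item: item[1], reverse=True)
    (best, best_ll), (second, second_ll) = ranked[0], ranked[1]
    return best, second, 10 ** (best_ll - second_ll)


if __name__ == "__main__":
    best, second, odds = best_placement(MARKER_6)
    print(f"best: {best}; next: {second}; odds about 10^{math.log10(odds):.2f} to 1")
```

The same function applied to the column for marker 4 would single out the position before marker 11.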

In this case, we see that marker 4 is best placed before marker 11. The “INF” row gives the relative log-likelihood of a marker lying anywhere other than on this sequence. In the above test, we see that a log-likelihood of 44.66 supports linkage between marker 4 and the rest of the group. We also see that marker 6 strongly prefers to be in between markers 9 and 10. Even the next most likely position for marker 6 is more than 10 to the 21.09th power times less likely. The “try” command not only tries to place markers in each interval in the framework but also tries to place each marker infinitely far away (i.e., forced 50% recombination between it and the framework). The relative log-likelihoods for this position are indicated following the “INF” entry in the MAPMAKER output. In the same way that a two-point LOD score indicates the odds of linkage between two loci when they are separated by their maximum likelihood distance, these relative log-likelihoods indicate the odds supporting linkage between one locus and a framework of loci when the locus is placed in its most likely position. As a last step, we now type the complete sequence for this group, adding markers 4 and 6 into their most likely positions. Then we type “map” to see the complete map of all markers in this group.

    16> sequence 4 11 8 12 9 6 10
    sequence #7= 4 11 8 12 9 6 10
    17> map
    =====================================================================
      Markers        Distance
       4  T24         14.8 cM
      11  C15          6.4 cM
       8  T125        18.9 cM
      12  T71         24.0 cM
       9  T83         18.1 cM
       6  T209        28.6 cM
      10  T17       ----------
                     110.8 cM   7 markers   log-likelihood= -688.99
    =====================================================================

Likewise, we need to continue this process for all the linkage groups. Note that sometimes, depending on the data file, a single chromosome may have more than one linkage group. However, when we add more markers for that particular chromosome to the data set, there is a possibility of finding a single linkage group (i.e., the added markers merge the two or more linkage groups into a single linkage group). It is also important to note that this program compares combinations of markers and gives the likelihoods of possible sequence orders. It does not tell you the “right” sequence; it tells you the “most likely” order. You must decide what LODs and cM distances you will accept; therefore, the result can be highly subjective. Hence, most importantly, when you score the data, do not guess. When you make a mistake in scoring, it will look like a recombination has taken place. Therefore, missing data are better than wrong data. MAPMAKER running in a Windows DOS window reports the map distances, but it cannot display a graphical view of the genetic map in the Microsoft Windows operating system. MapChart is a specially designed Windows program that can produce linkage maps and QTL maps very easily. It is freely available at http://www.biometris.wur.nl/uk/Software/MapChart/. Alternatively, MapDraw can also be used for linkage map drawing, and it is available free of cost at http://www.nslij-genetics.org/soft/mapdraw.v2.2.xls.

Tips to Improve Your Analysis

1. While you are using the “compare” command, recall that an LOD of 2 means one event is 100 times more likely, an LOD of 3 means it is 1000 times more likely, etc. A general guideline is that an LOD of 2 or 3 is conventionally acceptable. Suppose the first two orders have exactly the same likelihood, meaning that the two orders are equally likely. If we look at the sequences, we may see that the only difference between the first two orders is that the order of two markers (say, e.g., SSR56 and SSR58) can’t be differentiated. The order of the other markers seems clearly to be, for example, SSR55 (either SSR56 or SSR58), SSR57, and SSR59.
An educated guess would be that SSR56 and SSR58 are either at the same locus or tightly linked (with not enough recombinations to create a statistically significant order). We can check this by asking for the recombination distance between the two markers, using the map command. We can double-check our order by using ripple. This command assumes the general order is known but checks other possible orders within each group of three markers, moving down the given sequence. (Note that you would not want to use ripple for a completely unknown order, as it only looks at three markers at a time. Further, when you specify the sequence command, omit {} or it will check all triplets of all possible combinations.)
2. A map with 20 cM or more between markers might be questionable (remember, we don’t know a “sure order,” just the most likely one).
3. To make a complete map, you would need to keep going with this process until you had a full set of good linkage groups. There are many other commands you can try too, depending on your preferences.
4. You can probably see that there is no “right way” to use MAPMAKER. Instead of choosing some markers of group1 to compare, we could also have grouped again with more stringent LOD and cM levels, or we could have worked backward by using the “first order” command to get an order and then pulled off markers that didn’t fit well. Likewise, we can try several options, since it is a very iterative and somewhat subjective process.

Readers are strongly recommended to read the MAPMAKER manual, which is available at http://linkage.rockefeller.edu/soft/mapmaker/, before working with this program.

Box 5.2 Linkage Map Construction Using AntMap

Locus ordering is an essential procedure in genome mapping. When the number of loci is large, it is quite difficult to determine the optimum order with an exhaustive search of all possible orders. The problem of searching for the optimum order has been recognized as a special case of the traveling salesman problem (TSP), i.e., given a set of cities and distances for each pair of them, find a round trip of minimal total length visiting each city exactly once. In recent years, ant colony optimization (ACO), which is a set of algorithms inspired by the behavior of real ant colonies, has been successfully used to solve discrete optimization problems, such as TSP. Iwata and Ninomiya (2006) developed a novel system based on ACO for locus ordering in genome mapping. Loci and the absolute value of the log-likelihood (or the recombination fraction) between loci were regarded as TSP cities and distances between cities, respectively. They tested the system using a simulated segregating population and found it highly efficient for linkage grouping as well as locus ordering in genome mapping. To make the newly developed system widely available, they developed software named AntMap, which constructs linkage maps using this system. AntMap performs the segregation test, linkage grouping, and locus ordering and constructs a linkage map quite rapidly and nearly automatically. The speed of the ACO-based algorithm makes it possible to conduct a bootstrap test of the estimated order. With the aid of this software, researchers can save time and labor and can obtain a linkage map whose reliability is indicated by bootstrap values. Another advantage of AntMap is that it is open source (http://lbm.ab.a.u-tokyo.ac.jp/~iwata/antmap/); that is, the source code and executable of AntMap are available under the General Public License (GPL). The Java and C++ objects that implement the system can also be reused in other applications.
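The traveling-salesman view of locus ordering can be illustrated on a toy example: treat pairwise recombination fractions as distances and look for the order with the smallest sum over adjacent loci. The Python sketch below does this by exhaustive search, which is feasible only for a handful of loci; it is not the ant colony optimization used by AntMap, and the recombination fractions are made up for illustration. Unlike the round trip of the classical TSP, a linkage map is an open path, so only adjacent pairs along the order are summed.

```python
from itertools import permutations

# Made-up pairwise recombination fractions for four loci lying in the order a-b-c-d.
RF = {
    frozenset("ab"): 0.10, frozenset("bc"): 0.10, frozenset("cd"): 0.10,
    frozenset("ac"): 0.18, frozenset("bd"): 0.18, frozenset("ad"): 0.25,
}


def sum_adjacent(order, rf):
    """Sum of recombination fractions between neighbouring loci in an order."""
    return sum(rf[frozenset(pair)] for pair in zip(order, order[1:]))


def best_order(loci, rf):
    """Exhaustively search all orders (one per order/reverse pair) for the minimum."""
    candidates = (p for p in permutations(loci) if p <= p[::-1])
    return min(candidates, key=lambda p: sum_adjacent(p, rf))


if __name__ == "__main__":
    order = best_order(["c", "a", "d", "b"], RF)
    print(order, round(sum_adjacent(order, RF), 2))
```

Heuristics such as ACO replace the exhaustive loop in `best_order` with a guided search, which is what makes ordering large numbers of loci feasible.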
Input File Format

The input file format of AntMap is identical to the *.raw files required by MAPMAKER (Lander et al. 1987). AntMap can analyze data derived from the progeny of several types of crosses, including:
1. F2 intercross
2. F2 backcross (e.g., BC1)
3. Recombinant inbred lines by self-mating
4. Doubled haploid lines

However, the current version of AntMap does not support two types of cross that are supported by MAPMAKER/EXP: F3 intercross by self-mating (f3 self) and recombinant inbred lines by sib mating (ri sib).

The step-by-step procedure for using AntMap is described in the AntMap Tutorial. The following are excerpts from it:

Step 0: Start AntMap. In the Windows operating system, start AntMap by double-clicking the “AntMap” icon. AntMap can also be run from the executable jar file “AntMap.jar” on any platform (Linux, Solaris, and Mac OS as well as Windows).

Step 1: Open an input file. Open an input file in MAPMAKER format (∗.raw) through the “File-Open” menu. After the file is opened, its contents appear in the “Data” panel. Click the “Log” tab to see a summary of the input data.

Step 2: Segregation ratio test. Select “Segregation Test” from the “Analysis” menu. The results of the segregation ratio tests appear in the “Result” panel.

Step 3: Linkage grouping. Click the “Options” tab to display the “Grouping” option panel. You can choose one of two grouping methods: “nearest neighboring locus” and “all combinations.” The former builds a group by sequentially joining each locus to the locus with which it shows the smallest recombination value. The latter produces results similar to those of the “group” command of MAPMAKER. You can also choose the grouping criterion, the threshold value, and the minimum number of markers for a single group. Otherwise, keep these options unchanged except for the threshold value. Select “Linkage Grouping” from the “Analysis” menu. The results of linkage grouping appear in the “Result” panel. When you analyze your own data, you may not achieve a good separation of markers into linkage groups at the first attempt. In that case, find a good combination of threshold value, criterion, and method by trial and error. It is better to organize your data according to chromosomes and then proceed separately for each chromosome.

Step 4: Locus ordering and genetic map. Click the “Options” tab, and then click the “Ordering” tab to display the “Ordering” option panel. For locus ordering, you can choose one of two criteria: “LL” (log-likelihood) and “SARF” (sum of adjacent recombination fractions). AntMap searches for a locus order that maximizes the log-likelihood or minimizes the SARF. You can also choose the number of runs of locus ordering; the meaning of this option is explained in the “AntMap Options” section of the AntMap user’s manual. The map function used to calculate the map distance between adjacent markers can be set to either “Haldane” or “Kosambi.” Otherwise, keep these options unchanged. Select “Locus Ordering” from the “Analysis” menu. The results of locus ordering appear in the “Result” panel, and a graphic of the linkage map appears in the “Map” panel.

Step 5: One-step mapping. Select “Full Course” from the “Analysis” menu. This runs the whole process, from the segregation ratio test (Step 2) to locus ordering (Step 4), at once.

Step 6: Redraw a linkage map. Click the “Options” tab, and then click the “Draw map” tab to display the “Draw map” option panel. Changing the “Scale factor” option changes the drawing size of the linkage map. After changing the option value, select “Redraw Map” from the “Analysis” menu to obtain a linkage map redrawn at the new scale.

Step 7: Bootstrap test for locus order. You can evaluate the reliability of the estimated locus order by using a bootstrap test. A bootstrap test (or bootstrapping) is a method for estimating the sampling distribution of an estimator by resampling with replacement from the original sample. In a bootstrap test, a random sample of size n is drawn with replacement from the original sample of size n, and estimates are obtained from this random sample. After this operation has been repeated (iterated) many times (e.g., 100–1000 times), the stability of the estimates (e.g., the standard error or confidence interval of the estimators) is evaluated. In the bootstrap test for locus order, we obtain the probability that a locus is located at its estimated position in the order. Click the “Options” tab, and then click the “Ordering” tab to display the “Ordering” option panel. You can change the number of iterations (repeats) of bootstrapping. To get a good estimate of the percentage of correct locus order, 100 iterations may be sufficient. You can also choose the group that is targeted in the bootstrap test. Select “Bootstrap Test” from the “Analysis” menu. The results of the bootstrap test for locus order appear in the “Result” panel, and a graphic of the linkage map with bootstrap values appears in the “Map” panel. The bootstrap test for all linkage groups may take a long time even on a high-end PC, so it is best to run it during a lunch break or overnight.

Step 8: Save results of linkage mapping. You can save the information in the “Result,” “Log,” and “Map” panels through the “Save” submenu in the “File” menu. The information in “Result” and “Log” is saved as a text file. The information in “Map” (i.e., a graphic of the linkage map) is saved as a JPEG (∗.jpg) file.
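The resampling idea behind the bootstrap test in Step 7 can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not AntMap's implementation: genotypes are simulated for a doubled-haploid-like population, the order is estimated by exhaustive SARF minimization rather than ACO, and the support is reported for the whole order rather than for each locus position. The function names, population size, and recombination fractions are all hypothetical.

```python
import random
from itertools import permutations

def simulate(n_ind, adj_rf, rng):
    """Simulate 0/1 genotypes along one chromosome.
    adj_rf[i] is the recombination fraction between locus i and locus i+1."""
    data = []
    for _ in range(n_ind):
        g = [rng.randint(0, 1)]
        for rf in adj_rf:
            # Switch parental allele with probability rf (a crossover).
            g.append(g[-1] ^ (rng.random() < rf))
        data.append(g)
    return data

def pairwise_rf(data, i, j):
    """Fraction of individuals whose genotypes differ at loci i and j."""
    return sum(g[i] != g[j] for g in data) / len(data)

def estimate_order(data):
    """Locus order (tuple of column indices) with the smallest SARF."""
    n_loci = len(data[0])
    def sarf(order):
        return sum(pairwise_rf(data, a, b) for a, b in zip(order, order[1:]))
    # Skip mirror images: an order and its reverse are the same map.
    return min((p for p in permutations(range(n_loci)) if p[0] < p[-1]), key=sarf)

def bootstrap_support(data, n_iter, rng):
    """Share of bootstrap replicates that reproduce the estimated order."""
    reference = estimate_order(data)
    hits = 0
    for _ in range(n_iter):
        # Draw n individuals with replacement from the n original individuals.
        resample = [rng.choice(data) for _ in range(len(data))]
        hits += estimate_order(resample) == reference
    return reference, hits / n_iter

rng = random.Random(1)
genotypes = simulate(n_ind=100, adj_rf=[0.10, 0.15, 0.10], rng=rng)
order, support = bootstrap_support(genotypes, n_iter=100, rng=rng)
print(order, support)
```

A support value close to 1 indicates that the estimated order is stable under resampling of individuals; low values flag an order that depends heavily on a few individuals.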

Box 5.3: List of Software Available for Linkage Map Construction

A comprehensive list of computer software for genetic linkage analysis of human pedigree data, QTL analysis of animal/plant breeding data, genetic marker ordering, genetic association analysis, haplotype construction, pedigree drawing, and population genetics is available, in alphabetical order, at http://linkage.rockefeller.edu/soft/list.html. However, the following software packages are very often used by plant molecular breeders in genetic or linkage map construction:
1. MAPMAKER (http://www.broad.mit.edu/ftp/distribution/software/mapmaker3/)
2. JoinMap (http://www.kyazma.nl/)
3. AntMap (http://cse.naro.affrc.go.jp/iwatah/antmap/index.html)
4. Map Manager QTX (http://www.mapmanager.org/)
5. QGene (http://www.qgene.org/)
6. R/QTL (http://www.rqtl.org)
7. MSTMAP (http://mstmap.org/)
8. CarthaGene (http://www.inra.fr/mia/T/CarthaGene/)
9. MadMapper (http://cgpdb.ucdavis.edu/XLinkage/MadMapper/)
10. THREaD Mapper (http://cbr.jic.ac.uk/dicks/software/threadmapper/index.html)
11. ActionMap (http://moulon.inra.fr/~bioinfo/)
12. TetraploidMap (https://www.bioss.ac.uk/knowledge/tetraploidmap/)
13. Multipool (http://cgs.csail.mit.edu/multipool/)
14. MMAPPR: mutation mapping analysis pipeline for pooled RNA-seq (https://rdrr.io/github/jonathonthill/MMAPPR2/man/mmappr.html)
15. Next Generation Mapping (https://omictools.com/ngm-tool)
16. polymapR and TetraploidMap, both of which can be used for polyploid mapping (https://cran.r-project.org/web/packages/polymapR/vignettes/Vignette_polymapR.html; https://www.bioss.ac.uk/knowledge/tetraploidmap/)

In practice, it is almost certainly best to use a mixture of approaches in developing and refining a map. This is not only because each one brings something unique to the analysis but also because we do not know which approach will succeed best for a new data set, and we do not know enough about the behavior of each tool to judge this in advance. Map estimation is best regarded as an iterative process in which researchers first grasp the global pattern of their data set before re-evaluating and revising the grouping and ordering of markers, rather than performing a rigid, linear three-stage methodology of grouping, ordering, and spacing.

Critical Thinking Questions
1. Why should recombination frequency not be used directly as a unit of distance in genetic mapping?
2. Why should different LOD thresholds be employed for different data sets?
3. No perfect genetic or linkage map exists for any plant. Why?
4. Physical mapping has recently become more precise and rapid than genetic mapping. Explain with examples.


Bibliography

Literature Cited
Bateson W, Saunders ER, Punnett RC (1905) Experimental studies in the physiology of heredity. Rep Evol Comm R Soc 2:1–131
Bovenhuis H, Meuwissen THE (1996) Detection and mapping of quantitative trait loci. Animal Genetics and Breeding Unit, UNE, Armidale, Australia. ISBN: 186389-323-7
Bulmer MG (1971) The effect of selection on genetic variability. Am Nat 105:201
Haldane JBS, Smith CAB (1947) A new estimate of the linkage between the genes for colour-blindness and haemophilia in man. Ann Eugenics 14:10–31
Helentjaris T, Slocum M, Wright S, Schaefer A, Nienhuis J (1986) Construction of genetic linkage maps in maize and tomato using restriction fragment length polymorphisms. Theor Appl Genet 72:761–769
http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=genomes
Iwata H, Ninomiya S (2006) AntMap: constructing genetic linkage maps using an ant colony optimization algorithm. Breed Sci 56:371–377
Kohel RJ, Richmond TR, Lewis CF (1970) Texas marker 1. Description of genetic standards for G. hirsutum L. Crop Sci 10:670–671
Lander ES, Green P, Abrahamson J, Barlow A, Daly M, Lincoln S, Newburg L (1987) MAPMAKER: an interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics (in press)
Morgan TH (1911) Random segregation versus coupling in Mendelian inheritance. Science 34:384
Morton NE (1955) Sequential tests for the detection of linkage. Am J Hum Genet 7:277–318
Stam P (1993) Construction of integrated genetic linkage maps by means of a new computer package: JoinMap. Plant J 3(5):739–744
Sturtevant AH (1913) The linear arrangement of six sex-linked factors in Drosophila, as shown by their mode of association. J Exp Zool 14:43–59

Further Reading
Bailey NTJ (1961) Introduction to the mathematical theory of genetic linkage. Oxford University Press, London
Cheema J, Dicks J (2009) Computational approaches and software tools for genetic map estimation in plants. Brief Bioinform 10(6):595–608
http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=genomes
McPeek MS (1996) An introduction to recombination and linkage analysis. http://www.stat.wisc.edu/courses/st992-newton/smmb/files/broman/mcpeek96.pdf
Whitehouse HLK (1973) Towards an understanding of the mechanism of heredity. St. Martin’s Press, New York
Wu R, Gallo-Meagher M, Littell RC, Zeng Z (2001) General polyploid model for analyzing gene segregation in outcrossing tetraploid species. Genetics 159:869–882

6 Phenotyping

Contents
6.1 Phenomics
6.2 Forward and Reverse Phenomics
6.3 Advances in Phenomics
6.4 Phenotyping Versus QTL Mapping
6.5 Need for Precise Phenotyping
6.6 Phenotyping for Biotic Stress
6.6.1 Explaining the Concept with Case Studies
6.7 Phenotyping for Abiotic Stress
6.7.1 Explaining the Concept with Case Studies
6.8 Heritability of Phenotypes
6.9 Statistical Analysis of Phenotypic Data
6.9.1 Simple Statistics
6.9.2 Heritability Estimation
6.9.3 Correlation Analysis
6.10 Phenome-Wide Analysis in This Genomics Era: PheWAS Versus GWAS
Critical Thinking Questions
Bibliography

6.1 Phenomics

An important goal of plant biology is to understand phenotypic characteristics such as yield and its component traits, nutritional properties, pest and disease resistance, and evolutionary fitness. Phenotypic variation is produced through a complex network of interactions between genotype and environment. However, establishing such a “genotype–phenotype” map is difficult without the detailed phenotypic data that allow the study of these multifaceted interactions. Despite this need, our ability to characterize phenomes lags behind our ability to characterize genomes (a phenome is defined as the full set of phenotypes of an individual, and the study of phenomes in this omics era is referred to as phenomics). Therefore, phenomics should be recognized and pursued as an independent discipline to enable the development and adoption of high-throughput and high-dimensional phenotyping, which would pave the way for efficient detection of genotype–phenotype interactions and quantitative trait loci (QTL) mapping (discussed in detail in Chap. 7).

© Springer Nature Singapore Pte Ltd. 2020 N. M. Boopathi, Genetic Mapping and Marker Assisted Selection, https://doi.org/10.1007/978-981-15-2949-8_6

6.2 Forward and Reverse Phenomics

Forward phenomics usually focuses on selecting and identifying the “best of the best” genotypes, whereas reverse phenomics dissects the “best” genotypes to discover why they are superior to the other genotypes. Thus, forward phenomics provides immediate candidate germplasm for use in breeding, while reverse phenomics represents a longer-term strategy related to designing improved crop ideotypes. Forward phenomics involves the use of high-throughput, low-resolution phenotyping measurements, followed by low-throughput, higher-resolution measurements, to screen/sieve crop germplasm for different target traits (Mir et al. 2019). This method makes it possible to identify candidate lines with interesting traits at the seedling stage using successive screening tiers (progressing from less to more precise measurements). Forward phenomics therefore accelerates plant breeding by enabling the screening of large collections of germplasm using effective high-throughput phenotyping technologies, including imaging technology. Trait phenotyping conducted in controlled environments (CE) can currently deal with only a small number of genotypes because of space limitations and the high costs involved. However, efforts have been made to reduce phenotyping costs when screening for sophisticated physiological and complex quantitative traits across large germplasm sets in CEs (https://www.biocold.com/controlled-environment-rooms.html). Multi-tiered selection screens use an initial screen (less accurate or sophisticated) that enables thousands of genotypes to be screened, followed by subsequent tiers of more precise screens of fewer genotypes. It is also possible to combine first-tier screening for less expensive traits in the field with second-tier screening for specific physiological traits in a CE.
For example, it is possible to use thermal infrared detectors in the field to measure plant canopy temperature, which varies in response to several factors, including stomatal closure. This first screening can be used to identify candidates for a second, more detailed screening for stomatal response in a CE. Traits amenable to first-tier field screening include plant height, canopy width, total leaf area, leaf number, and canopy shape, which are readily measured in high-throughput phenotyping systems and translate to indicators of early plant vigor and leaf area development. Multi-tier screening has already been used to facilitate high-throughput phenotyping of more sophisticated traits in CE to identify candidate parental genotypes for soybean.

Reverse phenomics utilizes a suite of new tools applied to a limited set of germplasm to elucidate common strategies responsible for stress tolerance or yield potential. It involves in-depth dissection of physiological traits down to the level of deducing the underlying biochemical or biophysical processes. This helps to identify traits, such as yield components, that underpin a superior crop variety through a hypothesis-driven (rather than a descriptive) screening approach. Thus, reverse phenomics can be used, for example, when the phenotype or target trait is already known and the goal is to determine the mechanism(s) responsible for controlling the trait and to identify the responsible gene(s). Reverse phenomics tools and technology are already being used by the Australian Plant Phenomics Facility (http://www.plantphenomics.org.au/) and the Australian National University (http://www.garnetcommunity.org.uk/resources/phenomics). Both forward and reverse phenomics approaches are being used at CIMMYT in wheat to develop advanced lines with complementary physiological traits for drought tolerance as well as yield potential and heat tolerance. In summary, reverse phenomics is used to better understand the trait collections that contribute to yield, while forward phenomics is used to identify better sources of those traits (Mir et al. 2019).
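The multi-tier screening logic described in this section can be sketched as a simple two-step filter. The Python example below is only an illustration: the genotype names and measurements are hypothetical, and it assumes that a cooler canopy is the favorable first-tier signal and that higher stomatal conductance is the favorable second-tier signal. The function name `tiered_screen` and its parameters are likewise made up for this sketch.

```python
def tiered_screen(tier1_scores, tier2_measure, keep_fraction, n_final):
    """Two-tier screen: a cheap measurement on all lines, then a precise one on the survivors.

    tier1_scores  : dict mapping genotype -> first-tier score (lower is better here)
    tier2_measure : function returning the second-tier value (higher is better here)
    """
    n_keep = max(1, round(len(tier1_scores) * keep_fraction))
    # Tier 1: keep the best fraction according to the inexpensive field trait.
    shortlist = sorted(tier1_scores, key=tier1_scores.get)[:n_keep]
    # Tier 2: rank only the shortlisted lines by the precise, costly trait.
    ranked = sorted(shortlist, key=tier2_measure, reverse=True)
    return shortlist, ranked[:n_final]

# Hypothetical canopy temperatures (deg C) from a field thermal-infrared survey.
canopy_temp = {"G1": 31.2, "G2": 28.4, "G3": 29.9, "G4": 27.8,
               "G5": 32.5, "G6": 28.9, "G7": 30.4, "G8": 29.1}
# Hypothetical stomatal conductance (mol m-2 s-1) from a controlled environment;
# in practice only the shortlisted lines would be measured.
conductance = {"G1": 0.21, "G2": 0.35, "G3": 0.30, "G4": 0.33,
               "G5": 0.18, "G6": 0.41, "G7": 0.25, "G8": 0.29}

shortlist, selected = tiered_screen(canopy_temp, conductance.get,
                                    keep_fraction=0.5, n_final=2)
print(shortlist)  # ['G4', 'G2', 'G6', 'G8']
print(selected)   # ['G6', 'G2']
```

The point of the design is cost: the expensive second-tier measurement is taken on only half of the lines in this toy example, and on a far smaller fraction when thousands of genotypes enter the first tier.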

6.3 Advances in Phenomics

Outstanding advances in “next-generation” DNA sequencing are rapidly reducing the costs of genotyping (see Sect. 11.9). Plant phenotyping, on the other hand, has improved only slowly over the past 30 years, and obtaining sufficient, relevant phenotypic data on a single-plot or plant-by-plant basis remains problematic. Complex traits such as abiotic stress tolerance and yield potential, which have particular relevance for crop improvement and, ultimately, commercial production, remain especially difficult for researchers to phenotype. Further, dissecting complex traits requires an examination of thousands of lines. Practical application through genomic selection or genome-wide association studies will similarly involve phenotyping thousands of genetically distinct lines (reference or association populations; see Chap. 9) grown in replication across multiple environments in order to assess differential expression of multiple genes (i.e., detection of genotype-by-environment interactions). Recognition of the limits of current approaches in phenomics has stimulated interest in high-throughput phenotyping methods that can be used to characterize large numbers of lines or individual plants accurately and that require a fraction of the time, cost, and labor of current techniques (White et al. 2012). Much of the discussion of phenotyping systems has focused on intensive measurement of individual plants using platforms that combine robotics and image analysis with controlled-environment systems.


Although these systems are valuable for certain targeted applications, the use of greenhouses and controlled environments to represent field environments has well-known limitations. Limited greenhouse space or chamber volumes often preclude allowing plants to flower and set seed, making it impossible to assess the effects of stresses during reproductive growth. The soil volume provided for plants in controlled environments is usually far less than that available to plants in the field, affecting nutrient and water regimes and altering normal patterns of growth and development. Enclosed aerial environments are also problematic for characterizing responses relevant to field situations. In greenhouses and chambers, solar radiation, wind speed, and evaporation rates are typically lower than under open-air conditions. Mechanical vibration can induce physiological artifacts in plant growth. Not surprisingly, researchers focusing on demonstrable, field-level improvements in yield potential or abiotic stress tolerance favor field-based phenotyping. Field-based phenotyping (FBP) is increasingly recognized as the only approach capable of delivering the requisite throughput in terms of numbers of plants or populations, as well as an accurate description of trait expression in real-world cropping systems. However, to date, most field-based phenotyping systems have focused on rapid assessment of individual suites of traits such as vegetation indices or root morphology. Through the use of vehicles carrying multiple sets of sensors, an FBP platform can transform the characterization of plant populations for genetic research and crop improvement. The FBP requirements for maize (Zea mays L.) illustrate the point (White et al. 2012 and references therein). The maize nested association mapping (NAM) population consists of 25 biparental crosses, each represented by 200 lines, giving a total of 5000 lines.
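The scale of such an experiment follows from simple arithmetic. The short Python sketch below works through it using the replicate and treatment counts, plot dimensions, walking speed, and per-plot stop time given for this maize example (White et al. 2012); the function name `fbp_workload` and its parameter names are made up for illustration.

```python
def fbp_workload(lines, replicates, treatments, plot_length_m, plot_width_m,
                 walking_speed_km_h, stop_per_plot_s):
    """Plot count, field size, and labor time for a single-row plot layout."""
    plots = lines * replicates * treatments
    row_length_km = plots * plot_length_m / 1000
    area_ha = plots * plot_length_m * plot_width_m / 10_000   # 1 ha = 10,000 m^2
    walking_h = row_length_km / walking_speed_km_h            # continuous walking
    stopping_h = plots * stop_per_plot_s / 3600               # extra time for stops
    return {"plots": plots, "row_length_km": row_length_km, "area_ha": area_ha,
            "walking_h": walking_h, "stopping_h": stopping_h}

# Maize NAM example: 25 crosses x 200 lines, 2 replicates, 2 treatments,
# 1 m x 4 m plots, walking at 3 km/h, and 30 s spent at each plot.
result = fbp_workload(lines=25 * 200, replicates=2, treatments=2,
                      plot_length_m=4.0, plot_width_m=1.0,
                      walking_speed_km_h=3.0, stop_per_plot_s=30.0)
for key, value in result.items():
    print(key, round(value, 1))
```

Under these inputs the sketch gives 20,000 plots, 80 km of row, 8 ha, about 27 h of walking, and about 167 h of stopping time, the last being marginally above the rounded figure of 165 h quoted for this example.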
Specialized experimental designs combined with spatial analysis permit 2 replicates, thus requiring 10,000 plots for a single treatment (e.g., well-watered or water-limited). Adding just 1 additional treatment doubles the count to 20,000 plots. Using single-row, 1-m-wide by 4-m-long plots and ignoring the need for walkways or borders, the net row length would be 80 km (roughly 50 miles), occupying 8 ha (20 acres). A person walking 3 km h−1 would need about 27 h to visually score traits, assuming no stopping. Halting at each plot for 30 s (e.g., to measure leaf conductance or chlorophyll concentration) would require an additional 165 h. So without even considering direct applications in crop improvement, the need for high throughput is apparent. An FBP platform requires six components:
1. Instruments for acquiring raw data from field plots
2. Physical systems for integrating different instruments, including providing power, data logging or transmission, partial or complete shading, and protection from dust, vibration, and adverse weather
3. Vehicles for positioning the instruments rapidly and accurately in a field
4. High-throughput analytic capabilities to complement field measurements (e.g., of leaf or seed samples)
5. Software systems for managing and analyzing potentially large and complex datasets
6. Integrated management protocols to maximize the reliability and efficiency of the phenotyping

However, it should be noted that the use of field-based systems does not exclude complementary phenotyping in controlled environments or rapid screening for specific traits such as shoot or root architecture. Reliable techniques for screening large numbers of plants for root traits are still being developed, including advances in aeroponic, hydroponic, and agar plate systems. Coupled with digital cameras and image analysis software, these systems permit the rapid measurement of root numbers, length, and diameter in moderate (typically 20 cM). Thus, almost all of the studies have reported the first scenario, that of QTLs detected for the first time, in which MAS was not expected to be of benefit compared with other field crops. However, marker-based tools have already been used in the management of breeding populations. These tools essentially use molecular marker polymorphisms and not their positions. One of these uses is fingerprinting: markers are used to assess or control the genetic identity of individuals. This is of great interest for checking controlled crosses in an experimental design, for avoiding mislabelling when handling large numbers of genotypes, and for monitoring the deployment of improved material. Another use of marker polymorphism is paternity and maternity analysis.

12.4 MAS in Developing Countries
Though there are successful examples of MAS in developed countries, the transfer and application of new plant biotechnologies to developing countries are recognized as a big challenge, and solutions can be found only through innovative partnerships and collaborations with advanced laboratories. Molecular breeding for polygenic traits has been successfully deployed in the multinational private sector, and several experts in the field see molecular plant breeding as the foundation for twenty-first-century crop improvement. Although the number of success stories is increasing, it is fair to say that, in today’s reality, MAS application for complex traits in breeding programs remains primarily limited to the private sector and is barely used in developing countries. The reasons for this situation in developing countries are a shortage of well-trained personnel, inadequate access to high-throughput genotyping, inappropriate phenotyping infrastructure, unaffordable information systems and analysis tools, and the logistical difficulty of integrating new approaches with traditional breeding methodologies, including problems when scaling up from small to large breeding programs. Therefore, except for leading emerging economies, the capacity to conduct intensive research in plant biology and to support plant breeding remains rather limited in developing countries, and in some cases it has even decreased over the last decade. For example, although there has been a strong focus on agricultural development in Africa in recent years, many of the African breeding institutes, especially those in sub-Saharan Africa, remain dependent on international support for agricultural research. These needier institutes tend to be in countries whose population has a high proportion of resource-poor people; thus, building the capacities of breeding programs and seed systems in those countries is vital to achieving any improvement in the ability of poor farmers to grow improved varieties. In order to realize the full potential of marker technologies and bioinformatics in plant breeding, tools for molecular characterization, accurate phenotyping, efficient information systems, and effective data analysis must be integrated with breeding workflows that manage pedigree, phenotypic, genotypic, and adaptation data.


With all the progress achieved in marker technology, software development, analytical pipelines, and data management systems, it is time to provide an information system, available through a public platform, that will offer breeding programs in developed and developing countries access to modern breeding technologies in an integrated and configurable way to boost crop quality and productivity. Several constraints in developing countries hamper the application of MAS. Some relate to access to information and publications. Others relate to data collection, management, and storage, such as the availability of systems for reliable sample and data tracking. Very important are the scientific and technical concerns involved in adequate experimental design, precise and reliable trait phenotyping (i.e., dissection of complex traits), dependable marker validation, and advanced analytical methodologies and tools for accurate decision-making, among others. The main challenges hampering the potential of molecular breeding in developing countries encompass:
1. Human resources
2. Infrastructure capacity
3. Access to marker technologies
4. Availability of an efficient data management system

Human capacity for molecular breeding technologies in developing countries is an ongoing challenge, and limitations include substandard agriculture programs at universities; difficulties in keeping up to date with relevant developments, including failures by others; poor technical skills in core disciplines; isolation as a result of insufficient peer critical mass in the workplace; and poor incentives to attract and retain scientists, resulting in brain drain and staff turnover. Fortunately, with the establishment of marker service laboratories, there has been a clear change in mentality: breeders need to be trained in how to analyze the data, not in how to run marker genotyping, and there is general acceptance that large-scale genotyping activities are best outsourced, while nobody questions the need for basic local laboratories. For breeders to efficiently access relevant information generated by themselves and by other researchers, reliable data management (including sample tracking, data collection and storage, and modern analytical methodologies and tools for accurate decision-making, among others) is critical both within a given molecular breeding program and across programs. In view of this, it is essential that breeders manage pedigree, phenotypic, and genotypic information through common or mutually compatible crop information systems. However, amidst the challenges there are also actual and potential opportunities. Several of the constraints listed above, in particular access to marker technologies and limited data management systems, can be overcome through the establishment of crosscutting technology and service platforms, and several international initiatives are supporting the development of such platforms in close collaboration with partners from developing countries.


To partially offset the undesirable trend of losing the “champions,” novel international initiatives such as the Alliance for a Green Revolution in Africa (AGRA) support high-quality education in the South, and although there is still a long way to go, governmental and institutional commitment is increasing for the adoption of biotechnologies in developing countries (Delannay et al. 2012).

12.5 Community Efforts in Developing Countries and Their Implications in MAS
The recent emergence of affordable large-scale marker technologies (e.g., Diversity Arrays Technology (DArT), SNPs), the sharp decline of sequencing costs boosting marker development based on sequence information, and the explicit efforts of national agricultural research programs (e.g., in India) and international initiatives such as the Generation Challenge Program (GCP) have all resulted in a large increase in the number of genomic resources available for less-studied crops. As a result, most key crops in developing countries now have adequate genomic resources for meaningful genetic studies and most MAS applications. In more recent times, the capacity of the national breeding institutes, in terms of their financial resources, infrastructure, and expertise, has evolved in a somewhat country-specific manner, reflecting the health of their domestic economies. Thus, capacity has degraded in some countries, while in others there have been major improvements, as evidenced by a change from requiring training and support from large international programs to becoming mutual partners in agricultural research. This is reflected in the sharp differences in capacity to conduct and apply biotechnological research in developing countries. Interestingly, newly industrialized countries, such as Brazil, China, India, Mexico, South Africa, and Thailand, substantially invest in technology and research and development (R&D) and are self-reliant in most aspects of marker technologies. These countries have the concomitant potential to effectively adopt, adapt, and apply information and communication technologies to enhance research efficiency and outputs. They are therefore naturally at the frontline in adopting molecular breeding technologies.
These institutes are beginning to communicate with one another, as illustrated by the 2006 agreement between Brazil, China, and India to collaborate in the area of agriculture, including the exchange of genetic resources and joint efforts in plant biology and breeding. On the other hand, mid-level developing world economies such as Colombia, Indonesia, Kenya, Morocco, Uruguay, and Vietnam are well aware of MAS’s importance, and some effectively apply marker technologies for germplasm characterization and selection of major genes. These countries have a matching potential for a limited utilization of molecular breeding platforms, a potential that can be enhanced fairly rapidly in the medium to long term. In contrast, low-level developing world economies are struggling to sustain even basic conventional breeding.


They have very limited or no approaches to application of molecular breeding and are unlikely to adopt molecular breeding platforms except in the long term. Due to its ability to generate quickly and cost-effectively precise trait linkage information for specific regions of the genome, MAS is expected to improve the efficiency of crop breeding to progressively increase genetic gains by selecting and stacking with markers’ favorable alleles at target loci. Comparing the cost-effectiveness of MAS with phenotyping selection is not straightforward. Firstly, interlinked factors other than cost, such as trade-offs between time and money, are likely to play an important role in determining the choice of screening method. Secondly, the choice between MAS and conventional selection may be complicated by the fact that the two are rarely direct substitutes for one another or mutually exclusive, and in fact they are quite complementary under most breeding schemes. Where operating capital is not a limitation, MAS maximizes the net present value, and with the decrease in marker data point cost and increased access to marker service laboratories, marker-assisted breeding operating costs are shrinking, making this approach increasingly attractive from an economic perspective. Few economic analyses have been undertaken to assess the potential impacts of MAS. A famous example is definitely the impact of the submergence gene for rice in Asia. Among the few analyses available is an evaluation of the economic benefits of MAS to develop rice varieties with tolerance to salinity and P deficiency in Bangladesh, India, Indonesia, and the Philippines, since DNA molecular markers for these traits are available (see Chap. 10). 
Encompassing a broad set of economic parameters, the study estimated that MAS saves at least 2–3 years, resulting in significant incremental benefits in the range of USD 300–800 million, depending on the country, abiotic stress, and lag for conventional breeding. Another study estimates the benefits of using marker-assisted breeding, as compared with conventional breeding alone, in developing cassava varieties resistant to cassava mosaic disease, green mite, whitefly, and post-harvest physiological deterioration in Nigeria, Ghana, and Uganda. Marker-assisted breeding is estimated to save at least 4 years in the breeding cycle for varieties resistant to the pests and to result in incremental net benefits over 25 years in the range of USD 34–800 million, depending on the country, the particular constraint, and various assumptions.

The key technical constraint to the efficient management of crop information across the layers of implementation is a lack of standardization and consistency. At the crop level, the most important key to data integration is a community-accepted trait dictionary and ontology of traits of interest for each crop, together with a set of effective protocols for their evaluation, including scales or units of measurement and data quality standards. Developing, maintaining, and supporting integrated breeding informatics applications are also critical. This would include the design of databases to manage crop information from any crop and the development of user applications to facilitate breeding processes. These would need to be configured to the best practices for each crop to provide common functionality under different community efforts.
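As a rough illustration of what one entry in a community-accepted trait dictionary might hold, the following Python sketch stores a trait's evaluation method, unit, and accepted range, and then uses that range as a simple data quality check. The trait identifiers, protocols, and numeric bounds are hypothetical placeholders chosen for the example; they are not taken from any published crop ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraitDefinition:
    """One trait dictionary entry (all field names are illustrative)."""
    trait_id: str    # hypothetical short identifier
    name: str        # human-readable trait name
    method: str      # evaluation protocol agreed by the community
    unit: str        # unit or scale of measurement
    minimum: float   # lowest value accepted as plausible
    maximum: float   # highest value accepted as plausible


# Hypothetical entries; the bounds are assumptions, not published standards.
TRAITS = {
    "PH": TraitDefinition("PH", "Plant height",
                          "Soil surface to tip of tallest panicle at maturity",
                          "cm", 20.0, 250.0),
    "DTF": TraitDefinition("DTF", "Days to 50% flowering",
                           "Days from sowing until half of the plot flowers",
                           "days", 30.0, 200.0),
}


def is_valid(trait: TraitDefinition, value: float) -> bool:
    """Return True if the observation lies inside the accepted range."""
    return trait.minimum <= value <= trait.maximum


def screen_observations(observations):
    """Split (trait_id, value) records into accepted and rejected lists.

    Records with an unknown trait_id are rejected, because they cannot be
    interpreted without a dictionary entry.
    """
    accepted, rejected = [], []
    for trait_id, value in observations:
        trait = TRAITS.get(trait_id)
        if trait is not None and is_valid(trait, value):
            accepted.append((trait_id, value))
        else:
            rejected.append((trait_id, value))
    return accepted, rejected


if __name__ == "__main__":
    ok, bad = screen_observations([("PH", 110.0), ("PH", 1100.0), ("XX", 5.0)])
    print("accepted:", ok)
    print("rejected:", bad)
```

Keeping the protocol, unit, and range next to the trait name means that data collected by different groups can be checked against the same definition before they are merged.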

12.6 Field and Laboratory Infrastructure Improvement

Reliable phenotypic data are a must for high-quality genetic studies, and most developing countries lack suitable field infrastructure for proper trials and collection of accurate phenotypic data. Guidelines on best practice must be provided on how to design and run a trial and conduct precise phenotyping for genetic studies under different target environments. Improving access to homogeneous field areas and paying attention to good soil preparation and homogeneous sowing are critical. One of the challenges in conducting agronomic research in developing countries is that research stations are often underfunded and understaffed and do not have the resources necessary to establish and maintain the field environments appropriate for quality phenotyping. Even with the availability of the best genotyping resources, integrated molecular breeding programs will be doomed to failure in the absence of quality phenotypic data to support the proper identification of the main QTLs affecting key target traits.

Until a few years ago, the major investment required to establish large-scale marker technology was considered a large impediment to the application of molecular breeding in developing countries. Limited capacity to generate genotyping data has been one of the main stumbling blocks preventing wide utilization of markers in these countries. The use of molecular markers relies on the availability of high-quality laboratories able to perform the necessary molecular biology operations. For simple sequence repeat (SSR) markers, these operations include at a minimum high-quality DNA extraction, polymerase chain reaction (PCR) amplification, gel electrophoresis, and gel scoring. Performing those operations requires well-trained technicians and well-equipped laboratories with a stable electricity supply, a reliable supply of clean water, room temperature and humidity control, and the scientific equipment necessary to perform those tasks. Refrigerators and freezers (regular freezers and −80 °C freezers) also need to operate without interruption to store temperature-sensitive reagents, primers, and DNA samples. Automatically triggered power generators need to be installed when a reliable electrical supply cannot be guaranteed.

A first attempt to resolve this issue has been for donor organizations to fund the construction of genotyping laboratories in various parts of the Third World. However, except for large, well-funded centers, this was often not successful because sustained resourcing was not available to hire qualified personnel and to purchase and maintain the necessary equipment and reagents. The logistics of reliably shipping perishable reagents to remote areas of the Third World are also often an obstacle. As a result, there are unfortunately a number of poorly equipped laboratories lying idle in some remote parts of Africa.

In spite of that, a few local centers, such as the National Root Crops Research Institute (NRCRI) in Umudike, Nigeria, have been successful in establishing low-throughput laboratories that can serve the basic genotyping needs of their breeders. An intermediate solution is to rely on regional hubs. Those hubs should be relatively well-funded and well-equipped laboratories that can handle primarily SSR genotyping for interested parties. Part of the IBP strategy is to rely on four hubs covering the needs of the Americas (Centro Internacional de Agricultura Tropical, CIAT, www.ciat.cgiar.org), Africa (BioSciences eastern and central Africa, BecA, http://hub.africabiosciences.org), South Asia (International Crops Research Institute for the Semi-Arid Tropics, ICRISAT, www.icrisat.org), and Southeast Asia (International Rice Research Institute, IRRI, www.irri.org). Those hubs will be able to meet basic genotyping needs and at the same time help train local scientists in the fundamentals of molecular breeding.

Full integration of molecular markers into breeding programs will require the availability of high-throughput and low-cost genotyping platforms primarily based on SNPs. SNPs are the only marker type that can meet the long-term needs of integrated molecular breeding so that it can be widely applied in a cost-effective manner. However, high-throughput SNP genotyping requires the use of highly automated laboratories using an array of sophisticated equipment (pipetting robots, high-density PCR, high-throughput SNP detection machines, high-level informatics). Although large private seed companies have had the need and the resources to put in place large-scale genotyping laboratories for their own uses, smaller programs, especially in the public sector, have typically not had the resources or the justification to establish and maintain such large operations to meet their increasing needs for SNP genotyping data.
In response to this need, a few private marker service laboratories have sprung up over the past few years. Those laboratories can provide complete genotyping services for their customers, from DNA extraction to generation of large numbers of SNP or other data points. Due to their broad customer base (from medical research laboratories to animal and plant breeding operations, both public and private), such laboratories can produce data points in volumes large enough to deliver high throughput at low cost to the customer. They are able to invest in the most advanced equipment to keep up with the constant evolution of genotyping technologies and are able to pass on the resulting benefits to their customers. Processes have now been put in place for rapid shipment of dried leaf samples from any location (field or laboratory) around the world without the phytosanitary and similar restrictions that can affect the shipment of seed or other viable tissues. Contract genotyping is also generally exempt from Material Transfer Agreements (MTAs) and other intellectual property requirements because the material being sent is not viable and will not be used for any purpose other than the generation of genotyping data for the exclusive benefit of the customer. Examples of such companies that can service breeding programs from around the world are DNA LandMarks, Inc. of Saint-Jean-sur-Richelieu, Quebec, Canada (http://www.dnalandmarks.ca/english), and KBiosciences Ltd. of Hoddesdon, UK (http://www.kbioscience.co.uk). This approach represents a very attractive solution for large-scale integration of markers into breeding programs in Third World countries, as it requires no heavy capital investment and completely removes the maintenance and equipment upgrade issues.

12.7 Genetic Mapping and MAS: Lessons Learned and Concluding Remarks

Marker-assisted selection that complements a regular conventional breeding program increases genetic gain per crop cycle, stacks favorable alleles at target loci, and reduces the number of selection cycles. In the last decade, the multinational private sector has benefitted immensely from MAS, which demonstrates its efficacy. In contrast, its adoption is still limited in the public sector, and it is hardly used in developing countries. Major bottlenecks in these countries include a shortage of well-trained personnel, inadequate high-throughput capacity, poor phenotyping infrastructure, a lack of information systems or adapted analysis tools, or simply resource-limited breeding programs. The emerging virtual platforms aided by the information and communication technology revolution will help to overcome some of these limitations by providing breeders with better access to genomic resources, advanced laboratory services, and robust analytical and data management tools. Apart from some advanced national agricultural research systems, the implementation of large-scale molecular breeding programs in developing countries will take time. However, the exponential development of genomic resources, including for less-studied crops, the ever-decreasing cost of marker technologies, and the emergence of platforms for accessing MAS tools and support services, plus the increasing public–private partnerships and needs-driven demand for improved varieties to counter the global food crisis, are all grounds to predict that MAS will have a significant impact on crop breeding in developing countries. These predictions are supported by some preliminary successful examples presented in Chap. 11.
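The stacking of favorable alleles at target loci mentioned above can be illustrated with a small sketch. It assumes diploid marker calls written as two-letter strings and one known favorable allele per locus. The marker names, alleles, and plant identifiers are invented for the example, and the ranking rule is only one of several a breeder might reasonably use.

```python
# Favorable allele at each target locus (hypothetical markers and alleles).
FAVORABLE = {"M1": "A", "M2": "G", "M3": "T"}

# Hypothetical diploid calls for four plants at the three target loci.
POPULATION = {
    "P1": {"M1": "AA", "M2": "GG", "M3": "TT"},
    "P2": {"M1": "AC", "M2": "GG", "M3": "TT"},
    "P3": {"M1": "CC", "M2": "GC", "M3": "TC"},
    "P4": {"M1": "AA", "M2": "GG", "M3": "CT"},
}


def count_favorable(genotype):
    """Count favorable allele copies (0 to 2 per locus) over all target loci."""
    return sum(genotype[locus].count(allele)
               for locus, allele in FAVORABLE.items())


def is_fully_stacked(genotype):
    """True if the plant is homozygous for the favorable allele at every locus."""
    return all(genotype[locus] == allele * 2
               for locus, allele in FAVORABLE.items())


def select_top(population, top_n):
    """Rank plants by favorable allele count; ties are broken by plant name."""
    ranked = sorted(population,
                    key=lambda name: (-count_favorable(population[name]), name))
    return ranked[:top_n]


if __name__ == "__main__":
    print(select_top(POPULATION, 2))                       # ['P1', 'P2']
    print([p for p, g in POPULATION.items() if is_fully_stacked(g)])  # ['P1']
```

In practice, plants such as P2 and P4, which are heterozygous at a single locus, would typically be advanced or crossed further so that the remaining favorable allele can be fixed in a later generation.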
Advances in genomics research are generating new tools, such as functional molecular markers and informatics, as well as new knowledge about statistics and inheritance phenomena that could increase the efficiency and precision of crop improvement. In particular, the elucidation of the fundamental mechanisms of heterosis and epigenetics, and their manipulation, has great potential. Eventually, knowledge of the relative values of alleles at all loci segregating in a population could allow the breeder to design a genotype in silico and to practice whole genome selection for minor crops in developing countries.

Considerable progress has been made in building infrastructure for applying genomics approaches. This includes one-dimensional genetic information (genome sequences), many ESTs, and gene knockout populations in several plant species of biological and agronomic importance. New knowledge and new tools are changing the strategies used in crop plant research and will thus reduce the costs and increase the throughput of the assays. There is a continuing need to integrate disciplines such as structural genomics, transcriptomics, proteomics, and metabolomics with plant physiology and plant breeding. Bioinformatics is providing the means for integration and structured interrogation of datasets that will facilitate the cross-fertilization of disciplines.

Genomics research has successfully unravelled various metabolic pathways and provided molecular markers for agronomic traits. However, the mechanisms of epigenetic phenomena are only beginning to be understood, and their potential role in crop improvement is unknown. Similarly, tantalizing bits of information concerning the possible basis of heterosis are gradually emerging. Eventual elucidation of the mechanism of heterosis might be one of the most important contributions of molecular genetics research to crop improvement.

Ultimately, the goal of the breeder will be to assay the genetic makeup of individual plants rapidly and to select desirable genotypes in breeding populations. The construction of “graphical genotypes” of each plant or progeny row would allow the breeder to determine which chromosome sections are inherited from each parent to facilitate the selection process and perhaps to reduce the need for extensive field tests. A logical extension of whole genome selection for the breeder would be to design the superior genotypes in silico, an approach described as “breeding by design.” Thus, in the post-genomics era, high-throughput approaches combined with automation, increasing amounts of sequence data in the public domain, and enhanced bioinformatics techniques will contribute to genomics research for crop improvement. However, the costs of applying genomics strategies and tools are often more than is available in commercial or public breeding programs, particularly for crops that are only of regional importance.
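The idea of a graphical genotype can be made concrete with a short sketch. It assumes an ordered set of markers on one chromosome, each polymorphic between two parents (A and B), and diploid calls for a single progeny plant. The marker names, map positions, and alleles are hypothetical. The code simply labels each marker by parental origin and merges runs of identical labels into segments; it is not a reconstruction of any specific software package.

```python
# Ordered markers on one chromosome:
# (name, position in cM, allele in parent A, allele in parent B).
# All values are hypothetical.
MARKERS = [
    ("m1", 0.0, "A", "G"),
    ("m2", 10.0, "C", "T"),
    ("m3", 25.0, "A", "C"),
    ("m4", 40.0, "G", "T"),
    ("m5", 55.0, "T", "C"),
]

# Hypothetical diploid calls for one progeny plant.
CALLS = {"m1": "AA", "m2": "CC", "m3": "CC", "m4": "TT", "m5": "TC"}


def parental_origin(call, allele_a, allele_b):
    """Label a call 'A', 'B', 'H' (heterozygous) or '-' (missing or unexpected)."""
    if call == allele_a * 2:
        return "A"
    if call == allele_b * 2:
        return "B"
    if sorted(call) == sorted(allele_a + allele_b):
        return "H"
    return "-"


def graphical_genotype(markers, calls):
    """Merge consecutive markers of the same origin into segments.

    Returns (origin, first_marker_cM, last_marker_cM) tuples. The true
    crossover point lies somewhere in the unobserved interval between two
    adjacent segments.
    """
    segments = []
    for name, position, allele_a, allele_b in markers:
        origin = parental_origin(calls.get(name, ""), allele_a, allele_b)
        if segments and segments[-1][0] == origin:
            segments[-1] = (origin, segments[-1][1], position)
        else:
            segments.append((origin, position, position))
    return segments


def as_string(markers, calls):
    """One character per marker, a compact text form of the graphical genotype."""
    return "".join(parental_origin(calls.get(name, ""), a, b)
                   for name, _, a, b in markers)


if __name__ == "__main__":
    print(as_string(MARKERS, CALLS))           # AABBH
    for segment in graphical_genotype(MARKERS, CALLS):
        print(segment)
```

With denser markers the segments approximate the true recombination blocks more closely, which is what makes such a display useful for choosing plants that carry a target region from one parent and little else.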
Newly developed genetic and genomics tools will enhance, but not replace, the conventional breeding and evaluation process. The ultimate test of the value of a genotype, generated through either conventional breeding or MAS, is its performance in the target environment and acceptance by farmers and consumers.

Critical Thinking Questions

1. Why is it imperative to develop MAS strategies in underutilized and unexplored crops?
2. Advances in MAS have huge applications in vegetable crop improvement. Explain this with examples.
3. How can MAS enhance the efficiency of tree breeding?
4. Community effort or contract work in genotyping and phenotyping data analysis will be the order of the day in developing countries. Justify.
5. What minimum field and laboratory infrastructure is required for MAS?
