Underutilised Crop Genomes 9783031008481, 3031008480

This book highlights the uses for underutilized crops, presenting the state-of-the-art in terms of genome sequencing for


English Pages 460 Year 2022


Table of contents:
Preface to the Series
Preface
Contents
Contributors
1 The Broomcorn Millet Genome
Abstract
1.1 Introduction
1.2 Taxonomy of Broomcorn Millet
1.2.1 Botanical Characteristics
1.2.2 Geographic Distribution
1.2.3 Conserved Germplasms of Broomcorn Millet
1.2.4 The Underutilized Status of Broomcorn Millet
1.2.5 Qualities and Values of Broomcorn Millet
1.3 Genome Sequencing
1.3.1 Genome Assembly
1.3.2 Genetic Linkage Map
1.3.3 Genome Annotation
1.3.4 Evolutionary History of Broomcorn Millet
1.3.5 Comparative Genomics with Other Crops
1.3.6 Genes Involved in C4 Photosynthesis
1.4 Future Goals and Prospects
Acknowledgements
References
2 Buckwheat Genome and Genomics
Abstract
2.1 Introduction
2.2 Buckwheat Genome
2.3 Buckwheat Genomics
2.4 Key Traits and Gene Function
2.5 Conclusion and Perspective
Acknowledgements
References
3 Tef [Eragrostis tef (Zucc.) Trotter]
Abstract
3.1 Crop Background
3.1.1 Botanical Description
3.1.2 Geographical Distribution
3.1.3 Accessible in Seed Banks
3.1.4 Why It Is Underutilized?
3.1.5 Benefits of Tef
3.2 Genome Sequencing
3.2.1 First Sequencing: Tef Improvement Project 2010
3.2.2 Chromosome-Level Sequencing: Tef Sequencing Consortium
3.2.3 Repetitive Element Content
3.2.4 Resequencing
3.2.5 Candidate Genes for Agronomic Traits
3.3 Conclusions
References
4 The Apricot Genome
Abstract
4.1 Introduction
4.1.1 Botanical Description
4.1.1.1 The Main Species of Apricot
4.1.1.2 Eco-geographical Groups of Common Apricot
4.2 Origin and Distribution
4.2.1 Geographic Distribution
4.2.1.1 The Seed Banks of Apricot
4.2.2 Economic and Ecological Value of Apricot
4.2.2.1 Nutritional Value
4.2.2.2 Ecological Value
4.3 Genome Sequencing
4.3.1 Strategy
4.3.1.1 Plant Materials
4.3.1.2 Sequencing Strategy
4.3.2 The Apricot Genome Assembly
4.3.3 Resequencing
4.3.4 Comparison to Other Crops
4.3.5 Gene Discovery
4.3.6 Candidate Genes for Agronomic Traits
4.3.6.1 Accumulation of β-Carotene in Apricot
4.3.6.2 Amygdalin Metabolism and the Sweet/Bitterness Kernel Forming
4.3.6.3 Plum Pox Virus (PPV)
4.4 Future Goals and Prospects
4.4.1 The Goal of Apricot Genomics Research
4.4.2 Prospects and Implications of Apricot Genomics
References
5 Chinese Jujube: Crop Background and Genome Sequencing
Abstract
5.1 Crop Background
5.1.1 Introduction
5.2 Botanical Description
5.2.1 Taxonomy
5.2.2 Geographic Distribution
5.2.3 Morphology
5.3 Nutrient, Utilization, and Propagation
5.4 Research Challenges and Opportunities
5.5 Genome Sequencing
5.6 Strategy for Jujube Genome Assembly and Annotation
5.7 Features of Jujube Genome
5.8 Comparison to Other Crops in Evolution
5.9 Candidate Genes for Agronomic Traits
5.9.1 Genes Related to Vitamin C Accumulation in Fruit
5.9.2 Genes Related to Sugar Accumulation in Fruit
5.9.3 Self-shoot-pruning Trait Related Genes
5.9.4 Abiotic/Biotic Stress-Related Genes
5.9.5 Genes Related to Flower Development
5.9.6 S-Locus Genes in Jujube
5.9.7 Gene Family-Related Research Based on the ‘Dongzao’ Genome
5.10 Resequencing in Jujube Research
5.11 Transcriptome-Related Research
5.12 Future Goals and Prospects
References
6 The Longan (Dimocarpus longan) Genome
Abstract
6.1 Background
6.2 Genome Sequencing
6.2.1 Strategy
6.2.2 Results-Genome Statistics
6.2.3 Resequencing
6.3 RNA Sequencing
6.3.1 Whole Transcriptome Sequencing
6.3.2 Single-Cell RNA Sequencing
6.3.3 microRNAs
6.3.4 Long Noncoding RNAs
6.3.5 Circular RNAs
6.4 DNA Methylation Sequencing
6.5 Proteomics
6.6 Genetic Transformation
6.7 Future Prospects
References
7 The Mangosteen Genome
Abstract
7.1 The Genus Garcinia L.
7.2 Botanical Description of Mangosteen
7.2.1 Mangosteen as an Apomictic Species
7.2.2 Genetic Variation of Mangosteen
7.3 Origin and Distribution of Mangosteen
7.3.1 Origin of Mangosteen
7.3.2 Closely Related Species of Mangosteen
7.3.2.1 G. celebica L. (Syn. G. hombroniana Pierre)
7.3.2.2 G. malaccensis Hook. F.
7.3.2.3 G. penangiana Pierre
7.3.2.4 G. opaca King
7.3.3 Geographic Distribution of Mangosteen
7.3.4 Mangosteen Export from Malaysia
7.4 Conservation of Mangosteen Germplasm
7.5 Why Mangosteen is Underutilised?
7.6 Benefits of Mangosteen
7.7 Genomics Study of Mangosteen
7.7.1 Genome Sequencing of Mangosteen
7.7.2 Genome Size of Mangosteen
7.7.3 Cytogenetics of Mangosteen
7.8 Conclusion
Acknowledgements
References
8 The Passion Fruit Genome
Abstract
8.1 Introduction
8.1.1 Genetic Studies and Breeding Efforts
8.2 Sequencing and Assembly of Passiflora Genomes
8.2.1 Transposable Element Detection in Passiflora Genomes
8.2.2 Passiflora Cytogenomics
8.2.3 Functional Annotation of Passiflora Genomes
8.2.4 The Passiflora MADS-Box Gene Family and Phase Change Transitions
8.2.5 Passiflora Organellar Genomes
8.3 Conclusion
References
9 The Soursop Genome (Annona muricata L., Annonaceae)
Abstract
9.1 Introduction
9.1.1 Early Angiosperm Genome Evolution
9.1.2 The Custard Apple Family (Annonaceae) and Pomology
9.2 Research Scope and Methodological Approach
9.2.1 Genomic DNA Extraction, Illumina Sequencing, and Genome Size Estimation
9.2.2 Library Preparation and Sequencing for PacBio, 10X Genomics and BioNano
9.2.3 De novo Genome Assembly, 10X and Optical Scaffolding
9.2.4 Hi-C Scaffolding
9.2.5 Repeat Sequence Detection
9.2.6 RNA Sequencing and Transcriptome Assembly
9.2.7 Annotation
9.2.8 Positive Selection
9.2.9 Detecting Historical Changes in Population Size
9.2.10 Hybridization Capture Data Mapping
9.2.11 Coalescent Phylogenomics
9.2.12 Gene Family Expansion
9.2.13 Identification of Whole Genome Duplication Events
9.2.14 Evolutionary Incongruence in Early Angiosperms
9.3 Genome Perspectives for Pomological and Early Angiosperm Research
9.3.1 A High-Quality Genome of the Soursop
9.3.2 Repeat Sequences in the Soursop Genome
9.3.3 Genes Involved in Soursop Defence and Disease Resistance
9.3.4 Historical Fluctuations in Population Size of Annona muricata
9.3.5 Mapping of Annona Genes from Hybridization Capture Analyses
9.3.6 Coalescent Phylogenomics in Annonaceae and Early Angiosperms
9.3.7 Gene Family Expansion in Annona muricata
9.3.8 Evolutionary Incongruence and WGD during Early Angiosperm Divergence
9.4 Future Goals and Prospects
Acknowledgements
References
10 Underutilised Fruit Tree Genomes from Indonesia
Abstract
10.1 Overview
10.2 Underutilised Fruits
10.2.1 Menteng; Baccaurea motleyana Müll.Arg.
10.2.2 Nangkadak; Artocarpus heterophyllus x Artocarpus integer
10.2.3 Rambutan; Nephelium lappaceum L.
10.2.4 Sidempuan Snake Fruit; Salacca sumatrana Becc.
10.2.5 Gandaria; Bouea macrophylla Griffith
10.2.6 Lobi-Lobi; Flacourtia inermis (Burm. f.) Merr.
10.2.7 Duku; Lansium domesticum (Lansium parasiticum (Osbeck) Sahni and Bennet)
10.2.8 Matoa; Pometia pinnata J.R.Forst. and G.Forst.
10.2.9 Kedondong; Spondias dulcis L.
10.2.10 Jambu Air or Wax Apple; Syzygium samarangense (Blume) Merr. and L.M.Perry
10.2.11 Sentul or Kecapi; Sandoricum koetjape (Burm.f.) Merr
10.2.12 Kasturi or Kalimantan Mango; Mangifera casturi Kosterm
10.2.13 Durian Kura-Kura; Durio testudinarius Becc
10.3 Conclusions
References
11 The Bambara Groundnut Genome
Abstract
11.1 Introduction
11.1.1 Botanical Description and General Ecology
11.1.2 Geographical Distribution
11.1.3 Genetic Resources, Accessibility to/from Seed Banks
11.1.4 Bambara Groundnut—An Important but Underutilised Crop
11.1.5 Nutritional Composition
11.1.6 Underutilisation of Bambara Groundnut
11.2 Molecular Tools and Their Application in Bambara Groundnut
11.2.1 Molecular Markers—Development and Applications
11.2.2 Microarrays
11.2.2.1 RNAseq in Bambara Groundnut
11.2.3 Bambara Groundnut Genome—Current Achievements
11.3 Structure and Nomenclature of Traits in Linkage to Genome
11.4 Future Goals and Prospects
11.5 Conclusion
References
12 Grasspea
Abstract
12.1 Overview
12.2 Practical Needs and Objectives
12.2.1 Constraints
12.2.2 Breeding Targets
12.2.3 Breeding Resources
12.3 Ongoing and Projected Research
12.3.1 Genetics
12.3.2 Genetic Maps
12.3.2.1 An F2 Map
12.3.2.2 Recombinant Inbred Mapping Populations
12.3.2.3 Relationship Between Genetic Maps of Related Legumes
12.3.3 Transcriptomic Studies
12.3.4 Genomics
12.4 Conclusion and Perspectives
Acknowledgements
References
13 The Lablab Genome: Recent Advances and Future Perspectives
Abstract
13.1 Introduction
13.2 Biology, Resources and Utilisation
13.2.1 Botanical Description
13.2.2 Geographical Distribution and General Ecology
13.2.3 Genetic Resources and Accessibility from Genebanks
13.2.4 Traits of Benefit and Reasons for Being Underutilised
13.2.5 Traditional and Improved Varieties
13.3 Molecular Tools and Their Application in Lablab
13.3.1 Molecular Markers—Development and Applications
13.3.2 Gene Expression in Lablab
13.4 A Lablab Reference Genome
13.5 Future Goals and Prospects
13.6 Conclusions
Acknowledgements
References
14 The Perennial Horse Gram (Macrotyloma axillare) Genome, Phylogeny, and Selection Across the Fabaceae
Abstract
14.1 Introduction
14.1.1 Identifying Genes Underlying Important Traits
14.1.2 Perennial Horse Gram and the Fabaceae (Legumes)
14.2 Materials and Methods
14.2.1 cpDNA Phylogeny of Macrotyloma
14.2.2 DNA Sequencing, De novo Genome Assembly and Annotation
14.2.3 Orthology Inference, Tests for Selection and GO Analysis
14.2.4 cpDNA Assembly and Annotation
14.2.5 Genetic Marker Development
14.3 Results and Discussion
14.3.1 cpDNA Phylogeny of Macrotyloma
14.3.2 DNA Sequencing, De novo Genome Assembly and Annotation
14.3.3 Orthology Inference, Tests for Selection and GO Analysis
14.3.4 cpDNA Assembly and Annotation
14.3.5 Genetic Marker Development
14.4 Future Goals and Prospects
References
15 Breeding and Genomics of Pigeonpea in the Post-NGS Era
Abstract
15.1 Background
15.1.1 Botanical Description
15.1.2 Origin and Geographical Distribution
15.1.3 Pigeonpea Genetic Resources
15.1.4 Benefits
15.2 Traditional Breeding and Cultivar Development in Pigeonpea
15.3 Advances in Pigeonpea Genomics
15.3.1 Construction of the Reference Genome Sequence
15.3.2 Whole-Genome Resequencing
15.3.3 Genome-Wide SNP Arrays
15.3.4 Genetic Linkage Maps
15.3.5 Transcriptomics and Gene Identification
15.3.6 Identification of QTL/Candidate Genes for Important Traits
15.3.7 Haplotype-Based Breeding
15.4 Rapid Generation Turnover
15.5 Conclusion and Prospects
References
16 Rice Bean—An Underutilized Food Crop Emerges as Cornucopia of Micronutrients Essential for Sustainable Food and Nutritional Security
Abstract
16.1 Introduction: A Way Forward to Nourish Future Generations
16.2 Origin and Domestication of Rice Bean
16.3 Genetic and Molecular Diversification of the Rice Bean Gene Pool
16.4 Rice Bean: Cornucopia of Potential
16.4.1 Nutritional Composition
16.4.2 Domestication of Rice Bean: Identification of Neoteric Genes Involved in Stress Resilience and Nutritional Composition
16.5 Issues Related to Rice Bean Commercial Utilization
16.6 Global Approaches for Rice Bean Germplasm Improvement
16.6.1 Conventional Approaches
16.6.2 Molecular Approaches
16.7 Conclusion
Acknowledgements
References
17 The Winged Bean Genome
Abstract
17.1 Introduction
17.2 Botanical Description, Origin and Domestication
17.2.1 Taxonomy, Plant Morphology and Reproductive Development
17.2.2 Origin, Distribution and Germplasm Collection
17.2.3 Genetic Diversity
17.3 Food and Nutritional Value
17.3.1 Protein
17.3.2 Amino Acid Composition
17.3.3 Minerals and Vitamins
17.3.4 Lipids
17.3.5 Antinutritional Factors (ANFs)
17.3.5.1 Proteinase (Trypsin and Chymotrypsin) Inhibitors
17.3.5.2 Phytohaemagglutinins or Lectins
17.3.5.3 Tannins
17.4 Barriers to the Greater Utilisation
17.5 Genome
17.5.1 Genome Sequencing
17.5.2 Transcriptome Assembly and Molecular Markers
17.6 Future Prospects
17.7 Conclusion
References
18 Castor Bean: Recent Progress in Understanding the Genome of This Underutilized Crop
Abstract
18.1 Introduction
18.1.1 Botanical Description
18.1.2 Morphological Features
18.1.3 Geographic Distribution
18.1.4 Accessions in Seedbanks
18.1.5 Usage
18.2 Characterization of Castor Bean Genome
18.2.1 Sequencing and Characterization
18.2.2 Castor Bean Genome Comparison to Other Crops
18.2.3 Gene Discovery in Castor Bean Using Orthologs from Other Species
18.2.4 Identification of Candidate Genes for Agronomic Traits
18.3 Future Goals and Prospects
References
19 Genome Resources for Ensete ventricosum (Enset) and Related Species
Abstract
19.1 Introduction
19.2 Botanical Description of Enset
19.3 Enset (Ensete ventricosum) as a Crop
19.4 Availability of Germplasm
19.5 Enset as an Underutilized Crop
19.6 Ethnopharmacology of Ensete Spp. Outside of Ethiopia
19.7 Genome Sequencing
19.7.1 Overview
19.7.2 Reference Genome Assemblies
19.7.3 Whole-Genome Resequencing
19.7.4 Sequencing of Reduced-Representation Libraries
19.7.5 The Ensete ventricosum Chloroplast Genome
19.8 Future Goals and Prospects
References
20 Yam Genomics
Abstract
20.1 Crop Background
20.1.1 Dioscorea Spp., Underutilised Crops
20.1.2 Botanical Description
20.1.3 Ploidy Level
20.1.4 Yam Geographical Distribution
20.1.5 Yam Genetic Resources Conservation
20.2 Development of Yam Genomic Resources
20.3 Comparative Genomics in Dioscorea
20.4 Future Goals and Prospects
References
21 The African Eggplant
Abstract
21.1 Background
21.1.1 Botanical Description
21.1.2 Geographic Distribution
21.1.3 Genetic Resources
21.2 S. aethiopicum Genetics and Genomics
21.2.1 Whole Genome Sequencing, Statistics, and Strategy
21.2.2 Genome Comparison with Other Crops
21.2.3 Disease Resistance Genes
21.2.4 Re-sequencing
21.2.5 Orthologous Candidate Genes for Seed Dormancy
21.3 Why is the African Eggplant Underutilized?
21.3.1 Undesirable Traits
21.3.2 Introduction of Exotic, Higher Yielding Vegetable Species
21.3.3 Low Research Investments
21.3.4 Non-existent Extension Services and Seed Systems
21.3.5 Lack of Structured Marketing Channels
21.3.6 Lack of Genetic and Genomic Resources
21.4 Future Goals and Prospects
References
22 Sequencing of the Bottle Gourd Genomes Enhances Understanding of the Ancient Orphan Crop
Abstract
22.1 Crop Background
22.1.1 Botanical Description
22.1.1.1 Root
22.1.1.2 Stem
22.1.1.3 Leaves
22.1.1.4 Flower
22.1.1.5 Fruit
22.1.1.6 Seed
22.1.2 Geographic Distribution
22.1.3 Accessible in Seed Banks
22.1.4 Why is It Underutilized?
22.1.5 What Qualities Does It Bring?
22.2 Genome Sequencing
22.2.1 Materials Used for Sequencing
22.2.2 Strategies and Tools for Sequencing
22.2.3 Genome Annotation
22.2.4 Comparison with Related Crops
22.2.5 Gene Discovery
22.2.6 Candidate Genes for Agronomic Traits
22.3 Resequencing of Germplasm
22.4 Future Goals and Prospects
22.4.1 What does a Reference Genome add to the Research of this Crop?
References
23 Advances and Prospects in Genomic and Functional Studies of the Aquatic Crop, Sacred Lotus
Abstract
23.1 Overview
23.2 Genome Assembly and Whole-Genome Resequencing Analysis
23.3 Genome Sequencing, Genetics, and Evolution
23.4 Transcriptomics in Sacred Lotus
23.5 Studies of Phenotypic Variation of Rhizome in the Two Lotus Ecotypes
23.6 Longevity and Yield of Lotus Seed
23.7 Diversity of Lotus Flower
23.8 Conclusion
References
24 Utilising Public Resources for Fundamental Work in Underutilised and Orphan Crops
Abstract
24.1 Introduction
24.1.1 Underutilised and Orphan Crops
24.1.1.1 Underutilised Crops and Climate Resilience
24.1.1.2 Establishing Resources for Underutilised Crops
24.1.1.3 Generating SSR Resources for Underutilised Crops
24.1.2 Target Species
24.1.2.1 Amaranthus viridis L. (Amaranthaceae)—Slender Amaranth
24.1.2.2 Litchi chinensis Sonn. (Sapindaceae)—Lychee
24.1.2.3 Mucuna pruriens (L.) DC. (Fabaceae)—Velvet Bean
24.2 Methods
24.2.1 Raw Data Download and QC
24.2.2 Genome Assembly
24.2.3 cpDNA
24.2.4 Mining for SSR Markers
24.2.5 SNP Polymorphism for Litchi
24.3 Results
24.3.1 Raw Data
24.3.2 Assembly
24.3.3 cpDNA
24.3.4 SSRs
24.3.5 SNPs for Litchi from Resequencing Data
24.4 Discussion
References

Compendium of Plant Genomes Series Editor: Chittaranjan Kole

Mark A. Chapman   Editor

Underutilised Crop Genomes

Compendium of Plant Genomes Series Editor Chittaranjan Kole, President, International Climate Resilient Crop Genomics Consortium (ICRCGC), President, International Phytomedomics & Nutriomics Consortium (IPNC) and President, Genome India International (GII), Kolkata, India

Whole-genome sequencing is at the cutting edge of life sciences in the new millennium. Since the first genome sequencing of the model plant Arabidopsis thaliana in 2000, whole genomes of about 100 plant species have been sequenced and genome sequences of several other plants are in the pipeline. Research publications on these genome initiatives are scattered on dedicated web sites and in journals with all too brief descriptions. The individual volumes elucidate the background history of the national and international genome initiatives; public and private partners involved; strategies and genomic resources and tools utilized; enumeration on the sequences and their assembly; repetitive sequences; gene annotation and genome duplication. In addition, synteny with other sequences, comparison of gene families and most importantly potential of the genome sequence information for gene pool characterization and genetic improvement of crop plants are described.


Editor
Mark A. Chapman
Biological Sciences
University of Southampton
Southampton, UK

ISSN 2199-4781
ISSN 2199-479X (electronic)
Compendium of Plant Genomes
ISBN 978-3-031-00847-4
ISBN 978-3-031-00848-1 (eBook)
https://doi.org/10.1007/978-3-031-00848-1

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

This book series is dedicated to my wife Phullara and our children Sourav and Devleena

Chittaranjan Kole

Preface to the Series

Genome sequencing has emerged as the leading discipline in the plant sciences coinciding with the start of the new century. For much of the twentieth century, plant geneticists were only successful in delineating putative chromosomal location, function, and changes in genes indirectly through the use of a number of “markers” physically linked to them. These included visible or morphological, cytological, protein, and molecular or DNA markers. Among them, the first DNA marker, the RFLPs, introduced a revolutionary change in plant genetics and breeding in the mid-1980s, mainly because of their infinite number and thus potential to cover maximum chromosomal regions, phenotypic neutrality, absence of epistasis, and codominant nature. An array of other hybridization-based markers, PCR-based markers, and markers based on both facilitated construction of genetic linkage maps, mapping of genes controlling simply inherited traits, and even gene clusters (QTLs) controlling polygenic traits in a large number of model and crop plants. During this period, a number of new mapping populations beyond F2 were utilized and a number of computer programs were developed for map construction, mapping of genes, and for mapping of polygenic clusters or QTLs. Molecular markers were also used in the studies of evolution and phylogenetic relationship, genetic diversity, DNA fingerprinting, and map-based cloning. Markers tightly linked to the genes were used in crop improvement employing the so-called marker-assisted selection. These strategies of molecular genetic mapping and molecular breeding made a spectacular impact during the last one and a half decades of the twentieth century. But still they remained “indirect” approaches for elucidation and utilization of plant genomes since much of the chromosomes remained unknown and the complete chemical depiction of them was yet to be unraveled. 
Physical mapping of genomes was the obvious consequence that facilitated the development of the “genomic resources” including BAC and YAC libraries to develop physical maps in some plant genomes. Subsequently, integrated genetic–physical maps were also developed in many plants. This led to the concept of structural genomics. Later on, emphasis was laid on EST and transcriptome analysis to decipher the function of the active gene sequences leading to another concept defined as functional genomics. The advent of techniques of bacteriophage gene and DNA sequencing in the 1970s was extended to facilitate sequencing of these genomic resources in the last decade of the twentieth century.


As expected, sequencing of chromosomal regions would have generated more data than the then-available computer software could store, characterize, and utilize. But the development of information technology made the life of biologists easier by leading to a swift and sweet marriage of biology and informatics, and a new subject was born—bioinformatics. Thus, the evolution of the concepts, strategies, and tools of sequencing and bioinformatics reinforced the subject of genomics—structural and functional. Today, genome sequencing has traveled much beyond biology and involves biophysics, biochemistry, and bioinformatics! Thanks to the efforts of both public and private agencies, genome sequencing strategies are evolving very fast, leading to cheaper, quicker, and automated techniques right from clone-by-clone and whole-genome shotgun approaches to a succession of second-generation sequencing methods. The development of software of different generations facilitated this genome sequencing. At the same time, newer concepts and strategies were emerging to handle sequencing of the complex genomes, particularly the polyploids. It became a reality to chemically—and so directly—define plant genomes, popularly called whole-genome sequencing or simply genome sequencing. The history of plant genome sequencing will always cite the sequencing of the genome of the model plant Arabidopsis thaliana in 2000 that was followed by sequencing the genome of the crop and model plant rice in 2002. Since then, the number of sequenced genomes of higher plants has been increasing exponentially, mainly due to the development of cheaper and quicker genomic techniques and, most importantly, the development of collaborative platforms such as national and international consortia involving partners from public and/or private agencies.
As I write this preface for the first volume of the new series “Compendium of Plant Genomes,” a net search tells me that complete or nearly complete whole-genome sequencing of 45 crop plants, eight crop and model plants, eight model plants, 15 crop progenitors and relatives, and three basal plants is accomplished, the majority of which are in the public domain. This means that we nowadays know many of our model and crop plants chemically, i.e., directly, and we may depict them and utilize them precisely better than ever. Genome sequencing has covered all groups of crop plants. Hence, information on the precise depiction of plant genomes and the scope of their utilization are growing rapidly every day. However, the information is scattered in research articles and review papers in journals and dedicated Web pages of the consortia and databases. There is no compilation of plant genomes and the opportunity of using the information in sequence-assisted breeding or further genomic studies. This is the underlying rationale for starting this book series, with each volume dedicated to a particular plant. Plant genome science has emerged as an important subject in academia, and the present compendium of plant genomes will be highly useful to both students and teaching faculties. Most importantly, research scientists involved in genomics research will have access to systematic deliberations on the plant genomes of their interest. Elucidation of plant genomes is of interest not only for the geneticists and breeders, but also for practitioners of an array of plant science disciplines, such as taxonomy, evolution, cytology,

physiology, pathology, entomology, nematology, crop production, biochemistry, and obviously bioinformatics. It must be mentioned that information regarding each plant genome is ever-growing. The contents of the volumes of this compendium are, therefore, focusing on the basic aspects of the genomes and their utility. They include information on the academic and/or economic importance of the plants, description of their genomes from a molecular genetic and cytogenetic point of view, and the genomic resources developed. Detailed deliberations focus on the background history of the national and international genome initiatives, public and private partners involved, strategies and genomic resources and tools utilized, enumeration on the sequences and their assembly, repetitive sequences, gene annotation, and genome duplication. In addition, synteny with other sequences, comparison of gene families, and, most importantly, the potential of the genome sequence information for gene pool characterization through genotyping by sequencing (GBS) and genetic improvement of crop plants have been described. As expected, there is a lot of variation of these topics in the volumes based on the information available on the crop, model, or reference plants. I must confess that as the series editor, it has been a daunting task for me to work on such a huge and broad knowledge base that spans so many diverse plant species. However, pioneering scientists with lifetime experience and expertise on the particular crops did excellent jobs editing the respective volumes. I myself have been a small science worker on plant genomes since the mid-1980s and that provided me the opportunity to personally know several stalwarts of plant genomics from all over the globe. Most, if not all, of the volume editors are my longtime friends and colleagues. It has been highly comfortable and enriching for me to work with them on this book series. 
To be honest, while working on this series I have been and will remain a student first, a science worker second, and a series editor last. And, I must express my gratitude to the volume editors and the chapter authors for providing me the opportunity to work with them on this compendium. I also wish to mention here my thanks and gratitude to Springer staff, particularly Dr. Christina Eckey and Dr. Jutta Lindenborn, for the earlier set of volumes and presently Ing. Zuzana Bernhart for all their timely help and support. I always had to set aside additional hours to edit books beside my professional and personal commitments—hours I could and should have given to my wife, Phullara, and our kids, Sourav and Devleena. I must mention that they not only allowed me the freedom to take away those hours from them but also offered their support in the editing job itself. I am really not sure whether my dedication of this compendium to them will suffice to do justice to their sacrifices for the interest of science and the science community.

New Delhi, India

Chittaranjan Kole

Preface

Crops which have often been discussed as bringing potential benefits, both now and in the future, but suffering from negative attributes are often named underutilized, orphan, or neglected. These include dozens if not hundreds of cereals, vegetables, beans, and fruits which are locally important but internationally less well known. These crops are often adapted to climates that are prone to heatwaves or droughts, and therefore, their potential in a changing climate is oft cited. Advancing these crops to the international stage requires insight into their environmental tolerances, nutrient content, varied uses, and indigenous knowledge about the varieties and their adaptability. At the same time, the reasons for their lack of expansion need to be acknowledged, for example pest susceptibility, anti-nutrient content, or simply their poor performance relative to (often non-native) staples. Understanding these traits at the genomic level will expedite the identification of suitable varieties and breeding material, and ultimately help to leverage these crops at a time when our staple crops are feeling the effects of climate change. This book comes together at a time when sequencing technologies (and associated bioinformatic approaches) are becoming advanced to the degree that a genome sequencing project is no longer out of reach for any species. In the past few years, underutilized crops which were not fulfilling their potential have had the required investment to carry out genome sequencing and to begin to understand the genetic basis of adaptive traits. We are at a time when more and more of these unusual, adaptable, and unique crops are being sequenced, allowing the identification of the genetic basis of drought and heat tolerance, yield, plant and fruit/seed architecture, nutrient content and composition, and anti-nutrient content. 
This information is vital going forward and is collated here in a manner which does not only consider highly contiguous, “platinum” quality genomes, but examples of crops where the sequencing is in progress or in draft form. The book is ordered based on grouped species with similar human use. Chapters 1–3 are cereal or pseudocereal crops, grown for their grains. Chapters 4–10 are all fruit crops and 11–17 are legumes. Chapter 18 concerns castor bean, a non-food but important underutilized crop. Chapters 19 and 20 are both root crops, and 21–23 are vegetables. The final chapter highlights how a draft scale genome (i.e., an incompletely assembled genome) can provide an asset for marker development, gene identification, and investigating patterns of selection across groups of species (see also Chap. 14). What should become clear to the reader is that any depth of sequencing is a step up for these species, as well as their close relatives (see Chap. 19 on enset and its relatives) and that many of these species have novel uses (e.g., sacred lotus as a commonly grown ornamental but with an edible and nutritious rhizome) or a variety of uses (e.g., winged bean). Although this is collated here for the first time, it is the hope of the editor that this book will require updating often as a sign of significant advances in the field, in terms of both the species being sequenced and the depth of analyses per crop.

Southampton, UK 2022

Mark A. Chapman

Contents

1 The Broomcorn Millet Genome
Leiting Li and Heng Zhang

2 Buckwheat Genome and Genomics
Yuqi He and Meiliang Zhou

3 Tef [Eragrostis tef (Zucc.) Trotter]
Gina Cannarozzi and Zerihun Tadele

4 The Apricot Genome
Yu-zhu Wang, Hao-yuan Sun, Jun-huan Zhang, Feng-chao Jiang, Li Yang, and Mei-ling Zhang

5 Chinese Jujube: Crop Background and Genome Sequencing
Meng Yang, Mengjun Liu, and Jin Zhao

6 The Longan (Dimocarpus longan) Genome
Yan Chen, Xiaoping Xu, Xiaohui Chen, Shuting Zhang, Yukun Chen, Zhongxiong Lai, and Yuling Lin

7 The Mangosteen Genome
Mohd Razik Midin and Hoe-Han Goh

8 The Passion Fruit Genome
Maria Lucia Carneiro Vieira, Zirlane Portugal Costa, Alessandro Mello Varani, Mariela Analia Sader, Luiz Augusto Cauz-Santos, Helena Augusto Giopatto, Alina Carmen Egoávil del Reátegui, Hélène Bergès, Claudia Barros Monteiro-Vitorello, Marcelo Carnier Dornelas, and Andrea Pedrosa-Harand

9 The Soursop Genome (Annona muricata L., Annonaceae)
Joeri S. Strijk, Damien D. Hinsinger, Mareike M. Roeder, Lars W. Chatrou, Thomas L. P. Couvreur, Roy H. J. Erkens, Hervé Sauquet, Michael D. Pirie, Daniel C. Thomas, and Kunfang Cao

10 Underutilised Fruit Tree Genomes from Indonesia
Deden Derajat Matra, M. Adrian, and Roedhy Poerwanto

11 The Bambara Groundnut Genome
Luis Salazar-Licea, Kumbirai Ivyne Mateva, Xiuqing Gao, Razlin Azman Halimi, Liliana Andrés-Hernández, Hui Hui Chai, Wai Kuan Ho, Graham J. King, Festo Massawe, and Sean Mayes

12 Grasspea
Noel Ellis, M. Carlota Vaz Patto, Diego Rubiales, Jiří Macas, Petr Novák, Shiv Kumar, Xiaopeng Hao, Anne Edwards, Abhimanyu Sarkar, and Peter Emmrich

13 The Lablab Genome: Recent Advances and Future Perspectives
Brigitte L. Maass and Mark A. Chapman

14 The Perennial Horse Gram (Macrotyloma axillare) Genome, Phylogeny, and Selection Across the Fabaceae
David Fisher, Isaac Reynolds, and Mark A. Chapman

15 Breeding and Genomics of Pigeonpea in the Post-NGS Era
Abhishek Bohra, Abha Tiwari, S. J. Satheesh Naik, Alok Kumar Maurya, Vivekanand Yadav, Dibendu Datta, Farindra Singh, and Rajeev K. Varshney

16 Rice Bean—An Underutilized Food Crop Emerges as Cornucopia of Micronutrients Essential for Sustainable Food and Nutritional Security
Tanushri Kaul, Sonia Khan Sony, Jyotsna Bharti, Rachana Verma, Mamta Nehra, Arulprakash Thangaraj, Khaled Fathy Abdel Motelb, Rashmi Kaul, and Murugesh Easwaran

17 The Winged Bean Genome
Niki Tsoutsoura, Yuet Tian Chong, Wai Kuan Ho, Hui Hui Chai, Alberto Stefano Tanzi, Luis Salazar-Licea, Festo Massawe, John Brameld, Andrew Salter, and Sean Mayes

18 Castor Bean: Recent Progress in Understanding the Genome of This Underutilized Crop
Sammy Muraguri and Aizhong Liu

19 Genome Resources for Ensete ventricosum (Enset) and Related Species
Lakshmipriya Venkatesan, Sadik Muzemil, Filate Fiche, Murray Grant, and David J. Studholme

20 Yam Genomics
Hana Chaïr, Gemma Arnau, and Ana Zotta Mota

21 The African Eggplant
Susan M. Moenga and Damaris Achieng Odeny

22 Sequencing of the Bottle Gourd Genomes Enhances Understanding of the Ancient Orphan Crop
Ying Wang, Arun K. Pandey, Guojing Li, and Pei Xu

23 Advances and Prospects in Genomic and Functional Studies of the Aquatic Crop, Sacred Lotus
Tao Shi, Zhiyan Gao, Yue Zhang, and Jinming Chen

24 Utilising Public Resources for Fundamental Work in Underutilised and Orphan Crops
Mark A. Chapman and David Fisher

Contributors

M. Adrian, Collaborative Research Group on Fruits (Fruitomics), Department of Agronomy and Horticulture, Faculty of Agriculture, IPB University, Bogor, West Java, Indonesia
Liliana Andrés-Hernández, Southern Cross Plant Science, Southern Cross University, Lismore, NSW, Australia
Gemma Arnau, CIRAD, UMR AGAP Institut, Montpellier, France
Razlin Azman Halimi, Southern Cross Plant Science, Southern Cross University, Lismore, NSW, Australia
Hélène Bergès, Institut National de La Recherche Agronomique (INRAE), Centre National de Ressources Génomiques Végétales (CNRGV), Castanet-Tolosan, France
Jyotsna Bharti, Nutritional Improvement of Crops Group, International Centre for Genetic Engineering and Biotechnology (ICGEB), New Delhi, India
Abhishek Bohra, State Agricultural Biotechnology Centre & Centre for Crop and Food Innovation, Murdoch University, Murdoch, Australia
John Brameld, School of Biosciences, University of Nottingham, Sutton Bonington Campus, Loughborough, Leicestershire, UK
Gina Cannarozzi, Institute of Plant Sciences, University of Bern, Bern, Switzerland
Kunfang Cao, State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, China
M. Carlota Vaz Patto, Instituto de Tecnologia Química E Biológica António Xavier, Universidade Nova de Lisboa, Av. da República, Oeiras, Portugal
Luiz Augusto Cauz-Santos, Departamento de Genética, Escola Superior de Agricultura ‘Luiz de Queiroz’, Universidade de São Paulo, Piracicaba, Brazil; Department of Botany and Biodiversity Research, University of Vienna, Vienna, Austria

Hui Hui Chai, Future Food Beacon, School of Biosciences, University of Nottingham Malaysia, Selangor Darul Ehsan, Semenyih, Malaysia
Mark A. Chapman, Biological Sciences, University of Southampton, Southampton, UK
Lars W. Chatrou, Systematic and Evolutionary Botany lab, Ghent University, Ghent, Belgium
Hana Chaïr, CIRAD, UMR AGAP Institut, Montpellier, France
Jinming Chen, CAS Key Laboratory of Aquatic Botany and Watershed Ecology, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China; Center of Conservation Biology, Core Botanical Gardens, Wuhan, China
Xiaohui Chen, Institute of Horticultural Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
Yan Chen, Institute of Horticultural Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
Yukun Chen, Institute of Horticultural Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
Yuet Tian Chong, School of Biosciences, University of Nottingham Malaysia, Semenyih, Selangor Darul Ehsan, Malaysia
Zirlane Portugal Costa, Departamento de Genética, Escola Superior de Agricultura ‘Luiz de Queiroz’, Universidade de São Paulo, Piracicaba, Brazil
Thomas L. P. Couvreur, IRD, DIADE, University Montpellier, Montpellier, France
Dibendu Datta, ICAR-Indian Institute of Pulses Research (IIPR), Kanpur, India
Alina Carmen Egoávil del Reátegui, Instituto Nacional de Innovación Agraria, Sub Dirección de Recursos Genéticos, Lima, Peru
Marcelo Carnier Dornelas, Departamento de Biologia Vegetal, Instituto de Biologia, Universidade Estadual de Campinas, Campinas, Brazil
Murugesh Easwaran, Nutritional Improvement of Crops Group, International Centre for Genetic Engineering and Biotechnology (ICGEB), New Delhi, India
Anne Edwards, John Innes Centre, Norwich, UK
Noel Ellis, John Innes Centre, Norwich, UK
Peter Emmrich, John Innes Centre, Norwich, UK; School of International Development, Norwich Institute for Sustainable Development, University of East Anglia, Norwich, UK
Roy H. J. Erkens, Maastricht Science Program, Maastricht University, Maastricht, The Netherlands

Khaled Fathy Abdel Motelb, Nutritional Improvement of Crops Group, International Centre for Genetic Engineering and Biotechnology (ICGEB), New Delhi, India
Filate Fiche, Hawassa University, Hawassa, Ethiopia
David Fisher, Biological Sciences, University of Southampton, Southampton, UK
Xiuqing Gao, Future Food Beacon, School of Biosciences, University of Nottingham Malaysia, Selangor Darul Ehsan, Semenyih, Malaysia
Zhiyan Gao, CAS Key Laboratory of Aquatic Botany and Watershed Ecology, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China; Center of Conservation Biology, Core Botanical Gardens, Wuhan, China; University of Chinese Academy of Sciences, Beijing, China
Helena Augusto Giopatto, Departamento de Biologia Vegetal, Instituto de Biologia, Universidade Estadual de Campinas, Campinas, Brazil
Hoe-Han Goh, Institute of Systems Biology, Universiti Kebangsaan Malaysia, Bangi, Selangor, Malaysia
Murray Grant, University of Warwick, Coventry, UK
Xiaopeng Hao, Center for Agricultural Genetic Resources Research, Shanxi Agricultural University, Jinzhong, China
Yuqi He, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Haidian District, Beijing, China
Damien D. Hinsinger, Alliance for Conservation Tree Genomics, Pha Tad Ke Botanical Garden, Luang Prabang, Lao PDR
Wai Kuan Ho, Future Food Beacon, School of Biosciences, University of Nottingham Malaysia, Selangor Darul Ehsan, Semenyih, Malaysia
Feng-chao Jiang, Institute of Forestry and Pomology, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China; Apricot Engineering and Technology Research Center, National Forestry and Grassland Administration, Beijing, China
Rashmi Kaul, Nutritional Improvement of Crops Group, International Centre for Genetic Engineering and Biotechnology (ICGEB), New Delhi, India
Tanushri Kaul, Nutritional Improvement of Crops Group, International Centre for Genetic Engineering and Biotechnology (ICGEB), New Delhi, India
Sonia Khan Sony, Nutritional Improvement of Crops Group, International Centre for Genetic Engineering and Biotechnology (ICGEB), New Delhi, India

Graham J. King, Southern Cross Plant Science, Southern Cross University, Lismore, NSW, Australia
Shiv Kumar, International Center for Agricultural Research in the Dry Areas, Rabat, Morocco
Brigitte L. Maass, Department of Crop Sciences, Georg-August-University Göttingen, Göttingen, Germany
Zhongxiong Lai, Institute of Horticultural Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
Guojing Li, Institute of Vegetables, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
Leiting Li, State Key Laboratory of Plant Molecular Genetics, Shanghai Center for Plant Stress Biology, Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, China
Yuling Lin, Institute of Horticultural Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
Aizhong Liu, Southwest Forestry University, Kunming, China
Mengjun Liu, Research Center of Chinese Jujube, Hebei Agricultural University, Baoding, Hebei, P. R. China
Jiří Macas, Laboratory of Molecular Cytogenetics, Institute of Plant Molecular Biology, Biology Centre CAS, Ceske Budejovice, Czech Republic
Festo Massawe, Future Food Beacon, School of Biosciences, University of Nottingham Malaysia, Selangor Darul Ehsan, Semenyih, Malaysia
Kumbirai Ivyne Mateva, Future Food Beacon, School of Biosciences, University of Nottingham Malaysia, Selangor Darul Ehsan, Semenyih, Malaysia
Deden Derajat Matra, Collaborative Research Group on Fruits (Fruitomics), Department of Agronomy and Horticulture, Faculty of Agriculture, IPB University, Bogor, West Java, Indonesia
Alok Kumar Maurya, ICAR-Indian Institute of Pulses Research (IIPR), Kanpur, India
Sean Mayes, School of Biosciences, University of Nottingham, Loughborough, Leicestershire, UK; Crops For the Future (UK) CIC, NIAB, Cambridge, UK
Mohd Razik Midin, Department of Plant Science, Kulliyyah of Science, International Islamic University Malaysia, Kuantan, Pahang, Malaysia
Susan M. Moenga, The Plant Pathology Department, University of California Davis, Davis, CA, USA

Claudia Barros Monteiro-Vitorello, Departamento de Genética, Escola Superior de Agricultura ‘Luiz de Queiroz’, Universidade de São Paulo, Piracicaba, Brazil
Ana Zotta Mota, UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, Montpellier, France
Sammy Muraguri, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
Sadik Muzemil, University of Warwick, Coventry, UK
Mamta Nehra, Nutritional Improvement of Crops Group, International Centre for Genetic Engineering and Biotechnology (ICGEB), New Delhi, India
Petr Novák, Laboratory of Molecular Cytogenetics, Institute of Plant Molecular Biology, Biology Centre CAS, Ceske Budejovice, Czech Republic
Damaris Achieng Odeny, The International Crops Research Institute for the Semi-Arid Tropics, Nairobi, Kenya
Arun K. Pandey, College of Life Sciences, China Jiliang University, Hangzhou, China
Andrea Pedrosa-Harand, Departamento de Botânica, Universidade Federal de Pernambuco, Recife, Brazil
Michael D. Pirie, Department of Natural History, University Museum, University of Bergen, Bergen, Norway
Roedhy Poerwanto, Collaborative Research Group on Fruits (Fruitomics), Department of Agronomy and Horticulture, Faculty of Agriculture, IPB University, Bogor, West Java, Indonesia
Isaac Reynolds, Biological Sciences, University of Southampton, Southampton, UK
Mareike M. Roeder, Xishuangbanna Tropical Botanical Garden (CAS), Menglun, China; Aueninstitut, Institute for Geography and Geoecology, Karlsruhe Institute of Technology, Rastatt, Germany
Diego Rubiales, Institute for Sustainable Agriculture, CSIC, Córdoba, Spain
Mariela Analia Sader, Departamento de Botânica, Universidade Federal de Pernambuco, Recife, Brazil
Luis Salazar-Licea, School of Biosciences, University of Nottingham, Loughborough, Leicestershire, UK
Andrew Salter, School of Biosciences, University of Nottingham, Sutton Bonington Campus, Loughborough, Leicestershire, UK

Abhimanyu Sarkar, John Innes Centre, Norwich, UK; National Institute of Agricultural Botany, Cambridge, UK
S. J. Satheesh Naik, ICAR-Indian Institute of Pulses Research (IIPR), Kanpur, India
Hervé Sauquet, National Herbarium of New South Wales, Royal Botanic Gardens and Domain Trust, Sydney, NSW, Australia
Tao Shi, CAS Key Laboratory of Aquatic Botany and Watershed Ecology, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China; Center of Conservation Biology, Core Botanical Gardens, Wuhan, China
Farindra Singh, ICAR-Indian Institute of Pulses Research (IIPR), Kanpur, India
Joeri S. Strijk, Alliance for Conservation Tree Genomics, Pha Tad Ke Botanical Garden, Luang Prabang, Lao PDR; Institute for Biodiversity and Environmental Research, Universiti Brunei Darussalam, Jalan Tungku Link, Brunei Darussalam
David J. Studholme, Biosciences, University of Exeter, Exeter, UK
Hao-yuan Sun, Institute of Forestry and Pomology, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China; Apricot Engineering and Technology Research Center, National Forestry and Grassland Administration, Beijing, China
Zerihun Tadele, Institute of Plant Sciences, University of Bern, Bern, Switzerland
Alberto Stefano Tanzi, School of Biosciences, University of Nottingham, Sutton Bonington Campus, Loughborough, Leicestershire, UK
Arulprakash Thangaraj, Nutritional Improvement of Crops Group, International Centre for Genetic Engineering and Biotechnology (ICGEB), New Delhi, India
Daniel C. Thomas, National Parks Board, Singapore Botanic Gardens, Singapore, Singapore
Abha Tiwari, ICAR-Indian Institute of Pulses Research (IIPR), Kanpur, India
Niki Tsoutsoura, School of Biosciences, University of Nottingham, Sutton Bonington Campus, Loughborough, Leicestershire, UK
Alessandro Mello Varani, Departamento de Tecnologia, Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista, Jaboticabal, Brazil
Rajeev K. Varshney, State Agricultural Biotechnology Centre & Centre for Crop and Food Innovation, Murdoch University, Murdoch, Australia
Lakshmipriya Venkatesan, Biosciences, University of Exeter, Exeter, UK

Rachana Verma, Nutritional Improvement of Crops Group, International Centre for Genetic Engineering and Biotechnology (ICGEB), New Delhi, India
Maria Lucia Carneiro Vieira, Departamento de Genética, Escola Superior de Agricultura ‘Luiz de Queiroz’, Universidade de São Paulo, Piracicaba, Brazil
Ying Wang, Institute of Vegetables, Zhejiang Academy of Agricultural Sciences, Hangzhou, China
Yu-zhu Wang, Institute of Forestry and Pomology, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China; Apricot Engineering and Technology Research Center, National Forestry and Grassland Administration, Beijing, China
Pei Xu, College of Life Sciences, China Jiliang University, Hangzhou, China
Xiaoping Xu, Institute of Horticultural Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China
Vivekanand Yadav, ICAR-Indian Institute of Pulses Research (IIPR), Kanpur, India
Li Yang, Institute of Forestry and Pomology, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China; Apricot Engineering and Technology Research Center, National Forestry and Grassland Administration, Beijing, China
Meng Yang, Research Center of Chinese Jujube, Hebei Agricultural University, Baoding, Hebei, P. R. China
Heng Zhang, State Key Laboratory of Plant Molecular Genetics, Shanghai Center for Plant Stress Biology, Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, China
Jun-huan Zhang, Institute of Forestry and Pomology, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China; Apricot Engineering and Technology Research Center, National Forestry and Grassland Administration, Beijing, China
Mei-ling Zhang, Institute of Forestry and Pomology, Beijing Academy of Agriculture and Forestry Sciences, Beijing, China; Apricot Engineering and Technology Research Center, National Forestry and Grassland Administration, Beijing, China
Shuting Zhang, Institute of Horticultural Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, Fujian, China

Yue Zhang, CAS Key Laboratory of Aquatic Botany and Watershed Ecology, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan, China; Center of Conservation Biology, Core Botanical Gardens, Wuhan, China; University of Chinese Academy of Sciences, Beijing, China
Jin Zhao, College of Life Science, Hebei Agricultural University, Baoding, Hebei, P. R. China
Meiliang Zhou, Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Haidian District, Beijing, China

1 The Broomcorn Millet Genome

Leiting Li and Heng Zhang

Abstract

Broomcorn millet (Panicum miliaceum L.) is arguably the most influential crop in the ancient world, having been domesticated at least 7900 years ago on the Loess Plateau of Northern China. Until 2000 years ago, broomcorn millet served as a staple crop across the continent of Eurasia. The unique properties of broomcorn millet, including extremely high water-use efficiency (WUE), a short life cycle, and good thermotolerance, made it a favorable crop in the first several millennia of human agriculture. The cultivation of broomcorn millet gradually declined with the historical rise of wheat and rice. To date, China remains one of the largest producers of broomcorn millet, but its traditional production area has been declining in favor of better-yielding and higher-income crops such as corn. Two chromosome-scale genome assemblies of broomcorn millet have been published, produced mainly with single-molecule real-time (SMRT) sequencing technology. We describe the basic characteristics of the broomcorn millet genome, which in combination with a high-density genetic map will serve as valuable resources for breeders and researchers interested in this underutilized crop.

L. Li · H. Zhang (✉)
State Key Laboratory of Plant Molecular Genetics, Shanghai Center for Plant Stress Biology, Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, 3888 Chenhua Road, Shanghai 201602, China
e-mail: [email protected]

1.1 Introduction

Millets are a group of small-grained cereal crops that are widely cultivated in semi-arid tropical areas of the world. Together they are the sixth largest grain crop in the world and a staple food for more than one-third of the human population, mainly in developing countries. Millets belong to diverse lineages of the grass family but share high tolerance to drought and high temperature and high nutritional value, in addition to their small seeds. Among them, broomcorn millet (Panicum miliaceum) is probably the most widely cultivated in history. By the end of the second millennium BC, broomcorn millet cultivation had spread from East Asia to the rest of Central Eurasia and Eastern Europe (Miller et al. 2016). Archaeological evidence indicates that broomcorn millet was first domesticated in China more than 8000 years ago (Lu et al. 2009) and then spread to Europe through a route later known as the Silk Road (Miller et al. 2016). As a result of this long history of cultivation and wide distribution, broomcorn millet is also known by several other names, including, in order of popularity, proso millet, common millet, hog millet, white millet, and red millet. Among these names, broomcorn millet and proso millet refer exclusively to the species Panicum miliaceum, whereas common millet sometimes also refers to Setaria.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 M. A. Chapman (ed.), Underutilised Crop Genomes, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-00848-1_1

Broomcorn millet is famous for its high water-use efficiency (WUE), defined as the amount of grain produced per unit of water used by the crop. This is probably related to the fact that broomcorn millet is a typical C4 plant with a high harvest index (the ratio of seed production to total plant biomass). Additionally, broomcorn millet has a short growing season (typically less than 90 days) and resists high summer temperatures, making it a suitable crop for dryland agriculture in the semi-arid regions of Eurasia. Despite these advantages, the cultivation area of broomcorn millet in China has been declining over the past 70 years, from 2 million hectares in 1950–1960 to 1–1.2 million hectares in 1991–1995 and only about 0.5–0.7 million hectares in 2010 (Chai et al. 2012). The main reasons for farmers to switch to other crops are the lack of high-yielding cultivars and relatively low income.
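The two ratios defined above, water-use efficiency and harvest index, can be written out as a short Python sketch. The function names, the units (kg per hectare, mm of water), and the example plot values are illustrative assumptions, not measurements reported in this chapter.

```python
def water_use_efficiency(grain_yield_kg_per_ha: float, water_used_mm: float) -> float:
    """Grain produced per unit of water used by the crop (kg per ha per mm).

    The units are an assumption made for this sketch; the text only
    defines WUE as grain produced per unit of water used.
    """
    if water_used_mm <= 0:
        raise ValueError("water_used_mm must be positive")
    return grain_yield_kg_per_ha / water_used_mm


def harvest_index(seed_mass: float, total_biomass: float) -> float:
    """Ratio of seed production to total plant biomass (dimensionless)."""
    if total_biomass <= 0:
        raise ValueError("total_biomass must be positive")
    if not 0 <= seed_mass <= total_biomass:
        raise ValueError("seed_mass must lie between 0 and total_biomass")
    return seed_mass / total_biomass


# Hypothetical plot: 2000 kg/ha of grain, 250 mm of water used,
# 5000 kg/ha of total plant biomass.
example_wue = water_use_efficiency(2000.0, 250.0)
example_hi = harvest_index(2000.0, 5000.0)
```

Both quantities are plain ratios, so comparing crops or varieties only makes sense when the numerators and denominators are measured in the same units and over the same growing period.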

1.2 Taxonomy of Broomcorn Millet

1.2.1 Botanical Characteristics

Like many other grain crops, broomcorn millet is a member of the grass family, the Poaceae, which contains approximately 780 genera and 12,000 species (Christenhusz and Byng 2016). Specifically, broomcorn millet belongs to Panicum, a large genus encompassing about 450 species (Aliscioni et al. 2003). Panicum is one of the 80 genera of the tribe Paniceae (Zarnkow et al. 2010), which belongs to the subfamily Panicoideae. Crops from the grass family provide the majority of the world’s carbohydrates. The common name ‘broom corn’ comes from its appearance at maturity, when the compact panicle droops at the top like an old broom (Habiyaremye et al. 2017). Besides broomcorn millet, there are several other crops known as ‘millets’, some of which are taxonomically close to broomcorn millet, such as little millet (Panicum sumatrense), while others belong to different genera, such as pearl millet (Pennisetum glaucum (L.) R. Br.), finger millet (Eleusine coracana), kodo millet (Paspalum scrobiculatum), foxtail millet (Setaria italica (L.) P. Beauv.), and barnyard millet (Echinochloa utilis) (Habiyaremye et al. 2017). Compared with major crops, millets are more productive in harsh environments with agricultural constraints such as low fertility, limited rainfall, high temperature, and/or lack of efficient irrigation systems (Habiyaremye et al. 2017). Millets, which collectively feed more than one-third of the world’s population, rank as the sixth most important cereal in the world (Habiyaremye et al. 2017).

Broomcorn millet is an annual grain that produces bright green leaves with fine and long stems (Fig. 1.1). In the field, the height of the main stem ranges from 0.4 to 2 m, depending on the variety. There are also large variations in the number of tillers, with some varieties producing branches from aboveground nodes. The leaf of broomcorn millet is long and pointed, consisting of leaf blade, leaf sheath, ligule, and pulvinus, but without auricle. Broomcorn millet has a fibrous root system, with over 80% of the root biomass distributed in the top 20 cm of soil. The panicles of broomcorn millet are usually 15–50 cm long and can be divided into 3–5 types depending on the angle between the axis and the inflorescence branch (Johnson et al. 2019). Caryopses (fruits) of broomcorn millet are usually 2.5–3.2 mm long and 2.0–2.6 mm wide, with an ovate or spherical shape. Caryopses from different varieties vary in color from cream-white through yellow, orange, red, and brown to black (Fig. 1.1), while the processed seeds are usually yellow or light yellow.

Fig. 1.1 Pictures of broomcorn millet and its products. a Broomcorn millet growing in the field. b Broomcorn millet grains from different accessions. c Delicacies made from waxy broomcorn millet grains

Broomcorn millet is a thermophilic crop that requires relatively high temperatures throughout the life cycle. The optimal soil temperature for its seed germination is between 20 and 30 °C. Although Northern China is the main production area of broomcorn millet, its cultivation can be found over a wide region that spans 30 degrees of latitude. Depending on the specific area, broomcorn millet is usually sown in spring or summer, with the life cycle ranging from 50 to 110 days. Flowering of broomcorn millet is induced under short-day conditions. Thanks to this property, broomcorn millet is often used as a catch crop in summer when other crops fail because of unfavorable weather or natural disasters.

Adaptation of broomcorn millet to warm and dry environments is reflected in several physiological aspects. For example, the seed germinates normally even when the soil moisture is insufficient for other crops. In arid and semi-arid regions without efficient irrigation systems, broomcorn millet still has advantages over other crops and plays an irreplaceable role in local agricultural production.

1.2.2 Geographic Distribution

Broomcorn millet is currently cultivated all over the world, including Eurasia, North America, Australia, and Africa. As a minor crop, broomcorn millet is mainly grown on the marginal land of arid and semi-arid regions. It is estimated that between 5.5 and 6.0 million hectares of broomcorn millet are grown worldwide, but the area sown may fluctuate considerably from year to year. As the production of broomcorn millet is usually counted together with other millets, the exact statistics are difficult to come by (FAOSTAT 2020), but China and Russia are believed to be the largest producers. The cultivation area in China in 2016 was about 600 thousand hectares (Liu et al. 2017). Traditionally, the cultivation of broomcorn millet in China is classified into three ecological regions: the northern China spring-sown region, the northeast China spring-sown region, and the Loess Plateau spring- and summer-sown region, with the waxy type mainly distributed in the first two regions and the non-waxy type mostly cultivated in the Loess Plateau spring- and summer-sown region (Diao 2017).

About 45% of the world production of broomcorn millet is concentrated in Russia and the other former Soviet Union countries (Zotikov et al. 2012). The cultivation area in Russia between 2006 and 2010 averaged 557.8 thousand hectares, with the southern Federal district and the Privolzhsky Federal district accounting for 86% of the total area (Zotikov et al. 2012). Broomcorn millet was introduced to the United States in 1875 by German-Russian immigrants (Habiyaremye et al. 2017). In 2020 and 2021, the planted area in the United States was about 243 thousand hectares (NAAS-USDA 2020). Unlike in Eurasian countries, where its grains are mainly grown for human consumption, a significant portion of broomcorn millet in the United States is grown as bird seed (Das et al. 2019).

1.2.3 Conserved Germplasms of Broomcorn Millet

A rich collection of broomcorn millet germplasms (over 30,000 accessions) is preserved in seed banks worldwide. Some of these germplasms can be ordered online through the web sites of individual seed banks or genebanks, typically by filling out an online form that includes basic information about the applicant. Owing to the crop's long history of cultivation across Eurasia, the top three countries hosting large collections of broomcorn millet germplasms are China, Russia, and Ukraine (Table 1.1). Despite the continuous decline of broomcorn millet cultivation in the past decades, the survey and collection of broomcorn millet germplasms are still ongoing in China. The National Germplasm Bank in China (managed by the Institute of Crop Germplasm Resources, Chinese Academy of Agricultural Sciences) contains 9885 accessions of broomcorn millet (Wang and Wang 2018), which can be searched through the Chinese Crop Germplasm Information System (Table 1.1). The N.I. Vavilov Research Institute of Plant Industry in the Russian Federation is reported to hold 9019 accessions (Table 1.1). Two institutes in Ukraine hold 6236 broomcorn millet accessions (Table 1.1). In addition, broomcorn millet germplasms are available from government-funded seed banks in India, the United States, Japan, and other countries. The genetic diversity among or within different collections of germplasm is largely unknown.

Databases that integrate germplasm information from multiple genebanks are useful and convenient tools. For example, EURISCO (European search catalog for plant genetic resources, https://eurisco.ipk-gatersleben.de/) is a searchable catalog of ex situ plant collections maintained in European genebanks, which contains 16,977 broomcorn millet accessions; Genesys (https://www.genesys-pgr.org), an online platform for worldwide plant genetic resources for food and agriculture, contains 18,813 broomcorn millet accessions. EURISCO provides the contact information for the holding seed banks and a direct link to the ordering system, while Genesys offers the option to relay seed requests to holding seed banks.
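The accession counts listed in Table 1.1 can be tallied by country to check the figures quoted in the text, for example that the two Ukrainian institutes together hold 6236 accessions. The following is a minimal Python sketch; the variable and function names are illustrative, and the numbers are copied from Table 1.1, which lists only selected collections.

```python
from collections import defaultdict

# (country, number of accessions) rows copied from Table 1.1.
# The table covers selected collections only, so the grand total
# is lower than the worldwide figure quoted in the text.
TABLE_1_1 = [
    ("China", 9885),
    ("Russian Federation", 9019),
    ("Ukraine", 4868),
    ("Ukraine", 1368),
    ("India", 1039),
    ("India", 849),
    ("USA", 722),
    ("Japan", 503),
    ("Bulgaria", 497),
    ("Republic of Korea", 496),
]


def accessions_by_country(rows):
    """Sum accession counts over all institutes in the same country."""
    totals = defaultdict(int)
    for country, count in rows:
        totals[country] += count
    return dict(totals)


totals = accessions_by_country(TABLE_1_1)

# Countries ordered from the largest to the smallest listed holding.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

for country, count in ranked:
    print(f"{country}: {count}")
```

Summing per country rather than per institute is what places Ukraine third, behind China and the Russian Federation.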

1.2.4 The Underutilized Status of Broomcorn Millet The cultivation of broomcorn millet was influenced by climate, social, and cultural factors before modern times. Archaeological evidence indicates that the cultivation of broomcorn millet in north China dates back 10,000 to 8000 years before the present (Lu et al. 2009). Multiple characteristics of this crop, including high drought tolerance, fast maturation, and high water-use efficiency, reduced the risk of harvest failure and facilitated its adoption as a staple crop by nomad farmers (Miller et al. 2016). According to historical records, broomcorn millet was a staple cereal of northern China till the end of Han dynasty (220 CE) (Su et al. 2012). Archaeological evidence and genetic analyses also suggest that broomcorn millet spread across the continent of Eurasia early in human history. By the second half of the first millennium BC, the crop had spread across the arid zones of Central Asia, onto the Iranian Plateau, southwest Asia, and across Europe, where it became an important crop in the Mediterranean during the first century AD

1

The Broomcorn Millet Genome


Table 1.1 Selected germplasm collections of broomcorn millet

| Country | Number of accessions | Germplasm conservation institute and web site |
|---|---|---|
| China | 9885 (a) | Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, http://www.cgris.net |
| Russian Federation | 9019 (b) | N.I. Vavilov Research Institute of Plant Industry, http://db.vir.nw.ru/virdb/ |
| Ukraine | 4868 (b) | Ustymivka Experimental Station of Plant Production, http://www.yuriev.com.ua |
| Ukraine | 1368 (b) | Institute of Plant Production n.a. V.Y. Yurjev of UAAS, http://www.yuriev.com.ua |
| India | 1039 (b) | National Bureau of Plant Genetic Resources, http://www.nbpgr.ernet.in |
| India | 849 (b) | International Crop Research Institute for the Semi-Arid Tropics, https://www.cgiar.org/research/center/icrisat/ |
| USA | 722 (b) | North Central Regional Plant Introduction Station, USDA-ARS, NCRPIS, https://npgsweb.ars-grin.gov/gringlobal/search |
| Japan | 503 (b) | NARO Genebank, https://www.gene.affrc.go.jp/databases_en.php |
| Bulgaria | 497 (b) | Institute for Plant Genetic Resources ‘K.Malkov’, http://ipgrbg.com/en/ |
| Republic of Korea | 496 (c) | Gene Bank, Rural Development Administration, http://genebank.rda.go.kr |

(a) Retrieved from the book ‘China broomcorn millet germplasm resource research’ (Wang and Wang 2018)
(b) Numbers were retrieved from the latest records (year: 2019) in the FAO WIEWS information system (http://www.fao.org/wiews/en/). Additional countries that hold fewer accessions are available from WIEWS
(c) Based on information from the Gene Bank of the Rural Development Administration (RDA), the Republic of Korea

(Miller et al. 2016; Hunt et al. 2018). Broomcorn millet played a major role in the diet and culture of peoples across Europe and Asia by the first millennium BC (Miller et al. 2016). Compared to other millets, broomcorn millet is also better adapted to high altitudes. It was an important crop on the Tibetan Plateau until its replacement by wheat and barley.

The advantages of broomcorn millet as a crop for dryland agriculture diminished as irrigation became widespread. With proper irrigation, the average yield of wheat was about four times that of broomcorn millet in the Warring States period (475–221 BC) of China (Bei and Zhao 2010), despite wheat being better adapted to the Mediterranean climate. In addition, the development of winter wheat helped relieve food shortages in the spring or after natural disasters. With the adoption of water-driven mills and other technologies, wheat was ground into flour, which improved its taste, and it gradually completed the transformation from a minor crop to a staple grain. However, broomcorn millet remained an important crop in semi-arid areas of China, where irrigation systems are not well developed.

The yield gap between broomcorn millet and wheat widened after the ‘green revolution’ (GR), which introduced the semi-dwarf trait into modern varieties of rice and wheat together with land intensification measures. The GR resulted in a spectacular increase in crop productivity (Hedden 2003). Between 1960 and 2000, the average yield in developing countries rose by 208% for wheat, 109% for rice, and 157% for maize (Pingali 2012). In contrast, the average yield of broomcorn millet in China increased from ~0.67 t/ha in the 1950s to 1.07 t/ha in the 1990s (Wang and Wang 2018). This number is still low compared to main crops such as wheat (3.5 t/ha), rice (4.6 t/ha), and maize (5.8 t/ha) (FAOSTAT 2020). In China, the area of broomcorn millet cultivation has mainly been lost to maize, which, in addition to its high yield potential, is easier to manage thanks to herbicide resistance and other traits (Dwivedi et al. 2012). Although broomcorn millet benefited little from the GR, some cultivars have high yield potential. A record yield of 4.8 t/ha was achieved in the experimental field of Gannan for the cultivar ‘Qishu 1’ in 2019 (Dong et al. 2020). It is conceivable that the average yield of broomcorn millet could be improved quickly if superior traits such as semi-dwarfism or herbicide resistance were introduced.

Another challenge for broomcorn millet is market acceptance. In China, broomcorn millet-derived food is popular only in certain northern provinces such as Shanxi. Although broomcorn millet is mainly consumed as food in Asia and Europe, it is mostly considered bird feed in the United States (Das et al. 2019). In recent years, broomcorn millet has become popular in high-end restaurants in the United States as an ingredient of artisan bread or salads, but it occupies only a small niche of the food market. The declining cultivation area of broomcorn millet has also resulted in limited research efforts on this crop, which further hinders its incorporation into the current food system and breeding regimes (Bekkering and Tian 2019).

1.2.5 Qualities and Values of Broomcorn Millet

The grains of broomcorn millet are highly nutritious, containing more protein, dietary fiber, folate, several minerals, and phenolic compounds than the main cereals. Protein constitutes 11.0–12.5% of the grain dry weight, about 56% and 17% higher than in rice and maize, respectively (Saleh et al. 2013; Bekkering and Tian 2019). Broomcorn millet is rich in certain (but not all) essential amino acids, such as

L. Li and H. Zhang

leucine (Marti and Tyl 2021). In addition, broomcorn millet is gluten-free, making it suitable for consumption by people with celiac disease. The carbohydrate content of broomcorn millet is slightly lower than that of the main cereals (Saleh et al. 2013; Bekkering and Tian 2019). Carbohydrates exist mainly in the form of starch, whose quality divides broomcorn millet into waxy and non-waxy types, named Shu (黍) and Ji (稷), respectively, in China (Wang et al. 2010). The waxy varieties, whose grains are more glutinous and are used for making many types of local delicacies, are mainly cultivated in East Asia (Das et al. 2019). As in rice, waxy varieties of broomcorn millet contain little amylose, the unbranched form of starch, which is synthesized by granule-bound starch synthase I (GBSSI). The dietary fiber level is around 8.5% in broomcorn millet, which is 215% higher than in wheat, 16% higher than in maize, and 554% higher than in rice (Bekkering and Tian 2019). The general recommendation for adequate dietary fiber intake is 25.2 g/day for adult women (age 31–50) and 30.8 g/day for adult men (age 31–50) (Dietary Guidelines Advisory Committee 2015), while the average fiber intake of US children and adults is less than half of the recommended level (Anderson et al. 2009). Broomcorn millet can therefore be used for dietary fiber supplementation. Broomcorn millet is also a rich source of folate (vitamin B9). Folate deficiency is an important and underestimated form of micronutrient malnutrition that affects billions of people worldwide and can cause many diseases (Blancquaert et al. 2014). Broomcorn millet contains 85 µg of folate per 100 g of grain, which is 2.3- to 9.6-fold higher than the content in wheat, maize, or rice (Bekkering and Tian 2019). Likewise, the iron and phosphorus contents of broomcorn millet are higher than those of wheat, maize, and rice (Bekkering and Tian 2019).
Considering these benefits, broomcorn millet is a valuable crop for diversifying nutritional intake.

Several characteristics of broomcorn millet, including high water-use efficiency, resistance to drought and heat stress, and a short growing season, give it unique advantages under the general trend of global climate change. Broomcorn millet is highly adapted to dryland cropping systems with rain and high temperature in the same season. Compared to wheat, corn, and sorghum, broomcorn millet requires a lower soil water content for seed germination and has a lower transpiration rate. With as little as 200–400 mm of rainfall during the whole growing season, broomcorn millet can produce a significant yield. This amount is several fold lower than that required by wheat or rice (Das et al. 2019). Traditionally, broomcorn millet has played an important role in stabilizing food production in semi-arid areas of northern China. A similar role in global food security can be envisaged as drought and hot weather become more frequent due to global climate change (Battisti and Naylor 2009; Dai 2013). In addition to drought and high temperature, certain broomcorn millet varieties also tolerate acidic soil, salinity (up to 17 dS/m), and high altitudes (Das et al. 2019; Yuan et al. 2019). This makes broomcorn millet a suitable crop for marginal lands, where cultivation of the main crops is usually economically unsustainable. Broomcorn millet may be used as a summer rotation crop with wheat or other winter crops. Compared to summer fallow, planting broomcorn millet improves weed control, reduces disease pressure, and results in better yield (Baltensperge et al. 1995; Huang et al. 2003). Thanks to its shallow root system, broomcorn millet also helps prevent soil degradation and preserves soil moisture for deep-rooted crops (Huang et al. 2003; Felter et al. 2006; Wang et al. 2011).

1.3 Genome Sequencing

1.3.1 Genome Assembly

Broomcorn millet (Panicum miliaceum L.) is an allotetraploid (2n = 4x = 36) with a basic chromosome number of x = 9 (Aliscioni et al. 2003). The genus Panicum sensu stricto contains ~100 species, many of which are polyploid. Widespread reticulate evolution within this genus makes resolution of the polyploid history of broomcorn millet difficult. Phylogenetic analyses of 5 low-copy nuclear gene sequences from selected species of Panicum sensu stricto suggested that broomcorn millet has a genomic composition of FFHH (Triplett et al. 2012). Cytogenetic evidence further supported the inference that half of the broomcorn millet genome shares homology with P. capillare, which was predicted to have an F genome, and that the other half is shared with part of the genome of P. repens, another allotetraploid species from the same genus (Hunt et al. 2014). The genome size of broomcorn millet was estimated to be ~923 megabases (Mb) (Zou et al. 2019) and ~887.8 Mb (Shi et al. 2019) using k-mer-based methods, both of which are close to the genome size estimated through flow cytometry (C = 1.04 pg, i.e., ~963 Mb) (Kubešová et al. 2010). The genome of broomcorn millet is about twice the size of that of foxtail millet (Setaria italica), a close diploid relative with a genome size of ~485 Mb (Bennetzen et al. 2012; Zhang et al. 2012).

To date, three reports on the genome of broomcorn millet have been published (Table 1.2). The varieties used for genome sequencing include two from China (0390 and Longmi4) and one from the United States (Huntsman). The accession ‘0390’ is a landrace collected from Jilin Province of China (Zou et al. 2019); ‘Longmi4’ is a cultivar widely cultivated in the northern region of China (Shi et al. 2019); ‘Huntsman’ is a cultivar widely grown in the western United States (Ott et al. 2018). The Huntsman assembly used a linked-read library and Illumina short-read sequencing (Ott et al. 2018). The linked-read technique developed by 10x Genomics (Pleasanton, CA, USA) uses microfluidics and emulsion PCR-based barcoding to reduce the complexity of genomes. This could in principle improve short-read-based genome assembly and haplotyping.
In short, ~366 Gb (~380× genome coverage) of paired-end reads were sequenced and assembled using the Supernova assembler, which resulted in 30,819 scaffolds with an L50 of 237 scaffolds and an N50 of 912 kb (Ott et al. 2018). In comparison, the ‘0390’ and ‘Longmi4’ genomes are more contiguous and more complete than the ‘Huntsman’ genome thanks to the use of long-read single-molecule real-time (SMRT) sequencing. This is exemplified by the higher contig N50 values and the larger assembled genome sizes (Table 1.2). The use of Hi-C and/or the high-density genetic map also allowed the majority of contigs from the two assemblies to be anchored into pseudochromosomes. The average, maximum, and minimum lengths of the 0390 and Longmi4 pseudochromosomes are quite similar (45.7, 66.9, and 32.2 Mb versus 46.6, 69.2, and 32.2 Mb), indicating that the two genomes are similar in size (Table 1.2).

To generate the reference genome of 0390 (Fig. 1.2), Zou et al. (2019) applied a multi-technology strategy, including SMRT sequencing (PacBio), short-read sequencing (Illumina), genetic mapping, and a high-throughput chromosome conformation capture (Hi-C) assay. First, ~87× coverage of SMRT sequencing data with a mean subread length of 6.5 kb was assembled using Canu (Koren et al. 2017), and the draft was then error-corrected with Pilon (Walker et al. 2014) using 250-bp paired-end sequences from a PCR-free library. The resulting assembly (Pm_0390_v0.1) consisted of 5541 contigs with an L50 of 423 contigs and an N50 of 369 kb. The consensus error rate was estimated to be ~0.004% (1 error per ~25 kb). To anchor the v0.1 contigs into pseudomolecules (also called pseudochromosomes), a high-density genetic map was constructed based on whole-genome resequencing of an F6 population of recombinant inbred lines (RILs) containing 132 individuals. A total of 4,146 contigs from Pm_0390_v0.1 were anchored onto the genetic map, which consisted of 18 linkage groups and 221,787 SNP markers (more details in Sect. 1.3.2). Furthermore, the contigs could also be arranged into 18 groups based on the spatial relationships deduced from the Hi-C library. The position information from the genetic map and Hi-C was integrated with ALLMAPS (Tang et al. 2015), and a total of 4,250 contigs were anchored and oriented in 18 pseudochromosomes with a total length of 822 Mb. The remaining 1291 unassigned contigs accounted for only 3.9% of the length of the final assembly (Pm_0390_v1), which has a total length of 855 Mb, ~92.6% of the estimated genome size (~923 Mb). The quality of the assembly was assessed through comparison with fosmid sequences. Ten clones from a newly constructed fosmid library were randomly selected for PacBio sequencing with a mean coverage of ~1000×. The de novo assembled fosmid sequences ranged from 24 to 46 kb in length and could be aligned to the final assembly with 99.53–100% identity and no structural discrepancies.

Shi et al. (2019) used a slightly different strategy for assembling the reference genome of Longmi4. Raw data were generated using PacBio

Table 1.2 Three reported genome assemblies of broomcorn millet

| Name | 0390 | Longmi4 | Huntsman |
|---|---|---|---|
| Accession number | CGRIS (00000390) | CGRIS (GS04001-1993) | USDA (PI 578074) |
| Scaffold length | 855 Mb | 848 Mb | 823 Mb |
| Longest scaffold | 5.2 Mb (contig) | 22.6 Mb | 5.6 Mb |
| Scaffold number | 1309 (contig) | 905 | 30,819 |
| Scaffold N50 | 0.4 Mb (contig) | 8.2 Mb | 912 kb |
| Scaffold L50 | 422 (contig) | – | 237 |
| Pseudochromosome number | 18 | 18 | – |
| Pseudochromosome length (bp) | 822,124,240 | 838,873,930 | – |
| References | (Zou et al. 2019) | (Shi et al. 2019) | (Ott et al. 2018) |
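
The N50 and L50 statistics reported in Table 1.2 follow a simple definition: sort the sequences from longest to shortest and accumulate their lengths until at least half of the total assembly length is reached; N50 is the length of the sequence at which this happens, and L50 is the number of sequences needed. The following is a minimal sketch of that calculation. The function name and the toy lengths are illustrative assumptions and are not taken from any of the three assemblies.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Hypothetical toy assembly (lengths in kb), not real millet data.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_lengths)
print(f"N50 = {n50} kb, L50 = {l50} sequences")
```

A higher N50 together with a lower L50 indicates a more contiguous assembly, which is how the three assemblies in Table 1.2 are compared in the text.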


Fig. 1.2 Synteny and distribution of features in the broomcorn millet genome. The number and length (Mb) of pseudochromosomes are indicated outside of the ring. a TE coverage, b gene density, c average transcript levels, d marker density represented by the number of SNPs, and e GC (guanine-cytosine) content of the genome in 1-Mb nonoverlapping windows. f Synteny blocks >1 Mb long among homologous broomcorn millet chromosomes are indicated. TE transposable elements, SNP single nucleotide polymorphism. This figure is credited to Zou et al. (2019) without any changes

sequencing, Illumina sequencing, BioNano optical mapping, and Hi-C. First, ~170× coverage of PacBio reads was self-corrected and assembled using Falcon (Chin et al. 2016) to generate the raw contigs, which were polished with both PacBio and Illumina reads to generate 1262 consensus contigs (~839.0 Mb) with an N50 of ~2.58 Mb. They next used ~235× BioNano optical maps to resolve conflicts in the original contigs and anchor the resolved 1,308 contigs into 905 scaffolds (~848.4 Mb), which have an N50 of ~8.24 Mb. Finally, 444 scaffolds that cover ~98.9% of the total assembly length were clustered and oriented into 18 pseudochromosomes based on ~140× coverage of Hi-C sequencing data. The final assembly contains 475 scaffolds (including the 18 pseudochromosomes) that cover ~95.6% of the estimated genome size (887.8 Mb).
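
The coverage and completeness figures quoted in this section are simple ratios, and a short sketch makes it easy to check them. The helper names are illustrative. Two readings are assumptions on our part: that the ~380× depth for Huntsman is relative to the flow-cytometry estimate of ~963 Mb, and that the unassigned fraction of Pm_0390_v1 is the difference between the total (855 Mb) and pseudochromosome (822 Mb) lengths.

```python
MB = 1e6
GB = 1e9


def fold_coverage(sequenced_bases, genome_size_bases):
    """Sequencing depth: bases sequenced divided by genome size."""
    return sequenced_bases / genome_size_bases


def percent_of(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


# Huntsman: ~366 Gb of reads against an assumed ~963 Mb genome.
huntsman_depth = fold_coverage(366 * GB, 963 * MB)

# 0390: 855 Mb assembled out of an estimated ~923 Mb genome.
pm0390_completeness = percent_of(855, 923)

# 0390: assumed unassigned portion = total minus pseudochromosome length.
pm0390_unassigned = percent_of(855 - 822, 855)

# Longmi4: ~848.4 Mb assembled out of an estimated 887.8 Mb genome.
longmi4_completeness = percent_of(848.4, 887.8)

print(f"Huntsman depth: ~{huntsman_depth:.0f}x")
print(f"0390 completeness: ~{pm0390_completeness:.1f}%")
print(f"0390 unassigned: ~{pm0390_unassigned:.1f}%")
print(f"Longmi4 completeness: ~{longmi4_completeness:.1f}%")
```

Under these assumptions the ratios reproduce the rounded values given in the text (~380×, ~92.6%, 3.9%, and ~95.6%).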

1.3.2 Genetic Linkage Map

Genetic linkage maps are essential resources not only for anchoring contigs/scaffolds into pseudochromosomes but also for quantitative trait locus (QTL) mapping, molecular breeding, and marker-assisted selection. The first published genetic linkage map of broomcorn millet was generated through genotyping-by-sequencing of 93 F6 recombinant inbred lines (RILs) derived from a single F1 plant of ‘Huntsman’ (PI 578074) × ‘Minsum’ (PI 649385) (Rajput et al. 2016). After data analysis, 833 single nucleotide polymorphism (SNP) markers formed 18 major and 84 minor linkage groups. The 18 major linkage groups contain 117 high-quality SNP markers and span a total length of 2137 centimorgans (cM). To obtain a high-quality genetic linkage map, Zou et al. (2019) performed whole-genome resequencing of an F6 RIL population with 132 individuals. Genomes of the parental lines, ‘Huntsman’ (PI 578074) and ‘Irtyskoe-201’ (PI 476398), were also sequenced to ~30× coverage. A total of 1.36 Tb of clean data were generated for the 134 samples, with an average sequencing depth of 11×. After read alignment and variant calling, a total of 221,787 high-quality SNP markers were used for constructing the genetic map, which contains 18 linkage groups and spans a total length of 3092 cM and 2811 cM for the male and female map, respectively. Given the estimated genome size of ~923 Mb, the whole-genome average ratio of physical size to genetic size in broomcorn millet is ~313 kb/cM, a number very close to that of rice (Kurata et al. 2002).
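
The ~313 kb/cM figure can be reproduced from the numbers above. The sketch below assumes that the ratio was obtained by dividing the estimated genome size by the mean of the male and female map lengths; this averaging step is our reading and is not stated explicitly in the text. The function name is illustrative.

```python
def kb_per_centimorgan(genome_size_mb, map_lengths_cm):
    """Average physical distance (kb) per unit of genetic distance (cM)."""
    mean_map_length = sum(map_lengths_cm) / len(map_lengths_cm)
    return genome_size_mb * 1000 / mean_map_length


# ~923 Mb genome; 3092 cM and 2811 cM are the two reported map lengths.
ratio = kb_per_centimorgan(923, [3092, 2811])
print(f"~{ratio:.0f} kb/cM")
```

A larger ratio means less recombination per unit of physical distance, which matters when estimating how finely a QTL interval can be resolved in a mapping population.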


1.3.3 Genome Annotation

The teams behind the 0390 and Longmi4 assemblies used similar strategies to annotate repetitive sequences and gene models in the final assembly (Shi et al. 2019; Zou et al. 2019). Although different software was sometimes used, four general steps were performed for genome annotation: first, repetitive sequences including transposable elements (TEs) were annotated and masked before gene model prediction; second, one or more of three complementary approaches (ab initio prediction, homology-based prediction, and transcriptome-guided prediction) were used to produce intermediate gene models; third, the final gene models were generated by merging the intermediate gene models using published software or in-house scripts; finally, the functions of the protein-coding genes were annotated. For both assemblies, more than half of the genomic sequence (~58.2% for 0390 and ~54.1% for Longmi4) was identified as repeats, of which more than 90% are TEs. As in other plant genomes, long terminal repeat (LTR) retrotransposons account for the majority (>80%) of TEs. The 0390 and Longmi4 annotations reported 55,930 and 63,671 protein-coding genes, respectively. In addition, 339 microRNAs, 1420 transfer RNAs, 1640 ribosomal RNAs, and 2302 small nuclear RNAs were identified in the 0390 genome. The coverage of the gene space was evaluated using Benchmarking Universal Single-Copy Orthologs (BUSCO) (Seppey et al. 2019). Both genomes contain 1411 intact orthologous genes of the 1440 single-copy orthologs from BUSCO v2, indicating high completeness of the gene model prediction. Although the 0390 genome has a smaller number of gene models, the average lengths of its protein-coding genes (3260 versus 2883 bp) and coding sequences (1172 versus 1023 bp) are slightly larger than those of the Longmi4 genome. Overall, the number of predicted protein-coding genes is about twice that of foxtail millet (Bennetzen et al. 2012; Zhang et al. 2012).
Functional annotation of gene models is mainly based on homology searches against public functional databases, such as InterPro, Gene Ontology, KEGG, KOG, SwissProt, and TrEMBL. The results from these searches were concatenated. As expected, conserved protein domains can be identified in a high percentage of the gene models.

1.3.4 Evolutionary History of Broomcorn Millet

Despite being an ancient crop, broomcorn millet has a largely unknown evolutionary history. The two diploid ancestors that hybridized to form this species have yet to be identified, and the wild ancestor of the crop also remains controversial. The genome sequences provide a reference for answering these questions in the future. Assuming a constant random mutation rate over time, the divergence time between two related species can be inferred from the synonymous substitution rate of orthologous gene pairs (i.e., single-copy orthologs in each genome). Using this approach, the divergence time between foxtail millet and broomcorn millet was estimated to be ~18 million years ago (MYA) (Zou et al. 2019). This number differs from the ~13.1 MYA estimated by Shi et al. (2019). The discrepancy may be due to the different reference mutation rates assumed. The same method can be used to compare the two subgenomes of broomcorn millet, whose divergence was estimated to have occurred ~5.8 MYA (Shi et al. 2019; Zou et al. 2019). The tetraploidization event of broomcorn millet should therefore have happened less than 5.8 MYA, if the mutation rate remained unaffected after whole-genome duplication.

Broomcorn millet is one of the earliest domesticated crops. The most important evidence supporting this statement was found at the Neolithic site of Cishan, which is located near the junction of the Loess Plateau and the North China Plain. The husk phytoliths and biomolecular components of broomcorn millet discovered there were dated to ca. 10,300 and ca. 8700 calibrated years before present (cal yr BP) (Lu et al. 2009). Another important discovery was made at the Dadiwan site in northwest China, where broomcorn millet remains were dated to 7900–7200 cal yr BP (Barton et al. 2009). In addition, there is much other archaeobotanical evidence of broomcorn millet in China, which is widely distributed and dated (Tsang et al. 2017). Miller et al. (2016) proposed that broomcorn millet spread across Europe and other parts of the continent through trade routes along the mountain valleys of Central Eurasia. There might have been two expansion waves of broomcorn millet from east Eurasia to west Eurasia (Miller et al. 2016). By the first millennium BC, the adoption of broomcorn millet was influenced by climate, irrigation technology, and the social context of agricultural production. The second wave, during the second half of the first millennium BC, was due to a change in the economic role of the crop to a summer rotation crop in complex crop rotation systems designed to maximize the productivity of land (Miller et al. 2016). These results support the hypothesis that broomcorn millet originated and was domesticated in northern China. However, archaeological broomcorn millet samples from as early as the sixth millennium BC were also reported in Eastern Europe, although the dating method was questioned (Motuzaite-Matuzeviciute et al. 2013). Therefore, whether there was a second domestication center in Eastern Europe remains an open question. The center of origin of a crop usually has the highest level of genetic diversity. Genetic diversity analyses using microsatellites and granule-bound starch synthase I (GBSSI) sequences from 341 landraces across Eurasia suggest that the expansion of broomcorn millet originated in western China and that the western Loess Plateau is a region of primary domestication of broomcorn millet (Hunt et al. 2018). In the future, genome resequencing of worldwide varieties as well as archaeobotanical samples will provide more definitive answers to this question.
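
The divergence-time estimates discussed earlier in this section rest on a molecular-clock calculation. A common form, which we assume here for illustration, is T = Ks / (2r), where Ks is the number of synonymous substitutions per synonymous site between orthologs and r is the substitution rate per site per year. The sketch below shows how the same Ks yields different dates under different assumed rates. The Ks value and both rates are hypothetical and are not the values used by Zou et al. (2019) or Shi et al. (2019).

```python
def divergence_time_mya(ks, rate_per_site_per_year):
    """Molecular-clock estimate T = Ks / (2r), in millions of years."""
    if rate_per_site_per_year <= 0:
        raise ValueError("substitution rate must be positive")
    # The factor 2 accounts for substitutions accumulating on both lineages.
    return ks / (2 * rate_per_site_per_year) / 1e6


# Hypothetical inputs chosen only to illustrate sensitivity to the rate.
example_ks = 0.2
for rate in (5e-9, 7e-9):
    t = divergence_time_mya(example_ks, rate)
    print(f"rate = {rate:.1e} -> ~{t:.1f} MYA")
```

Because T is inversely proportional to r, a modest difference in the assumed rate is enough to shift an estimate by several million years, which is consistent with the explanation offered for the two published dates.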

1.3.5 Comparative Genomics with Other Crops

With high-quality assembly and annotation of the genome, extended synteny (evolutionarily preserved order of genes on the chromosome) was detected between broomcorn millet and foxtail millet (Fig. 1.3a), as well as other grass species (Fig. 1.3b). The chromosomes of broomcorn millet have a 2-to-1 syntenic relationship to the chromosomes of foxtail millet (Shi et al. 2019; Zou et al. 2019). The 0390 and Longmi4 genomes are not fully consistent at the chromosome level (Fig. 1.3c). This may be due to the different techniques used to cluster and orient contigs/scaffolds. However, the two assemblies consistently identified two intrachromosomal inversions on the two chromosomes homologous to Chr5 in foxtail millet (Shi et al. 2019; Zou et al. 2019). Further studies are needed to distinguish assembly errors from large structural variations. In addition, the fusion of Chr8 and Chr9 of the sorghum genome seems to correspond to Chr3 in foxtail millet, and to Chr5 and Chr6 in broomcorn millet; this explains the difference in the basic chromosome number between broomcorn millet and sorghum.

Based on ortholog analysis with six other grass species, 47,142 broomcorn millet genes were assigned to 20,374 orthologous groups (OGs), the majority of which (59.2%) contain two genes (Zou et al. 2019). This proportion of 2-copy gene families is high compared to other crops with polyploid genomes. For example, in hexaploid wheat, 2-copy and 3-copy gene families represent only 19.8% and 18.0% of the total number, respectively. Most members of the 2-copy families of broomcorn millet are located in blocks syntenic with foxtail millet, confirming that the expansion in 2-copy gene families is primarily due to tetraploidization. Gene duplication is usually followed by functional diversification, which may be reflected in changes in gene expression patterns. Shannon entropy was proposed to quantify the uniformity of gene expression patterns (Martinez and Reyes-Valdes 2008). We thus calculated the distribution of Shannon entropy for genes from OGs of different sizes using the organ-specific transcriptome data generated for genome annotation.
In general, genes from high-copy families have low Shannon entropy values and more organ-specific expression patterns. The only exception is the genes from 2-copy families, which had the highest entropy values and the most uniform expression among different organs. These genes also have the highest median expression levels. Gene Ontology (GO) enrichment analysis of 2-copy gene families revealed enrichment of genes involved in transcriptional regulation, RNA processing, DNA repair, and developmental regulation. These results seem to indicate that the tetraploidization event of broomcorn millet happened relatively recently.

By comparing broomcorn millet to four other grass species (foxtail millet, maize, sorghum, and rice), genes unique to broomcorn millet can be identified. Not surprisingly, most gene families (74.5%) in broomcorn millet were shared among all five species, while only 4.2% (862) of the gene families are specific to broomcorn millet (Zou et al. 2019). GO enrichment analysis indicated that genes involved in protein phosphorylation and protein–protein interactions were significantly over-represented. Among the OGs that are significantly larger than in other grass species, several were annotated as F-box proteins. Further investigation indicated that almost all the members contained the BTB (broad complex/tramtrack/bric-a-brac) domain. BTB-domain-containing proteins usually form a ubiquitin E3 ligase complex with two other proteins, CUL3 (cullin-3) and RBX1 (RING-box protein 1), in which the BTB protein functions in substrate recognition. To understand the significance of this finding, gene models of BTB proteins from broomcorn millet, foxtail millet, rice, and Arabidopsis were identified and/or manually curated. Then, the predicted protein sequences were submitted to the SMART (Simple Modular Architecture Research Tool) database (Letunic et al. 2021) for domain identification and annotation. This allowed the identification of 247 BTB-domain-containing proteins in the broomcorn millet genome, which can be further divided into over 12 subgroups based on domain architecture. The top 3 subgroups (MATH-BTB-BACK, MATH-BTB, BTB-BACK) contain over half of the BTB genes (Fig. 1.4a). Several subgroups, including MATH-BTB-BACK, BTB-BACK, BTB-NPH3, and BTB-TPR, contain at least twice as many genes in broomcorn millet as in foxtail millet or rice.
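
The Shannon entropy measure used above to summarize expression uniformity can be sketched as follows. For one gene, the expression levels across organs are converted to proportions p_i, and the entropy is H = -Σ p_i log2(p_i). A gene expressed evenly across organs reaches the maximum value of log2(number of organs), while a gene expressed in a single organ scores 0. The use of base 2, the function name, and the toy values are assumptions for illustration; this is not the authors' script.

```python
import math


def expression_entropy(levels):
    """Shannon entropy (bits) of one gene's expression across organs."""
    if any(x < 0 for x in levels):
        raise ValueError("expression levels must be non-negative")
    total = sum(levels)
    if total <= 0:
        raise ValueError("gene has no detectable expression")
    entropy = 0.0
    for x in levels:
        if x > 0:  # zero-expression organs contribute nothing
            p = x / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical expression values in four organs (arbitrary units).
uniform_gene = [10, 10, 10, 10]
organ_specific_gene = [100, 0, 0, 0]
print(expression_entropy(uniform_gene))         # 2.0 bits, the maximum for 4 organs
print(expression_entropy(organ_specific_gene))  # 0.0 bits
```

In this framing, the high entropy values of genes from 2-copy families correspond to expression that is spread evenly across organs, while low values in high-copy families correspond to organ-specific expression.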


Fig. 1.3 Dot plots showing the syntenic relationship between a broomcorn millet and foxtail millet, b broomcorn millet and sorghum, c broomcorn millet ‘0390’ and broomcorn millet ‘Longmi4’. Each dot represents a synteny block containing at least 4 pairs of genes

Further phylogenetic analyses (Fig. 1.4b) using only the BTB-domain sequences from broomcorn millet indicated a complex phylogenetic relationship among BTB proteins, although proteins with the same domain arrangement usually cluster into the same large clade. Further analyses are needed to determine the number of BTB genes in other species of Panicum or Paniceae. Considering the critical role of the ubiquitin-proteasome system in regulating protein dynamics and numerous biological processes, the BTB family represents an attractive subject for studying its potential role in the adaptive evolution of this species.

Fig. 1.4 a Gene copy number and domain architecture of BTB proteins in broomcorn millet (Pm), foxtail millet (Si), rice (Os), and Arabidopsis thaliana (At). Gene copy numbers that are at least twofold higher in broomcorn millet (P. miliaceum) than in other species are labeled red. BTB: broad complex/tramtrack/bric-a-brac; NPH3: nonphototropic hypocotyl 3; MATH: meprin-and-TRAF homology; BACK: BTB and C-terminal Kelch; TAZ: Transcription Adaptor putative Zinc finger; TPR: Tetratricopeptide repeat; F5_F8 type C: discoidin domain. b Phylogenetic tree of BTB proteins in broomcorn millet

1.3.6 Genes Involved in C4 Photosynthesis

Broomcorn millet was traditionally used as a model plant to study C4 biology. In fact, it was one of the most frequently used species for the purification and characterization of C4-related enzymes. C4 plants operate more efficiently under low concentrations of CO2, which gives them a competitive advantage under hot and dry conditions that usually reduce stomatal opening and internal CO2 availability. Panicum is traditionally categorized as the NAD-ME subtype of C4 plants, named after the principal decarboxylation enzyme, NAD-dependent malic enzyme. The other two subtypes are NADP-ME (NADP-dependent malic enzyme) and PEP-CK (phosphoenolpyruvate carboxykinase). It was reported that NAD-ME C4 grasses in general have higher water-use efficiency (WUE) than their NADP-ME relatives (Ghannoum et al. 2002), but the underlying mechanism is unknown. We analyzed the main genes involved in C4 carbon fixation, including enzymes and metabolite transporters, and found that all the homologs are in syntenic regions conserved with other grass species such as rice. Because most of these enzymes also have homologs that play a housekeeping role in both C4 and C3 species, we inferred their involvement in C4 carbon fixation in broomcorn millet based on their preferential expression in photosynthetic tissues/organs. As expected, we identified two NAD-ME genes that are highly expressed in photosynthetic tissues such as leaves and stems, but not in roots or seeds. To our surprise, enzymes specific to the NADP-ME subtype, such as NADP-ME and NADP-MDH, were also more highly expressed in photosynthetic tissues. Two NADP-ME genes (PM03G37230 and PM08G02950) were expressed at higher levels than the two C4 NAD-MEs in seedlings. While future biochemical experiments are needed to validate the role of the candidate NADP-MEs in C4 carbon fixation, current results suggest a mixed C4 model that operates with all the components of the traditional NAD-ME and NADP-ME subtypes. The flexible use of different decarboxylation mechanisms under different conditions or developmental stages may help broomcorn millet adapt better to an ever-changing environment.
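
The inference described above, flagging C4 candidates by their preferential expression in photosynthetic organs, can be illustrated with a simple ratio. The sketch below compares mean expression in leaves and stems with mean expression in roots and seeds and keeps genes above a threshold. The organ grouping follows the text, but the pseudocount, the threshold, the gene labels, and all expression values are hypothetical; this is an illustration and not the analysis actually performed.

```python
import math

PHOTOSYNTHETIC = ("leaf", "stem")
NON_PHOTOSYNTHETIC = ("root", "seed")


def photosynthetic_log2_ratio(expression, pseudocount=1.0):
    """log2 ratio of mean expression in photosynthetic vs other organs."""
    photo = sum(expression[o] for o in PHOTOSYNTHETIC) / len(PHOTOSYNTHETIC)
    other = sum(expression[o] for o in NON_PHOTOSYNTHETIC) / len(NON_PHOTOSYNTHETIC)
    # The pseudocount avoids division by zero for unexpressed genes.
    return math.log2((photo + pseudocount) / (other + pseudocount))


def c4_candidates(table, min_log2_ratio=2.0):
    """Names of genes preferentially expressed in photosynthetic organs."""
    return sorted(
        gene for gene, expr in table.items()
        if photosynthetic_log2_ratio(expr) >= min_log2_ratio
    )


# Hypothetical genes and expression values (arbitrary units).
toy_table = {
    "geneA": {"leaf": 255.0, "stem": 255.0, "root": 0.0, "seed": 0.0},
    "geneB": {"leaf": 20.0, "stem": 20.0, "root": 20.0, "seed": 20.0},
}
print(c4_candidates(toy_table))  # ['geneA']
```

Here geneA behaves like a C4-recruited isoform, while geneB, with uniform expression, resembles a housekeeping homolog. As the text notes, expression evidence of this kind only nominates candidates, and biochemical validation is still required.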

1.4 Future Goals and Prospects

As we have seen with model organisms, high-quality reference genomes will lay the foundation for molecular breeding and functional genomics studies for broomcorn millet. Although low to medium levels of genetic diversity were reported for broomcorn millet based on a small number of genes, field observations show great morphological and physiological diversity (Fig. 1.1) (Yuan et al. 2019). With the large number of germplasms in public seedbanks, the genetic bases of yield and grain quality of broomcorn millet can be elucidated at an acceptable cost. Another direction is the identification of male sterile lines of broomcorn millet, which is critical for the development of hybrid varieties. The lack of efficient genetic transformation systems remains a technical obstacle for broomcorn millet research, especially in the era of genome editing. With these efforts, we hope that more elite varieties of this ancient crop can be generated in the near future.

Acknowledgements We thank Dr. Changsong Zou for help with the graphics. This work was supported by the National Natural Science Foundation of China (31900189 to L.L.), Science and Technology Commission of Shanghai Municipality (17391900200 to H.Z.), and Chinese Academy of Sciences (131965KYSB20190083-03 and XDB27040108 to H.Z.).

References

Aliscioni SS, Giussani LM, Zuloaga FO, Kellogg EA (2003) A molecular phylogeny of Panicum (Poaceae: Paniceae): tests of monophyly and phylogenetic placement within the Panicoideae. Am J Bot 90:796–821
Anderson JW, Baird P, Davis RH, Ferreri S, Knudtson M, Koraym A, Waters V, Williams CL (2009) Health benefits of dietary fiber. Nutr Rev 67:188–205
Baltensperger D, Lyon D, Anderson R, Holman T, Stymiest C, Shanahan J, Nelson L, DeBoer K, Hein G, Krall J (1995) Producing and marketing proso millet in the high plains. Univ of Nebraska, Lincoln
Barton L, Newsome SD, Chen FH, Wang H, Guilderson TP, Bettinger RL (2009) Agricultural origins and the isotopic identity of domestication in northern China. Proc Natl Acad Sci USA 106:5523–5528
Battisti DS, Naylor RL (2009) Historical warnings of future food insecurity with unprecedented seasonal heat. Science 323:240–244
Bei Y, Zhao Z (2010) Wheat: the power of the Qin dynasty unified the world. Chinese Heritage 1. http://www.dili360.com/ch/article/p5350c5353da5355a5355e5063.htm (in Chinese)
Bekkering CS, Tian L (2019) Thinking outside of the cereal box: breeding underutilized (pseudo) cereals for improved human nutrition. Front Genet 10:1289
Bennetzen JL, Schmutz J, Wang H, Percifield R, Hawkins J, Pontaroli AC, Estep M, Feng L, Vaughn JN, Grimwood J, Jenkins J, Barry K, Lindquist E, Hellsten U, Deshpande S, Wang X, Wu X, Mitros T, Triplett J, Yang X, Ye CY, Mauro-Herrera M, Wang L, Li P, Sharma M, Sharma R, Ronald PC, Panaud O, Kellogg EA, Brutnell TP, Doust AN, Tuskan GA, Rokhsar D, Devos KM (2012) Reference genome sequence of the model plant Setaria. Nat Biotechnol 30:555–561
Blancquaert D, De Steur H, Gellynck X, Van Der Straeten D (2014) Present and future of folate biofortification of crop plants. J Exp Bot 65:895–906
Chai Y, Gao X, Cui L (2012) Production of proso millet in China. In: Chai Y, Feng B (eds) Advances in broomcorn millet research. Proceedings of the 1st international symposium on broomcorn millet. Northwest A&F University (NWSUAF). Northwest A&F University Press, Yangling
Chin CS, Peluso P, Sedlazeck FJ, Nattestad M, Concepcion GT, Clum A, Dunn C, O’Malley R, Figueroa-Balderas R, Morales-Cruz A, Cramer GR, Delledonne M, Luo C, Ecker JR, Cantu D, Rank DR, Schatz MC (2016) Phased diploid genome assembly with single-molecule real-time sequencing. Nat Methods 13:1050–1054
Christenhusz MJM, Byng JW (2016) The number of known plants species in the world and its annual increase. Phytotaxa 261
Dai AG (2013) Increasing drought under global warming in observations and models. Nat Clim Chang 3:52–58
Das S, Khound R, Santra M, Santra D (2019) Beyond bird feed: proso millet for human health and environment. Agriculture 9
Diao X (2017) Production and genetic improvement of minor cereals in China. The Crop Journal 5:103–114
Dietary Guidelines Advisory Committee (2015) Dietary guidelines for Americans 2015–2020. Government Printing Office
Dong Y, Yan F, Li Q, Ji S, Li X, Zhao S, Wu L, Zhou C (2020) High-yield stability analyses of 13 broomcorn millet varieties. Heilongjiang Agric Sci 12–14
Dwivedi S, Upadhyaya H, Senthilvel S, Hash C, Fukunaga K, Diao X, Santra D, Baltensperger D, Prasad M (2012) Millets: genetic and genomic resources. In: Janick J (ed) Plant breeding reviews. John Wiley & Sons, Inc., Hoboken, New Jersey, pp 247–375
FAOSTAT (2020) FAO Statistics. https://www.fao.org/faostat/
Felter DG, Lyon DJ, Nielsen DC (2006) Evaluating crops for a flexible summer fallow cropping system. Agron J 98:1510–1517
Ghannoum O, von Caemmerer S, Conroy JP (2002) The effect of drought on plant water use efficiency of nine NAD-ME and nine NADP-ME Australian C-4 grasses. Funct Plant Biol 29:1337–1348
Habiyaremye C, Matanguihan JB, Guedes JD, Ganjyal GM, Whiteman MR, Kidwell KK, Murphy KM (2017) Proso millet (Panicum miliaceum L.) and its potential for cultivation in the Pacific Northwest, U.S.: a review. Front Plant Sci 7:1961
Hedden P (2003) The genes of the Green Revolution. Trends Genet 19(1):5–9. https://doi.org/10.1016/S0168-9525(02)00009-4
Huang M, Shao M, Zhang L, Li Y (2003) Water use efficiency and sustainability of different long-term crop rotation systems in the Loess Plateau of China. Soil Tillage Res 72:95–104
Hunt HV, Badakshi F, Romanova O, Howe CJ, Jones MK, Heslop-Harrison JSP (2014) Reticulate evolution in Panicum (Poaceae): the origin of tetraploid broomcorn millet P. miliaceum. J Exp Bot 65:3165–3175
Hunt HV, Rudzinski A, Jiang HE, Wang RY, Thomas MG, Jones MK (2018) Genetic evidence for a western Chinese origin of broomcorn millet (Panicum miliaceum). Holocene 28:1968–1978
Johnson M, Deshpande S, Vetriventhan M, Upadhyaya HD, Wallace JG (2019) Genome-wide population structure analyses of three minor millets: kodo millet, little millet, and proso millet. Plant Genome 12:190021
Koren S, Walenz BP, Berlin K, Miller JR, Bergman NH, Phillippy AM (2017) Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res 27:722–736
Kubešová M, Moravcova L, Suda J, Jarošík V, Pyšek P (2010) Naturalized plants have smaller genomes than their non-invading relatives: a flow cytometric analysis of the Czech alien flora. Preslia 82:81–96
Kurata N, Nonomura KI, Harushima Y (2002) Rice genome organization: the centromere and genome interactions. Ann Bot 90:427–435
Letunic I, Khedkar S, Bork P (2021) SMART: recent updates, new developments and status in 2020. Nucleic Acids Res 49:D458–D460
Liu F, Liu M, Zhao Y, Nan C, Xie X, Li S, Xu L (2017) Development trends of China’s foxtail millet and broomcorn millet industries in 2017. Agric Outlook 13:40–43 (in Chinese)
Lu H, Zhang J, Liu KB, Wu N, Li Y, Zhou K, Ye M, Zhang T, Zhang H, Yang X, Shen L, Xu D, Li Q (2009) Earliest domestication of common millet (Panicum miliaceum) in East Asia extended to 10,000 years ago. Proc Natl Acad Sci USA 106:7367–7372
Marti A, Tyl C (2021) Capitalizing on a double crop: recent advances in proso millet’s transition to a food crop. Compr Rev Food Sci Food Saf 20:819–839
Martinez O, Reyes-Valdes MH (2008) Defining diversity, specialization, and gene specificity in transcriptomes through information theory. Proc Natl Acad Sci USA 105:9709–9714
Miller NF, Spengler RN, Frachetti M (2016) Millet cultivation across Eurasia: origins, spread, and the influence of seasonal climate. Holocene 26:1566–1575
Motuzaite-Matuzeviciute G, Staff RA, Hunt HV, Liu X, Jones MK (2013) The early chronology of broomcorn millet (Panicum miliaceum) in Europe. Antiquity 87:1073–1085
NAAS-USDA (2020) Crop production. https://usda.library.cornell.edu/concern/publications/tm70mv177
Ott A, Schnable JC, Yeh CT, Wu LJ, Liu C, Hu HC, Dalgard CL, Sarkar S, Schnable PS (2018) Linked read technology for assembling large complex and polyploid genomes. BMC Genomics 19:651
Pingali PL (2012) Green revolution: impacts, limits, and the path ahead. Proc Natl Acad Sci USA 109:12302–12308
Rajput SG, Santra DK, Schnable J (2016) Mapping QTLs for morpho-agronomic traits in proso millet (Panicum miliaceum L.). Mol Breeding 36
Saleh ASM, Zhang Q, Chen J, Shen Q (2013) Millet grains: nutritional quality, processing, and potential health benefits. Compr Rev Food Sci Food Saf 12:281–295
Seppey M, Manni M, Zdobnov EM (2019) BUSCO: assessing genome assembly and annotation completeness. Methods Mol Biol 1962:227–245
Shi J, Ma X, Zhang J, Zhou Y, Liu M, Huang L, Sun S, Zhang X, Gao X, Zhan W, Li P, Wang L, Lu P, Zhao H, Song W, Lai J (2019) Chromosome conformation capture resolved near complete genome assembly of broomcorn millet. Nat Commun 10:464
Su W, Xu B, Gao J, Chai Y (2012) The origin and evolution of broomcorn millet in China. In: Chai Y, Feng B (eds) Advances in broomcorn millet research. Proceedings of the 1st international symposium on broomcorn millet. Northwest A&F University (NWSUAF). Northwest A&F University Press, Yangling, pp 17–22
Tang H, Zhang X, Miao C, Zhang J, Ming R, Schnable JC, Schnable PS, Lyons E, Lu J (2015) ALLMAPS: robust scaffold ordering based on multiple maps. Genome Biol 16:3
Triplett JK, Wang Y, Zhong J, Kellogg EA (2012) Five nuclear loci resolve the polyploid history of switchgrass (Panicum virgatum L.) and relatives. PLoS One 7:e38702
Tsang CH, Li KT, Hsu TF, Tsai YC, Fang PH, Hsing YLC (2017) Broomcorn and foxtail millet were cultivated in Taiwan about 5000 years ago. Bot Stud 58
Walker BJ, Abeel T, Shea T, Priest M, Abouelliel A, Sakthikumar S, Cuomo CA, Zeng Q, Wortman J, Young SK, Earl AM (2014) Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9:e112963
Wang L, Wang Y (2018) China broomcorn millet germplasm resource research. China Agricultural Science and Technology Press, Beijing
Wang X, Li J, Hao M (2011) Simulation of sustainable use of soil water in dry land for alfalfa-grain rotation system at Changwu arid-plateau of China. Trans CSAE 27:257–266 (in Chinese)
Wang X-Y, Wang L, Wen Q-F (2010) Textual research and standardization of Chinese name of broomcorn millet. J Plant Genet Res 11:132–138
Yuan Y, Yang Q, Dang K, Yang P, Gao J, Gao X, Wang P, Lu P, Liu M, Feng B (2019) Salt-tolerance evaluation and physiological response of salt stress of broomcorn millet (Panicum miliaceum L.). Scientia Agricultura Sinica 52:4066–4090
Zarnkow M, Keßler M, Back W, Arendt EK, Gastl M (2010) Optimisation of the mashing procedure for 100% malted proso millet (Panicum miliaceum L.) as a raw material for gluten-free beverages and beers. J Inst Brew 116:141–150
Zhang G, Liu X, Quan Z, Cheng S, Xu X, Pan S, Xie M, Zeng P, Yue Z, Wang W, Tao Y, Bian C, Han C, Xia Q, Peng X, Cao R, Yang X, Zhan D, Hu J, Zhang Y, Li H, Li H, Li N, Wang J, Wang C, Wang R, Guo T, Cai Y, Liu C, Xiang H, Shi Q, Huang P, Chen Q, Li Y, Wang J, Zhao Z, Wang J (2012) Genome sequence of foxtail millet (Setaria italica) provides insights into grass evolution and biofuel potential. Nat Biotechnol 30:549–554
Zotikov V, Sidorenko V, Bobkov S, Kotlyar A, Gurinovich S (2012) Area and production of proso millet (Panicum miliaceum L.) in Russia. In: Advances in broomcorn millet research. Proceedings of the 1st international symposium on broomcorn millet. Northwest A&F University (NWSUAF), pp 25–31
Zou C, Li L, Miki D, Li D, Tang Q, Xiao L, Rajput S, Deng P, Peng L, Jia W, Huang R, Zhang M, Sun Y, Hu J, Fu X, Schnable PS, Chang Y, Li F, Zhang H, Feng B, Zhu X, Liu R, Schnable JC, Zhu JK, Zhang H (2019) The genome of broomcorn millet. Nat Commun 10:436

2 Buckwheat Genome and Genomics

Yuqi He and Meiliang Zhou

Abstract

Buckwheat was domesticated in the southwest of China and has been cultivated for more than 4000 years. Buckwheat is adapted to harsh environments and has been used as a staple food in mountainous areas for centuries. Because of its great nutritional and medicinal value, buckwheat products have gained worldwide attention. The rich germplasm resources of buckwheat contain abundant genetic variation, which is the genetic basis for buckwheat breeding. In recent years, with the assembly of the buckwheat genome and the discovery of genome variation, candidate genes responsible for key traits have been identified. Here, we summarize recent research progress in buckwheat genomics and discuss key trait-regulating genes. The rapid development of buckwheat genome resources and molecular breeding technology may contribute to breeding buckwheat varieties with better nutritional traits, yield, and resistance in the future.

Y. He · M. Zhou (corresponding author), Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, National Crop Gene Bank Building, Zhongguancun South Street No. 12, Haidian District, Beijing 100081, China

2.1 Introduction

Buckwheat belongs to Fagopyrum, a dicotyledonous genus of the family Polygonaceae. This crop originated in the southwest of China, and its cultivation has been traced back to the first and second centuries BC (Tang et al. 2016). Because it has a short growth period and adapts to harsh environments, buckwheat is planted as a staple food on all continents, especially in mountainous areas where other major crops cannot survive well (Kumari and Chaudhary 2020). Because it is gluten-free and has a balanced essential amino acid composition as well as abundant dietary fiber, mineral elements, and secondary metabolites, such as vitamins and flavonoids, buckwheat has attracted worldwide attention (Kreft et al. 2019). It is also used as an anti-diabetic and is reported to relieve hypertension, reduce cholesterol, and protect cardiovascular and cerebrovascular health (Joshi et al. 2020). This edible and medicinal value makes buckwheat foods and drinks, such as flour, noodles, bread, and tea, increasingly popular all over the world. The most widely cultivated species used as food crops are common buckwheat (Fagopyrum esculentum Moench) and Tartary buckwheat (Fagopyrum tataricum Gaertn.). Common buckwheat is self-incompatible and relies on cross-pollination, which can result in reduced fertility and yield (Yasui et al. 2016). Conversely, Tartary buckwheat is self-pollinated, which makes its yield more stable than that of common buckwheat (Zhang et al. 2017) (Fig. 2.1). Besides its more stable yield, Tartary buckwheat has much higher vitamin B and rutin (a flavonoid) content and is considered to be a major dietary source of beneficial rutin (Zhu 2016). These properties make Tartary buckwheat more favored by health enthusiasts.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 M. A. Chapman (ed.), Underutilised Crop Genomes, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-00848-1_2

Fig. 2.1 Phenotypes of common buckwheat (Fagopyrum esculentum Moench, a-b) and Tartary buckwheat (Fagopyrum tataricum Gaertn., c-d)


2.2 Buckwheat Genome

Buckwheat is rich in germplasm resources, and the genus consists of 26 species (Joshi et al. 2020). It contains both diploid (2n = 2x = 16) and tetraploid (2n = 4x = 32) species with a base number of 8 chromosomes (Ohsako and Ohnishi 2000). The cultivated buckwheat species, common buckwheat and Tartary buckwheat, are both diploid. Fagopyrum cymosum has both diploid and tetraploid plants, and some wild buckwheat species are tetraploid. By flow cytometry, the Fagopyrum genome size is estimated to vary from ~530 Mb (F. tataricum ssp. potanini) to 1850 Mb (F. urophyllum), which is small compared with other higher plants (Nagano et al. 2000). The genomes of Tartary buckwheat and common buckwheat were estimated to be ~540 Mb and ~1340 Mb, respectively. These diverse genome resources can be used to identify novel variation and candidate genes responsible for economically important traits and provide great potential for further genetic improvement of buckwheat.

2.3 Buckwheat Genomics

Due to the rapid development of high-throughput techniques, researchers can comprehensively analyze whole genomes at low cost. Because common buckwheat is self-incompatible and cross-pollinated, its genome is more difficult to study. To reduce heterozygous genome regions, descendants of sib-mating were used for genome study (Yasui et al. 2008). Using short reads obtained from next-generation sequencing, a draft assembly of the common buckwheat genome was generated. This 1.2 Gb haploid genome was the first assembled genome in buckwheat (Yasui et al. 2016). Subsequently, to evaluate genome-wide nucleotide diversity, a genotyping-by-sequencing (GBS) dataset of 46 cultivated common buckwheat landraces from around the world was constructed (Mizuno and Yasui 2019). The nucleotide diversity of common buckwheat was comparable to that of other outcrossing plants. Phylogenetic analyses showed that cultivated common buckwheat can be divided into Asian and European groups (Fig. 2.2). The nucleotide diversity was lower in the European than in the Asian group, and low differentiation was found between the two groups, suggesting that a genetic bottleneck occurred during the dispersal of common buckwheat from Asia to Europe. This was the first study of genome-wide nucleotide diversity in common buckwheat.

Compared with common buckwheat, genome sequencing of Tartary buckwheat is relatively simple because of its small genome size and the low genomic heterozygosity that results from self-pollination. By combining whole-genome shotgun sequencing, using both Illumina short reads and single-molecule real-time (SMRT) long reads from the Pacific Biosciences (PacBio) platform, with pooled sequences of a large DNA insert fosmid library, the GBS method, Hi-C sequencing data, and BioNano genome maps, a high-quality chromosome-scale genome sequence of 489.3 Mb was assembled for Tartary buckwheat cv. Pinku1 (Zhang et al. 2017). A whole-genome duplication (WGD) event after the divergence of buckwheat from sugar beet was identified and might be responsible for the adaptation of buckwheat to extremely harsh environments. For example, stress-related R2R3 MYB transcription factors were expanded in Tartary buckwheat and were expressed in almost all tissues, suggesting that they may be involved in Tartary buckwheat stress tolerance. Expansion events were also found in Al-activated malate transporters and multidrug and toxic compound extrusion (MATE) citrate transporters. These are the first chromosome-scale pseudomolecules constructed in Polygonaceae and will provide an important resource for molecular breeding in Tartary buckwheat.

To investigate Tartary buckwheat genomic variation, we constructed a comprehensive whole-genome resequencing database of 510 worldwide germplasms, including 483 landraces and 27 wild accessions (Zhang et al. 2021). Phylogenetic analysis showed that Tartary buckwheat could be divided into three major monophyletic clades, i.e., Himalayan wild accessions (HW), Southwestern landraces (SL), and Northern landraces (NL) (Fig. 2.3). A total of 150 domestication sweeps were found in HW vs SL, and 156 domestication sweeps were found in HW vs NL. However, only 19 sweeps overlapped, suggesting that two independent domestication events driven by human intervention occurred in the two genetically and geographically distinct groups. Using this detailed and extensive Tartary buckwheat genomic variation resource, the domestication history and the origins of the diverse characteristics of modern varieties were described, and genomics-assisted breeding of elite cultivars will benefit.
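The argument for independent domestication rests on how small the shared set of sweeps is relative to each comparison. The arithmetic can be made explicit with a short sketch; the function name and the choice of summary statistics (shared fraction and Jaccard index) are illustrative assumptions, and only the three counts (150, 156, and 19) come from the text.

```python
# Minimal sketch: quantify the overlap between two sets of domestication sweeps
# from their reported counts. The statistics chosen here are illustrative.

def overlap_summary(n_a, n_b, n_shared):
    """Summarise the overlap between two sets given only their sizes."""
    if n_shared > min(n_a, n_b):
        raise ValueError("shared count cannot exceed either set size")
    union = n_a + n_b - n_shared
    return {
        "shared_fraction_a": n_shared / n_a,   # share of set A also found in B
        "shared_fraction_b": n_shared / n_b,   # share of set B also found in A
        "jaccard": n_shared / union,           # shared / all distinct sweeps
    }


# Counts reported in the text: HW vs SL, HW vs NL, and sweeps common to both.
summary = overlap_summary(n_a=150, n_b=156, n_shared=19)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Roughly 13% of the HW vs SL sweeps and 12% of the HW vs NL sweeps are shared, and the shared sweeps make up under 7% of all distinct sweeps, which is the quantitative basis for reading the two sets as largely independent.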

Fig. 2.2 Neighbor-joining (NJ) tree of 46 cultivars of common buckwheat (Fagopyrum esculentum Moench) based on 7.15 Mb including 255,517 SNPs. Red and blue represent short- and long-styled plants, respectively. Numbers above branches show bootstrap values based on 100 replicates (those less than 80% are not shown) and red asterisks indicate bootstrap values of 95% or over. The scale bar corresponds to 0.001 substitutions per nucleotide site. Taken from Mizuno and Yasui (2019) under a Creative Commons license

2.4 Key Traits and Gene Function

Using buckwheat genome data, numerous genes related to various agronomic traits have been identified. Increasing crop yield is the first problem that needs to be solved in the breeding process. In a genome-wide association study (GWAS) of 1000-grain weight in 480 Tartary buckwheat accessions, an AP2 transcription factor, FtAP2YT1, was found to be correlated with grain weight (Zhang et al. 2021). A mutation was found in its GCC cis-element, suggesting that this transcription factor may be involved in regulating the grain weight of Tartary buckwheat. Heteromorphic self-incompatibility is a major problem in common buckwheat breeding. This characteristic is attributed to dimorphic flowers and intra-morph infertility and is determined by a single S locus. Using GBS reads obtained from 18 short-styled and 18 long-styled common buckwheat landraces, a short-style-specific 5.4-Mb-long S-allele region was identified (Yasui et al. 2016). The genetic diversity in the S-allele region was lower than that of genome-wide SNPs, suggesting a smaller population size for the S-allele region than for the whole genome (Mizuno and Yasui 2019). The relatively high diversity that nonetheless remains in the S-allele region also indicates gene flow from wild to cultivated buckwheat and suggests that cultivated buckwheat may have multiple origins.
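The comparison between diversity in the S-allele region and genome-wide diversity relies on nucleotide diversity, the average number of pairwise differences per site. A minimal sketch of that calculation follows. The toy sequences are hypothetical, and the sketch assumes complete, aligned haploid sequences; an analysis of GBS data would work from SNP calls and handle missing genotypes.

```python
# Minimal sketch: nucleotide diversity (pi) as the mean number of pairwise
# differences per site. Sequences below are hypothetical toy data.
from itertools import combinations


def pairwise_differences(seq_a, seq_b):
    """Count sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def nucleotide_diversity(sequences):
    """Average pairwise differences per site over all sequence pairs."""
    if len(sequences) < 2:
        raise ValueError("at least two sequences are required")
    length = len(sequences[0])
    pairs = list(combinations(sequences, 2))
    total = sum(pairwise_differences(a, b) for a, b in pairs)
    return total / (len(pairs) * length)


# Hypothetical alignments: a "genome-wide" sample and an "S-allele region" sample.
genome_wide = ["ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAC", "ACGTACGTTC"]
s_region = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAC", "ACGTACGAAC"]

pi_genome = nucleotide_diversity(genome_wide)
pi_s_region = nucleotide_diversity(s_region)
print(f"genome-wide pi = {pi_genome:.3f}, S-region pi = {pi_s_region:.3f}")
```

In this toy example the region sample has a lower value than the genome-wide sample, which is the kind of contrast that is interpreted as a smaller population size for the S-allele region.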


Fig. 2.3 Geographic distribution and population structure of resequenced accessions from Fagopyrum species. a Geographic distributions of Tartary buckwheat (Fagopyrum tataricum Gaertn.) accessions. Each accession is represented by a dot on the map. Blue lines show the routes of spread from northern China to other countries. HW, Himalayan Wild accession; SL, Southwest Landraces; NL, North Landraces. b Neighbor-joining tree of 517 germplasms, including 510 Tartary buckwheat accessions and 7 other Fagopyrum species. Branch colors indicate different groups: group HW (red), group SL (green), group NL (blue), and outgroup (purple), matching the colors shown in a. c Principal component analysis of Tartary buckwheat accessions, showing the first two components. Colors correspond to the phylogenetic tree grouping. d The population structure analysis with different numbers of clusters (K = 3, 4, and 5) matches the phylogenetic tree. The x axis lists the different accessions that are consistent with those in the phylogenetic tree. e Phylogenetic tree of the outgroup species. Taken from Zhang et al. (2021) under a Creative Commons license

As buckwheat can grow in harsh environments, it contains many stress response genes that remain to be discovered. For example, aluminum (Al) is toxic to other crops, whereas buckwheat can accumulate high levels of Al in its leaves without exhibiting any toxicity symptoms. When buckwheat is exposed to Al, root tips rapidly secrete oxalate, which binds Al ions in the rhizosphere for detoxification. Once Al is taken up by buckwheat, exchange between Al-oxalate and Al-citrate allows Al to be stored in the vacuole for internal detoxification (Shen et al. 2004). High-throughput RNA-seq analysis showed that target genes of STOP1/ART1 (SENSITIVE TO PROTON RHIZOTOXICITY1/AL RESISTANCE TRANSCRIPTION FACTOR1), a C2H2 zinc finger-type transcription factor involved in Al resistance, were up-regulated after Al treatment in common buckwheat (Yokosho et al. 2014). Among them, FeSTAR (SENSITIVE TO AL RHIZOTOXICITY) encodes the nucleotide-binding domain and membrane domain of an ATP-binding cassette (ABC) transporter (Che et al. 2018). FeALS (ALUMINUM SENSITIVE) is a half-size ABC transporter involved in internal Al detoxification (Lei et al. 2017a). FeMATE is involved in Al-activated citrate secretion and citrate transport into the Golgi system in buckwheat (Lei et al. 2017b). Al also induces the expression of FeIREG1, a tonoplast-localized IRON REGULATED/ferroportin transporter that sequesters Al into root vacuoles, thus increasing Al tolerance in buckwheat (Yokosho et al. 2016). As for Tartary buckwheat, 436 homologous genes involved in Al resistance were identified in its genome (Zhang et al. 2017). Among them, two Al-activated malate transporters, 10 multidrug and toxic compound extrusion (MATE) citrate transporters, 20 ATP-binding cassette transporters, and many transcription factors were up-regulated after Al treatment. In addition, a similar Al detoxification mechanism was found in Tartary buckwheat, indicating a common Al tolerance mechanism across buckwheat species (Wang et al. 2015).

As Tartary buckwheat is abundant in the flavonoid rutin, the biosynthesis of rutin has attracted particular interest. By comparison with Arabidopsis homologues, Tartary buckwheat genes encoding rutin synthesis enzymes were identified (Zhang et al. 2017). Among them, the full-length proteins of chalcone isomerase and flavanone-3’-hydroxylase were identified for the first time in Tartary buckwheat. The numbers of orthologs of cinnamate-4-hydroxylase and chalcone synthase were dramatically increased compared to Arabidopsis. Using GWAS in 480 Tartary buckwheat accessions, FtUFGT3, encoding a UDP-glucosyltransferase, was found to be associated with kaempferol-3-rutinoside content (Zhang et al. 2021). FtUFGT3 overexpression in hairy roots elevated kaempferol-3-rutinoside content, and an enzymatic assay showed that FtUFGT3 could catalyze the conversion of kaempferol into kaempferol-3-rutinoside, confirming that FtUFGT3 is involved in flavonoid biosynthesis. Gene structure analysis revealed that some key enzyme-encoding genes involved in flavonoid metabolism could be bound by MYB transcription factors, suggesting that MYB transcription factors also play important roles in regulating the flavonoid biosynthesis pathway (Yao et al. 2021). Examination of the Tartary buckwheat genome identified 149 R2R3-MYB transcription factors, and flavonoid synthesis-related MYB transcription factors were expanded and highly expressed in at least one tissue, suggesting that they may play different roles in regulating flavonoid biosynthesis (Zhang et al. 2017). Among them, FtMYB1, FtMYB2 (Bai et al. 2014), FtMYB6 (Yao et al. 2020), FtMYB31 (Sun et al. 2019), and FtMYB116 (Zhang et al. 2018) positively regulate, whereas FtMYB8 (Huang et al. 2019), FtMYB11 (Zhou et al. 2017), FtMYB13/14/15 (Zhang et al. 2018), FtMYB16 (Li et al. 2019), and FtMYB18 (Dong et al. 2020) negatively regulate, the synthesis of rutin and associated anthocyanidins/proanthocyanidins by modulating phenylpropanoid synthesis-related genes. FeMYBF1 in common buckwheat was also found to positively regulate flavonol synthesis, suggesting that this mechanism is general in buckwheat (Matsui et al. 2018). Tartary buckwheat also has a comprehensive MYB-dependent regulatory network for flavonoid biosynthesis. For example, the importin protein ‘sensitive to ABA and drought 2’ (SAD2) and the jasmonate signaling cascade repressor JAZ protein could both interact with FtMYB11 (Zhou et al. 2017) and FtMYB13 (Zhang et al. 2018). FtSAD2 also interacts with the other transcription repressors FtMYB14/15, and these FtMYBs were degraded via the COI1-dependent jasmonate signaling 26S proteasome pathway (Zhang et al. 2018). The importin-α protein Ftimportin-α1 also interacted with FtMYB16 and imported it into the nucleus (Li et al. 2019). GWAS analysis found that a BTB-POZ/MATH (BPM) E3 ligase (FtBPM3) associated with kaempferol-3-O-rutinoside content could interact with and degrade FtMYB11. FtMYB11 could in turn repress FtBPM3 expression, suggesting that a ‘ping-pong’ mechanism is involved in regulating flavonoid synthesis (Ding et al. 2021).

Besides their involvement in flavonoid biosynthesis, MYB transcription factors are also responsible for stress responses in buckwheat. Based on the Tartary buckwheat transcriptome database, eight MYB transcription factors involved in abiotic stress were identified (Gao et al. 2016b). Among them, FtMYB9, FtMYB10, and FtMYB13 were found to be involved in ABA sensitivity and drought/salt tolerance (Gao et al. 2016a, 2017; Huang et al. 2018). According to the Tartary buckwheat genome, MYB transcription factors were expanded in Tartary buckwheat compared to Arabidopsis, suggesting that these genes may be responsible for the survival of Tartary buckwheat in harsh conditions (Zhang et al. 2017).
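Several of the associations in this section (FtAP2YT1 with grain weight, FtUFGT3 and FtBPM3 with kaempferol-3-rutinoside content) come from GWAS. The core of a single-marker test can be sketched as a regression of the trait on allele dosage. This is a deliberately simplified illustration: the genotypes and trait values are hypothetical, and the model omits the corrections for population structure and kinship that a real GWAS would normally include, as well as significance testing across many markers.

```python
# Minimal sketch: single-marker association as a simple linear regression of
# a trait on allele dosage (0, 1, or 2 copies of the alternative allele).
# Data are hypothetical; no population-structure correction is applied.

def single_marker_association(dosages, phenotypes):
    """Return (slope, r_squared) for phenotype regressed on allele dosage."""
    n = len(dosages)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need matching genotype and phenotype lists, n >= 3")
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    if sxx == 0 or syy == 0:
        raise ValueError("marker or phenotype shows no variation")
    slope = sxy / sxx                   # estimated effect per allele copy
    r_squared = sxy ** 2 / (sxx * syy)  # share of trait variance explained
    return slope, r_squared


# Hypothetical accessions: allele dosage at one SNP and a metabolite content.
dosages = [0, 0, 1, 1, 2, 2]
content = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]

slope, r_squared = single_marker_association(dosages, content)
print(f"effect per allele = {slope:.2f}, r^2 = {r_squared:.2f}")
```

A marker with a large effect and high explained variance, as in this toy data, would be a candidate for follow-up work such as the overexpression and enzymatic assays described for FtUFGT3.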

2.5 Conclusion and Perspective

Due to the high content of bioactive components, strong resistance to harsh environments, and abundant germplasm resources of buckwheat, the exploration and utilization of its gene resources are of great value. However, compared with major crops, molecular breeding of buckwheat is lagging behind. The sequencing of buckwheat genomes to date has helped researchers resolve candidate genes controlling key traits. In the future, combined with recently developed molecular breeding techniques, the buckwheat genome may help breeders develop buckwheat varieties with better nutraceutical traits and yield.

Acknowledgements This research was supported by the National Key R&D Program of China [2019YFD1000700/2019YFD1000701] and the National Natural Science Foundation of China [32161143005, 31871536 and 31871691].

References

Bai YC, Li CL, Zhang JW et al (2014) Characterization of two tartary buckwheat R2R3-MYB transcription factors and their regulation of proanthocyanidin biosynthesis. Physiol Plant 152:431–440
Che J, Naoki Y, Kengo Y et al (2018) Two genes encoding a bacterial-type ATP-binding cassette transporter are implicated in aluminum tolerance in buckwheat. Plant Cell Physiol 59:2502–2511
Ding M, Zhang K, He Y et al (2021) FtBPM3 modulates the orchestration of FtMYB11-mediated flavonoids biosynthesis in Tartary buckwheat. Plant Biotechnol J pp 1–3
Dong Q, Zhao H, Huang Y et al (2020) FtMYB18 acts as a negative regulator of anthocyanin/proanthocyanidin biosynthesis in Tartary buckwheat. Plant Mol Biol 104:309–325
Gao F, Yao H, Zhao H et al (2016a) Tartary buckwheat FtMYB10 encodes an R2R3-MYB transcription factor that acts as a novel negative regulator of salt and drought response in transgenic Arabidopsis. Plant Physiol Biochem 109:387–396
Gao F, Zhao HX, Yao HP et al (2016b) Identification, isolation and expression analysis of eight stress-related R2R3-MYB genes in tartary buckwheat (Fagopyrum tataricum). Plant Cell Rep 35:1385–1396
Gao F, Zhou J, Deng RY et al (2017) Overexpression of a tartary buckwheat R2R3-MYB transcription factor gene, FtMYB9, enhances tolerance to drought and salt stresses in transgenic Arabidopsis. J Plant Physiol 214:81–90
Huang Y, Wu Q, Wang S et al (2019) FtMYB8 from Tartary buckwheat inhibits both anthocyanin/proanthocyanidin accumulation and marginal trichome initiation. BMC Plant Biol 19:263
Huang Y, Zhao H, Gao F et al (2018) A R2R3-MYB transcription factor gene, FtMYB13, from Tartary buckwheat improves salt/drought tolerance in Arabidopsis. Plant Physiol Biochem 132:238–248
Joshi DC, Zhang K, Wang C et al (2020) Strategic enhancement of genetic gain for nutraceutical development in buckwheat: a genomics-driven perspective. Biotechnol Adv 39:107479
Kreft I, Zhou M, Golob A et al (2019) Breeding buckwheat for nutritional quality. Breed Sci 70:67–73
Kumari A, Chaudhary HK (2020) Nutraceutical crop buckwheat: a concealed wealth in the lap of Himalayas. Crit Rev Biotechnol 40:539–554
Lei GJ, Yokosho K, Yamaji N et al (2017a) Functional characterization of two half-size ABC transporter genes in aluminium-accumulating buckwheat. New Phytol 215:1080–1089
Lei GJ, Yokosho K, Yamaji N et al (2017b) Two MATE transporters with different subcellular localization are involved in Al tolerance in buckwheat. Plant Cell Physiol 58:2179–2189
Li J, Zhang K, Meng Y et al (2019) FtMYB16 interacts with Ftimportin-α1 to regulate rutin biosynthesis in tartary buckwheat. Plant Biotechnol J 17:1479–1481
Matsui K, Oshima Y, Mitsuda N et al (2018) Buckwheat R2R3 MYB transcription factor FeMYBF1 regulates flavonol biosynthesis. Plant Sci 274:466–475
Mizuno N, Yasui Y (2019) Gene flow signature in the S-allele region of cultivated buckwheat. BMC Plant Biol 19:125
Nagano M, Aii J, Campbell C et al (2000) Genome size analysis of the genus Fagopyrum. Fagopyrum 17:35–39
Ohsako T, Ohnishi O (2000) Intra- and interspecific phylogeny of wild Fagopyrum (Polygonaceae) species based on nucleotide sequences of noncoding regions in chloroplast DNA. Am J Bot 87:573–582
Shen R, Iwashita T, Ma JF (2004) Form of Al changes with Al concentration in leaves of buckwheat. J Exp Bot 55:131–136
Sun Z, Linghu B, Hou S et al (2019) Tartary buckwheat FtMYB31 gene encoding an R2R3-MYB transcription factor enhances flavonoid accumulation in tobacco. J Plant Growth Regul 39:564–574
Tang Y, Ding MQ, Tang YX et al (2016) Germplasm resources of buckwheat in China. In: Zhou M (ed) Molecular breeding and nutritional aspects of buckwheat. Elsevier, London, pp 13–20
Wang H, Chen RF, Iwashita T et al (2015) Physiological characterization of aluminum tolerance and accumulation in tartary and wild buckwheat. New Phytol 205:273–279
Yao P, Huang Y, Dong Q et al (2020) FtMYB6, a light-induced SG7 R2R3-MYB transcription factor, promotes flavonol biosynthesis in Tartary buckwheat (Fagopyrum tataricum). J Agric Food Chem 68:13685–13696
Yao Y, Sun L, Wu W et al (2021) Genome-wide investigation of major enzyme-encoding genes in the flavonoid metabolic pathway in Tartary buckwheat (Fagopyrum tataricum). J Mol Evol 89:269–286
Yasui Y, Hirakawa H, Ueno M et al (2016) Assembly of the draft genome of buckwheat and its applications in identifying agronomically useful genes. DNA Res 23:215–224
Yasui Y, Mori M, Matsumoto D et al (2008) Construction of a BAC library for buckwheat genome research - an application to positional cloning of agriculturally valuable traits. Genes Genetic Syst 83:393–401
Yokosho K, Yamaji N, Ma JF (2014) Global transcriptome analysis of Al-induced genes in an Al-accumulating species, common buckwheat (Fagopyrum esculentum Moench). Plant Cell Physiol 55:2077–2091
Yokosho K, Yamaji N, Mitani-Ueno N et al (2016) An aluminum-inducible IREG gene is required for internal detoxification of aluminum in buckwheat. Plant Cell Physiol 57:1169–1178
Zhang K, He M, Fan Y et al (2021) Resequencing of global Tartary buckwheat accessions reveals multiple domestication events and key loci associated with agronomic traits. Genome Biol 22:23
Zhang K, Logacheva MD, Meng Y et al (2018) Jasmonate-responsive MYB factors spatially repress rutin biosynthesis in Fagopyrum tataricum. J Exp Bot 69:1955–1966
Zhang L, Li X, Ma B et al (2017) The Tartary buckwheat genome provides insights into rutin biosynthesis and abiotic stress tolerance. Mol Plant 10:1224–1237
Zhou M, Sun Z, Ding M et al (2017) FtSAD2 and FtJAZ1 regulate activity of the FtMYB11 transcription repressor of the phenylpropanoid pathway in Fagopyrum tataricum. New Phytol 216:814–828
Zhu F (2016) Chemical composition and health effects of Tartary buckwheat. Food Chem 203:231–245

3 Tef [Eragrostis tef (Zucc.) Trotter]

Gina Cannarozzi and Zerihun Tadele

Abstract

Tef [Eragrostis tef (Zucc.) Trotter] is the major food crop of Ethiopia in terms of production, consumption, and cash crop value and is also grown outside the Horn of Africa as a forage crop. In Ethiopia, tef accounts for more than twenty-five percent of the gross grain production of all cereals cultivated in the country. Tef provides quality food and grows under marginal conditions, which gives it great economic and agricultural significance. Unfortunately, the productivity of tef is relatively low, with a national average yield of about 1.9 t/ha in 2020. This low productivity has roots in both technical and socio-economic constraints. In the last decade, however, tef has benefitted from scientific improvement, starting with genomic sequencing in 2014. This chapter provides an overview of tef, its significance in Ethiopia, some of its major production constraints, and some of the major technical achievements made to date, including genome sequencing and the improvement techniques that followed. In addition, we discuss some of the agronomic traits that have been targeted for improvement, such as waterlogging tolerance, drought tolerance, plant architecture, and nutrition.

Contribution to ‘Underutilised Crop Genomes’ edited by M. Chapman. G. Cannarozzi · Z. Tadele (✉), Institute of Plant Sciences, University of Bern, Altenbergrain 21, 3013 Bern, Switzerland

3.1 Crop Background

Tef [Eragrostis tef (Zucc.) Trotter] is the major food crop in Ethiopia, where it is cultivated annually on more than three million hectares of land, equivalent to 30% of the total area allocated to cereal crops (CSA 2020) (Fig. 3.1). Compared to other cereals, tef is more tolerant of extreme environmental conditions, especially both scarcity and excess of soil moisture. Like rice, tef can grow on poorly drained soils, which most cereals cannot withstand. In addition, tef is less affected by post-harvest losses, as tef seeds can be stored easily under local storage conditions without losing viability (Ketema 1997). The grain of tef is nutritious since it is a rich source of protein and iron. One study showed that the bioavailable iron content was significantly higher in tef bread than in wheat bread (Alaunyte et al. 2012). Tef is considered a healthy food due to the absence of gluten (Spaenij-Dekking et al. 2005), a substance that a number of people cannot tolerate. Apart from human consumption, the forage and straw from tef provide livestock with high nutrition and palatability (Bediye and Fekadu 2001; Yami 2013). In general, tef provides quality food and feed and grows under marginal conditions, many of which are poorly suited to other cereals.

However, tef is considered to be an underutilized or orphan crop since it is only of regional importance and has, until recently, not been the focus of crop improvement (Tadele 2018; Chanyalew et al. 2019). Despite its adaptation to extreme environmental conditions, the productivity of tef is extremely low; the national average yield is only 1.88 t/ha (CSA 2020). The major bottlenecks in tef production are broadly grouped under technical and socioeconomic constraints (Chanyalew et al. 2019).

The technical problems include: (i) susceptibility of the plant to lodging, the permanent displacement of the stem from the upright position; (ii) limited spread of improved varieties and inputs to farmers; (iii) the labor-intensive nature of tef husbandry, from land preparation to harvesting and threshing; and (iv) the poor ability of the tef plant to compete with weeds. Lodging deserves particular mention. Tef possesses tall and weak stems that easily succumb to lodging caused by wind or rain. Lodging also hinders the use of high-input husbandry, since applying more nitrogen fertilizer to boost yield results in severe lodging. When lodging occurs, both the quantity and the quality of the grain and the straw are severely reduced.

Socioeconomic constraints related to tef cultivation and production include: (i) lack of attention to the research and development of tef at both the global and local level; (ii) limited availability of seeds of improved varieties in adequate quality and quantity; (iii) weak extension systems and research-extension linkages for disseminating improved tef technologies; and (iv) lack of a credit system for supporting smallholder farmers.

Fig. 3.1 Tef cultivation and production in Ethiopia. a Annual area and production of tef from 1994 to 2020; b Proportion of total area allocated to major cereal crops in the year 2020. Source CSA reports

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 M. A. Chapman (ed.), Underutilised Crop Genomes, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-00848-1_3

3.1.1 Botanical Description

Tef belongs to the Poaceae or grass family, as do all economically important cereals. It is closely related to finger millet (Eleusine coracana Gaertn.), as both are in the sub-family Chloridoideae (Fig. 3.2). The genus Eragrostis comprises about 350 species, of which only tef is cultivated for human consumption. Unlike wheat, barley, and rice, which are all C3 plants, tef (along with maize and sorghum) is a C4 plant, which utilizes carbon dioxide efficiently during photosynthesis (Kebede et al. 1989). The floral biology of tef, including the panicle, spikelet, and flower, as well as different sizes of tef seeds, is shown in Fig. 3.3. Because the flowers are extremely small, both emasculation and crossing are carried out under a binocular microscope. The seeds of tef are also extremely tiny (only about 0.3 g per 1000 kernels).

Fig. 3.2 Phylogenetic relationship between tef and other cereals inferred using partial sequences of the WAXY gene and PhyML with the default model of HKY85 + G. The scale bar reflects evolutionary distance in units of substitutions per nucleotide site, while branch support was inferred using the Shimodaira–Hasegawa-like (SH) aLRT provided by PhyML. Tef is closely related to finger millet. Source Cannarozzi et al. (2014) under a Creative Commons license

Fig. 3.3 Inflorescence and flower of tef. a panicle; b spike of tef showing individual spikelets; c structure of tef flower indicating three stamens and a pair of hairy stigmas; d different sizes of tef seeds. Bar for b and c: 1 mm. Photo R. Schneider, University of Bern

3.1.2 Geographical Distribution

Ethiopia is the center of both the origin and diversity of tef (Vavilov 1951). The crop is adapted to a wide range of environments and is presently cultivated under diverse climatic and soil conditions in Ethiopia (Fig. 3.4). However, it performs best at an altitude of 1700 to 2500 m above sea level, with annual rainfall of 750 to 850 mm and air temperature of 10 to 27 °C. In general, based on agroclimatic criteria, tef growing environments are grouped into three broad categories, namely (i) sub-tropical temperate, (ii) mid-altitude, and (iii) highland.

Fig. 3.4 Proportion of land under tef cultivation in Ethiopia. Source Tadesse et al. (2006) under an Open Access Agreement

3.1.3 Accessible in Seed Banks

As in any crop improvement program, tef breeding relies mainly on existing germplasm resources. Currently, close to 6000 tef accessions collected from different growing areas are available at the facility of the Ethiopian Biodiversity Institute (EBI) (Tesema 2013). These accessions have been investigated for different traits by national and international researchers.

3.1.4 Why It Is Underutilized?

Underutilized crops, also known as orphan, abandoned, disadvantaged, or lost crops, are crops that have been little studied by the global research community (Tadele 2019). However, they play a vital role in the food and nutrition security of populations in the developing world. Most of these crops possess desirable agronomic, nutritional, and health-related properties. Their compatibility with the agroecology and socioeconomic conditions of the developing world, as well as their potential for mitigating problems associated with the changing climate, are a few examples of the opportunities that underutilized or orphan crops offer. Underutilized crops are valued for their resilience to extreme environmental conditions. In addition, they provide nutrient-rich biodiversity and healthier diets to resource-poor consumers (Hunter et al. 2019). Due to their multiple dietary benefits and their tolerance to extreme environmental conditions, some orphan crops are considered to be crops for the future (Gregory et al. 2019). Farmers have maintained a large amount of diversity in many types of underutilized crops. However, only a limited number of these crops have so far been properly characterized.

Tef is a typical example of an underutilized or orphan crop. Like most underutilized crops, tef performs better than other cereal crops under both biotic and abiotic stresses. Tef also provides excellent food and feed for both humans and livestock in terms of nutrition and palatability. However, like other underutilized crops, tef suffers from a lack of advanced research.


3.1.5 Benefits of Tef

Tef is the major food security crop in the Horn of Africa, particularly in Ethiopia, where it is a staple food for about 70 million people and is cultivated annually on over three million hectares of land (CSA 2020). The extensive cultivation of tef as a main food security crop is due to its high tolerance to diverse environmental stresses and to consumers’ preference for a pancake-like bread (called ‘Injera’) made from tef. Tef grain is becoming popular globally as a healthy food because it contains no gluten (Spaenij-Dekking et al. 2005), which makes it an alternative food for people suffering from celiac disease. Since tef matures in only three months from sowing to harvest (while all major cereal crops need a minimum of four months), it requires less precipitation over the entire crop cycle, a major benefit for farmers on the semi-arid land on which tef is often grown.

3.2 Genome Sequencing

Tef is an allotetraploid (2n = 4x = 40) with a base chromosome number (x) of 10, so a gamete carries n = 2x = 20 chromosomes. An allotetraploid genome is the result of a whole-genome duplication that occurred when two diploid progenitors hybridized. Tef’s two genomes are 93% similar in the coding region (VanBuren et al. 2020) and are referred to as the A and the B genome, with A and A’ coming from one diploid progenitor and B and B’ coming from the other. At present, the diploid progenitors remain unknown, although several wild Eragrostis species have been suggested to be either the progenitors or closely related to them (Costanza et al. 1979; Jones et al. 1978; Bekele and Lester 1981; Ingram and Doyle 2003).

Genomic sequencing of Eragrostis tef has been initiated three times. The first initiative, from the Tef Improvement Project (Cannarozzi et al. 2018a), produced a draft genome of the Tsedey variety published in 2014 (Cannarozzi et al. 2014). More recently, a chromosome-scale genome of Dabbi was completed in 2019 by the Tef Sequencing Consortium (VanBuren et al. 2020), while a third initiative spearheaded by Corteva Agriscience produced a chromosome-scale genome of Tsedey in 2021, which has not yet been released (private communication).

3.2.1 First Sequencing: Tef Improvement Project 2010

Tef was the first indigenous Ethiopian crop to be sequenced, marking an epoch in the history of genomic studies on Eragrostis species. This first effort was funded by the Syngenta Foundation for Sustainable Development and the University of Bern. Tsedey (DZ-Cr-37), an early maturing and widely adapted improved variety of tef, was selected for whole-genome sequencing by the Tef Improvement Project around 2010 (Cannarozzi et al. 2014). In addition to its long lifetime as an improved variety, Tsedey was one of the few varieties at the time for which a mutagenized population existed, which means that the genome could be used in reverse-genetic screens such as Targeting Induced Local Lesions IN Genomes (TILLING) (Esfeld et al. 2013; Tadele et al. 2010).

The genome sequence of Tsedey was obtained using two sequencing platforms: Illumina HiSeq2000 at 47× coverage and 454-FLX pyrosequencing at 7× coverage. Assembly and scaffolding resulted in a genome size of 688 Mbp, roughly 95% of the 1C genome size estimated by flow cytometry. Prediction of coding regions in the genome resulted in 42,052 genes, which were annotated and for which orthologous genes in sorghum and rice were identified. Since 2014, the draft genome has been enormously useful for single-gene applications such as primer design, target identification, and TILLING. The genome was highly fragmented, however (14,000 scaffolds longer than 1000 bp and a scaffold N50 of 85,000 bp), which made it impossible to separate the genes and scaffolds into the A and B genomes.
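The scaffold N50 quoted above is a standard contiguity statistic: the length of the scaffold at which the running total, counted from the longest scaffold downward, first reaches half of the total assembly length. A minimal sketch of the calculation follows; the function name and the toy lengths are illustrative assumptions and are not taken from the Tsedey assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp).

    Scaffolds are sorted from longest to shortest; the N50 is the length
    of the scaffold at which the cumulative sum first reaches half of the
    total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no scaffold lengths supplied")


# Hypothetical toy assembly (not real tef data): total length 300 bp.
toy_scaffolds = [100, 80, 60, 40, 20]
print(n50(toy_scaffolds))  # 100 + 80 = 180 >= 150, so the N50 is 80
```

A fragmented draft assembly has many short scaffolds and therefore a small N50, whereas a chromosome-scale assembly has an N50 on the order of a chromosome length.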

3.2.2 Chromosome-Level Sequencing: Tef Sequencing Consortium

The cultivar Dabbi was chosen for sequencing, and a chromosome-scale assembly was released in 2019 (VanBuren et al. 2020). Dabbi is not an improved variety but a landrace collected from a tef-growing region in Ethiopia. The assembly was made using 85× coverage of filtered PacBio reads in combination with long-range high-throughput chromatin capture (Hi-C), resulting in 687 contigs collectively spanning 96% of the assembly (555 Mb). The assembly was anchored and oriented across 20 pseudomolecules, corresponding to the expected 20 chromosomes of tef (Fig. 3.5).

Centromeres are chromosomal regions responsible for mediating chromosome segregation in eukaryotes. DNA sequences around centromeres are characterized by repeated motifs that evolve rapidly, even between closely related species. Two distinct centromeric array sequences were identified in tef, allowing identification of the tef A and B homeologous pseudomolecules. This is a tremendous advance for tef genomics because subgenome-specific structure and evolution can now be studied.

The insertion history of long-terminal-repeat (LTR) transposable elements has been used to date genomic events. An LTR retrotransposon has identical sequences at either end of the transposable element, which result from the insertion process (SanMiguel et al. 1998). These two repeat regions are identical upon insertion and diverge afterwards as substitutions accumulate. If substitutions are assumed to accumulate at a steady rate, the number of substitutions per synonymous site (Ks) can be counted and a rate constant used to estimate the divergence time. Six repeat families were found in only one subgenome, with five families in the A genome and one family in the B genome. The most recent subgenome-specific transposable element (TE) insertion occurred 1.1 MYA, meaning that at this time the two subgenomes were still evolving independently, which places an upper bound on the time of the polyploidy event. Using this method, the authors estimate that the tef polyploidy event occurred ~1.1 MYA. In addition, VanBuren et al. calculated the average divergence between the coding regions of the A and B genomes and estimated the divergence time between the two diploid progenitors to be ~5.0 MYA, a value similar to that found in the 2014 genome paper (~4 MYA) (VanBuren et al. 2020).

Surprisingly, the study revealed no evidence of major inversions or structural rearrangements between the subgenomes, unlike other similarly aged polyploids such as cotton (Hu et al. 2019), strawberry (Edger et al. 2019), and banana (Wang et al. 2019). The tef subgenomes are unusually stable, perhaps because of the low observed rate of homeologous exchange, and there may be an underlying mechanism that reduces homeologous exchanges in this species. The lack of rearrangements can be seen in the dot plot constructed with the CoGe Web site (Fig. 3.5). The CoGe SynMap function identifies homologous gene pairs based on sequence similarity and collinearity (Lyons et al. 2008). Each blue dot represents a homologous gene pair, and self-matches are excluded. The extreme co-linearity of the homeologous pairs (1A and 1B, 2A and 2B, etc.) is evident.

Fig. 3.5 Dot plot of synteny within the tef Dabbi cultivar with the diagonal (self-match) removed. The high level of co-linearity between the A and the B genomes can be seen as well as very little structural rearrangement

Biased fractionation occurs when, after a whole-genome duplication event (such as the allopolyploid event in tef), one homeologous chromosome is targeted for more gene loss than its partner chromosome. Although no biased fractionation was observed in the tef genome, the B genome was suggested to be weakly dominant over the A genome. Additionally, 5 of 6 subgenome-specific TE families were found only in the A subgenome, suggesting that they either insert more efficiently in the A subgenome or are deleted more efficiently from the B subgenome. Recent LTR activity contributes to the A subgenome being 13% larger than the B subgenome.

High-quality chromosome-length scaffolds are important for connecting genotype with phenotype on a genome-wide basis and for use in association studies. Identification of the A and B genomes will help the interpretation of such association studies, as well as of restriction site-associated DNA sequencing (RAD-Seq) and genotyping by sequencing (GBS) studies. In addition, this knowledge will aid subgenome-specific expression and evolution studies.
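The LTR dating logic described in this section can be sketched in a few lines. The two LTRs of one element start out identical, so a per-site distance K between them corresponds to an insertion age of T = K / (2r), where r is the substitution rate per site per year and the factor 2 reflects that both copies accumulate changes. The sketch below assumes pre-aligned, gap-free LTR sequences, uses the Jukes-Cantor correction as one common choice, and takes r = 1.3e-8 as an illustrative value; the function names and the rate are assumptions and are not taken from VanBuren et al. (2020).

```python
import math


def jukes_cantor_distance(ltr5, ltr3):
    """Jukes-Cantor corrected distance between two aligned, gap-free LTRs."""
    if len(ltr5) != len(ltr3) or not ltr5:
        raise ValueError("LTRs must be aligned, equal in length, and non-empty")
    mismatches = sum(a != b for a, b in zip(ltr5, ltr3))
    p = mismatches / len(ltr5)  # raw proportion of differing sites
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor model")
    return -0.75 * math.log(1 - 4 * p / 3)


def insertion_age_years(distance, rate_per_site_per_year):
    """Insertion age T = K / (2r); both LTR copies accumulate substitutions."""
    return distance / (2 * rate_per_site_per_year)


# Assumed rate (illustrative only, not from the chapter).
ASSUMED_RATE = 1.3e-8

# Under this assumed rate, a distance of 0.0286 corresponds to ~1.1 million years.
print(insertion_age_years(0.0286, ASSUMED_RATE))
```

The estimate scales inversely with the chosen rate, so published insertion ages are only comparable when the same rate constant is used.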

3.2.3 Repetitive Element Content

The repetitive portions of three tef genomes have been reported (Gebre et al. 2016; Cannarozzi et al. 2014; VanBuren et al. 2020) (Table 3.1). Gebre et al. identified and characterized the repetitive portion of the tef cultivar Enatite using a de novo repeat identification strategy (Gebre et al. 2016). They identified 1389 medium or highly repetitive sequences, comprising 27% of the genome, and compared them to rice and maize TEs. This agrees well with the estimate of 25.6% repetitive elements in Dabbi reported by VanBuren and colleagues (VanBuren et al. 2020). Detailed information about the composition of the transposable elements was not available from the draft Tsedey genome, as the assembly procedure used for that genome is known to collapse repetitive sequences.

Table 3.1 Repetitive sequence in the genome of three tef cultivars (Tsedey, Dabbi, and Enatite)

Eragrostis tef cultivar                  Tsedey                    Dabbi                   Enatite
Number of transcripts                    42,052                    68,255                  NA
Estimated genome size (Mbp)              672                                               NA
Total abundance of TEs (% of genome)     14.2                      26.5                    27
Gypsy abundance in genome (% of genome)  NA                        12.4                    11.4
Copia abundance in genome (% of genome)  NA                        2                       12.17
References                               Cannarozzi et al. (2014)  VanBuren et al. (2020)  Gebre et al. (2016)

3.2.4 Resequencing

To date, no full-genome resequencing of tef genotypes has been done. However, sub-sampling of the genome using RAD-Seq (Girma et al. 2020), GBS (Girma et al. 2018), and plastome sequencing (Teshome et al. 2020) has been used to determine the relationships and information content of multiple genotypes.

In a recent study (Girma et al. 2018), a germplasm panel with 40 Eragrostis species and 42 tef genotypes was investigated using GBS to assess phylogenetic relationships, genetic diversity, and population structure. Nucleotide diversity is defined as the number of differences per nucleotide site between any two randomly chosen sequences from a population. The wild Eragrostis species were found to have a higher nucleotide diversity (π = 0.021) than the tef genotypes (π = 0.004). The tef genotypes showed very little differentiation between subpopulations, despite coming from different agroecologies. For this reason, the wild Eragrostis species could potentially be used to supplement the tef gene pool. This GBS study also included a phylogenetic analysis of the waxy gene, which showed that Eragrostis aethiopica clustered with the A genome of Eragrostis tef, suggesting a potential progenitor of the A genome (Girma et al. 2018).

The plastomes of a panel of 32 tef accessions sampled from diverse regions in Ethiopia were sequenced and investigated for diversity and phylogenetic relationships (Teshome et al. 2020). The plastomes of the tef accessions ranged from 134,349 to 134,437 bp and contained 34 polymorphic sites, mostly in non-coding regions. The phylogenetic analysis showed robust support for a distinct clade containing the southern accessions (Teshome et al. 2020). Similarly, the complete chloroplast genomes of 13 Eragrostideae species including tef, with lengths varying from 130,773 to 135,322 bp, were sequenced and used to resolve phylogenetic relationships within this tribe (Liu et al. 2021).
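The definition of nucleotide diversity given in this section can be made concrete with a short sketch: π is the average number of differences per site over all pairs of sequences in the sample. The code below assumes aligned, gap-free sequences of equal length; the function name and the toy sequences are illustrative assumptions and are not data from Girma et al. (2018), whose GBS pipeline additionally handles missing data and variable site coverage.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average pairwise differences per site (pi) for aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned, non-empty, and equal in length")
    total_differences = 0
    pair_count = 0
    for a, b in combinations(sequences, 2):
        total_differences += sum(x != y for x, y in zip(a, b))
        pair_count += 1
    return total_differences / (pair_count * length)


# Hypothetical toy alignment: 4 sequences, 10 sites.
toy_alignment = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTACGTAT",
    "ACGAACGTAC",
]
print(nucleotide_diversity(toy_alignment))  # 6 differences / (6 pairs * 10 sites) = 0.1
```

On this scale, the reported values of π = 0.021 for the wild species and π = 0.004 for tef mean that two randomly chosen wild sequences differ at roughly five times as many sites as two randomly chosen tef sequences.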

3.2.5 Candidate Genes for Agronomic Traits

The discovery of genes that affect the phenotype of traits of interest is a major use of the genome. Traits being studied in tef include resistance to abiotic stresses such as drought (Kamies et al. 2017; Martinelli et al. 2018) and waterlogging (Cannarozzi et al. 2018b), plant architecture (Blosch et al. 2020; Jost et al. 2015), and nutritional content (Wang 2018). Such genes are found either by forward genetics, which determines the genetic basis of a given phenotype, or by reverse genetics, which starts with a candidate gene known to affect the phenotype of interest and searches a mutagenized population for an individual with a knock-out or knock-down mutation in that gene. Candidate genes can be chosen either from genes known to affect the phenotype of interest in related organisms or through expression studies such as RNA-Seq. TILLING has been used to identify mutations responsible for a semi-dwarf genotype as well as to identify genes responsible for differing starch content (Wang 2018).

Drought

As drought is one of the primary problems facing tef farmers in the Horn of Africa and tef is moderately drought resistant, several studies have examined the physiological and metabolic response of tef to drought (Kamies et al. 2017; Martinelli et al. 2018). The motivation for such studies is twofold: (i) discovery of key genes acting in the drought response provides targets for reverse-genetic techniques such as TILLING, and (ii) discovery of the underlying physiological mechanism of stress tolerance can be used to improve tolerance of these stresses in tef or to transfer the tolerance mechanism to other species. The drought response of the transcriptome, proteome, and miRNA have all been investigated using next-generation sequencing (Kamies et al. 2017; Martinelli et al. 2018). In these differential expression studies, the protein/transcript/miRNA expression of tef subjected to controlled drought conditions is compared to the same expression in control (normally watered) tef.

The proteome of brown-seeded tef under drought conditions was studied using quantitative iTRAQ mass spectrometry in combination with classical characterization of the physiological response to drought, such as electrolyte leakage, structural analysis using transmission electron micrographs, and chlorophyll fluorescence measurements (Kamies et al. 2017). The characterization showed that plants tolerated dehydration to 65% relative water content (RWC) without losing viability, while leaf tissue showed adverse effects (increased electrolyte leakage and structural damage) after water loss to 50% RWC. The study of differential expression between hydrated plants and those subjected to water deficit detected 5727 proteins, of which 211 showed significant differential expression between hydrated (80% RWC) and dehydrated (50% RWC) tissue. Proteins associated with stress response, signaling, transport, cellular homeostasis, and pentose metabolic processes were upregulated, while proteins involved in reactive oxygen species (ROS)-producing processes such as photosynthesis, associated light-harvesting reactions and manganese transport and homeostasis, the synthesis of sugars, and cell-wall metabolism were downregulated. The expression of the proteins fructose-bisphosphate aldolase (FBA), monodehydroascorbate reductase (MDHAR), and peroxidase (POX) was confirmed with Western blot analyses.

Another study compared differential expression of miRNA in two tissues (root and leaf) of two tef genotypes subjected to either a two-week drought or normal watering (Martinelli et al. 2018). The two tef genotypes used for the study were a drought-susceptible natural accession called Alba and a drought-resistant improved variety called Tsedey. Using both susceptible and tolerant genotypes allows for the identification of miRNAs that might be involved in the different responses. When subjected to drought, 13 and 35 miRNAs were significantly differentially expressed in Alba and Tsedey roots, respectively. The miRNAs regulated in both genotypes may be involved in a general response to drought and include members of the miR169-5p, miR167-3p, and miR167-5p families. The miRNAs responsible for the genotype-specific response, such as those downregulated only in the tolerant tef genotype Tsedey, are of special interest. In the shoot tissue, nine miRNAs were differentially expressed in both genotypes, and all showed similar trends of expression. The putative targets of all miRNAs were predicted, and 5 of 6 miRNA/target pairs were verified with qRT-PCR.

A fascinating recent study compares the genome of Eragrostis nindensis, a resurrection plant that can survive almost complete desiccation, to that of Eragrostis tef as well as that of another desiccation-tolerant member of the Chloridoideae, Oropetium thomaeum (Pardo et al. 2020). In this study, tef serves as the desiccation-sensitive species, and two desiccation-tolerant species were included to identify convergent patterns in the evolution of desiccation tolerance. Parallel dehydration experiments using E. nindensis and E. tef were conducted to distinguish the signatures of water stress from those of the desiccation response. Expression data were used to test whether desiccation tolerance in grasses arose through co-option of seed dehydration pathways. The study identified similar signatures of water-deficit stress and desiccation responses, including chromatin architecture, methylation, gene duplications, and expression dynamics. Although the authors had previously hypothesized that transcriptional rewiring of seed desiccation pathways is what confers vegetative desiccation tolerance, they found that only a few expression changes in seed-related orthologs are unique to desiccation-tolerant grasses. They suggest a model in which, under water deficit, seeds and leaves share common sets of co-regulated, seed-related genes, with only a few crucial genes being unique to the desiccation-tolerance trait.
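The reasoning behind using one susceptible and one tolerant genotype in the miRNA study can be expressed as simple set operations: miRNAs that respond in both genotypes are candidates for a general drought response, while those that respond in only one genotype are candidates for the genotype-specific response. The sketch below is illustrative only; the function name and the miRNA identifiers are hypothetical placeholders and are not results from Martinelli et al. (2018).

```python
def partition_responses(susceptible_hits, tolerant_hits):
    """Split differentially expressed miRNAs into shared and genotype-specific sets."""
    susceptible = set(susceptible_hits)
    tolerant = set(tolerant_hits)
    return {
        "shared": susceptible & tolerant,            # general drought response
        "susceptible_only": susceptible - tolerant,  # specific to the susceptible genotype
        "tolerant_only": tolerant - susceptible,     # specific to the tolerant genotype
    }


# Hypothetical identifiers, for illustration only.
alba_hits = ["miR-a", "miR-b", "miR-c"]
tsedey_hits = ["miR-b", "miR-c", "miR-d", "miR-e"]

groups = partition_responses(alba_hits, tsedey_hits)
for name, members in groups.items():
    print(name, sorted(members))
```

In practice the direction of regulation matters as well, so a fuller analysis would partition up- and downregulated miRNAs separately.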


Waterlogging

RNA-Seq and differential expression analysis were used to identify transcripts implicated in the response of Tsedey to waterlogging (Cannarozzi et al. 2018b), with the purpose of finding cultivars with improved waterlogging tolerance for inclusion in breeding programs. Three tef genotypes, Alba (a waterlogging-susceptible landrace), Tsedey (a waterlogging-tolerant improved variety), and Quncho (a high-yielding improved variety), were subjected to waterlogging conditions. Transcript expression was measured for Tsedey. The genes affected by waterlogging are involved in carbohydrate metabolism, cell growth, response to reactive oxygen species, transport, signaling, and stress responses. In addition, growth and physiological parameters of the three genotypes, such as root and shoot growth, dry weight, stomatal conductance, and chlorophyll and carotenoid contents, were quantified over 22 days after waterlogging, and microscopy was used to observe accompanying morphological changes. Surprisingly, Tsedey plants thrived under waterlogging conditions, growing taller and producing more root biomass than normally watered Tsedey plants. In contrast, the Quncho and Alba genotypes did not grow as well under these conditions. Microscopy of cross-sections revealed that Tsedey formed more aerenchyma (air-filled spaces used in gas transport from the shoots to the roots) than Alba.

Starch

An investigation of starch composition and metabolism in tef was initiated at ETH Zurich with the aim of screening for mutants with desirable starch properties that can be used in breeding programs (Wang 2018). First, genes involved in starch synthesis were identified by comparative analysis using the closely related diploid species rice and sorghum. The presence of these genes in tef was confirmed using RNA-Seq. Then, a new TILLING approach, TILLING-by-sequencing, was developed to identify tef plants with mutations in starch genes. A dataset of mutations in known starch genes in the tef mutagenized population was made available.


The starch genes identified in tef included GBSSI, GBSSII, SBEI, SBEIIb, SSI, SSIIa, SSIIb, SSIIc, SSIVa, SSIVb, PTST, ESVI, LESV, and RSRI. GBSSI was used for TILLING, once with the copy from the A genome, EtGBSSIa, and once with the copy from the B genome, EtGBSSIb. The screening of around 2112 mutant lines for mutants of the A copy identified five mutants. In the case of EtGBSSIb, around 2880 mutant lines were screened, and three mutations were detected. In summary, after screening around 4000 lines, eight mutant candidates were found. Unfortunately, none was a good candidate: only one caused an amino acid change, from serine to phenylalanine, which is unlikely to cause loss of function.

To boost mutation detection, high-throughput sequencing technologies were applied to the same DNA populations. The strategy was to use PCR amplification of each starch-related gene from super-pooled DNA and then to use next-generation sequencing of the mixed amplicons. Single nucleotide polymorphisms (SNPs) were identified in the genes of interest. Based on the pooling strategy, the number of individuals that might carry a given SNP was reduced to 96 plants. High-resolution melt analysis and Sanger sequencing were then used to identify individual mutants. Ultimately, the complexity of the experiment, combined with sequencing noise, PCR amplification errors, and differences in allele frequencies caused by the fact that the 96 pooled individuals might not have exactly equimolar concentrations, led to the identification of many false-positive SNPs. The optimal experimental conditions are still under investigation.
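A short calculation illustrates why pooled amplicon sequencing is sensitive to noise and to unequal DNA input. The sketch below assumes a homeolog-specific amplicon, so that each plant contributes two alleles, and a single heterozygous carrier in a pool of 96 plants; the function name and these assumptions are illustrative and are not taken from Wang (2018).

```python
def expected_variant_fraction(dna_amounts, carrier_index,
                              mutant_alleles=1, alleles_per_plant=2):
    """Expected fraction of reads carrying the mutation in a DNA pool.

    dna_amounts: relative amount of template DNA contributed by each plant.
    carrier_index: position of the single mutant plant in dna_amounts.
    """
    total = sum(dna_amounts)
    if total <= 0:
        raise ValueError("pool must contain some DNA")
    carrier_share = dna_amounts[carrier_index] / total
    return carrier_share * mutant_alleles / alleles_per_plant


# Equimolar pool of 96 plants with one heterozygous carrier: 1 / (2 * 96).
equimolar = [1.0] * 96
print(expected_variant_fraction(equimolar, 0))  # about 0.0052

# Same pool, but the carrier contributes only half as much DNA as the others.
uneven = [0.5] + [1.0] * 95
print(expected_variant_fraction(uneven, 0))     # about 0.0026
```

An expected signal of about half a percent or less leaves little margin over PCR and sequencing errors, which is consistent with the many false-positive SNPs described above.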

3.3 Conclusions

Tef entered the genomic era almost a decade ago, and the number of molecular resources for this orphan crop is growing. New projects promise to deliver sequences of many genotypes and to apply genome editing techniques such as CRISPR-Cas to alter genes related to traits affecting stress tolerance, plant architecture, and nutrition (Numan et al. 2021). Tef is already resistant to stress and plays an important role in the socioeconomics of the Horn of Africa. Breeding programs supported by molecular technologies provide hope for a future with an uncertain climate.

References

Alaunyte I, Stojceska V, Plunkett A, Ainsworth P, Derbyshire E (2012) Improving the quality of nutrient-rich Teff (Eragrostis tef) breads by combination of enzymes in straight dough and sourdough breadmaking. J Cereal Sci 55(1):22–30. https://doi.org/10.1016/j.jcs.2011.09.005
Bediye S, Fekadu D (2001) Potential of tef straw as livestock feed. In: Tefera H, Belay G, Sorrells ME (eds) Narrowing the Rift: Tef Research and Development. Ethiopian Agricultural Research Organization, Addis Ababa, Ethiopia, pp 245–254
Bekele E, Lester RN (1981) Biochemical assessment of the relationships of Eragrostis-Tef (Zucc) Trotter with Some Wild Eragrostis Species (Gramineae). Ann Bot-London 48(5):717–725
Blosch R, Plaza-Wuthrich S, de Reuille PB, Weichert A, Routier-Kierzkowska AL, Cannarozzi G, Robinson S, Tadele Z (2020) Panicle angle is an important factor in tef lodging tolerance. Front Plant Sci 11:61. https://doi.org/10.3389/fpls.2020.00061
Cannarozzi G, Chanyalew S, Assefa K, Bekele A, Blosch R, Weichert A, Klauser D, Plaza-Wuthrich S, Esfeld K, Jost M, Rindisbacher A, Jifar H, Johnson-Chadwick V, Abate E, Wang WY, Kamies R, Husein N, Kebede W, Tolosa K, Genet Y, Gebremeskel K, Ferede B, Mekbib F, Martinelli F, Pedersen HC, Rafudeen S, Hussein S, Tamiru M, Nakayama N, Robinson M, Barker I, Zeeman S, Tadele Z (2018a) Technology generation to dissemination: lessons learned from the tef improvement project. Euphytica 214(2). https://doi.org/10.1007/s10681-018-2115-5
Cannarozzi G, Plaza-Wuthrich S, Esfeld K, Larti S, Wilson YS, Girma D, de Castro E, Chanyalew S, Blosch R, Farinelli L, Lyons E, Schneider M, Falquet L, Kuhlemeier C, Assefa K, Tadele Z (2014) Genome and transcriptome sequencing identifies breeding targets in the orphan crop tef (Eragrostis tef). BMC Genomics 15:581. https://doi.org/10.1186/1471-2164-15-581
Cannarozzi G, Weichert A, Schnell M, Ruiz C, Bossard S, Blösch R, Plaza-Wüthrich S, Chanyalew S, Assefa K, Tadele Z (2018b) Waterlogging affects plant morphology and the expression of key genes in tef (Eragrostis tef). Plant Direct 2018:1–22
Chanyalew S, Ferede S, Damte T, Fikre T, Genet Y, Kebede W, Tolossa K, Tadele Z, Assefa K (2019) Significance and prospects of an orphan crop tef. Planta 250(3):753–767. https://doi.org/10.1007/s00425-019-03209-z
Costanza SH, Dewet JMJ, Harlan JR (1979) Literature review and numerical taxonomy of Eragrostis-Tef (Tef). Econ Bot 33(4):413–424
CSA (2020) Agricultural sample survey for 2020/21. Volume I: Report on area and production of major crops. Central Statistical Agency (CSA), Statistical Bulletin 590. Addis Ababa, Ethiopia
Edger PP, Poorten T, VanBuren R, Hardigan MA, Colle M, McKain MR, Smith RD, Teresi S, Nelson ADL, Wai CM, Alger EI, Bird KA, Yocca AE, Pumplin N, Ou SJ, Ben-Zvi G, Brode A, Baruch K, Swale T, Shiue L, Acharya CB, Cole GS, Mower JP, Childs KL, Jiang N, Lyons E, Freeling M, Puzey JR, Knapp SJ (2019) Origin and evolution of the octoploid strawberry genome. Nat Genet 51(3):541–547. https://doi.org/10.1038/s41588-019-0356-4
Esfeld K, Uauy C, Tadele Z (2013) Application of TILLING for orphan crop improvement. In: Jain SM, Gupta SD (eds) Biotechnology of neglected and underutilized crops. Springer, pp 83–113
Gebre YG, Bertolini E, Pe ME, Zuccolo A (2016) Identification and characterization of abundant repetitive sequences in Eragrostis tef cv Enatite Genome. BMC Plant Biol 16:39. https://doi.org/10.1186/s12870-016-0725-4
Girma D, Cannarozzi G, Weichert A, Tadele Z (2018) Genotyping by sequencing reasserts the close relationship between tef and its putative wild Eragrostis progenitors. Diversity 10(2):17
Girma D, Cannarozzi G, Weichert A, Tadele Z (2020) Restriction site associated DNA sequencing based single nucleotide polymorphism discovery in selected tef (Eragrostis tef) and wild Eragrostis species. Ethiop J Agric Sci 30(1):49–68
Gregory PJ, Mayes S, Hui CH, Jahanshiri E, Julkifle A, Kuppusamy G, Kuan HW, Lin TX, Massawe F, Suhairi TASTM, Azam-Ali SN (2019) Crops For the Future (CFF): an overview of research efforts in the adoption of underutilised species. Planta 250(3):979–988
Hu Y, Chen JD, Fang L, Zhang ZY, Ma W, Niu YC, Ju LZ, Deng JQ, Zhao T, Lian JM, Baruch K, Fang D, Liu X, Ruan YL, Rahman MU, Han JL, Wang K, Wang Q, Wu HT, Mei GF, Zang YH, Han ZG, Xu CY, Shen WJ, Yang DF, Si ZF, Dai F, Zou LF, Huang F, Bai YL, Zhang YG, Brodt A, Ben-Hamo H, Zhu XF, Zhou BL, Guan XY, Zhu SJ, Chen XY, Zhang TZ (2019) Gossypium barbadense and Gossypium hirsutum genomes provide insights into the origin and evolution of allotetraploid cotton. Nat Genet 51(4):739–748. https://doi.org/10.1038/s41588-019-0371-5
Hunter D, Borelli T, Beltrame DMO, Oliveira CNS, Coradin L, Wasike VW, Wasilwa L, Mwai J, Manjella A, Samarasinghe GWL, Madhujith T, Nadeeshani HVH, Tan A, Ay ST, Guzelsoy N, Lauridsen N, Gee E, Tartanac F (2019) The potential of neglected and underutilized species for improving diets and nutrition. Planta 250(3):709–729
Ingram AL, Doyle JJ (2003) The origin and evolution of Eragrostis tef (Poaceae) and related polyploids: Evidence from nuclear waxy and plastid rps16. Am J Bot 90(1):116–122
Jones BMG, Ponti J, Tavassoli A, Dixon PA (1978) Relationships of Ethiopian cereal tef (Eragrostis-Tef (Zucc) Trotter): evidence from morphology and chromosome-number. Ann Bot-London 42(182):1369–1373
Jost M, Esfeld K, Burian A, Cannarozzi G, Chanyalew S, Kuhlemeier C, Assefa K, Tadele Z (2015) Semi-dwarfism and lodging tolerance in tef (Eragrostis tef) is linked to a mutation in the alpha-Tubulin 1 gene. J Exp Bot 66:933–944. https://doi.org/10.1093/jxb/eru452
Kamies R, Farrant JM, Tadele Z, Cannarozzi G, Rafudeen MS (2017) A proteomic approach to investigate the drought response in the orphan crop Eragrostis tef. Proteomes 5(4):32
Kebede H, Johnson RC, Ferris DM (1989) Photosynthetic response of Eragrostis tef to temperature. Physiol Plantarum 77(2):262–266
Ketema S (1997) Tef, Eragrostis tef (Zucc.) Trotter. Institute of Plant Genetics and Crop Plant Research, Gatersleben/International Plant Genetic Resources Institute, Rome, Italy
Liu K, Wang R, Guo XX, Zhang XJ, Qu XJ, Fan SJ (2021) Comparative and Phylogenetic Analysis of Complete Chloroplast Genomes in Eragrostideae (Chloridoideae, Poaceae). Plants-Basel 10(1):109. https://doi.org/10.3390/plants10010109
Martinelli F, Cannarozzi G, Balan B, Siegrist F, Weichert A, Blosch R, Tadele Z (2018) Identification of miRNAs linked with the drought response of tef [Eragrostis tef (Zucc.) Trotter]. J Plant Physiol 224–225:163–172. https://doi.org/10.1016/j.jplph.2018.02.011
Numan M, Khan AL, Asaf S, Salehin M, Beyene G, Tadele Z, Ligaba-Osena A (2021) From traditional breeding to genome editing for boosting productivity of the ancient grain tef [Eragrostis tef (Zucc.) Trotter]. Plants-Basel 10(4):628. https://doi.org/10.3390/plants10040628
Pardo J, Wai CM, Chay H, Madden CF, Hilhorst HWM, Farrant JM, VanBuren R (2020) Intertwined signatures of desiccation and drought tolerance in grasses. Proc Natl Acad Sci U S A 117(18):10079–10088. https://doi.org/10.1073/pnas.2001928117
SanMiguel P, Gaut BS, Tikhonov A, Nakajima Y, Bennetzen JL (1998) The paleontology of intergene retrotransposons of maize. Nat Genet 20(1):43–45
Spaenij-Dekking L, Kooy-Winkelaar Y, Koning F (2005) The Ethiopian cereal tef in celiac disease. N Engl J Med 353:1748–1749. https://doi.org/10.1056/NEJMc051492
Tadele Z (2018) African orphan crops under abiotic stresses: challenges and opportunities. Scientifica 2018:1451894
Tadele Z (2019) Orphan crops: their importance and the urgency of improvement. Planta 250(3):677–694
Tadele Z, Mba C, Till BJ (2010) TILLING for mutations in model plants and crops. In: Jain SM, Brar DS (eds) Molecular Techniques in Crop Improvement, 2nd edn. Springer, pp 307–332
Tadesse M, Alemu B, Bekele G, Tebikew T, Chamberlin J, Benson T (2006) Atlas of the Ethiopian rural economy. International Food Policy Research Institute (IFPRI), Central Statistical Agency (CSA), Ethiopian Development Research Institute (EDRI), Washington, D.C., Addis Ababa, Ethiopia
Tesema A (2013) Genetic diversity of tef in Ethiopia. In: Assefa K, Chanyalew S, Tadele Z (eds) Achievements and prospects of tef improvement. EIAR-University of Bern, Bern, Switzerland, pp 15–20
Teshome GE, Mekbib Y, Hu GW, Li ZZ, Chen JM (2020) Comparative analyses of 32 complete plastomes of Tef (Eragrostis tef) accessions from Ethiopia: phylogenetic relationships and mutational hotspots. PeerJ 8:e9314. https://doi.org/10.7717/peerj.9314
VanBuren R, Man Wai C, Wang X, Pardo J, Yocca AE, Wang H, Chaluvadi SR, Han G, Bryant D, Edger PP, Messing J, Sorrells ME, Mockler TC, Bennetzen JL, Michael TP (2020) Exceptional subgenome stability and functional divergence in the allotetraploid Ethiopian cereal teff. Nat Commun 11(1):884. https://doi.org/10.1038/s41467-020-14724-z
Vavilov I (1951) The origin, variation, immunity and breeding of cultivated plants. Ronald Press Co, New York, Translated from the Russian by Chester KS
Wang W (2018) Starch: from metabolism to utilization in orphan crop. ETH Zurich
Wang Z, Miao HX, Liu JH, Xu BY, Yao XM, Xu CY, Zhao SC, Fang XD, Jia CH, Wang JY, Zhang JB, Li JY, Xu Y, Wang JS, Ma WH, Wu ZY, Yu LL, Yang YL, Liu C, Guo Y, Sun SL, Baurens FC, Martin G, Salmon F, Garsmeur O, Yahiaoui N, Hervouet C, Rouard M, Laboureau N, Habas R, Ricci S, Peng M, Guo AP, Xie JH, Li Y, Ding ZH, Yan Y, Tie WW, D’Hont A, Hu W, Jin ZQ (2019) Musa balbisiana genome reveals subgenome evolution and functional divergence. Nature Plants 5(8):810–821. https://doi.org/10.1038/s41477-019-0452-6
Yami A (2013) Tef straw: a valuable feed resource to improve animal production and productivity. In: Assefa K, Chanyalew S, Tadele Z (eds) Achievements and prospects of tef improvement. EIAR-University of Bern, Bern, Switzerland, pp 233–251

4 The Apricot Genome

Yu-zhu Wang, Hao-yuan Sun, Jun-huan Zhang, Feng-chao Jiang, Li Yang, and Mei-ling Zhang

Y. Wang (✉) · H. Sun · J. Zhang · F. Jiang · L. Yang · M. Zhang
Institute of Forestry and Pomology, Beijing Academy of Agriculture and Forestry Sciences, Beijing 100093, China

Y. Wang · H. Sun · J. Zhang · F. Jiang · L. Yang · M. Zhang
Apricot Engineering and Technology Research Center, National Forestry and Grassland Administration, Beijing 100093, China

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. M. A. Chapman (ed.), Underutilised Crop Genomes, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-00848-1_4

Abstract

Apricot originated in China, where historical records and archaeological finds document more than 3600 years of cultivation. It was first introduced from China to Persia (now Iran) when Zhang Qian went on a mission to the Western Regions in the second to first centuries BC, and it reached ancient Greece via Armenia in the first century AD. Apricot was then introduced to North America in the latter part of the 17th century, and to Africa, South America, and Oceania only after the 19th century. Today, apricots have spread across five continents and have become a worldwide fruit tree species.

The flesh and kernels of apricot are not only rich in nutrients but are also important raw materials of traditional Chinese medicine, with high medicinal value. They can therefore be processed into various food products and Chinese medicines. Currently, apricots are cultivated in Asia, Europe, Oceania, Africa, North America, and South America, of which Asia has the largest cultivated area. Apricot germplasm is genetically very diverse, comprising 12–14 species. According to the International Plant Genetic Resources Institute (IPGRI) database, 62 research institutions in 30 countries or regions hold over 6000 accessions (including duplicates) of apricot germplasm.

“Chuanzhihong” apricot is native to Hebei Province in China, with a cultivation history of more than 300 years. It was selected for sequencing and assembly of the apricot genome because of its good overall cultivation characteristics, such as high yield, disease resistance, strong heritability, and good genetic stability. The researchers constructed a high-density genetic map of apricot and completed the first high-quality whole-genome sequence of apricot (Prunus armeniaca L.) using PacBio sequencing technology. The assembled apricot genome was 221.9 Mb in size, with a contig N50 of 1.02 Mb, and 30,436 protein-coding genes were predicted. The molecular mechanisms underlying fruit color, kernel sweetness or bitterness, resistance to plum pox virus (PPV), and other agronomic traits have been preliminarily revealed.

In the future, with the continuous development of bioinformatics, research on the apricot genome will promote the combination of multiple omics approaches, such as structural genomics, comparative genomics, evolutionary genomics, functional genomics, agrigenomics, epigenetics, and metabolomics. These approaches will provide unprecedented research methods and platforms for further study of apricot species evolution, gene function, and regulatory mechanisms, which may promote full utilization of the apricot genome.
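The contig N50 quoted in the abstract is the contig length at which contigs of that length or longer contain at least half of the assembled bases. A minimal Python sketch of the calculation follows; the contig lengths are made-up toy values, not the apricot assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths: the largest length L
    such that contigs of length >= L together cover at least half of the
    total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() requires at least one contig with positive length")


if __name__ == "__main__":
    toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]  # hypothetical lengths, arbitrary units
    print("total assembled:", sum(toy_contigs))
    print("N50:", n50(toy_contigs))
```

A larger N50 for the same total assembly size means the sequence is distributed over fewer, longer contigs, which is why the value is reported alongside the genome size.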

4.1 Introduction

Apricot is native to China and is one of the oldest fruit tree species cultivated there. According to ancient Chinese books, apricot has been recorded in China for more than 3600 years. “Xia Xiaozheng” (approximately 2070–1600 BC) records: “plum, apricot, and peach blossom in Lunar January and apricot fruits in the garden in Lunar April.” “Guanzi” (685 BC) records: “It is suitable for growing apricots on fertile soil.” “The Classic of Mountains and Rivers” (460 BC) records: “There are many trees such as peaches, plums, plums and apricots in Ling Shan.” “Plain Conversation: The Discussion of Storing-Qi Methods” (202 BC–8 AD) records “Five grains for nourishment, five fruits for help.” The five fruits include “peach, plum, apricot, chestnut, jujube.” Meanwhile, “The Five Flavors of Lingshu Jing” (202 BC–8 AD) interprets the characteristics of the five fruits from the perspective of traditional Chinese medicine: “Jujube sweet, plum acid, chestnut salt, apricot bitter, and peach pungent.” It can be seen that apricot was already an important fruit tree species in the Western Han Dynasty of China. “Xijing Miscellaneous Notes” (Liu Xin, 50 BC–23 AD) records: “Wen Xing (apricot with literary trunk), Penglai Xing (presented by Yu Ji, the captain of the east county. The apricot has five mixed colors flowers and six petals. Legend has it that apricots are eaten by immortals).” It can be seen that there were various varieties of apricot in China as early as 2000 years ago. “Guangzhi” (a book with detailed records of numerous animals and plants in various places of China; Author: Guo Yigong (266–420 AD)) states: “Rongyang has white apricots; Yezhong has red apricots and green apricots.” In addition, the “Four Seasons Compilation Essentials” (a monthly-style farmer’s miscellany that lists what the farmer should do in each of the twelve months across the four seasons; Author: Han E (945–960 AD)) also records apricot sowing, transfer, planting distance, frost prevention, etc. “Essential Techniques for People’s Welfare” (Author: Jia Sixie, 533–544 AD) describes apricot cultivation techniques, fruit processing, and utilization. Through selective breeding by farmers, a wide array of farm varieties has been bred in North and Northwest China. These varieties are of good quality, and single-fruit weight ranges from 10 to 150 g.

4.1.1 Botanical Description

The apricot belongs to the family Rosaceae, subfamily Prunoideae, genus Prunus L., subgenus Prunophora (Neck.) Focke, section Armeniaca (Mill) Koch (Render 1940). Some scholars treat apricots as an independent genus, Armeniaca, separate from Prunus (Yu 1979). There are 12–14 species of apricot.

4.1.1.1 The Main Species of Apricot

(1) Armeniaca vulgaris Lam. = Prunus armeniaca L.

Trees 5–8(12) m tall, crown spherical, spherical-flattened, or elongated oblong. Bark grayish brown, longitudinally splitting. Older branchlets brownish, glabrous, transversely lenticellate; young branchlets reddish brown, with many pale lenticels. Winter buds purplish red, ovoid, 2–4 mm, glabrous or puberulous at scale margins, apex obtuse. Petiole 2–3.5 cm, glabrous or white pubescent, basally usually with 1–6 nectaries; leaf blade broadly ovate to orbicular-ovate, 5–9 × 4–8 cm, both surfaces glabrous, abaxially pubescent in vein axils, or adaxially white pubescent, base cuneate, broadly cuneate, rounded, or subcordate and with several nectaries, margin crenate, apex acute to shortly acuminate. Flowers solitary or occasionally paired, opening before leaves, 2–4.5 cm in diameter. Pedicel 1–3 mm, pubescent. Hypanthium purplish green, shortly cylindrical, 5–7 × 3–4 mm, outside pubescent near base. Sepals purplish green, ovate to ovate-oblong, 3–5 mm, reflexed after anthesis, apex acute to rarely obtuse. Petals white, pink, or tinged with red, orbicular to obovate, 0.8–1.2 cm and ± as broad, margin shortly unguiculate, apex rounded. Stamens 20–100, slightly shorter than petals; filaments white; anthers yellow. Ovary pubescent. Style slightly longer than or nearly as long as stamens, basally pubescent.

Fruit spherical, obovate, oblate or elliptical, more than 2.5 cm in diameter (largest more than 6 cm). Peel white, yellow or orange-yellow, often with a red flush, slightly pubescent or smooth and hairless. The fruit generally does not crack when mature, but it will crack if there is too much rain or the soil is oversaturated. Flesh sweet and sour, juicy, and fragrant. Stone adherent or free, oval or elliptical, flat on both sides, blunt tip, symmetrical base, rarely asymmetrical, rough surface. Shell surface striated or smooth, ventral ribs round or blunt, dorsal ribs more upright, keel-shaped side ribs on the abdomen. Kernel bitter or sweet.

Flowering period from March to April, fruit maturity period from May to August, rarely September to October. 2n = 2x = 16, rarely 24.

This species is a cultivated species; the fruit can be eaten fresh or processed, and the kernels can also be eaten directly or processed into further products.

There are seven variants of this species, namely: 1. Armeniaca vulgaris var. vulgaris; 2. Armeniaca vulgaris var. ansu (Maxim.); 3. Armeniaca vulgaris var. glabra (Sun S. X. et al.); 4. Armeniaca vulgaris var. pendula (Jager); 5. Armeniaca vulgaris var. variegata (West.); 6. Armeniaca vulgaris var. meixianensis (Zhang J. Y. et al.); 7. Armeniaca vulgaris var. zhidanensis (Qiao C. Z. et al.) (Yu 1979, Zhang and Zhang 2003).

(2) A. sibirica Lam. = P. sibirica L.

Shrubs or trees, 2–5 m tall. Bark dark gray. Branches spreading; branchlets grayish brown to reddish brown, sparsely pubescent when young, glabrescent. Winter buds reddish brown, ovoid to conical, 2–4 mm; scale margins pubescent. Petiole 2–3.5 cm, glabrous, or pubescent when young, with or without small nectaries; leaf blade ovate to suborbicular, (3)5–10 × (2.5)4–7 cm, both surfaces glabrous, pubescent, or abaxially pubescent in vein axils, base rounded to subcordate, margin obtusely minutely serrate, apex long acuminate to caudate. Flowers solitary, opening before leaves, 1.5–3.5 cm in diam. Pedicel 1–2 mm. Hypanthium purplish red outside, campanulate, outside basally glabrous or slightly pubescent. Sepals oblong-elliptic, recurved at anthesis, apex acute to shortly acuminate. Petals white with pink veins or pinkish, suborbicular to obovate. Stamens nearly as long as petals. Ovary pubescent (Lu and Bruce 2003).

Fruit oblate, 1.4–2.9 cm in diameter, yellow or orange-red, sometimes flushed, and pubescent. Flesh dry, 0.4–0.6 cm thick. When mature, the fruit cracks along the sutures; it tastes sour and slightly bitter and is inedible. Stone free. Shell oblate or oblate-elliptic, flat on both sides, round top, base sloping to one side, asymmetric, smooth surface, wide and sharp ventral ribs. Kernel bitter, rarely sweet.

Flowering period from March to April, 3–4 days later than the flowering period of A. vulgaris Lam. (P. armeniaca L.) (Wang et al. 2004). Fruit maturity period from June to July, rarely from September to October. 2n = 2x = 16.

This species is a pure native species of apricot. The kernels can be used for food processing and oil extraction and are an important raw material for Chinese medicine. This species is extremely cold resistant. It can withstand temperatures as low as −50 °C and has strong tolerance of drought and barren soils. It is mostly used as a rootstock or as starting material for cold-resistance breeding of apricots.

There are 4 variants of this species, namely: 1. Armeniaca sibirica var. sibirica; 2. Armeniaca sibirica var. pubescens (Kostina); 3. Armeniaca sibirica var. multipetala (Liu G. S. et al.); 4. Armeniaca sibirica var. pleniflora (Zhang et al.).

(3) Prunus simonii Carr.

Other names: red plum, Yuhuang plum, red apricot, spatholobus plum, Qiugen plum, Limi apricot, Lizi apricot, bitter plum, plum apricot, Meizi apricot, Qiugenzi, and so on. This species is native to China, and no truly wild plants have been found to date. The varieties include red plum, Qiugen plum, Xingmei, Lizimei, Yellow Xingmei, Xiangbian plum, Hebao plum, Yanguohong, and Yaozihong. It is mainly distributed in North China (Chen 1937). Seeds of this species were sent to France for cultivation in 1867 by Eugene Simon. In 1872, Carriere, a French botanist, considered it an independent species and named it after Simon (Prunus simonii Carr. in Rev. Hort. 1872:111. t. 1872). When this species was introduced to the United States in 1880, it was originally called Simon-plum, also known as Apricot-plum (Wu 1984). Chow (1934) describes it as native to North China and occasionally cultivated. It has some characters reminiscent of apricot and was thought by some to have descended from a natural hybrid, but more likely is just an upright variant of P. salicina. Zhang et al. (1999) gave the name Armeniaca limeixing Zhang to the cultivated varieties derived from natural hybrids of plum and apricot and, later, to the varieties (lines) selected from artificial hybridization between apricots and plums. In the past, Prunus simonii was classified as a plum. Because this species is a hybrid of apricot and plum, it is recommended that it also be included in apricot (Armeniaca). There is no need to create another species name.

Small trees 5–8 m tall, crown pyramidal and branchlets upright. Older branchlets purple-red; young branchlets light red, stout, glabrous with short internodes. Leaves oblong, obovate or oblong-lanceolate, sparsely oblong, 7–10 cm long, 3–5 cm wide, apex acuminate or sharp, base wedge or broad wedge, with fine round blunt serrations on the edge, sometimes with inconspicuous double serrations, often with glands at the tips of the teeth when young; dark green above, with main vein and lateral veins obviously sunken; light green and hairless beneath, with both mid and lateral veins obviously protruding, lateral veins arcuate and meeting the main vein at an acute angle; petiole short, with 2–4 glands. Flowers (1)2–3, clustered, 2.0–2.5 cm in diameter, pedicel 2–4 mm long, glabrous. Fruit oblate, 3–5(6) cm in diameter, red, with light-yellow flesh, compact texture, strong flavor, clingstone, slightly astringent; stone small, oblate, with longitudinal grooves. Fruit maturity period from June to July. Produced in North China and widely cultivated. Common varieties include Xiangbian, Hebao Plum, Yanguohong, Yaozihong, and so on. Strong cold resistance, but less disease resistance than Prunus armeniaca L.

(4) A. mandshurica Koehne = P. mandshurica (Maxim.) Skv.

Trees 5–15 m tall. Bark dark gray, deeply splitting. Branchlets reddish brown to greenish, glabrous. Winter buds purplish brown, ovoid, 2–4 mm, apex obtuse; scale margins glabrous or puberulous. Petiole 1.5–3 cm, puberulous, often with 2 nectaries; leaf blade broadly ovate to broadly elliptic, 5–12(15) × 3–6(8) cm, both surfaces glabrous or pubescent but gradually glabrescent and only abaxially pubescent in vein axils with age, base broadly cuneate, rounded, or sometimes cordate, margin irregularly acutely elongately biserrate, apex acuminate to caudate. Flowers solitary, opening before leaves, 2–3 cm in diam. Pedicel 7–10 mm, glabrous or sparsely pubescent when young. Hypanthium reddish brown, campanulate, outside usually glabrous. Sepals reddish brown, oblong to elliptic-oblong, outside usually glabrous, margin inconspicuously minutely serrate, apex obtuse to acute. Petals pink or white, broadly obovate to suborbicular. Stamens many, slightly longer than or nearly as long as petals. Ovary densely pubescent (Lu and Bruce 2003).

Fruit nearly spherical, 1.5–2.6 cm in diameter, yellow, flushed on the sunny side, and pubescent.

Flesh juicy or dry, sour or slightly bitter. Large-fruited varieties edible, with fragrance. Shell nearly spherical or broadly elliptical, flat on both sides, blunt or slightly pointed at the top, nearly symmetrical at the base, smooth or slightly wrinkled on the surface. Ventral ridges blunt. Lateral ridges underdeveloped with shallow longitudinal grooves. Dorsal ridges nearly round. Stone free. Kernels bitter, rarely sweet.

Flowering in April, fruit ripening in July–August. 2n = 2x = 16.

This species has strong cold tolerance, and the seeds can be used for medicinal purposes or food processing. The seeds can be used for apricot rootstocks, and the wood can be used for furniture. The flowers are ornamental.

There are 2 variants of this species, namely: 1. Armeniaca mandshurica var. mandshurica; 2. Armeniaca mandshurica var. glabra (Nakai) (Yu T. T. et al.).

(5) A. holosericea Batal = P. armeniaca var. holosericea Batal

Trees 4–5(7) m tall. Branchlets reddish brown to grayish brown, pubescent when young, gradually glabrescent, bark burst. 1-year-old branches are glabrous, perennial branches have thorns. Winter buds brown, ovoid. Petiole 1.5–2 cm, pubescent, usually with nectaries; leaf blade ovate to elliptic-ovate, 4–6 × 3–5 cm, both surfaces pubescent when young but glabrescent, base rounded to subcordate, margin minutely serrate, apex acuminate (Lu and Bruce 2003).

Flower buds oval, pink, without mesopores. Flowers solitary, 2.4–2.6 cm in diameter, slightly buckled inward, opening before the leaves. Pedicel 0.2–0.3 cm long and densely covered with white pubescent hairs. Calyx pale purple, almost glabrous. Petals nearly round, with small claws, gaps between the petals, freshwater red, and slightly darker near the top of the petals. Stamens 30, filaments white, anthers khaki. One pistil; style, ovary, and stigma are all yellowish green, with long white hairs in the middle and lower part of the style. Flowers not fragrant.

Fruit ovoid or elliptic, slightly flattened on both sides, fruit top blunt or slightly pointed, 2–3 cm in diameter, yellow-green, flushed on the sunny side, densely pilose. Fruit stem 4–7 mm long. Fruit does not crack when mature; flesh less juicy, slightly fleshy, sour and slightly astringent. Stone free, round, oval or elliptical, slightly flattened on both sides, sharply pointed at the top, nearly symmetrical or slightly asymmetrical at the base, smooth or slightly wrinkled on the surface, with slightly blunt ventral edges. Kernels bitter.

Flowering in April, fruit maturity in July. 2n = 2x = 16.

This species has strong drought and cold resistance. The flesh is not good for fresh eating but can be processed. The kernels can be used for medicinal or food processing. The seeds can be used for apricot rootstocks and are not suitable for planting at low altitudes.

(6) A. mume Sieb = P. mume Sieb.

Trees, rarely shrubs, 4–10 m tall. Bark grayish to tinged with green, smooth. First year’s branchlets green, smooth, glabrous, or densely incanous. Winter buds purplish brown, ovoid, 3–6 mm, glabrous, apex acute. Petiole 1–2 cm, densely incanous or pubescent when young, often with nectaries; leaf blade ovate, ovate-elliptic, elliptic, obovate, or obovate-oblanceolate, 4–8 × 2.5–5 cm, grayish green, both surfaces pubescent when young, gradually glabrescent or only abaxially pubescent in vein axils with age, base broadly cuneate to rounded, margin usually acutely serrulate, apex caudate.

Flowers solitary or 2 in a fascicle, opening before leaves, 2–2.5 cm in diam., strongly fragrant. Pedicel 1–10 mm, glabrous. Hypanthium usually reddish brown but green to greenish purple for some cultivated varieties, broadly campanulate, 2.5–4 mm, outside glabrous or sometimes pubescent. Sepals ovate to suborbicular, 3–5 mm, apex obtuse. Petals white or pink, obovate, 0.9–1.4 × 0.8–1.2 cm. Stamens shorter to slightly longer than petals. Ovary densely pubescent. Style shorter to slightly longer than stamens.

Fruit yellow to greenish white, subglobose, 2–3 cm in diam., single fruit 5–25 g, pubescent; mesocarp sour, flesh less juicy, adnate to endocarp; stone ellipsoid to subglobose, slightly compressed on both sides, ventral suture somewhat obtuse, distinctly longitudinally furrowed on ventral and dorsal sides, surface pitted, base cuneate, obtuse, or rounded, apex obtuse and abruptly mucronulate (Lu and Bruce 2003).

Flowering in February–April, fruit ripening in May–June or July–August. 2n = 2x = 16, rarely 24.

This species is resistant to waterlogging, has strong root resistance to nematodes, and is not cold resistant. The seeds can be used for apricot rootstocks. The flowers of most varieties of this species can be used for ornamental purposes. The flowers, leaves, roots and kernels can be used as medicine.

There are four varieties of this species, namely: 1. Armeniaca mume var. mume; 2. Armeniaca mume var. pallescens (Franchet) Yu & Lu; 3. Armeniaca mume var. cernua (Franchet) Yu & Lu; 4. Armeniaca mume var. pubicaulina Qiao & Shen.

(7) A. hongpingensis Yü et Li = P. hongpingensis Yü et Li

Trees to 10 m tall. Bark grayish brown, irregularly shallowly splitting. Branchlets pale brown to reddish brown, glabrescent. Winter buds small, ovoid. Petiole 1.5–2 cm, densely pubescent; leaf blade elliptic to elliptic-ovate, 6–10 × 2.5–5 cm, abaxially densely yellowish-brown villous, adaxially sparsely pubescent, base rounded, margin densely acutely serrulate, apex narrowly acuminate to caudate. Fruiting pedicel 7–10 mm. Drupe subglobose, 3.5–4 × 3–3.5 cm, densely yellowish-brown pubescent; mesocarp edible; endocarp ellipsoid, compressed on both sides, ventral rib obtuse, longitudinally furrowed on ventral side, surface pitted, base subsymmetric, apex acute. Fr. Jun–Jul. Along trails, sometimes cultivated in villages; 200–1800 m. W Hubei, W Hunan (Xupu Xian). This species is cultivated for its edible fruit (Lu and Bruce 2003).

It is native to and mainly distributed in Shennongjia, Hubei Province, China. It is related to two species, Tibetan apricot and plum. This species is resistant to drought and cold and can be used as a raw material for rootstock and breeding. Its seeds can be used as rootstocks, and its kernels can be used as medicine.

(8) A. dasycarpa Ehrh. = P. dasycarpa (Ehrh.) Borkh

Trees 4–7 m tall. Branchlets many, purplish red, somewhat thin, smooth and glabrous when young. Petiole thin, short, with or without small nectaries; leaf blade ovate to elliptic-ovate, 4–7 × 2.5–5 cm, abaxially pubescent along veins or in vein axils, adaxially dark green and glabrous, base cuneate to subrounded, margin irregularly densely minutely crenate, apex shortly acuminate.

Flowers usually solitary (or 2), opening before leaves, ca. 2 cm in diam. Pedicel 4–7 mm, 7–12 mm in fruit, thinly pubescent. Hypanthium reddish brown, campanulate, outside subglabrous. Sepals reddish brown, suborbicular to broadly oblong, subglabrous, apex obtuse. Petals white or with pink spots, broadly obovate to spatulate, to 1 cm. Stamens many, nearly as long as petals. Ovary thinly pubescent.

Fruit nearly round or oblong, 3 cm in diameter, dark purple or black-purple over an orange-yellow ground color, with a powdery bloom, pubescent. Stalk 4–7 mm long. Flesh orange-yellow, sweet and sour, juicy, thick and rich in cellulose, with an aroma. Stone adherent or free, oval or elliptical; the top is flat or slightly pointed; the base is nearly symmetrical; both sides are flat; the ventral edge is rounded; the dorsal edge is broad and sharp; the base has a longitudinal groove, and the surface is slightly rough. Kernel flat round, bitter, rarely sweet.

It is mainly distributed in Kashmir, Afghanistan, and Iran in Central Asia, and runs through the Caucasus and Xinjiang, China. This species is a natural distant hybrid of A. vulgaris Lam. and P. cerasifera.


Flowering in April–May, fruit maturing in June–August. 2n = 2x = 16. The biggest feature of this species is its late flowering, about 5 days later than A. vulgaris Lam., and its strong resistance to low temperature during flowering and to fungal diseases (Lu and Bruce 2003).

(9) A. brigantina Vill. = P. brigantina Vill.

The Briançon apricot, or alpine plum, is sometimes included as an apricot species. This species is native to the foothills of the Alps in southeastern France. The trees are small; the fruit are small, yellow, round, and glabrous. Buds contain more than one flower. Morphological traits and crossing affinity with P. cerasifera indicate that this species is more appropriately classified as a plum (Kostina 1978).

(10) A. zhengheensis Zhang & Lu = P. zhengheensis Zhang & Lu

Trees 35–40 m tall, erect. Bark dark brown, somewhat smooth, flaking into pieces. Older branchlets grayish brown; younger branchlets reddish brown, smooth, pubescent, with dense and transverse lenticels; new shoots reddish brown on exposed side, green on opposite side. Petiole red, 1.3–1.5 cm, usually glabrous, with 2–4(–6) nectaries apically from middle; leaf blade elliptic to oblong, 7.5–15 × 3.5–4.5 cm, abaxially densely grayish villous, adaxially green and pilose on veins, base mostly truncate to rarely rounded, margin irregularly minutely gland-tipped serrate, apex acuminate to long caudate; midvein adaxially red or sometimes white. Flowers usually solitary, opening before leaves, ca. 3 cm in diam. Pedicel yellowish green, 3–4 mm, glabrous. Hypanthium basally green, apically reddish, campanulate, outside glabrous. Sepals purplish red, ligulate, reflexed after anthesis, outside glabrous. Petals white, elliptic, 1.3–1.5 × 0.8–0.9 cm, base shortly unguiculate, apex obtuse. Stamens 25–30, longer than petals (Lu and Bruce 2003).

Fruit ovoid, 20 g, with yellow skin, flushed on the sun surface, and slightly hairy; it is rich in juice, sweet, five-scented, and does not crack along the sutures when mature. Stone adhesion. Fresh stone 3 g, yellowish-brown, long elliptical nucleus, flat on both sides, blunt top, symmetrical base; rough surface, shallow reticulate pattern, many hairy thin fibers. The ventral and dorsal ridges are round and obtuse, and there are few side ridges. The dorsal ridge sometimes cracks at both ends and all. Between the two sides of the ventral ridge and the nuclear surface, there is a deep longitudinal groove from the top of the nucleus to the base of the nucleus. Kernels oval, full and bitter. Flowering in March, fruit ripening in July.

(11) P. hypotrichodes Cardot

Shrubs to 3 m tall. Branchlets glabrous, dark brown initially, later brownish gray. Winter buds ovoid, brown; scales orbicular, imbricate, margin pilose. Stipules small, lanceolate, caducous, margin glandular denticulate. Petiole 1–1.5 cm, usually glabrous, with 1–3 nectaries; leaf blade lanceolate, 5–12 × 1.5–4 cm, broadest near middle, abaxially brownish villous, adaxially glabrous or rarely pilose on veins, base narrowly subrounded, margin irregularly minutely gland-tipped serrate, apex acute. Flowers solitary, opening slightly before or with leaves, ca. 3 cm in diam. Pedicel 7–10 mm, glabrous. Hypanthium subcampanulate to obconic, outside glabrous. Sepals ligulate, shorter than hypanthium, reflexed, outside glabrous, margin ciliate glandular. Petals white, long obovate, 1.2–1.5 × 0.8–0.9 cm, base shortly and broadly unguiculate, margin entire, apex obtuse. Stamens ca. 30, arranged in several whorls, outer ones nearly as long as petals, inner ones shorter than petals; filaments slender, glabrous; anthers small, orbicular. Ovary densely villous. Style slender, slightly longer than or nearly as long as stamens, basally hirsute; stigma dilated (Li and Jiang 1998; Lu and Bruce 2003). Fruit drupaceous, fleshy or dry, indehiscent, 1- or rarely 2-seeded. Flowering in March–April, fruit ripening in June–July.


4.1.1.2 Eco-geographical Groups of Common Apricot

In the 1930s, Kostina and others established collections at the Nikita Botanic Garden in Yalta and the Central Asian Experiment Station of the Institute of Plant Industry in Tashkent, which gathered apricots from several geographical regions. After an extensive study of over 700 genotypes, she concluded that most of the cultivated apricots of the world belong to P. armeniaca and distinguished four major eco-geographical groups and 13 regional subgroups of the common apricot (Kostina 1969). Descriptions of the major groups, together with consideration of their ecological adaptation, have been very useful to fruit breeders (Kostina 1969; 1972). Bailey and Hough (1975) suggested additions to 8 eco-geographical groups (Mehlenbacher et al. 1990). Zhang et al. (1999) suggested that apricots should be divided into six ecological groups and 24 subgroups. The authors believe that apricots should be divided into the following seven ecological groups:

North Chinese

This ecological group includes more than one thousand traditional cultivated varieties (farm varieties). It is drought-resistant but not tolerant to waterlogging or barrenness; the trees are arborescent, vigorous, and self-flowering, with small, medium, or large fruits (extra-large types occur occasionally). The peel is white, light yellow, or yellow, and most varieties have red color on the surface; the flesh is white, light yellow, yellow, orange-yellow, or orange and juicy; the stone is adherent, semi-adherent, or free; the kernel is sweet or bitter. Most of the fruits can be eaten fresh, and some can be used both for fresh eating and for processing. The most distinctive feature of this ecological group is its large number of varieties with strongly flavored fruit, a characteristic that other ecological groups do not have.

Northeast Chinese

This ecological group is drought-resistant, with medium-sized trees and small to medium fruit. The skin is light yellow to yellow, with red color on the fruit surface in most varieties; the flesh is light yellow, yellow, or orange-yellow and moderately juicy; the stone is adherent, semi-adherent, or free. The kernel is sweet or bitter; most of the fruits are eaten fresh. The most distinctive feature of this ecological group is its extreme cold resistance.

Eastern Chinese

This ecological group has medium-sized trees or shrubs that are resistant to waterlogging but not to cold. It has low drought resistance, strong disease resistance, a low requirement for winter cold, and a short dormancy period. The flowers are self-sterile and diverse in color. Fruits are small to medium, mostly yellow-green, less juicy, low in sugar, and high in acid. The stone is mostly adherent. The fruits are used mainly as raw material for processing.

Central Asian

The Central Asian group is the oldest and richest in diversity of forms of the first four groups. It includes the local apricots of Afghanistan, Baluchistan, Kashmir, and Xinjiang, and the Soviet Republics of Kirghiz, Uzbek, and Tadzik, where local orchards established over centuries of selection in seedling populations offer a wealth of extremely valuable germplasm. The trees are vigorous and long lived, with dense crowns of thin twigs. They are resistant to dry atmospheric conditions but sensitive to a lack of soil moisture and are thus usually irrigated. The trees have a high requirement for heat in the post-rest phase. The juvenile period is quite long. Most Central Asian apricots are self-incompatible and produce fruit of small to medium size. Genotypes ripen over a very long period (May to September). Kernels are generally sweet and are consumed like apricot kernels or used as a source of cooking oil. The fruits have a high sugar content (occasionally > 30% soluble solids) and are used for drying or for the fresh market. The fruit is well attached to the branches at maturity, and fruit of drying varieties often dries on the tree. When these apricots are planted in humid areas, their susceptibility to Monilinia laxa, Stigmina carpophila, and other fungal diseases becomes apparent.

European

The European group is the youngest in origin and the least variable of the first four groups. The cultivars of Europe, North America, South Africa, and Australia belong to this group. The trees are more precocious, less vigorous, and have a shorter rest period than those of the Central Asian and Irano-Caucasian groups. “Zherdeli” types from northern Europe can withstand very cold temperatures during the dormant period. Some “Zherdeli” types have a prolonged blooming period, so that a frost would destroy only a fraction of the flowers that are produced. Most cultivars of the European group are self-compatible. The fruit is medium.

Dzhungar-Zailij

The Dzhungar-Zailij group is the most primitive of the first four groups. It includes the mountains of Panfilov (Dzharskent), Taldy-Kurgan, and Almaty south of Balkhash Lake in Kazakhstan, as well as Yili, Tacheng, and Altay north of the Tianshan Mountains in the Xinjiang Uygur Autonomous Region of China, Changji and Hami, and Jiuquan and Zhangye in Gansu Province. The characteristics of the species within the ecological group vary greatly. They are small trees that are cold resistant, drought resistant, and tolerant to strong light. Most have small fruits, with occasional medium-sized fruit varieties. The peel is mostly yellow, with red spots or flakes, and the flesh is high in sugar and low in acid. The stone is rarely adherent. The kernel is sweet or bitter, and fruiting does not occur by self-pollination. The most obvious feature is that the fruit of most varieties is smooth and hairless.

Irano-Caucasian

This ecological group mainly includes Iraq, Iran, Syria, Russia's Dagestan, Armenia, Georgia, Azerbaijan, Turkey, North Africa, and some varieties of Italy and Spain. The main characteristics of this ecological group are a lower chilling requirement, early flowering, poor cold tolerance, large shiny leaves, dense branches, less robust trees than in other groups, and a slightly shorter lifespan. Most varieties are self-flowering. The fruit is medium; the skin is usually white, cream, or yellow, and the pulp has a high sugar content. The fruit can be eaten fresh or processed, usually into preserved apricots.

4.2 Origin and Distribution

In recent years, Chinese archaeologists have discovered charred apricot shells and apricot kernels in tombs of the Western Han Dynasty in Lianyungang, Jiangsu Province, and in Jiangling and Guanghua, Hubei Province. This is evidence that apricot was planted in China as early as 2000 years ago (Tong 1983) and confirms the long history of apricot cultivation in China. During the Han Dynasty, apricots were already planted in some places in Xinjiang. At the Niya site in Minfeng County, Xinjiang, not only piles of apricot shells but also an orchard dating to the first to third centuries AD were found; the orchard contained apricots, peaches, plums, and grapes. In addition, apricot shells were found in excavations at Ruoqiangvash Gorge in an ancient tomb from the Jin-Tang Dynasties in Turpan, Xinjiang (Zhang 1983). Apricot and peach were first introduced to Persia (now Iran) from China in the second to first centuries BC and were introduced to ancient Greece in the first century AD via Armenia (Laufer 1919). In Europe, apricots were therefore once mistakenly believed to have originated in Armenia and were called the Armenian apple. This is why the Latin scientific name of apricot is “armeniaca”. According to De Candolle, there is no record of apricot in the age of Euphrates in ancient Greece. Pliny the Elder (23–79 AD) first named the apricot Praecorium and recorded it in his book “Historia Naturalis” (completed in 77 AD). The book mentions that apricots had been introduced to Rome 30 years earlier, so the introduction of apricot to Rome is thought to date to around the first century AD. By comparison with the Chinese dynasties, the apricot is thought to have left China during the Western Han Dynasty, around the time when Zhang Qian went on a mission to the Western Regions. Apricot and peach were sent from China to the Western Regions together.

During the reign of Henry VIII, apricot was introduced to Britain from Italy by a Catholic priest in 1524. Before this time, there was no English name for apricot. As the fruit ripened early and had no name after its introduction, it was temporarily called Praecox (meaning early-ripening fruit), which later became Apraecox (an early-ripening fruit). It was later simplified to Apricox or Apricock and finally became Apricot, the final English name. In the latter part of the seventeenth century, apricot was introduced to North America, after which a Spanish Christian priest introduced apricots to California. Apricot was introduced to Africa, South America, and Oceania only after the nineteenth century. Today, apricots have spread across five continents and have become a worldwide fruit tree species (Wu 1984).

4.2.1 Geographic Distribution

Apricot originated in China and is distributed in almost every province in the north and south of the country, but commercial cultivation is concentrated in the north. After years of introduction and propagation, apricot has become a worldwide tree species that is cultivated on all continents except Antarctica. The gene pool of apricot contains species and varieties whose areas of adaptation range from the cold winters of Siberia to the subtropical climate of North Africa and from the deserts of Central Asia to the humid areas of Japan and eastern China, but commercial production is still restricted by the environment. At present, commercial apricot production is mainly in the temperate zone, and a small number of apricot varieties are also cultivated in the subtropical and frigid zones. The main commercial cultivation area is in North, Central, and Southwest Asia, including Syria, Iran, Iraq, Afghanistan, Kazakhstan, Turkmenistan, Kyrgyzstan, Uzbekistan, Tajikistan, Jordan, Pakistan, Armenia, Azerbaijan, India, Nepal, Israel, Lebanon, Georgia, and Yemen. In Europe, apricot is cultivated in Turkey, France, Italy, Greece, Poland, Hungary, Romania, Spain, Portugal, Serbia, Slovenia, the Czech Republic, Slovakia, Bulgaria, Austria, Ukraine, Russia, Switzerland, Albania, Croatia, Bosnia and Herzegovina, and North Macedonia. In South America, apricot is cultivated in Chile, Argentina, Ecuador, and Peru. In Africa, it is cultivated in Egypt, Morocco, Algeria, South Africa, Tunisia, Libya, Cameroon, Zimbabwe, and Madagascar. In Oceania, it is cultivated in Australia and New Zealand.

According to the cultivation area statistics provided by FAO (2019), countries with more than 10,000 ha of apricot cultivation include Turkey (131,178 ha), Iran (56,090 ha), Uzbekistan (43,464 ha), Algeria (30,861 ha), China (24,642 ha), Spain (20,240 ha), Pakistan (19,372 ha), Italy (17,910 ha), Afghanistan (17,719 ha), Japan (14,500 ha), Syria (13,438 ha), France (12,280 ha), Russia (11,523 ha), Morocco (11,073 ha), Tajikistan (10,764 ha), and Armenia (10,363 ha). It should be noted that the FAO (2019) data on apricot cultivation area in China are inconsistent with the actual status of apricot cultivation in China. According to the cultivation area statistics of the Chinese provinces, the cultivated area of fresh-eating apricots is 311,596 ha. This figure does not include the more than 1,370,000 ha of kernel-using apricots newly developed under the “Returning Farmland to Forests” project, which has been implemented in northern China since the end of the twentieth century, or the wild apricot secondary forest zone scattered in the mountains of northern China (mostly Siberian apricot) (Wang 2016).
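The area figures quoted above can be tallied with a few lines of Python. The sketch below is only an arithmetic check on numbers taken from the text; the variable names are illustrative, and the final ratio simply restates the gap between the FAO figure for China and the provincial statistics.

```python
# Apricot cultivation area (ha) for countries above 10,000 ha,
# copied from the FAO (2019) figures quoted in the text.
fao_area_ha = {
    "Turkey": 131_178, "Iran": 56_090, "Uzbekistan": 43_464,
    "Algeria": 30_861, "China": 24_642, "Spain": 20_240,
    "Pakistan": 19_372, "Italy": 17_910, "Afghanistan": 17_719,
    "Japan": 14_500, "Syria": 13_438, "France": 12_280,
    "Russia": 11_523, "Morocco": 11_073, "Tajikistan": 10_764,
    "Armenia": 10_363,
}

# Fresh-eating apricot area from the Chinese provincial statistics in the text.
china_provincial_ha = 311_596

total_ha = sum(fao_area_ha.values())
largest = max(fao_area_ha, key=fao_area_ha.get)
china_ratio = china_provincial_ha / fao_area_ha["China"]

print(f"Total over listed countries: {total_ha:,} ha")
print(f"Largest area: {largest} ({fao_area_ha[largest] / total_ha:.1%} of the listed total)")
print(f"Provincial figure for China is about {china_ratio:.1f} times the FAO figure")
```

The ratio covers fresh-eating apricots only; adding the kernel-using plantations mentioned in the text would widen the gap further.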

4.2.1.1 The Seed Banks of Apricot

A large number of apricot germplasm resources have been collected and preserved in germplasm repositories around the world. According to the International Plant Genetic Resources Institute (IPGRI) database (Emilie and Marine 2006), in addition to China, 62 research institutions in 30 countries or regions hold over 6000 accessions (including duplicates) of apricot germplasm resources. Italy has preserved the largest number of apricot resources, with more than 1000 accessions. The Crimean Fruit Tree Test Station in Ukraine holds 718 accessions of apricot. The national fruit tree germplasm repository at the University of California (Davis) in the United States has collected 212 accessions of six species: P. armeniaca L., P. mandshurica Maxim., P. mume Sieb. & Zucc., P. dasycarpa Ehrh., P. sibirica L., and P. brigantina Vill. As the center of origin of apricot, China has the most diverse apricot genetic resources (Bourguiba et al. 2020). The Chinese national fruit tree germplasm repository of apricot, located in Xiongyue, Liaoning Province, has collected and preserved all nine species of apricot, including 553 accessions; the number of genetic types held there ranks first among the world's apricot repositories (Liu et al. 2010). Among the provincial fruit tree germplasm repositories in China, the Forestry and Pomology Institute of the Beijing Academy of Agriculture and Forestry Sciences preserves the largest apricot collection, with more than 350 accessions, most of which do not overlap with the National Germplasm Resource Repository of Apricot in Xiongyue, Liaoning. The conservation status of apricot germplasm resources in the world's major apricot germplasm repositories is listed in Table 4.1.

4.2.2 Economic and Ecological Value of Apricot

4.2.2.1 Nutritional Value

According to the main use of the fruit, apricot varieties can be divided into three types: fresh apricot, processed apricot, and kernel-use apricot. The fruit is the most nutritious product of the apricot tree. The flesh can be eaten fresh and can also be processed into canned apricots, preserved apricots, apricot jam, apricot juice, apricot vinegar, and dried apricots. The apricot flesh is delicious, brightly colored, and has a pleasant fragrance. Apricot flesh contains 78.3–89.4% water, 1.1–1.6% protein, 2.0–3.0% fiber, 6.0–20.7% soluble solids, 0.3–0.5% fructose, 1.1–2.7% glucose, 4.4–5.1% sucrose, 0.7–3.2% titratable acid, and 2.6% organic acid (including 0.4% malic acid and 2.1% citric acid). Each 100 g of flesh contains the following minerals: calcium 11.0–16.0 mg, phosphorus 9.0 mg, potassium 320.0–350.0 mg, sodium 1.0 mg, magnesium 9.0 mg, iron 0.3 mg, and zinc 0.1 mg. As determined by the authors, every 100 g of flesh contains vitamin C 2.10–14.60 mg, vitamin B1 0.01–0.03 mg, vitamin B2 0.03–0.23 mg, vitamin B6 0.01–0.04 mg, vitamin E 0.22–1.16 mg, and β-carotene 0.07–7.82 mg. The content of the various vitamins differs between apricot varieties. For example, the β-carotene content of yellow-fleshed apricots was 14 times higher than that of white-fleshed apricots, while the vitamin E content of white-fleshed apricots was significantly higher than that of yellow-fleshed apricots (Wang et al. 1999).

Chinese medicine has made a great contribution to mankind, and apricot is one of its important medicinal materials. “Shen Nong’s Materia Medica” (various medical authors, AD 25–220) records the medicinal value of apricot, but only that of the kernels: apricot kernels relieve cough and asthma, moisturize the lungs, and reduce swelling. “The Notes to the Collection of Materia Medica” and “Bielu” (Tao Hongjing, AD 456–536) record the medicinal value of apricot kernels, flowers, fruits, and roots: apricot kernels are bitter, cold, and toxic. “Compendium of Materia Medica” (Li Shizhen, AD 1518–1593), building on the work of predecessors, gives not only the uses of apricot kernels in curing diseases but also specific methods of use. For example, apricot kernels can cure wheezing, swelling, and urinary dripping: use 0.5 g of apricot kernels, remove the peel and tip, boil and grind finely, cook into porridge with rice, and eat 200 ml before meals. Apricot kernels can also be used to treat swelling of the head and face: mash the kernels into a paste, mix with egg yolk, spread on cloth, and wrap around the head. When the medicine is dry, apply it again. The swelling of the head and face can be cured after seven or eight applications. Apricot kernels can also treat cough, dyspnea, aphasia, phlegm cough, throat fever,

Table 4.1 Apricot germplasm resources held at national repositories and research institutes according to the database of the International Plant Genetic Resources Institute (IPGRI)

No. | Country | Institution | Species (no. of accessions)
1 | Australia | Granite belt horticultural research station, Queensland | P. armeniaca L. (21)
 | | Agriculture research station, New South Wales | P. armeniaca L. (42)
 | | Institute of plant sciences, Burnley, Victoria | P. armeniaca L. (15)
 | | Department of agriculture Loxton research center, Loxton, South Australia | P. armeniaca L. (615)
2 | Brazil | Centro de pesquisa agropecuaria de clima temperado, EMBRAPA | P. armeniaca L. (4); P. mandshurica Maxim. (1); P. mume Sieb. & Zucc. (1)
3 | Canada | Canadian clonal genebank, PGRC agriculture and agri-food Canada | P. armeniaca L. (144)
 | | Research center of agriculture and agri-food Canada, Summerland | P. armeniaca L. (150)
4 | Chile | Instituto de investigaciones agropecuarias, Santiago | P. armeniaca L. (19)
5 | France | Conservatoire botanique national alpin de Gap-Charance | P. brigantina Vill. (13, wild type)
 | | Conservatoire botanique national de Porquerolles, Hyeres | P. armeniaca L. (38)
 | | Center de pomologie la Mazière, L’Estrechure | P. armeniaca L. (24)
 | | Recherches fruitieres mediterraneennes, INRA | P. armeniaca L. (380)
 | | Unité de Recherches sur les Espèces Fruitières et la Vigne, INRA | P. armeniaca L. (2); P. dasycarpa Ehrh. (91); P. mume Sieb. & Zucc. (1); P. brigantina Vill. (1)
6 | Greece | Pomology Institute, National Agricultural Research Foundation | P. armeniaca L. (8)
 | | Greek genebank, agric. res. center of Makedonia and Thraki | P. armeniaca L. (10)
7 | India | Regional station Shimla, NBPGR India | P. sibirica L. (1, wild type); P. armeniaca L. (28)
8 | Italy | Dipartimento di colturei arboree, universita di Bologna | P. armeniaca L. (686)
 | | Dipartimento di ortoflorofrutticoltura, università degli studi di Firenze | P. armeniaca L. (36)
 | | Department of protection and cult. of woody species, univ. of Pisa | P. armeniaca L. (286)
 | | Istituto sperimentale per la frutticoltura, 00134 Roma | P. armeniaca L. (350)
9 | Pakistan | National agricultural research center, plant genetic resources program, 45500 Islamabad | P. armeniaca L. (32)
10 | Poland | Research institute of pomology and floriculture, 96-100 Skierniewice | P. armeniaca L. (76)
11 | Spain | Centro investigacion y desarrollo agroalimentario fruticultura, Murcia | P. armeniaca L. (111)
 | | Instituto Canario de investigaciones agrarias, ICIA | P. armeniaca L. (3)
 | | Center de mas bove, inst. recerca i tecnologia agroalimen | P. armeniaca L. (96)
 | | Consejo superior de investigaciones cientificas, 50080 Zaragoza | P. armeniaca L. (12)
12 | Turkey | Dept. of horticulture, faculty of agriculture, Ege University | P. armeniaca L. (34)
 | | Plant genetic resources dept., Aegean agricultural research inst., 35661 Izmir | P. armeniaca L. (75)
13 | Ukraine | Inst. for irrigated horticulture, 332311 Melitopol | P. armeniaca L. (130)
 | | Crimean pomological station | P. armeniaca L. (718)
 | | Institute of Horticulture Podolskaya Exp. Station | P. armeniaca L. (25)
14 | U.K. | The Royal Horticultural Society, Woking | P. armeniaca L. (2)
15 | USA | Fruit laboratory, USDA/ARS plant germplasm quarantine office, Beltsville, Maryland | P. armeniaca L. (72); P. mandshurica Maxim. (1); P. mume Sieb. & Zucc. (6)
 | | Department of horticulture, Clemson University, Clemson | P. armeniaca L. (40)
 | | Cream Ridge research center (horticulture and forestry), NJ | P. armeniaca L. (46)
 | | Department of plant sciences, University of California, Davis | P. armeniaca L. (195); P. brigantina Vill. (1); P. mandshurica Maxim. (2); P. mume Sieb. & Zucc. (8); P. sibirica L. (1); P. dasycarpa Ehrh. (6)
 | | U.S. national arboretum USDA/ARS, woody landscape plant germplasm repository, Maryland | P. mume Sieb. & Zucc. (5)
 | | Postharvest quality & genetics, USDA-ARS | P. armeniaca L. (030); P. dasycarpa Ehrh. (1); P. mume Sieb. & Zucc. (1)
 | | USDA/ARS, WSU-IAREC | P. armeniaca L. (34); Prunus pumila × P. armeniaca L. (1)
16 | China | National fruit tree Germplasm Xiongyue Plum and Apricot nursery (Liaoning Institute of Fruit Sciences) | P. armeniaca L. (536); P. sibirica L. (2); A. holosericea Batal. (1); P. mandshurica Maxim. (2); P. dasycarpa Ehrh. (1); P. mume Sieb. & Zucc. (2); P. brigantina Vill. (1); A. zhengheensis Zhang & Lu (1); A. limeixing Zhang & Wang (4)
 | | Beijing fruit tree Germplasm Resource Nursery (Institute of Forestry and Pomology, Beijing Academy of Agriculture and Forestry Sciences) | P. armeniaca L. (346); P. sibirica L. (1); A. holosericea Batal. (1); P. mandshurica Maxim. (1); P. mume Sieb. & Zucc. (1)

hemoptysis, bleeding hemorrhoids, sores in the nose, itchy or painful cloudiness of the eyes that gradually covers the pupils, infection of an infant's umbilical cord, indigestion, and swollen, painful sores. At present, almost all Chinese patent medicines for colds and coughs contain apricot kernels. According to Wang's (1994, 2001) analysis of “Yiwuofeng” and “Youyi” apricot kernels, the kernels contain 21.6–25.2% protein, 51.2–58.0% crude fat, and 11.2–12.9% total sugar. They are also rich in minerals such as phosphorus, calcium, and iron. Of the fats, oleic acid accounts for 60.0–70.0%, linoleic acid for 18.0–32.0%, and palmitic and stearic acids for 2.0–7.8%. Hyun-hee et al. (2014) showed that the phytochemical components of apricot essential oil have antibacterial activity against 16 kinds of bacteria and 2 kinds of yeast. Therefore, in addition to their nutritional value, apricot kernels also have important medicinal value.

Apricot is rich in nutrients. The flesh can be processed into preserved apricots, canned apricots, dried apricots, apricot jam, apricot juice, apricot wine, preserved plum candy, haw rolls (a Chinese sweet), and other foods. Apricot kernels can be made into apricot kernel cream, milk, butter, dessert, pickles, oil, and so on. Apricot kernel oil is slightly yellow and transparent and has a fragrant taste. It is not only an excellent edible oil but also an important raw material for high-grade paint coatings, cosmetics, and high-quality soap. Natural benzaldehyde and benzoic acid, which are important raw materials for medicine and food, can also be extracted from apricot kernels. Apricot kernels have high nutritional value and are also an important raw material for traditional Chinese medicine. Apricot fruit has a good medical effect and occupies an important position in Chinese herbal medicine. It is mainly used to treat wind-cold lung disease, promote body fluid to quench thirst, moisturize the lungs and resolve phlegm, clear heat, and detoxify. Apricot shells can be processed into activated carbon.


4.2.2.2 Ecological Value

Apricot has not only high nutritional and medicinal value but also high ecological value. At present, most of the areas where apricots are naturally distributed, including Central Asia, West Asia, and the northern part of East Asia, are mountainous areas with drought, low rainfall, and limited irrigation, where the apricot grows naturally. Other apricot areas are essentially artificially cultivated. Currently in China, from western Liaoning through Hebei, Beijing, Inner Mongolia, Shanxi, Shaanxi, and Ningxia, the Yanshan, Taihangshan, Daqingshan (Yinshan), and Tianshan mountainous regions carry secondary forest belts of wild apricots (Fig. 4.1). During the blooming season in spring, the apricot flowers all over the mountains form a gorgeous landscape. The annual precipitation in this area is low and concentrated mainly in July–August, generally between 380 and 550 mm. Northern Hebei, northwestern Shanxi, and Ningxia have very little annual precipitation, in some areas only about 280 mm. Because the apricot is a drought-resistant pioneer tree species, it can grow and bear fruit without irrigation where the annual precipitation is above 280 mm. Under the same annual precipitation, it is difficult for any other fruit tree species to survive without irrigation, which shows that apricot makes an extremely important contribution to improving the ecological environment (Wang et al. 2003). In view of this, China has implemented the “Returning Farmland to Forests” project since the end of the twentieth century. So far, more than 1.37 million hectares of apricot forests and mixed forests of apricot and pine have been newly established in Liaoning, Inner Mongolia, Hebei, Beijing, Shanxi, Shaanxi, and Ningxia. These forests now play a significant role in reducing local water loss and soil erosion, acting as windbreaks and fixing sand, improving the local landscape, and increasing local economic income, all of which contribute to improving the local ecological environment.

Fig. 4.1 Secondary forest belts of wild apricots

4.3 Genome Sequencing

4.3.1 Strategy

4.3.1.1 Plant Materials

Heterozygosity is a common feature of most eukaryotic species. Fruit trees usually exhibit high heterozygosity, which makes it more difficult to assemble high-quality complete genomes. To reduce the complexity of the sequenced genome, the choice of plant material is critical. For sweet orange (Citrus sinensis), a double-haploid (dihaploid) line derived from anther culture of Valencia sweet orange was selected as the genome sequencing material (Xu et al. 2013). However, it is difficult to obtain a haploid line for most fruit species. In apricot (P. armeniaca L.), “Chuanzhihong”, one of the most widely cultivated Chinese cultivars, has been the focus for sequencing and assembling the genome (Jiang et al. 2019). “Chuanzhihong” apricot is native to Hebei Province, has a cultivation history of more than 300 years, and has good overall cultivation characteristics such as high yield, disease resistance, and late maturation.

4.3.1.2 Sequencing Strategy

With the rapid development of sequencing technology over the past ten years, the genomes of many fruit species have been published. Depending on the heterozygosity of the plant material and the sequencing cost, different sequencing strategies have been applied to the sequenced species. In the early stages, a whole-genome shotgun strategy was usually used, and the data were generated by paired-end sequencing of cloned inserts using Sanger technology and Roche 454 sequencing (Jaillon et al. 2007; Velasco et al. 2010). Then, with the development of next-generation sequencing (NGS) technology, more sequence data were generated using the Illumina HiSeq platform or the Illumina GAII sequencer (Wu et al. 2013; Xu et al. 2013; Zhang et al. 2012; Liu et al. 2014). More recently, single-molecule sequencing technology has advanced genome sequencing and improved the quality of genome assemblies. For the genome assembly of star fruit (Averrhoa carambola), Illumina HiSeq short reads coupled with 52.33 Gb of long reads from Oxford Nanopore Technology (ONT) were used to produce a 335.49 Mb genome with a contig N50 size of 4.22 Mb, and 42.76 Gb of Hi-C clean data were then used to reconstruct physical maps by reordering and clustering the assembled scaffolds (Wu et al. 2020).

In the assembly of the apricot genome, to overcome the problems caused by heterozygosity, we first assembled the Illumina data into super-reads using MaSuRCA; the super-reads and corrected PacBio subreads were then assembled into contigs constituting the diploid genome (Jiang et al. 2019). To assist the assembly of the apricot (Prunus armeniaca L.) genome, a high-density genetic map was constructed from an F1 population derived from a cross between two main Chinese cultivars, “Chuanzhihong” and “Luotuohuang,” using a recently developed reduced representation library (RRL) sequencing approach. The average sequencing depth was 38.97 in “Chuanzhihong” (the female parent, which was also the material used for genome sequencing), 33.05 in “Luotuohuang” (the male parent), and 8.91 in each progeny. Based on the sequencing data, 12,451 polymorphic markers were developed and used in the construction of the genetic linkage map. The final map of apricot comprised eight linkage groups and included 1991 markers (Zhang et al. 2019).

For the de novo assembly, the error rate of the long reads obtained from the PacBio platform was first estimated using Illumina paired-end reads. Then, we applied the canu pipeline to assemble the long reads and the super-reads obtained from MaSuRCA into contigs with the following parameters: genomeSize = 300 m, corOutCoverage = 100, minReadLength = 1000, minOverlapLength = 1000, ErrorRate = 0.064, and batOptions. One copy of the contigs from heterozygous regions was retained using Purge_Haplotigs. Next, we mapped the Illumina paired-end reads to the filtered contigs using bwa-mem and polished the contigs with pilon. Finally, the contigs were organized into pseudo-chromosomes with JCVI allmaps and SLAF markers (Jiang et al. 2019).
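The last step of the strategy, organizing contigs into pseudo-chromosomes with the help of a genetic map, can be illustrated with a much simplified sketch. This is not the ALLMAPS algorithm; it only assumes that each marker has been located on a contig and assigned a linkage group and a genetic position, places every contig in the linkage group that holds most of its markers, and orders contigs by their median genetic position. The function name, the contig names, and the marker values are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import median


def anchor_contigs(markers):
    """Assign contigs to linkage groups and order them.

    markers: iterable of (contig, linkage_group, genetic_position_cM).
    Returns {linkage_group: [contig, ...]} ordered along the group.
    """
    by_contig = defaultdict(list)
    for contig, group, cm in markers:
        by_contig[contig].append((group, cm))

    placed = defaultdict(list)
    for contig, hits in by_contig.items():
        # Majority vote: the linkage group holding most of the contig's markers.
        group, _ = Counter(g for g, _ in hits).most_common(1)[0]
        # Position the contig using only markers from the winning group.
        position = median(cm for g, cm in hits if g == group)
        placed[group].append((position, contig))

    # Order contigs along each linkage group by median genetic position.
    return {g: [c for _, c in sorted(items)] for g, items in placed.items()}


# Hypothetical markers, for illustration only.
toy_markers = [
    ("ctg1", "LG1", 12.0), ("ctg1", "LG1", 15.5), ("ctg1", "LG2", 40.0),
    ("ctg2", "LG1", 3.1), ("ctg2", "LG1", 4.0),
    ("ctg3", "LG2", 22.4),
]
layout = anchor_contigs(toy_markers)
print(layout)
```

In this toy input, the single conflicting marker on ctg1 is outvoted, which mirrors the reason dense maps are useful: isolated misplaced markers do not change the placement of a contig. Real tools additionally orient contigs and weigh several maps, which this sketch does not attempt.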

4.3.2 The Apricot Genome Assembly

The genome of apricot (2n = 16) is small but highly heterozygous. The genome size and heterozygosity of P. armeniaca were estimated to be 220.36–220.56 Mb and 0.900–0.902%, respectively, using GenomeScope (best k-mer = 61, obtained with KmerGenie). After purging haplotigs, we obtained a haplotype assembly of 444 contigs with a total size of 221.9 Mb and a contig N50 of 1.02 Mb. A total of 92.88% of the assembly was anchored to eight linkage groups using linkage maps, and the pseudomolecules ranged in size from 18.6 to 43.0 Mb (Jiang et al. 2019). A total of 30,436 protein-coding genes, with an average transcript length of 1641 bp, were predicted using a combination of homology-based, ab initio, and transcriptome-based methods. The average gene density of apricot was 137 genes per Mb. In addition, 905 ribosomal RNAs (5S, 5.8S, 18S, and 28S), 488 transfer RNAs, 353 small nuclear RNAs, and 278 microRNAs were identified. The proportions of gene models annotated in the Nr, Pfam, KEGG, GO, UniProt, and transmembrane prediction (TMHMM) databases were 99.17%, 86.07%, 43.40%, 54.59%, 71.53%, and 23.51%, respectively. A total of 1363 transcription factors were identified in the apricot genome (Jiang et al. 2019).
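Two of the summary statistics above are simple to compute, and a short sketch may make them concrete. The N50 is the length L such that contigs of length L or longer together contain at least half of the assembly; the gene density is the gene count divided by the assembly size. The contig lengths below are hypothetical toy values, whereas the gene count and assembly size are the figures reported in the text.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("empty list of lengths")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Hypothetical contig lengths (kb), for illustration only.
toy_contigs = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_contigs)  # total 300, half 150; 100 + 80 reaches 180

# Gene density from the figures reported in the text.
genes = 30_436
assembly_mb = 221.9
density = genes / assembly_mb  # roughly 137 genes per Mb

print(toy_n50, round(density))
```

The same function applies to scaffolds; only the input list changes.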

4.3.3 Resequencing

To examine the genome-level diversity, population structure, and relationships of apricot, 150 accessions from four species (P. armeniaca, P. sibirica, P. mandshurica, and P. dasycarpa) have been resequenced using the Illumina HiSeq platform. Based on the sequencing data, a new phylogenetic tree of apricot taxa was constructed, and candidate gene intervals linked to additional agronomic traits were detected (unpublished data). Li et al. (2020) used restriction site-associated DNA sequencing (RAD-seq) to sequence 168 Prunus spp. accessions distributed in five ecological groups, including 74 accessions of cultivated Prunus armeniaca L. and 94 accessions of wild apricots (P. armeniaca L. and Prunus sibirica L.), and proposed that the Central Asian ecological group was domesticated from the Dzhungar-Ili ecological group. The population structure and gene flow of the North China and European ecological groups suggested a genetic background of P. sibirica. According to that study, P. armeniaca originated in Northwest China (Ili Valley), subsequently spread throughout Central Asia, and eventually reached Europe (Li et al. 2020).

4.3.4 Comparison to Other Crops

A comparison of the apricot genome with other sequenced fruit plant genomes showed that the apricot assembly has a large contig N50 (1018 kb) (Table 4.2). Among the 15 fruit tree species, the apricot contig N50 ranked fifth, after black raspberry (Rubus occidentalis), apple (Malus × domestica), star fruit (Averrhoa carambola), and sweet cherry (Prunus avium).

With regard to genome evolution, whole-genome duplication (WGD) is a process of genome doubling that dramatically increases genome complexity, and it is a particularly important feature of angiosperm genomes. Two classes of WGD are distinguished: ancient (old) WGD and recent (modern) WGD. Recent WGD is accompanied by gene loss and by the emergence of new gene functions, and it is the main driving force in the evolution of most eudicot plants. All 14 of these fruit tree plants have undergone the ancient WGD. The genome hexaploidization of these eudicot plants occurred at a similar time, after the divergence of monocots and eudicots and before the divergence of the Rosaceae branch (Eurosids clade) and the Vitaceae. The differentiation of Rosaceae was earlier than that of Rhamnaceae and Rosaceae, about 140 million years ago. The speciation time between jujube and Rosaceae (including P. bretschneideri, M. domestica, P. mume, P. persica, and F. vesca) is 87.2 Mya (Liu et al. 2014). In contrast to the ancient WGD events, which are widespread, only three of the 14 fruit tree genomes (apple, pear, and kiwifruit) have undergone recent WGD events.

Regarding the origin and evolution of Rosaceae plants, Zhang et al. (2012) hypothesized that at least 11 fissions and 11 fusions occurred in P. mume from the nine common ancestral chromosomes. For M. × domestica, at least one WGD and five fusions took place to reach the 17-chromosome structure, compared with 15 fusions for F. vesca, resulting in the 7-chromosome structure. Jiang et al. (2019) reported that apricot is most closely related to P. mume (Japanese apricot) and that the ancestor of the two species split ~5.53 million years ago. Among the four Prunus species compared, sweet cherry was the most distant, with an estimated divergence time of 10.92 million years. Although apricot has not undergone recent WGD events such as those observed in apple and pear, there are many large segmental duplication regions in the apricot genome.

Table 4.2 Comparison of the apricot genome to other sequenced fruit plant genomes

Genome | Published | Sequencing platform | Assembled scaffold sequence (Mb) | Sequence coverage (×) | Assembled genome/predicted genome (%) | Scaffold N50 (Mb) | Number of scaffolds | Longest scaffold (Mb) | Contig N50 (kb) | Number of contigs | Longest contig (kb) | Predicted protein-coding genes
Apricot (Prunus armeniaca) | 2019 | Illumina HiSeq 2000 + PacBio | 221.9 | 241 | 92.9 | 25.13 | 8 | 42.98 | 1,018 | 444 | 5,999 | 30,436
Peach (Prunus persica) | 2013 | Sanger | 227.4 | 8.47 | 84.8 | 4.0 | 202 | - | 294 | 2,730 | - | 27,852
Mei (Prunus mume) | 2012 | Illumina | 280 | 101.4 | 83.9 | 0.58 | 120 | 2.87 | 31.8 | 2,009 | 201.1 | 31,390
Sweet cherry (Prunus avium) | 2017 | Illumina HiSeq 2000 | 272.4 | 97.3 | 77.8 | 0.22 | 10,148 | 1.46 | - | - | - | 43,349
Sweet cherry (Prunus avium) | 2020 | Illumina + Oxford Nanopore + Hi-C | 344.3 | 326 | 97.6 | 42.62 | 8 | 62.32 | 3,247 | 610 | 13,604 | 40,338
Apple (Malus × domestica) | 2010 | Sanger + Roche 454 | 598.3 | 16.9 | 71.2 | 80 | 439 | 1.97 | 16,171 | 122,146 | 13.4 | 57,386
Pear (Pyrus bretschneideri) | 2012 | Illumina HiSeq 2000 | 512.0 | 194 | 75.5 | 0.54 | 2,103 | 4.1 | 35.7 | 25,312 | 300 | 42,812
Strawberry (Fragaria vesca) | 2011 | Roche 454 + Illumina Solexa + SOLiD | 240 | 39 | 95 | 1.36 | 3,263 | 3.24 | - | 16,487 | - | 34,809
Black raspberry (Rubus occidentalis) | 2018 | Hi-C | 223.8 | 325 | 97.2 | - | 7 | 39.50 | 31,759 | 9,813 | - | 28,005
Grapevine (Vitis vinifera) | 2007 | Sanger | 487 | 8.4 | 69 | 2.07 | 3,514 | 12.68 | 65.9 | 19,577 | 557 | 30,434
Kiwifruit (Actinidia chinensis) | 2013 | Illumina HiSeq 2000 | 616.1 | 140 | 81.3 | 0.65 | 280 | 3.41 | 58.86 | 2,977 | 423.5 | 39,040
Jujube (Ziziphus jujuba) | 2014 | Illumina HiSeq 2000 | 437.65 | 73.9 | 98.6 | 0.30 | 5,898 | 3.14 | 33.95 | 28,930 | 334.9 | 32,808
Pineapple (Ananas comosus) | 2015 | Roche 454 + Illumina + PacBio | 381.9 | 408 | 72.6 | 11.8 | 3,133 | 24.88 | 126.5 | 8,986 | 1,589.4 | 27,024
Sweet orange (Citrus sinensis) | 2012 | Illumina GAII | 320.5 | 214 | 87.3 | 1.69 | 4,811 | 8.16 | 49.89 | 16,890 | 323.34 | 29,445
Banana (Musa acuminata) | 2012 | Roche/454 + Sanger + Illumina | 472.2 | 71 | 90 | 1.31 | 7,513 | 11.96 | 43.12 | 24,425 | 476,604 | 36,542
Star fruit (Averrhoa carambola) | 2020 | Illumina HiSeq + ONT + Hi-C | 335.5 | 512 | 85.5, 93.77 | 31.25 | 11 | 33.98 | 4,220 | - | - | 25,419

A dash marks a cell that is empty in the table. GC content (%) is reported for only some genomes, and the row assignment of the values is not recoverable: 36.2, 38.44, 37.7, 37.32, and 37.42 in the first part of the table, and 34.06, 39, and 33.41 in the continued part.

Fig. 4.2 Phylogenetic tree and gene family changes of apricot and related species (Jiang et al. 2019)

4.3.5 Gene Discovery

Compared with four other sequenced Prunus species (peach, mume, sweet cherry, and almond), more gene families have expanded or contracted in apricot. As shown in Fig. 4.2, gene family analysis showed that during the evolution of apricot, 1324 gene families expanded and 981 families contracted, and 2300 apricot-specific genes were produced. The genes from the expanded families were mainly enriched in phenylpropanoid biosynthesis (p = 0.0018) and flavonoid biosynthesis (p = 0.0019). In addition, the citrate synthase family was expanded, with three copies in the apricot genome (the additional copy arose from a recent species-specific tandem duplication event) and two in the other species of the genus Prunus (Jiang et al. 2019).
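Enrichment p-values such as those quoted for the expanded gene families are commonly obtained from a hypergeometric (one-sided Fisher) test. The text does not state which test was used for the apricot analysis, so the following is only a sketch of that common approach, with hypothetical gene counts: it gives the probability of observing at least a given number of pathway genes in a selected gene set by chance.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, pathway_genes, selected, hits):
    """P(X >= hits) when `selected` genes are drawn without replacement
    from `total_genes`, of which `pathway_genes` belong to the pathway."""
    upper = min(pathway_genes, selected)
    tail = sum(
        comb(pathway_genes, k) * comb(total_genes - pathway_genes, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(total_genes, selected)


# Hypothetical counts, for illustration only: 20 genes in total, 5 of them in
# the pathway, 4 genes selected (e.g. from expanded families), 2 of them hits.
p_toy = hypergeom_enrichment_p(total_genes=20, pathway_genes=5, selected=4, hits=2)
print(p_toy)
```

In practice, one such test is run per pathway, and the resulting p-values are usually corrected for multiple testing before they are interpreted.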

4.3.6 Candidate Genes for Agronomic Traits

Based on the analysis of whole-genome sequencing data, more than 25,000 protein-coding loci have been annotated in each fruit tree species (Table 4.2). Through further study, candidate genes related to resistance, fruit development, quality formation, and other key traits have been identified. These genes are valuable for resistance and quality breeding in apricot and will likely pave the way for gene-editing approaches in apricot and other Prunus species. Candidate genes related to three important agronomic traits in apricot are described below.

4.3.6.1 Accumulation of β-Carotene in Apricot

Apricot fruit is rich in β-carotene, which represents 60–70% of the total carotenoids (Sass-Kiss et al. 2005; Dragovic-Uzelac et al. 2007; Zhou et al. 2020) and gives the fruit its characteristic color (Curl 1960; Roussos et al. 2016). β-Carotene is the main precursor of vitamin A, one of the most important functional ingredients. Vitamin A is an essential nutrient for humans because it cannot be synthesized within the body. Thus, as a good source of β-carotene, apricots are highly beneficial for human health (Akin et al. 2008; Ali et al. 2011). Apart from its nutritional value, apricot fruit also has pharmacological significance owing to its high antioxidant content. Apricot and/or β-carotene treatment may protect against oxidative stress and ameliorate methotrexate-induced intestinal damage and nephrotoxicity at the biochemical and histological levels (Vardi et al. 2008, 2013). Therefore, obtaining fruit with higher β-carotene levels has been a major goal in apricot breeding.

The biosynthesis pathway of β-carotene in plants has been well studied, and several enzymes of this pathway have been identified, including phytoene synthase (Psy), phytoene desaturase (Pds), ζ-carotene desaturase (Zds), and lycopene β-cyclase (Lcy-b). The corresponding genes have been isolated from various plant species, such as tomato, sweet potato, citrus species, and durian (Giuliano et al. 1993; D’Ambrosio et al. 2004; Chen and Zhang 2011; Wisutiamonkul et al. 2017). Available studies indicate that psy1 and pds are important for β-carotene synthesis in ripening tomato fruit (Xu et al. 2006; Meléndez-Martínez et al. 2010; Romero et al. 2011; Bu et al. 2014). In sweet potato, by contrast, among psy, pds, zds, and lcy-b, the expression of lcy-b showed the clearest correlation with β-carotene content (Chen and Zhang 2011). The specific genes responsible for β-carotene biosynthesis thus differ among species and show different expression patterns among varieties. In red- and pink-fleshed watermelon cultivars, lcy-b and chy-b (β-carotene hydroxylase) were downregulated at late maturity, resulting in higher contents of lycopene and β-carotene in the fruits. In yellow-fleshed cultivars, the expression level of lcy-b remained higher throughout development, whereas in white-fleshed cultivars, the expression of all the genes declined rapidly at the maturity stage (Kang et al. 2010).

The expression patterns of β-carotene synthesis genes in plants are tissue and stage dependent. Psy, pds, zds, lcy-e (lycopene ε-cyclase), crt-b (β-carotene hydroxylase), zep (zeaxanthin epoxidase), and nced3 (9-cis-epoxycarotenoid cleavage dioxygenase 3) are all expressed in coffee leaves, flowers, and shoots, but the transcript levels differ among the three tissues (Simkin et al. 2010). In tomato, citrus, watermelon, and other fruit crops, the genes related to β-carotene biosynthesis reached their highest transcript levels, and β-carotene accumulated most rapidly, near maturity (Carrari and Fernie 2006; Alquezar et al. 2008; Dou et al. 2017). Lycopene and β-carotene accumulated rapidly in the flesh of Cara Cara citrus fruit during the two stages of fruit enlargement and fruit ripening.

To study the molecular mechanisms of β-carotene biosynthesis in apricot, Jiang et al. (2019) used RNA-Seq to sequence eight transcriptome libraries from two cultivars with contrasting flesh color, “Chuanzhihong” (yellow flesh) and “Dabaixing” (white flesh), at four developmental stages: green fruit with soft kernel (G1), green fruit with hard kernel (G2), color-turning fruit (CT), and fully ripened fruit (FR). By comparing gene expression patterns across developmental stages between the two cultivars, several differentially expressed genes (DEGs) involved in carotenoid metabolism were selected and analyzed by qRT-PCR. In apricot, lcy-b may make an important contribution to yellow-flesh development in the ripening fruit of “Chuanzhihong.” In this cultivar, lcy-b was significantly upregulated during the CT phase, indicating rapid β-carotene synthesis, which coincided with the visible change in flesh color. In contrast, the expression levels of the other three genes examined (psy, pds, and zds) were not correlated with the accumulation of β-carotene. Taken together with the previous studies, these results indicate that the gene expression changes controlling β-carotene biosynthesis are complex and differ among species and varieties. Interestingly, in the white cultivar “Dabaixing,” the transcript level of the 9-cis-epoxycarotenoid dioxygenase (NCED) gene was much higher during fruit development, especially in the last two stages (CT and FR), in contrast to the yellow cultivar “Chuanzhihong.” Newly synthesized carotenoids are rapidly converted into xanthoxin, the precursor of abscisate, through enzyme catalysis, especially by NCED, which halts the accumulation of carotene. Thus, the balance between the biosynthesis and degradation of β-carotene may contribute to the color of apricot fruit (Jiang et al. 2019).
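The statement that the expression of one gene tracks β-carotene content while that of others does not can be made concrete with a correlation coefficient computed across the four developmental stages. The sketch below uses Pearson's r; the text does not specify how correlation was assessed in the cited study, and all expression and content values here are hypothetical numbers chosen only to show the contrast.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical values for stages G1, G2, CT, FR (illustration only).
beta_carotene = [1.0, 2.0, 8.0, 12.0]   # content, arbitrary units
lcy_b_expr = [1.0, 1.5, 6.0, 9.0]       # relative expression
psy_expr = [5.0, 2.0, 4.0, 3.0]         # relative expression

r_lcy_b = pearson_r(lcy_b_expr, beta_carotene)
r_psy = pearson_r(psy_expr, beta_carotene)
print(round(r_lcy_b, 3), round(r_psy, 3))
```

With only four stages, such a coefficient is descriptive rather than a formal test, so it is best read alongside the expression profiles themselves.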

4.3.6.2 Amygdalin Metabolism and the Sweet/Bitterness Kernel Forming

The avoidance of bitterness in apricot kernels is the main trait considered by apricot breeders. The availability of “sweet” apricot seeds could increase their marketability as byproducts of the food processing industry, for the production of flavoring pastes and for other applications (Stoewsand et al. 1975). However, the inheritance of seed bitterness in apricot remains poorly understood. In various Prunus species, seed bitterness is due to cyanogenic glucosides (McCarty et al. 1952), and the inheritance of cyanogenic glucoside content has been studied in relation to seed bitterness; the inheritance model, however, varies among species. In peach, the “sweet kernel” behaves as a recessive trait controlled by a single gene (sk) (Werner and Creller 1997). In almond, “sweet” has been reported as a dominant trait inherited as a simple Mendelian factor (Heppner 1923, 1926; Dicenta and García 1993). In apricot, no definitive model has been demonstrated. Kostina (1977) reported monofactorial inheritance of kernel taste, with the recessive allele responsible for the bitter taste. Negri et al. (2008) proposed a model in which five non-linked genes (three for the anabolic and two for the catabolic pathway) are involved in determining this character. Bitterness in apricot has thus proven to be a complex trait. Our present study provides additional evidence that the inheritance of seed bitterness differs between apricot and almond. Although molecular markers for bitterness have been mapped close to the Sk locus (Sánchez-Pérez et al. 2010), the sequence of this gene remains unknown (Cervellati et al. 2012). To detect candidate genes related to seed bitterness, we analyzed and annotated the genes located in the significant QTL interval based on the apricot genome sequence. In total, 13 genes were identified and annotated in this interval. Analysis of the probable functions of these genes suggested that a nucleotide insertion-deletion (indel) in the bHLH2 gene could control seed bitterness in apricot (unpublished data).

Previous studies have confirmed that seed bitterness is due to cyanogenic glucosides. The metabolic pathways for the synthesis and catabolism of cyanogenic glucosides in apricot are similar to those in almond, as well as in eucalyptus and sorghum, and include all four enzyme groups relevant to cyanogenesis (P450s, UGTs, β-glucosidases, and α-hydroxynitrile lyases; Gleadow et al. 2008; Morant et al. 2008; Sánchez-Pérez et al. 2008; Zagrobelny and Møller 2011). Recently, with the sequencing of the almond (Prunus dulcis) genome, the study of bitterness in almond has made great progress. Sánchez-Pérez et al. (2019) demonstrated by functional characterization that bHLH2 controls transcription of the P450 monooxygenase-encoding genes PdCYP79D16 and PdCYP71AN24, which are involved in the amygdalin biosynthetic pathway. A nonsynonymous point mutation (Leu to Phe) in the dimerization domain of bHLH2 prevents transcription of the two cytochrome P450 genes, resulting in the sweet kernel trait.
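A nonsynonymous point mutation is a single-nucleotide change that alters the encoded amino acid, in contrast to a synonymous change that leaves it unchanged. The sketch below illustrates the distinction with the leucine and phenylalanine codons of the standard genetic code. The specific codon affected in bHLH2 is not given in the text, so the CTT to TTT change used here is a hypothetical example of a Leu to Phe substitution, not the reported mutation.

```python
# Leucine and phenylalanine codons of the standard genetic code
# (only the entries needed for this example).
CODON_TABLE = {
    "CTT": "Leu", "CTC": "Leu", "CTA": "Leu", "CTG": "Leu",
    "TTA": "Leu", "TTG": "Leu",
    "TTT": "Phe", "TTC": "Phe",
}


def classify_substitution(codon, position, new_base):
    """Apply a single-base substitution (0-based position) to a codon and
    report whether the encoded amino acid changes."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before = CODON_TABLE[codon]
    after = CODON_TABLE[mutated]
    kind = "synonymous" if before == after else "nonsynonymous"
    return mutated, before, after, kind


# Hypothetical example: C -> T at the first codon position turns Leu into Phe.
leu_to_phe = classify_substitution("CTT", 0, "T")
# A third-position change in the same codon leaves leucine unchanged.
silent = classify_substitution("CTT", 2, "C")
print(leu_to_phe, silent)
```

The example also shows why the position of a substitution within the codon matters: third-position changes are often silent, whereas first-position changes usually alter the amino acid.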

4.3.6.3 Plum Pox Virus (PPV)

Sharka, caused by Plum pox virus (PPV), is currently the most important viral disease affecting Prunus species. Considerable research has therefore been devoted to the selection of PPV-resistant genetic resources in Prunus germplasm and to breeding for PPV resistance. Hurtado et al. (2002) established genetic linkage maps of two apricot cultivars (Prunus armeniaca L.) and mapped the sharka resistance trait on linkage group 2. Ruiz et al. (2011) narrowed down the apricot PPV resistance locus; based on the syntenic peach genome sequence, PPVres was predicted to lie within a region of 2.16 Mb in which a few candidate resistance genes were identified. In 2012, three SSR markers (PGS1.21, PGS1.23, and PGS1.24) tightly linked to the PPV resistance trait were identified and screened, which was the first successful application of such markers in marker-assisted selection (MAS) for resistance breeding in Prunus species (Soriano et al. 2012). Later, Zuriaga et al. (2013) proposed MATH gene(s) as candidate(s) for PPV resistance in apricot (Prunus armeniaca L.): nine transcripts (three putatively coding for serine/threonine kinases and six for MATHd proteins) appear to be closely associated with potyvirus resistance and, in the case of the MATHd proteins, with PPV resistance in Arabidopsis in particular. Recently, the results of Zuriaga et al. (2018) strongly supported ParPMC1 and/or ParPMC2 as host susceptibility genes required for PPV infection, the silencing of which may confer PPV resistance. By comparing changes in MATHd orthologues in Prunus, Jiang et al. (2019) found that the associated regions were vertically inherited from the ancestor of Prunus species and that at least two tandemly arrayed copies have been retained in each species; the loss of the MATHd genes may result in susceptibility to PPV.

4.4 Future Goals and Prospects

4.4.1 The Goal of Apricot Genomics Research

Genome research represents a shift in life science research from the traditional “model-oriented” approach to a “data-oriented” one, and a new model of life science and industrial development guided by bioinformatics has emerged. From the accumulation of expressed sequence tags (ESTs) to genome frameworks, these data allow the genes of a species to be analyzed more accurately. Genomics involves the collection, processing, and analysis of sequence information obtained from all the DNA sequences of an organism, and its outstanding feature is that it is systematic. To breed new apricot varieties that meet human preferences and needs, several questions must be answered: how apricot evolved, what its ancestor was, what its taxonomic status is, and which species are closely related to it. New fruit tree varieties often arise from natural bud mutations, and these varieties often contain point mutations. How these point mutations can be identified accurately and conveniently by molecular biological means is also a problem to be solved. Some apricot species have high quality and stable yield and are resistant to cold, drought, barren soils, saline-alkali conditions, and pests. The identification and cloning of the specific genes underlying these desirable characteristics, and the elucidation of their regulatory mechanisms, can all be achieved through genomics research. Only through genomics research can we understand copy number polymorphism (CNP), that is, the number and distribution of gene copies, across the whole apricot genome. Different apricot varieties differ in the number of gene copies: some varieties have lost a large number of gene copies, whereas others have gained additional copies, and these two types of variants might confer differences in biotic and abiotic resistance. Such differences cannot be resolved without genomics research. The evolution of apricot species, the evolution of gene function, and the regulation of genes can be studied through structural, comparative, evolutionary, and functional genomics and agrogenomics. Specifically, this is achieved through genome sequencing and analysis, genome sequence comparison, the discovery of new genes, genome expression profiling, and genome-level research on biological evolution. Genomics provides unprecedented research methods and a sound research platform for addressing these scientific problems.

4.4.2 Prospects and Implications of Apricot Genomics

Apricot is a stone fruit tree. There are more than 200 species in the stone fruit family. Our sequencing of the apricot genome and its comparison with the genomes of peach, plum, mume, and other species belonging to the same family show that this family originated from a common ancestor and differentiated into different fruit species (Fig. 4.2). From the perspective of species evolution, the time over which these fruit tree species have provided a variety of fruits to humans is not very long. It can be predicted that stone fruit species (at least apricot) will further differentiate into new species, which will become available to humans in a few years.

At the same time, by comparing the genome sequences of related species, the coding sequences, noncoding regulatory sequences, and sequences unique to apricot can be identified. Through sequence comparison, we can understand the differences in nucleotide composition, collinearity, and gene order among species, and then obtain molecular genetic information such as gene analysis, prediction and positioning, and the evolutionary relationships of biological systems.

Current molecular genetics research, which is based on Mendelian genetic theory, essentially regards the nucleotide base sequence as the whole content of molecular genetics. Whether in gene identification, gene cloning, or transgenic research, the DNA on the chromosome is treated as a linear chain, and its base-pair sequence is studied. As a result, many cloned genes show clear functions in molecular experiments in vitro, but once they are transferred back into the plant, their effects are not obvious; many cloned genes are therefore simply deposited in gene banks and cannot be effectively transformed into plants. It is thus necessary to rethink the work of gene identification. It can be predicted that genes on DNA may not exist as linear chains in organisms. DNA, composed of nucleotides, exists in the form of functional groups that form a three-dimensional structure, and there are interactions between these functional groups. In addition to the genetic effects of the nucleotide base sequences of genes, the interactions between functional groups will also have a considerable impact on genetics. That is to say, both the nucleotide base sequence and the three-dimensional structure of DNA in organisms affect the genetics of the species. As the saying goes, a structure corresponds to a function. Epigenetics research at the genome level is bound to overturn traditional genomic science. The composition and structure of the genome are critical to gene function, and genomics research provides a reliable approach and database for this research.

References

Akin EB, Karabulut I, Topcu A (2008) Some compositional properties of main Malatya apricot (Prunus armeniaca L.) varieties. Food Chem 107:939–948
Ali S, Masud T, Abbasi KS (2008) Physico-chemical characteristics of apricot (Prunus armeniaca L.). Scientia Hortic 130:386–392
Ali S, Masud T, Abbasi KS (2011) Physico-chemical characteristics of apricot (Prunus armeniaca L.) grown in northern areas of Pakistan. Sci Hortic 130:386–392
Alquezar B, Rodrigo MJ, Zacarías L (2008) Regulation of carotenoid biosynthesis during fruit maturation in the red-fleshed orange mutant Cara Cara. Phytochemistry 69:1997–2007
Bailey CH, Hough LF (1975) Apricots. In: Janick J, Moore JN (eds) Advances in fruit breeding. Purdue University Press, West Lafayette
Bourguiba H, Scotti I, Sauvage C, Zhebentyayeva T, Ledbetter C, Krška B, Remay A, D’Onofrio C, Iketani H, Christen D, Krichen L, Trifi-Farah N, Liu W, Roch G, Audergon J-M (2020) Genetic structure of a worldwide germplasm collection of Prunus armeniaca L. reveals three major diffusion routes for varieties coming from the species’ center of origin. Front Plant Sci 11:638
Bu J, Ni Z, Aisikaer G, Jiang Z, Khan ZU, Mou W et al (2014) Postharvest ultraviolet-C irradiation suppressed Psy 1 and Lcy-b expression and altered color phenotype in tomato (Solanum lycopersicum) fruit. Postharvest Biol Technol 89:1–6
Carrari F, Fernie AR (2006) Metabolic regulation underlying tomato fruit development. J Exp Bot 57:1883–1897
Cervellati C, Paetz C, Dondini L, Tartarini S, Bassi D, Schneider B, Masia A (2012) A qNMR approach for bitterness phenotyping and QTL identification in an F1 apricot progeny. J Biotech 159:312–319
Chen XY, Zhang ZJ (2011) Study on the relationship between accumulation of β-carotene and expression of genes for β-carotene biosynthesis in the storage roots of sweet potato [Ipomoea batatas (L.) Lam.]. Chinese J Tropical Crops 32:1838–1842. (In Chinese)
Chen R (1937) Taxonomy of Chinese trees. China Association of Agricultural Science Societies, Beijing, 476. (In Chinese)
Chow HF (1934) The familiar trees of Hopei. Peking Nat Hist Bull. Handbook. No. 4
Curl AL (1960) The carotenoids of apricots. J Food Sci 25:190–196
D’Ambrosio C, Giorio G, Marino I, Merendino A, Petrozza A, Salfi L, Stigliani AL, Cellini F (2004) Virtually complete conversion of lycopene into β-carotene in fruits of tomato plants transformed with the tomato lycopene β-cyclase (tlcy-b) cDNA. Plant Sci 166(1):207–214
Dicenta F, García JE (1993) Inheritance of the kernel flavour in almond. Heredity 70:308–312
Dou JL, Yuan PL, Zhao SJ, He N, Zhu HJ, Gao L et al (2017) Effect of ploidy level on expression of lycopene biosynthesis genes and accumulation of phytohormones during watermelon (Citrullus lanatus) fruit development and ripening. J Integr Agric 16:1956–1967
Dragovic-Uzelac V, Levaj B, Mrkic V, Bursac D, Boras M (2007) The content of polyphenols and carotenoids in three apricot cultivars depending on stage of maturity and geographical region. Food Chem 102:966–975
Emilie B, Marine B (2006) Euapricotdb: The European Prunus database for apricot genetic resources. INRA Bordeaux: Hélène Christmann
Giuliano G, Bartley GE, Scolnik PA (1993) Regulation of carotenoid biosynthesis during tomato development. Plant Cell 5:379–387
Gleadow RM, Haburjak J, Dunn JE, Conn ME, Conn EE (2008) Frequency and distribution of cyanogenic glycosides in Eucalyptus L’Hérit. Phytochemistry 69:1870–1874
Heppner MJ (1923) The factor for bitterness in the sweet almond. Genetics 8:390–392
Heppner MJ (1926) Further evidence of the factor for bitterness in the sweet almond. Genetics 11:605–607
Huang SX, Ding J, Deng DJ, Tang W, Sun HH, Liu D et al (2013) Draft genome of the kiwifruit Actinidia chinensis. Nat Commun 4:2640
Hurtado MA, Romero C, Vilanova S et al (2002) Genetic linkage maps of two apricot cultivars (Prunus armeniaca L.), and mapping of PPV (sharka) resistance. Theor Appl Genet 105:182–191
Hyun-hee L, Jeong-Hyun A, Ae-Ran K, Eun Sook L, Jin-Hwan K, Yu-Hong M (2014) Chemical composition and antimicrobial activity of the essential oil of apricot seed. Phytotherapy Res 28:1867–1872
Jaillon O, Aury JM, Noel B, Policriti A, Clepet C, Casagrande A et al (2007) The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449:463–467
Jiang FC, Zhang JH, Wang S, Yang L, Luo YF, Gao SH et al (2019) The apricot (Prunus armeniaca L.) genome elucidates Rosaceae evolution and beta-carotenoid synthesis. Hortic Res 6:128
Kang BS, Zhao WE, Hou YB, Tian P (2010) Expression of carotenogenic genes during the development and ripening of watermelon fruit. Sci Hortic 124:368–375
Kostina KF (1977) Breeding apricot in the southern zone of the USSR. Sadovodstvo 7:24–25
Kostina KF (1969) The use of varietal resources of apricots for breeding. (In Russian). Trudy Nikit Bot Sada 40:45–63
Kostina KF (1972) Introduction and breeding of apricots. (In Russian). Seleshoziaistvennaya Biol 7(1):86–91

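Kostina KF (1978) Apricot breeding under conditions of the U.S.S.R. south. Acta Hort 85:190–194
Laufer B (1919) Sino-Iranica: Chinese contributions to the history of civilization in ancient Iran. Field Museum of Natural History, Anthropological Series, Chicago
Li WW, Liu LQ, Wang YN, Zhang QP, Gan GQ, Zhang SK, Wang YT, Liao K (2020) Genetic diversity, population structure, and relationships of apricot (Prunus) based on restriction site-associated DNA sequencing. Hortic Res 7:69
Li CL, Jiang SY (1998) New combinations in Armeniaca Mill. and Cerasus Juss. (Rosaceae). Acta Phytotaxonomica Sinica 36(4):367–372
Lu LD, Bruce B (2003) Flora China 9:396–401
Liu WS, Liu N, Yu XH, Zhang YP, Sun M, Xu M (2010) Apricot germplasm resources and their utilization in China. Acta Hortic 862:45–50
Liu MJ, Zhao J, Cai QL, Liu GC, Wang JR, Zhao ZH et al (2014) The complex jujube genome provides insights into fruit tree biology. Nat Commun 5:5315
McCarty CD, Lesley JW, Frost HB (1952) Bitterness (benzaldehyde content) of kernels of almond-peach F1 hybrids and their parents. Proc Am Soc Hortic Sci 59:254
Mehlenbacher SA, Cociu V, Hough LF (1990) Apricot (Prunus). Acta Hort 290:72–74
Meléndez-Martínez AJ, Fraser PD, Bramley PM (2010) Accumulation of health promoting phytochemicals in wild relatives of tomato and their contribution to in vitro antioxidant activity. Phytochemistry 71:1104–1114
Morant AV, Jørgensen K, Jørgensen C, Paquette SM, Sánchez-Pérez R, Møller BL, Bak S (2008) β-Glucosidases as detonators of plant chemical defense. Phytochemistry 69:1795–1813
Negri P, Bassi D, Magnanini E, Rizzo M, Bartolozzi F (2008) Bitterness inheritance in apricot (P. armeniaca L.) seeds. Tree Genet Genomes 4:767–776
Rehder A (1940) Manual of cultivated trees and shrubs hardy in North America. MacMillan Co., New York
Romero I, Tikunov Y, Bovy A (2011) Virus-induced gene silencing in detached tomatoes and biochemical effects of phytoene desaturase gene silencing. J Plant Physiol 168:1129–1135
Roussos PA, Denaxa N-K, Tsafouros A, Efstathios N, Intidhar B (2016) Chapter 2: Apricot (Prunus armeniaca L.). In: Simmonds M, Preedy V (eds) Nutritional composition of fruit cultivars. Academic Press, New York, pp 19–48
Ruiz EMV, Soriano JM, Romero C, Zhebentyayeva T, Terol J, Zuriaga E et al (2011) Narrowing down the apricot Plum pox virus resistance locus and comparative analysis with the peach genome syntenic region. Mol Plant Pathol 12(6):535–547
Sánchez-Pérez R, Jørgensen K, Olsen CE, Dicenta F, Møller BL (2008) Bitterness in almonds. Plant Physiol 146:1040–1052
Sánchez-Pérez R, Howad W, Garcia-Mas J, Arús P, Martínez-Gómez P, Dicenta F (2010) Molecular markers for kernel bitterness in almond. Tree Genet Genomes 6(2):237–245
Sánchez-Pérez R, Pavan S, Mazzeo R, Moldovan C, Aiese Cigliano R, Del Cueto J et al (2019) Mutation of a bHLH transcription factor allowed almond domestication. Science 364:1095–1098
Sass-Kiss A, Kiss J, Milotay P, Kerek MM, Toth-Markus M (2005) Differences in anthocyanin and carotenoid content of fruits and vegetables. Food Res Int 38:1023–1029
Shirasawa K, Isuzugawa K, Ikenaga M, Saito Y, Yamamoto T, Hirakawa H, Isobe S (2017) The genome sequence of sweet cherry (Prunus avium) for use in genomics-assisted breeding. DNA Res 24(5):499–508
Shulaev V, Sargent DJ, Crowhurst RN, Mockler TC, Folkerts O, Delcher AL et al (2011) The genome of woodland strawberry (Fragaria vesca). Nat Genet 43:109–116
Simkin AJ, Kuntz M, Moreau H, McCarthy J (2010) Carotenoid profiling and the expression of carotenoid biosynthetic genes in developing coffee grain. Plant Physiol Biochem 48:434–442
Soriano JM, Domingo L, Zuriaga E et al (2012) Identification of simple sequence repeat markers tightly linked to plum pox virus resistance in apricot. Mol Breeding 30:1017–1026
Stoewsand GS, Anderson JL, Lamb RC (1975) Cyanide content of apricot kernels. J Food Sci 40:1107
Tong PY (1983) History of Hongxing. In: History of fruit trees. Agricultural Publishing House, Beijing. (In Chinese)
Vardi N, Parlakpinar H, Ozturk F, Ates B, Gul M, Cetin A, Erdogan A, Otlu A (2008) Potent protective effect of apricot and β-carotene on methotrexate-induced intestinal oxidative damage in rats. Food Chem Toxicol 46:3015–3022
Vardi N, Parlakpinar H, Ates B, Cetin A, Otlu A (2013) The protective effects of Prunus armeniaca L (apricot) against methotrexate-induced oxidative damage and apoptosis in rat kidney. J Physiol Biochem 69:371–381
Velasco R, Zharkikh A, Affourtit J, Dhingra A, Cestaro A, Kalyanaraman A et al (2010) The genome of the domesticated apple (Malus × domestica Borkh.).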
66 markers for kernel bitterness in almond. Tree Genet Genomes 6(2):237–245 Sánchez-Pérez R, Pavan S, Mazzeo R, Moldovan C, Aiese Cigliano R, Del Cueto J et al (2019) Mutation of a bHLH transcription factor allowed almond domestication. Science 364:1095–1098 Sass-Kiss A, Kiss J, Milotay P, Kerek MM, Toth-Markus M (2005) Differences in anthocyanin and carotenoid content of fruits and vegetables. Food Res Int 38:1023–1029 Shirasawa K, Isuzugawa K, Ikenaga M, Saito Y, Yamamoto T, Hirakawa H, Isobe S (2017) The genome sequence of sweet cherry (Prunus avium) for use in genomics-assisted breeding. DNA Res 24(5):499–508 Shulaev V, Sargent DJ, Crowhurst RN, Mockler TC, Folkerts O, Delcher AL et al (2011) The genome of woodland strawberry (Fragaria vesca). Nat Genet 43:109–116 Simkin AJ, Kuntz M, Moreau H, McCarthy J (2010) Carotenoid profiling and the expression of carotenoid biosynthetic genes in developing coffee grain. Plant Physiol Biochem 48:434–442 Soriano JM, Domingo L, Zuriaga E et al (2012) Identification of simple sequence repeat markers tightly linked to plum pox virus resistance in apricot. Mol Breeding 30:1017–1026 Stoewsand GS, Anderson JL, Lamb RC (1975) Cyanide content of apricot kernels. J Food Sci 40:1107 Tong PY (1983) History of Hongxing. In: History of fruit trees. Agricultural Publishing House, Beijing. (In Chinese) Vardi N, Parlakpinar H, Ozturk F, Ates B, Gul M, Cetin A, Erdogan A, Otlu A (2008) Potent protective effect of apricot and b-carotene on methotrexateinduced intestinal oxidative damage in rats. Food Chem Toxicol 46:3015–3022 Vardi N, Parlakpinar H, Ates B, Cetin A, Otlu A (2013) The protective effects of Prunus armeniaca L (apricot) against methotrexate-induced oxidative damage and apoptosis in rat kidney. J Physiol Biochem 69:371– 381 Velasco R, Zharkikh A, Affourtit J, Dhingra A, Cestaro A, Kalyanaraman A, et al (2010) The genome of the domesticated apple (Malus  domestica Borkh.). 
Nat Genet 42:833–839 Verde I, Abbott AG, Scalabrin S, Jung S, Shu S, Marroni F, et al (2013) The international peach genome initiative, The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution. Nature Genet 45:487–494 Wang YZ (2001) Effects of different harvesting periods on yield and quality of apricot. J Fruit Sci 18(1):57–59 (In Chinese) Wang YZ (2016) A survey report on apricot and plum industry in China [M]. China Agriculture Press, Beijing (In Chinese) Wang YZ, Yun HY, Yang L (2003) Current status suggestion for developing apricot production in China. Rev China Agric Sci Technol 5(2):24–27 (In Chinese)

Y. Wang et al. Wang YZ, Hu N, Liu QZ, Yang L, Shi H (1999) Study on the vitamin content in different flesh color varieties of apricot. J Fruit Sci 16(1):51–54. (In Chinese) Wang YZ, Sun YY, Yang L (2004) The genetic relationship of apricot and the taxonomic status of apricot were studied by quantitative taxonomy. In: Progress in research and utilization of Lixing resources (III). Beijing: China Agriculture Press, pp 178–184. (In Chinese) Wang YZ (1994) Research on economic characters of apricot for kernel use in China. Beijing Agricultural Science. Forest and Fruit special issue. (In Chinese) Werner DJ, Creller MA (1997) Genetic studies in peach: inheritance of sweet kernel and male sterility. J Am Soc Hortic Sci 122(2):215–217 Wisutiamonkul A, Ampomah-Dwamenab C, Allan AC, Ketsa S (2017) Carotenoid accumulation and gene expression during durian (Durio zibethinus) fruit growth and ripening. Sci Hortic 220:233–242 Wu SS, Sun W, Xu ZC, Zhai JW, Li XP, Li CR et al (2020) The genome sequence of star fruit (Averrhoa carambola). Hortic Res 7:95 Wu J, Wang ZW, Shi ZB, Zhang S, Ming R, Zhu SL, et al (2013) The genome of pear (Pyrus bretschneideri Rehd.). Genome Res 23:396–408 Wu GM (1984) Apricot. In: Taxonomy of temperate fruit trees in China. Agricultural Publishing House, Beijing. (In Chinese) Xu J, Tao NG, Liu Q, Deng XX (2006) Presence of diverse ratios of lycopene/b-carotene in five pink or red-fleshed citrus cultivars. Sci Hortic 108:181–184 Xu Q, Chen LL, Ruan XA, Chen DJ, Zhu AD, Chen CL et al (2013) The draft genome of sweet orange (Citrus sinensis). Nat Genet 45:59–66 Yu DJ (1979) Taxonomy of Chinese fruit trees [M]. Agricultural Publishing House, Beijing (In Chinese) Zagrobelny M, Møller BL (2011) Cyanogenic glucosides in the biological warfare between plants and insects: the Burnet moth-Birdsfoot trefoil model system. Phytochemistry 72:1585–1592 Zhang YZ (1983) A brief introduction of ancient crops unearthed in Xinjiang. 
Agr Archaeol 1:124 (In Chinese) Zhang JY, Lu MN, Wang ZM (1999) Two new species of the genus Armeniaca (Rosaceae). Acta Phytotaxonomica Sinica. 37(1):105–109 Zhang QX, Chen WB, Sun LD, Zhao FY, Huang BQ, Yang WR et al (2012) The genome of Prunus mume. Nat Commun 3:1318 Zhang JY, Zhang Z (2003) Annals of fruit trees in China: apricot [M]. Beijing: China Forestry Press, pp 18–26. (In Chinese) Zhang JH, Sun HY, Yang L, Jiang FC, Zhang ML, Wang YZ (2019) Construction of a high-density linkage map and QTL analysis for pistil abortion in apricot (Prunus armeniaca L.). Can J Plant Sci 99:599–610 Zhou WQ, Niu YY, Ding X, Zhao SR, Li YL, Fan GQ, Zhang SK, Liao K (2020) Analysis of carotenoid content and diversity in apricots (Prunus armeniaca L.) grown in China. Food Chem

4

The Apricot Genome

Zuriaga E, Soriano JM, Zhebentyayeva T, Romero C, Dardick C, Cañizares J, Badenes ML (2013) Genomic analysis reveals MATH gene(s) as candidate(s) for Plum pox virus (PPV) resistance in apricot (Prunus armeniaca L.). Mol Plant Pathol 14(7):663–677

67 Zuriaga E, Romero C, Blanca JM, Badenes ML (2018) Resistance to Plum Pox Virus (PPV) in apricot (Prunus armeniaca L.) is associated with downregulation of two MATHd genes. BMC Plant Biol 18:25

5 Chinese Jujube: Crop Background and Genome Sequencing

Meng Yang, Mengjun Liu, and Jin Zhao

Abstract

Chinese jujube (Ziziphus jujuba Mill.), also called the Chinese date, is a member of the Rhamnaceae family.

5.1 Crop Background

5.1.1 Introduction

Chinese jujube (Ziziphus jujuba Mill.), also called the Chinese date, is a member of the Rhamnaceae family. It is native to China and is valued for its nutrients as well as its high medicinal potential (Liu 2004a). The center of origin of jujube cultivation lies in the middle and lower reaches of the Yellow River in China, and the earliest record of jujube cultivation dates back about 7000 years, to the Neolithic age (Liu et al. 2015; Liu 2006; Liu and Wang 2019). About 2,000 years ago, jujube was introduced into Japan and Korea and then into Central Asia and Europe via the ancient Silk Road, and it has now spread to at least 47 countries (Table 5.1). The direct ancestor of Chinese jujube is sour jujube or wild jujube (Z. acidojujuba Liu et Cheng). The leaf fossil record of sour jujube goes back 12–14 million years (Qu et al. 1987).

Chinese jujube is widely planted in China but is mainly cultivated in six provinces, i.e., Hebei, Shandong, Henan, Shanxi, Shaanxi, and Xinjiang, with a planted area of around 2 million hectares and an annual yield of more than 8 million tons. It is the main source of income for about 20 million farmers in China; meanwhile, over one billion people in Asia have used jujube as a herbal medicine (Liu 2008). Jujube is becoming increasingly important because of its high tolerance to drought and to saline and barren soils. These advantages make jujube highly valuable, and the crop deserves more attention in the future.

M. Yang (corresponding author) · M. Liu
Research Center of Chinese Jujube, Hebei Agricultural University, Baoding, Hebei 071001, P. R. China
e-mail: [email protected]

J. Zhao
College of Life Science, Hebei Agricultural University, Baoding, Hebei 071000, P. R. China

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 M. A. Chapman (ed.), Underutilised Crop Genomes, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-00848-1_5

5.2 Botanical Description

5.2.1 Taxonomy

Ziziphus was first recognized as an independent genus in 1754 by Philip Miller (Liu and Cheng 1995). Ziziphus contains many well-known medicinal or fruit species, including Chinese jujube, wild jujube, and Indian jujube. After comparison of morphology, growth habit, and geographical distribution, Ziziphus was subdivided into two sections (Liu and Cheng 1995): Ziziphus M. J. Liu et C. Y. Cheng and Perdurans M. J. Liu et C. Y. Cheng. Sect. Ziziphus has deciduous fruiting shoots and is mainly distributed in temperate zones (Liu 2006).

The number of species in the genus Ziziphus Mill. varies between records: Don (1832), Rendle (1952), Evreinoff (1964), and Johnston (1972) identified 39, 40, 80, and 86 species, respectively. Bhansali (1975) reported as many as 135 species, Chen and Chou (1982) suggested about 100 species, and Liu and Cheng (1995) put the number at about 170. Pareek (2001) made a preliminary attempt at a detailed classification based on Liu and Cheng’s subdivision system, as reviewed by Liu (2006).

Z. jujuba Mill. and Z. mauritiana Lam. are the two most important cultivated species in the genus Ziziphus. The former is the Chinese jujube, which is deciduous and mainly distributed in China, Iran, and South Korea. The latter, also called ber or Indian jujube, is mainly distributed in tropical regions such as India, Bangladesh, Pakistan, and southern China. The fruit of the former is usually smaller than that of the latter but is much sweeter.

Table 5.1 Worldwide distribution of Chinese jujube

Continent: Country
Asia: Afghanistan, Armenia, Azerbaijan, Bengal, Burma, China, Cyprus, India, Iraq, Iran, Israel, Japan, Kyrgyzstan, Lebanon, Malaysia, Mongolia, Pakistan, Palestine, South Korea, Syria, Thailand, Turkey, Turkmenistan, Uzbekistan
Europe: Bulgaria, Croatia, England, France, Germany, Greece, Italy, Macedonia, Portugal, Romania, Spain, Russia, Yugoslavia, Slovenia, Ukraine
Africa: Egypt, Tanzania, Tunisia
North America: Canada, USA
Oceania: Australia, New Zealand

5.2.2 Geographic Distribution

Chinese jujube is indigenous to China and has now spread to nearly 50 countries (Table 5.1). It grows from 34° South to 51° North latitude and at altitudes of up to 2800 m. In China, jujube is cultivated mainly in the middle and lower parts of the Yellow River valley (Fig. 5.1).

5.2.3 Morphology

Chinese jujube has two types of root systems: seedling roots and stem-originated roots. A seedling root system has more developed vertical roots than horizontal roots, while a stem-originated system has more horizontal roots. Jujube has three types of buds (primary, secondary, and dormant) and four kinds of shoots: primary, secondary, mother-bearing, and fruit-bearing (Fig. 5.2). The primary shoots usually grow more than 50 cm each year, whereas the secondary shoots die back each year. The mother-bearing shoot is very condensed and grows only about 1 mm a year. The fruit-bearing shoot falls off in the autumn and is usually shorter than 20 cm. Except for the mother-bearing shoot, all shoot types are zig-zagged and spiny (Liu 2006).

Jujube flowers are not big and showy, but they are fragrant, light greenish-yellow in color, and 5–6 mm in diameter. The inflorescence is a cyme, which contains up to 13 flowers depending on conditions (Fig. 5.2). The fruit forms from two parts of the flower, the disk and the ovary. Fruit shape varies widely and includes round, oval, red-pepper-like, apple-like, ovate, and oblate forms. Jujube fruit usually contains either no seed or shriveled kernels, but in some conditions, one stone contains one or two seeds (Fig. 5.2) (Liu et al. 2020a).

Fig. 5.1 Cultivation areas and major production areas of jujube in China (from Chen et al. 2017 published under a creative commons attribution license)

Fig. 5.2 Morphology of jujube shoot, inflorescence, and fruit. 0 Perennial shoot system in the dormant season, 1 Primary shoot, 2 Secondary shoots, 3 Mother-bearing shoots, 4 Fruit-bearing shoots, 5 Inflorescence, 6 Young fruit, 7 bisection profile of mature fruit, 8 Stone and kernel (from Liu et al. 2020a published under a creative commons attribution license)

5.3 Nutrient, Utilization, and Propagation

Chinese jujube has been a traditional Chinese herb for thousands of years. Many studies have shown that its fruit is rich in nutrients including soluble sugars, vitamin C, cAMP, vitamin B, triterpenoid acid, polysaccharide, flavonoids, iron, proline, potassium, zinc, and calcium. Jujube leaves are rich in leucine, vitamin B6, betulinic acid, and ursolic acid; the flowers accumulate vitamin B1 and leucine (Zhao et al. 2008; Zhao et al. 2012b; Li et al. 2012; Su et al. 2019; Zhao et al.; Gao et al. 2012; Zhao et al. 2012a). During fruit maturation, sugar content increases and vitamin C content decreases.

Traditionally, jujube is processed into many different products: the dried date, which is the dominant product for both the domestic and export markets in China, as well as candied jujube, smoked jujube, jujube jam, roasted jujube, and liquor-saturated jujube. In the past 30 years, new kinds of products have emerged, such as jujube juice, jujube slices, jujube powder, jujube tea, jujube beer, jujube essence, and jujube pigment (Jinfeng et al. 2010). In recent years, products concentrated in jujube nutrients have been developed, including cyclic nucleotides, dietary fiber, jujube oil, high-vitamin-C jujube juice, instant jujube powder, and so on (Liu 2004b). Owing to its various nutrients, jujube is becoming a super fruit for the future (Liu et al. 2020a).

Although grafting has traditionally been used in jujube propagation, root suckers were the most popular means of propagation until the 1960s. After the 1980s, grafting jujube onto sour jujube became more and more popular as more new cultivars were planted; in particular, raising rootstocks from sour jujube seeds makes the seedlings grow faster. Meanwhile, green shoot cuttings, i.e., cuttings of secondary shoots and bearing shoots, give a high propagation coefficient. Tissue culture has been successful with stem segments, leaves, anthers, embryos, and cotyledons since 1995 (Hao et al. 2013; Qi and Liu 2004; Li et al. 2004; Wang et al. 2002); however, it has not been applied in jujube propagation on a large scale owing to the high cost, high technical requirements, and late fruiting of micropropagated plants.

5.4 Research Challenges and Opportunities

Over the past 30 years, great achievements have been made in jujube research. In addition to improvements in traditional areas of horticultural research, such as high-efficiency breeding, pest management and disease control, and postharvest preservation of nutrients, multiomics research based on sequencing data has emerged in recent years and is a promising way to decipher the candidate genes, evolutionary mechanisms, and population genetics of jujube. These results will, in turn, provide solid information to support molecular breeding, high-efficiency cultivation, and high-nutrition processing.

5.5 Genome Sequencing

The revolution in sequencing technology has brought great advances in whole-genome sequencing of various organisms to unravel their complete genetic code. To date, more than 200 land plant genomes have been sequenced and assembled, especially those of agricultural and horticultural species, whose genome sequences can be used to assist or improve breeding and cultivation. Among domesticated horticultural species, the major fruit trees, including orange, apple, kiwifruit, pear, peach, and jujube, have been sequenced using one or several sequencing strategies (Zhang et al. 2019b; Wu et al. 2019; Daccord et al. 2017; Xu et al. 2013; Huang et al. 2013; International Peach Genome Initiative et al. 2013; Chagne et al. 2014; Huang et al. 2016; Liu et al. 2014). In most studies, analyses based on the assembled genome sequences have included comparative genomics, molecular evolution, and lineage-specific adaptations; verification of functional genomics results by traditional experiments such as QTL mapping, genetic map construction, and RT-PCR; and, more recently, population genomics (Liang et al. 2019; Chen et al. 2019a). The ‘omics’ era started in the 1990s, developed at the beginning of the twenty-first century, and is now entering a new stage based on multiomics.

The first Chinese jujube genome, from a widely cultivated cultivar called ‘Dongzao’, was published in 2014 using next-generation sequencing technology; jujube was also the first species in the Rhamnaceae family to be sequenced (Liu et al. 2014). Later, in 2016, a dry jujube cultivar named ‘Junzao’ was sequenced and assembled, accompanied by resequencing of 31 jujube cultivars of different geographic origin (Huang et al. 2016). Based on these two jujube genome sequences, multiomics studies combining transcriptomics, comparative genomics, and metabolomics have been carried out for some functional regions or pathways, and gene families involved in cold stress, fruit development, flower development, and the response to phytoplasma have been identified (Liu et al. 2020c, 2017; Zhang et al. 2019c, 2017; Xue et al. 2019; Qing et al. 2019; Li et al. 2019; Hou et al. 2019; Chen et al. 2019b; Zhang and Li 2018; Song et al. 2017; Guo et al. 2017). The following sections describe the omics-related studies in jujube in detail.

5.6 Strategy for Jujube Genome Assembly and Annotation

The ‘Dongzao’ genome is highly heterozygous (1.90%, almost twice the heterozygosity of pear, another fruit tree), the highest among species sequenced by next-generation sequencing (NGS) at the time the study was published in 2014 (Chagne et al. 2014). A mixed strategy was therefore adopted, combining whole-genome sequencing (WGS), BAC-to-BAC sequencing, and WGS PCR-free libraries. The WGS method used paired-end libraries (170–800 bp) and mate-pair libraries (2, 5, 10, 20, and 40 kb) sequenced with short-read NGS technology. The raw assembly was generated from these 100–150 bp short reads, with scaffolding based on the paired-end and mate-pair libraries. Then, a total of 21,504 BAC clones were constructed and assembled individually, which helped with the issue of heterozygosity. The assemblies resulting from these two approaches were next combined based on their overlaps and again assembled into scaffolds with the assistance of paired-end and large insert-size data. To solve the problem of low-GC-content regions being missed during whole-genome assembly because of emulsion PCR, the researchers constructed PCR-free libraries to compensate for these potentially missed complex regions. The combination of these three methods produced a preliminary assembly of the ‘Dongzao’ genome.

To anchor the scaffolds to chromosomes, the researchers used RAD-seq to generate paired-end RAD reads from a population of 105 F1 individuals produced by crossing Z. jujuba ‘JMS2’ and Z. acidojujuba ‘Xing 16’. A genetic map was generated from the RAD single-nucleotide polymorphisms (SNPs), and then, by mapping the RAD reads to the jujube genome, scaffolds were grouped into a final assembly with 12 chromosomes.

In the study of the ‘Junzao’ genome, also considered a highly heterozygous jujube (Huang et al. 2016), the researchers first assessed heterozygosity from the distribution of 19-mers generated from the raw paired-end reads and assembled the genome from 37-mers derived from the raw reads. They then resolved the dual paths, also called bubbles, that are caused by heterozygous contigs by mapping the identified heterozygous 19-mers to the assembled genome. This step eliminates most of the heterozygous regions and improves the quality of the genome assembly. Finally, scaffolds were constructed by linking paired-end and mate-pair reads to the contigs, and the easy-to-fill gaps were closed using the short paired-end reads. The final chromosome-level version was created using a genetic map, the same approach as in ‘Dongzao’.
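The anchoring step described above can be illustrated with a minimal sketch. It is not the pipeline used in either jujube study: the function name `anchor_scaffolds`, the input layout, and the majority-vote and mean-position rules are assumptions chosen only to show the idea of placing scaffolds on linkage groups from mapped markers.

```python
from collections import Counter, defaultdict


def anchor_scaffolds(marker_hits):
    """Assign scaffolds to linkage groups and order them along each group.

    marker_hits: iterable of (scaffold, linkage_group, position_cM) tuples,
    one per genetic marker that maps to a scaffold (hypothetical layout).
    """
    by_scaffold = defaultdict(list)
    for scaffold, group, cm in marker_hits:
        by_scaffold[scaffold].append((group, cm))

    placed = defaultdict(list)
    for scaffold, hits in by_scaffold.items():
        # Assumed rule: the linkage group with the most markers wins.
        group, _ = Counter(g for g, _ in hits).most_common(1)[0]
        positions = [cm for g, cm in hits if g == group]
        # Assumed rule: order scaffolds by the mean map position of their markers.
        placed[group].append((sum(positions) / len(positions), scaffold))

    return {g: [s for _, s in sorted(items)] for g, items in placed.items()}


hits = [
    ("scafA", "LG1", 10.0),
    ("scafA", "LG1", 12.0),
    ("scafA", "LG2", 50.0),  # a conflicting marker, outvoted by the two LG1 hits
    ("scafB", "LG1", 3.0),
    ("scafC", "LG2", 40.0),
]
print(anchor_scaffolds(hits))
```

In practice, scaffolds with conflicting markers are usually inspected for misassembly rather than silently assigned, so the majority vote here is only a simplification.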

Genome annotation interprets the whole genome in terms of known categories, including the identification of repetitive sequences, coding genes, and non-coding genes. Similar annotation methods were used for both the ‘Dongzao’ and ‘Junzao’ genomes. For repetitive sequence annotation, the software RepeatMasker (http://www.repeatmasker.org/) was employed with combined repeat element libraries from Repbase (https://www.girinst.org/repbase/), LTR_FINDER (Xu and Wang 2007), RepeatScout (Price et al. 2005), PILER (Edgar and Myers 2005), and so on. For gene annotation, three lines of evidence were used: a de novo method based on a hidden Markov model, homology-based protein alignment, and transcriptome-dependent identification. Each line of evidence generated a gene set, and the final gene set was produced by combining the results of the three methods, followed by automatic and manual curation according to transcript information on the untranslated regions and possible splicing isoforms. Non-coding elements were identified by homology searches against databases such as Rfam (Kalvari et al. 2018).
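The source does not say how the three gene sets were merged, so the following is only a toy sketch of one common idea: keep a gene model when at least two lines of evidence overlap at the same locus. The function names, the `(chromosome, start, end)` tuple layout, and the `min_support` threshold are all hypothetical.

```python
def overlaps(a, b):
    """True if two (chromosome, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def consensus_models(de_novo, homology, transcript, min_support=2):
    """Keep one gene model per locus supported by >= min_support evidence sets.

    Each argument is a list of (chromosome, start, end) tuples. Earlier
    arguments take priority when several models cover the same locus
    (an assumption made for this sketch).
    """
    evidence_sets = [de_novo, homology, transcript]
    kept = []
    for source in evidence_sets:
        for model in source:
            # Count the evidence sets (including the model's own) that cover this locus.
            support = sum(
                any(overlaps(model, other) for other in evidence)
                for evidence in evidence_sets
            )
            already_kept = any(overlaps(model, k) for k in kept)
            if support >= min_support and not already_kept:
                kept.append(model)
    return sorted(kept)


de_novo = [("chr1", 100, 500), ("chr1", 2000, 2600)]
homology = [("chr1", 120, 480)]
transcript = [("chr1", 90, 510), ("chr2", 10, 300)]
print(consensus_models(de_novo, homology, transcript))
```

Real annotation pipelines weigh evidence types differently and reconcile exon structures, which this interval-level sketch deliberately ignores.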

5.7 Features of Jujube Genome

The two sequenced jujube genomes, ‘Dongzao’ and ‘Junzao’, representing fresh and dry jujube, respectively, were assembled by two different research groups. The ‘Dongzao’ genome assembly is 437.65 Mb in size, with 28,930 contigs (N50 = 33.95 kb) and 5,898 scaffolds (N50 = 301.04 kb). The final ‘Junzao’ assembly is 351 Mb, with contig and scaffold N50 values of 34 kb and 755 kb, respectively. For the ‘Dongzao’ genome, the genetic map spanned 974.01 cM and 935.40 cM for the female and male parents, respectively (Liu et al. 2014). The final map, including 2419 markers, consists of 12 linkage groups, consistent with the number of jujube chromosomes. By mapping markers to the assembled scaffolds, 1120 scaffolds were linked to the 12 chromosomes, comprising 73.56% of the jujube genome assembly (Fig. 5.3). In the ‘Junzao’ genome, 600 assembled scaffolds, accounting for 83.6% (293 Mb) of the assembly, were anchored to 12 chromosomes using two high-density genetic linkage maps (Huang et al. 2016). Both assemblies (based on scaffolds) cover more than 98% of the estimated genome size, but the ‘Dongzao’ genome is more than 80 Mb larger than the ‘Junzao’ genome. This extra 80 Mb of sequence in ‘Dongzao’ comprises nearly 69 Mb of transposable elements (TEs), which is one of the main reasons for the larger genome size of ‘Dongzao’.

In terms of gene sets, ‘Dongzao’ has a total of 32,808 annotated genes, 78.26% of which contain two or more exons and 89.80% of which are supported by RNA-seq data. ‘Junzao’ has a set of 27,443 genes, with an average coding sequence length of 1136 bp and an average of 4.83 exons per gene. By mapping the raw reads to the assembled genomes, the number and rate of SNPs were calculated: a total of 4.77 and 2.1 million SNPs were found in the ‘Dongzao’ and ‘Junzao’ genomes, accounting for 1.10% and 0.72% of the assembled genome size, respectively. This result can be attributed to the higher heterozygosity of ‘Dongzao’ than of ‘Junzao’. Presence/absence variation (PAV) analyses between the two genomes found 7.8 Mb of ‘Dongzao’-specific fragments containing 354 genes and 14.2 Mb of ‘Junzao’-specific fragments containing 432 genes. Gene ontology (GO) enrichment identified genes related to DNA recombination and DNA integration in ‘Dongzao’ and to cell wall modification in ‘Junzao’ (Huang et al. 2016), possibly indicating functional differences between the two cultivars.
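Because contig and scaffold N50 values are the main contiguity figures quoted above, a short sketch of how N50 is computed may help. The function below follows the standard definition (the length of the sequence at which the cumulative length, counted from the longest sequence down, first reaches half of the total assembly length); the example lengths are invented and are not jujube data.

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings the running sum to half the total.
        if running * 2 >= total:
            return length


# Invented lengths (kb): total = 20, so the 10 kb midpoint is passed at the 5 kb sequence.
print(n50([2, 3, 4, 5, 6]))
```

A higher scaffold N50 with a similar contig N50, as reported for ‘Junzao’ relative to ‘Dongzao’, reflects longer-range linking of contigs rather than longer gap-free sequence.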

Fig. 5.3 Genomic landscape of the 12 jujube pseudo-chromosomes of ‘Dongzao’ (from Liu et al. 2014 published under a creative commons attribution license)

5.8 Comparison to Other Crops in Evolution

The routine approach to comparative genomics among closely related species makes use of single-gene families. For each species, gene families containing only one gene are selected, and across species, these single genes in orthologous families constitute the orthologs of that family. The orthologous single-copy genes from different species are aligned, and the alignments from the different single-gene families are then concatenated into a merged alignment, which is used to construct a phylogenetic tree. This merged alignment can be considered to represent the relationships among the species.

In the genomic analyses of ‘Dongzao’, one outgroup, Arabidopsis thaliana, and seven other closely related species, namely Morus alba from the Moraceae, Cannabis sativa from the Cannabaceae, and Pyrus bretschneideri, Prunus mume, Fragaria vesca, Malus domestica, and Prunus persica from the Rosaceae, were used for comparative genome analyses. The single-gene families were used as orthologs among the nine species, and the phylogenetic tree was built from the multiple alignments of these genes. Divergence time analyses based on the phylogenetic tree revealed that, among all sequenced Rosales, jujube is the earliest diverged species, indicating that the Rhamnaceae has a long evolutionary history, consistent with literature dating the family back to the Campanian (Bishop et al. 2000; Richardson et al. 2000). Using the fourfold degenerate site transversion (4DTv) and Ks methods, no recent whole-genome duplication event was found in the jujube genome. Comparison of collinear blocks between jujube and the other selected species revealed that the jujube genome has high synteny with strawberry and peach. Among the 13,843 jujube gene families, 1043 are jujube specific (Liu et al. 2014). Further calculation found that jujube and peach share the largest number of gene clusters, which suggests a closer relationship between jujube and peach than between jujube and the other species. Positive-selection analyses showed that, of the 2791 single-copy genes, about one-tenth (254) underwent positive selection. These positively selected genes are mostly involved in energy metabolism, sugar-related pathways, vitamin C metabolism, and secondary metabolism, which is consistent with jujube’s special physiological characteristics (Liu et al. 2014).

In the study of the ‘Junzao’ genome, detailed chromosome evolution analyses were performed by reconstructing ancestral chromosomes from the most recent common ancestor of Ziziphus jujuba ‘Junzao’ and six other species: Vitis vinifera, Populus trichocarpa, Theobroma cacao, Arabidopsis thaliana, Prunus persica, and Malus domestica. Among these seven species, after the early paleohexaploidization event in the common eudicot ancestor, V. vinifera, T. cacao, P. persica, and Z. jujuba did not undergo a further whole-genome duplication (WGD), whereas P. trichocarpa and M. domestica each underwent one WGD (Velasco et al. 2010). The correspondence of syntenic chromosome blocks among jujube, apple, and peach shows that at least three large conserved blocks did not undergo any fission, fusion, or rearrangement (Huang et al. 2016).

These comparative genomic analyses of ‘Dongzao’ and ‘Junzao’ indicate that the jujube genome is much more stable, with no additional whole-genome duplications and fewer chromosome rearrangements, fissions, or fusions, than those of closely related species such as peach and apple.
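The single-copy ortholog approach outlined at the start of this section can be sketched in a few lines. This is an illustration only, not the software used in the jujube studies: the function names, the dictionary layout for gene families, and the toy sequences are hypothetical, and the per-family alignments are assumed to have been produced beforehand by an alignment program.

```python
def single_copy_families(families, species):
    """Return IDs of families with exactly one gene in every listed species.

    families: dict mapping family ID -> dict mapping species -> list of gene IDs.
    """
    return sorted(
        family
        for family, members in families.items()
        if all(len(members.get(sp, [])) == 1 for sp in species)
    )


def concatenate_alignments(alignments, species):
    """Join per-family alignments into one merged alignment per species.

    alignments: list of dicts mapping species -> aligned sequence; within one
    family, all aligned sequences must have the same length.
    """
    merged = {sp: [] for sp in species}
    for alignment in alignments:
        if len({len(alignment[sp]) for sp in species}) != 1:
            raise ValueError("aligned sequences in a family must have equal length")
        for sp in species:
            merged[sp].append(alignment[sp])
    return {sp: "".join(parts) for sp, parts in merged.items()}


species = ["jujube", "peach", "apple"]
families = {
    "fam1": {"jujube": ["Zj1"], "peach": ["Pp1"], "apple": ["Md1"]},
    "fam2": {"jujube": ["Zj2", "Zj3"], "peach": ["Pp2"], "apple": ["Md2"]},  # duplicated in jujube
    "fam3": {"jujube": ["Zj4"], "peach": ["Pp3"]},  # missing in apple
}
print(single_copy_families(families, species))

alignments = [
    {"jujube": "ATG-", "peach": "ATGC", "apple": "ATGA"},
    {"jujube": "CC", "peach": "CT", "apple": "C-"},
]
print(concatenate_alignments(alignments, species))
```

The merged alignment is what a tree-building program would take as input; keeping the species order fixed across families is what makes the concatenation meaningful.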

5.9 Candidate Genes for Agronomic Traits

Analyses of candidate genes based on the ‘Dongzao’ genome identified a series of genes and pathways that play important roles in the advantageous traits of jujube, such as high vitamin C content, high sugar accumulation, and self-shoot-pruning. Sections 5.9.1–5.9.4 describe these findings from the study of the ‘Dongzao’ genome (Liu et al. 2014).

5.9.1 Genes Related to Vitamin C Accumulation in Fruit

An important botanical and horticultural characteristic of jujube is the accumulation of vitamin C (ascorbic acid, AsA) in the fruit. Analyses of genes related to known AsA biosynthesis genes found that most of them belong to the L-galactose pathway, and a few are related to the myo-inositol pathway, another vitamin C pathway. This suggests that vitamin C biosynthesis proceeds mainly via the L-galactose pathway, with minor use of the myo-inositol pathway. Transcriptome sequencing of different fruit ripening stages, including young fruit, white mature fruit, half-red fruit, and full red fruit, showed continuously high expression of two genes encoding key enzymes in the L-galactose biosynthesis pathway: GDP-D-mannose 3,5-epimerase and GDP-L-galactose phosphorylase. Another gene, MDHAR, which encodes monodehydroascorbate reductase, a key enzyme in the AsA recycling pathway, is also expressed at a high level, indicating a contribution from the AsA regeneration system; MDHAR has also undergone an expansion, with more copies in jujube than in other closely related species.

5.9.2 Genes Related to Sugar Accumulation in Fruit

The second feature of jujube is high sugar accumulation in the fruit. Four stages of fruit development, namely young, white mature, half-red, and full red, were investigated using genes from the jujube genome and the RNA-seq data. In the young stage, the main sugars are fructose and glucose, while during fruit maturation, both sucrose and total sugar content increase, and sucrose becomes dominant. Analysis of the genes related to starch and sucrose metabolism found a potential expansion of some families compared with the other sequenced Rosales species. The accumulation of sugar from the white mature to the full red stage is a consequence of the transport of sugar from the phloem to the fruit, and genes related to this pathway are most highly expressed during fruit ripening.

5.9.3 Self-shoot-pruning Trait Related Genes

As described above, jujube has four kinds of shoots: primary, secondary, mother-bearing, and bearing, which form a self-shoot-pruning system (Fig. 5.2). Combining the genome sequence with RNA-seq data showed that the smallest number of differentially expressed genes was between primary and secondary shoots. The functions of these genes are related to secondary metabolism, and some genes, such as those in the lipid, polyamine, and arginine pathways, have much higher expression than in bearing shoots. In mother-bearing shoots, some genes, for example PYR/PYL abscisic acid (ABA) receptors and ABA synthesis-related genes, are very highly expressed compared with the other shoot types, while genes in chlorophyll and porphyrin metabolism are repressed, which correlates with the slow growth and declining photosynthesis characteristic of mother-bearing shoots. In contrast, genes involved in photosynthesis, for example those in carbohydrate metabolism and members of the light-harvesting chlorophyll-binding gene family, are highly expressed only in bearing shoots and not in the other three shoot types, while genes related to cytokinins and brassinosteroids are strongly downregulated. The expression patterns of the four shoot types thus reflect the physiological characteristics of jujube.

Bearing shoots are typically deciduous but can become lignified and persistent under specific conditions. Comparison of gene expression between deciduous and lignified bearing shoots reveals different patterns: genes for plant hormone signal transduction (cyclin D3, small auxin-up RNA, the two-component response regulator ARR-A family, brassinosteroids, and cytokinins) are highly expressed in lignified bearing shoots, whereas genes related to ABA and ethylene (ABA-responsive element binding factor, serine/threonine-protein kinase SRK2, and ethylene-responsive transcription factor 1) are highly expressed in deciduous bearing shoots. Greater levels of jasmonic acid in lignified than in deciduous bearing shoots suggest weaker stress resistance in the latter.
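The comparisons between shoot types described above rest on calling genes up- or downregulated between two conditions. As a minimal sketch of only the fold-change part of such a comparison, the following Python computes a log2 fold change from two normalized expression values and applies a simple threshold. The gene names, expression values, pseudocount, and threshold are all hypothetical illustrations and are not taken from the jujube studies; a real analysis would also use replicates and a statistical test.

```python
import math


def log2_fold_change(expr_a, expr_b, pseudocount=1.0):
    """Return log2((expr_a + pseudocount) / (expr_b + pseudocount)).

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no detectable expression in one condition.
    """
    return math.log2((expr_a + pseudocount) / (expr_b + pseudocount))


def classify(expr_a, expr_b, threshold=1.0):
    """Label a gene 'up', 'down', or 'unchanged' in condition A versus B."""
    lfc = log2_fold_change(expr_a, expr_b)
    if lfc >= threshold:
        return "up"
    if lfc <= -threshold:
        return "down"
    return "unchanged"


# Hypothetical normalized expression values: (bearing shoot, primary shoot).
example = {
    "gene_A": (120.0, 14.0),
    "gene_B": (3.0, 45.0),
    "gene_C": (20.0, 22.0),
}

calls = {gene: classify(a, b) for gene, (a, b) in example.items()}
for gene, call in calls.items():
    print(gene, call)
```

A threshold of 1.0 on the log2 scale corresponds to a twofold change; other cutoffs are equally defensible and depend on the experiment.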

5.9.4 Abiotic/Biotic Stress-Related Genes

Although the response of jujube to abiotic and biotic stress has not been widely studied, genomic research has annotated many genes as ‘response to stress’ in the gene ontology (GO) annotation. Jujube is well adapted to drought-prone environments, and genes in GO terms related to arginine metabolism, which plays key roles in plant stress perception and adaptation, are enriched in jujube. Genes highly expressed under osmotic stress may be involved in the drought and salt tolerance of jujube. The jujube genome contains 13 genes encoding homologs of autophagy-related protein, far more than the six other Rosales species described in Sect. 5.2.3, which have only 1–2 of these genes. Autophagy is related to immunity and may play crucial roles in jujube’s defense system. Other resistance-related genes are R genes, which function in the biotic stress response. Genome scanning identified 849 R genes, which are widely distributed across the jujube genome, especially on chromosome 9, which contains 16% (140) of them. These R genes need to be studied further.

5.9.5 Genes Related to Flower Development

Research by Meng et al. represents the first comprehensive report of the flowering pathways in Chinese jujube (Meng et al. 2020). They identified 44 flowering-related genes based on the ‘Dongzao’ genome. Through a combination of tissue-specific and temporal expression analyses, photoperiod-related genes were found to be necessary for jujube flower bud differentiation. Meanwhile, a high-temperature-related gene in jujube, ZjPIF4, orthologous to the Arabidopsis gene PIF4, is highly expressed in the early stage of the mother-bearing shoot, consistently in both open-field and greenhouse experiments. The study indicates that photoperiod-related and ambient temperature-related pathways regulate different flowering processes, and that ZjPIF4, ZjFT, ZjCO5, and the members of the ZjPHY family are key genes in the regulatory network (Meng et al. 2020).

5.9.6 S-Locus Genes in Jujube

One characteristic of jujube is kernel abortion, which results in few seeds; it is caused by self-incompatibility or cross-incompatibility and has a negative influence on jujube breeding. Gametophytic self-incompatibility (GSI) has been reported to be controlled by the S-locus gene in jujube (Asatryan and Tel-Zur 2013). Genome scanning of the ‘Junzao’ genome found one S-RNase gene and two S-like RNases; the former is expressed only in flowers, while the latter two are expressed in all tissues. Genotyping of 31 resequenced wild and cultivated jujubes found that the S-RNase type is the main haplotype in these samples, with 10 accessions homozygous and 14 heterozygous for S-RNase (Huang et al. 2016).

5.9.7 Gene Family-Related Research Based on the ‘Dongzao’ Genome

Some gene families with functions of interest have been identified and clustered in recent years following the whole-genome sequencing of jujube (Li et al. 2019; Xue et al. 2019; Qing et al. 2019; Liu et al. 2017, 2020c; Zhang and Li 2018; Wang et al. 2020; Chen et al. 2019b; Zhang et al. 2017; Hou et al. 2019; Song et al. 2017). Mitogen-activated protein kinase (MAPK) cascades, one of the universal signal transduction modules, are widespread in plants and are involved in responses to various biotic and abiotic stresses. In the jujube genome, researchers identified ten MAPK, five MAPKK, and 56 MAPKKK genes (Liu et al. 2017, 2020c). ZjMAPK and ZjMAPKK genes are expressed in various tissues and organs at different levels, and the ZjMAPKK5 cascade might regulate reproductive organ development in Chinese jujube (Liu et al. 2017). qRT-PCR results suggest that certain ZjMAPKKKs may be involved in the plant response to phytoplasma infection (Liu et al. 2020c). Phytoplasmas are associated with many plant diseases, and jujube witches’ broom disease (JWB) is a typical phytoplasma disease. Expression analysis of ten ZjMAPKs and four ZjMAPKKs under phytoplasma infection found that three genes, ZjMAPK2, ZjMAPKK2, and ZjMAPKK4, were significantly differentially expressed, whereas the others were not. STRING database analysis and yeast two-hybrid screening showed that ZjMAPK2 and ZjMAPKK2 are involved in the same plant-pathogen interaction pathway and can interact with each other (Liu et al. 2019). Further, ZjWRKY and ZjbHLH genes also responded positively to phytoplasma invasion, which suggests a possible relation to jujube witches’ broom disease (Xue et al. 2019; Li et al. 2019).

The SQUAMOSA promoter binding protein (SBP) family of transcription factors has 16 members in the ‘Dongzao’ genome. Different members exhibit different expression patterns in leaves and flowers, with some expressed more highly in leaves and others more highly in flowers, suggesting different roles for these family members in the two organs (Song et al. 2017). Another transcription factor family, MADS-box, has 52 members in the jujube genome, distributed across all 12 chromosomes and classified into 25 MIKCc-type, 3 MIKC*-type, 16 Mα, 5 Mβ, and 3 Mγ genes. The MIKC-type genes display different temporal and spatial expression patterns, and some are repressed in phyllody compared with normal flower development, providing valuable information for the study of flower development (Zhang et al. 2017). Transcriptome analysis of 119 AP2/ERF family members identified 85 genes expressed in flowers, fruit, and leaves, suggesting a broad regulatory spectrum. In-depth study of differential expression found that ZjERF54/DREB39 and ZjERF25/ZjERF36 positively and negatively regulate jujube fruit ripening, respectively, indicating how this family regulates fruit development (Zhang and Li 2018). Other gene families related to the regulation of fruit ripening were also identified and analyzed, such as the WRKY transcription factors (Chen et al. 2019b) and the bHLH gene family (Li et al. 2019). For the former, analysis of the ‘Junzao’ genome found a total of 39 ZjWRKY members, and comparison of expression between jujube and wild jujube found dynamic differences during fruit development and ripening. For the latter, expression comparison of 92 ZjbHLH members identified some genes related to fruit development, especially in the early developmental stage. In addition to these gene families involved in flower and fruit development and in the response to phytoplasma invasion, there are also gene families related to abiotic and biotic stress responses.
Cyclic nucleotide-gated channels (CNGCs), a gene family with 15 members in the jujube genome, were inferred to respond to environmental changes on the basis of cis-acting regulatory element prediction and expression measured by real-time quantitative PCR. Two genes, ZjCNGC2 and ZjCNGC4, were significantly repressed by cold stress and induced by cold, salt, and alkaline stresses, respectively. Moreover, ZjCNGC2 is regulated by microtubule changes and cAMP treatment and interacts with ZjMAPKK4, suggesting possible roles for cAMP and microtubules in ZjCNGC2-mediated ZjMAPKK4 signal transduction under cold stress (Wang et al. 2020). The MYB transcription factor superfamily regulates a variety of physiological processes in plants. There are 171 genes in the MYB superfamily of jujube; a synteny analysis between jujube and Arabidopsis suggests that jujube MYBs participate in a variety of processes, including the light signaling pathway, flavonoid/phenylpropanoid metabolism, responses to various abiotic stresses (cold, drought, and salt), and auxin signal transduction (Qing et al. 2019).

To conclude, whole-genome sequencing of jujube has opened the way to in-depth investigation of candidate genes related to important biological traits, such as high vitamin C content, changes in sugar and acid content, and the unique self-shoot-pruning system of jujube, as well as common pathways such as flower and fruit development and abiotic/biotic stress response. Although most of the studies above did not provide solid confirmation of specific genes and their functions, multi-omics combinations of genome, transcriptome, and proteome data still provide strong evidence for further studies in jujube functional genomics.

5.10 Resequencing in Jujube Research

Sequencing a species elucidates features of its genome, including genome size, composition, and evolution, as well as key gene families related to important characteristics of the species. However, sequencing a single individual is only a start. Resequencing, that is, sequencing more individuals from the same species, is a step forward in studying the genomics of the species. Resequencing is usually combined with population genetics to resolve population structure, positive selection, genetic mutations, and so on, typically with the goal of finding traits of economic or medical value at the population scale.

Up to now, two articles have reported resequencing of the jujube genome: one is the ‘Junzao’ genome project, which sequenced 31 accessions (Huang et al. 2016); the other was aimed at a genome-wide association study (GWAS) (Guo et al. 2020). In the resequencing based on the ‘Junzao’ genome, 10 wild accessions and 21 jujube cultivars were sequenced to study population structure, domestication, and population differentiation. The average sequencing depth was 27.8×, and the average coverage was 92.5%. Across all wild and cultivated accessions, 5,300,355 SNPs were identified. Phylogenetic, PCA, and structure analyses revealed an evolutionary transition from wild through semi-wild to cultivated jujube, and showed that cultivated jujube can be divided into two groups according to geographical distribution (western and eastern). Positive selection analyses identified a series of genes possibly involved in the domestication of fruit sweetness and acidity. Differential expression analysis of these positively selected genes using RNA-seq data found that one gene encoding a vacuolar acid invertase is highly repressed in the ripe fruit of cultivated jujube compared with wild jujube. On the other hand, genes involved in acid metabolism pathways were expressed at much higher levels in wild jujube than in cultivated jujube. These results have great value for studying the domestication of jujube. To better understand the domestication history and the evolution of species-specific traits in jujube, resequencing of more cultivars from different backgrounds is needed.
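The sequencing depth and coverage figures quoted above are the two standard summary statistics of a resequencing run. The text does not spell out how they were computed, so the sketch below uses the common definitions as an assumption: mean depth is the total number of sequenced bases divided by the genome size, and breadth of coverage is the fraction of reference positions covered by at least a minimum number of reads. The per-base depth values are an invented toy example.

```python
def mean_depth(total_bases_sequenced, genome_size):
    """Mean sequencing depth: sequenced bases per reference base."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases_sequenced / genome_size


def breadth_of_coverage(per_base_depth, min_depth=1):
    """Fraction of reference positions with depth >= min_depth."""
    if not per_base_depth:
        raise ValueError("per_base_depth must not be empty")
    covered = sum(1 for depth in per_base_depth if depth >= min_depth)
    return covered / len(per_base_depth)


# Toy per-base depth over a hypothetical 10 bp reference.
toy_depths = [0, 3, 5, 0, 2, 7, 4, 1, 0, 6]

print(mean_depth(sum(toy_depths), len(toy_depths)))   # mean depth
print(breadth_of_coverage(toy_depths))                # covered at >= 1x
print(breadth_of_coverage(toy_depths, min_depth=4))   # covered at >= 4x
```

The two numbers answer different questions: a high mean depth can coexist with incomplete breadth when reads pile up in some regions and miss others.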
Recent research sequenced a total of 350 wild, semi-wild, and cultivated jujubes using next-generation sequencing technology, with an average sequencing depth of 15× and coverage of 96.33% of the assembled ‘Junzao’ genome (Guo et al. 2020). This yielded 10,355,825 SNPs and 940,385 indels (≤ 5 bp). Using the combined variants from the 350 accessions, GWAS analyses identified a series of candidate genes related to fruit shape, kernel shape, bearing shoot length, the number of leaves per bearing shoot, and the seed-setting rate. A causal gene, ZjFS3, was shown by additional transcriptome, expression, and transgenic validation following the GWAS to be highly associated with fruit shape and kernel shape. Specifically, ZjFS3 affects jujube fruit shape by regulating fruit length but not fruit width. Further, selective sweep analysis revealed genes involved in the presence of prickles on bearing shoots and the postharvest shelf life of fleshy fruits. Although this study broadly investigated the genes that may determine jujube characteristics using population genetics and genome-wide association analysis, the dynamic mechanisms of jujube domestication and its evolutionary history remain unresolved. Further in-depth studies of jujube population genetics are needed.
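To make the idea behind a GWAS concrete, the sketch below tests one SNP against one quantitative trait by regressing the trait on allele dosage (0, 1, or 2 copies of the alternative allele). This is a deliberately simplified illustration, not the method of the study cited above: the text does not describe the statistical model used, and practical GWAS typically adds corrections for population structure and relatedness and repeats the test across millions of variants. The function name and the toy genotype and fruit-length values are hypothetical.

```python
def snp_association(dosages, phenotypes):
    """Least-squares regression of phenotype on allele dosage.

    Returns (slope, r_squared). The slope estimates the change in the
    trait per additional copy of the alternative allele.
    """
    if len(dosages) != len(phenotypes) or len(dosages) < 3:
        raise ValueError("need two equal-length lists with at least 3 samples")
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    if sxx == 0 or syy == 0:
        raise ValueError("SNP or phenotype shows no variation in this sample")
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, r_squared


# Toy data: allele dosage at one SNP and fruit length (cm) in six accessions.
dosages = [0, 0, 1, 1, 2, 2]
fruit_length = [2.0, 2.2, 3.0, 3.2, 4.0, 4.2]

slope, r_squared = snp_association(dosages, fruit_length)
print(f"effect per allele: {slope:.2f} cm, r^2 = {r_squared:.3f}")
```

In a genome-wide scan the same test would be run for every variant, and only SNPs passing a multiple-testing threshold would be followed up with the kind of expression and transgenic validation described for ZjFS3.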

5.11 Transcriptome-Related Research

Transcriptomics, mostly in the form of RNA-seq based on NGS technology, has been applied for more than ten years to study functional changes across different tissues or treatments of a species. RNA-seq analysis can be performed with or without a whole-genome assembly; however, a complete genome sequence makes RNA-seq analyses more precise and reliable. The first transcriptome report in jujube used the now-superseded 454 pyrosequencing to compare expression between early and late stages of fruit ripening. Assembly of the raw sequencing reads generated a total of 97,479 genes. Differential expression analysis between the two fruit development stages identified a series of up- and downregulated genes and found the Smirnoff–Wheeler pathway to be the main pathway for ascorbic acid (AsA) biosynthesis in Chinese jujube (Li et al. 2014). Another transcriptomic study, on the function of abscisic acid (ABA) in fruit ripening, found a positive correlation between the dynamic changes of endogenous ABA and the onset of fruit ripening in Chinese jujube. A battery of genes was found to participate in ABA biosynthesis, metabolism, and signaling, and their expression was determined by qRT-PCR. Furthermore, ABA-associated ripening metabolism and regulatory networks were examined through transcriptome sequencing, hormone crosstalk, and transcription factor activity. This work should advance both jujube fruit storage and the understanding of the non-climacteric fruit ripening process (Zhang et al. 2019c).

In addition to fruit ripening, fruit color has also been investigated because of its commercial importance. Long periods of postharvest storage cause the peel color to fade and become lackluster. Studies of the color change across three ripening periods were performed to elucidate the mechanism of color formation. Transcriptome and metabolome analyses showed that the accumulation of malvidin 3-O-glucoside and delphinidin 3-O-glucoside, the activity of the flavonoid biosynthetic pathway, and three UDP-glucose flavonoid 3-O-glucosyltransferases (UFGTs) are involved in the reddening stage of the jujube peel, the onset of fruit ripening, and the last ripening period, respectively (Zhang et al. 2020). Understanding these mechanisms will shed light on how to retain the red anthocyanins in jujube peel, and thus its appeal, even after long-term storage.

In addition to studies of jujube fruit, other characteristics have been investigated, including recent research on dynamic expression changes under different degrees of freezing. This work used ‘Dongzao’ and ‘Jinsixiaozao’ as parallel samples and treated them with chilling and freezing temperatures. Differential expression analysis between the cultivars and functional enrichment of the differentially expressed (DE) genes were carried out. Some transcription factors, such as WRKY, AP2/ERF, NAC, and bZIP, were found to be differentially upregulated in the two cultivars. The results might be useful for studying how to enhance the freezing tolerance of jujube (Zhou et al. 2020). Studies of abiotic influences have addressed not only cold tolerance but also alkaline stress. In 2017, a report using wild jujube material tested tolerance to alkaline stress, since wild jujube is highly resistant to alkaline, saline, and drought stress (Guo et al. 2017). Analysis of the transcriptome data showed that a batch of genes, including transcription factors, cysteine-like kinases, heat shock proteins, serine/threonine-protein kinases, reactive oxygen species (ROS) scavengers, and calmodulin-like proteins, were highly correlated with alkaline stress.

5.12 Future Goals and Prospects

Although genomics, transcriptomics, metabolomics, and other omics-related studies in jujube have been performed since 2014, there are still many understudied avenues of genomics-based research in jujube, partly because of limited research funding compared with other horticultural species such as apple, orange, and kiwifruit. In the area of omics, future studies for jujube should focus on the following aspects:

(1) A more complete genome assembly as the new jujube reference. The current versions of the jujube genome, both ‘Dongzao’ and ‘Junzao’, were mainly based on next-generation sequencing technology, with a short read length of only 100–150 bp. This kind of assembly often leaves some regions of the genome absent, such as high-repeat regions, high-GC-content regions, or even coding regions located within complex structures. Furthermore, short reads produce a more fragmented genome assembly, with the common indicator N50 being short (10–50 kb). These disadvantages make the annotation of non-coding or coding regions incomplete or error-prone. In recent years, third-generation sequencing technology based on single DNA molecules has been widely applied (Li et al. 2017; Jiao et al. 2017; Sedlazeck et al. 2018; Wang et al. 2016). Currently, two representative long-read sequencing technologies are single-molecule real-time (SMRT) and nanopore sequencing, developed by two commercial companies, Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), respectively. Although these platforms initially had disadvantages, such as a high error rate in raw reads, high cost, and low throughput, they have gradually come into use across a broad range of model and non-model organisms as raw read quality and throughput have improved and costs have fallen. Using either PacBio or nanopore technologies in jujube genome research will provide a more complete, comprehensive, and accurate genome, which will in turn help to further interpret the coding genes, non-coding repetitive elements, evolutionary history, and so on.

(2) Sequencing more representative cultivars. This includes resequencing more cultivars closely related to the sequenced ‘Dongzao’ and ‘Junzao’ genomes, and de novo sequencing of new cultivars, or even of different species in the same genus. Genomics research has now entered a new era of population genomics and pan-genomics. Research reported in the last few years focuses not on one genome but on a batch of genomes, and uses comparative genomics and population genetics to study population evolution and the regions highly associated with significant economic traits, for example in pineapple (Chen et al. 2019a) and grapevine (Liang et al. 2019). Besides, pan-genomics, an integrated genome assembly that simultaneously considers other species within the genus, or even the family, is becoming a new trend in plant genome research, for example the rice pan-genome (Zhao et al. 2018; Wang et al. 2018), the poplar pan-genome (Zhang et al. 2019a), and the newly published soybean pan-genome (Liu et al. 2020b). With the sequencing of multiple cultivars using long reads, SNPs are not the only source of variation; other inputs for population genetics or pan-genome research, including structural variation (SV), have now been used in some species, such as maize (Yang et al. 2019) and tomato (Alonge et al. 2020). Applying these new approaches, including population genomics, pan-genomics, and SV analyses, will open a new door in jujube research for finding the key genes underlying key traits.

(3) Fundamental research in traditional genetics. Genomics and bioinformatics provide data for understanding the composition of whole genomes and their past, present, and even future. DNA constitution will no longer be as mysterious as it was considered in the past; however, new challenges emerge when facing a sea of information. Recognizing genes and verifying their real functions will, on the one hand, draw on the skills of genomics and bioinformatics and, on the other hand, require traditional genetics to carry out transformation and investigate transient expression. This will help to uncover the genetic basis of key economic traits through combined genomics, population genomics, pan-genomics, and so on. Consolidating these technologies in jujube will help to improve breeding efficiency and cultivation innovation.

References

Alonge M, Wang X, Benoit M, Soyk S, Pereira L, Zhang L, Suresh H, Ramakrishnan S, Maumus F, Ciren D, Levy Y, Harel TH, Shalev-Schlosser G, Amsellem Z, Razifard H, Caicedo AL, Tieman DM, Klee H, Kirsche M, Aganezov S, Ranallo-Benavidez TR, Lemmon ZH, Kim J, Robitaille G, Kramer M, Goodwin S, McCombie WR, Hutton S, Van Eck J, Gillis J, Eshed Y, Sedlazeck FJ, van der Knaap E, Schatz MC, Lippman ZB (2020) Major impacts of widespread structural variation on gene expression and crop improvement in tomato. Cell. https://doi.org/10.1016/j.cell.2020.05.021
Asatryan A, Tel-Zur N (2013) Pollen tube growth and self-incompatibility in three Ziziphus species (Rhamnaceae). Flora - Morphol Distrib Funct Ecol Plants 208(5–6):390–399

Bhansali AK (1975) Monographic study of the family Rhamnaceae of India. Ph.D. thesis, Univ Jodhpur, India
Bishop JG, Dean AM, Mitchell-Olds T (2000) Rapid evolution in plant chitinases: molecular targets of selection in plant-pathogen coevolution. Proc Natl Acad Sci USA 97(10):5322–5327. https://doi.org/10.1073/pnas.97.10.5322
Chagne D, Crowhurst RN, Pindo M, Thrimawithana A, Deng C, Ireland H, Fiers M, Dzierzon H, Cestaro A, Fontana P, Bianco L, Lu A, Storey R, Knabel M, Saeed M, Montanari S, Kim YK, Nicolini D, Larger S, Stefani E, Allan AC, Bowen J, Harvey I, Johnston J, Malnoy M, Troggio M, Perchepied L, Sawyer G, Wiedow C, Won K, Viola R, Hellens RP, Brewer L, Bus VG, Schaffer RJ, Gardiner SE, Velasco R (2014) The draft genome sequence of European pear (Pyrus communis L. ‘Bartlett’). PLoS ONE 9(4):e92644. https://doi.org/10.1371/journal.pone.0092644
Chen J, Liu X, Li Z, Qi A, Yao P, Zhou Z, Dong TTX, Tsim KWK (2017) A review of dietary Ziziphus jujuba fruit (jujube): developing health food supplements for brain protection. Evid Based Complement Alternat Med 2017:3019568
Chen LY, VanBuren R, Paris M, Zhou H, Zhang X, Wai CM, Yan H, Chen S, Alonge M, Ramakrishnan S, Liao Z, Liu J, Lin J, Yue J, Fatima M, Lin Z, Zhang J, Huang L, Wang H, Hwa TY, Kao SM, Choi JY, Sharma A, Song J, Wang L, Yim WC, Cushman JC, Paull RE, Matsumoto T, Qin Y, Wu Q, Wang J, Yu Q, Wu J, Zhang S, Boches P, Tung CW, Wang ML, Coppens d’Eeckenbrugge G, Sanewski GM, Purugganan MD, Schatz MC, Bennetzen JL, Lexer C, Ming R (2019a) The bracteatus pineapple genome and domestication of clonally propagated crops. Nat Genet 51(10):1549–1558. https://doi.org/10.1038/s41588-019-0506-8
Chen YL, Chou PK (1982) Flora Reipublicae Popularis Sinicae 48(1) (in Chinese). Science Press, Beijing, pp 133–147
Chen X, Chen R, Wang Y, Wu C, Huang J (2019b) Genome-wide identification of WRKY transcription factors in Chinese jujube (Ziziphus jujuba Mill.) and their involvement in fruit developing, ripening, and abiotic stress. Genes (Basel) 10(5). https://doi.org/10.3390/genes10050360
Daccord N, Celton JM, Linsmith G, Becker C, Choisne N, Schijlen E, van de Geest H, Bianco L, Micheletti D, Velasco R, Di Pierro EA, Gouzy J, Rees DJG, Guerif P, Muranty H, Durel CE, Laurens F, Lespinasse Y, Gaillard S, Aubourg S, Quesneville H, Weigel D, van de Weg E, Troggio M, Bucher E (2017) High-quality de novo assembly of the apple genome and methylome dynamics of early fruit development. Nat Genet 49(7):1099–1106. https://doi.org/10.1038/ng.3886
Don G (1832) A general history of the dichlamydeous plants. London, pp 22–27
Edgar RC, Myers EW (2005) PILER: identification and classification of genomic repeats. Bioinformatics 21(Suppl 1):i152–158. https://doi.org/10.1093/bioinformatics/bti1003
Evreinoff VA (1964) Notes sur le jujubier (Ziziphus jujuba Gaetner) (in French). J D’agriculture Tropicale Et De Botanique Appliquee 11:177–187
Gao QH, Wu CS, Yu JG, Wang M, Ma YJ, Li CL (2012) Textural characteristic, antioxidant activity, sugar, organic acid, and phenolic profiles of 10 promising jujube (Ziziphus jujuba Mill.) selections. J Food Sci 77(11):C1218–C1225
Guo M, Li S, Tian S, Wang B, Zhao X (2017) Transcriptome analysis of genes involved in defense against alkaline stress in roots of wild jujube (Ziziphus acidojujuba). PLoS ONE 12(10):e0185732. https://doi.org/10.1371/journal.pone.0185732
Guo M, Zhang Z, Cheng Y, Li S, Shao P, Yu Q, Wang J, Xu G, Zhang X, Liu J, Hou L, Liu H, Zhao X (2020) Comparative population genomics dissects the genetic basis of seven domestication traits in jujube. Hortic Res 7:89. https://doi.org/10.1038/s41438-020-0312-6
Hao Z, Dai L, Wang J, Wu X, Liu M (2013) Callus induction and plant regeneration from anther walls in Ziziphus jujuba Mill. J Food, Agric Environ 11:405–409
Hou L, Zhang Z, Dou S, Zhang Y, Pang X, Li Y (2019) Genome-wide identification, characterization, and expression analysis of the expansin gene family in Chinese jujube (Ziziphus jujuba Mill.). Planta 249(3):815–829. https://doi.org/10.1007/s00425-018-3020-9
Huang S, Ding J, Deng D, Tang W, Sun H, Liu D, Zhang L, Niu X, Zhang X, Meng M, Yu J, Liu J, Han Y, Shi W, Zhang D, Cao S, Wei Z, Cui Y, Xia Y, Zeng H, Bao K, Lin L, Min Y, Zhang H, Miao M, Tang X, Zhu Y, Sui Y, Li G, Sun H, Yue J, Sun J, Liu F, Zhou L, Lei L, Zheng X, Liu M, Huang L, Song J, Xu C, Li J, Ye K, Zhong S, Lu BR, He G, Xiao F, Wang HL, Zheng H, Fei Z, Liu Y (2013) Draft genome of the kiwifruit Actinidia chinensis. Nat Commun 4:2640. https://doi.org/10.1038/ncomms3640
Huang J, Zhang C, Zhao X, Fei Z, Wan K, Zhang Z, Pang X, Yin X, Bai Y, Sun X, Gao L, Li R, Zhang J, Li X (2016) The jujube genome provides insights into genome evolution and the domestication of sweetness/acidity taste in fruit trees. PLoS Genet 12(12):e1006433. https://doi.org/10.1371/journal.pgen.1006433
International Peach Genome I, Verde I, Abbott AG, Scalabrin S, Jung S, Shu S, Marroni F, Zhebentyayeva T, Dettori MT, Grimwood J, Cattonaro F, Zuccolo A, Rossini L, Jenkins J, Vendramin E, Meisel LA, Decroocq V, Sosinski B, Prochnik S, Mitros T, Policriti A, Cipriani G, Dondini L, Ficklin S, Goodstein DM, Xuan P, Del Fabbro C, Aramini V, Copetti D, Gonzalez S, Horner DS, Falchi R, Lucas S, Mica E, Maldonado J, Lazzari B, Bielenberg D, Pirona R, Miculan M, Barakat A, Testolin R, Stella A, Tartarini S, Tonutti P, Arus P, Orellana A, Wells C, Main D, Vizzotto G, Silva H, Salamini F, Schmutz J, Morgante M, Rokhsar DS (2013) The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution. Nat Genet 45(5):487–494. https://doi.org/10.1038/ng.2586
Jiao Y, Peluso P, Shi J, Liang T, Stitzer MC, Wang B, Campbell MS, Stein JC, Wei X, Chin CS, Guill K, Regulski M, Kumari S, Olson A, Gent J, Schneider KL, Wolfgruber TK, May MR, Springer NM, Antoniou E, McCombie WR, Presting GG, McMullen M, Ross-Ibarra J, Dawe RK, Hastie A, Rank DR, Ware D (2017) Improved maize reference genome with single-molecule technologies. Nature 546(7659):524–527. https://doi.org/10.1038/nature22971
Jinfeng B, Jingjing Y, Shasha B, Pei W, Yuanyuan D (2010) Research status of jujube processing technology at home and abroad. Acad Periodical Farm Prod Process 3:34–36
Johnston MC (1972) Rhamnaceae. In: Milne-Redhead E, Polhill RM (eds) Flora of tropical East Africa. Crown Agents, London
Kalvari I, Nawrocki EP, Argasinska J, Quinones-Olvera N, Finn RD, Bateman A, Petrov AI (2018) Noncoding RNA analysis using the Rfam database. Curr Protoc Bioinf 62(1):e51. https://doi.org/10.1002/cpbi.51
Li D, Du X, Wang Y, Sui C, He J (2004) Callus induction and embryogenesis of Ziziphus jujuba cv. liuyuexian. J Fruit Sci 21:414–418
Li X, Peng D, Zhang C, Ding X (2012) Nutrient quality analysis of yellow river Jujube. Food Sci 37:83–85
Li Y, Xu C, Lin X, Cui B, Wu R, Pang X (2014) De novo assembly and characterization of the fruit transcriptome of Chinese jujube (Ziziphus jujuba Mill.) using 454 pyrosequencing and the development of novel trinucleotide SSR markers. PLoS ONE 9(9):e106438. https://doi.org/10.1371/journal.pone.0106438
Li C, Lin F, An D, Wang W, Huang R (2017) Genome sequencing and assembly by long reads in plants. Genes (Basel) 9(1). https://doi.org/10.3390/genes9010006
Li H, Gao W, Xue C, Zhang Y, Liu Z, Zhang Y, Meng X, Liu M, Zhao J (2019) Genome-wide analysis of the bHLH gene family in Chinese jujube (Ziziphus jujuba Mill.) and wild jujube. BMC Genomics 20(1):568. https://doi.org/10.1186/s12864-019-5936-2
Liang Z, Duan S, Sheng J, Zhu S, Ni X, Shao J, Liu C, Nick P, Du F, Fan P, Mao R, Zhu Y, Deng W, Yang M, Huang H, Liu Y, Ding Y, Liu X, Jiang J, Zhu Y, Li S, He X, Chen W, Dong Y (2019) Whole-genome resequencing of 472 Vitis accessions for grapevine diversity and demographic history analyses. Nat Commun 10(1):1190. https://doi.org/10.1038/s41467-019-09135-8
Liu MJ (2006) Chinese jujube: botany and horticulture, vol 32. Wiley, Horticulture Review
Liu MJ, Cheng CY (1995) A taxonomic study on the genus Ziziphus. Acta Hort 390:161–165
Liu M, Wang J (2019) Fruit scientific research in new China in the past 70 years: Chinese jujube. J Fruit Sci 36:1369–1381

Liu MJ, Zhao J, Cai QL, Liu GC, Wang JR, Zhao ZH, Liu P, Dai L, Yan G, Wang WJ, Li XS, Chen Y, Sun YD, Liu ZG, Lin MJ, Xiao J, Chen YY, Li XF, Wu B, Ma Y, Jian JB, Yang W, Yuan Z, Sun XC, Wei YL, Yu LL, Zhang C, Liao SG, He RJ, Guang XM, Wang Z, Zhang YY, Luo LH (2014) The complex jujube genome provides insights into fruit tree biology. Nat Commun 5:5315. https://doi.org/10.1038/ncomms6315
Liu ZG, Zhao ZH, Xue CL, Wang LX, Wang LL, Feng CF, Zhang LM, Yu Z, Zhao J, Liu MJ (2019) Three main genes in the MAPK cascade involved in the Chinese jujube-phytoplasma interaction. Forests 10(5):392
Liu M, Wang J, Wang L, Liu P, Zhao J, Zhao Z, Yao S, Stănică F, Liu Z, Wan L, Ao C, Dai L, Li X, Zhao X, Jia C (2020a) The historical and current research progress on jujube–a superfruit for the future. Hortic Res 7:119
Liu Y, Du H, Li P, Shen Y, Peng H, Liu S, Zhou GA, Zhang H, Liu Z, Shi M, Huang X, Li Y, Zhang M, Wang Z, Zhu B, Han B, Liang C, Tian Z (2020b) Pangenome of wild and cultivated soybeans. Cell. https://doi.org/10.1016/j.cell.2020.05.023
Liu MJ, Wang JR, Liu P, Zhao J, Zhao ZH, Dai L, XianSong LI, Liu ZG (2015) Historical achievements and frontier advances in the production and research of Chinese jujube (Ziziphus jujuba) in China. Acta Horticulturae Sinica
Liu Z, Zhang L, Xue C, Fang H, Zhao J, Liu M (2017) Genome-wide identification and analysis of MAPK and MAPKK gene family in Chinese jujube (Ziziphus jujuba Mill.). BMC Genomics 18(1):855. https://doi.org/10.1186/s12864-017-4259-4
Liu Z, Wang L, Xue C, Chu Y, Gao W, Zhao Y, Zhao J, Liu M (2020c) Genome-wide identification of MAPKKK genes and their responses to phytoplasma infection in Chinese jujube (Ziziphus jujuba Mill.). BMC Genomics 21(1):142. https://doi.org/10.1186/s12864-020-6548-6
Liu MJ (2004a) Handbook of high quality production of Chinese jujube (in Chinese). The agricultural publ house of China, Beijing, pp 1–22, 275–323
Liu MJ (2004b) Technical manual for quality production in Chinese jujube. China Agriculture Press
Liu M (2008) China jujube development report, 1949–2007. China forestry publishing house
Meng X, Li Y, Yuan Y, Zhang Y, Li H, Zhao J, Liu M (2020) The regulatory pathways of distinct flowering characteristics in Chinese jujube. Hortic Res 7:123
Pareek OP (2001) International centre for underutilised crops. Southampton, UK, pp 15–194
Price AL, Jones NC, Pevzner PA (2005) De novo identification of repeat families in large genomes. Bioinformatics 21(Suppl 1):i351–358. https://doi.org/10.1093/bioinformatics/bti1018
Qi Y, Liu M (2004) Embryo abortion and young embryo culture of Chinese jujube. Acta Horticulturae Sinica 31:78–80

5

Chinese Jujube: Crop Background and Genome Sequencing

Qing J, Dawei W, Jun Z, Yulan X, Bingqi S, Fan Z (2019) Genome-wide characterization and expression analyses of the MYB superfamily genes during developmental stages in Chinese jujube. Peer J 7:e6353. https://doi.org/10.7717/peerj.6353 Qu ZZ, Wang YH, Zhou JZ, Peng SQ, Li SL (1987) Discussion on the origin of Chinese jujube (in Chinese). J Hebei Agr Univ 10 (Symposium of the study on Chinese jujube) pp 1–9 Rendle AB (1952) The classification of flowering plants. Cambridge University Press Richardson JE, Fay MF, Cronk QCB, Chase MW (2000) A revision of the tribal classification of Rhamnaceae. #N/A 55 (2):311 Sedlazeck FJ, Lee H, Darby CA, Schatz MC (2018) Piercing the dark matter: bioinformatics of long-range sequencing and mapping. Nat Rev Genet. https://doi. org/10.1038/s41576-018-0003-4 Song S, Zhou H, Sheng S, Cao M, Li Y, Pang X (2017) Genome-wide organization and expression profiling of the SBP-box gene family in Chinese jujube (Ziziphus jujuba Mill.). Int J Mol Sci 18(8). https://doi.org/10. 3390/ijms18081734 Su C, Liu X, Yan C, X B, H L (2019) Nutrient composition analysis of jujube from different habitats. 
Deciduous Fruit Trees 51:8–10 Velasco R, Zharkikh A, Affourtit J, Dhingra A, Cestaro A, Kalyanaraman A, Fontana P, Bhatnagar SK, Troggio M, Pruss D, Salvi S, Pindo M, Baldi P, Castelletti S, Cavaiuolo M, Coppola G, Costa F, Cova V, Dal Ri A, Goremykin V, Komjanc M, Longhi S, Magnago P, Malacarne G, Malnoy M, Micheletti D, Moretto M, Perazzolli M, Si-Ammour A, Vezzulli S, Zini E, Eldredge G, Fitzgerald LM, Gutin N, Lanchbury J, Macalma T, Mitchell JT, Reid J, Wardell B, Kodira C, Chen Z, Desany B, Niazi F, Palmer M, Koepke T, Jiwan D, Schaeffer S, Krishnan V, Wu C, Chu VT, King ST, Vick J, Tao Q, Mraz A, Stormo A, Stormo K, Bogden R, Ederle D, Stella A, Vecchietti A, Kater MM, Masiero S, Lasserre P, Lespinasse Y, Allan AC, Bus V, Chagne D, Crowhurst RN, Gleave AP, Lavezzo E, Fawcett JA, Proost S, Rouze P, Sterck L, Toppo S, Lazzari B, Hellens RP, Durel CE, Gutin A, Bumgarner RE, Gardiner SE, Skolnick M, Egholm M, Van de Peer Y, Salamini F, Viola R (2010) The genome of the domesticated apple (Malus x domestica Borkh.). Nat Genet 42(10):833– 839. https://doi.org/10.1038/ng.654 Wang J, Liu M, Dai L (2002) Advances in tissue culture of Chinese jujube. J Fruit Ence 19:336–339 Wang B, Tseng E, Regulski M, Clark TA, Hon T, Jiao Y, Lu Z, Olson A, Stein JC, Ware D (2016) Unveiling the complexity of the maize transcriptome by singlemolecule long-read sequencing. Nat Commun 7:11708. https://doi.org/10.1038/ncomms11708 Wang W, Mauleon R, Hu Z, Chebotarov D, Tai S, Wu Z, Li M, Zheng T, Fuentes RR, Zhang F, Mansueto L, Copetti D, Sanciangco M, Palis KC, Xu J, Sun C, Fu B, Zhang H, Gao Y, Zhao X, Shen F, Cui X, Yu H,


Li Z, Chen M, Detras J, Zhou Y, Zhang X, Zhao Y, Kudrna D, Wang C, Li R, Jia B, Lu J, He X, Dong Z, Xu J, Li Y, Wang M, Shi J, Li J, Zhang D, Lee S, Hu W, Poliakov A, Dubchak I, Ulat VJ, Borja FN, Mendoza JR, Ali J, Li J, Gao Q, Niu Y, Yue Z, Naredo MEB, Talag J, Wang X, Li J, Fang X, Yin Y, Glaszmann JC, Zhang J, Li J, Hamilton RS, Wing RA, Ruan J, Zhang G, Wei C, Alexandrov N, McNally KL, Li Z, Leung H (2018) Genomic variation in 3010 diverse accessions of Asian cultivated rice. Nature. https://doi.org/10.1038/s41586-018-0063-9 Wang L, Li M, Liu Z, Dai L, Zhang M, Wang L, Zhao J, Liu M (2020) Genome-wide identification of CNGC genes in Chinese jujube (Ziziphus jujuba Mill.) and ZjCNGC2 mediated signalling cascades in response to cold stress. BMC Genomics 21(1):191. https://doi.org/ 10.1186/s12864-020-6601-5 Wu H, Ma T, Kang M, Ai F, Zhang J, Dong G, Liu J (2019) A high-quality Actinidia chinensis (kiwifruit) genome. Hortic Res 6:117. https://doi.org/10.1038/ s41438-019-0202-y Xu Q, Chen LL, Ruan X, Chen D, Zhu A, Chen C, Bertrand D, Jiao WB, Hao BH, Lyon MP, Chen J, Gao S, Xing F, Lan H, Chang JW, Ge X, Lei Y, Hu Q, Miao Y, Wang L, Xiao S, Biswas MK, Zeng W, Guo F, Cao H, Yang X, Xu XW, Cheng YJ, Xu J, Liu JH, Luo OJ, Tang Z, Guo WW, Kuang H, Zhang HY, Roose ML, Nagarajan N, Deng XX, Ruan Y (2013) The draft genome of sweet orange (Citrus sinensis). Nat Genet 45(1):59–66. https://doi. org/10.1038/ng.2472 Xu Z, Wang H (2007) LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res 35 (Web Server issue):W265–268. https://doi.org/10.1093/nar/gkm286 Xue C, Li H, Liu Z, Wang L, Zhao Y, Wei X, Fang H, Liu M, Zhao J (2019) Genome-wide analysis of the WRKY gene family and their positive responses to phytoplasma invasion in Chinese jujube. BMC Genomics 20(1):464. 
https://doi.org/10.1186/s12864-019-5789-8 Yang N, Liu J, Gao Q, Gui S, Chen L, Yang L, Huang J, Deng T, Luo J, He L, Wang Y, Xu P, Peng Y, Shi Z, Lan L, Ma Z, Yang X, Zhang Q, Bai M, Li S, Li W, Liu L, Jackson D, Yan J (2019) Genome assembly of a tropical maize inbred line provides insights into structural variation and crop improvement. Nat Genet 51(6):1052–1059. https://doi.org/10.1038/s41588-019-0427-6 Zhang Z, Li X (2018) Genome-wide identification of AP2/ERF superfamily genes and their expression during fruit ripening of Chinese jujube. Sci Rep 8(1):15612. https://doi.org/10.1038/s41598-018-33744-w Zhang L, Zhao J, Feng C, Liu M, Wang J, Hu Y (2017) Genome-wide identification, characterization of the MADS-box gene family in Chinese jujube and their involvement in flower development. Sci Rep 7(1):1025. https://doi.org/10.1038/s41598-017-01159-8

Zhang B, Zhu W, Diao S, Wu X, Lu J, Ding C, Su X (2019a) The poplar pangenome provides insights into the evolutionary history of the genus. Commun Biol 2:215. https://doi.org/10.1038/s42003-019-0474-7 Zhang L, Hu J, Han X, Li J, Gao Y, Richards CM, Zhang C, Tian Y, Liu G, Gul H, Wang D, Tian Y, Yang C, Meng M, Yuan G, Kang G, Wu Y, Wang K, Zhang H, Wang D, Cong P (2019b) A high-quality apple genome assembly reveals the association of a retrotransposon and red fruit colour. Nat Commun 10(1):1494. https://doi.org/10.1038/s41467-019-09518-x Zhang Z, Kang C, Zhang S, Li X (2019c) Transcript analyses reveal a comprehensive role of abscisic acid in modulating fruit ripening in Chinese jujube. BMC Plant Biol 19(1):189. https://doi.org/10.1186/s12870-019-1802-2 Zhang Q, Wang L, Liu Z, Zhao Z, Zhao J, Wang Z, Zhou G, Liu P, Liu MJ (2020) Transcriptome and metabolome profiling unveil the mechanisms of Ziziphus jujuba Mill. peel coloration. Food Chem 312:125903. https://doi.org/10.1016/j.foodchem.2019.125903 Zhao Z, Liu M, Tu P (2008) Characterization of water soluble polysaccharides from organs of Chinese Jujube (Ziziphus jujuba Mill. cv. Dongzao). Eur Food Res Technol 226:985–989

Zhao AL, Deng-Ke LI, Wang YK, Sui CL, Xue-Mei DU, Ren HY, Liang Q (2012a) Study on the content of polysaccharides in different cultivars, growing periods and organs in Chinese jujube. J Shanxi Agric Sci 993(993):219–224 Zhao Z, Liu M, Tu P (2012b) A bioactive polysaccharide isolated from the fruits of Chinese jujube. Asian J Chem 24:813–815 Zhao Q, Feng Q, Lu H, Li Y, Wang A, Tian Q, Zhan Q, Lu Y, Zhang L, Huang T, Wang Y, Fan D, Zhao Y, Wang Z, Zhou C, Chen J, Zhu C, Li W, Weng Q, Xu Q, Wang ZX, Wei X, Han B, Huang X (2018) Pan-genome analysis highlights the extent of genomic variation in cultivated and wild rice. Nat Genet 50(2):278–284. https://doi.org/10.1038/s41588-018-0041-z Zhao ZH, Liu MJ, Liu XY, Liu P, Tu PF Content variations of water soluble polysaccharides during fruit development in Chinese jujube (Ziziphus jujuba Mill.). Acta Hortic 840(840):529–532 Zhou H, He Y, Zhu Y, Li M, Song S, Bo W, Li Y, Pang X (2020) Comparative transcriptome profiling reveals cold stress responsiveness in two contrasting Chinese jujube cultivars. BMC Plant Biol 20(1):240. https://doi.org/10.1186/s12870-020-02450-z

6 The Longan (Dimocarpus longan) Genome

Yan Chen, Xiaoping Xu, Xiaohui Chen, Shuting Zhang, Yukun Chen, Zhongxiong Lai, and Yuling Lin

Abstract

Longan (Dimocarpus longan) is an economically important evergreen fruit tree of tropical and subtropical regions. It is rich in nutrients and has high medicinal value, and it is widely planted in China, Thailand, Vietnam, Australia, and other tropical countries. With the development of sequencing technology, the genomes of three longan cultivars have been sequenced. In recent studies, whole-genome resequencing based on the high-quality longan genome has been used to analyze the population genomics of high-quality longan genotypes. Here, we review longan genome-based transcriptome sequencing, single-cell RNA sequencing, microRNAs, long noncoding RNAs, circular RNAs, DNA methylation sequencing, proteomics, and genetic transformation based on the longan ‘HHZ’ somatic embryogenesis (SE) system. We focus on the roles of the plant hormone signal transduction, flavonoid biosynthesis, and fatty acid synthesis pathways during early SE in longan and on the important role of noncoding RNA in the SE process. In addition, differential proteins involved in SE and genes related to induced DNA hypomethylation have been identified. Finally, we discuss the genetic transformation system of longan and further analyze the molecular mechanism of early SE in longan.

Y. Chen · X. Xu · X. Chen · S. Zhang · Y. Chen · Z. Lai (corresponding author) · Y. Lin (corresponding author)
Institute of Horticultural Biotechnology, Fujian Agriculture and Forestry University, Fuzhou, Fujian 350002, China
e-mail: [email protected]
Y. Lin e-mail: [email protected]

6.1 Background

Longan (Dimocarpus longan Lour.) belongs to the Sapindaceae family and is an important evergreen fruit tree in tropical and subtropical regions (Fig. 6.1). It is a popular fruit in Asia and, as an economically important crop, is widely planted in China, Thailand, Vietnam, Australia, and other tropical countries. Longan is native to southern China and has a long history of cultivation. China has the largest planting area and the highest yield of longan and holds abundant germplasm resources. According to statistics, China’s longan cultivation area reached 376,000 hm² in 2016, accounting for 54.8% of the world’s cultivation area, and output was 1.914 million tons, accounting for 54.7% of world longan production. Output increased to 2.03 million tons in 2018. Owing to the particularity of its resources and geographical location, longan in China is mainly distributed in Guangxi, Guangdong, Fujian, and Hainan provinces.

Longan is rich in nutrients and has high medicinal value. The fruit contains nutrients such as vitamins, carbohydrates, and amino acids. Longan also has a wealth of pharmacological uses: it contains large amounts of polyphenols, which have effects on inflammation, cancer, and cardiovascular disease (Zhang et al. 2012).

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 M. A. Chapman (ed.), Underutilised Crop Genomes, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-00848-1_6

Fig. 6.1 The morphology of perennial longan and different tissues. R (root), S (stem), L (leaf), B (vegetative bud), LFB (late stage of floral bud), I (inflorescence), FB (flower bud), MF (male flower), FF (female flower), YF (young fruit), RF (ripe fruit), PC (pericarp), P (pulp), and S (seed)

Important planting resources

Longan grows well in tropical and subtropical regions; however, the degree to which longan varieties are cultivated and promoted varies among countries and regions. The important cultivars in China include ‘Shixia’, ‘Chuliang’, ‘Caofuzhong’, and ‘Wuyuan’. ‘Shixia’ is native to Pingzhou, Nanhai, Guangdong, and has a long history of cultivation. The core is black, the flesh is white and crisp, and the flavor is sweet. Its total soluble solids content is 23–26%, and it is a good variety for fresh consumption. ‘Chuliang’ is a local high-quality variety from Gaozhou, Guangdong province, characterized by high fruit quality and high yield. The average fruit weight is 12.0–16.5 g. ‘Chuliang’ is an excellent variety for both fresh consumption and processing; the processed pulp is yellow, clean, translucent, and thick. ‘Chuliang’ clones are characterized by early maturity and fruiting, high and stable yield, and genetic stability (Huang et al. 2001). ‘Caofuzhong’ fruits are round or slightly oblate, and the peel is brown. The flesh is waxy white to light yellowish waxy white, translucent, crisp, and tender, with a high edible rate (the proportion of the fruit that is edible). It is a mid- to late-maturing variety. ‘Caofuzhong’ has a long harvest period: fruit can be left on the tree to delay harvest without affecting fresh-eating quality. ‘Wuyuan’ originated in Rong County, Guangxi, and is the main cultivated variety in Guangxi. The tree is vigorous, tall, and cold-resistant. The fruit is large and oblate-spherical; the average single-fruit weight is 15–20 g, and the largest fruits reach 31 g, making it the largest-fruited longan variety in China. The peel is yellowish brown, and the flesh is waxy white, translucent, and crisp. However, the sweetness is slightly lower; the edible rate is 74.3%, and the soluble solids content is 18.5%.

Thailand’s longan production occupies an important position in the world, and longan is one of the fruits with the best economic returns in Thailand. The main varieties in Thailand are ‘Daw’, ‘Chompo’, ‘BiewKhiew’, ‘Haew’, ‘Dang’, and others. At present, the main cultivar is ‘Daw’, which accounts for 73% of the total planted area. ‘Daw’ is an early-maturing variety, ‘Chompo’ and ‘Dang’ are medium-maturing varieties, and ‘BiewKhiew’ and ‘Haew’ are late-maturing varieties. ‘Daw’ is relatively resistant, with a low rate of infection by witches’ broom disease, whereas ‘BiewKhiew’ has a relatively high infection rate. The aril of ‘Daw’ is sweet and tough, and its harvest time is short. ‘Chompo’ fruits are medium in size and have good flavor but require heavy fertilization and good management to ensure high yields. ‘BiewKhiew’ has a thicker skin, which helps with storage. ‘Haew’ also has a thick pericarp, together with small seeds and a high edible rate. The seeds of ‘Dang’ result in a rather low edible rate.

6.2 Genome Sequencing

6.2.1 Strategy

The genome is the sum of all the genetic material of an organism. Sanger sequencing is built on the dideoxy chain-termination method, first described in 1977, and automated Sanger platforms came into wide use in the early 1990s. Further upgrades of the Sanger platform brought a significant increase in throughput, shorter sequencing time, and lower cost, which made it possible to sequence larger eukaryotic genomes (Adams et al. 2000). Compared with first-generation Sanger sequencing, next-generation sequencing technologies reduce the cost of sequencing and increase the sequencing rate, which together allow higher sequencing coverage. Next-generation sequencing is a high-throughput technology represented by the Roche 454, Applied Biosystems SOLiD, and Illumina Solexa systems, and it has gradually become the leading technology for large-scale whole-genome sequencing. More recently, third-generation single-molecule sequencing has greatly reduced the necessary time and cost and has achieved unprecedented resolution.

Longan is an important tropical/subtropical evergreen fruit tree, but improvement by traditional breeding is hampered by its long juvenile stage, high genetic heterozygosity, and plant size (Lai et al. 2000). To identify varieties and raise the level of longan breeding, it is necessary to understand the genetic background of longan. At present, three research groups have sequenced the longan genome, and Fujian Agriculture and Forestry University was the first to publish next-generation longan genome data. Lin et al. (2017b) completed the first whole-genome sequence of longan using the ‘Honghezi’ cultivar (2n = 2x = 30) and established the first whole-genome database for a plant of the Sapindaceae family. The genome was sequenced by the whole-genome shotgun method on the Illumina HiSeq 2000 system in paired-end mode, in which one sequencing lane can produce more than 250 million reads. However, assembling a genome from short reads alone has shortcomings, such as small contigs, many gaps, and fragmentation of the whole-genome sequence, and these remained major obstacles to using this longan genome in genetics and comparative genomics research. Third-generation sequencing of the longan genome therefore provides a more detailed reference genome map. The three research groups have sequenced three longan varieties: Fujian Agriculture and Forestry University and Guangdong Academy of Agricultural Sciences sequenced ‘Honghezi’ and ‘Jidanben’ (‘JDB’), respectively, using PacBio Sequel and Illumina HiSeq, while Fujian Academy of Agricultural Sciences sequenced ‘Shixia’. However, the third-generation sequencing results have not yet been published.

6.2.2 Results-Genome Statistics

In the next-generation genome data of longan, a total of 316.84 Gb of raw data was generated. After strict filtering and correction steps, 121.68 Gb of high-quality sequence data was obtained, representing 273.44× coverage of the longan genome (Lin et al. 2017b). Longan is generally considered to be highly heterozygous. The genome was estimated to be 445 Mb, with a heterozygosity rate of 0.88%, according to the K-mer frequency method. The contig and scaffold N50 values of the assembly were 26.04 kb (longest contig, 173.29 kb) and 566.63 kb (longest scaffold, 6942.32 kb), respectively. Among the 1244 BUSCO orthologous groups searched in the longan assembly, 900 (72.23%) were ‘complete single-copy’, 288 (23.15%) were ‘complete duplicated’, 16 (1.29%) were ‘fragmented’, and 40 (3.22%) were ‘missing’, indicating the quality and completeness of the draft assembly. A total of 34,934 protein-coding genes and 261.88 Mb (52.87%, 445 Mb) of repetitive sequences were identified in the longan genome.

In the third-generation sequencing, Fujian Agriculture and Forestry University used PacBio Sequel to sequence the genome of ‘Honghezi’ longan. The initial draft genome sequence was 486 Mb in length, with a contig N50 of 5.71 Mb and a total of 327 contigs. Hi-C technology was then used to extend the assembly to the chromosome level. By ordering the contigs, 15 superscaffolds (comprising 300 contigs) were constructed, with a total length of 446 Mb, a contig N50 of 4.4 Mb, and a superscaffold N50 of 27.7 Mb, indicating excellent Hi-C-assisted assembly results. In the sequencing of ‘JDB’ longan by Guangdong Academy of Agricultural Sciences, the assembly size was 455.5 Mb, covering 95.90% of the estimated genome size (474.98 Mb); 98.7% of the sequences were anchored on 15 chromosomes, and the contig N50 was 12.1 Mb. Fujian Academy of Agricultural Sciences used ‘Shixia’ as the main material for third-generation sequencing and obtained a 458.9 Mb genome with a contig N50 of 1.154 Mb. Overall, the genomes obtained by the three research groups were similar in size, and genome quality was significantly improved compared with the next-generation data.
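The headline statistics in this section follow from simple ratios, and a short sketch may make the definitions concrete. The Python below is illustrative only and is not the authors' pipeline: the function names are hypothetical, the toy contig lengths are invented, and coverage is taken as high-quality bases divided by the K-mer genome size estimate, which is an assumption about how the reported depth was derived.

```python
def coverage_depth(total_bases_gb, genome_size_mb):
    """Average sequencing depth: sequenced bases divided by genome size."""
    return total_bases_gb * 1000.0 / genome_size_mb


def assembled_fraction(assembly_mb, estimated_genome_mb):
    """Percentage of the estimated genome size covered by the assembly."""
    return 100.0 * assembly_mb / estimated_genome_mb


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of the total."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths supplied")
    half = sum(ordered) / 2.0
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length


# Figures quoted in the text: 121.68 Gb of clean data, 445 Mb genome estimate.
print(round(coverage_depth(121.68, 445.0), 2))       # 273.44
# 'JDB' assembly: 455.5 Mb against an estimated 474.98 Mb genome.
print(round(assembled_fraction(455.5, 474.98), 2))   # 95.9
# Hypothetical contig lengths, used only to show how N50 is read off.
print(n50([80, 70, 50, 40, 30, 20, 10]))             # 70
```

N50 is sensitive to the longest sequences only, which is why contig N50 can fall (5.71 Mb to 4.4 Mb for ‘Honghezi’) when contigs are filtered or broken during Hi-C scaffolding even as the superscaffold N50 rises.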

6.2.3 Resequencing

High heterozygosity is a typical characteristic of longan varieties, which leads to low efficiency in germplasm resource management and utilization. The population genomics of high-quality longan genotypes has been analyzed by whole-genome resequencing. Fujian Agriculture and Forestry University resequenced the genomes of 13 representative commercial cultivars with early-maturing, middle-maturing, late-maturing, multiple-flowering, aborted-seeded, and disease-resistant characteristics, producing a total of 45.77 Gb of raw data (Table 6.1). After mapping the clean reads to the reference genome, 357,737 SNPs and 23,225 small insertions/deletions (indels) were identified, and the overall polymorphism density was 0.05–0.12 SNPs and 0.004–0.007 indels per 10 kb. Notably, there are major variations among the ‘Fuyan’ (‘FY’), ‘Miaoqiao’ (‘MQ’), and ‘Sijimi’ (‘SJM’) germplasms, whereas variation within cultivated longan germplasm, especially ‘LDB’, is relatively low. A neighbor-joining tree and principal component analysis (PCA) clustered the 13 longan germplasms into two groups. Previous molecular-marker studies showed that ‘FY’, which originated in Quanzhou, China, clusters with other Chinese longan germplasm. At the whole-genome level, however, ‘FY’ showed more genetic differences than the other germplasms tested. PCA shows that samples from China tend to cluster together, while the ‘SJM’ and ‘MQ’ germplasms from Southeast Asia and Thailand differ significantly from the Chinese longan germplasm tested in this study.

Table 6.1 Origin of longan cultivars (Lin et al. 2017a, b)

| Species | Accession | Abbreviation | Characteristics | Original locality |
|---|---|---|---|---|
| D. longan | Honghezi | HHZ | Middle-maturing | Seedling plant, Fuzhou, Fujian, China |
| D. longan | Dongbi | DB | Early-maturing | Kaiyuan Temple, Quanzhou, Fujian, China |
| D. longan | Jiuyuewu | JYW | Late-maturing | Putian, Fujian, China |
| D. longan | Lidongben | LDB | Special late-maturing | Putian, Fujian, China |
| D. longan | Wulongling | WLL | Yield | Putian, Fujian, China |
| D. longan | Shuinanyihao | SN1H | Large fruit type, disease-resistant variety | Putian, Fujian, China |
| D. longan | Youtanben | YTB | Late-maturing | Putian, Fujian, China |
| D. longan | Shieryue | SEY | Special late-maturing | Zhangpu, Fujian, China |
| D. longan | Jiaohe/Baihe | JHLY | Aborted-seeded | Quanzhou, Fujian, China |
| D. longan | Fuyan | FY | Disease-resistant variety | Quanzhou, Fujian, China |
| D. longan | Shixia | SX | Early-maturing, good quality | Guangdong, China |
| D. longan | Miaoqiao | MQ | Late-maturing, yield | Thailand |
| D. longan | Sijimi | SJM | Multiple flowering | South-East Asia |

Guangdong Academy of Agricultural Sciences (Wang et al. 2022) performed genome resequencing analysis on 87 germplasm accessions from five southern Chinese provinces (Guangdong, Fujian, Guangxi, Sichuan, and Hainan) and three other countries (Thailand, Vietnam, and Australia). The results showed that gene flow occurred from Hainan wild germplasm to Guangdong, then from Sichuan to Fujian, and finally from China to Thailand.

The longan genome was compared by orthologous gene clustering with the genomes of eight other selected plants: Arabidopsis thaliana, orange, papaya, grape, banana, peach, kiwi, and apple.
The number of gene families in Dimocarpus longan Lour. (14,961) was similar to that in Citrus sinensis (15,000) and Prunus persica (15,326), higher than in Musa acuminata (12,519), Arabidopsis thaliana (13,406), Vitis vinifera (13,570), Actinidia chinensis (13,702), and Ananas comosus (13,763), and lower than in Malus domestica (17,740). The genome of longan was also compared with those of citrus, banana, peach, and Arabidopsis thaliana: a core of 9215 genes was shared by these five species, and 1207 genes were specific to longan. Longan has more specific genes than citrus and Arabidopsis but fewer than M. acuminata and peach. Unpublished data from Fujian Academy of Agricultural Sciences show that the gene families related to phenylpropanoid biosynthesis and UDP-glycosyltransferase were significantly expanded in the longan genome. Biogeographic factors were the main factors affecting the genetic diversity of longan, and there was obvious population admixture and introgression among varieties from different geographical sources. In addition, no recent polyploidy was found to have occurred after the ancient hexaploidy (γ-WGD) event.
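The ideas of a shared core and of longan-specific genes in the five-species comparison above reduce to set intersection and set difference once genes have been grouped into clusters. The sketch below is a minimal illustration under stated assumptions: the cluster identifiers (F1, F2, ...) and species memberships are invented, the function name is hypothetical, and the source does not say which clustering software or parameters produced the reported counts.

```python
def core_and_specific(families_by_species, focal):
    """Return (core, focal_specific) sets of cluster identifiers.

    core: clusters present in every species.
    focal_specific: clusters present in the focal species and in no other.
    """
    sets = {sp: set(fams) for sp, fams in families_by_species.items()}
    core = set.intersection(*sets.values())
    others = set()
    for species, fams in sets.items():
        if species != focal:
            others |= fams
    return core, sets[focal] - others


# Hypothetical toy clusters; real analyses involve thousands per species.
toy = {
    "longan": {"F1", "F2", "F3", "F7"},
    "citrus": {"F1", "F2", "F4"},
    "banana": {"F1", "F2", "F5"},
    "peach": {"F1", "F2", "F3"},
    "arabidopsis": {"F1", "F2", "F6"},
}

core, longan_only = core_and_specific(toy, "longan")
print(sorted(core))         # ['F1', 'F2']
print(sorted(longan_only))  # ['F7']
```

A cluster shared by only some species, such as F3 here, counts as neither core nor specific, which is why the core and species-specific totals do not sum to the number of families in any one genome.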

6.3 RNA Sequencing

6.3.1 Whole Transcriptome Sequencing

Currently, longan transcriptome sequencing covers nine organs of ‘SJM’, four somatic embryogenesis-related stages of ‘Honghezi’, perpetual-flowering and seasonal-flowering genotypes, and red-pericarp longan together with ‘SX’. Based on the longan genome dataset, transcriptomes of nine organs (root, stem, mature leaf, flower bud, flower, young fruit, pericarp, pulp, and seed) of the ‘SJM’ cultivar were sequenced using the Illumina HiSeq 2000 system. A total of 490,502,822 clean reads were obtained, and the number of transcripts/genes ranged from 19,322 (pulp) to 23,118 (flower bud). The single nucleotide polymorphisms (SNPs) and insertions and deletions (indels) detected from the transcriptome sequences revealed that the expressed transcripts were divergent between young fruit and leaf. Intron retention was the predominant type of alternative splicing event in all nine tissues. The differentially expressed genes (DEGs) among the nine organs were significantly enriched in the Gene Ontology (GO) terms ‘metabolic pathway’ and ‘biosynthesis of secondary metabolites’, and they may be critical contributors to the accumulation of polyphenolic compounds in longan fruit (Lin et al. 2017b).

Polyphenols are the main category of secondary metabolites in longan and are potential antioxidants. To further evaluate changes between the primary and secondary metabolism of polyphenols during the vegetative and reproductive growth stages of longan, the copy numbers of 26 genes in the shikimate, phenylpropanoid, and flavonoid biosynthetic pathways were compared with those of the corresponding pathways in Arabidopsis, orange, peach, grape, poplar, and eucalyptus. Copy number varied among the plants analyzed. Compared with the corresponding families in grape, which is considered the most ancient of these lineages, the DHS, SDH, F3’H, ANR, and UFGT gene families were significantly expanded in longan, orange, peach, poplar, and eucalyptus.

Longan contains a large number of differentially expressed plant–pathogen resistance genes. Two resistance gene families were searched for in the longan genome: genes encoding NBS-LRR proteins and genes encoding LRR-RLKs. A total of 594 NBS-LRR and 338 LRR-RLK genes were identified, accounting for 1.51% and 0.86% of the annotated protein-coding genes of longan, respectively (Lin et al. 2017b). The numbers of NBS-LRR and LRR-RLK genes in the longan genome were larger than those in orange (509 and 325, respectively) (Xu et al. 2013), grape (341 and 234) (Jaillon et al. 2007), peach (425 and 268) (Verde et al. 2013), Prunus mume (411 and 253) (Zhang et al. 2012), and papaya (60 and 134) (Ming et al. 2008), but smaller than those in apple (1035 and 477) (Velasco et al. 2010).

To explore the molecular basis of early somatic embryogenesis (SE) in longan, transcriptomes from four SE-related stages, namely non-embryogenic callus (NEC), embryogenic callus (EC), incomplete compact pro-embryogenic cultures (ICpEC), and globular embryos (GE), were sequenced using Illumina HiSeq (Fig. 6.2).
To explore the molecular basis of longan early somatic embryogenesis (SE), transcriptomes from four SE-related stages were sequenced, including non-embryogenic callus (NEC), embryogenic callus (EC), incomplete compact pro-embryogenic cultures (ICpEC), and globular embryos (GE) using Illumina HiSeq (Fig. 6.2).


A total of 22,743, 19,745, 21,144, and 21,102 expressed transcripts were detected in the NEC, EC, ICpEC, and GE stages, respectively. The DEGs among these four stages were enriched in plant hormone signaling, flavonoid biosynthesis, fatty acid biosynthesis, and plant–pathogen interaction pathways. SE-related marker genes such as WOX2, WOX9, LEC1, LEC1-like, PDF1.3, GH3.6, AGL80, PIN1, BBM, and ABI3 were preferentially expressed (Chen et al. 2020a, b). In addition, genes related to vegetative growth and to reproductive processes such as flowering, gametophytogenesis, and fertility were expressed in the EC (Lin and Lai 2013). These results serve as a valuable resource for further studies of embryogenesis in woody plants.

Comparative transcriptome analysis was performed to study the molecular regulatory mechanism underlying the perpetual flowering (PF) trait of longan. A total of 27,266 known genes and 1913 new genes were detected in two longan cultivars, ‘SJM’ and ‘SX’. The numbers of DEGs identified during floral induction in ‘SJM’ and ‘SX’ were 6150 and 6202, respectively. Transcriptional patterns at the early stage of floral transition related to hormones, circadian rhythm, and sugar metabolism differed markedly between the two cultivars. Almost all DEGs associated with flowering were enriched in photoperiod and circadian clock pathways. This study provides new insight into the molecular mechanisms responsible for the differences between PF and seasonal flowering (SF) longan genotypes (Jue et al. 2018).

Longan fruits are rich in nutrient components such as polysaccharides, alkaloids, phenolics, and flavonoids (Lin et al. 2017a, b). To understand the coloring mechanism of red pericarp (RP) longan, a variety whose pericarp is rich in bioactive compounds with excellent antioxidant activity, RP longan and ‘SX’ longan were used in metabolome and transcriptome analyses to reveal the metabolites and molecular mechanism involved.
Among five types of anthocyanins identified in the longan pericarp, three cyanidin derivatives were specially identified in RP longan. Transcriptome analysis revealed that the structural genes, such

6

The Longan (Dimocarpus longan) Genome

93

Fig. 6.2 The microscopic observation of different embryogenic cultures in the process of longan somatic embryogenesis; friable-embryogenic callus (EC); incomplete compact pro-embryogenic cultures (ICpEC);

globular embryos (GE); heart-shaped embryos (HE); torpedo-shaped embryos (TE); cotyledonary embryos (CE); A1–A2: tissue culture virus-free plants (provided by Lai Zhongxiong and Lin Yuling research group)

as F3H and F3’H, and the regulatory genes, including MYB, bHLH, NAC, and MADS, which were enriched in anthocyanin biosynthesis pathway, were significantly up-regulated in RP longan. The results provide a new insight for the researching of bioactive compounds such as anthocyanin in longan fruits (Yi et al. 2021). To research the effect of blue light to the metabolites in longan EC, the transcriptome was performed using longan EC under different light treatments (white light, blue light, and dark as control). The results showed that more DEGs were identified in dark versus blue metabolic pathways than in dark versus white metabolic pathways, indicating that blue light plays more

important roles in the synthesis of metabolites in longan EC. Furthermore, HY5, PIF4, and MYC2 are the key regulators in the blue light signaling gene regulatory networks associated with longan functional metabolites (Li et al. 2019a, b). DNA methylation play important roles in regulating gene expression in plant growth and developmental process. 5-Azacytidine (5-AzaC), an inhibitor of DNA methylation, can regulate the change of DNA methylation levels. To investigate the impact of DNA methylation changes in longan EC, RNA-seq was performed using longan EC under 5-AzaC treatment. Compared with non-5-AzaC treatment, the DEGs were significantly enriched in butanoate

metabolism, C5-branched dibasic acid metabolism, sulfur metabolism, selenocompound metabolism, and plant hormone signal transduction pathways under 5-AzaC treatment. The results indicated that 5-AzaC treatment positively affects gene expression in longan EC (Chen et al. 2020a, b).

6.3.2 Single-Cell RNA Sequencing

The gene expression profile of a cell determines its protein components. All cells in the same plant contain the same set of genes, but different genes are expressed at distinct levels in different cell types, leading to differences in composition between cells and tissues. Gene expression patterns therefore determine a cell’s molecular functions. Traditionally, transcriptome sequencing used mixed (bulk) samples to detect gene expression, which limits the study of molecular mechanisms within single cells. Single-cell RNA sequencing (scRNA-seq) technology has developed rapidly in recent years. It enables us to explore gene expression patterns at the single-cell level, to study continuous developmental processes, and to resolve the heterogeneity of cell populations. The development of scRNA-seq has depended on droplet technology, which works as follows: first, a single cell is encapsulated within an oil droplet, the cell is lysed, and the transcripts are reverse transcribed on barcoded beads; second, the transcript library is produced and sequenced, transcripts from the same cell are identified by the bead-derived barcode, and individual transcripts are identified by unique molecular identifiers (UMIs) (Prakadan et al. 2017). To date, scRNA-seq technology has been used mostly in animal embryos (Wagner et al. 2018). It has also been applied to plant tissues such as Arabidopsis and rice roots and stems, maize ears, Populus xylem, and Arabidopsis seeds (Nelms and Walbot 2019; Zhang et al. 2019a, b; Liu et al. 2020; Tian et al. 2020; Li et al. 2021; Picard et al. 2021). Based on this approach, Fujian Agriculture and Forestry University has

Y. Chen et al.

performed scRNA-seq of longan EC to study cell types and heterogeneity, reconstruct a continuous differentiation trajectory, and identify the regulatory network during SE. Using our established protoplast isolation method, 28,727 cells were captured, more than the numbers obtained through protoplast isolation of Arabidopsis roots, indicating that the method is appropriate for protoplast isolation from longan EC (Fig. 6.3). By analyzing the up-regulated marker genes in each of the 28,727 cells, 12 cell clusters were identified. Analysis of the top 20 up-regulated genes in each cluster confirmed four distinct cell groups: proliferating cells (PC), meristematic cells (MC), vascular cells (VC), and epidermal cells. Clusters 6 and 8 were defined as PC, and cell cycle-related genes such as G1/S-specific cyclin E (cluster 6), cyclin-dependent kinase B2;1 (CDKB2;1), and CDKB1;2 (cluster 8) were enriched in these cells. The genes differentially expressed in clusters 6 and 8 relative to the other clusters were enriched in the ‘ribosome’ Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, suggesting that more proteins were synthesized in the PCs. Cluster 9 was defined as MC, and root meristem growth factor 3 (RGF3) was enriched in this cluster. The genes differentially expressed between cluster 9 and the other clusters were also enriched in the ‘ribosome’ KEGG pathway, similar to the result for clusters 6 and 8. Cluster 7 was defined as VC, and the cell wall-related and lignin formation-related genes laccase 7 (LAC7), xyloglucan endotransglucosylase/hydrolase 23 (XTH23), Germin-like protein 6-3 (GLP6-3), and pectin methylesterase 17 (PME17) were up-regulated in this cluster. Genes differentially expressed between cluster 7 and the other clusters were enriched in the ‘phenylpropanoid biosynthesis’ pathway, and most of the enriched genes were located specifically in the lignin biosynthesis pathway. These results indicate that the VCs were, as expected, involved in secondary wall thickening.
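The barcode and UMI bookkeeping described at the start of this section can be illustrated with a small sketch. It is not the pipeline used for the longan data: the input format (tuples of cell barcode, UMI, and gene), the toy sequences, and the function name `count_umis` are hypothetical, and real pipelines additionally correct sequencing errors in barcodes and UMIs.

```python
from collections import defaultdict


def count_umis(reads):
    """Collapse reads into per-cell, per-gene UMI counts.

    `reads` is an iterable of (cell_barcode, umi, gene) tuples
    (a hypothetical input format chosen for illustration).
    Reads sharing the same barcode, gene, and UMI are treated as
    copies of one original transcript and are counted once.
    """
    seen = defaultdict(set)  # (cell_barcode, gene) -> distinct UMIs
    for barcode, umi, gene in reads:
        seen[(barcode, gene)].add(umi)

    counts = defaultdict(dict)  # cell_barcode -> {gene: UMI count}
    for (barcode, gene), umis in seen.items():
        counts[barcode][gene] = len(umis)
    return dict(counts)


# Toy reads with made-up barcodes and UMIs.
example_reads = [
    ("AAAC", "TTG", "LAC7"),
    ("AAAC", "TTG", "LAC7"),  # same cell, gene, and UMI: a duplicate
    ("AAAC", "CCA", "LAC7"),  # same cell and gene, new UMI: a second transcript
    ("GGTT", "TTG", "RGF3"),  # different barcode: a different cell
]
umi_counts = count_umis(example_reads)
```

The resulting cell-by-gene count table is the kind of input from which clusters and marker genes are subsequently derived.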

Fig. 6.3 Morphological observation of longan EC protoplast

Cluster 11 was defined as epidermal cells; genes related to epidermal cell formation and to cuticular wax and cuticle synthesis, such as Lipid transfer protein G16 (LTPG16), GDSL esterase/lipase (GDSL), and Cytochrome P450 family 86 (CYP86), were highly expressed in this cluster. The up-regulated genes in cluster 11 were enriched in the ‘fatty acid biosynthesis’ KEGG pathway, likely because fatty acids are involved in cuticular wax and cuticle synthesis in epidermal cells. Analysis of the fatty acid composition revealed that C16 and C18 fatty acids were the main fatty acids in the early somatic embryos of longan; these are among the main fatty acids involved in cuticular wax and cuticle synthesis in the epidermis. Clusters 2, 5, 7, 8, 9, 10, and 11 possessed SE-related marker genes, such as Peroxidase 52 (PRX52), PRX53, GLP6, AGP20, LTP4, maternal effect embryo arrest 66 (MEE66), and LTPG16, indicating that they were the SE-related cell clusters. Combined with the bulk RNA-seq results, the expression of the top 20 up-regulated genes in clusters 8 and 9 increased from EC to GE, whereas the up-regulated genes in clusters 2, 5, 7, and 11 were down-regulated from EC to GE. This indicates that clusters 8 and 9 are the early SE-related cell groups, and clusters 2, 5, 7, and 11 are the later SE-related cell groups. To study the continuous cell differentiation trajectory of longan EC, pseudo-time analysis was performed on clusters 8, 9, 7, and 11, corresponding to PC, MC, VC, and epidermal cells. Cells in clusters 8 and 9 were grouped at the beginning of the trajectory, cells in cluster 7 at an intermediate stage, and cells in cluster 11 at the later stage, and various regulators such as cell cycle genes, ribosomal protein-coding genes, vascular tissue formation genes, and epidermal-related genes were enriched along the continuous differentiation process. In addition, a highly interconnected transcriptional regulatory network, including ANAC058, MYB9, ethylene-responsive factor 114 (ERF114), WRKY75, and MYB4, regulated the differentiation process. Collectively, the scRNA-seq of longan EC provides new insight into somatic embryo division and differentiation. Furthermore, it is the first use of scRNA-seq in plant somatic embryos, and it should provide further insight into molecular and cell biology research, not only in longan somatic and zygotic embryos but also in other plants.
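The reasoning used above to separate early from later SE-related clusters (marker gene expression rising or falling from EC to GE in bulk RNA-seq) can be written as a short rule. The sketch below is only an illustration under stated assumptions: averaging the marker genes is an assumed summary, the expression values are invented, and `classify_se_cluster` is a hypothetical helper, not a function from any published pipeline.

```python
from statistics import mean


def classify_se_cluster(marker_expression):
    """Label a cluster from the bulk expression trend of its marker genes.

    `marker_expression` maps gene name -> (EC value, GE value).
    Assumption for illustration: the trend is summarized by the mean
    over all marker genes of the cluster.
    """
    ec_mean = mean(ec for ec, _ in marker_expression.values())
    ge_mean = mean(ge for _, ge in marker_expression.values())
    if ge_mean > ec_mean:
        return "early SE-related"  # markers increase from EC to GE
    if ge_mean < ec_mean:
        return "later SE-related"  # markers decrease from EC to GE
    return "no clear trend"


# Invented expression values, used only to exercise the rule.
clusters = {
    "cluster 8": {"CDKB2;1": (10.0, 25.0), "CDKB1;2": (8.0, 12.0)},
    "cluster 7": {"LAC7": (30.0, 12.0), "XTH23": (20.0, 18.0)},
}
labels = {name: classify_se_cluster(markers) for name, markers in clusters.items()}
```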

6.3.3 microRNAs

MicroRNAs (miRNAs) are a class of endogenous regulators, 21 to 24 nt in length, that are involved in plant growth and development and have been investigated in molecular, physiological, and metabolic studies across the plant kingdom. miRNAs also play an important role in transcriptional regulation by directing the methylation of chromosomal sites of their target genes, affecting processes such as cell division, organ polarity, and plant totipotency. In plants, primary miRNAs (pri-miRNAs) are transcribed by RNA polymerase II (Pol II) and processed by the DICER-LIKE 1 (DCL1), HYPONASTIC LEAVES 1 (HYL1), and SERRATE complex into a predominantly 21-nt miRNA/miRNA* duplex. This processing is completed in the nucleus; one strand of the duplex, the mature miRNA, is then translocated to the cytoplasm and binds to AGO1 to form the RISC complex, which cleaves target transcripts, represses their translation, or mediates methylation of target genes. During plant development, a class of 24-nt long miRNAs (lmiRNAs) is processed by DCL3 and binds to AGO4 to form a complex, and these lmiRNAs are involved in plant DNA methylation (Wu et al. 2010). Since the first miRNA was discovered, miRNA sequencing technology has been widely used to detect and quantify the overall distribution of miRNAs in plant tissues and under different abiotic stresses. At present, chip-based assays and high-throughput sequencing are widely used for the detection and identification of miRNAs, and a large number of miRNAs have been discovered. Because plant miRNAs pair almost perfectly with their target genes, workflows built on the Solexa system (Illumina) can combine high-throughput sequencing, bioinformatic analysis, and RACE validation. This technology was first used in Arabidopsis thaliana and has since been widely applied to various plants such as Physcomitrella patens, rice, Vitis vinifera, Glycine max, Zea mays, Cucumis sativus L., and longan. Although Solexa sequencing offers advantages such as high efficiency and high precision for miRNA identification, it is limited by sequencing depth and by the existence of special miRNA modifications. To address these limitations, alternative sequencing technologies have appeared, such as 454 pyrosequencing, SOLiD sequencing, cDNA cloning and sequencing, which does not depend on genome information, and the Illumina HiSeq 2000, 2500, 3000, and 4000 platforms, which have been widely used to identify miRNAs from plant genomes. With the release of the longan genome in 2017, Xu et al. (2020) revealed the genome-wide dynamic changes of miRNAs and their targets during longan early SE using the Illumina HiSeq 2000 platform. More recently, with the progress of third-generation sequencing assisted by Hi-C (high-throughput chromosome conformation capture) technology, a total of 120 pre-miRNA and 118 miRNA sequences were obtained. However, pri-miRNA expression still could not be determined because of limitations of the assembly technology and the structure of miRNAs. miRNAs have been widely studied in longan SE as important regulators participating in many transcriptional pathways. For example, dlo-miR398, dlo-miR159, and dlo-miR156 respond to oxidative stress (Lin 2011); dlo-miR393 participates in auxin synthesis (Lai et al. 2016); lignin biosynthesis is regulated by dlo-miR397, dlo-miR408, and dlo-miR166 (Xu et al. 2020); and dlo-miR171, dlo-miR390, dlo-miR396, and dlo-miR394 respond to light and functional metabolites (Li et al. 2018). Moreover, a regulatory network relating long noncoding RNAs (lncRNAs) and miRNAs has been constructed for longan early SE (Chen et al. 2018). In 2013, internal controls for miRNA quantification were first screened and a detection system for miRNA expression patterns during longan SE was established, demonstrating that U6 snRNA and 5S rRNA were the more suitable reference genes for miRNA qPCR (Lin and Lai 2013). Subsequently, miRNA transcriptome analysis showed that the dlo-miR156 family and dlo-miR166 were specifically expressed in the early SE stage, dlo-miR160, dlo-miR159, dlo-miR390, and dlo-miR398b

regulated heart-shaped and torpedo-shaped embryo formation, and dlo-miR167, dlo-miR168, dlo-miR397, and dlo-miR398 were related to cotyledonary embryo development (Lin et al. 2013). Auxin signal transduction plays an important role during SE differentiation, and miRNAs are likely to be involved in this process by targeting genes and transcription factors (TFs). The miR390-TAS3-ARF3/-4 autoregulatory network has been verified in longan SE (Lin et al. 2015b), and the eTM-miR160-ARF10-16-17 regulatory pathway was confirmed to participate in longan SE by modified RLM-RACE PCR and qRT-PCR (Lin et al. 2015a). This revealed that miR160 and miR390 play an essential role in the totipotency of longan SE. Furthermore, Xu et al. (2020) constructed the regulatory network of dlo-miR166a-3p, dlo-miR397a, and dlo-miR408-3p and their targets in longan early SE and revealed that some miRNAs regulate lignin metabolism during longan early SE (Fig. 6.4). In addition, the structural characteristics and molecular evolution of the dlo-miR166 (Lin et al. 2017a), dlo-miR159 (Chen et al. 2017), dlo-miR171 (Zeng et al. 2017; Liyao et al. 2018), dlo-miR167 (Wang et al. 2021), dlo-miR396 (Li et al. 2019a, b), dlo-miR397 (Xu et al. 2019), and dlo-miR403 (Su et al. 2019) families have also been analyzed during longan early SE, suggesting how these miRNAs could be used for longan variety improvement by regulating their target genes during SE. Previous work also focused on pre-miRNA cloning, the molecular characterization of miRNA genes, promoter region analysis, and transcript abundance in different SE stages and tissues (Lin et al. 2021). In total, 80 pre-miRNAs were predicted, 13 longan pri-miRNAs were verified by RLM-RACE and SMART-RACE, and the multiple transcription start sites (TSSs) of pri-miR156, pri-miR168, pri-miR170, pri-miR319, and pri-miR2118 were polymorphic. The promoters of 19 MIR genes (including MIR156, MIR319, MIR160, MIR394, and MIR395) contained light-responsive, abiotic stress-responsive, and hormone-responsive cis-elements. These findings indicate that miRNA expression is likely regulated by different abiotic stresses and that miRNAs participate in regulating longan early SE.

Fig. 6.4 Schematic of miRNA-mediated regulation of target genes involved in early SE morphogenesis of longan. a Some miRNAs may participate in the lignin pathway and influence the activity of RNA Pol II. b Some miRNAs may participate in brassinosteroid biosynthesis, the spliceosome, and tyrosine metabolism. c Some DE miRNAs participate in different early SE stages of longan. From Xu et al. (2020) under a Creative Commons license

Despite the diverse and important previous work on miRNAs in longan SE, in-depth functional studies of pri-miRNAs are still lacking. Recent research has shown that miRNA-encoded peptides (miPEPs), encoded by small open reading frames (sORFs), are also involved in regulating the expression of miRNAs and their target genes in longan SE. For example, pri-miR319 contains three candidate sORFs that might encode three peptides: miPEP319-1, miPEP319-2, and miPEP319-3. Our research found that miPEP319-2 has biological activity, promoting the expression of miR319 family members and four pre-miR319s and thereby influencing the development of longan SEs (Zeng 2017). Because SE is closely tied to longan production, miPEPs, as small-molecule agents, could potentially be applied by spraying to improve longan yield. However, this work was only the first step in studying miPEP biological function in longan. Subsequently, the biological function of miPEP319-2 during longan SE was demonstrated using in planta transformation, transfection with synthetic miPEP319, and Agrobacterium-mediated transient expression in tobacco (Su et al. 2021). This is the first report of miPEP functional validation in woody plant SE. In recent years, some evidence has shown that transfection with artificial miRNAs is effective in longan SE development. Pre-miR166 responds to 2,4-D, abscisic acid, and ethylene; however, the transcripts pre-miR166 S338 and pre-miR166 S78 show different expression patterns, with the former more sensitive to hormones than pre-miR166 S78 (Zhang et al. 2020). miR166a.2-agomir, miR166a.2-antagomir, and miPEP166 S338 promote or inhibit the expression of miR166a.2 and ATHB15, although these changes are not consistently linearly synchronized. This demonstrated that pre-miRNA transcription levels and miRNA levels are not simply linearly correlated. In addition, Li et al. (2018) found that miR393, miR394, and miR395 responded to blue light and promoted flavonoid accumulation. miRNA sequencing has revealed that dlo-miR171f, dlo-miR396, and dlo-miR390 target DlDELLA, DlEIN3, DlBRI1, and DlEBF1/2 in response to gibberellin, ethylene, and brassinosteroids (Li et al. 2019a, b). This research indicated that miRNAs play essential roles in the accumulation of functional metabolites during longan SE, and it may be possible to direct the production of longan medicinal substances such as flavonoids and carotenoids by regulating miRNAs.

6.3.4 Long Noncoding RNAs

Long noncoding RNAs (lncRNAs) are longer than 200 nt; most of them lack an open reading frame (ORF), and they do not necessarily have a cap structure or a polyA tail. The sequence conservation and expression levels of lncRNAs are generally lower than those of protein-coding mRNAs. lncRNAs play important regulatory roles at the transcriptional level (Pruneski et al. 2011), post-transcriptional level (Tripathi et al. 2010), translational level (Michael and Nicholas 2013), and epigenetic level (Magistri et al. 2012; Johnsson et al. 2014) and are also involved in protein modification, transcriptional interference, alternative splicing, and the regulation of DNA methylation. With the development of high-throughput sequencing technologies, thousands of lncRNAs have been identified in a variety of plant species, including rice (Oryza sativa), tomato (Solanum lycopersicum), cucumber (Cucumis sativus), banana (Musa itinerans), tea (Camellia sinensis), and orange (Citrus sinensis). Chen et al. (2018) used the Illumina HiSeq sequencing platform to isolate and identify lncRNAs from early longan SE. A total of 7,643 lncRNAs were identified in the genome, of which 6,005 were expressed. The number of lncRNAs specifically expressed in the GE stage is about 2.4 times that in the EC and ICpEC stages, indicating that more lncRNAs are required to participate in morphological

maintenance during the GE stage. During the early stage of longan SE, 1,404 lncRNAs received a family annotation; most of these belonged to miRNA, sRNA, and snoRNA families, and a few were annotated to lncRNA families. It can be speculated that the majority of lncRNAs in the early development of longan SE act together with other noncoding RNAs in a regulatory role. KEGG analysis showed that most of the differentially expressed target genes (mRNAs) of lncRNAs were enriched in the ‘plant–pathogen interaction’ and ‘plant hormone signaling’ pathways during the early development of longan SE. Twenty-four significantly differentially expressed lncRNAs (EC vs. GE), as well as five lncRNAs and their target genes related to ‘auxin response factor’ in the KEGG enrichment, showed the same expression trends in real-time quantitative PCR (qPCR) and RNA-seq during the early development of SE (Chen et al. 2018). It is speculated that lncRNAs involved in the auxin response have a regulatory role during the early development of SE. Analysis of the positional relationship between lncRNAs and their target genes showed that lncRNAs act on their target genes in multiple ways during the early development of longan SE, further indicating that the regulatory networks are complex. In the lncRNA-miRNA-mRNA relationship prediction, seven lncRNAs were predicted to be potential miRNA precursors, 1,765 lncRNAs were predicted to be targets of miRNAs, and 40 lncRNAs were predicted to act as endogenous target mimics (eTMs) that regulate miRNAs. In addition, the expression of Dlo-miR159a.1 and Dlo-miR398a did not correlate with the expression of their related lncRNAs and mRNAs, whereas Dlo-miR172a had a greater impact. It is speculated that Dlo-miR172a can target and inhibit the expression of the lncRNA LTCONS-00042843 during the early stage of longan SE, whereas LTCONS-00042843 promotes the expression of its target gene Dlo018542.1 (Fig. 6.5). These findings provide important insights into lncRNAs and lay the foundation for future functional analysis of lncRNAs during the early development of longan SE.
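The definition given at the start of this section (transcripts longer than 200 nt, mostly without an ORF) can be turned into a simple screening rule. The following is a minimal sketch, not the identification pipeline of Chen et al. (2018): the ORF cutoff of 300 nt, the forward-strand-only scan, and the function names are assumptions made for illustration, and real pipelines add coding-potential scoring and protein-family filters.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_nt(seq):
    """Length in nt of the longest ATG-to-stop ORF on the forward strand.

    The stop codon is included in the length. Only the three forward
    reading frames are scanned (a simplification for this sketch).
    """
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open an ORF at the first ATG in this frame
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # close the ORF and look for the next ATG
    return best


def is_lncrna_candidate(seq, min_len=200, max_orf_nt=300):
    """Keep transcripts longer than `min_len` nt whose longest ORF is short.

    `min_len` follows the >200 nt definition in the text; `max_orf_nt`
    is an assumed cutoff, not a value taken from the source.
    """
    return len(seq) > min_len and longest_orf_nt(seq) <= max_orf_nt


# Synthetic sequences used only to exercise the rule.
short_rna = "A" * 150                          # too short
noncoding_rna = "AC" * 150                     # 300 nt, no ATG at all
coding_rna = "ATG" + "GCT" * 120 + "TAA"       # 366-nt ORF
```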

6.3.5 Circular RNAs

Circular RNAs (circRNAs) are a class of ncRNA molecules that are joined head to tail. Growing evidence has demonstrated that circRNAs play key roles in plant developmental processes and environmental responses (Matsui et al. 2013; Ye et al. 2015). Owing to the development of high-throughput sequencing technologies and bioinformatic analytical methods, thousands of circRNAs have been identified in a variety of plant species, including Arabidopsis (Ye et al. 2015; Zhang et al. 2019a, b), rice (Ye et al. 2015; Fan et al. 2020), wheat (Triticum aestivum) (Wang et al. 2017a, b), soybean (Glycine max) (Zhao et al. 2017), and maize (Zea mays) (Zhang et al. 2019a, b; Chen et al. 2018), and also in horticultural plants such as tomato (Tan et al. 2017), kiwifruit (Actinidia chinensis) (Wang et al. 2017a, b), pear (Pyrus betulifolia) (Wang et al. 2018), tea (Tong et al. 2018), grape (Vitis vinifera) (Gao et al. 2019), and cucumber (Zhu et al. 2019). However, the regulation and function of circRNAs in plant SE remain largely unknown. To date, the role of circRNAs in embryos has been reported only in animals. circRNAs are potentially involved in chromosome organization, cell cycle regulation, and DNA repair in mouse early embryos (Fan et al. 2015). Similarly, circRNA host genes in human embryos have been shown to be involved in organelle organization, chromosome organization, cell cycle processes, and the regulation of metabolic processes, which suggests important roles for circRNAs in mammalian embryonic development (Dang et al. 2016). The potential function and regulatory mechanisms of circRNAs during longan early SE were investigated by circRNA sequencing. A total of 5,029 circRNAs were identified across the three stages of longan early SE. Among these, intronic circRNAs constituted the highest proportion (more than 50%), followed by intergenic circRNAs and exonic circRNAs. These circRNAs exhibited tissue-specific expression patterns during longan early SE. Next, KEGG analyses were performed to analyze

Fig. 6.5 Regulatory network of lncRNAs during the early development of longan somatic embryos. a The overexpression vector of LTCONS-00042843 and LTCONS-00006334 was transferred into longan protoplasts, and lncRNA was positively regulated by its target gene. b In the ‘phytohormone signal transduction’ pathway, there are 6 lncRNAs that have positive regulation of AUX/IAA, ABF, ARF, and ERF-related genes. c In the early development of longan somatic embryos, the numbers of lncRNAs specifically expressed in the EC, ICpEC, and GE stages were 159, 153, and 375, respectively. 24 specifically expressed lncRNAs were verified by qRT-PCR, of which 8 lncRNAs were highly expressed in the EC stage, 4 were highly expressed in the ICpEC stage, and 12 were highly expressed in the GE stage

differentially expressed (DE) circRNA host genes among the three comparison groups (EC vs. GE, EC vs. ICpEC, and GE vs. ICpEC). This revealed that DE circRNA host genes involved in the ‘non-homologous end-joining’ (NHEJ) and ‘butanoate metabolism’ pathways were highly enriched in longan early SE. In addition, a competing endogenous RNA (ceRNA) regulatory network of lncRNAs, circRNAs, and mRNAs acting in concert with miRNAs was constructed. A total of 678 circRNAs were found to be potential ceRNAs that bind miRNAs, and these miRNAs had target genes that were significantly enriched in ‘phenylalanine

metabolism’, ‘mitogen-activated protein kinase (MAPK) signaling pathway–plant’, ‘nitrogen metabolism’, ‘glycine, serine, and threonine metabolism’, ‘galactose metabolism’, and ‘phenylpropanoid biosynthesis’ pathways (unpublished). These results suggested that circRNAs potentially contribute to longan early SE through regulation of genes in these pathways, which will be valuable for understanding the function of circRNAs in plant SE.
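KEGG pathway enrichment is reported repeatedly in this chapter without the underlying test being stated. A common choice for such over-representation analysis is the one-sided hypergeometric test, sketched below with the standard library only. This is an assumption about a typical approach, not a description of the authors' analysis; the gene counts are invented and `hypergeom_enrichment_p` is a hypothetical helper name.

```python
from math import comb


def hypergeom_enrichment_p(background, in_pathway, selected, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    background: number of annotated genes in the background set
    in_pathway: background genes that belong to the pathway
    selected:   genes in the list of interest (e.g. DE host genes)
    overlap:    genes of interest that belong to the pathway
    """
    upper = min(in_pathway, selected)
    total = comb(background, selected)
    tail = sum(
        comb(in_pathway, i) * comb(background - in_pathway, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Invented counts: 1000 background genes, 40 in the pathway,
# 50 genes of interest, 6 of which fall in the pathway.
p_value = hypergeom_enrichment_p(1000, 40, 50, 6)
```

In practice the p-values for all pathways tested would also be corrected for multiple testing before a pathway is called significantly enriched.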

6.4 DNA Methylation Sequencing

DNA methylation, one of the most important forms of epigenetic modification, is catalyzed by DNA methyltransferases, which use the methyl group provided by S-adenosyl-l-methionine (SAM) to methylate the fifth carbon of cytosine (Zemach et al. 2010). In contrast, active DNA demethylation is initiated by DNA glycosylase/lyase enzymes through a base excision repair pathway (Ortega-Galisteo et al. 2008; Penterman et al. 2007; Gehring et al. 2006). Unlike in mammals, DNA methylation in plants occurs in three sequence contexts: CG, CHG, and CHH (in which H = A, T, or C) (Zhang et al. 2006; Lister et al. 2008). CG methylation is maintained by DNA METHYLTRANSFERASE1 (MET1) (Chan et al. 2005), and CHG methylation by the plant-specific DNA methyltransferases CHROMOMETHYLASE3 (CMT3) and CMT2 (Stroud et al. 2014; Lindroth et al. 2001). CHH methylation can be maintained by CMT2 and DOMAIN REARRANGED METHYLASE 2 (DRM2). De novo methylation in all three contexts is established through the RNA-directed DNA methylation (RdDM) pathway, which is directed by 24-nt small interfering RNAs (siRNAs) and involves DRM2 and DRM1 (Law and Jacobsen 2010). Active demethylation is mediated by four major DNA glycosylases: REPRESSOR OF SILENCING 1 (ROS1) (Gong et al. 2002), DEMETER (DME) (Gehring et al. 2006), DEMETER-LIKE protein 2 (DML2), and DML3 (Penterman et al. 2007; Ortega-Galisteo et al. 2008). At present, the common sequencing methods for DNA methylation are as follows: whole

genome bisulfite sequencing (WGBS) (Ehsan et al. 2013), oxidative bisulfite sequencing (Booth et al. 2013), reduced representation bisulfite sequencing (Hascher et al. 2014), single-cell whole genome methylation sequencing (Zhu et al. 2018), bisulfite amplicon sequencing (Shorey-Kendrick et al. 2017), and hydroxymethylated DNA immunoprecipitation sequencing (Wu et al. 2011). In addition, enzyme-linked immunosorbent assay, high-performance liquid chromatography, and methyl-sensitive amplification polymorphism methods are also used to detect DNA methylation. Growing evidence has demonstrated that DNA methylation is associated with many biological processes, including plant growth and development and responses to environmental stimuli (Zhang et al. 2018). DNA methylation has also been reported to play an important regulatory role during SE development (Lee et al. 2016; Birnbaum and Roudier 2017; Lee and Seo 2018). In the longan early SE system, single-base resolution maps of DNA methylation for EC, ICpEC, and GE were generated by WGBS. A global decrease in DNA methylation was found during longan early SE, and this loss of DNA methylation was probably associated with the expression levels of DNA methyltransferase genes and DlROS1. Moreover, the application of a DNA methylation inhibitor, 5-azadC, promoted the formation of GE and enhanced the embryogenic capability of longan SE, which supports the idea that DNA demethylation is a necessary process during longan early SE (Chen et al. 2020a, b). Consistent with these results, 5-azadC has been shown to promote the formation of SE in Pinus pinaster (Klimaszewska et al. 2009), Brassica napus (Solís et al. 2015), Hordeum vulgare (Solís et al. 2015), and Theobroma cacao (Quinga et al. 2017). These results suggest a critical role of DNA methylation during SE development. DNA hypomethylation, as an epigenetic event, may play an essential role in activating the expression of specific transcription factor, hormonal, or developmental genes and may be responsible for totipotency acquisition and embryogenesis initiation (Pasternak and Dudits 2019). It was found that the application of the DNA methylation inhibitor zebularine in non-embryogenic calluses

increased the number of embryos, activated the transcription of hormone-related genes (IAA14, CKX6, LBD1/3, LOX1, and CRF4.1), and promoted SE (Li et al. 2019a, b). It was also reported that high expression of SE-related genes, including SOMATIC EMBRYOGENESIS RECEPTOR-LIKE KINASE 1 (SERK1), LEAFY COTYLEDON 2 (LEC2), and WUSCHEL (WUS), in Boesenbergia rotunda SE was linked with decreased levels of DNA methylation (Karim et al. 2018). During longan early SE, genes involved in the zeatin biosynthesis, fatty acid biosynthesis, circadian rhythm, and mitophagy pathways were active, which suggests that SE-induced hypomethylation is associated with the activation of genes in these pathways, which are important for longan early SE development (Chen et al. 2020a, b).
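The three plant methylation contexts defined earlier in this section (CG, CHG, and CHH, with H = A, T, or C) depend only on the two bases downstream of each cytosine, so they can be assigned directly from sequence. The sketch below shows this classification for the forward strand only; the function names and the toy sequence are hypothetical, and WGBS analysis tools additionally handle the reverse strand and per-site methylation calls.

```python
H_BASES = {"A", "T", "C"}  # H = any base except G


def cytosine_context(seq, pos):
    """Return 'CG', 'CHG', or 'CHH' for the cytosine at index `pos`.

    Returns None if the base is not C, if too few downstream bases
    remain, or if a downstream base is ambiguous (e.g. N).
    Only the forward strand is considered in this sketch.
    """
    seq = seq.upper()
    if seq[pos] != "C":
        return None
    downstream = seq[pos + 1:pos + 3]
    if downstream[:1] == "G":
        return "CG"
    if len(downstream) < 2 or downstream[0] not in H_BASES:
        return None
    if downstream[1] == "G":
        return "CHG"
    if downstream[1] in H_BASES:
        return "CHH"
    return None


def count_contexts(seq):
    """Count forward-strand cytosines in each methylation context."""
    counts = {"CG": 0, "CHG": 0, "CHH": 0}
    for pos in range(len(seq)):
        context = cytosine_context(seq, pos)
        if context is not None:
            counts[context] += 1
    return counts


# Toy sequence containing one cytosine in each context.
toy_counts = count_contexts("CGCAGCTT")
```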

6.5 Proteomics

Proteomics takes the proteome as its research object and studies the protein composition of cells, tissues, or organisms and how it changes. In 1996, Marc Wilkins (Wilkins 1996) first proposed the concept of ‘proteomics’, which has since become a focus of life science research. Commonly used techniques in proteomics include protein separation, quantification, and peptide enrichment techniques. In recent years, researchers have conducted proteomic studies on the SE of many plants, such as cassava (Li et al. 2010), sweet orange (Pan et al. 2009), cyclamen (Rode et al. 2012), and dates, providing new proteomic evidence for explaining the molecular mechanism of somatic embryogenesis. Kumaravel et al. (2020) used two-dimensional gel electrophoresis and mass spectrometry to identify differentially expressed proteins in the developmental stages of Musa spp. cv. Grand Naine somatic embryos. A total of 25 protein spots were differentially expressed across the successive developmental stages of somatic embryos. The functional annotations of the identified spots indicate that the main proteins are involved in growth and development processes (17%), followed by defense responses (12%) and signaling events (12%). During the early SE stage, cell division and growth-related proteins participate in the induction of somatic embryos, while in the later stage of development, cell wall-related and stress-related proteins play a defensive role against dehydration and osmotic stress, leading to somatic embryo maturation (Kumaravel et al. 2020). Several proteomics studies have been carried out on longan SE, covering friable EC, early SE, middle SE, and SE maturation. In addition, the differential proteomics of floral transformation and floral reversal in longan have also been studied. Chen et al. used IEF and SDS-PAGE to analyze the differentially expressed proteins in the NEC, EC-I, EC-II, ICpEC, CpEC, GE, Pro-CE, and CE stages. Using the same method, Wang (2003) studied the differentially expressed proteins during somatic embryogenesis of longan, including the embryogenic callus, spherical embryo, torpedo embryo, cotyledon embryo, and mature embryo stages. Li (2003) showed that specifically expressed proteins can be detected during the maturation process and at each maturation stage of longan somatic embryos, and that protein synthesis during maturation is most active under the combined culture conditions of darkness + low sugar + ABA. Anna et al. showed that during somatic embryo development, acidic proteins with pI 4–5 gradually decreased, while acidic proteins with pI 5–6 gradually increased. During somatic embryo maturation, the number of proteins with large molecular weight decreased significantly, while the number of proteins with small molecular weight increased gradually. Guo (2007) analyzed the proteins in the low-temperature preculture process of longan spherical embryos and identified the specifically expressed proteins LLT1, LLT2, LLT3, and LLT4. Lai et al. (2012) used two-dimensional electrophoresis and mass spectrometry to analyze the proteomic changes during longan early SE (friable-embryogenic callus, embryogenic callus II, ICpEC, compact pro-embryogenic cultures (CpEC), and globular embryos (GE)). Functional analysis led to the inference that energy and

sugar metabolism were the basis of the early SE. The oxidative stress response was a prerequisite for SE, and through proteins involved in cytoskeleton stabilization, nitrogen metabolism, signal transduction, gene regulation, protein translation, processing, modification, and localization, it constitutes a large longan somatic cell embryogenesis protein regulatory network system to ensure the normal development of somatic embryos. In the middle stage of longan SE, the number of proteins expressed decreased with the development of SE but increased in torpedo embryogenesis. More than 1/2 of the proteins are related to metabolic pathways, and many proteins were related to oxidative stress. With the development of SE, cells continue to differentiate, and the types of proteins expressed in somatic embryos decrease. Oxidative stress-related proteins, RAN2 and GTPase ObgE, were involved in the regulation of SE in longan (Fang et al. 2011). During the maturation of longan SE, mass spectrometry (combined MALDI-TOF-MS + MS/MS) succeeded in identifying 29 proteins, 13.79% of which were related to protein synthesis, 13.79% related to energy and carbohydrate metabolism, 20.69% related to stress response and antioxidant activity, 3.45% involved in amino acid metabolism, 3.45% involved in signal transduction, 3.45% structural proteins involved in cytoskeletal remodeling and auxiliary transport, 6.90% vitamin metabolism, and 6.90% involved in nucleic acid metabolism (He 2009). In view of previous studies, Chen (2018) conducted itRAQ-labeled proteomic sequencing analysis of early somatic embryogenesis of longan, and 5035 proteins were identified at NEC, EC, ICpEC, and GE stages (Chen 2018). Metabolic pathways and synthesis of secondary metabolites were the main KEGG metabolic pathways in which the identified proteins were enriched. 
The differentially expressed proteins such as LEC1-like, chitinase CHI5/-4, Arabinogalactan AGP8, and glycoprotein EP1 may regulate the early somatic embryo development of longan. At the same time, carbohydrate and energy metabolism-related protein mass in longan are remarkably different expression of early embryogenesis, glycolysis, alcohol fermentation,

103

and pentose phosphate pathway. Differentially expressed proteins basically showed an upward trend in the Krebs cycle at the GE stage, and carbohydrate and energy metabolism-related proteins may be involved in the early regulation of longan SE. During the longan early SE, auxin, abscisic acid, and gibberellin-related proteins, hormones such as stress response protein CAT, POD, APX, and GPX, lipid metabolism, protein metabolism, and cytoskeletal structure-related proteins occurred significantly differentially expressed. It is related to longan SE and early somatic embryo development. You (2009) took longan normal and reversed flower buds (when the bud reverts to a vegetative branch) and used proteomics to compare the changes between normal and reversed flower buds. Through this, biological functions of substances and energy metabolism-related proteins, transcription and translation-related proteins, secondary metabolism-related proteins, regulationrelated proteins, stress resistance-related proteins, and cytoskeleton proteins related to the reversal of longan flower formation were identified. The differential expression of these proteins during the reversal of longan flower formation may have affected the normal development of flower buds, which in turn led to the reversal of longan flower formation (You 2009).
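The category percentages reported for the 29 maturation-stage proteins (He 2009) can be read back as whole-protein counts. The following is a minimal Python sketch, assuming that each percentage is a count divided by 29 and rounded to two decimals; the dictionary labels are shorthand introduced here for illustration and are not terms taken from He (2009).

```python
# Percentages reported for the 29 proteins identified during longan SE
# maturation (He 2009). Keys are shorthand labels introduced for this sketch.
TOTAL_PROTEINS = 29

reported_percent = {
    "protein synthesis": 13.79,
    "energy and carbohydrate metabolism": 13.79,
    "stress response and antioxidant activity": 20.69,
    "amino acid metabolism": 3.45,
    "signal transduction": 3.45,
    "cytoskeleton and auxiliary transport": 3.45,
    "vitamin metabolism": 6.90,
    "nucleic acid metabolism": 6.90,
}

# Assumption: each percentage is count / 29, rounded to two decimals.
counts = {name: round(pct / 100 * TOTAL_PROTEINS)
          for name, pct in reported_percent.items()}

# Check that every inferred count reproduces the reported percentage.
for name, n in counts.items():
    assert abs(100 * n / TOTAL_PROTEINS - reported_percent[name]) < 0.005

listed = sum(counts.values())          # proteins covered by the listed categories
unlisted = TOTAL_PROTEINS - listed     # proteins without a listed category

for name, n in counts.items():
    print(f"{name}: {n}")
print(f"listed: {listed}, not listed: {unlisted}")
```

Under that assumption, the eight listed categories account for 21 of the 29 proteins, so 8 proteins fall outside the categories named in the text.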

6.6 Genetic Transformation

Until now, transgenic technology has not been widely used in longan. In 1998, transient GUS expression in longan embryogenic callus was first studied using gene gun-mediated and Agrobacterium-mediated transformation, and regenerated plants were eventually obtained (Zeng et al. 2000). Chen et al. (2018) wounded soil-sown longan seedlings and then infected the wounds using an Agrobacterium tumefaciens-mediated method, establishing a method for rapid gene transfer in longan (Fig. 6.6). Zhang (2004) also established an optimized receptor system for Agrobacterium-mediated transformation in longan and found that the growth status of EC after 10–15 d of culture was critical for the receptor system. The PEAS gene was introduced into longan embryogenic callus, and five resistant cell lines were obtained, of which only one was confirmed to contain the hpt resistance gene (Zhang 2004). Next, to further construct a stable genetic transformation system for longan SE, Xu optimized the Agrobacterium-mediated transformation method, showing that factors such as the concentration of the Agrobacterium suspension, the duration of Agrobacterium infection, the duration and temperature of coculture, and the concentration of 2,4-D in the coculture medium all influence the success of longan EC genetic transformation (Xu 2010). However, because longan EC is sensitive to antibiotics and Agrobacterium, resistant regenerated embryoids could not be obtained under screening with 50 mg/L hygromycin (Hyg), and transformation efficiency was low. To address this problem, in planta transformation was carried out in longan seedlings. This method yielded a large number of transgenic plants at a high transformation rate (Chen et al. 2020b; Tian 2017); however, the target genes could not be integrated into the genome and inherited by the next generation. Taken together, these studies show that a stable genetic regeneration system for longan EC has not yet been established, which has largely hindered transgenic regeneration research in longan. We are therefore continuing to investigate a stable genetic regeneration system in longan. Meanwhile, stable transgenic studies of miRNAs during early longan SE have not been reported. Using Agrobacterium-mediated genetic transformation, we found that when pri-miRNA sequences were cloned into a pCAMBIA1301:GUS fusion expression vector, stable over-expression during early SE was possible. More importantly, the homozygous early globular embryo stage of longan was obtained (Fig. 6.7; unpublished). With the development of transgenic technology in longan EC, more and more gene functions will be verified and revealed.

Fig. 6.6 The process of obtaining transgenic longan plants. A1–6: obtaining resistant shoots by removing the apical bud; B1–6: obtaining resistant shoots by topping (including leaves); C1–3: transplanting the transgenic longan (Chen 2018)

Fig. 6.7 The transgenic cell lines of MIR408 during early longan SE. WT represents the wild-type GE cell line, and MIR408-OE represents the GE cell line in which the pri-miR408 transgene was cloned into the pCAMBIA1301:GUS fusion expression vector and over-expressed during early SE

6.7 Future Prospects

With the deciphering of the genome, longan research has entered the post-genome era. Genomics is the study of the structure, function, evolution, mapping, and editing of genomes and their impact on organisms. In recent years, 3D genomics has developed rapidly, enabling a more detailed understanding of the 3D structure of the longan genome. In the future, analysis of multi-omics data should better reveal the spatial interactions of regulatory elements during longan development and clarify how chromatin conformation regulates gene expression. Building on established results from longan genomics and from model plants or related species, we will conduct in-depth research on the genetics of key biological traits of longan, further analyze the molecular basis of these traits, and strengthen research on the molecular mechanisms underlying agronomic traits such as fruit weight and fruit quality. Longan breeding relies mainly on traditional approaches such as seedling selection and bud mutation breeding, which use genetic variation produced mainly by natural processes or by physical and chemical mutagenesis. The low frequency of these mutations, unpredictable mutation sites, increasingly depleted seedling resources, unstable bud mutations, and other disadvantages greatly limit the scope of traditional breeding and extend the breeding cycle. With the development of genome deciphering and biotechnology, the application of gene editing technology in longan breeding has gradually become a trend and can promote the improvement of longan varieties more purposefully and efficiently. Longan fruit is rich in secondary metabolites such as phenols and is known as ‘Southern Ginseng’. Longan genome data can be used to further study how its medicinal ingredients form and accumulate, with a view to increasing their content or enabling factory production, and thereby improving the health quality of longan. SE is a model system for studying the molecular regulation of plant embryogenesis, especially early embryonic development. The SE system established in longan is highly synchronized, high in frequency, and strong in regeneration ability, making it an excellent experimental system for studying embryogenesis in woody plants. The deciphering of the longan genome not only lays the foundation and provides a good platform for the study of longan molecular biology, but also helps to explore the mechanism of plant cell totipotency using the longan SE system.

References

Adams MD, Celniker SE, Holt RA et al (2000) The genome sequence of Drosophila melanogaster. Science 287:2185–2195
Birnbaum KD, Roudier F (2017) Epigenetic memory and cell fate reprogramming in plants. Regeneration 4(1):15–20
Booth MJ, Ost T, Beraldi D et al (2013) Oxidative bisulfite sequencing of 5-methylcytosine and 5-hydroxymethylcytosine. Nat Protoc 8(10):1841–1851
Chan WL, Henderson IR, Jacobsen SE (2005) Gardening the genome: DNA methylation in Arabidopsis thaliana. Nat Rev Genet 6(5):351–360
Chen L, Zhang P, Fan Y, Lu Q, Li Q, Yan J, Muehlbauer GJ, Schnable PS, Dai M, Li L (2018) Circular RNAs mediated by transposons are associated with transcriptomic and phenotypic variation in maize. New Phytol 217(3)
Chen R, Chen X, Huo W, Zheng S, Lin Y, Lai Z (2020a) Transcriptome analysis of azacitidine (5-AzaC) treatment affecting the development of early somatic embryogenesis in longan. J Hortic Sci Biotechnol 96:1–13
Chen X, Xu X, Shen X, Li H, Zhu C, Chen R, Munir N, Zhang Z, Chen Y, Xuhan X, Lin Y, Lai Z (2020b) Genome-wide investigation of DNA methylation dynamics reveals a critical role of DNA demethylation during the early somatic embryogenesis of Dimocarpus longan Lour. Tree Physiol 40(12):1807–1826
Chen X, Zeng Y, Wang J et al (2017) Evolutionary characteristics of miR159 gene family in Dimocarpus longan Lour., and their spatial and temporal expression. Chin J Appl Environ Biol 23:602–608
Chen YK (2018) Transcriptome and proteome analysis during early somatic embryogenesis and expression, functional analysis of flowering time related genes in Dimocarpus longan Lour. Fujian Agriculture and Forestry University
Dang YJ, Yan LY, Hu BQ et al (2016) Tracing the expression of circular RNAs in human preimplantation embryos. Genome Biol 17:130
Ehsan H, Brinkman AB, Arand J et al (2013) Whole-genome bisulfite sequencing of two distinct interconvertible DNA methylomes of mouse embryonic stem cells. Cell Stem Cell 13(3):360–369
Fan J, Quan W, Li G, Hu X, Wang Q, Wang H, Li X, Luo X, Feng Q, Hu Z, Feng H, Pu M, Zhao J, Huang Y, Li Y, Zhang Y, Wang W (2020) circRNAs are involved in the rice-Magnaporthe oryzae interaction. Plant Physiol 182(1):272–286
Fan X, Zhang X, Wu X, Guo H, Hu Y, Tang F, Huang Y (2015) Single-cell RNA-seq transcriptome analysis of linear and circular RNAs in mouse preimplantation embryos. Genome Biol 16:148
Fang ZZ, Lai ZX, Lai C et al (2011) Proteomics on somatic embryogenesis during the middle stage in Longan. Sci Agric Sinica 44:2966–2979
Gao Z, Li J, Luo M, Li H, Chen Q, Wang L, Song S, Zhao L, Xu W, Zhang C, Wang S, Ma C (2019) Characterization and cloning of grape circular RNAs identified the cold resistance-related Vv-circATS11. Plant Physiol 180(2):966–985
Gehring M, Jin HH, Hsieh TF, Penterman J, Choi Y, Harada JJ, Goldberg RB, Fischer RL (2006) DEMETER DNA glycosylase establishes MEDEA polycomb gene self-imprinting by allele-specific demethylation. Cell 124(3):495–506
Gong Z, Morales-Ruiz T, Ariza RR, Roldán-Arjona T, David L, Zhu JK (2002) ROS1, a repressor of transcriptional gene silencing in Arabidopsis, encodes a DNA glycosylase/lyase. Cell 111(6):803–814
Guo YQ (2007) Studies on the cryopreservation of embryogenic cultures from longan (Dimocarpus longan Lour.). Fujian Agriculture and Forestry University
Hascher A, Haase AK, Hebestreit K, Rohde C, Klein HU, Rius M, Jungen D, Witten A, Stoll M, Schulze I (2014) DNA methyltransferase inhibition reverses epigenetically embedded phenotypes in lung cancer preferentially affecting polycomb target genes. Clin Cancer Res Official J Am Assoc Cancer Res 20(4):814
He Y (2009) Studies on proteomics during somatic embryo maturation in Dimocarpus longan Lour. Fujian Agriculture and Forestry University


Huang JS, Xu XD, Zheng SQ et al (2001) Selection for aborted-seeded longan cultivars. Acta Horticulturae, 115–118
Jaillon O, Aury JM, Noel B et al (2007) The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449(7161):463–467
Johnsson P, Lipovich L, Grandér D et al (2014) Evolutionary conservation of long non-coding RNAs; sequence, structure, function. Biochim Biophys Acta 1840:1063–1071
Jue D, Sang X, Liu L et al (2018) Identification of WRKY gene family from Dimocarpus longan and its expression analysis during flower induction and abiotic stress responses. Int J Mol Sci 19(8):2169
Karim R, Tan YS, Singh P, Khalid N, Harikrishna JA (2018) Expression and DNA methylation of SERK, BBM, LEC2 and WUS genes in in vitro cultures of Boesenbergia rotunda (L.) Mansf. Physiol Mol Biol Plants 24(5):741–751
Klimaszewska K, Noceda C, Pelletier G, Label P, Lelu-Walter RRA (2009) Biological characterization of young and aged embryogenic cultures of Pinus pinaster (Ait.). In Vitro Cell Dev Biol Plant 45(1):20–33
Kumaravel M, Uma S, Backiyarani S et al (2020) Proteomic analysis of somatic embryo development in Musa spp. cv. Grand Naine (AAA). Sci Rep 10:4501
Law JA, Jacobsen SE (2010) Establishing, maintaining and modifying DNA methylation patterns in plants and animals. Nat Rev Genet 11(3):204–220
Lai CC, Lai ZX, Fang ZZ et al (2012) Proteomic analysis of early somatic embryogenesis in Longan (Dimocarpus longan Lour.). Scientia Agricultura Sinica
Lai R, Lin Y, Lai Z (2016) Cloning of auxin receptor gene TIR1 and its interaction with miR393 in Dimocarpus longan Lour. Chin J Appl Environ Biol
Lai ZX, Chen CL, Zeng LH et al (2000) Somatic embryogenesis in longan [Dimocarpus longan Lour.]. Springer Netherlands
Lee K, Seo PJ (2018) Dynamic epigenetic changes during plant regeneration. Trends Plant Sci 23(3):235–247
Lee K, Park O, Jung S, Seo PJ (2016) Histone deacetylation-mediated cellular dedifferentiation in Arabidopsis. J Plant Physiol 191:95–100
Li H, Dai X, Huang X, Xu M, Wang Q, Yan X, Sederoff R, Li Q (2021) Single-cell RNA sequencing reveals a high-resolution cell atlas of xylem in Populus. J Integr Plant Biol
Li H, Lin Y, Chen X et al (2018) Effects of blue light on flavonoid accumulation linked to the expression of miR393, miR394 and miR395 in longan embryogenic calli. PLoS One 13:e0191444
Li H, Lyu Y, Chen X, Wang C, Yao D, Ni S, Lin Y, Chen Y, Zhang Z, Lai Z (2019a) Exploration of the effect of blue light on functional metabolite accumulation in longan embryonic calli via RNA sequencing. Int J Mol Sci 20:441
Li J, Wang M, Li Y, Zhang Q, Lindsey K, Daniell H, Jin S, Zhang X (2019b) Multi-omics analyses reveal epigenomics basis for cotton somatic embryogenesis through successive regeneration acclimation process. Plant Biotechnol J 17(2):435–450
Li K, Zhu W, Zeng K et al (2010) Proteome characterization of cassava (Manihot esculenta Crantz) somatic embryos, plantlets and tuberous roots. Proteome Sci 8:10
Lin YL (2021) Studies on cloning, expression and regulation of SOD gene family during somatic embryogenesis in Dimocarpus longan Lour. Fujian Agriculture and Forestry University
Lin YL, Lai et al (2013) Comparative analysis reveals dynamic changes in miRNAs and their targets and expression during somatic embryogenesis in Longan (Dimocarpus longan Lour.). PLoS One 8(4)
Lin YL, Lai ZX (2013) Evaluation of suitable reference genes for normalization of microRNA expression by real-time reverse transcription PCR analysis during longan somatic embryogenesis. Plant Physiol Biochem 66:20–25
Lin Y, Lai Z, Tian Q et al (2015a) Endogenous target mimics down-regulate miR160 mediation of ARF10, -16, and -17 cleavage during somatic embryogenesis in Dimocarpus longan Lour. Front Plant Sci 6
Lin Y, Lin L, Lai R et al (2015b) MicroRNA390-directed TAS3 cleavage leads to the production of tasiRNA-ARF3/4 during somatic embryogenesis in Dimocarpus longan Lour. Front Plant Sci
Lin Y, Zhang Q, Zeng Y et al (2017a) Analysis on evolutionary characteristics and the temporal and spatial expression patterns of miR166 gene family in Dimocarpus longan. Acta Horticulturae Sinica
Lin YL, Min JM, Lai RL et al (2017b) Genome-wide sequencing of longan (Dimocarpus longan Lour.) provides insights into molecular basis of its polyphenol-rich characteristics. Gigascience
Lin Y, Chen Y, Zeng Y et al (2021) Molecular characterization of miRNA genes and their expression in Dimocarpus longan Lour. Planta 253(2):41
Lindroth AM, Cao X, Jackson JP, Zilberman D (2001) Requirement of CHROMOMETHYLASE3 for maintenance of CpXpG methylation. Science
Lister R, O’Malley RC, Tonti-Filippini J, Gregory BD, Berry CC, Millar AH, Ecker JR (2008) Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 133(3):523–536
Ortega-Galisteo AP, Morales-Ruiz T, Ariza RR, Roldán-Arjona T (2008) Arabidopsis DEMETER-LIKE proteins DML2 and DML3 are required for appropriate distribution of DNA methylation marks. Plant Mol Biol 67(6):671–681
Liyao SU, Zhang S, Chen X et al (2018) Molecular characteristics of miR171 family members and analysis of the expression pattern of miR171b regulatory targets during early somatic embryogenesis in longan. J Fruit Sci

Liu Q, Liang Z, Feng D, Jiang S, Wang Y, Du Z, Li H, Hu G, Zhang P, Ma Y, Lohmann J, Gu X (2020) Transcriptional landscape of rice roots at the single cell resolution. Mol Plant 14
MW (1995) Government backs proteome proposal. Nature 378:653
Magistri M, Faghihi MA, RD SLG et al (2012) Regulation of chromatin structure by long noncoding RNAs: focus on natural antisense transcripts. Trends Genet 28:389–396
Matsui A, Nguyen AH, Nakaminami K, Seki M (2013) Arabidopsis non-coding RNA regulation in abiotic stress responses. Int J Mol Sci 14:22642–22654
Michael H, Nicholas D (2013) The intertwining of transposable elements and non-coding RNAs. Int J Mol Sci 14:13307–13328
Ming R, Hou S, Feng Y et al (2008) The draft genome of the transgenic tropical fruit tree papaya (Carica papaya Linnaeus). Nature 452(7190):991–996
Nelms B, Walbot V (2019) Defining the developmental program leading to meiosis in maize. Science 364(6435):52
Pan Z, Guan R, Zhu S et al (2009) Proteomic analysis of somatic embryogenesis in Valencia sweet orange (Citrus sinensis Osbeck). Plant Cell Rep 28:281–289
Pasternak T, Dudits D (2019) Epigenetic clues to better understanding of the asexual embryogenesis in planta and in vitro. Front Plant Sci 10
Penterman J, Zilberman D, Jin HH, Ballinger T, Henikoff S, Fischer RL (2007) DNA demethylation in the Arabidopsis genome. Proc Natl Acad Sci 104(16):6752–6757
Picard CL, Povilus RA, Williams BP, Gehring M (2021) Transcriptional and imprinting complexity in Arabidopsis seeds at single-nucleus resolution. Nat Plants 7(6):730–738
Prakadan SM, Shalek AK, Weitz DA (2017) Scaling by shrinking: empowering single-cell ‘omics’ with microfluidic devices. Nat Rev Genet
Pruneski JA, Hainer SJ, Petrov KO et al (2011) The Paf1 complex represses SER3 transcription in Saccharomyces cerevisiae by facilitating intergenic transcription-dependent nucleosome occupancy of the SER3 promoter. Eukaryot Cell 10:1283
Quinga LAP, Fraga HPDF, Vieira LDN (2017) Epigenetics of long-term somatic embryogenesis in Theobroma cacao L.: DNA methylation and recovery of embryogenic potential. Plant Cell Tissue Organ Cult (PCTOC) 131(2):295–305
Rode C, Lindhorst K, Braun HP et al (2012) From callus to embryo: a proteomic view on the development and maturation of somatic embryos in Cyclamen persicum. Planta 235:995–1011
Shorey-Kendrick LE, McEvoy CT, Ferguson B, Burchard J, Park BS, Gao L, Vuylsteke BH, Milner KF, Morris CD, Spindel ER (2017) Vitamin C prevents offspring DNA methylation changes associated with maternal smoking in pregnancy. Am J Respir Crit Care Med 196(6):745–755

Solís M, El-Tantawy A, Cano V, Risueño MC, Testillano PS (2015) 5-azacytidine promotes microspore embryogenesis initiation by decreasing global DNA methylation, but prevents subsequent embryo development in rapeseed and barley. Front Plant Sci 6
Stroud H, Do T, Du J, Zhong X, Feng S, Johnson L, Patel DJ, Jacobsen SE (2014) Non-CG methylation patterns shape the epigenetic landscape in Arabidopsis. Nat Struct Mol Biol 21(1):64–72
Su LY, Jiang MQ, Huang SQ et al (2019) Analysis of the molecular evolution characteristics of miR403 and its expression pattern during early somatic embryogenesis in Dimocarpus longan. J Fruit Sci 36(7):846–856
Tan J, Zhou Z, Niu Y, Sun X, Deng Z (2017) Identification and functional characterization of tomato circRNAs derived from genes involved in fruit pigment accumulation. Sci Rep 7(1):8594
Tian C, Du Q, Xu M, Du F, Jiao Y (2020) Single-nucleus RNA-seq resolves spatiotemporal developmental trajectories in the tomato shoot apex. bioRxiv
Tong W, Yu J, Hou Y, Li F, Zhou Q, Wei C, Bennetzen JL (2018) Circular RNA architecture and differentiation during leaf bud to young leaf development in tea (Camellia sinensis). Planta 248(6):1417–1429
Tripathi V, Ellis JD, Shen Z et al (2010) The nuclear-retained noncoding RNA MALAT1 regulates alternative splicing by modulating SR splicing factor phosphorylation. Mol Cell 39:925–938
Tian QL (2017) Characterization of DlRan3A and DlRan3B gene from embryogenic cultures in Dimocarpus longan Lour
Velasco R, Zharkikh A, Affourtit J et al (2010) The genome of the domesticated apple (Malus x domestica Borkh.). Nat Genet 42(10):833–839
Verde I, Abbott AG, Scalabrin S et al (2013) The high-quality draft genome of peach (Prunus persica) identifies unique patterns of genetic diversity, domestication and genome evolution. Nat Genet 45(5):487–494
Wagner DE, Weinreb C, Collins ZM, Briggs JA, Megason SG, Klein AM (2018) Single-cell mapping of gene expression landscapes and lineage in the zebrafish embryo. Science 360(6392):981
Wang FH (2003) Differential gene expression in the process of somatic embryogenesis in Dimocarpus longan Lour. Fujian Agriculture and Forestry University
Wang J, Li J, Li Z et al (2022) Genomic insights into longan evolution from a chromosome-level genome assembly and population analysis of longan accessions. Hortic Res 9
Wang JY, Shen X, Chen XH et al (2021) Molecular characteristics and expression pattern of miR167 family and their potential targets during early somatic embryogenesis in longan (Dimocarpus longan Lour.). Chin J Appl Environ Biol 27(1):146–157
Wang J, Lin J, Wang H, Li X, Yang Q, Li H, Chang Y, Yang ZM (2018) Identification and characterization of circRNAs in Pyrus betulifolia Bunge under drought stress. PLoS ONE 13(7):e200692


Wang Y, Yang M, Wei S, Qin F, Zhao H, Suo B (2017a) Identification of circular RNAs and their targets in leaves of Triticum aestivum L. under dehydration stress. Front Plant Sci 7
Wang Z, Liu Y, Li D, Li L, Zhang Q, Wang S, Huang H (2017b) Identification of circular RNAs in kiwifruit and their species-specific response to bacterial canker pathogen invasion. Front Plant Sci 8
Wilkins MR, Sanchez JC, Gooley AA et al (1996) Progress with proteome projects: why all proteins expressed by a genome should be identified and how to do it. Biotechnol Genet Eng Rev 13(1):19–50
Wu L, Zhou HY, Zhang QQ et al (2010) DNA methylation mediated by a microRNA pathway. Mol Cell 38(3):465–475
Wu H, D’Alessio AC, Ito S, Wang Z, Cui K, Zhao K, Sun YE, Zhang Y (2011) Genome-wide analysis of 5-hydroxymethylcytosine distribution reveals its dual function in transcriptional regulation in mouse embryonic stem cells. Genes Dev 25(7):679–684
Xu QF (2010) The optimization of somatic embryogenesis system and its applications to germplasm conservation and transformation in Dimocarpus longan Lour. Fujian Agriculture and Forestry University
Xu Q, Chen LL, Ruan X et al (2013) The draft genome of sweet orange (Citrus sinensis). Nat Genet 45(1):59–66
Xu XP, Liao Q, Chen X et al (2019) Molecular characteristics and expression analysis of miR397 family members during the early somatic embryogenesis in Dimocarpus longan Lour. J Fruit Sci 36(5):567–577
Xu X, Chen X, Chen Y et al (2020) Genome-wide identification of miRNAs and their targets during early somatic embryogenesis in Dimocarpus longan Lour. Sci Rep 10
Ye CY, Chen L, Liu C, Zhu QH, Fan L (2015) Widespread noncoding circular RNAs in plants. New Phytol 208(1):88–95
Yi D, Zhang H, Lai B et al (2020) Integrative analysis of the coloring mechanism of red longan pericarp through metabolome and transcriptome analyses. J Agric Food Chem 69(6)
You X (2009) Analysis of differential proteomics on Longan (Dimocarpus longan Lour.) floral reversion and normal flowering. Fujian Agriculture and Forestry University
Zemach A, McDaniel IE, Silva P, Zilberman D (2010) Genome-wide evolutionary analysis of eukaryotic DNA methylation. Science 328(5980):916–919

Zeng YJ (2017) Application of miPEPs in somatic embryogenesis of Dimocarpus longan Lour. Fujian Agriculture and Forestry University
Zeng LH, Chen ZG, Lu LX (2000) A preliminary report on Agrobacterium rhizogenes mediated transformation of longan. J Fujian Agric Univ 29(001):27–30
Zeng Y, Lin Y, Cui T et al (2017) Evolutionary characterization and expression of miR171 family in Dimocarpus longan Lour. Acta Botanica Boreali-Occidentalia Sinica
Zhang MX (2004) Establishment of banana and longan transgenic receptor system and the preliminary study on the transformation of the PEAS gene. Fujian Agriculture and Forestry University
Zhang H, Lang Z, Zhu J (2018) Dynamics and function of DNA methylation in plants. Nat Rev Mol Cell Biol 19(8):489–506
Zhang P, Fan Y, Sun X, Chen L, Terzaghi W, Bucher E, Li L, Dai M (2019a) A large-scale circular RNA profiling reveals universal molecular mechanisms responsive to drought stress in maize and Arabidopsis. Plant J 98(4):697–713
Zhang X, Cao W, Wang Y et al (2012) Study of the progress on chemical constituents and pharmacological activities of Longan. Northwest Pharm J
Zhang T, Xu Z, Shang G, Wang J (2019b) A single-cell RNA sequencing profiles the developmental landscape of Arabidopsis root. Mol Plant 12(5):648–660
Zhang X, Yazaki J, Sundaresan A, Cokus S, Chan WL, Chen H, Henderson IR, Shinn P, Pellegrini M, Jacobsen SE (2006) Genome-wide high-resolution mapping and functional analysis of DNA
Zhang QL, Su LY, Zhang ST et al (2020) Analyses of microRNA166 gene structure, expression, and function during the early stage of somatic embryogenesis in Dimocarpus longan Lour. Plant Physiol Biochem 147:205–214
Zhao W, Cheng Y, Zhang C, You Q, Shen X, Guo W, Jiao Y (2017) Genome-wide identification and characterization of circular RNAs by high throughput sequencing in soybean. Sci Rep 7(1)
Zhu P, Guo H, Ren Y, Hou Y, Dong J, Li R, Lian Y, Fan X, Hu B, Gao Y (2018) Single-cell DNA methylome sequencing of human preimplantation embryos. Nat Genet
Zhu Y, Jia J, Yang L, Xia Y, Zhang H, Jia J, Zhou R, Nie P, Yin J, Ma D, Liu L (2019) Identification of cucumber circular RNAs responsive to salt stress. BMC Plant Biol 19(1)

7 The Mangosteen Genome

Mohd Razik Midin and Hoe-Han Goh

Abstract

Mangosteen is one of the most popular tropical fruits in Southeast Asia. It is called ‘The Queen of Tropical Fruits’ as its thick sepals collectively resemble a crown. Mangosteen fruits contain white and juicy edible pulp with a sweet flavour and pleasant aroma. They are rich in beneficial phytochemicals such as xanthones, which make mangosteen a potential medicinal plant. Traditionally, mangosteen has been used to treat fever, diarrhoea, and wounds. In recent studies, researchers found that mangosteen has anti-cancer and anti-diabetic properties. However, mangosteen is still an underutilised crop because of its slow growth and long juvenile period: trees usually take eight to ten years to bear fruit. It is also an obligate apomict that reproduces asexually, hence producing clonal progenies with low genetic variation. Therefore, breeding programmes for mangosteen are challenging and have a very low success rate. Furthermore, genetic information on mangosteen accessions in different countries is too limited to unravel its lineage and parental history. Other constraints on mangosteen improvement include the low viability of its recalcitrant seeds and the lack of a rapid propagation method. Efforts have been made to understand this crop through functional genomic studies. Recent genomic studies of mangosteen, including genome sequencing, genome survey, genome size estimation, and cytogenetic analysis, are highlighted in this chapter.

Mohd Razik Midin
Department of Plant Science, Kulliyyah of Science, International Islamic University Malaysia, 25200 Kuantan, Pahang, Malaysia

H.-H. Goh (corresponding author)
Institute of Systems Biology, Universiti Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
e-mail: [email protected]

7.1 The Genus Garcinia L.

The genus Garcinia L. belongs to the family Guttiferae (Clusiaceae) and was characterised by the Swedish botanist Linnaeus. Linnaeus named the genus after the French naturalist Laurent Garcin (1683–1757) in honour of his botanical contributions in the eighteenth century, and it was Garcin who provided the detailed description of Garcinia fruits (Corner 1952). It is a pantropical genus with an estimated 400 species distributed in the Southeast Asian region (Maheshwari 1964; Whitmore 1973). Of the 400 Garcinia species estimated worldwide, 49 were discovered in Peninsular Malaysia (Whitmore 1973; Nazre et al. 2007). Garcinia species comprise small- to medium-sized dioecious trees or erect shrubs with hard timber and abundant gummy latex (Ridley 1967; Whitmore 1973; Richards 1990a). Morphologically, male flowers have many stamens (with or without a pistillode) and female flowers have a large hypogynous ovary with a sessile, plate-like stigma (with or without staminodes) inserted on a variously shaped receptacle (Richards 1990a; Osman and Milan 2006). According to Richards (1990b), most Garcinia species are thought to be facultative agamosperms, except possibly G. scortechinii King and G. mangostana L. Garcinia is an economically important genus in Southeast Asia. Most of the species produce edible fruit (Yapwatannapun et al. 2002). Their fruits can be used as flavourings; for instance, G. atroviridis Griff. ex T. Anderson, G. gummi-gutta (L.) Roxb. (syn. G. cambogia (Gaertn.) Desr.), and G. planchonii Pierre are used as ingredients in dishes in Peninsular Malaysia, India, and Indochina (Cox 1976). Numerous species of Garcinia are also used for purposes other than their edible fruits. Several species are important in domestic uses such as landscaping and the furniture industry (Osman and Milan 2006; Nazre et al. 2007). Garcinia species also contain phytochemicals that have pharmaceutical and therapeutic value (Hemshekhar et al. 2011). Several Garcinia species have antioxidant, anti-tumour, and antiviral properties with potential for the treatment of cancer and HIV (Ampofo and Waterman 1986; Rukachaisirikul et al. 2003; Nabandith et al. 2004; Pedraza-Chaverri et al. 2008; Aizat et al. 2019). According to Lim (2012), Garcinia species originate from the Malay Archipelago, and several are considered important Asian species of Garcinia, including G. dulcis Griffith ex. T. Anderson, G. tinctoria, G. atroviridis (asam gelugor), G. celebica syn. G. hombroniana (seashore mangosteen), G. indica Choisy (kokum), G. prainiana, and G. mangostana (mangosteen) (Osman and Milan 2006). The most famous Garcinia species is G. mangostana as it possesses white and juicy edible pulp with a sweet flavour and pleasant aroma (Jung et al. 2006).

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 M. A. Chapman (ed.), Underutilised Crop Genomes, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-00848-1_7


7.2 Botanical Description of Mangosteen

Mangosteen is a slow-growing tree that is believed to be native to Malaysia and is known only as a cultivated species (Verheij 1991). It was recognised and grouped by Linnaeus (1753) in the family Guttiferae (Clusiaceae). Locally, it is called ‘manggis’. It is a desirable species and is sometimes referred to as ‘The Queen of Tropical Fruits’. The following description relies on information provided by Corner (1952), Richards (1990b), and Verheij (1991). Mangosteen is a medium-height evergreen tree with a straight trunk. All parts of the plant contain yellow latex. Its leaves are shiny and coriaceous, dark green (rarely yellow-green) above and dull pale green or yellow-green beneath. Its flowers are red and solitary, paired, or rarely in threes at the apices of branchlets. The ovary is broadly ellipsoid to globose, sessile, and 4–8 celled. The stigma is sessile and smooth. The fruit is a depressed-globose berry with a thick, dark purple pericarp and fleshy sweet arils (Richards 1990b; Osman and Milan 2006). Figure 7.1 shows the tree, leaves, and fruits of two varieties of mangosteen in Malaysia, the common mangosteen ‘manggis’ and ‘Mesta’.

7.2.1 Mangosteen as an Apomictic Species

Mangosteen reproduces through apomixis. Apomixis is defined as asexual plant reproduction via seeds from the maternal tissues of the ovule that results in the production of genetically uniform progeny. The term apomixis is synonymous with agamospermy (Richards 1997). It can be divided into two main types based on the mode of apomictic embryo development: (1) sporophytic apomixis, also known as adventitious embryony, and (2) gametophytic apomixis (Koltunow and Grossniklaus 2003; Bicknell and Koltunow 2004; Brukhin 2017; Šarhanová et al. 2017). To date, 250 species from 57 families of flowering plants have been categorised as sporophytic apomicts, including several economically important plant species (Naumova 1992; Naumova et al. 2001; Brukhin 2017). Apomixis has previously been discovered in paleotropical Garcinia species, as reported by several researchers (Sprecher 1919; Gustafsson 1946; Horn 1940; Grant 1971). According to Richards (1990b), most Garcinia species are facultatively apomictic. In Malaysia, he found that at least ten Garcinia species are facultative apomicts, except for two species, G. scortechinii and mangosteen, in which males are absent (Richards 1990a). The current opinion suggests that mangosteen is exclusively female, as it exhibits obligate apomixis through adventitious embryony, in which the embryo is produced via a sporophytic mechanism from ovular tissues (Lim 1984; Richards 1990a, b). However, several researchers have reported the presence of male mangosteen trees, such as Burkill (1935) and Idris and Rukayah (1987). In Peninsular Malaysia (and Southeast Asia in general), the rarity of males in mangosteen may be due to local planters chopping them down because they bear no fruit (Nazre 2014). The only male tree found by Idris and Rukayah (1987), at Ulu Kundor Village of Linggi District in Negeri Sembilan, had been chopped down by villagers because they believed it was not worth keeping.

Fig. 7.1 Commercial mangosteen in Malaysia of the a common variety and b Mesta variety, located in Pahang, Malaysia: (i) tree, (ii) immature fruits, and (iii) ripe fruits. Photos were taken by Mohd Razik Midin.

7.2.2 Genetic Variation of Mangosteen

Although mangosteen has been considered an obligate apomictic species (Richards 1990b), genetic variation has been reported among its cultivars in molecular studies (Ramage et al. 2004; Mansyah et al. 2010; Sobir et al. 2011; Matra et al. 2016). Sobir et al. (2011) studied the genetic variability of mangosteen in Indonesia through field evaluation and molecular analysis (using RAPD, AFLP, and ISSR markers) to select superior trees with desirable traits. Mansyah et al. (1999) reported that mangosteen trees in West Sumatra exhibit wide variability in leaf length, fruit weight, and rind thickness. Mansyah et al. (2010) also found that mangosteen in Tembilahan, Sumatera Island, shows a flattened fruit shape, very short peduncles, and an elliptic stigma lobe. Several mangosteen varieties were released by Bogor Agricultural University, Indonesia, for cultivation, including Wanayasa, Puspahiang, and Malinau. Julu, a variety produced in the Philippines, has larger fruits than other varieties, with large seeds and more acidic pulp (Horn 1940).

Cytologically, mangosteen is considered a polyploid species with an apparently non-uniform chromosome number (2n = 76, 96, 88–90, 110–120), as reported by Krishnawary and Raman (1949), Tixier (1960), Ha (1978), Richards (1990b), Sarasmiryati (2008), and Midin et al. (2018). If the chromosome count data are correct, chromosome instability may contribute to the morphological variation observed between groups (Ramage et al. 2004; Nazre 2014). Alternatively, environmental differences and geographic adaptation may cause the morphological variation. In 1991–1993, the Malaysian Agricultural Research and Development Institute (MARDI) ran a project to collect and study the genetic diversity of mangosteen germplasm in Malaysia (Osman and Milan 2006). In this study, MARDI found some distinct variations in fruit characteristics such as fruit size and shape. The Malaysian Department of Agriculture has also identified several accessions that show variations in fruit size and shape, seed number, shelf-life, fruiting precocity, and external coloration. Currently, two types of mangosteen are cultivated in Malaysia, one with normal globose fruits (common mangosteen) and another with ovoid fruits known as Mesta (Osman and Milan 2006; Raziah et al. 2007). Mesta (Fig. 7.1b) is the more popular commercial Malaysian seedless variety, with smaller fruit size and less juice than the common mangosteen. Its pulp texture is also more solid than that of the common variety.

Fig. 7.2 Fluorescent intensity histogram peaks of a Glycine max cv. Polanka and b mangosteen. (Reproduced from Midin et al. 2018 with permission)

[Fig. 7.2 histogram panels: (a) Soybean; (b) mangosteen sample 1_Otto_PVP_Mercp_30. x-axis in both panels: DNA content (FL2-Area), 0–1000.]

7.3 Origin and Distribution of Mangosteen

7.3.1 Origin of Mangosteen

The origin of mangosteen is still debated. Mangosteen is thought to be closely related to two Garcinia species, G. celebica (syn. G. hombroniana) and G. malaccensis. These two species are indigenous to Malaysia, although the distribution of G. celebica extends to Nicobar Island. Richards (1990b) was the first to propose that mangosteen is a hybrid between these two species. However, as pointed out by Nazre (2014), the G. malaccensis from Pasoh Forest Reserve studied by Richards (1990b) was a misidentified G. penangiana Pierre. The misidentified G. penangiana was also used by Abdullah et al. (2012) in their phylogenetic analysis, in which they claimed that G. opaca and G. penangiana are the closest relatives of mangosteen. Because of this misidentification, Nazre (2014) conducted a morphological analysis of three Garcinia species, G. penangiana, G. malaccensis, and G. mangostana, as listed in Table 7.1. Morphological characters of G. celebica as described by Richards (1990b) are also listed.

Phylogenetic studies have also been conducted by Yapwattanaphun et al. (2004), Nazre et al. (2007), Abdullah et al. (2012), Sulassih et al. (2013), and Nazre (2014). Yapwattanaphun et al. (2004) found that G. celebica was more distant from mangosteen than G. malaccensis in their phylogenetic study. Nazre et al. (2007) reported that mangosteen was most closely related to G. penangiana among the cultivated Garcinia of Peninsular Malaysia based on ITS (internal transcribed spacer) sequences, but later Nazre (2014) found that mangosteen clustered with G. malaccensis rather than G. penangiana. Based on morphology and ISSR markers, Sulassih et al. (2013) claimed that the possible ancestors of mangosteen were G. malaccensis and G. celebica. In the most recent study, based on morphology and ITS sequences, Nazre (2014) proposed two theories on the origin of mangosteen: first, that mangosteen is a hybrid of different varieties of G. malaccensis, and second, that it may be a product of multiple, superior selections from different populations of female trees of G. malaccensis. Table 7.2 summarises the hypotheses on the origin of mangosteen and shows that four Garcinia species, G. celebica, G. malaccensis, G. opaca, and G. penangiana, are closely related to mangosteen.

7.3.2 Closely Related Species of Mangosteen

Since the origin of mangosteen remains uncertain, studies of closely related Garcinia species provide hints about its parent species. The four candidate species that the literature most consistently places closest to G. mangostana are described below.

Table 7.1 Morphological characters of G. celebica, G. malaccensis, G. mangostana, and G. penangiana

Character | G. celebica | G. malaccensis | G. mangostana | G. penangiana
Latex | White | Yellow | Yellow | White
Petal colour | Cream | Pinkish red | Pinkish red | No information
Stigma surface | Smooth | Corrugated | Rather smooth surface | Nodule-like surface
Fruit shape | Globose | Ovoid, ellipsoid, globose | Ovoid, ellipsoid, globose | Ovoid, globose
Fruit colour (ripe) | Red | Yellowish red, reddish pink, purple-black | Purple-black | Reddish pink
Fruit flavour | Astringent | Sweet–sour | Sweet–sour | Sour


Table 7.2 List of hypotheses on the origin of mangosteen

No | Hypothesis | References
1 | Mangosteen might be a hybrid between G. malaccensis (misidentified G. penangiana) and G. celebica | Richards (1990b)
2 | Mangosteen is more closely related to G. malaccensis than to G. celebica | Yapwattanaphun et al. (2004)
3 | Mangosteen is more closely related to G. penangiana than to G. celebica | Nazre et al. (2007)
4 | The closest relatives of mangosteen are G. opaca and G. malaccensis (misidentified G. penangiana) | Abdullah et al. (2012)
5 | The possible ancestors of mangosteen are G. malaccensis and G. celebica | Sulassih et al. (2013)
6 | a. Mangosteen is a hybrid of different varieties of G. malaccensis; b. Mangosteen may be a product of multiple, superior selections from different populations of female trees of G. malaccensis | Nazre (2014)

7.3.2.1 G. celebica L. (Syn. G. hombroniana Pierre)

The scientific name G. hombroniana for seashore mangosteen was first established by Pierre (Nazre 2010). Seashore mangosteen has been recorded in Southeast Asia (between Singapore and Malacca in Peninsular Malaysia), Nicobar Island, South Thailand, and Peninsular Malaysia (Maheshwari 1964; Whitmore 1973; Richards 1990b; John et al. 2008; Nazre 2010). G. celebica is known as ‘Beruas’, ‘Manggis hutan’, and ‘Minjok’ in Malaysia (Osman and Milan 2006; Nazre 2010). In India, it is called ‘Puli mangosteen’, while in Thailand, it is called ‘Waaa’ (Osman and Milan 2006; John et al. 2008). Taxonomically, Pierre grouped G. hombroniana with two Linnaean species, G. cornea and G. celebica, in the same section based on their morphological characters (male flowers and fruits) and geographical distribution (Nazre 2010). However, morphological evidence from the literature and herbarium specimens suggests that G. hombroniana in Thailand, Peninsular Malaysia, and Borneo, and G. cornea and G. celebica in Indonesia, actually refer to the same species (Nazre 2010). The valid taxonomic name of seashore mangosteen is therefore G. celebica, as it was published by Linnaeus (1754) earlier than G. cornea (1772) and G. hombroniana (1882–1885) (Nazre 2010). Garcinia celebica is planted for its timber, which can be used for fencing, and for its fruit, which is used for flavouring in local dishes (Richards 1990b; Jamila et al. 2017). In traditional medicine, its roots and leaves have been used to treat skin infections and to treat women after childbirth (Burkill 1935). Previous phytochemical investigations of G. celebica found that its constituents, such as xanthones, flavonoids, and triterpenes (garchombronanes), exhibit anticholinesterase, lipoprotein antioxidant, and antiplatelet activities (Jamila et al. 2015, 2017). This species can be used as a rootstock for the improvement of slow-growing Garcinia species (Yaacob and Tindall 1995; Hammer 2001; John et al. 2008).

7.3.2.2 G. malaccensis Hook. f.

Garcinia malaccensis is a wild species in Peninsular Malaysia, Sumatera, and Brunei. Locally, it is known as ‘Manggis burung’ (Lim 2012). It is confined to the lowland forest in Peninsular Malaysia. As is the case for G. celebica, G. malaccensis is dioecious and a facultative agamosperm (Richards 1990a). It is considered one of the species most closely related to mangosteen. Morphologically, it seems to have been confused with mangosteen (Richards 1990b; Yapwattanaphun et al. 2004; Abdullah et al. 2012; Nazre 2014). It also produces fruits as delicious as those of mangosteen, with an acidic-sweet taste. However, no cultivated tree can be found today. A study conducted by Jabit et al. (2009) revealed that extracts from its leaves exhibited moderate activity and selectivity towards non-small-cell lung cancer cells, whereas its stem extracts inhibited nitric oxide production. However, its timber has no commercial value as it splits after drying (Lim 2012).

7.3.2.3 G. penangiana Pierre

Garcinia penangiana is a widely distributed species in Peninsular Malaysia (Nazre et al. 2007). Locally, it is known as ‘Kandis Burung’. Morphologically, its bark is dark brown (Nazre 2014). Its leaves are bright reddish when dry. The intramarginal vein is absent on the leaves of G. penangiana, which produce whitish latex. Its fruit shape is ovoid or globose with fewer stamen bundles that are widely spaced and display a nodule-like surface. The fruit is reddish pink and tastes sour. Traditionally, it is used to treat skin diseases and fever (Jabit et al. 2009). The study conducted by Jabit et al. (2009) revealed that its methanol extract has potent and selective cytotoxic activity against breast cancer. Its xanthone compounds, such as penangianaxanthone, cudratricusxanthone, macluraxanthone, and gerontoxanthone, showed strong cytotoxic activity towards tumour cell lines (Jabit et al. 2007).

7.3.2.4 G. opaca King

Morphologically, G. opaca is a small tree or shrub. It is a widely distributed lowland and hill forest species and is endemic to Peninsular Malaysia (Kochummen 1997). G. opaca is facultatively apomictic and can propagate both via apomixis and through sexual reproduction (Abdullah et al. 2012). Fruits produced via apomixis are flask-shaped with a thin wall, while those produced sexually are globose with a thick wall. Its fruits are eaten by local people, and a decoction of the leaves is used to improve blood circulation (Jabit et al. 2009). The bark of G. opaca was found to possess cytotoxic activity (Mori et al. 2014). Compounds extracted from its bark, such as terpenoid and opaciniol, showed moderate cytotoxicity against tumour cell lines (Jabit et al. 2009; Mori et al. 2014). Its xanthone compounds strongly inhibit platelet-activating factor receptor binding (Jantan et al. 2001).

7.3.3 Geographic Distribution of Mangosteen

Mangosteen is naturally distributed as well as cultivated throughout Southeast Asia. It has also been introduced and cultivated in other countries including Australia, Cuba, Dominica, Ecuador, Gabon, Ghana, Guatemala, Honduras, India, Jamaica, Liberia, Myanmar, Philippines, Puerto Rico, Singapore, Sri Lanka, Tanzania, and Vietnam (Cruz 2001; Murthy et al. 2018). Four countries, Malaysia, Indonesia, the Philippines, and Thailand, are considered the major producers of mangosteen (Osman and Milan 2006; Rozhan et al. 2011). About 85% of the total production of these four countries comes from Thailand. The producing countries have a variety of climates, so their production seasons show some distinct differences. Generally, Malaysia, Indonesia, the Philippines, Thailand, and Vietnam have similar production seasons, with fruits available from May to January. In contrast, the production season in Australia runs from November to April.

7.3.4 Mangosteen Export from Malaysia

Previously, the cultivation of mangosteen had not been targeted for commercial purposes. In the Malaysian Third National Agriculture Policy (1998–2010), mangosteen was identified as a flagship Malaysian fruit for export. Owing to its potential, mangosteen has been imported by several countries including China, Singapore, and Thailand. In 2017, 1254 tonnes of mangosteen were exported from Malaysia (Department of Agriculture 2018). Based on the market potential and world demand, the Malaysian Ministry of Agriculture (MOA) implemented a mangosteen planting programme (Rozhan et al. 2011).

7.4 Conservation of Mangosteen Germplasm

Various conservation approaches, including in situ and ex situ programmes, have been implemented to conserve the genetic resources of mangosteen (Murthy et al. 2018). Large-scale collection and conservation of mangosteen from Southeast Asian countries such as Myanmar, Laos, Thailand, Cambodia, Vietnam, Malaysia, Brunei, Singapore, Philippines, Indonesia, and Papua New Guinea have been conducted by the International Plant Genetic Resources Institute (IPGRI) (Coronel 1995). As mangosteen produces recalcitrant seeds, in situ conservation methods are prominent. Mangosteen seeds are high in moisture content, possess no dormancy, exhibit low viability, and are short-lived, without a differentiated embryo, endosperm, or embryonic axis (Normah et al. 1992; Malik et al. 2005). Because of this, seed storage is hampered (Murthy et al. 2018). For ex situ conservation, in vitro conservation is preferred to maintain explants such as shoots, meristems, embryos, or plantlets in a sterile environment. This condition will assist in the production of recalcitrant seeds. Research institutes involved in germplasm collection and research on mangosteen in Southeast Asian countries include the Forest Research Institute of Malaysia (Malaysia), National Biological Institute, Bogor Agricultural University (Indonesia), Horticultural Research Station, Chanthaburi (Thailand), and Institute of Plant Breeding, UPLB College of Agriculture and Food Sciences, University of the Philippines Los Baños (Philippines) (Murthy et al. 2018).

7.5 Why Mangosteen is Underutilised?

Mangosteen has a long juvenile period and usually takes 8–10 years to bear fruit (Horn 1940). The slow growth is caused by a poor root system (no root hairs, poor branching, easily broken, and disturbed by adverse environments, resulting in very small contact surfaces between roots and soil), poor nutrient and water uptake, a low photosynthetic rate, a low cell division rate in the apical meristem, and a long shoot dormancy period (Cox 1976; Wieble et al. 1992; Ramlan et al. 1992; Poerwanto et al. 1995; Poerwanto 2002). To address its slow growth, grafting has been employed (Fairchild 1915; Galang 1955; Ochse et al. 1961; Poerwanto 2002). Besides the slow growth rate, the strict climatic requirements, short viability of seeds, lack of rapid propagation methods, delayed precocity of trees, and limited research manpower and budget in producer countries are other constraints on mangosteen improvement and thus make it underutilised (Osman and Milan 2006; Murthy et al. 2018). Further, because of the long juvenile period, breeding of mangosteen is very slow and potentially has a very low success rate. Nonetheless, a breeding programme is necessary to improve the growth and development of mangosteen by shortening its juvenile period as well as increasing the yield and fruit quality. Information on mangosteen accessions in different countries is still too limited to unravel the genetic background of the species (Osman and Milan 2006). Genetic diversity information on mangosteen will provide clues about hybridisation and the occurrence of important mutations.

7.6 Benefits of Mangosteen

Mangosteen has considerable economic potential in several Southeast Asian countries for the local and export markets. It is designated ‘The Queen of Tropical Fruits’ because its fruits bear thick sepals that collectively resemble a crown, in addition to its popularity for the white and juicy edible pulp with a sweet flavour and pleasant aroma (Jung et al. 2006). Various parts of mangosteen, including the stem, rind, leaves, and fruits, have been used for many purposes. For instance, its rind contains tannins that can be used to tan leather and to dye fabric black (MacMillan 1956; Coronel 1983; Nakasone and Paull 1998). Mangosteen trees also provide timber for furniture making and carpentry (Nakasone and Paull 1998; Yapwattanaphun et al. 2002). Its fruits are also commercialised as a functional food or drink, with the addition of other minor components such as vitamins, which is marketed as a general health booster and even promoted as an anti-diabetic supplement (Udani et al. 2009; Xie et al. 2015).

Mangosteen has also been used in traditional medicine. Different parts of mangosteen, such as fruit hulls, bark, and roots, have been used for hundreds of years in Southeast Asia as traditional medicine. Its rind has been used to treat diarrhoea, dysentery, skin infection, and respiratory disorders (Burkill 1966; Yaacob and Tindall 1995; Ohizumi 1999). Further, its leaves and roots are used to treat wounds and as medicine for menstruation (Burkill 1966). Phytochemical studies on mangosteen found that its constituents, such as xanthones, have antioxidant, anti-tumour, anti-allergic, anti-inflammatory, anti-bacterial, anti-diabetic, and anti-viral activities (Mahabusarakam et al. 1983; Yapwattanaphun et al. 2002; Pedraza-Chaverri et al. 2008; Obolskiy et al. 2009; Aizat et al. 2019; Ansori et al. 2020). As an anti-cancer agent, α-mangostin, the most abundant xanthone in mangosteen pericarp extract, has been applied to various cancer types such as gastric, cervical, colorectal, hepatocellular, and breast cancers (Ying et al. 2017; Mohamed et al. 2017; Muchtraridi et al. 2018). Mangosteen compounds such as garcinone E and mangostanaxanthones III are also described as having anti-diabetic properties (Abdallah et al. 2017; Liang et al. 2018; Aizat et al. 2019). Several mangosteen compounds, such as isogarcinol and γ-mangostin, can be used for liver protection (Liu et al. 2018; Wang et al. 2018).

7.7 Genomics Study of Mangosteen

Determining the entire DNA sequence of an organism is known as genome sequencing. This process involves sequencing the chromosomal DNA, the mitochondrial DNA, and, for plants, also the chloroplast DNA. To date, only a few studies on the mangosteen genome and chromosomal characterisation have been reported, despite mangosteen being one of the important fruits throughout Southeast Asia (Murugan et al. 2014). Studies on mangosteen have mostly addressed tissue culture, seed characterisation, and morphology. A recent study reported transcriptome-wide gene expression changes with transcriptional reprogramming during mangosteen seed germination (Goh et al. 2019). Several studies related to the mangosteen genome and organelle sequences have been reported (Abu Bakar et al. 2016; Midin et al. 2017; Jo et al. 2017; Wee et al. 2022a, b). Data obtained from these works are necessary for genome size studies, chromosome characterisation, and future genome sequencing projects. The genome size and chromosome number of mangosteen are important for studying its genetic variability. This information will also contribute to other research areas, including taxonomy and evolutionary studies. Correct information on the chromosome number and genome size of mangosteen will help ongoing efforts to assemble and annotate the mangosteen genome.

7.7.1 Genome Sequencing of Mangosteen

Several studies on the mangosteen genome have been reported. Abu Bakar et al. (2016) conducted the first genome sequencing of the common variety of mangosteen in Malaysia to study its genome composition and attempted a draft genome assembly using the Illumina HiSeq 2000 platform. They predicted the best k-mer length (41 bp) for assembly using KmerGenie (Chikhi and Medvedev 2014) and SGA Preqc (Simpson 2014). De novo assembly was then conducted using the Minia assembler v2.0.3, followed by scaffolding using SSPACE. The draft assembly was evaluated using the CEGMA pipeline. Table 7.3 shows the sequencing and assembly statistics obtained in the study.

Table 7.3 Statistics of mangosteen sequencing and assembly (Adapted from Abu Bakar et al. 2016 with permission)

Attributes | Values
Raw reads
Total number | 505,856,290
Total bases (bp) | 51,091,485,290
Filtered reads
Total number | 418,812,062
Total bases (bp) | 42,300,018,262
N (%) | 0.0089
GC (%) | 38.14
Q20 (%) | 99.19
Q30 (%) | 95.43
Minia assembly
K-mer | 41
Number of contigs | 281,494
Total contig size (bp) | 272,873,894
N50 (bp) | 1,006
Contig size range (bp) | 83–14,015
SSPACE scaffolding
Number of scaffolds | 284,879
Scaffold size (bp) | 279,483,966
N50 (bp) | 1,022

Genome sequence analysis has also been performed on another popular variety of mangosteen, ‘Mesta’, as reported by Abu Bakar et al. (2017) using Illumina HiSeq 2000 and by Midin et al. (2017) using single-molecule real-time (SMRT) sequencing on a PacBio RS II platform. These data allow comparative analysis of genome composition between the two varieties of mangosteen in Malaysia, which is invaluable for crop improvement given the lack of molecular genetic information on mangosteen. The data can be used in genome assembly and provide sequence information on GC content as well as the genome size estimate of mangosteen reported in Midin et al. (2018) (see also Sect. 7.7.2).

Jo et al. (2017) reported the first complete plastome sequence of mangosteen. The mangosteen plastome is 158,179 bp in length. It contains a large single-copy region of 86,458 bp and a small single-copy region of 17,703 bp, separated by two inverted repeats of 27,009 bp each. More recently, plastomes (156,580 bp) of the Malaysian varieties Mesta and Manggis were reported (Wee et al. 2022a), which suggested a different origin from that of the Thailand variety reported by Jo et al. (2017). Furthermore, the mitogenome of the Mesta variety (371,235 bp) has been described, the first for the Garcinia genus and the Clusiaceae family (Wee et al. 2022b). These complete plastome and mitogenome sequences are useful for phylogenetic and evolutionary studies of Clusiaceae.
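The figures quoted in this section can be cross-checked with simple arithmetic. The following Python sketch uses only the values given in Table 7.3 and in the plastome description of Jo et al. (2017); the variable names are illustrative and are not taken from the cited studies.

```python
# Cross-check of figures quoted in Sect. 7.7.1 (values copied from the text).

# Table 7.3: read counts and base totals before and after filtering
raw_reads, raw_bases = 505_856_290, 51_091_485_290
filtered_reads, filtered_bases = 418_812_062, 42_300_018_262

# Mean read length is total bases divided by number of reads
mean_raw_read_len = raw_bases / raw_reads
mean_filtered_read_len = filtered_bases / filtered_reads

# Fraction of reads that survived quality filtering
retained_fraction = filtered_reads / raw_reads

# Plastome: large single copy + small single copy + two inverted repeats
lsc, ssc, inverted_repeat = 86_458, 17_703, 27_009
plastome_total = lsc + ssc + 2 * inverted_repeat

print(f"Mean raw read length: {mean_raw_read_len:.1f} bp")
print(f"Mean filtered read length: {mean_filtered_read_len:.1f} bp")
print(f"Reads retained after filtering: {retained_fraction:.1%}")
print(f"Plastome length from its four regions: {plastome_total:,} bp")
```

The four plastome regions sum to the reported total of 158,179 bp, and both read sets work out to the same mean read length.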

7.7.2 Genome Size of Mangosteen

Genome size is the total amount of DNA in the nucleus of an organism, measured either in picograms (pg; i.e., 1 × 10⁻¹² g) or megabase pairs (Mbp, with 1 pg = 978 Mbp) (Dolezel et al. 2003; Pellicer and Leitch 2013). Genome size is an important characteristic of eukaryotes that correlates with the chromosome number (Bennett and Leitch 2005). It gives important information for ecological and evolutionary studies, plant breeding, understanding somaclonal variation in tissue culture, and the development of genome sequencing projects (Rival et al. 1997; Srisawat et al. 2005; Kron et al. 2007; Ochatt et al. 2011; Leitch and Leitch 2012; Cardoso et al. 2012). Various methods have been employed to estimate the genome size of plant species, such as Feulgen densitometry, chemical extraction, pulsed-field gel electrophoresis, reassociation kinetics, genome sequencing, and flow cytometry (Bennett and Leitch 2011). Midin et al. (2018) measured the genome size of mangosteen using two approaches, namely flow cytometry and k-mer analysis. Correct information on the genome size of mangosteen will help ongoing efforts to assemble and annotate the mangosteen genome.

Flow cytometry (FCM) is a powerful tool for determining the genome size of an organism. However, optimising nuclei preparation before FCM analysis is challenging for some plant species, especially those containing high amounts of secondary metabolites. Secondary metabolites such as phenolics may cause stoichiometric errors during sample preparation, especially in woody plants such as mangosteen (Loureiro et al. 2006; Mallón et al. 2009; Midin et al. 2018). They decrease the fluorescence and increase the CV, hence reducing the quality of the nuclei suspension and causing errors during FCM analysis (Noirot et al. 2003; Bennett et al. 2008; Obae and West 2010). To minimise this problem, the choice of lysis buffer, plant material, and DNA fluorochrome plays an important role. Three lysis buffers, namely LB01, Tris-MgCl2, and Otto, were used for mangosteen sample preparation by Midin et al. (2018). They found that the most suitable lysis buffer for mangosteen sample preparation was Otto buffer supplemented with reducing agents (mercaptoethanol and PVP-40) and propidium iodide (PI), a DNA intercalator.
The quality of the DNA peak histogram from FCM analysis was evaluated based on the coefficient of variation (CV) and the amount of background debris. Different lysis buffers confer different levels of resistance to the negative effects of phenolic compounds (Loureiro et al. 2006, 2007; Vrána et al. 2014). The addition of reducing agents such as mercaptoethanol and PVP-40 to Otto buffer counteracted the interference of phenolic compounds with DNA staining and thus decreased the CV (Price 2000; Yokoya et al. 2000; Noirot et al. 2003; Loureiro et al. 2006). Using young leaves of mangosteen in FCM analysis can also reduce the CV, as they may contain fewer secondary metabolites (Jedrzejczyk and Sliwinska 2010). In addition, the amount of plant material and the chopping intensity can be decreased to reduce the effect of secondary metabolites (Loureiro et al. 2006; Doležel et al. 2007).

For genome size estimation of mangosteen, Midin et al. (2018) used Glycine max cv. Polanka (soybean) as an external reference standard for two reasons: (1) it has a well-established genome size of 2.5 pg (Doležel and Bartos 2005) and (2) its leaves have a soft structure that is easy to process during sample preparation (Madon et al. 2008; Midin et al. 2013). The selection of a reference standard is important in genome size estimation (Johnston et al. 1999). The genome size of an ideal reference standard should be known and neither too close to nor too distant from that of the target species, to avoid the risk of non-linearity and offset errors (Vindelov et al. 1983; Bagwell et al. 1989; Doležel et al. 1992; Johnston et al. 1999; Bennett et al. 2003). Too close a genome size might cause the DNA peaks of the sample and the standard to overlap. The external standard method was selected following Hendrix and Stewart (2005) to avoid overlapping DNA peaks. Figure 7.2 shows the DNA peak histograms of the reference standard, Glycine max cv. Polanka, and of mangosteen. The 2C peak of Glycine max cv. Polanka, located at channel 180–200, did not overlap with the 2C peak of mangosteen, located at channel 420–440. Midin et al. (2018) used five biological replicates for the genome size determination of mangosteen (Table 7.4).
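As a small worked check, the five replicate estimates listed in Table 7.4 can be summarised with the Python standard library. This is a minimal sketch that assumes the sample standard deviation (n − 1 denominator), which is not stated in the text but reproduces the reported summary of 6.00 ± 0.17 pg; the Mbp figure applies the 1 pg = 978 Mbp conversion quoted at the start of this section.

```python
from statistics import mean, stdev

# 2C genome size estimates (pg) for the five replicates in Table 7.4
replicates_pg = [6.04, 6.09, 6.08, 5.70, 6.10]

# Conversion factor quoted in the text: 1 pg = 978 Mbp
MBP_PER_PG = 978

mean_pg = mean(replicates_pg)
sd_pg = stdev(replicates_pg)  # sample SD (assumed; n - 1 denominator)
mean_mbp = mean_pg * MBP_PER_PG

print(f"2C = {mean_pg:.2f} ± {sd_pg:.2f} pg (about {mean_mbp:.0f} Mbp)")
```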
Table 7.4 Genome size estimation of mangosteen using Glycine max cv. Polanka (2C = 2.5 pg) as external reference standard (Adapted from Midin et al. 2018 with permission)

Replicate      Genome size (pg)
1              6.04
2              6.09
3              6.08
4              5.70
5              6.10
Mean ± SD      6.00 ± 0.17

Fig. 7.3 Depth distribution of 41-mers in whole-genome Illumina reads of a common mangosteen variety (Reproduced from Midin et al. 2018 with permission)

Based on the FCM result, the genome size of mangosteen was found to be 2C = 6.00 pg. However, Matra et al. (2014) previously reported that the genome size of common mangosteen was 2C = 7.42 pg. The difference may be due to the different DNA fluorochromes used in the FCM analyses: Matra et al. (2014) used DAPI, whereas Midin et al. (2018) used PI. DAPI binds preferentially to AT-rich regions of DNA (Doležel et al. 1992), so DAPI-based estimates depend on base composition, whereas PI intercalates without base preference. To resolve this discrepancy, Midin et al. (2018) used an in silico k-mer analysis to confirm the genome size of mangosteen. Figure 7.3 presents the genome size estimation for mangosteen by k-mer analysis with a k-mer size of 41. The total number of k-mers counted by Jellyfish version 1.1.11 (Marçais and Kingsford 2011) was 29,604,414,280, and the peak of the k-mer frequency distribution was at a depth of 5. Dividing the total number of k-mers by the peak depth gave an estimated genome size of 5.92 Gbp. The genome size was then converted to picograms according to the relationship 1 pg DNA = 9.78 × 10⁸ bp (Doležel and Bartoš 2005), which gave a genome size of approximately 6.05 pg for mangosteen. Together, Midin et al. (2018) concluded that the genome size of mangosteen is between 6.00 and 6.05 pg, based on FCM and k-mer analysis. An accurate genome size is necessary for planning genome sequencing projects (Bennett et al. 2000), because the project scale and cost depend on it (Doležel and Greilhuber 2010; Cardoso et al. 2012).
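The k-mer calculation described above is simple enough to check by hand, and the short Python sketch below reproduces it. It uses only the figures quoted in the text (total k-mer count, peak depth and the pg-to-bp conversion factor); the function names are hypothetical, and the sketch does not attempt to reproduce the Jellyfish counting step itself.

```python
# Conversion factor quoted in the text: 1 pg DNA = 9.78 x 10^8 bp.
BP_PER_PG = 9.78e8


def kmer_genome_size_bp(total_kmers, peak_depth):
    """Estimate genome size (bp) as total k-mer count / peak k-mer depth."""
    if peak_depth <= 0:
        raise ValueError("peak depth must be positive")
    return total_kmers / peak_depth


def bp_to_pg(size_bp):
    """Convert a genome size in base pairs to picograms of DNA."""
    return size_bp / BP_PER_PG


# Values reported for mangosteen with k = 41 (Midin et al. 2018).
size_bp = kmer_genome_size_bp(total_kmers=29_604_414_280, peak_depth=5)
size_pg = bp_to_pg(size_bp)

print(f"{size_bp / 1e9:.2f} Gbp = {size_pg:.2f} pg")  # 5.92 Gbp = 6.05 pg
```

The result, 5.92 Gbp or about 6.05 pg, agrees with the values given in the text and sits close to the FCM mean of 6.00 pg.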

7.7.3 Cytogenetics of Mangosteen

Information on chromosome number is also essential for a genome project. Chromosome counting is always difficult for species with many chromosomes (Mallón et al. 2009), and the chromosomes of Garcinia species have been hard to count for this reason (Robson and Adams 1968). Chromosome numbers have been reported for several Garcinia species, including G. celebica L. [2n = 48 (Tixier 1960)], G. hanburyi Hook. f. [2n = 44 (Tixier 1953)], and G. indica Choisy [2n = 48 (Thombre 1964); 2n = 54 (Anerao et al. 2013)]. The chromosome number of mangosteen has also been reported by previous researchers (Table 7.5). In the most recent study, Midin et al. (2018) found that mangosteen has 2n = 74–110 chromosomes by evaluating more than twenty metaphase chromosome spreads (e.g., Fig. 7.4), which agrees with the findings of previous researchers (Table 7.5).

Fig. 7.4 A mitotic metaphase spread of mangosteen root

Table 7.5 Previous findings of mangosteen chromosome number

Chromosome number (2n)    Reference
76                        Krishnawary and Raman (1949)
88–90                     Richards (1990b)
90                        Sarasmiryati (2008)
96                        Tixier (1960)
110–120                   Ha (1978)

These results therefore suggest the occurrence of numerical chromosome variation in the mangosteen genome. This variation could contribute to phenotypic differences among mangosteen varieties, such as the difference in fruit morphology between the common and mesta varieties. It might be due to the presence of a variable number of B chromosomes (Bs) (Sarasmiryati 2008). B chromosomes, also known as supernumerary chromosomes, are defined as additional dispensable components of the genome that exhibit a characteristic non-Mendelian and irregular pattern of inheritance (Datta et al. 2016). They are classically understood as a sea of repetitive DNA sequences that are poor in genes and maintained by a parasitic drive mechanism during cell division (Valente et al. 2017). These chromosomes contribute to numerical chromosome variation (Roberto 2005), and their variable number may have made the chromosome number of mangosteen difficult to determine.

Besides the presence of Bs, other factors, including genome mutation, polyploidy and aneuploidy, also cause numerical chromosome variation. Natural mutation in mangosteen has been reported previously by Ray (2002) and Sobir et al. (2011). Such mutation can impair chromosome segregation, which could in turn cause aneuploidy (Zuzana 2012). According to Huettel et al. (2008) and Birchler (2013), aneuploidy refers to unbalanced changes in chromosome number from the basic chromosomal complement that characterises each species. These changes are determined in relation to the somatic chromosome number of the species (Dar et al. 2017). Recent findings revealed the occurrence of aneuploidy in polyploid species (Ganem et al. 2007; Chester et al. 2012; Zhang et al. 2013; De Storme and Mason 2014; Wu et al. 2018). Polyploidy may induce aneuploidy by increasing the chromosome number and the complexity of chromosome pairing and segregation during meiosis and mitosis (Comai 2005). This may explain the numerical chromosome variation in mangosteen, which was itself identified as a polyploid species by Richards (1990b) and Matra et al. (2016).

Matra et al. (2016) presented evidence of tetraploidy in mangosteen. Using microsatellite analysis, they found that mangosteen has more than two alleles per locus; the maximum of four alleles per locus found in mangosteen from five populations indicated tetraploidy. Taken together, the findings of Matra et al. (2016) and Midin et al. (2018) indicate that mangosteen is an apomictic species with a polyploid genome. Previous studies have linked polyploidy with apomixis (Galdeano et al. 2016), and plant species with tetraploid genomes are often apomicts (Quarin et al. 2001; Bicknell and Koltunow 2004). However, not all polyploids are apomicts. The formation of apomictic species requires a polyploid genome because diploid or aneuploid gametes are necessary for the transmission of genes that cause apomixis (Comai 2005), which possibly explains the occurrence of apomixis in mangosteen.
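The allele-counting argument used to infer tetraploidy can be illustrated with a small Python sketch. It rests on one stated assumption: at a single-copy microsatellite locus, one individual cannot carry more distinct alleles than it has chromosome sets, so the largest allele count observed in any individual at any locus gives a lower bound on ploidy. The function name and the genotype calls below are hypothetical and serve only as an illustration; they are not data from Matra et al. (2016).

```python
def min_ploidy_from_alleles(genotypes):
    """Return a lower bound on ploidy from microsatellite genotype calls.

    `genotypes` maps each locus to a list of per-individual allele
    collections. The bound is the largest number of distinct alleles
    seen in a single individual at a single locus.
    """
    return max(
        len(set(alleles))
        for calls in genotypes.values()
        for alleles in calls
    )


# Hypothetical allele sizes (bp) for two loci and two individuals each.
genotypes = {
    "locus_A": [{150, 154}, {150, 154, 158}],
    "locus_B": [{201, 205, 209, 213}, {201, 205}],
}

ploidy_floor = min_ploidy_from_alleles(genotypes)
print(f"at least {ploidy_floor} chromosome sets")
```

Because this is only a lower bound, a maximum of four alleles is consistent with tetraploidy but does not by itself exclude a higher ploidy level.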

7.8 Conclusion

Crop improvement of mangosteen requires special approaches because of its long juvenile period and slow growth rate. To date, a rapid propagation method for mangosteen is still lacking, a problem compounded by its recalcitrant apomictic seeds. The breeding programme of mangosteen is time-consuming and laborious, with a very low success rate, and the long juvenile phase makes progeny analysis difficult. Another strategy to improve this crop is to identify genetic variation in mangosteen through molecular analysis. Mutation breeding has also been conducted to increase genetic variation, by applying gamma-ray radiation to mangosteen seeds. As genetic variation increases, superior trees with desirable traits can be selected. To date, genome-scale studies of mangosteen are limited: a few recent reports describe genome sequencing attempts, but a draft genome assembly is still lacking. Nevertheless, these sequence data provide essential information on mangosteen genome size and complexity. An ascertained chromosome number and genome size, together with cytogenetic information, will help ongoing efforts to assemble and annotate the mangosteen genome. A reference mangosteen genome is important to provide a blueprint of molecular genetic information for crop improvement, through studies of genetic diversity and genomics-assisted selection.

Acknowledgements The authors would like to thank all members of their research groups, colleagues, and collaborators for useful discussions. This chapter is specially dedicated to Professor Emeritus Dr. Normah Mohd Noor, the founding director of the Institute of Systems Biology, Universiti Kebangsaan Malaysia, who has been instrumental in the tissue culture studies of mangosteen. The research on mangosteen from our group was funded by the UKM Research University Grants (DIP-2020-005 and AP-2012-018).

References

Abdallah HM, El-Bassossy HM, Mohamed GA et al (2017) Mangostanaxanthones III and IV: advanced glycation end-product inhibitors from the pericarp of Garcinia mangostana. J Nat Med 71(1):216–226
Abdullah NAP, Richards AJ, Wolff K (2012) Molecular evidence in identifying parents of Garcinia mangostana L. Pertanika J Trop Agric Sci 35(2):257–270
Abu Bakar S, Sampathrajan S, Loke K et al (2016) DNAseq analysis of Garcinia mangostana. Genomics Data 7:62–63


Abu Bakar S, Kumar S, Loke K-K, Goh H-H, Normah MN (2017) DNA shotgun sequencing analysis of Garcinia mangostana L. variety Mesta. Genomics Data 12:118–119
Aizat WM, Jamil IN, Ahmad-Hashim FH et al (2019) Recent updates on metabolite composition and medicinal benefits of mangosteen plant. PeerJ 7:e6324
Ampofo SA, Waterman GP (1986) Xanthones from three Garcinia species. Phytochemistry 25(10):2351–2355
Anerao J, Desai N, Deodha M (2013) A comparative study of karyomorphology among three populations of Garcinia indica (Clusiaceae) (Thomas-Dupetite) Choisy. Pak J Biol Sci 16(11):530–535
Ansori ANM, Fadholly A, Hayaza S et al (2020) A review on medicinal properties of Mangosteen (Garcinia mangostana L.). Res J Pharm Tech 13(2):974–982
Bagwell CB, Baker D, Whetstone S et al (1989) A simple and rapid method for determining the linearity of a flow cytometer amplification system. Cytometry 10:689–694
Bennett MD, Bhandol P, Leitch IJ (2000) Nuclear DNA amounts in Angiosperms and their modern uses: 807 new estimates. Ann Bot 86(4):859–909
Bennett MD, Leitch IJ (2005) Plant genome size research: a field in focus. Ann Bot 95:1–6
Bennett MD, Leitch IJ (2011) Nuclear DNA amounts in angiosperms: targets, trends and tomorrow. Ann Bot 107(3):467–590
Bennett MD, Leitch IJ, Price HJ (2003) Comparisons with Caenorhabditis (~100 Mb) and Drosophila (~175 Mb) using flow cytometry show genome size in Arabidopsis to be ~157 Mb and thus ~25% larger than the Arabidopsis genome initiative estimate of ~125 Mb. Ann Bot 91:547–557
Bennett MD, Price HJ, Johnston JS (2008) Anthocyanin inhibits propidium iodide DNA fluorescence in Euphorbia pulcherrima: implication for genome size variation and flow cytometry. Ann Bot 101:777–790
Bicknell RA, Koltunow AM (2004) Understanding apomixis: recent advances and remaining conundrums. Plant Cell 16:228–245
Birchler JA (2013) Aneuploidy in plants and flies: the origin of studies of genomic imbalance. Semin Cell Dev Biol 24(4):315–319
Brukhin V (2017) Molecular and genetic regulation of Apomixis. Russ J Genet 53(9):943–964
Burkill IH (1935) Dictionary of economic products of the Malay Peninsula 1. Governments of the Straits Settlements and Federated Malay States, London
Burkill IH (1966) A dictionary of the economic products of the Malay Peninsula. Ministry of Agriculture and Cooperative, Kuala Lumpur
Cardoso DC, Carvalho CR, Cristiano MP et al (2012) Estimation of nuclear genome size of the genus Mycetophylax Emery, 1913: evidence of no whole-genome duplication in Neoattini. Comptes Rendus Biologies 335(10–11):619–624
Chester M, Gallagher JP, Symonds VV et al (2012) Extensive chromosomal variation in a recently formed natural allopolyploid species, Tragopogon miscellus (Asteraceae). PNAS 109(4):1176–1181
Comai L (2005) The advantages and disadvantages of being polyploid. Nat Rev Genet 6:836–846
Corner EJH (1952) Wayside trees of Malaya, 2nd ed, vol 1, 318 pp. Govt. Printing Office
Coronel RE (1983) Mangosteen. In: Promising fruits of the Philippines. College of Agriculture, Los Banos: UPLB, pp 307–322
Coronel RE (1995) Status report on fruit species germplasm conservation and utilization in Southeast Asia. In: Arora RK (eds) Expert consultation on tropical fruit species of Asia. International Plant Genetic Resources Institute, Regional Office, New Delhi, pp 85–100
Cox JEK (1976) Garcinia mangostana: Mangosteen. In: Garner RJ, Ahmed Chaudhari S (eds) The propagation of tropical fruit trees. Horticultural review No 4. Commonwealth Bureau of Horticulture and Plantation Crops, East Malling, pp 361–375
Cruz FSDJ (2001) Status report on genetic resources of Mangosteen (Garcinia mangostana L.) in Southeast Asia. IPGRI Office for South Asia, Delhi
De Storme N, Mason A (2014) Plant speciation through chromosome instability and ploidy change: cellular mechanisms, molecular factors and evolutionary relevance. Current Plant Biol 1:10–33
Department of Agriculture (DOA) (2018) Perangkaan Pertanian 2018. Jabatan Pertanian Malaysia
Doležel J, Bartoš J (2005) Plant DNA flow cytometry and estimation of nuclear genome size. Ann Bot 95(1):99–110
Doležel J, Bartoš J, Voglmayr H (2003) Nuclear DNA content and genome size of trout and human. Cytometry A 51:127–128
Doležel J, Greilhuber J (2010) Nuclear genome size: are we getting closer? Cytometry A 77A(7):635–642
Doležel J, Greilhuber J, Suda J (2007) Estimation of nuclear DNA content in plants using flow cytometry. Nat Protoc 2(9):2233–2244
Doležel J, Sgorbati S, Lucretti S (1992) Comparison of three DNA fluorochromes for flow cytometric estimation of nuclear DNA content in plant. Physiol Plant 85:625–631
Fairchild DG (1915) The Mangosteen. J Hered 6:338–347
Galang FG (1955) Fruit and nut growing in the Philippines. AlA Printing Press, Malabon
Galdeano F, Urbani MH, Sartor ME et al (2016) Relative DNA content in diploid, polyploid, and multiploid species of Paspalum (Poaceae) with relation to reproductive mode and taxonomy. J Plant Res 129(4):697–710
Ganem NJ, Storchova Z, Pellman D (2007) Tetraploidy, aneuploidy and cancer. Curr Opin Genet Dev 17(2):157–162
Goh H-H, Abu Bakar S, Kamal Azlan ND, Zainal Z, Normah MN (2019) Transcriptional reprogramming during Garcinia-type recalcitrant seed germination of Garcinia mangostana. Sci Hortic 257:108727
Grant V (1971) Plant speciation. Columbia University Press, New York

Gustafsson A (1946) Apomixis in higher plants. (3 parts). Lund Universitet Arsskrift. N F 42–43:1–370
Ha CO (1978) Embryological and cytological aspects of the reproductive biology of some understorey rainforest trees. Dissertation, University of Malaya
Hammer K (2001) Guttiferae (Clusiaceae). In: Hanelt P (ed) Mansfeld’s Encyclopedia of agricultural and horticultural crops, vol 3. Institute of Plant Genetics and Crop Plant Research, Berlin, Springer, pp 1345–1360
Hemshekhar M, Sunitha K, Santhosh MS et al (2011) An overview on genus Garcinia: phytochemical and therapeutical aspects. Phytochem Rev 10:325–351
Hendrix B, Stewart JM (2005) Estimation of the nuclear DNA content of Gossypium species. Ann Bot 95:789–797
Horn CL (1940) Stimulation of growth in juvenile mangosteen plants. J Agric Res 61:397–400
Huettel B, Kreil DP, Matzke M et al (2008) Effects of aneuploidy on genome structure, expression, and interphase organization in Arabidopsis thaliana. PLoS Genet 4(10):1–13
Idris S, Rukayah A (1987) Description of the male mangosteen (Garcinia mangostana L.) discovered in Peninsular Malaysia. MARDI Res Bulletin 15(1):63–66
Jabit ML, Khalid R, Abas F et al (2007) Cytotoxic xanthones from Garcinia penangiana Pierre. Z Naturforsch 62:786–792
Jabit ML, Wahyuni FS, Khalid R et al (2009) Cytotoxic and nitric oxide inhibitory activities of methanol extracts of Garcinia species. Pharm Biol 47(11):1019–1026
Jamila N, Khairuddean M, Yeong KK et al (2015) Cholinesterase inhibitory triterpenoids from the bark of Garcinia hombroniana. J Enzyme Inhib Med Chem 30:133–139
Jamila N, Khan N, Khan AA et al (2017) In vivo carbon tetrachloride-induced hepatoprotective and in vitro cytotoxic activities of Garcinia hombroniana (seashore mangosteen). Afr J Tradit Complement Altern Med 14(2):374–382
Jantan I, Juriyati J, Warif NA (2001) Inhibitory effects of xanthones on platelet activating factor receptor binding in vitro. J Ethnopharmacol 75:287–290
Jedrzejczyk I, Sliwinska E (2010) Leaves and seeds as materials for flow cytometric estimation of the genome size of 11 Rosaceae woody species containing DNA-staining inhibitors. J Bot 2010:1–9
Jo S, Kim H-W, Kim Y-K et al (2017) The complete plastome of tropical fruit Garcinia mangostana (Clusiaceae). Mitochondrial DNA Part B 2(2):722–724
John KJ, Kumar RS, Suresh CP (2008) Occurrence, distribution and economic potential of seashore mangosteen (Garcinia hombroniana Pierre) in India. Genetic Res Crop Evol 55:183–186
Johnston JS, Bennett MD, Rayburn AL (1999) Reference standards for determination of DNA content of plant nuclei. Am J Bot 86(5):609–613

Jung H, Su B, Keller WJ et al (2006) Antioxidant Xanthones from the Pericarp of Garcinia mangostana (Mangosteen). J Agric Food Chem 54:2077–2082
Kochummen KM (1997) Tree flora of Pasoh. Forest Research Institute Malaysia, Kepong
Krishnawary N, Raman VS (1949) A note on the chromosome numbers of some economic plants of India. Curr Sci 18(10):376–378
Kron P, Suda J, Husband BC (2007) Applications of flow cytometry to evolutionary and population biology. Annu Rev Ecol Evol Syst 38:847–876
Leitch AR, Leitch IJ (2012) Ecological and genetic factors linked to contrasting genome dynamics in seed plants. New Phytol 194(3):629–646
Liang Y, Luo D, Gao X et al (2018) Inhibitory effects of garcinone E on fatty acid synthase. RSC Adv 8(15):8112–8117
Lim AL (1984) The embryology of Garcinia mangostana L. (Clusiaceae). Gardens’ Bulletin Singapore 37:93–103
Lim TK (2012) Garcinia malaccensis. Edible medicinal and non-medicinal plants, pp 80–82
Liu Z, Li G, Long C et al (2018) The antioxidant activity and genotoxicity of isogarcinol. Food Chem 253:5–12
Loureiro J, Rodriguez E, Doležel J et al (2006) Flow cytometric and microscopic analysis of the effect of tannic acid on plant nuclei and estimation of DNA content. Ann Bot 98(3):515–527
Loureiro J, Rodriguez E, Doležel J et al (2007) Two new nuclear isolation buffers for plant DNA flow cytometry: a test with 37 Species. Ann Bot 100:875–888
Marçais G, Kingsford C (2011) A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27(6):764–770
MacMillan HF (1956) Tropical planting and gardening with special reference to Ceylon, 5th edn. MacMillan and Co., London
Madon M, Phoon LQ, Clyde MM et al (2008) Application of flow cytometry for estimation of nuclear DNA content in Elaeis. J Oil Palm Res 20:447–452
Mahabusarakam W, Phongpaichit S, Wiriyachitra P (1983) Screening of anti-fungal activity of chemicals from Garcinia mangostana. Songklanakarin J Sci Technol 5:341–342
Maheshwari JK (1964) Taxonomic studies on Indian Guttiferae III. the Genus Garcinia Linn. S. I. Bulletin Botanical Surv India 6:107–135
Malik SK, Chaudhary R, Abraham Z (2005) Seed morphology and germination characteristics in three Garcinia species. Seed Sci Technol 33:595–604
Mallón R, Rodríguez-Oubiña J, González ML (2009) In vitro propagation of the endangered plant Centaurea ultreiae: assessment of genetic stability by cytological studies, flow cytometry and RAPD analysis. Plant Cell, Tissue Organ Cult 101(1):31–39
Mansyah E, Anwarudinsyah MJ, Sadwiyanti L et al (1999) Genetics variability of mangosteen base on isozymes analysis and its relationship to phenotypic variability. Zuriat 10:1–10


Mansyah E, Muas I, Jawal MAS (2010) Morphological variability of apomictic mangosteen (Garcinia mangostana L.) in Indonesia: morphological evidence of natural populations from Sumatra and Java. SABRAO J Breed Genetics 42:1–8
Matra DD, Poerwanto R, Santosa E et al (2016) Analysis of allelic diversity and genetic relationships among cultivated mangosteen (Garcinia mangostana L.) in Java, Indonesia using microsatellite markers and morphological characters. Tropical Plant Biol 9:29–41
Matra DD, Poerwanto R, Sobir et al (2014) Determination of nuclear DNA content on mangosteen (Garcinia mangostana L.) by flow cytometry. In: Conference: 29th international horticultural congress 2014, Brisbane
Midin MR, Loke KK, Madon M et al (2017) SMRT sequencing data for Garcinia mangostana L. variety Mesta. Genomics Data 12:134–135
Midin MR, Nordin MS, Madon M et al (2018) Determination of the chromosome number and genome size of Garcinia mangostana L. via cytogenetics, flow cytometry and k-mer analyses. Caryologia 71:35–44
Midin MR, Samsul Kamal R, Tarmizi AH et al (2013) Analysis of oil palm clones, their suspension calli and regenerants via flow cytometry (FCM) and rDNA-fluorescence in situ hybridisation (rDNA-FISH). J Oil Palm Res 25(3):357–367
Mohamed GA, Al-Abd AM, El-Halawany AM et al (2017) New xanthones and cytotoxic constituents from Garcinia mangostana fruit hulls against human hepatocellular, breast, and colorectal cancer cell lines. J Ethnopharmacol 198:302–312
Mori R, Nugroho AE, Hirasawa Y et al (2014) Opaciniols A-C, new terpenoids from Garcinia opaca. J Nat Med 68:186–191
Muchtaridi M, Afiranti FS, Puspasari PW et al (2018) Cytotoxicity of Garcinia mangostana L. pericarp extract, fraction, and isolate on HeLa cervical cancer cells. J Pharm Sci Res 10:348–351
Murthy HN, Dandin VS, Dalawai D et al (2018) Breeding of Garcinia spp. In: Al-Khayri JM et al (eds) Advances in plant breeding strategies: fruits
Murugan M, Madon M, Goh H-H et al (2014) Cytogenetic characterization and bioinformatics analysis of mangosteen (Garcinia mangostana L.) genome. In: Abstracts of the plant genomics congress Asia, Shangri La Hotel, Kuala Lumpur, 24–25 February 2014
Nabandith V, Suzui M, Morioka T et al (2004) Inhibitory effects of crude α-mangostin, a xanthone derivative, on two different categories of colon preneoplastic lesions induced by 1,2-dimethylhydrazine in the rat. Asian Pac J Cancer Prev 5:433–438
Nakasone HY, Paull RE (1998) Mangosteen. In: Nakasone HY, Paull RE (eds) Tropical fruits, pp 359–369
Naumova TN (1992) Apomixis in angiosperms: nucellar and integumentary embryony. CRC Press, Boca Raton
Naumova TN, Van der Laak J, Osadtchiy J et al (2001) Reproductive development in apomictic populations of Arabis holboellii (Brassicaceae). Sex Plant Reprod 14:195–200
Nazre M (2010) Historical review and notes on the correct scientific name for seashore mangosteen. Genetic Res Crop Evolution 57:1249–1259
Nazre M (2014) New evidence on the origin of mangosteen (Garcinia mangostana L.) based on morphology and ITS sequence. Genetic Resources Crop and Evolution 61:1147–1158
Nazre M, Latiff A, Clyde MM (2007) Phylogeny relationship of locally cultivated Garcinia species with some wild relatives. Malaysian Appl Biol J 36:31–40
Noirot M, Barre P, Duperray C et al (2003) Effects of caffeine and chlorogenic acid on propidium iodide accessibility to DNA: consequences on genome size evaluation in coffee tree. Ann Bot 92(2):259–264
Normah MN, Rosnah H, Nor-Azza AB (1992) Multiple shoots and callus formation from seeds of mangosteen (Garcinia mangostana L.) cultured in vitro. Acta Hortic 292:87–92
Obae SG, West TP (2010) Nuclear DNA content of Hydrastis canadensis L. and genome size stability of in vitro regenerated plantlets. Plant Cell Tissue Organ Cult 102:259–263
Obolskiy D, Pischel I, Siriwatanametanon N et al (2009) Garcinia mangostana L.: a phytochemical and pharmacological review. Phytother Res 23:1047–1065
Ochatt SJ, Patat-Ochatt EM, Moessner A (2011) Ploidy level determination within the context of in vitro breeding. Plant Cell Tissue Organ Cult 104:329–341
Ochse JJ, Soule MJ, Dijkman MJ (1961) Tropical and subtropical agriculture. MacMillan Co, New York
Ohizumi Y (1999) Search for antagonists of luztannin and serotonin from the Thai medicinal plant Garcinia mangostana and their pharmacological studies. Bioenvironment 2:215
Osman M, Milan AR (2006) Mangosteen: Garcinia mangostana L. University of Southampton, Southampton, UK, Southampton Centre for Underutilised Crops
Pedraza-Chaverri J, Cárdenas-Rodríguez N, Orozco-Ibarra M et al (2008) Medicinal properties of mangosteen (Garcinia mangostana). Food Chem Toxicol 46(10):3227–3239
Pellicer J, Leitch IJ (2013) The application of flow cytometry for estimating genome size and ploidy level in plants. In: Molecular plant taxonomy: methods and protocols. Methods in molecular biology, vol 1115. Springer Science+Business Media, New York
Poerwanto R (2002) Nurse stock plant - a new technique to enhance mangosteen (Garcinia mangostana) growth. Acta Hort 575:751–756
Poerwanto R, Hidayat R, Diana E et al (1995) An attempt to enhance the growth of mangosteen rootstock. Pros. Simp. Hort. Nas., 105–112
Price H (2000) Sunflower (Helianthus annuus) leaves contain compounds that reduce nuclear propidium iodide fluorescence. Ann Bot 86(5):929–934

Quarin CL, Espinoza F, Martinez EJ et al (2001) A rise of ploidy level induces the expression of apomixis in Paspalum notatum. Sex Plant Reprod 13:243–249
Ramage CM, Sando L, Peace CP et al (2004) Genetic diversity revealed in the apomictic fruit species Garcinia mangostana L. (mangosteen). Euphytica 136:1–10
Ramlan MF, Mahmud TMM, Hasan BM et al (1992) Studies on photosynthesis on young mangosteen plants grown under several growth conditions. Acta Hort 321:482–489
Ray PK (2002) Mangosteen. In: Breeding tropical and subtropical fruits. Narosa Publishing House, New Delhi, pp 304
Raziah ML, Idris S, Milan AR et al (2007) On farm diversity of Malaysia fruit species and their determining factor. Econ Technol Manage Rev 2:23–43
Richards AJ (1990a) Studies in Garcinia, dioecious tropical fruit trees: agamospermy. Bot J Linn Soc 103:233–250
Richards AJ (1990b) Studies in Garcinia, dioecious tropical fruit trees: the origin of the mangosteen (G. mangostana L.). Bot J Linn Soc 103:301–308
Richards AJ (1997) Why is gametophytic apomixis almost restricted to polyploids? The gametophyte-expressed model. Apomixis News 9:3–4
Ridley NH (1967) The flora of the Malay Peninsula. Ashford: L. Reeve & Co
Rival A, Beule T, Barre P et al (1997) Comparative flow cytometric estimation of nuclear DNA content in oil palm (Elaeis guineensis Jacq) tissue cultures and seed derived plants. Plant Cell Rep 16:884–887
Roberto C (2005) Low chromosome number angiosperms. Caryologia 58(4):403–409
Robson NKB, Adams P (1968) Chromosome numbers in Hypericum and related genera. Brittonia 20:95
Rozhan AD, Noorlidawati AH, Jamaluddin K et al (2011) Challenges and prospect of mangosteen industry in Malaysia. Econ Technol Manage Rev 6:19–31
Rukachaisirikul VP, Pailee A, Hiranrat P et al (2003) Anti-HIV-1n protostane triterpenes and digeranylbenzophenone from trunk, bark and stems of Garcinia speciosa. Planta Med 69(12):1141–1146
Sarasmiryati A (2008) Analisis sitogenetika tanaman manggis (Garcinia mangostana L.) Jogorogo. Dissertation, Master Degree, Universitas Sebelas Maret
Šarhanová P, Timothy FS, Sochor M et al (2017) Hybridisation drives evolution of apomicts in Rubus subgenus Rubus: evidence from microsatellite markers. Ann Bot 120(2):317–328
Sobir RP, Poerwanto R, Santosa E et al (2011) Genetic variability in apomictic mangosteen (Garcinia mangostana) and its close relatives (Garcinia spp.) based on ISSR markers. Biodiversitas 12(2):59–63
Sprecher A (1919) Etude sur la semence et la germination de Garcinia mangostana L. Revue Générale De Botanique 31(513–531):609–633
Srisawat T, Pattanapanyasat K, Srikul S et al (2005) Flow cytometric analysis of oil palm: a preliminary analysis for cultivars and genomic DNA alteration. Songklanakarin J Sci Technol 27:645–652
Sulassih, Sobir RP, Santosa E (2013) Phylogenetic analysis of mangosteen (Garcinia mangostana L.) and its relatives based on morphological and inter simple sequence repeat (ISSR) markers. SABRAO J Breed Genetics 45(3):478–490
Thombre MV (1964) Studies in Garcinia indica Choisy. Sci Cult 30(453):454
Tixier P (1953) Donnees cytologiques sur quelques Guttiferales du Viet-Nam. Revue Cytologigue Et De Biologique Vegetale 14:1–12
Tixier P (1960) Donnees cytologiques sur quelques Guttiferales recoltees au Laos. Revue Cytologigue Et De Biologique Vegetale 22:65–70
Udani JK, Singh BB, Barrett ML et al (2009) Evaluation of mangosteen juice blend on biomarkers of inflammation in obese subjects: a pilot, dose finding study. Nutr J 8(1):1–7
Valente GT, Nakajima RT, Fantinatti BEA et al (2017) B chromosomes: from cytogenetics to systems biology. Chromosoma 126(1):73–81
Verheij EWM (1991) Garcinia mangostana L. In: Verheij EWM (ed) Plant resources of South East Asia, edible fruit and nuts. Bogor a Selection. PUDOC, Wageningen
Vindelov L, Christensen I, Nissen N (1983) Standardization of high resolution flow cytometric DNA analysis by the simultaneous use of chicken and trout red blood cells as internal reference standards. Cytometry 3:328–331
Vrána J, Cápal P, Bednářová M et al (2014) Flow cytometry in plant research: a success story. In: Nick P, Opatrny Z (eds) Applied plant cell biology, plant cell monograph 22. Springer, Berlin, pp 395–429
Wang W, Liao Y, Huang X et al (2018) A novel xanthone dimer derivative with antibacterial activity isolated from the bark of Garcinia mangostana. Nat Prod Res 32(15):1769–1774
Wee CC, Nor Muhammad NA, Subbiah VK et al (2022a) Plastomes of Garcinia mangostana L. and comparative analysis with other Garcinia species. bioRxiv 2022.02.22.481552. https://doi.org/10.1101/2022.02.22.481552
Wee CC, Nor Muhammad NA, Subbiah VK et al (2022b) Mitochondrial genome of Garcinia mangostana L. variety Mesta. bioRxiv 2022.02.23.481586. https://doi.org/10.1101/2022.02.23.481586
Whitmore TC (1973) Tree flora of Malaya: a manual for foresters, vol 2. Longman, Kuala Lumpur
Wieble J, Chacko EK, Downtown WJS (1992) Mangosteen (Garcinia mangostana L.): a potential crop for tropical northern Australia. Acta Hort 321:132–137
Wu Y, Sun Y, Sun S et al (2018) Aneuploidization under segmental allotetraploidy in rice and its phenotypic manifestation. Theor Appl Genet 131:1273–1285
Xie Z, Sintara M, Chang T et al (2015) Daily consumption of a mangosteen-based drink improves in vivo antioxidant and anti-inflammatory biomarkers in healthy adults: a randomized, double-blind, placebo-controlled clinical trial. Food Sci Nutr 3(4):342–348
Yaacob O, Tindall HD (1995) Mangosteen cultivation. FAO plant production and protection, Paper No. 129
Yapwattanaphun C, Subhadrabandhu S, Honsho C et al (2004) Phylogenetic relationship of mangosteen (Garcinia mangostana) and several wild relatives (Garcinia spp.) revealed by ITS sequence data. J Am Soc Hortic Sci 129:368–373
Yapwattanaphun C, Subhadrabandhu S, Sugiura A et al (2002) Utilisation of some Garcinia species in Thailand. Acta Hort 575(2):563–570

Ying Y-M, Yu K-M, Lin T-S et al (2017) Antiproliferative prenylated xanthones from the pericarps of Garcinia mangostana. Chem Nat Compd 53(3):555–556
Yokoya K, Roberts AV, Mottley J et al (2000) Nuclear DNA amounts in roses. Ann Bot 85(4):557–561
Zhang H, Bian Y, Gou X et al (2013) Persistent whole-chromosome aneuploidy is generally associated with nascent allohexaploid wheat. PNAS 110(9):3447–3452
Zuzana S (2012) The causes and consequences of aneuploidy in eukaryotic cells, Aneuploidy in health and disease. In: Storchova Z (ed) ISBN: 978-953-51-0608-1, InTech. https://doi.org/10.5772/457

8 The Passion Fruit Genome

Maria Lucia Carneiro Vieira, Zirlane Portugal Costa, Alessandro Mello Varani, Mariela Analia Sader, Luiz Augusto Cauz-Santos, Helena Augusto Giopatto, Alina Carmen Egoávil del Reátegui, Hélène Bergès, Claudia Barros Monteiro-Vitorello, Marcelo Carnier Dornelas, and Andrea Pedrosa-Harand

Abstract

The genus Passiflora comprises a large group of plants popularly known as passion fruits, much appreciated for their exotic flowers and edible fruits. The genus has long attracted considerable attention due to its economic value, broad geographic distribution and remarkable species diversity across tropical and subtropical regions of the Neotropics. Despite their biological attributes and economic importance, the species have been largely neglected in genomic studies. However, in 2021, a chromosome-scale genome assembly was published for a purple passion fruit cultivar (Passiflora edulis) and a genome sequence resource of the wild species, P. organensis, was assembled using short- and long-read technologies. In contrast to P. edulis (1,327 Mbp), P. organensis has a small genome (259 Mbp). In this chapter we summarize some interesting results that emerged from the analysis of the Passiflora sequences, including satellite DNA and transposable element characterization in the context of cytogenetics and evolution of the genus, organellar genome organization, and the MADS-box gene family, which is known to have important biological roles in Passiflora, especially with regard to reproductive development. Although the genus is understudied, over the last decades work on breeding passion fruit varieties has been conducted in some private and public institutions with a view to releasing cultivars of P. edulis, the main cultivated species worldwide. Therefore, studies related to genetics and breeding are also summarized.

M. L. C. Vieira (✉) · Z. P. Costa · L. A. Cauz-Santos · C. B. Monteiro-Vitorello
Departamento de Genética, Escola Superior de Agricultura ‘Luiz de Queiroz’, Universidade de São Paulo, Piracicaba 13418-900, Brazil

A. M. Varani
Departamento de Tecnologia, Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista, Jaboticabal 14884-900, Brazil

M. A. Sader · A. Pedrosa-Harand
Departamento de Botânica, Universidade Federal de Pernambuco, Recife 50670-901, Brazil

A. C. E. del Reátegui
Instituto Nacional de Innovación Agraria, Sub Dirección de Recursos Genéticos, 2791 Lima, Peru

H. Bergès
Institut National de La Recherche Agronomique (INRAE), Centre National de Ressources Génomiques Végétales (CNRGV), 31326 Castanet-Tolosan, France

L. A. Cauz-Santos
Department of Botany and Biodiversity Research, University of Vienna, 1030 Vienna, Austria

H. A. Giopatto · M. C. Dornelas
Departamento de Biologia Vegetal, Instituto de Biologia, Universidade Estadual de Campinas, Campinas 13083-862, Brazil

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022
M. A. Chapman (ed.), Underutilised Crop Genomes, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-00848-1_8

8.1 Introduction

The genus Passiflora (Passifloraceae, Malpighiales) has long attracted considerable attention due to its economic value, broad geographic distribution and remarkable species diversity, especially in terms of the size and shape of its flowers, which are pollinated by insects, hummingbirds or bats. The genus is widely distributed in tropical and subtropical regions of the Neotropics, including the Amazonian and Andean regions. Approximately 150 species are native to Brazil, which is acknowledged as an important centre of diversity (Bernacci et al. 2020). Population pressure in all these regions is high, raising considerable concern over pollinator decline and Passiflora diversity conservation. Taxonomically speaking, there are four main subgenera: Astrophea (57 species); Decaloba (220); Deidamioides (13); and Passiflora (240) (Ulmer and MacDougal 2004).

Among the American tropical species of Passiflora, some 50 fruit-bearing species are marketed for human consumption. The main economic value lies in the production of passion fruit juice, an essential exotic ingredient in juice blends. Furthermore, some Passiflora species and hybrids are of great ornamental value; interspecific hybridization has been generally successful because many species bloom all year round, produce abundant flowers and are compatible (Abreu et al. 2009; Santos et al. 2012). An Italian Passiflora collection of ornamental value is maintained by the horticulturist M. Vecchia: readers can visit the web page at http://www.passiflora.it and view flower photos, antique prints and drawings. Other species are used in phytotherapeutic remedies (see Deng et al. 2010; Ramaiya et al. 2014); for example, in North America, the native cold-tolerant species, P. incarnata, is used by herbalists. Passiflora seed oil is also well suited for use as a regenerative ingredient in cosmetic products (Krist 2020).

Several passion fruit species are outcrossing plants par excellence. Cross-pollination is conditioned by self-incompatibility (SI), in which the pollen of one plant is unable to fertilize the flowers of the same plant, and different plants may or may not be compatible. There is evidence that SI in Passiflora is controlled by sporophytic and gametophytic loci that act in association (Suassuna et al. 2003). In planning commercial orchards, it should be borne in mind that clones need sufficient SI allele diversity; otherwise, incompatibility among plants could cause losses during commercial production.

In addition, Passiflora species have coevolved with Heliconius butterflies, which feed exclusively on Passiflora vines during the larval stage. Major adaptations of Passiflora plants lend weight to the argument that they coevolved with Heliconius, including the unusual variation of leaf shape within the genus, the occurrence of structures mimicking Heliconius eggs and their wide diversity of defence compounds (cyanogenic glucosides). On the other hand, the butterflies can synthesize cyanogenic glucosides themselves, and their ability to handle these compounds was one of the adaptations that allowed the ancestor of these butterflies to feed on Passiflora plants (reviewed in Castro et al. 2018). All these biological attributes are strong arguments in favour of generating genomic sequences to facilitate functional studies and different kinds of analysis, including the genetic changes associated with passion fruit domestication.

8.1.1 Genetic Studies and Breeding Efforts

Breeding for yield and fruit quality traits in tropical fruit crops is complex due to the polygenic nature of these traits and the genetic correlations among them. Breeding using modern quantitative genetic approaches is still a requirement, especially for underutilised crops such as the passion fruit. Despite their economic importance, these crop species are largely neglected when it comes to conducting genetic studies to characterize target traits of agronomic interest.

Passion fruit cultivation is a relatively recent activity. In Latin America, particularly in Brazil, it has become increasingly important, notably over the last four decades, and Brazil now ranks as the world's main producer. Commercial crops are based almost exclusively on a single species, Passiflora edulis, the sour passion fruit. Despite its economic value and widespread distribution, few cultivars have been released (reviewed by Cerqueira-Silva et al. 2014). A second species, P. alata, the sweet passion fruit, is native to the Brazilian plateau and the eastern Amazon region, but is cultivated as a low-intensity crop only in the South and Southeast of Brazil. It is appreciated for its typical aroma and flavour characteristics and can therefore command up to triple the price of the sour passion fruit at local markets (Fig. 8.1). Both crops provide a good alternative source of employment and income to small farmers.

Over the last decades, work on breeding passion fruit varieties has been conducted in Brazilian public institutions with a view to releasing cultivars (see https://www.embrapa.br/cultivar/maracuja), or alternatively for generating populations for basic research (estimating genetic parameters, e.g. heritability and genetic correlations), and investigating genotype × environment interaction issues and the response to selection for fruit traits and yield. Despite the occurrence of negative correlations, selection indexes have been successfully used to select simultaneously for traits of interest, and an efficient response to selection was achieved for both the sour (Moraes et al. 2005; Silva et al. 2016) and sweet passion fruits (Pereira et al. 2017; Chavarría-Perez et al. 2020). In addition, genetic gains have been achieved by undertaking a long-term breeding programme under recurrent selection (Silva et al. 2017; Rodrigues et al. 2020; Ferreira et al. 2021).

Quantitative trait loci (QTL) mapping for fruit quality traits has been reported in a segregating population of P. alata. The proportion of total variance explained by all QTLs ranged from 42.0% to 64.3%, which is high and consistent with the high heritability values obtained by phenotypic analysis (Pereira et al. 2017). In P. edulis, QTLs associated with plant response to infection by Xanthomonas axonopodis (Xap), the pathogen that causes the main bacterial disease in commercial orchards, have been mapped (Lopes et al. 2006). Interestingly, an analysis of plant gene expression during the P. edulis–Xap interaction implicated the enzymes lipoxygenase 2 and (+)-neomenthol dehydrogenase in host defence (Munhoz et al. 2015).

By contrast, given the climatic requirements and fruit preferences on local and world markets, the purple form of P. edulis is mostly grown in tropical to subtropical areas of East Africa (Kenya, Tanzania and Zimbabwe), Latin America (Colombia, Ecuador, Peru), the USA (for instance, Hawaii and California) and Asia (China, India). It is also grown widely in Oceania (Australia and New Zealand), where the climate is warm enough but mild (reviewed in Castillo et al. 2020). There is considerable potential for increasing production in all these regions, especially in a diverse agricultural region such as the Andes in Latin America (see Ortiz et al. 2012). Experimental stations are used mainly for sample collection, hybridization and evaluation of purple passion fruit germplasm and hybrids. The purple passion fruit has the advantages of a high Brix value, as well as concentrated flavour and sugars, making it easy to move the fruit juice around the world as a concentrate or, to a lesser extent, as fresh fruit (Matheri et al. 2016).
Many private sector juice producers have set up processing plants in productive regions around the world for developing mixed juices and flavoured drinks, and exporting them to developed countries in order to take advantage of current interest in tropical blended juices (see Castillo et al. 2020). This explains the motivation for elucidating the genome structure of the purple form of P. edulis in recently published studies (Ma et al. 2021; Xia et al. 2021).

Fig. 8.1 The sweet passion fruit (Passiflora alata): a plant being visited by its pollinator, a large carpenter bee (Xylocopa sp.). Photo credit: A. R. Benedetti, Escola Superior de Agricultura ‘Luiz de Queiroz’, Universidade de São Paulo, Piracicaba, Brazil

8.2 Sequencing and Assembly of Passiflora Genomes

Nuclear genome sizes have been estimated for some 70 species of Passiflora (Souza et al. 2004; Yotoko et al. 2011; Amorim et al. 2014). The highest variation so far reported was found in the subgenus Passiflora (2n = 18), with values ranging from 259.2 Mbp (1C DNA content = 0.265 pg by flow cytometry) in P. palmeri to 2621 Mbp (1C = 2.68 pg) in P. quadrangularis. The lowest estimates were reported for the subgenus Decaloba (2n = 12), with values ranging from 205.4 Mbp (1C = 0.21 pg) in P. organensis to 968.2 Mbp (1C = 0.99 pg) in P. auriculata.

At the beginning of 2021, a chromosome-scale genome assembly was published for a purple passion fruit cultivar grown in China (Xia et al. 2021). In order to obtain a high-quality genome assembly, different sequencing techniques, platforms and bioinformatic approaches were used. First, Illumina NovaSeq6000 2 × 150 bp paired-end reads (~74 Gbp of data, ~53× coverage) were used for a k-mer-based genome survey, which estimated the genome size at 1,395.76 Mbp, with a GC content of 42%. In a second stage, Oxford Nanopore PromethION long reads were generated, with an N50 of 30 kb (~171.4 Gbp of data, ~122.43× coverage). High-throughput chromosome conformation capture (Hi-C) was then used for further sequencing based on 2 × 150 bp paired-end reads on the Illumina NovaSeq6000 platform (140.5 Gbp, 100× coverage). Genome assembly and pseudomolecule scaffold generation based on Hi-C data were implemented using tools developed by a Chinese service provider (https://github.com/Nextomics) and the LACHESIS tool (Burton et al. 2013). As a result, a ~1327.18 Mbp chromosome-scale genome assembly representing 98.91% of the estimated genome size was assigned to 9 pseudochromosomes, and a total of 23,171 protein-coding genes were predicted. Most of these genes (96.1%) were assigned to chromosomal locations and were unevenly distributed along the chromosomes, with a preference for the ends. Consistent with other plant genomes, part of the assembled sequence is repetitive and predominantly represented by transposable elements (TEs) (Xia et al. 2021; Table 8.1).

Shortly after, the genome of the wild diploid species, P. organensis (2n = 12, subgenus Decaloba; Fig. 8.2), was assembled by adopting short- and long-read technologies, combined with Bionano optical maps to improve assembly contiguity. A total of 92 Gbp of sequencing data (PacBio and Illumina) were generated, representing coverage of ~160× of the P. organensis genome. In contrast to P. edulis, P. organensis has a small genome of 259 Mbp. An important assessment prior to genome assembly is genome profiling, in which k-mer frequencies within sequencing reads are analysed to estimate major characteristics such as the heterozygosity rate. In P. organensis, a heterozygosity rate (according to the GenomeScope 2.0 algorithm and 2 × 250 bp Illumina short reads) of 81% was found, consistent with its reproductive system. Attempts to self-cross P. organensis were unproductive, suggesting that it is self-incompatible and therefore accumulates loci in the heterozygous state. Thanks to the Bionano optical maps and cytogenomic markers (satellite DNAs), it was possible to anchor many P. organensis genomic scaffolds to chromosomes. Repetitive sequences, including TEs, accounted for the majority (58.55%) of the genome space. The combination of transcriptomic alignments and ab initio gene predictions allowed the authors to identify 25,327 genes, amounting to 31% of the P. organensis genome (Costa et al. 2021; Table 8.1).

Table 8.1 Passiflora genome statistics

Type                              P. edulis genome (a)    P. organensis genome (b)
Assembled genome size (bp)        1,327,182,440           259,301,974
Number of scaffolds               9                       360
Longest scaffold                  204.53 Mbp              35.94 Mbp
Shortest scaffold                 112.42 Mbp              4,400 bp
Number of scaffolds >1 Mb         9                       38
Number of scaffolds >10 Mb        9                       7
Scaffold N50 length               140.18 Mbp              8.26 Mbp
Scaffold L50 count                5                       9
Contig N50 length                 3.1 Mbp                 2,458,705 bp
GC content (%)                    38.68                   38.3
Number of protein-coding genes    23,171                  25,327
Repeat content (%)                23.61                   58.55

(a) Xia et al. (2021)
(b) Costa et al. (2021)
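The size, coverage and contiguity figures quoted in this section rest on a few simple calculations, which the following Python sketch makes explicit. It is an illustration only, not the pipeline used by Xia et al. (2021) or Costa et al. (2021). The function names are hypothetical, and the conversion factor of 978 Mbp per picogram of DNA is the commonly used value for converting flow cytometry C-values, assumed here because the chapter does not state which factor was applied.

```python
def pg_to_mbp(c_value_pg, mbp_per_pg=978.0):
    """Convert a 1C DNA content in picograms to megabase pairs.

    The default factor (1 pg = 978 Mbp) is an assumption; it is the
    commonly used conversion for flow cytometry estimates.
    """
    return c_value_pg * mbp_per_pg


def coverage(total_bases_gbp, genome_size_mbp):
    """Sequencing depth: total sequenced bases divided by genome size."""
    return total_bases_gbp * 1000.0 / genome_size_mbp


def n50_l50(lengths):
    """Return (N50, L50) for a list of scaffold or contig lengths.

    N50 is the length of the sequence at which the running total,
    taken from longest to shortest, first reaches half of the
    assembly size; L50 is the number of sequences needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2.0
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


if __name__ == "__main__":
    # 1C values quoted in the text for P. palmeri and P. organensis.
    print(round(pg_to_mbp(0.265), 1), "Mbp")
    print(round(pg_to_mbp(0.21), 1), "Mbp")
    # Illumina survey data for P. edulis: ~74 Gbp over 1,395.76 Mbp.
    print(round(coverage(74, 1395.76)), "x")
    # Toy scaffold lengths (hypothetical), not the real assemblies.
    print(n50_l50([80, 70, 50, 40, 30, 20, 10]))
```

Applied to the C-values in the text, the conversion reproduces the quoted 259.2 Mbp and 205.4 Mbp, and the Illumina survey data give a depth of about 53×. The N50 and L50 rows of Table 8.1 would require the full list of scaffold lengths, which is why only a toy list is used here.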


Fig. 8.2 The wild species Passiflora organensis. Note the extrafloral nectaries on the leaf blades. Photo credit: M. C. Dornelas, Universidade Estadual de Campinas, Brazil

8.2.1 Transposable Element Detection in Passiflora Genomes

TEs are frequently found in eukaryotic genomes, including the Passiflora genomes sequenced to date. TEs can cause substantial, deleterious mutations, but genomes have evolved various mechanisms to suppress their activity (Muñoz-López and García-Pérez 2010; Schrader and Schmitz 2019). Conversely, genome-scale studies have revealed that TEs play a key role in genome function, chromosome evolution, speciation and diversity (Klein and O’Neill 2018).

Long terminal repeat retrotransposons (LTR-RTs) are the predominant order of TEs found in plants and are also responsible for genome expansion in some species (Park et al. 2012; Vicient and Casacuberta 2017). In Passiflora, TEs have been closely scrutinized over the past few years. The first study investigated P. edulis, examining a large-insert genomic BAC (bacterial artificial chromosome) library that was built in France at the National Centre for Plant Genomic Resources (CNRGV: cnrgv.toulouse.inra.fr). It was found that 19.6% of the BAC-end sequences (some 10,000) consisted of repetitive elements, most of which (94.4%) were TEs (Santos et al. 2014). Over a hundred BAC large inserts were subsequently assembled from long sequence reads, providing the first landscape of a P. edulis gene-rich fraction (Munhoz et al. 2018; Fig. 8.3). TEs represented 17.6% of this fraction and were predominantly hosted in intergenic spaces (~70%), although some overlapped genes. LTR was the most frequent order, consisting mainly of elements from the Gypsy superfamily, with a predominance of RLG_peDel (or Tekay lineage, according to Neumann et al. 2019) (Costa et al. 2019).

Fig. 8.3 Collinear microsyntenic regions identified in Passiflora edulis (yellow bars) and in chromosomes of two other Malpighiales, Populus trichocarpa (green bars) and Manihot esculenta (brown bars). The P. edulis gene-rich BAC Pe164D9 contains 28 genes: 27 and 26 orthologous genes are in chromosomes 4 and 17 of P. trichocarpa, and 15 and 25 orthologous genes are in chromosomes 1 and 2 of M. esculenta, respectively. A total of 16 and 12 P. edulis orthologs are duplicated in P. trichocarpa and M. esculenta chromosomes, respectively. P. trichocarpa chromosome 4 is 5× larger than P. edulis BAC Pe164D9 due to the presence of TEs

Using low-coverage sequencing methods and cytogenomics, other studies have suggested that there is a high proportion of repetitive elements in the P. edulis genome (Araya et al. 2017; Pamponét et al. 2019; Sader et al. 2019b, 2021). Some studies on the distribution of TEs revealed by fluorescent in situ hybridization have shown that the Gypsy and Copia elements are dispersed along the chromosomes of P. edulis (Pamponét et al. 2019; Sader et al. 2019b). Consistent with the cytogenetic findings, Xia et al. (2021) confirmed that LTR retrotransposons were the most abundant class of repetitive DNA in the P. edulis genome, with a predominance of Gypsy repeats, followed by Copia.

In P. organensis, despite its small genome size (259 Mbp), the majority (58.5%) of the genome consisted of repetitive elements, with a predominance of LTR-RTs (33.8%). Results suggest that there was a massive expansion of the Tekay evolutionary lineage (Gypsy superfamily), representing 21% of the genome. Due to its prevalence, the Tekay lineage might have contributed to the structure and evolution of Passiflora genomes (Costa et al. 2019, 2021; Sader et al. 2021). Interestingly, in P. organensis LINE elements have also undergone an expansion (5.7% of the genome), despite their low frequency in other plants (Wicker et al. 2007). Conversely, a small fraction (~2.8%) of the P. organensis genome harboured DNA transposons, and unclassified TEs accounted for up to 15.8% (Costa et al. 2021).

Regarding TE activity, insertion time analysis suggested recent activity of LTR-RTs, since the majority, including the Tekay lineage, appeared between 0.5 and 5.2 million years ago (mya). Some copies of non-autonomous TEs that have interacted with host genes, incorporating gene fragments, were also identified. Interestingly, all DNA transposons classified as Helitron appeared to incorporate gene fragments in their sequences, mostly in domains related to resistance genes (Costa et al. 2021). Considering the data as a whole, we may speculate that TEs have impacted the structure of Passiflora genomes, and further in-depth analysis of TE content could provide new insights into Passiflora evolution.
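Insertion times of LTR retrotransposons are usually estimated from the divergence between the two LTRs of the same element, which are identical at the moment of insertion and accumulate substitutions independently afterwards. The Python sketch below illustrates that general approach under stated assumptions; it is not the procedure reported by Costa et al. (2021). The Kimura two-parameter distance and the substitution rate of 1.3e-8 substitutions per site per year (a value often borrowed from rice) are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(ltr_5prime, ltr_3prime):
    """Kimura two-parameter distance between two aligned LTR sequences.

    Both sequences must already be aligned to the same length; columns
    containing gaps or ambiguous bases are skipped.
    """
    if len(ltr_5prime) != len(ltr_3prime):
        raise ValueError("sequences must be aligned to equal length")
    sites = transitions = transversions = 0
    for a, b in zip(ltr_5prime.upper(), ltr_3prime.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites in the alignment")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def insertion_time_mya(distance, rate_per_site_per_year=1.3e-8):
    """Insertion time T = K / (2r), in million years.

    The default rate is an assumption for illustration only.
    """
    return distance / (2.0 * rate_per_site_per_year) / 1e6


if __name__ == "__main__":
    # Hypothetical pair of 100-bp LTRs differing by a single transition.
    ltr_a = "A" * 100
    ltr_b = "G" + "A" * 99
    k = k2p_distance(ltr_a, ltr_b)
    print(f"K = {k:.5f}, T = {insertion_time_mya(k):.2f} mya")
```

The factor of two in the denominator reflects the fact that both LTRs accumulate substitutions after insertion. The resulting dates scale inversely with the assumed rate, so absolute ages such as the 0.5 to 5.2 mya range quoted above depend on the rate chosen by the original authors.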

8.2.2 Passiflora Cytogenomics

Passiflora exhibits high karyotype diversity, with different chromosome numbers and ploidy levels reported for 150 species, ~26% of the total of 575 species (Rice et al. 2015). Basic chromosome numbers differ and are associated with the taxonomic classification: x = 6 in the subgenus Decaloba (2n = 12, 14, 18, 22, 24 and 36); x = 9 in the subgenus Passiflora (2n = 18, 20, 36 and 72); and x = 12 in the subgenera Astrophea, Deidamioides and Tetrapathea (2n = 24) (Melo and Guerra 2003; Hansen et al. 2006; Sader et al. 2019a).

A number of hypotheses have been proposed to explain this variation. Based on the number and position of heterochromatic bands and 5S/35S ribosomal DNA (rDNA) sites, x = 6 was suggested as the ancestral chromosome number, whereas other chromosome numbers would have originated by polyploidy (from x = 6 to x = 12) and subsequently by descending dysploidy (x = 10 and x = 9) (Melo et al. 2001; Melo and Guerra 2003). In contrast, based on maximum parsimony analysis, Hansen et al. (2006) suggested x = 12 as the ancestral number, supported by x = 12 in Adenia, a phylogenetically related genus. Based on probabilistic models, Mayrose et al. (2010) predicted x = 12 or x = 6, the latter corroborated by Sader et al. (2019b), who hypothesized x = 6 as the ancestral chromosome number using a similar approach but a larger sample size.

Currently, with almost the whole genome sequence assembled for P. edulis (Ma et al. 2021; Xia et al. 2021) and P. organensis (Costa et al. 2021), comparisons of the number of synonymous substitutions per synonymous site (Ks) for homologous gene pairs suggest that a whole genome duplication (WGD) event occurred around 65–45 mya, after the divergence of Passifloraceae and Euphorbiaceae, but before the subdivision of the genus. The well-known whole genome triplication event (γ) shared by the core eudicots was also detectable in P. edulis and P. organensis. In silico alignments of three P. edulis gene-rich BACs (Pe_164K17, Pe_69O16 and Pe_164D9, described in Munhoz et al. 2018) with P. organensis sequences showed two copies of each BAC, with one copy more conserved than the other. These BAC sequences also occurred twice in the genomes of two other Malpighiales, Manihot esculenta and Populus trichocarpa, consistent with the WGDs undergone by these genomes (Fig. 8.4). The peak Ks for homologous gene pairs of the purple passion fruit genome indicated an additional WGD event estimated to have occurred ~12 mya (Xia et al. 2021). However, this additional, more recent WGD has not been confirmed in other studies (Cai et al. 2019; Leebens-Mack et al. 2019; Ma et al. 2021). Indeed, P. organensis shared the latest WGD event (dated to the Eocene) with P. edulis, which predates or coincides with the origin of the Passiflora genus around 42.9 mya.

Fig. 8.4 Microsyntenic alignments of the Passiflora edulis gene-rich BAC 164D9 with the assembled sequences of Passiflora organensis, Populus trichocarpa and Manihot esculenta: two copies of the P. edulis sequences were found in each species, corroborating the occurrence of the WGD event. Taken from Costa et al. (2021) under the terms of the Creative Commons Attribution licence

In conclusion, the genomic data support the idea of x = 12 for Passiflora, after a WGD when the Passifloraceae family originated. The reduction in chromosome number back to n = 6 was accompanied by a reduction in genome size (for instance, in P. organensis), resembling the genome repatterning observed in Arabidopsis thaliana (Lysak et al. 2006). Alternatively, the descending dysploidy in P. edulis could have been followed by an increase in genome size by amplification of repetitive sequences (Sader et al. 2021).

No satellite DNAs (satDNA) or retroelements were associated with centromeres in the Passiflora species (Pamponét et al. 2019; Sader et al. 2019b, 2021; Dias et al. 2020). Centromeric repeats were not characterized in depth, but telomeric sequences were found at the terminal regions of four chromosomes of P. edulis (Ma et al. 2021). Telomeric repeats were also observed at interstitial regions of P. organensis, the first evidence of rearrangements in this species (Costa et al. 2021). Satellite DNA amounted to 4% of the genome in P. organensis, and the most abundant satellites were found in proximal or terminal arrays in a few chromosome pairs (Sader et al. 2021), enabling some scaffolds to be assigned to chromosomes (Costa et al. 2021).

Cytogenetic maps are available for P. edulis (Fig. 8.5) and P. organensis. The karyotype of P. edulis consists of two submetacentric and seven metacentric chromosomes that vary in size from 2.21 to 3.19 µm (Sader et al. 2019b). On the other hand, P. organensis has five pairs of metacentric chromosomes and one pair, the second largest, of submetacentric chromosomes, ranging from 3.62 to 1.61 µm in size (Costa et al. 2021).
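The WGD dates discussed in this section derive from Ks distributions: duplicated gene pairs created by the same event share a similar Ks, producing a peak whose position can be converted to an age with T = Ks / (2λ), where λ is the synonymous substitution rate. The Python sketch below is a minimal illustration of that conversion and not the method of any of the cited studies. The histogram-based peak finder, the function names and every numerical value in the example (including the rate) are hypothetical; the rate must be supplied by the user because the chapter does not state which one was applied.

```python
from collections import Counter


def ks_peak(ks_values, bin_width=0.05, max_ks=3.0):
    """Return the midpoint of the most populated Ks bin.

    Values outside (0, max_ks] are ignored, since very high Ks
    estimates are saturated and unreliable. Ties are resolved in
    favour of the lowest bin.
    """
    counts = Counter()
    for ks in ks_values:
        if 0 < ks <= max_ks:
            counts[int(ks / bin_width)] += 1
    if not counts:
        raise ValueError("no Ks values within the accepted range")
    best_bin = max(counts, key=lambda b: (counts[b], -b))
    return (best_bin + 0.5) * bin_width


def ks_to_mya(ks, rate_per_site_per_year):
    """Convert a Ks value to an age in million years: T = Ks / (2 * rate)."""
    return ks / (2.0 * rate_per_site_per_year) / 1e6


if __name__ == "__main__":
    # Hypothetical Ks values for paralogous gene pairs.
    pairs = [0.61, 0.62, 0.63, 0.64, 1.20, 0.10]
    peak = ks_peak(pairs)
    # Hypothetical rate of 5e-9 substitutions per site per year.
    print(f"peak Ks = {peak:.3f}, age = {ks_to_mya(peak, 5e-9):.1f} mya")
```

Published analyses typically fit mixture models to the Ks distribution rather than taking the fullest histogram bin, and the choice of rate can shift the inferred age substantially, which may partly explain why different studies disagree about the more recent event.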

Fig. 8.5 Mitotic metaphase of P. edulis. Chromosomes were counter-stained with DAPI (grey) and hybridized in situ with BACs Pe93G04 (yellow), Pe164K17 (green), Pe214H11 (blue) and Pe216B22 (pink). Bar 5 µm. Photo credit: M. Sader and Y. Dias, Federal University of Pernambuco, Brazil

8.2.3 Functional Annotation of Passiflora Genomes

The purple passion fruit (P. edulis) genome harbours 23,171 protein-coding genes, and important gene families were identified, with emphasis on genes involved in the synthesis of volatile organic compounds (VOCs), providing insights for improving the flavour of fresh fruits. For instance, an integrated analysis of genomic, transcriptomic and metabolomic data showed that the ‘alpha-linolenic acid metabolism’, ‘metabolic pathways’ and ‘secondary metabolic pathways’ were involved in the synthesis of important VOCs. Candidate genes were also identified, including GDP-fucose Transporter 1-like, Tetratricopeptide Repeat Protein 33, protein NETWORKED 4B Isoform X1 and Golgin Subfamily A Member 6-like protein 22. In addition, 13 important gene families involved in fatty acid pathways and 8 in terpene pathways were described, providing insights into flavour trait biology and valuable resources for improving fruit quality-related traits (Xia et al. 2021).

A similar number of protein-coding genes (25,327) were found in the genome of the wild species P. organensis. Importantly, genes potentially involved in the self-incompatibility determining locus were identified based on similarity and domain structure resembling S-locus glycoproteins (SLG) and S-locus receptor kinases (SRK). A set of 54 proteins in P. organensis have domains commonly detected in proteins encoded by the S-locus. A cluster of genes with the structural characteristics of the S-locus was found in P. organensis based on the proximity of the SRK and SLG candidate genes. Identification of the S-locus would be the first step towards improving our knowledge of SI in Passiflora (Costa et al. 2021).

Comparisons of gene families presented herein show that P. edulis retains 11,774 gene families and 4,032 singletons; 569 gene families are unique to this species. Based on an enrichment analysis for assigning gene ontology (GO) terms, overrepresented gene families relate to the ‘unsaturated fatty acid biosynthetic process’ and ‘terpene synthase activity’. In contrast, P. organensis retains 12,603 gene families and 8,677 singletons. In the 1,425 gene families unique to this species, enriched GO terms were mostly related to plant development, such as ‘regulation of growth’ and ‘cell wall organization’. The term ‘recognition of pollen’ was frequently detected, possibly because it relates to the mechanism that led to self-incompatibility in Passiflora (Fig. 8.6).
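The shared and species-specific counts in a two-species comparison of this kind follow from simple set operations on gene family identifiers. The Python sketch below shows the idea with made-up family names; the function name and the identifiers are hypothetical, and the code does not reproduce the orthology inference itself, which is carried out beforehand by dedicated clustering software.

```python
def compare_families(families_a, families_b):
    """Partition gene families into shared and species-specific sets."""
    a, b = set(families_a), set(families_b)
    return {
        "shared": a & b,
        "unique_a": a - b,
        "unique_b": b - a,
    }


if __name__ == "__main__":
    # Hypothetical family identifiers for two species.
    species_a = {"OG001", "OG002", "OG003", "OG004"}
    species_b = {"OG002", "OG003", "OG004", "OG005", "OG006"}
    result = compare_families(species_a, species_b)
    for label, members in result.items():
        print(label, len(members), sorted(members))

    # For any species, shared + unique must equal its family total.
    assert len(result["shared"]) + len(result["unique_b"]) == len(species_b)
```

The same bookkeeping applies to the reported figures: for P. organensis, the 11,178 shared families plus the 1,425 unique families give the stated total of 12,603.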

Fig. 8.6 Venn diagram showing the distribution of orthologous gene families in Passiflora edulis and P. organensis, which share 11,178 gene families. Taken from Costa et al. (2021) under the terms of the Creative Commons Attribution licence

8.2.4 The Passiflora MADS-Box Gene Family and Phase Change Transitions

Phase change control is mediated by complex regulatory networks that constantly perceive and respond to endogenous and environmental cues. In the model plant, A. thaliana, flowering (the second phase change) occurs only if the plant’s genetic and physiological backgrounds are primed to respond to flowering stimuli (Poethig 2003; Wigge et al. 2005; Parcy 2005; Balanzà et al. 2018). Competence to flower is acquired after plants undergo the previous phase change from the juvenile to the adult vegetative phase. Adult plants are then able to respond to flowering stimuli, such as photoperiod, temperature, vernalization, hormones and sugar content (Wahl et al. 2013; Yang et al. 2013; Yu et al. 2013, 2015; Matsoukas 2014; Teotia and Tang 2015). In P. edulis, both phase transitions (from juvenile to adult vegetative and from adult vegetative to adult reproductive) are morphologically characterized by changes in leaf shape (from lanceolate leaves with smooth margins to trilobate leaves with serrated margins), by the appearance of a tendril at each node after the first phase change, and by flowers after the second, also in the axillary meristem of each node (Cutri et al. 2013; Chitwood and Otoni 2017). In this context, the MADS-box gene family is known to have important biological and evolutionary roles in Passiflora, especially with regard to reproductive development.

Regarding the control of both phase changes, several studies have shown that, in A. thaliana, seven main pathways are involved: photoperiod, autonomous, vernalization, temperature, gibberellin (GA), age and sugar. They all encompass environmental and endogenous cues that crosstalk with each other (Blázquez et al. 2006; Teeri et al. 2006; Aguilar-Martínez et al. 2007; Su et al. 2011; Janssen et al. 2014; Martins et al. 2018). Recent analysis of RNA-seq data on P. edulis shoot apexes in each of the three phases revealed that genes in several of these pathways are differentially expressed. It also revealed that there is more repression than induction of genes during phase changes, which means that plant maturation and development involve the suppression of inhibitors, such as DELLA transcripts (flowering inhibitor) and members of the APETALA2/ETHYLENE RESPONSE FACTOR (AP2/ERF) family, which are responsible for maintaining plants in the juvenile phase (Teotia and Tang 2015; Jung et al. 2016). The decrease in the number of transcripts of AP2/ERF loci is followed by increased expression of members of the SQUAMOSA PROMOTER BINDING-LIKE (SPL) gene family, which are known as regulators of the first phase transition (Taylor et al. 2002; Huijser and Schmid 2011; Spanudakis and Jackson 2014). In addition to SPL transcripts, members of the MADS-box gene family, such as AGAMOUS-LIKE 24 (AGL-24), AGAMOUS-LIKE 79 (AGL-79) and SUPPRESSOR OF OVEREXPRESSION OF CONSTANS1 (SOC1), are also upregulated throughout P. edulis development. These transcription factors are involved in the control of the second phase change, from adult vegetative to adult reproductive, and their active status indicates that the plant is already competent to respond to flowering stimuli. These results could provide a basis for further investigations using mutants and/or gene editing technology with the aim of improving fruit yield by promoting flowering in spite of adverse environmental conditions.

8.2.5 Passiflora Organellar Genomes The chloroplast genomes (cpDNA) of several of Passiflora species have been sequenced and annotated. As a whole, the molecule was found to have the typical cpDNA quadripartite structure, consisting of two copies of inverted repeats (IRs) separating two single copy regions, one large (LSC) and the other small (SSC). Molecule length varies considerably across species (approximately 55 kb) ranging from 113,114 bp in P. capsularis (subgenus Decaloba) to 167,953 bp in P. deidamioides (subgenus Deidamioides) (Cauz-Santos et al. 2020). In addition, rearrangements in the plastid genome structure, such as inversions and gene losses, were reported not only in P. edulis (Cauz-Santos et al. 2017), but also in other species (Rabah et al. 2019; Shrestha et al. 2019). In Passiflora, large IR expansions were identified, as well as the loss of an IR, a rare event in angiosperms (Cauz-Santos et al. 2020). The cp genomes contain between 102 and 109 genes, and this variation is related to the protein-coding genes

8

The Passion Fruit Genome

identified when species were compared; all species were found to have the same tRNA (30) and rRNA (4) gene content. Repetitive sequence analysis detected between 115 (P. alata) and 445 (P. contracta) repeats, most of them found in the intergenic sequences of the LSC or IR regions, but some repeats were located in the gene sequence itself (Cauz-Santos et al. 2020). A range of rearrangements, including inversions, expansions/losses of IR regions and gene losses, has been detected, making Passiflora one of the few groups with complex chloroplast genome evolution. Comparing the four subgenera, inversions were particularly marked in the subgenus Astrophea, but surprisingly not in other basal genera in the Passifloraceae family (Shrestha et al. 2019). This suggests that the inversions in the Astrophea subgenus occurred after the separation of the Passiflora genus from its ancestors. The Decaloba subgenus exhibits many different inversions, possibly because of the high number of repeat structures, as well as large IR expansions that are typical of this subgenus (Fig. 8.7). In contrast, Deidamioides species have a cpDNA structure similar to that of other Passifloraceae, with just one small inversion in the LSC region. However, the Deidamioides subgenus differs from the other two Passifloraceae in that it has large IR expansions. Finally, species of subgenus Passiflora exhibit conserved structures and rearrangements in the LSC region (Cauz-Santos et al. 2017, 2020).
Importantly, two mitochondrial DNA (mtDNA) molecules of P. organensis were assembled and confirmed throughout with long reads (PacBio and Nanopore) and paired short Illumina reads. The linear sequence represented the master mitochondrial DNA molecule comprising 1,031,229 bp, harbouring at least one full copy of most canonical mitochondrial protein-encoding genes (32) (rRNA and tRNA genes). Interspersed with these genes were three mitoviruses comprising a repetitive region of ~1800 bp, besides relics of other mobile genetic and repetitive elements. The mitogenome of P. organensis was also found to exhibit interspersed sequences of


cpDNA. The various repetitive sequences were interspersed within intergenic regions, making them potential sites for rearrangements. A combination of introns, some requiring trans-splicing reactions, was found in genes encoding subunits of the mitochondrial membrane respiratory chain. The second molecule is circular, comprising 102,307 bp and harbouring full versions of two canonical genes missing from the linear molecule (rps19, rpl10), besides extra Lys- and Met-tRNAs (see Costa et al. 2021). Remarkably, mitovirus-related sequences were not present in the P. edulis mitogenome. The accession studied by the Chinese group is called ‘Baixiang-guo’ and is very popular throughout southern China. The authors found an mtDNA of 680,480 bp in length and predicted 74 genes, including protein-coding genes (41), and the same number of rRNA (3) and tRNA (30) genes as P. organensis (Yang and Wang 2020).
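The size figures quoted in this section can be checked with a few lines of arithmetic. The Python sketch below is illustrative only: the variable names are introduced here, and adding the two P. organensis mitochondrial molecules into a single total is an assumption made for comparison, not a figure reported by the cited authors.

```python
# Organellar genome sizes quoted in Sect. 8.2.5 (all values in bp).
cp_min = 113_114  # P. capsularis plastome, the shortest reported
cp_max = 167_953  # P. deidamioides plastome, the longest reported
cp_span = cp_max - cp_min  # spread in plastome length across species

mt_linear = 1_031_229  # P. organensis linear "master" mtDNA molecule
mt_circular = 102_307  # P. organensis circular mtDNA molecule
# Assumption for illustration: treat the two molecules as additive.
mt_organensis = mt_linear + mt_circular
mt_edulis = 680_480  # P. edulis mtDNA (Yang and Wang 2020)

print(f"plastome length span: {cp_span:,} bp (~{cp_span / 1000:.0f} kb)")
print(f"P. organensis mtDNA, both molecules: {mt_organensis:,} bp")
print(f"ratio to P. edulis mtDNA: {mt_organensis / mt_edulis:.2f}")
```

The plastome span agrees with the "approximately 55 kb" stated in the text, and the comparison suggests the assembled P. organensis mitogenome is noticeably larger than the P. edulis one.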

8.3 Conclusion

In conclusion, passion fruit is an underutilized fruit crop which is adapted to a range of environments. The main commercialized species is P. edulis; both the purple and sour forms are found in orchards in tropical and subtropical areas. In this chapter, we summarized and compared the data relating to the draft genome of the wild species P. organensis and the genome structure of the purple form of P. edulis. This innovation is of unquestionable importance for plant breeding and genetic research on the cultivated Passiflora species, a non-model and underutilized group of fruit crops. Future prospects in this scenario concern two main areas of interest: (a) producing passion fruit varieties in the era of molecular breeding and (b) the practice of comparative genomics. To produce better varieties, molecular and phenotypic information on passion fruit experimental populations is needed to accelerate genetic gains continuously, namely for fruit quality traits, in order to meet the needs of consumers and the juice industry.


M. L. C. Vieira et al.

Fig. 8.7 Passiflora organensis chloroplast genome map. Genes are represented as boxes inside or outside the large circle to indicate clockwise (inside) or counterclockwise (outside) transcription. The colour of the gene boxes indicates the functional group to which the gene belongs. Taken from Costa et al. (2021) under the terms of the Creative Commons Attribution licence

In addition, a comprehensive analysis of the genes involved in self-incompatibility (SI) remains a challenge in P. edulis. There is no single methodology at present that will detect all the types of SI alleles, but sequence resources are beginning to offer a clearer and more detailed picture of SI cluster composition, an initial step towards the development of allele-specific markers for use in determining compatible parents for breeding programmes. The good news is that we have gained greater knowledge of the SI genes in the wild species, P. organensis. Connecting genes to phenotypes will become more and more important in the near future. Comparative genomics will help us to understand the evolutionary course of the Passiflora genus, in particular its diversification into subgroups that must have undergone several events,


such as deletions, duplications, inversions and occasionally translocations. Comparisons of karyotypes can reveal the changes that may have led to phenotypic variability through chromosome rearrangements. An obvious difference between Passiflora karyotypes is the number of chromosomes, which varies from one subgenus to another. The availability of complete genomes will facilitate the development of oligo-FISH technology, opening up new avenues for disentangling phylogenetic evolution and chromosomal speciation in Passiflora.

References
Abreu PP, Souza MM, Santos EA et al (2009) Passion flower hybrids and their use in the ornamental plant market: perspectives for sustainable development with emphasis on Brazil. Euphytica 166:307–315. https://doi.org/10.1007/s10681-008-9835-x
Aguilar-Martínez JA, Poza-Carrión C, Cubas P (2007) Arabidopsis BRANCHED1 acts as an integrator of branching signals within axillary buds. Plant Cell 19:458–472. https://doi.org/10.1105/tpc.106.048934
Amorim JS, Souza MM, Viana AJC et al (2014) Cytogenetic, molecular and morphological characterization of Passiflora capsularis L. and Passiflora rubra L. Plant Syst Evol 300:1147–1162. https://doi.org/10.1007/s00606-013-0952-1
Araya S, Martins AM, Junqueira NTV et al (2017) Microsatellite marker development by partial sequencing of the sour passion fruit genome (Passiflora edulis Sims). BMC Genomics 18:549. https://doi.org/10.1186/s12864-017-3881-5
Balanzà V, Martínez-Fernández I, Sato S et al (2018) Genetic control of meristem arrest and life span in Arabidopsis by a FRUITFULL-APETALA2 pathway. Nat Commun 9:1–9. https://doi.org/10.1038/s41467-018-03067-5
Bernacci LC, Nunes TS, Mezzonato AC, Milward-de-Azevedo MA et al (2020) Passiflora. In: Flora do Bras, 2020. http://floradobrasil.jbrj.gov.br/reflora/floradobrasil/FB12506. Accessed on 17 Apr 2020
Blázquez MA, Ferrándiz C, Madueño F, Parcy F (2006) How floral meristems are built. Plant Mol Biol 60:855–870. https://doi.org/10.1007/s11103-006-0013-z
Burton JN, Adey A, Patwardhan RP et al (2013) Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat Biotechnol 31:1119–1125. https://doi.org/10.1038/nbt.2727
Cai L, Xi Z, Amorim AM et al (2019) Widespread ancient whole-genome duplications in Malpighiales coincide with Eocene global climatic upheaval. New Phytol 221:565–576. https://doi.org/10.1111/nph.15357
Castillo NR, Ambachew D, Melgarejo LM, Blair MW (2020) Morphological and agronomic variability among cultivars, landraces, and genebank accessions of purple passion fruit, Passiflora edulis f. edulis. HortScience 55:768–777. https://doi.org/10.21273/HORTSCI14553-19
Castro ÉCP, Zagrobelny M, Cardoso MZ, Bak S (2018) The arms race between heliconiine butterflies and Passiflora plants – new insights on an ancient subject. Biol Rev 93:555–573. https://doi.org/10.1111/brv.12357
Cauz-Santos LA, Costa ZP, Callot C et al (2020) A repertory of rearrangements and the loss of an inverted repeat region in Passiflora chloroplast genomes. Genome Biol Evol 12:1841–1857. https://doi.org/10.1093/gbe/evaa155
Cauz-Santos LA, Munhoz CF, Rodde N et al (2017) The chloroplast genome of Passiflora edulis (Passifloraceae) assembled from long sequence reads: structural organization and phylogenomic studies in Malpighiales. Front Plant Sci 8:334. https://doi.org/10.3389/fpls.2017.00334
Cerqueira-Silva CBM, Conceição LDHCS, Souza AP, Corrêa RX (2014) A history of passion fruit woodiness disease with emphasis on the current situation in Brazil and prospects for Brazilian passion fruit cultivation. Eur J Plant Pathol 1–10. https://doi.org/10.1007/s10658-014-0391-z
Chavarría-Perez LM, Giordani W, Dias KOG et al (2020) Improving yield and fruit quality traits in sweet passion fruit: evidence for genotype by environment interaction and selection of promising genotypes. PLoS ONE 15:e0232818. https://doi.org/10.1371/journal.pone.0232818
Chitwood DH, Otoni WC (2017) Morphometric analysis of Passiflora leaves: the relationship between landmarks of the vasculature and elliptical Fourier descriptors of the blade. Gigascience 6:1–13. https://doi.org/10.1093/gigascience/giw008
Costa ZP, Cauz-Santos LA, Ragagnin GT et al (2019) Transposable element discovery and characterization of LTR-retrotransposon evolutionary lineages in the tropical fruit species Passiflora edulis. Mol Biol Rep 46:6117–6133. https://doi.org/10.1007/s11033-019-05047-4
Costa ZP, Varani AM, Cauz-Santos LA et al (2021) A genome sequence resource for the genus Passiflora, the genome of the wild diploid species Passiflora organensis. The Plant Genome. https://doi.org/10.1002/tpg2.20117
Cutri L, Nave N, Ben AM et al (2013) Evolutionary, genetic, environmental and hormonal-induced plasticity in the fate of organs arising from axillary meristems in Passiflora spp. Mech Dev 130:61–69. https://doi.org/10.1016/j.mod.2012.05.006
Deng J, Zhou Y, Bai M et al (2010) Anxiolytic and sedative activities of Passiflora edulis f. flavicarpa. J Ethnopharmacol 128:148–153. https://doi.org/10.1016/j.jep.2009.12.043
Dias Y, Sader MA, Vieira MLC, Pedrosa-Harand A (2020) Comparative cytogenetic maps of Passiflora alata and P. watsoniana (Passifloraceae) using BAC-FISH. Plant Syst Evol 306:51. https://doi.org/10.1007/s00606-020-01675-7
Ferreira AFN, Krause W, Cordeiro MHM et al (2021) Multivariate analysis to quantify genetic diversity and family selection in sour passion fruit under recurrent selection. Euphytica 217:1–13. https://doi.org/10.1007/s10681-020-02740-5
Hansen AK, Gilbert LE, Simpson BB et al (2006) Phylogenetic relationships and chromosome number evolution in Passiflora. Syst Bot 31:138–150. https://doi.org/10.1600/036364406775971769
Huijser P, Schmid M (2011) The control of developmental phase transitions in plants. Development 138:4117–4129. https://doi.org/10.1242/dev.063511
Janssen BJ, Drummond RSM, Snowden KC (2014) Regulation of axillary shoot development. Curr Opin Plant Biol 17:28–35. https://doi.org/10.1016/j.pbi.2013.11.004
Jung JH, Lee HJ, Ryu JY, Park CM (2016) SPL3/4/5 integrate developmental aging and photoperiodic signals into the FT-FD module in Arabidopsis flowering. Mol Plant 9:1647–1659. https://doi.org/10.1016/j.molp.2016.10.014
Klein SJ, O’Neill RJ (2018) Transposable elements: genome innovation, chromosome diversity, and centromere conflict. Chromosom Res 26(1–2):5–23. https://doi.org/10.1007/s10577-017-9569-5
Krist S (2020) Passion fruit seed oil. In: Krist S (ed) Vegetable fats and oils. Springer Cham, pp 535–539
Leebens-Mack JH, Barker MS, Carpenter EJ et al (2019) One thousand plant transcriptomes and the phylogenomics of green plants. Nature 574:679–685. https://doi.org/10.1038/s41586-019-1693-2
Lopes R, Lopes MTG, Carneiro MS et al (2006) Linkage and mapping of resistance genes to Xanthomonas axonopodis pv. passiflorae in yellow passion fruit. Genome 49:17–29. https://doi.org/10.1139/G05-081
Lysak MA, Berr A, Pecinka A et al (2006) Mechanisms of chromosome number reduction in Arabidopsis thaliana and related Brassicaceae species. Proc Natl Acad Sci USA 103:5224–5229. https://doi.org/10.1073/pnas.0510791103
Ma D, Dong S, Zhang S et al (2021) Chromosome-level reference genome assembly provides insights into aroma biosynthesis in passion fruit (Passiflora edulis). Mol Ecol Resour 21:955–968. https://doi.org/10.1111/1755-0998.13310
Martins AO, Nunes-Nesi A, Araújo WL, Fernie AR (2018) To bring flowers or do a runner: gibberellins make the decision. Mol Plant 11:4–6. https://doi.org/10.1016/j.molp.2017.12.005
Matheri F, Mwangi M, Runo S et al (2016) Phenotypic characterization of selected Kenyan purple and yellow passion fruit genotypes based on morpho-agronomic descriptors. Adv Crop Sci Technol 4. https://doi.org/10.4172/2329-8863.1000226
Matsoukas IG (2014) Interplay between sugar and hormone signaling pathways modulate floral signal transduction. Front Genet 5:218. https://doi.org/10.3389/fgene.2014.00218
Mayrose I, Barker MS, Otto SP (2010) Probabilistic models of chromosome number evolution and the inference of polyploidy. Syst Biol 59:132–144. https://doi.org/10.1093/sysbio/syp083
Melo NF, Cervi AC, Guerra M (2001) Karyology and cytotaxonomy of the genus Passiflora L. (Passifloraceae). Plant Syst Evol 226:69–84. https://doi.org/10.1007/s006060170074
Melo NF, Guerra M (2003) Variability of the 5S and 45S rDNA sites in Passiflora L. species with distinct base chromosome numbers. Ann Bot 92:309–316. https://doi.org/10.1093/aob/mcg138
Moraes MC, Geraldi IO, De Pina Matta F et al (2005) Genetic and phenotypic parameter estimates for yield and fruit quality traits from a single wide cross in yellow passion fruit. HortScience 40:1978–1981. https://doi.org/10.21273/HORTSCI.40.7.1978
Munhoz CF, Costa ZP, Cauz-Santos LA et al (2018) A gene-rich fraction analysis of the Passiflora edulis genome reveals highly conserved microsyntenic regions with two related Malpighiales species. Sci Rep 8:13024. https://doi.org/10.1038/s41598-018-31330-8
Muñoz-López M, García-Pérez JL (2010) DNA transposons: nature and applications in genomics. Curr Genomics 11:115–128
Munhoz CF, Santos AA, Arenhart RA et al (2015) Analysis of plant gene expression during passion fruit–Xanthomonas axonopodis interaction implicates lipoxygenase 2 in host defence. Ann Appl Biol 167:135–155. https://doi.org/10.1111/aab.12215
Neumann P, Novák P, Ho N (2019) Systematic survey of plant LTR-retrotransposons elucidates phylogenetic relationships of their polyprotein domains and provides a reference for element classification. Mob DNA 10:1–17. https://doi.org/10.1186/s13100-018-0144-1
Ortiz DC, Bohórquez A, Duque MC et al (2012) Evaluating purple passion fruit (Passiflora edulis Sims f. edulis) genetic variability in individuals from commercial plantations in Colombia. Genet Resour Crop Evol 59:1089–1099. https://doi.org/10.1007/s10722-011-9745-y
Pamponét VCC, Souza MM, Silva GS et al (2019) Low coverage sequencing for repetitive DNA analysis in Passiflora edulis Sims: citogenomic characterization of transposable elements and satellite DNA. BMC Genomics 20:1–17. https://doi.org/10.1186/s12864-019-5576-6
Parcy F (2005) Flowering: a time for integration. Int J Dev Biol 49:585–593. https://doi.org/10.1387/ijdb.041930fp
Park M, Park J, Kim S et al (2012) Evolution of the large genome in Capsicum annuum occurred through accumulation of single-type long terminal repeat retrotransposons and their derivatives. Plant J 69:1018–1029. https://doi.org/10.1111/j.1365-313X.2011.04851.x
Pereira G da S, Di Cassia Laperuta L, Nunes ES et al (2017) The sweet passion fruit (Passiflora alata) crop: genetic and phenotypic parameter estimates and QTL mapping for fruit traits. Trop Plant Biol 10:18–29. https://doi.org/10.1007/s12042-016-9181-4
Poethig RS (2003) Phase change and the regulation of developmental timing in plants. Science 301:334–336. https://doi.org/10.1126/science.1085328
Rabah SO, Shrestha B, Hajrah NH et al (2019) Passiflora plastome sequencing reveals widespread genomic rearrangements. J Syst Evol 57:1–14. https://doi.org/10.1111/jse.12425
Ramaiya SD, Bujang JS, Zakaria MH (2014) Assessment of total phenolic, antioxidant, and antibacterial activities of Passiflora species. Sci World J 2014. https://doi.org/10.1155/2014/167309
Rice A, Glick L, Abadi S et al (2015) The chromosome counts database (CCDB) – a community resource of plant chromosome numbers. New Phytol 206:19–26. https://doi.org/10.1111/nph.13191
Rodrigues DL, Viana AP, Vieira HD et al (2020) Responses of sour passion fruit (Passiflora edulis Sims) seeds from the third recurrent selection cycle during storage. Acta Agron 69:61–67
Sader MA, Amorim BS, Costa L, Souza G (2019a) The role of chromosome changes in the diversification of Passiflora L. (Passifloraceae). Syst Biodivers 17:7–21. https://doi.org/10.1080/14772000.2018.1546777
Sader MA, Dias Y, Costa ZP et al (2019b) Identification of passion fruit (Passiflora edulis) chromosomes using BAC-FISH. Chromosom Res 7:299–311. https://doi.org/10.1007/s10577-019-09614-0
Sader M, Vaio M, Cauz-Santos LA et al (2021) Large vs small genomes in Passiflora: the influence of the mobilome and the satellitome. Planta 253:86. https://doi.org/10.1007/s00425-021-03598-0
Santos EA, Souza MM, Abreu PP et al (2012) Confirmation and characterization of interspecific hybrids of Passiflora L. (Passifloraceae) for ornamental use. Euphytica 184:389–399. https://doi.org/10.1007/s10681-011-0607-7
Santos A, Penha H, Bellec A et al (2014) Begin at the beginning: a BAC-end view of the passion fruit (Passiflora) genome. BMC Genomics 15:816. https://doi.org/10.1186/1471-2164-15-816
Schrader L, Schmitz J (2019) The impact of transposable elements in adaptive evolution. Mol Ecol 28:1537–1549. https://doi.org/10.1111/mec.14794
Shrestha B, Weng ML, Theriot EC et al (2019) Highly accelerated rates of genomic rearrangements and nucleotide substitutions in plastid genomes of Passiflora subgenus Decaloba. Mol Phylogenet Evol 138:53–64. https://doi.org/10.1016/j.ympev.2019.05.030
Silva FHL, Muñoz PR, Vincent CI, Viana AP (2016) Generating relevant information for breeding Passiflora edulis: genetic parameters and population structure. Euphytica 208:609–619. https://doi.org/10.1007/s10681-015-1616-8
Silva FH de L e, Viana AP, Freitas JCDO et al (2017) Prediction of genetic gains by selection indexes and REML/BLUP methodology in a population of sour passion fruit under recurrent selection. Acta Sci Agron 39:183. https://doi.org/10.4025/actasciagron.v39i2.32554
Souza MM, Palomino G, Pereira TNS et al (2004) Flow cytometric analysis of genome size variation in some Passiflora species. Hereditas 38:31–38. https://doi.org/10.1111/j.1601-5223.2004.01739.x
Spanudakis E, Jackson S (2014) The role of microRNAs in the control of flowering time. J Exp Bot 65:365–380. https://doi.org/10.1093/jxb/ert453
Su YH, Liu YB, Zhang XS (2011) Auxin-cytokinin interaction regulates meristem development. Mol Plant 4:616–625. https://doi.org/10.1093/mp/ssr007
Suassuna T de MF, Bruckner H, de Carvalho R, Borem A (2003) Self-incompatibility in passionfruit: evidence of gametophytic-sporophytic control. Theor Appl Genet 106:298–302. https://doi.org/10.1007/s00122-002-1103-1
Taylor SA, Hofer JMI, Murfet IC et al (2002) PROLIFERATING INFLORESCENCE MERISTEM, a MADS-box gene that regulates floral meristem identity in pea. Plant Physiol 129:1150–1159. https://doi.org/10.1104/pp.001677
Teeri TH, Uimari A, Kotilainen M et al (2006) Reproductive meristem fates in Gerbera. J Exp Bot 57:3445–3455. https://doi.org/10.1093/jxb/erl181
Teotia S, Tang G (2015) To bloom or not to bloom: role of micrornas in plant flowering. Mol Plant 8:359–377. https://doi.org/10.1016/j.molp.2014.12.018
Ulmer T, MacDougal JM (2004) Passiflora: passionflowers of the world. Timber Press, Cambridge
Vicient CM, Casacuberta JM (2017) Impact of transposable elements on polyploid plant genomes. Ann Bot 120:195–207. https://doi.org/10.1093/aob/mcx078
Wahl V, Ponnu J, Schlereth A et al (2013) Regulation of flowering by trehalose-6-phosphate signaling in Arabidopsis thaliana. Science 339:704–707. https://doi.org/10.1126/science.1230406
Wicker T, Sabot FF, Hua-Van AA et al (2007) A unified classification system for eukaryotic transposable elements. Nat Rev Genet 8:973–982. https://doi.org/10.1038/nrg2165
Wigge PA, Kim MC, Jaeger KE et al (2005) Integration of spatial and temporal information during floral induction in Arabidopsis. Science 309:1056–1059. https://doi.org/10.1126/science.1114358
Xia Z, Huang D, Zhang S et al (2021) Chromosome-scale genome assembly provides insights into the evolution and flavor synthesis of passion fruit (Passiflora edulis Sims). Hortic Res 8:14. https://doi.org/10.1038/s41438-020-00455-1
Yang J, Wang X-A (2020) The complete mitochondrial genome of a yellow passion fruit (Passiflora edulis Sims. f. flavicarpa Deg.) in China and phylogenetic relationships. Mitochondrial DNA Part B 5:1598–1600. https://doi.org/10.1080/23802359.2020.1742622
Yang L, Xu M, Koo Y et al (2013) Sugar promotes vegetative phase change in Arabidopsis thaliana by repressing the expression of MIR156A and MIR156C. Elife 2013:e00260. https://doi.org/10.7554/eLife.00260
Yotoko KSC, Dornelas MC, Togni PD et al (2011) Does variation in genome sizes reflect adaptive or neutral processes? New Clues from Passiflora. Plos One 6:e18212. https://doi.org/10.1371/journal.pone.0018212
Yu S, Li C, Zhou CM et al (2013) Sugar is an endogenous cue for juvenile-to-adult phase transition in plants. Elife 2013:e00269. https://doi.org/10.7554/eLife.00269
Yu S, Lian H, Wang JW (2015) Plant developmental transitions: the role of microRNAs and sugars. Curr Opin Plant Biol 27:1–7. https://doi.org/10.1016/j.pbi.2015.05.009

9 The Soursop Genome (Annona muricata L., Annonaceae)

Joeri S. Strijk, Damien D. Hinsinger, Mareike M. Roeder, Lars W. Chatrou, Thomas L. P. Couvreur, Roy H. J. Erkens, Hervé Sauquet, Michael D. Pirie, Daniel C. Thomas, and Kunfang Cao

Abstract

The Annonaceae family contains important tropical crops, but the number of species used commercially is limited, and development of other promising species for cultivation is hindered by a lack of genomic resources to support the building of breeding programmes. The family is part of the magnoliids, an ancient lineage of angiosperms for which evolutionary relationships with other major clades have remained unclear. To provide novel resources to both plant breeders and evolutionary research, we described the chromosome-level genome assembly of the soursop (Annona muricata L.), using DNA data generated with PacBio and Illumina short-read technology, in combination with 10X Genomics, BioNano data, and Hi-C sequencing. To disentangle key angiosperm relationships, we reconstructed phylogenomic trees comparing a wider sampling of available angiosperm genomes and reveal that the soursop represents a genomic mosaic supporting different evolutionary histories, with scaffolds almost exclusively supporting singular topologies. However, coalescent methods and a majority of genes support magnoliids as sister to monocots and eudicots, where previously published whole genome-based studies remained inconclusive. The soursop genome highlights the need for more early diverging angiosperm genomes and critical assessment of the suitability of such genomes for inferring evolutionary history. The soursop is the first genome assembled in Annonaceae and supports further studies of floral evolution in magnoliids, whilst providing an essential resource for delineating relationships of major lineages at the base of the angiosperms. Both genome-assisted improvement in promising Annonaceae fruit crops and conservation efforts will be strengthened by the availability of the soursop genome. The genome assembly as a community resource will further strengthen the role of Annonaceae as a model group for research on the ecology, evolution, and domestication potential of tropical species in pomology and agroforestry.

J. S. Strijk (✉) · D. D. Hinsinger: Alliance for Conservation Tree Genomics, Pha Tad Ke Botanical Garden, Luang Prabang, Lao PDR; e-mail: [email protected]
J. S. Strijk: Institute for Biodiversity and Environmental Research, Universiti Brunei Darussalam, Jalan Tungku Link BE1410, Brunei Darussalam
M. M. Roeder: Xishuangbanna Tropical Botanical Garden (CAS), Menglun, China
M. M. Roeder: Aueninstitut, Institute for Geography and Geoecology, Karlsruhe Institute of Technology, Rastatt, Germany
L. W. Chatrou: Systematic and Evolutionary Botany lab, Ghent University, Ghent, Belgium
R. H. J. Erkens: Maastricht Science Program, Maastricht University, Maastricht, The Netherlands
H. Sauquet: National Herbarium of New South Wales, Royal Botanic Gardens and Domain Trust, Sydney, NSW, Australia
M. D. Pirie: Department of Natural History, University Museum, University of Bergen, Bergen, Norway
D. C. Thomas: National Parks Board, Singapore Botanic Gardens, Singapore, Singapore
K. Cao: State Key Laboratory for Conservation and Utilization of Subtropical Agro-bioresources, Guangxi University, Nanning, China
T. L. P. Couvreur: IRD, DIADE, University Montpellier, Montpellier, France

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. M. A. Chapman (ed.), Underutilised Crop Genomes, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-00848-1_9

9.1 Introduction

9.1.1 Early Angiosperm Genome Evolution The emergence of angiosperms was a geologically sudden and ecologically transformative event in the history of life. Recent analyses have suggested that the major angiosperm clades diverged in quick succession within the Cretaceous, with monocots, magnoliids, and eudicots starting to diversify within ca. 5 million years of one another (Ramírez-Barahona et al. 2020). Deep relationships and the sequence of divergence amongst major lineages of angiosperms (magnoliids, monocots, and eudicots) have remained ambiguous and differ depending on analytical approaches and datasets used. Complete genomes potentially provide opportunities to resolve these uncertainties, but recently published magnoliid genomes instead have provided further conflicting signals.
Reconstructing the sequence of rapid speciation events in deep time is a major challenge in evolutionary inference. Bursts of diversification result in short branches within phylogenetic trees and the persistence of discrepancies between histories of individual genes, genomes, and the underlying species tree (Oliver et al. 2013). These phenomena are prevalent across the tree of life, including the origin of important, species-rich lineages such as tetrapods (Song et al. 2012; Jarvis et al. 2014), insects (Freitas et al. 2018), and flowering plants (Tank et al. 2015). Whole genome sequencing (WGS) and application of the multispecies coalescent offer a new and unprecedented opportunity to reconstruct such recalcitrant relationships (Edwards 2009; Edwards et al. 2016). But despite steady progress in the reconstruction of the angiosperm phylogeny and evolution, deeper nodes have proven notoriously difficult to resolve, in particular the relationships between Ceratophyllum, Chloranthaceae, monocots, magnoliids, and eudicots (Ruhfel et al. 2014; Wickett et al. 2014; Zeng et al. 2014). Potential relationships amongst the main angiosperm lineages can be summarized as either (1) magnoliids sister to eudicots + monocots (Moore et al. 2010; Qiu et al. 2010; Soltis et al. 2011; Zhang et al. 2012); (2) magnoliids sister to eudicots (Bell et al. 2010; Moore et al. 2011; Zeng et al. 2014); or (3) magnoliids sister to monocots (Nickrent and Soltis 2006; Ramírez-Barahona et al. 2020). Each of these topologies was previously inferred from organelles (chloroplast and mitochondrial loci) and/or nuclear datasets, with variable levels of support depending on the analytical method and taxonomic sampling used.
In parallel, recently published studies challenge our understanding of the early evolution of angiosperms. Reconstruction of the ancestral angiosperm genome shows a reduction in the number of chromosomes between the divergence of the putative sister taxon of all remaining angiosperms, the iconic Amborella trichopoda, and the most recent common ancestor (MRCA) of eudicots (Murat et al. 2017). In addition to this assessment of early angiosperm genomic features, Sauquet et al. (2017) suggested that the early evolution of angiosperm flowers was marked by successive reductions in the number of whorls of both the


perianth and the androecium. In both cases, disparate reductions may have paved the way for the evolution of clade-specific features of genomes and flower morphology in contemporary clades. Since the Arabidopsis Genome Initiative (2000), the number of sequenced eudicot and monocot genomes has steadily increased. Despite this increase in sequencing effort, basal angiosperm diversity represented by the ancient lineages of Nymphaeales, Austrobaileyales, Chloranthales, and magnoliids has mostly been overlooked, the most notable exception being Amborella (Amborella Genome Project 2013). Following eudicots and monocots, Magnoliidae are the most diverse clade of angiosperms (Massoni et al. 2014) with 9000–10,000 species, distributed across four orders (Canellales, Piperales, Laurales, and Magnoliales). Despite this diversity and the significant economic value of many species (e.g. avocado, black pepper, cinnamon, soursop), only two genomes have been published to date (Chaw et al. 2019; Chen et al. 2019). Contrary to expectations, comparative genomic analyses did not resolve the still unclear relationships of magnoliids with the rest of angiosperms (Strijk et al. 2019b). Results strongly disagreed on the position of magnoliids, supporting either a sister relationship to eudicots and monocots (Chen et al. 2019), or to eudicots alone (Chaw et al. 2019). This disagreement could be an analytical artefact caused by whole genome duplication (WGD) because chromosomal rearrangements are apparent in both Liriodendron chinense (Magnoliaceae) and Cinnamomum kanehirae (Lauraceae) after their divergence from monocots or eudicots. More magnoliid genomes and critical assessment of the potential for such phenomena to impact phylogenetic inference are needed to break the impasse.
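The three candidate arrangements of magnoliids, monocots, and eudicots summarized earlier in this section can be made concrete with a small sketch. The Python below writes each arrangement as a Newick string and reports which clade(s) sit sister to the magnoliids. The parser is a minimal illustrative helper that handles names and parentheses only (no branch lengths or support values), and all function and variable names are assumptions introduced here rather than part of any cited analysis.

```python
# Candidate rooted topologies, numbered as in the text.
topologies = {
    1: "(magnoliids,(eudicots,monocots));",  # sister to eudicots + monocots
    2: "((magnoliids,eudicots),monocots);",  # sister to eudicots
    3: "((magnoliids,monocots),eudicots);",  # sister to monocots
}


def parse_newick(text):
    """Parse a minimal Newick string into nested tuples of leaf names."""
    s = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1  # consume "("
            children = [node()]
            while s[pos] == ",":
                pos += 1  # consume ","
                children.append(node())
            pos += 1  # consume ")"
            return tuple(children)
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]

    return node()


def leaves(tree):
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return {tree}
    found = set()
    for child in tree:
        found |= leaves(child)
    return found


def sister_of(tree, taxon):
    """Return the leaves of the sister group of `taxon`, or None if absent."""
    if isinstance(tree, str):
        return None
    for i, child in enumerate(tree):
        if child == taxon:
            sister = set()
            for j, other in enumerate(tree):
                if j != i:
                    sister |= leaves(other)
            return sister
    for child in tree:
        result = sister_of(child, taxon)
        if result is not None:
            return result
    return None


for number, newick in topologies.items():
    sister = sorted(sister_of(parse_newick(newick), "magnoliids"))
    print(f"({number}) {newick} magnoliids sister to: {' + '.join(sister)}")
```

In these terms, the two published magnoliid genome studies cited above correspond to arrangement (1) (Chen et al. 2019) and arrangement (2) (Chaw et al. 2019).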

9.1.2 The Custard Apple Family (Annonaceae) and Pomology The custard apple family or Annonaceae (Rainer and Chatrou 2014) are the second most species-rich family in the magnoliids (Chatrou et al. 2012) with nearly 2500 species known to science (Neotropics: ~950; Africa: ~450; Indomalaya: ~950). Annonaceae are frequent components of (sub-)tropical rainforests worldwide (Gentry 1993; Tchouto et al. 2006; Punyasena et al. 2008; Sonké and Couvreur 2014) and are important structural components of the forest. They occur primarily as scrambling shrubs, lianas, or medium to emergent trees. As of 2020, the conservation status of over 800 species of Annonaceae has been assessed using IUCN Red List criteria, and more than 40% are currently in one of the three threatened categories (IUCN 2020). The major threats to members of the family come from (small-scale local) farming and urbanization. Many species in the family have traditionally been used by local people for their structural, nutritional, or beneficial health properties. For example, species in the genus Annona are used in a wide array of applications due to a rich collection of bioactive compounds present in roots, leaves, bark, fruits, and seeds (for a complete overview, see Pinto et al. 2006 and references therein). Another well-known commercial example from Asia is ylang-ylang (Cananga odorata (Lam.) Hook.f. & Thomson; grown for its aromatic oils for use in perfumes). More locally, petals of some Neotropical Cymbopetalum species have been used in Mesoamerica since ancient times until today to flavour chocolate, dishes, and drinks (Popenoe 1919), whilst leaves of Uvariopsis tripetala (Bak.f.) G.E. Schatz (Pepperfruit) are popular to spice meat dishes and to treat ailments in western Africa (Aniama et al. 2016). Parts of some species (leaf, bark) are used as dyes and colour agents (Annona muricata L. (Pinto et al. 2005); Hubera nitidissima (Dunal) Chaowasku (Toussirot et al. 2014)), whilst Lancewood (Oxandra lanceolata (Sw.) Baill.) is grown commercially for its favourable combination of wood properties (elastic, whilst very durable). Species of the genera Annona, Asimina, and Uvaria are well known and used for edible fruits, medicinal, and/or pharmaceutical properties. Phylogenetic studies began using extensive character-based analyses (Koek-Noorman et al. 1990; Doyle and LeThomas 1994, 1996) but


have since moved to molecular marker-based studies (Sauquet et al. 2003; Chatrou et al. 2012; Thomas et al. 2012; Pirie et al. 2006) and more recently to genome-wide analyses of divergence (Couvreur et al. 2019). Two clades differing in overall branch length patterns (first defined by Richardson et al. 2004), coined the long branch and short branch clades, divide the family. Taxa contained within differ markedly in levels of genetic divergence and, following work by Chatrou et al. (2012), the family is now taxonomically subdivided into the four subfamilies Anaxagoreoideae, Ambavioideae, Annonoideae, and Malmeoideae. As of 2020, a fully resolved phylogeny of the genus Annona is not yet available, but a new EU-funded initiative is underway to construct a global evolutionary framework for the family as a whole (GLOBAL; http://www.couvreurlab.org/erc-global.html). The genus Annona is the second largest in the family and consists of ~180–200 species, most of which originated in the Neotropics and Caribbean region, with the exception of some species (e.g. A. senegalensis) that are of African origin (Pinto et al. 2005). Most species in the genus Annona are (semi-)deciduous, even in tropical areas, and, depending on the local microclimate, can respond to dry periods with several leaf flushes per year. Currently, seven species of Annona and one hybrid species are grown commercially (Annona cherimola Mill; A. diversifolia Saff.; A. montana Macfad.; A. muricata L.; A. reticulata L.; A. senegalensis Pers.; A. squamosa L. and A. atemoya Mabb. (=A. squamosa × A. cherimola)). These are the only species that are under commercial cultivation (or development) and which hold immediate potential economic promise. However, there are other species within Annona that have drawn attention and many more that have so far not been studied at all. Examples of the former are A. purpurea Moc. and Sessé (Gauthier and Poole 2003), A. scleroderma Saff. (Uphof 1959), both from Mesoamerica, and A. crassiflora Mart. from Brazil (Almeida et al. 1998).

J. S. Strijk et al.

In terms of economic value as well as advanced production practices and crop yields, the most important species are A. cherimola, A. muricata, and A. squamosa. These three species are almost exclusively sold for consumption but have so far not expanded beyond regional and national markets (Pinto et al. 2005). The major limitations remain biological (sensitivity of the crop in environments outside its natural range), technological (poorly developed agribusiness and cultivation framework), and scientific (lack of facilities and finance for crop improvement research). Soursop (A. muricata L.) originated in the Neotropics and is now cultivated throughout the tropics. It typically occurs as a small evergreen tree, although it can be semi-deciduous in certain regions. Branches are hairy when young; leaves are oblong to oval, with a glossy green surface (Fig. 9.1a); and flowers are simple, with green sepals and thick yellowish petals (Fig. 9.1b–c). Fruits are ovoid, dark green (but lighter when ripe), and tuberculate (Fig. 9.1d), and can be up to 30 cm long, with numerous pointed protuberances on the outer skin. Soursop fruits are the largest in the genus Annona and can weigh up to 10 kg. The flesh is juicy, acidic, whitish, and aromatic, with a flavour resembling a mix of pineapple and mango (Pinto et al. 2005). Phytochemical studies have shown the flesh to contain significant amounts of vitamins (e.g. vitamins C, B1, and B2), but also the neurotoxin annonacin. Owing to their pharmacological properties, fruits, leaves, and seeds have long been used to treat a wide range of ailments (effects purported in the literature include anti-microbial, -leishmanial, -hyperglycaemic, -parasitic, -inflammatory, -neuralgic, and -rheumatic activity) (Pinto et al. 2005). More recently, research has turned to investigating the potential of compounds extracted from different plant parts (bark, wood, pulp, and leaves) to treat specific cancer cell lines (Liu et al. 2016; Najmuddin et al. 2016; Artika et al. 2017; Gavamukulya et al. 2017).

9

The Soursop Genome (Annona muricata L., Annonaceae)

Fig. 9.1 Description of Annona muricata and its genomic landscape. Top: a leaves; b mature flower; c mature fruit. Bottom: Circular representation of the chromosome organization of Annona muricata, with genomic features indicated from outer to inner layers in sequence windows of 200 kb; d structural organization of the chromosomes arranged by size, indicated in Mb; e loci density from Couvreur et al. 2019; f GC deviation; g GC content (percentage); h gene breadth (i.e. the percentage of the sequence window occupied by coding regions) heatmap; i gene density (i.e. the number of genes found in one sequence window) histogram; j TE protein breadth heatmap; k TE protein density histogram; l transposon breadth heatmap; m transposon density histogram. In (i), (k), and (m), values above and below the mean are indicated in green and red, respectively. Taken from Strijk et al. (2021) under the terms of the Creative Commons Attribution Licence

9.2 Research Scope and Methodological Approach

Our sequencing and analytical strategy centred on three main goals: (1) to undertake comparative intergenomic analyses in magnoliids and reconstruct the relationships amongst the three major lineages of angiosperms; (2) to explore gene tree incongruence patterns during early angiosperm evolution; and (3) to establish a high-quality genome resource for Annona crop improvement and stimulate further development of crop resources in the pantropical Annonaceae family.

9.2.1 Genomic DNA Extraction, Illumina Sequencing, and Genome Size Estimation

Fresh leaves were collected from one individual of A. muricata in the living collections of Xishuangbanna Tropical Botanical Garden (Menglun, China) and frozen on site. We extracted high-quality genomic DNA from freshly frozen leaf tissue using the Tiangen Plant Genomic DNA Kit. Following purification, we constructed a short-insert library (300–350 bp) and sequenced it on the Illumina HiSeq 2500 platform (Illumina Inc., San Diego, CA, USA), resulting in a total of ~65.47 Gb of raw data. Adapters were removed from the raw reads, and we screened for reads of non-nuclear origin (e.g. chloroplast, mitochondrial, bacterial, and viral sequences) using megablast v2.2.26 against the NCBI nr database (http://www.ncbi.nlm.nih.gov). Using a previously published script (Strijk et al. 2019a), we removed duplicated read pairs and filtered low-quality reads, resulting in ~65 Gb of clean data available for genome size estimation (Strijk et al. 2021). Using the formula “genome size = (total number of 17-mers)/(position of peak depth)”, we obtained a size estimate of 799.11 Mb. Finally, we built an additional library (250 bp insert size), sequenced it as above, and combined it with the 350 bp library to generate approximately 900 million reads, providing a first estimate of the GC content, heterozygosity rate, and repeat content of soursop (Strijk et al. 2021).
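The k-mer formula quoted above can be illustrated with a minimal sketch. The function name, the `min_depth` cut-off used to skip low-depth (likely erroneous) k-mers, and the toy histogram are all assumptions made for illustration; they are not taken from the script used in the study.

```python
def estimate_genome_size(kmer_histogram, min_depth=5):
    """Estimate genome size from a k-mer depth histogram.

    kmer_histogram maps depth -> number of distinct k-mers observed at that depth.
    Depths below min_depth are ignored (assumed here to be sequencing errors).
    Returns (estimated genome size in k-mers, peak depth).
    """
    kept = {d: c for d, c in kmer_histogram.items() if d >= min_depth}
    # Total number of k-mer occurrences = sum of depth * count
    total_kmers = sum(depth * count for depth, count in kept.items())
    # Position of the main peak = depth with the largest number of distinct k-mers
    peak_depth = max(kept, key=kept.get)
    return total_kmers / peak_depth, peak_depth


# Toy histogram: an error peak at low depth and a main peak centred on depth 30
toy_histogram = {1: 5000, 2: 800, 28: 200, 29: 600, 30: 1000, 31: 600, 32: 200}
size, peak = estimate_genome_size(toy_histogram)
print(f"peak depth = {peak}, estimated size = {size:.0f} k-mers")
```

In this toy case the estimate equals the number of distinct k-mers under the main peak, which is what the formula is meant to recover for a largely non-repetitive genome.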

9.2.2 Library Preparation and Sequencing for PacBio, 10X Genomics and BioNano

A 20 kb insert size PacBio library was built as previously described (Strijk et al. 2019a, 2021) and sequenced on the PacBio RS II platform (Pacific Biosciences, Menlo Park, CA, USA). Sample preparation, indexing, and barcoding were performed using a GemCode Instrument (10X Genomics, Pleasanton, CA, USA), using very high-molecular-weight DNA (>50 kb). DNA was sheared into 500 bp fragments for library construction and then sequenced on an Illumina HiSeq X platform (Illumina Inc., San Diego, CA, USA). Using the Irys platform (BioNano Genomics, San Diego, CA, USA), we also constructed a BioNano optical map from the same DNA, generating 95.9 Gb of data (Strijk et al. 2021).

9.2.3 De novo Genome Assembly, 10X and Optical Scaffolding

We used ALLPATHS-LG (Gnerre et al. 2011) to obtain a preliminary assembly of A. muricata (scaffold N50 of 19.91 kb and contig N50 of 8.26 kb), applying PBjelly (English et al. 2012) to fill gaps with PacBio data (see Strijk et al. 2021 for analytical settings applied). For the input BAM file, we used BWA (Li and Durbin 2009) to align all the Illumina short reads to the assembly and SAMtools to sort and index the alignments, creating a second assembly with a contig N50 of ~700 kb. We then used fragScaff (Adey et al. 2014) with default settings to generate scaffolds from this assembly using the optical map (95.9 Gb, 120.01x) and 10X Genomics data (180.04 Gb, 225.30x). We assessed the quality of the soursop assembly by re-mapping the Illumina reads (Strijk et al. 2021).

9.2.4 Hi-C Scaffolding

Two Hi-C libraries were constructed and sequenced on an Illumina NovaSeq platform (PE 150 bp), following the analytical protocol outlined elsewhere (Strijk et al. 2021).

9.2.5 Repeat Sequence Detection

Identification of transposable elements (TEs) in the assembly (at DNA and protein levels) was done using RepeatModeler (Smit and Hubley 2008) to develop a de novo transposable element library. This was followed by applying RepeatMasker (Smit et al. 2017) for DNA-level identification using Repbase and the new de novo transposable element library. At the protein level, RepeatProteinMask was used to conduct WU-BLASTX searches (Camacho et al. 2009) against the TE protein database, merging overlapping elements belonging to the same repeat type. tRNA genes were identified using tRNAscan-SE (Lowe and Eddy 1996) with eukaryote parameters. rRNA fragments were predicted by alignment with Arabidopsis thaliana and Oryza sativa template rRNA sequences, using BlastN (Camacho et al. 2009). miRNA and snRNA genes were predicted using INFERNAL (Nawrocki and Eddy 2013) by mining the Rfam database (Nawrocki et al. 2015).

9.2.6 RNA Sequencing and Transcriptome Assembly

Total RNA was extracted from leaves, flowers, bark, and fruits (Table 9.1) using the RNAprep Pure Plant Kit, removing DNA contamination with RNase-Free DNase I (Tiangen, China). RNA integrity was evaluated on a 1% agarose gel, and quality and quantity were assessed with a NanoPhotometer spectrophotometer (IMPLEN, Munich, Germany) and an Agilent 2100 Bioanalyzer (Agilent Technologies, Waldbronn, Germany). RNA-Seq libraries were constructed using the NEBNext mRNA Library Prep Master Mix Set for Illumina (New England Biolabs, Beverly, MA,

Table 9.1 Sequencing strategy and statistics for the A. muricata genome assembly and annotation

Step                    Technology      Tissue                                  Insert size  Bases generated (Gb)  Sequence coverage (X)
Genome assembly         Illumina reads  Leaves                                  250 bp       65.96                 82.54
Genome assembly         Illumina reads  Leaves                                  350 bp       65.47                 81.93
Genome assembly         PacBio reads    Leaves                                  20 kb        36.95                 46.24
Genome assembly         10X             Leaves                                  -            180.04                225.3
Genome assembly         Bionano         Leaves                                  -            95.9                  120.01
Genome assembly         Total           -                                       -            444.32                556.02
Chromosome scaffolding  Hi-C            Leaves                                  N.A          66.17                 N.A
Genome annotation       Illumina reads  Flowers (several developmental stages)  350 bp       5.52                  N.A
Genome annotation       Illumina reads  Young fruit                             350 bp       9.93                  N.A
Genome annotation       Illumina reads  Ripening fruit                          350 bp       5.73                  N.A
Genome annotation       Illumina reads  Bark                                    350 bp       4.80                  N.A
Genome annotation       Illumina reads  Leaves                                  350 bp       5.04                  N.A
Genome annotation       Total           -                                       -            25.51                 -
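The sequence coverage column of Table 9.1 appears to be the amount of data divided by the 799.11 Mb k-mer genome size estimate from Sect. 9.2.1. This relationship is inferred from the numbers rather than stated in the text, so the sketch below should be read as a consistency check under that assumption; the variable and function names are illustrative.

```python
GENOME_SIZE_MB = 799.11  # 17-mer estimate reported in Sect. 9.2.1

# Bases generated (Gb) for the genome assembly data types, from Table 9.1
BASES_GB = {
    "Illumina 250 bp": 65.96,
    "Illumina 350 bp": 65.47,
    "PacBio": 36.95,
    "10X": 180.04,
    "Bionano": 95.9,
}


def coverage(bases_gb, genome_size_mb=GENOME_SIZE_MB):
    """Fold coverage = sequenced bases / genome size (Gb converted to Mb)."""
    return bases_gb * 1000.0 / genome_size_mb


for name, gigabases in BASES_GB.items():
    print(f"{name}: {coverage(gigabases):.2f}x")

total_gb = sum(BASES_GB.values())
print(f"Total: {total_gb:.2f} Gb, {coverage(total_gb):.2f}x")
```

Under this assumption the computed values agree with the table to two decimal places.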


USA) following details elsewhere (Strijk et al. 2021). PCR products were purified (AMPure XP system, Beckman Coulter Inc., Indianapolis, IN, USA), and library quality was assessed using the Agilent Bioanalyzer 2100 system. Libraries were sequenced on an Illumina HiSeq 2000 platform (Illumina Inc., San Diego, CA, USA) resulting in 100-bp paired-end reads, filtering out those containing undetermined bases (“N”) or excessive numbers of low-quality positions (>10 positions with quality scores [...]

[...] using a repeat-masked genome sequence (Fig. 9.2 and for details, see Strijk et al. 2021):

(1) Structural annotation of protein coding genes and domains was performed by aligning the protein sequences of the soursop against a representative set of angiosperm taxa: Amaranthus hypochondriacus, A. trichopoda, Aquilegia coerulea, A. thaliana, Coffea canephora, Musa acuminata, Nelumbo nuci[...]

[...] 99.92% of the

                 Length                           Number
                 Contig^a (bp)   Scaffold (bp)    Contig^a   Scaffold

Assembly v1 (Illumina + PacBio + 10X + BioNano)
Total            652,885,881     656,774,640      2066       949
Max              4,254,538       20,459,086       -          -
Number ≥ 2000    -               -                1990       873
N50              784,561         3,429,555        250        52
N60              632,116         2,673,626        342        73
N70              483,912         2,112,119        459        101
N80              346,983         1,573,287        618        137
N90              207,456         964,101          856        189

Assembly v2 (Assembly v1 + Hi-C)
Total            652,885,881     656,813,740      2262       755
Max              4,254,538       122,620,176      -          -
Number ≥ 2000    -               -                2186       679
N50              743,350         93,205,713       264        3
N60              578,736         89,409,058       364        4
N70              451,341         85,026,703       492        5
N80              320,782         69,840,041       665        6
N90              184,498         60,483,854       929        7

^a Contig after scaffolding


genome. Overall, 99.81% of the genome was covered at a depth >20, which indicates high accuracy of the assembly for SNP detection. SNP calling on the final assembly yielded a heterozygosity rate of 0.032%, lower than the 0.08% estimated by the K-mer analysis (see Fig. S1 in Strijk et al. 2021). A “v2 assembly” (obtained by using Hi-C scaffolding to improve the v1 assembly) was then created to produce a chromosome-level assembly for the soursop (see Tables 9.2 and 9.3 for assembly statistics and genome information). Final properties of the soursop genome assembly are summarized in Tables 9.2 and 9.4 (see also Strijk et al. 2019b, 2021).

9.3.2 Repeat Sequences in the Soursop Genome

Repeats accounted for 54.87% of the genome, an intermediate value compared to Cinnamomum (Lauraceae, 48%) and Liriodendron (Magnoliaceae, 63.81%). Long terminal repeat (LTR) retrotransposons were the most abundant TEs, representing 41.28% of the genome (56.25% in Liriodendron), followed by DNA repeats (7.29%) (Fig. 9.3b, Table 9.5). The Cinnamomum genome exhibited a different balance between types, with LTR (25.53%) and DNA transposable elements (12.67%) being less dominant. No significant recent accumulation of LTRs and LINEs was found in the interspersed repeat landscape, but a concordant accumulation around 40 units was detected (Fig. 9.3c). Assuming a substitution rate similar to that of Liriodendron (1.51 × 10⁻⁹ subst./site/year), we estimate this burst of transposable elements to have occurred 130–150 Ma. By far the main contribution to this old expansion of repeat copy numbers came from the LTRs, with an increase of up to approximately 1% at 42 units. We identified 1201 microRNA, 560 transfer RNA (tRNA), 315 ribosomal RNA (rRNA), and 3198 small nuclear RNA (snRNA) genes (Table 9.6).
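The conversion from divergence units to absolute time is not spelled out in the text. A common convention, assumed here, is T = K / (2r), with K the Kimura substitution level expressed as a fraction and r the substitution rate per site per year. The sketch below applies that assumption to the values quoted above; the function name is illustrative.

```python
SUBSTITUTION_RATE = 1.51e-9  # subst./site/year, the Liriodendron rate quoted in the text


def divergence_to_age_ma(kimura_percent, rate=SUBSTITUTION_RATE):
    """Convert a Kimura substitution level (in %) to an age in million years.

    Assumes T = K / (2 * rate); this formula is an assumption of the sketch,
    not a method stated in the chapter.
    """
    k_fraction = kimura_percent / 100.0
    return k_fraction / (2.0 * rate) / 1e6


for level in (40, 42, 45):
    print(f"{level} units -> {divergence_to_age_ma(level):.1f} Ma")
```

With this convention, divergence levels of 40 to 45 units fall inside the 130–150 Ma window given in the text.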

Table 9.3 Estimates of genome size and properties

Step                                    Value
Genome size                             799.11 Mb
Genome heterozygosity                   0.08%
Repeat content                          59.76%
                                        36.20%
GC content (350 bp)                     35.46%
GC content (250 bp)                     37.64%
GenomeScope                             654 Mb
Assembly size (Illumina + SOAPdenovo2)  595.5 Mb
Contig N50                              8,258 bp
Scaffold N50                            19,908 bp (total length 620.3 Mb)

Table 9.4 Chromosome properties of the v2 assembly

Hi-C assembly  Chromosome name  Cluster number  Sequence length (bp)
Hic_asm_0      Amur4            49              89,409,058
Hic_asm_1      Amur1            68              122,620,176
Hic_asm_2      Amur3            57              93,205,713
Hic_asm_3      Amur2            75              118,991,926
Hic_asm_4      Amur7            34              60,483,854
Hic_asm_5      Amur5            62              85,026,703
Hic_asm_6      Amur6            53              69,840,041
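The scaffold N50 to N90 values of the v2 assembly can be related to the chromosome lengths in Table 9.4. The sketch below is a generic Nx calculation, not the authors' script. It assumes that the seven chromosomes are the longest scaffolds and takes the v2 scaffold total of 656,813,740 bp from the assembly statistics table; the function and variable names are illustrative.

```python
def nx(lengths, x=50, total=None):
    """Return (Nx length, number of sequences needed to reach x% of the total).

    lengths: sequence lengths in bp; total: assembly size to use as the
    reference (defaults to the sum of lengths).
    """
    ordered = sorted(lengths, reverse=True)
    if total is None:
        total = sum(ordered)
    threshold = total * x / 100
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("lengths do not reach the requested fraction of the total")


# Chromosome lengths (bp) from Table 9.4
CHROMOSOMES = [89_409_058, 122_620_176, 93_205_713, 118_991_926,
               60_483_854, 85_026_703, 69_840_041]
V2_SCAFFOLD_TOTAL = 656_813_740  # total scaffold length of the v2 assembly

for x in (50, 60, 70, 80, 90):
    length, count = nx(CHROMOSOMES, x, total=V2_SCAFFOLD_TOTAL)
    print(f"N{x}: {length:,} bp reached with {count} scaffolds")
```

Passing the full assembly size as `total` matters here: the seven chromosomes hold most but not all of the assembled sequence, so the thresholds are computed against the whole assembly rather than against the chromosomes alone.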


Fig. 9.3 Properties of the soursop genome and comparative analyses with other angiosperms. a Effective population size history inferred by the PSMC method (black line), with one hundred bootstraps shown (red lines). b Distribution of repeat classes in the soursop genome. c Divergence distribution of transposable elements in the genome of Annona muricata. Both Kimura substitution level (CpG adjusted) and absolute time are given. d Venn diagram of shared orthologous gene families in Amborella trichopoda, Arabidopsis thaliana, Nelumbo nucifera, Oryza sativa, and Annona muricata, based on the presence of a representative gene in at least one of the grouped species. Numbers of clusters are provided in the intersections. e Coalescent tree of the dataset comprising the three magnoliid genomes plus Amborella and representatives of eudicots (Arabidopsis) and monocots (Oryza), based on 1578 orthologs and ASTRAL-III reconstruction. The numbers of gene family changes reconstructed with CAFE are shown on the branches (green: expansion; red: contraction; blue: rapid changes). Major annotations experiencing rapid expansion on magnoliid branches are pictured (see main text for details). f Number of families (vertical axis) according to the number of orthologs (horizontal axis) found in the genome of Annona (red), Liriodendron (green), and Cinnamomum (blue) for families containing one single ortholog in Amborella. g Number of families (vertical axis) according to the number of orthologs (horizontal axis) found in the genome of Annona (red) and Cinnamomum (blue) for families containing two orthologs in Liriodendron. h Number of families (vertical axis) according to the number of orthologs (horizontal axis) found in the genome of Annona (red) and Liriodendron (green) for families containing one single ortholog in both Amborella and Cinnamomum. Panels (a)–(c) taken from Strijk et al. (2021) under the terms of the Creative Commons Attribution Licence


Table 9.5 Classification of TE content

         De novo + Repbase              TE proteins                   Combined TEs
Type     Length (bp)    % in genome    Length (bp)   % in genome    Length (bp)    % in genome
DNA      45,148,198     6.87           5,297,899     0.81           47,850,849     7.29
LINE     14,675,055     2.23           9,904,133     1.51           20,519,192     3.12
SINE     386,440        0.06           0             0.00           386,440        0.06
LTR      264,918,040    40.33          74,949,732    11.41          271,141,308    41.28
Unknown  36,644,775     5.58           0             0.00           36,644,775     5.58
Total    341,061,737    51.93          89,835,088    13.68          353,504,508    53.82

Table 9.6 Non-coding RNA content of the soursop genome

Type   Copy (w*)  Average length (bp)  Total length (bp)  % of the genome
miRNA  1201       154.69               185,787            0.0283
tRNA   560        75.14                42,078             0.0064
rRNA   315        279.10               87,915             0.0134
snRNA  3198       107.33               343,247            0.0523

Table 9.7 Characteristics of the annotated genes in 10 angiosperm species

Species                     Number  Average gene length (bp)  Average CDS length (bp)  Average exons per gene  Average exon length (bp)  Average intron length (bp)
Annona muricata             23,375  5208.31                   1109.49                  4.79                    231.72                    1082.04
Amaranthus hypochondriacus  23,033  4422.82                   1173.56                  5.15                    227.95                    783.29
Aquilegia coerulea          30,023  2833.22                   1141.6                   4.66                    245.19                    462.69
Arabidopsis thaliana        35,386  1947.90                   1230.62                  5.57                    220.87                    156.90
Musa acuminata              36,519  3593.53                   1038.4                   5.41                    191.94                    579.39
Amborella trichopoda        27,313  5607.06                   944.87                   4.06                    232.72                    1523.51
Coffea canephora            25,574  3188.40                   1205.55                  5.10                    236.22                    483.20
Oryza sativa                35,679  2165.58                   991.55                   3.78                    262.57                    422.87
Vitis vinifera              29,927  4728.63                   1095.81                  4.75                    230.72                    968.88
Nelumbo nucifera            24,613  10,542.46                 1334.94                  5.50                    242.68                    2045.70


Table 9.8 Overview of annotated genes per database

#Database      Annotated number  Annotated per cent (%)
NR             19,412            83.0
Swiss-Prot     16,237            69.5
KEGG           15,190            65.0
Interpro-All   22,639            96.9
Interpro-Pfam  15,762            67.4
Interpro-GO    20,595            88.1
Annotated      22,769            97.4
Total          23,375            -

9.3.3 Genes Involved in Soursop Defence and Disease Resistance

We identified 23,375 genes, 21,336 of them supported by at least two of the three predictive methods, with an average coding region length of 1.1 kb and 4.79 exons per gene, similar to other angiosperms (Table 9.7). We assessed both the quality of our gene predictions and the completeness of our assembly using BUSCO (Simão et al. 2015) and CEGMA (Parra et al. 2007). We retrieved 231 CEGs (93.15%) and 899 (94%) of the BUSCO single-copy orthologous genes from the soursop assembly. In total, 22,769 genes (97.4%) were annotated through Swiss-Prot and TrEMBL, and GO terms were retrieved for 20,595 genes (88.1%) (Table 9.8). Comparing gene content in Annona with that of C. kanehirae, we found a striking difference in the diversity of resistance genes. Of 387 resistance genes in Cinnamomum, 82% were nucleotide-binding site leucine-rich repeat (NBS-LRR) genes or NBS-LRR genes with a putative coiled-coil domain (CC-NBS-LRR). By contrast, the soursop genome contains a similar number of resistance genes (301 annotations), but only 0.66% (2 genes) of them are NBS-LRR or CC-NBS-LRR genes. These results suggest the presence of different evolutionary strategies within magnoliids with respect to pathogen resistance (Strijk et al. 2019b, 2020). We identified 77 genes putatively under positive selection (p-value < 0.01, FDR < 0.05). We identified the 10 most enriched gene families and retrieved their GO terms. Two families have GO terms, and none of these families have a defined KEGG pathway.

9.3.4 Historical Fluctuations in Population Size of Annona muricata

Our data show that A. muricata exhibits heterozygous and homozygous SNP ratios of 0.0032% and 0.0001%, respectively. Such a low level of heterozygosity is usually found in cultivated species that experienced strong bottlenecks during domestication (Eyre-Walker et al. 1998; Doebley et al. 2006; Zhu et al. 2007). Our PSMC analysis showed that this was not the result of an intense, recent decrease in population size, but rather of a slow, continuous reduction over a protracted period (Fig. 9.3a). This contraction is compatible with the Quaternary shrinking of tropical regions in several parts of the world and suggests that the soursop may have been severely affected by climate change, resembling patterns found in many other tropical taxa (Barlow et al. 2018). Although not caused by an anthropogenic domestication-induced bottleneck, the very low level of heterozygosity in soursop could complicate efforts at genetic improvement of the crop, which would likely require outcrossing with wild relatives (Zamir 2001).


Fig. 9.4 Patterns of incongruence in the phylogenetic signal in the genome of Annona muricata. Top: a Left panel: using 689 orthologous loci found in 12 angiosperm species. Right panel: using 2426 orthologous loci found in 4 species representative of major clades in angiosperms. For each of the topologies shown in the centre panel, the genes supporting this topology are sorted by the support of the nodes indicated by stars. The GO terms associated with each category of support for each topology are indicated as pie charts below the histogram. Background GO term distributions for the 689 loci (left), the 2426 loci (right), and the total annotated genes in A. muricata (centre) are shown below the graph. The topologies supported by the concatenated 689 loci, by the coalescent analysis of 689 loci, and by both the concatenated and coalescent 2426 loci are highlighted in solid blue, dashed blue, and solid pink, respectively. Bottom: b summary of the supportive evidence for each tested topology. N.S.: non-significant

9.3.5 Mapping of Annona Genes from Hybridization Capture Analyses

Loci obtained using targeted enrichment of nuclear genes in the study of Couvreur et al. (2019) were mapped to the v2 assembly, and we superimposed their position and density onto the circular chromosome map (Fig. 9.1e) using Circos 0.69-9 (Krzywinski et al. 2009). A total of 2328 regions with gene coverage higher than 30 were identified across the genome, and mapping was significantly lower in regions with high numbers of repeat sequences (Strijk et al. 2019b, 2021).
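Densities of mapped loci such as those in Fig. 9.1e are typically obtained by counting features per fixed-size window along each chromosome. The sketch below shows one way to do that counting with the 200 kb window size given in the caption of Fig. 9.1. The function name, the 0-based coordinate convention, and the example positions are assumptions for illustration and do not reproduce the authors' pipeline or the Circos input format.

```python
def window_density(positions, chrom_length, window=200_000):
    """Count features per non-overlapping window along one chromosome.

    positions: 0-based start coordinates of mapped loci (hypothetical input).
    Returns a list with one count per window; the last window may be shorter.
    """
    n_windows = (chrom_length + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    for pos in positions:
        if 0 <= pos < chrom_length:  # ignore coordinates outside the chromosome
            counts[pos // window] += 1
    return counts


# Toy example: five loci on a 1 Mb chromosome
example_positions = [0, 150_000, 200_000, 450_000, 999_999]
print(window_density(example_positions, chrom_length=1_000_000))
```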

9.3.6 Coalescent Phylogenomics in Annonaceae and Early Angiosperms

To infer phylogenetic relationships amongst major clades of angiosperms, we compared the soursop genome with the genomes of eleven other species. We included A. trichopoda, the putative sister lineage to all other extant angiosperms, selected representatives from monocots (O. sativa, M. acuminata), and key lineages of eudicots, including Ranunculales (A. coerulea), Proteales (N. nucifera), superrosids (V. vinifera, Q. robur, A. thaliana), and superasterids (A. hypochondriacus, H. annuus, C. canephora). We used all-against-all protein sequence similarity searches with OrthoMCL to identify 398,668 orthologs with at least one representative in angiosperms. Of these, 672 were unique to A. muricata, and 8614 were shared with the four species selected as representatives of other main lineages of angiosperms (see Fig. 9.3d) (Strijk et al. 2019b). To investigate patterns of incongruence and support in the reconstructed phylogeny of the main angiosperm lineages, we focussed on testing three main hypotheses and corresponding topologies: (1) magnoliids sister to eudicots + monocots [(Annona, (eudicots, monocots)), hereafter referred to as SAT]; (2) magnoliids sister to eudicots [(monocots, (Annona, eudicots)), SMT]; and (3) magnoliids sister to monocots [(eudicots, (Annona, monocots)), SET]. We assessed conflicting signals in the genome of the soursop (Fig. 9.4), using three datasets. Firstly, we used a set of 689 orthologs identified in the 12 species for comparative analyses as described above, maximizing the number of species included. Secondly, we used a set of 2426 orthologs identified in the quartet (A. muricata, A. thaliana, A. trichopoda, and O. sativa), maximizing the number of loci. Finally, a set of 1578 orthologs identified in the previous quartet, plus C. kanehirae and L. chinense, was used to maximize the magnoliid representation. Using the set of 2426 orthologs and single-gene ML phylogenetic reconstructions, we show that the SMT topology found using Cinnamomum as a representative of magnoliids is supported by only ~16.8% of the genes (29) with a clear evolutionary signal (SH-like values > 0.95 for one topology) (SET: 16.27%, 28 genes; SAT: 66.9%, 115 genes). When genes with a slightly weaker phylogenetic signal are taken into account (SH-like values > 0.70), the majority (54.3%) of the 1228 genes supported SAT (SET: 23%; SMT: 22.5%).
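The per-topology percentages above are simple tallies of gene trees whose best topology exceeds a support threshold. A minimal sketch of such a tally follows. The input format (one topology label and one SH-like value per gene), the function name, and the synthetic support values are assumptions; only the gene counts (115, 29, and 28) come from the text.

```python
from collections import Counter


def topology_support(gene_trees, min_support):
    """Tally gene trees by supported topology above a support threshold.

    gene_trees: iterable of (topology_label, sh_like_support) pairs.
    Returns {topology: (gene count, percentage of retained genes)}.
    """
    counts = Counter(topo for topo, support in gene_trees if support > min_support)
    retained = sum(counts.values())
    if retained == 0:
        return {}
    return {topo: (n, 100.0 * n / retained) for topo, n in counts.items()}


# Synthetic gene trees reproducing the counts reported for SH-like > 0.95,
# plus some weakly supported trees that the threshold should exclude
gene_trees = ([("SAT", 0.99)] * 115 + [("SMT", 0.97)] * 29 +
              [("SET", 0.96)] * 28 + [("SAT", 0.60)] * 10)

for topo, (n, pct) in sorted(topology_support(gene_trees, 0.95).items()):
    print(f"{topo}: {n} genes ({pct:.2f}%)")
```

The percentages obtained this way agree with those quoted in the text to within rounding of the last digit.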
Assuming gene tree differences are the result of coalescent stochasticity, we performed coalescence-based analyses (STAR, NJst, MP-EST, ASTRAL-III) of the three datasets to reconstruct a species tree. Using 689 loci (12 taxa), all three coalescent methods (STAR, NJst, MP-EST) retrieved a SET topology, whereas both the 2426 loci (4 representative taxa) and the 1578 loci dataset (4 representative taxa + 2 published magnoliids) retrieved a SAT topology. Interestingly, in both the 689 and 2426 loci datasets, the branch supporting the position of Annona in the trio Annona-monocots-eudicots is very short, suggesting that divergence of the major clades occurred within a short time frame (Strijk et al. 2019b). The rise of angiosperms during the Cretaceous was likely triggered by their interaction with pollinators and the early onset of morphological adaptations in the group (e.g. reproductive and vegetative parts), suggesting that the underlying molecular functions (such as those linked to flowering processes or adaptation to insect herbivory) could have quickly diversified in response and could thus reflect diversification patterns and ecological adaptation, instead of phylogeny. To identify potential functions triggering bias in the evolutionary signal contained in the soursop genome, we compared the GO term annotations for molecular functions in the genes supporting the different topologies (SH-like support > 0.95), highlighting different trends in functional annotations according to the topology they supported (P-value < 0.001, Friedman rank sum test). Briefly, differences amongst topologies were significant only for catalytic activity (p < 0.005, Kruskal–Wallis rank sum test) and receptor activity (p < 0.05). We found that the “receptor activity” genes were over-represented (p < 0.005) in the SMT topology (when considering genes giving topologies without support (0 < SH < 50) relative to the background). To investigate the genomic landscape of incongruent phylogenetic signals, we compared scaffold features in terms of ortholog number and density (Fig. 9.4) for the 2426 loci dataset.
The 181 scaffolds containing orthologs (19%) included an average of ~22 orthologous coding regions (max = 799, found in scaffold_1), with a density of 1.33 × 10⁻² orthologs per kb. No correlation was found between the length of the scaffold and the proportion of genes supporting a given topology, which excludes a potential bias towards a given topology during assembly. However, the median proportion of genes in a scaffold supporting the SAT was close to 23%, compared with 4% for both the SMT and SET. For any given topology, density was one or two orders of magnitude lower, with the genes supporting the SET, SMT, and SAT hypotheses found at densities of 5.09 × 10⁻⁴, 2.13 × 10⁻⁴, and 6.34 × 10⁻⁴ genes per kb, respectively. Compared to the density of all orthologs found in each scaffold (i.e. background density), the highest relative density was found in the genes supporting the SAT (¼ of the background density), followed by the SMT and SET hypotheses, with 13.9% and 10.6%, respectively (Strijk et al. 2019b). We generated heatmaps of the occurrences of orthologs and showed that the distribution of orthologs is uneven across scaffolds: scaffolds rich in orthologs supporting a given topology did not contain a significant number of orthologs supporting a conflicting topology. Most of the scaffolds contained few orthologs supporting a given topology, with only a few scaffolds showing a high density of topology-supporting orthologs. Altogether, our results strongly support the magnoliids as sister to a clade containing (eudicots + monocots), i.e. the SAT topology (Fig. 9.4).

9.3.7 Gene Family Expansion in Annona muricata

We compared gene content in Annona with that found in C. kanehirae and found a striking difference in the diversity of resistance genes. Of 387 resistance genes in Cinnamomum, 82% were nucleotide-binding site leucine-rich repeat (NBS-LRR) genes or NBS-LRR genes with a putative coiled-coil domain (CC-NBS-LRR). By contrast, the soursop genome contains a similar number of resistance genes (301 annotations), but only 0.66% (2 genes) of them are NBS-LRR or CC-NBS-LRR genes. These results suggest the presence of different evolutionary strategies within magnoliids with respect to pathogen resistance. We explored the expansion of gene families in magnoliid lineages by adding Cinnamomum and Liriodendron to the quartet (A. muricata, A. thaliana, A. trichopoda, and O. sativa) (Fig. 9.3e). GO terms from annotations of these gene families show that the lineage of Annona experienced a fast expansion of the MAD1 protein family (+6 copies), involved in flowering time (GO:0009908) and cold adaptation (GO:0009409), and XXX metabolism through mitochondrial fission (GO:0000266), regulation of transcription (GO:0006383, GO:0006366, GO:0045892), and organism development (GO:0007275). Half of the expanded gene families with annotations in the branch of Magnoliales (Liriodendron, Annona) were involved in disease or pathogen resistance (Strijk et al. 2019b, 2020). By contrast, gene families experiencing fast expansion on the magnoliid branch ((Liriodendron, Annona), Cinnamomum) were mainly involved in growth functions, for example, cell wall biogenesis (GO:0042546), membrane fission (GO:0090148), metabolism of peptides (GO:0006518) and proteins (GO:0046777), or mitotic cytokinesis (GO:0000281). Notably, gene family expansion is consistently lower along internal branches (approximately 1/10th of that found on their sister branches) (Strijk et al. 2019b, 2020).

9.3.8 Evolutionary Incongruence and WGD during Early Angiosperm Divergence

Whole genome duplications (WGD) are suspected to be a significant factor in the rapid diversification of angiosperms (Tank et al. 2015; Vamosi et al. 2018). Neither the Ks distribution of the soursop paralogs nor synteny analysis using i-ADHoRe 3.0 and SynMap as implemented in CoGe (Lyons and Freeling 2008; Lyons et al. 2008a, b) revealed any obvious pattern of recent tandem or whole genome duplication. However, an old duplication (around 1.5 Ks units) was found in the soursop genome. This contrasts with recent studies in magnoliids (Chaw et al. 2019; Chen et al. 2019), where the authors found WGD with lower Ks values (thus potentially more recent) and hypothesized them to be shared in magnoliids and thus older than the divergence between Lauraceae and Magnoliaceae. To assess whether the events identified in Cinnamomum and Liriodendron correspond to a WGD shared by magnoliids or to independent events, we compared the syntenic graph of Arabidopsis and each of the magnoliid genomes. In each of the latter, we found evidence of duplicated syntenic blocks (stronger in Liriodendron, but inconclusive in Cinnamomum, depending on whether one or two WGD events occurred). We evaluated the distribution of gene copy numbers amongst magnoliids for gene families with only one copy in Amborella (i.e. the number of copies found in each magnoliid for gene families in which Amborella has only one copy). We found that despite a WGD event reported in Liriodendron and two in Cinnamomum, only the former displayed two gene copies for the majority of the gene families, with Annona and Cinnamomum holding one gene copy for almost all families (Fig. 9.3f). Conversely, Annona and Cinnamomum contained only one copy of almost all the gene families for which Liriodendron contains two copies (i.e. the gene families that show a WGD signal, Fig. 9.3g). For gene families in which only one copy was found in both Amborella and Cinnamomum, both Annona and Liriodendron showed a similar pattern of mainly unique ortholog families, with about half of the families being duplicated in both species (Fig. 9.3h). However, the ratio of duplicated to unique occurrences in orthogroups was smaller in Annona (0.43) than in Liriodendron (0.58), suggesting a shared ancestral WGD in magnoliids. We used MCscan to detect syntenic regions in magnoliids and Amborella and detected large single-copy portions of the genome in Amborella occurring as duplicates in Liriodendron, as expected from previous studies (Chen et al. 2019) and our results (Strijk et al. 2019b, 2020). More surprisingly, duplicated syntenic regions in Cinnamomum showed evidence for two rounds of WGD (i.e.
four copies of a single Amborella syntenic region), in contradiction with our results above, but according to previous studies. However, after careful evaluation of the synteny graphs, we found limited evidence (i.e.

168

few or shorter duplicated syntenic regions) of WGD in the genome of the soursop compared to Amborella (Strijk et al. 2019b).

To characterize more precisely the ages and distributions of these duplication events, we analyzed the Ks distribution curves for the paranomes (i.e. the complete set of paralogs in a genome) of soursop and the two published magnoliids. We used wgd (Zwaenepoel and Van de Peer 2019) because it has been shown that node-averaged histograms are more accurate than weighted ones for inferring ancient WGD (Tiley et al. 2018). Contrary to other magnoliids, the paranome of the soursop did not show the usual acute peak corresponding to newly duplicated genes that are continuously generated by small-scale duplication events (e.g. tandem duplication), but showed an older small-scale duplication event peak. No realistic Gaussian mixture model (i.e. implying < 4 WGD) was selected by either the BIC or AIC criterion, but the ΔBIC and ΔAIC favoured 2-component models for Annona and Cinnamomum, with a less clear signal in Liriodendron. In addition to the standard Gaussian mixture model (GMM) method, we also used the Bayesian Gaussian mixture model (BGMM) method and confirmed that fitting more complex Gaussian models only resulted in components of negligible weights. Considering two components for Annona and Liriodendron and two to three components for Cinnamomum (as indicated by the results above), the main peak in Annona was found around 1.3–1.5 Ks units, whereas it occurred around 0.8 Ks units (two components) or 0.4 and 1.3–1.4 Ks units (three components) in Cinnamomum [partially comparable, for the two components, with results from a previous study (Chaw et al. 2019), as we did not identify a peak at 0.76]. Notably, the oldest peak in Cinnamomum occurred approximately congruently with the peak in Annona, suggesting a potentially shared WGD. We identified a peak at 0.6 Ks units in Liriodendron, compatible with the location of the youngest peak in Cinnamomum, but younger than previously inferred (Chen et al. 2019).
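The model-selection step described above can be illustrated with a small, self-contained sketch. It is not the software used in the chapter: it assumes a plain expectation-maximization fit of one to three Gaussian components to log-transformed Ks values, and ranks the fits by the difference between each BIC and the lowest BIC. The function names, the synthetic two-peak data, and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(values, k, iters=100, min_var=1e-4):
    """Fit a k-component 1D Gaussian mixture by EM.

    Returns (weights, means, variances, log-likelihood).
    """
    xs = sorted(values)
    n = len(xs)
    grand_mean = sum(xs) / n
    pooled_var = sum((x - grand_mean) ** 2 for x in xs) / n
    # Deterministic start: means at evenly spaced quantiles, equal weights.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [pooled_var] * k
    weights = [1.0 / k] * k

    def mixture_terms(x):
        return [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]

    for _ in range(iters):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            terms = mixture_terms(x)
            total = sum(terms) or 1e-300
            resp.append([t / total for t in terms])
        # M-step: update weights, means and variances from the responsibilities.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, min_var)

    loglik = sum(math.log(sum(mixture_terms(x)) or 1e-300) for x in xs)
    return weights, means, variances, loglik


def information_criteria(loglik, k, n):
    """Return (AIC, BIC) for a k-component 1D mixture with 3k - 1 free parameters."""
    p = 3 * k - 1
    return 2 * p - 2 * loglik, p * math.log(n) - 2 * loglik


# Synthetic paranome with two lognormal Ks peaks (illustrative values only).
rng = random.Random(1)
ks = [rng.lognormvariate(math.log(0.4), 0.15) for _ in range(300)]
ks += [rng.lognormvariate(math.log(1.4), 0.15) for _ in range(300)]
log_ks = [math.log(v) for v in ks]

fits = {k: fit_gmm_1d(log_ks, k) for k in (1, 2, 3)}
bic = {k: information_criteria(fit[3], k, len(log_ks))[1] for k, fit in fits.items()}
delta_bic = {k: b - min(bic.values()) for k, b in bic.items()}
for k in sorted(delta_bic):
    peaks = sorted(math.exp(m) for m in fits[k][1])
    print(k, round(delta_bic[k], 1), [round(p, 2) for p in peaks])
```

On real paranomes the peaks overlap far more than in this toy data set, which is one reason why the number of components supported by BIC or AIC should be read with the caution expressed in the text.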
Considering the divergence of paralogs in each of the magnoliids, it seems unlikely that they share a common WGD event. Indeed, the pattern of potentially shared WGD (Annona + Cinnamomum ~1.5 Ks units; Liriodendron + Cinnamomum ~0.5 Ks units) seems incompatible with both the current and our reconstructed hypothesis of magnoliid evolution (Strijk et al. 2019b). By performing one-vs-one ortholog comparisons in magnoliids, we found that the divergence of the Annona and Liriodendron lineages occurred around 0.6–0.7 Ks units, whilst the divergence between Magnoliales and Laurales appeared to be slightly older at 1.0–1.1 Ks units. The divergence of magnoliids (represented by Annona) from Amborella took place at 1.8–1.9 Ks units. This confirms the likely absence of a shared WGD in magnoliids and places the WGD events observed in both Cinnamomum and Liriodendron subsequent to their MRCA with Annona (Strijk et al. 2019b).
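Reading a divergence range such as 0.6–0.7 Ks units from one-vs-one ortholog comparisons amounts to locating the mode of each pairwise Ks distribution. The following is a minimal sketch, assuming that Ks values for each species pair have already been estimated elsewhere. The function name, the bin width, the saturation cut-off, and the toy values are hypothetical and do not reproduce the authors' pipeline or data.

```python
from collections import Counter


def ks_peak_bin(ks_values, bin_width=0.1, ks_max=3.0):
    """Return the (low, high) bounds of the most populated Ks histogram bin."""
    counts = Counter()
    for ks in ks_values:
        # Discard non-positive and saturated estimates before binning.
        if 0.0 < ks <= ks_max:
            counts[int(ks / bin_width)] += 1
    if not counts:
        raise ValueError("no Ks values inside (0, ks_max]")
    # Ties are resolved in favour of the younger (lower-Ks) bin.
    best = min(counts, key=lambda b: (-counts[b], b))
    return round(best * bin_width, 6), round((best + 1) * bin_width, 6)


# Toy one-vs-one ortholog Ks values (hypothetical, for illustration only).
toy_pairs = {
    "Annona-Liriodendron": [0.58, 0.62, 0.64, 0.66, 0.69, 0.71, 0.95],
    "Annona-Cinnamomum": [0.92, 1.01, 1.03, 1.06, 1.08, 1.12, 1.35],
}
peaks = {pair: ks_peak_bin(values) for pair, values in toy_pairs.items()}
# Order the splits from youngest to oldest by the lower bound of the peak bin.
ordered = sorted(peaks, key=lambda pair: peaks[pair][0])
```

A fixed-width histogram is the simplest choice here. Because substitution rates differ among lineages, peaks obtained this way give only a relative ordering of divergences and duplications, not absolute ages.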

9.4 Future Goals and Prospects

This chapter describes the process of obtaining the first high-quality genome assembly for a plant in the Annonaceae, a large pantropical tree family of global ecological importance, and discusses the implications for this and other economically promising species for the development of novel agricultural, medicinal, and structural products. The A. muricata (soursop) genome provides a vital resource for research on floral morphological diversity, on the early evolution of magnoliids, and on the conservation of this tropical tree species. The soursop genome is an exceptional resource not only for the scientific community but also for breeders of other tropical trees (e.g. avocado, other Annona species, pepper, Magnolia), as it provides novel data on disease resistance and plant defence. Of particular relevance is the positional information inherent in genome data, which is absent from transcriptomes, allowing breeders to use linkage disequilibrium estimation in their programmes (Barabaschi et al. 2016). The increasing availability of high-quality genome assemblies enables far greater insight into challenging phylogenetic problems, such as ancient and rapid diversification events, as epitomized by
the early evolutionary history of angiosperms. The soursop genome provides evidence for the rapid sequential divergence of magnoliids, monocots, and eudicots, with a mosaic of phylogenetic signals across the genome reflecting coalescence and potentially hybridization between closely related ancestral lineages. This was followed by relative structural stasis since the Jurassic-Cretaceous boundary, with no or very few ongoing small-scale duplications, fewer paralogs than other magnoliids, no significant burst of transposable elements, and few expanded gene families along the branches leading to Annona. To our knowledge, the soursop is the first nuclear genome displaying such extensive signs of “fossilization” [notwithstanding the mitochondrial genome of Liriodendron tulipifera, which has also been described as “fossilized” (see Richardson et al. 2013)]. A genome that has retained the original characteristics of the ancestral magnoliid lineage is an invaluable resource for future studies on early angiosperm diversification. The apparent stability of the soursop genome over time is notable given that gene family expansion (such as described by Chaw et al. 2019), increases in transposable elements (Belyayev 2014; Joly-Lopez and Bureau 2018), and WGD events (Hoffman et al. 2012) are potential triggers of key morphological or adaptive innovations and rapid diversification (Tank et al. 2015; Soltis and Soltis 2019). These aspects raise further questions regarding the origin and evolutionary mechanisms giving rise to the diversity of magnoliids (Sauquet and Magallón 2018).

The slow but regular reduction in population size of soursop is compatible with the Quaternary contraction of tropical regions in several parts of the world and suggests that the soursop could be severely affected by climate change, as may other tropical taxa. Contrary to the situation in most crop plants (Eyre-Walker et al. 1998), this reduction did not result from a genetic bottleneck during domestication. The soursop genome is smaller than those of L. chinense (1.75 Gb) and C. kanehirae (824 Mb), and it displays the same chromosome number (7) as that reconstructed for the ancestor of
angiosperms (Badouin et al. 2017). Crucially, both L. chinense (19 chromosomes) and C. kanehirae (12 chromosomes) show clear signals of lineage-specific whole genome duplication (WGD) and chromosomal rearrangement events occurring after their branching from their shared MRCA with monocots or eudicots (see Strijk et al. 2019b). The comparative genomic analyses using three magnoliid genomes (Annona, Cinnamomum, and Liriodendron) confirm some of the findings presented in these other studies but also raise important analytical considerations for such analyses. In particular, these results suggest that evidence for WGD in other magnoliids represents events that occurred subsequent, not prior, to their divergence, demonstrating the importance of increased representation of older lineages in phylogenomic analyses to improve our understanding of the early diversification of angiosperms.

The soursop genome further highlights the limitations of using one species as a representative for a group as diverse as the magnoliids. Using one species per lineage makes it difficult to distinguish lineage-specific from shared WGD, especially in the case of ancient events (Tiley et al. 2018). Ks distributions are useful to characterize specific WGD events, but estimates of the number of WGD events should be interpreted with caution (Tiley et al. 2018). Despite using a more robust method that avoids overfitting a component-rich lognormal model to our distribution, we did not find clear evidence in favour of a given number of components (i.e. WGD events) in Cinnamomum. A further emerging concern for phylogenomic analyses is the apparent trade-off between the number of taxa involved and the number of retrieved orthologs. We show that the number of orthologs used to perform phylogenomic reconstruction strongly impacts the retrieved topology, with too few genes potentially resulting in the reconstruction of erroneous relationships.

With the development of new methods for both sequencing (e.g. Nanopore, Hi-C) and analysis [e.g. paleo-karyotypes (Murat et al. 2017)], it is feasible to obtain high-quality, chromosome-scale assemblies that, when
combined, allow more complex evolutionary questions to be addressed. Our current study is limited by the unknown arrangement of scaffolds relative to each other, which hampers our ability to reconstruct a high-resolution comparison of the genomic landscape of soursop with those of other magnoliids. Early angiosperm divergences also cannot be fully resolved without taking into account the other basal angiosperm lineages, including the elusive Chloranthales. Here, we compared results based on three magnoliid genomes, a very small part of the clade’s diversity. To improve our understanding of relationships within the group and of structural rearrangements at and below the level of the genome, it will be vital for future studies to add representatives from other divergent lineages (e.g. Piperales, Canellales), as well as other lineages in Magnoliales and Laurales.

The soursop has several properties that complicate its further development as a major economic crop species. First, it requires hand pollination, as natural pollination by beetles is uneven, resulting in low fruit set. Secondly, seedling batches hold significant variation, which affects not only the mature foliage and plant productivity but also important commercial traits such as fruit size, form, colour, quality, and the number of seeds in the fruit (Pinto et al. 2005). To stabilize the crop and develop varieties with desired traits, further trials and selection are needed. Thirdly, the fruit contains numerous large seeds, which are mildly toxic. Reducing the number of seeds in commercial varieties would be an important step forward to ensure consumer safety.

Ex situ conservation efforts for A. muricata are very much in their early stages and will need to include extensive development of seed storage, in vitro culture, and field gene banking standards. Soursop seeds tolerate desiccation to 5% moisture content, suggesting that seed banking would be possible in conventional seed gene banks under conditions of 18 °C or less, in airtight containers with stable seed moisture contents (Pinto et al. 2005). Nearly all genetic diversity of cultivated soursop lines is currently housed in a few living
collections. These are vulnerable to stochastic events as well as all the regular threats faced by living specimens in the wild. To ensure conservation of existing genetic diversity as well as to provide genetic resources for future crop improvement, a global strategy for collecting, evaluating, and conserving germplasm needs to be devised and implemented (Pinto et al. 2005). As with many of the world’s major crop species (e.g. corn, wheat, coffee, cacao), it is vital to set up germplasm resource centres in the crop’s major area of origin to capture and protect as much of the genetic diversity as possible and to allow easy access for local stakeholders and populations. To enable the establishment and long-term success of an Annona-centred germplasm and crop improvement programme, a stable line of financing and commitment is required. The creation of a multinational framework that includes government- and business-level involvement of countries holding important Annona species and local stakeholders (farmers, research institutes, logistic branches) will be a major step towards developing many of these promising crops for global consumption.

Acknowledgements

Genome sequencing, assembly, and annotation were conducted by the Novogene Bioinformatics Institute. We are grateful to Ghent University Botanical Garden for granting access to their living collections.

References

Adey A, Kitzman JO, Burton JN, Daza R, Kumar A, Christiansen L, Ronaghi M, Amini S, Gunderson KL, Steemers FJ, Shendure J (2014) In vitro, long-range sequence information for de novo genome assembly via transposase contiguity. Genome Res 24(12):2041–2049
Amborella Genome Project (2013) The Amborella genome and the evolution of flowering plants. Science 342(6165):1241089
Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25:3389–3402
Aniama SO, Usman SS, Ayodele SM (2016) Ethnobotanical documentation of some plants among Igala people of Kogi State. Int J Eng Sci 5(4):33–42
Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796–796
Arimoto A, Nishitsuji K, Higa Y, Arakaki N, Hisata K, Shinzato C, Satoh N, Shoguchi E (2019) A siphonous macroalgal genome suggests convergent functions of homeobox genes in algae and land plants. DNA Res 26(2):183–192
Artika IM, Julistiono H, Bermawie N, Riyanti EI, Hasan AE (2017) Anticancer activity test of ethyl acetate extract of endophytic fungi isolated from soursop leaf (Annona muricata L.). Asian Pacific J Trop Med 10(6):566–571
Barabaschi D, Tondelli A, Desiderio F, Volante A, Vaccino P, Valè G, Cattivelli L (2016) Next generation breeding. Plant Sci 242:3–13
Barlow J, França F, Gardner TA, Hicks CC, Lennox GD, Berenguer E, Castello L, Economo EP, Ferreira J, Guénard B, Leal CG (2018) The future of hyperdiverse tropical ecosystems. Nature 559(7715):517–526
Bell CD, Soltis DE, Soltis PS (2010) The age and diversification of the angiosperms re-revisited. Am J Bot 97:1296–1303
Boeckmann B, Bairoch A, Apweiler R, Blatter MC, Estreicher A, Gasteiger E, Martin MJ, Michoud K, O’Donovan C, Phan I, Pilbout S (2003) The SWISSPROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 31(1):365–370
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BLAST+: architecture and applications. BMC Bioinform 10(1):421
Castresana J (2002) Gblocks, v.0.91b. Online version available at http://molevol.cmima.csic.es/castresana/Gblocks_server.html
Chatrou LW, Pirie MD, Erkens RH, Couvreur TL, Neubig KM, Abbott JR, Mols JB, Maas JW, Saunders RM, Chase MW (2012) A new subfamilial and tribal classification of the pantropical flowering plant family Annonaceae informed by molecular phylogenetics. Botanical J Linnean Soc 169(1):5–40
Chaw SM, Liu YC, Wu YW, Wang HY, Lin CY, Wu CS, Ke HM, Chang LY, Hsu CY, Yang HT, Sudianto E (2019) Stout camphor tree genome fills gaps in understanding of flowering plant genome evolution. Nat Plants 5(1):63–73
Chen J, Hao Z, Guang X, Zhao C, Wang P, Xue L, Zhu Q, Yang L, Sheng Y, Zhou Y, Xu H (2019) Liriodendron genome sheds light on angiosperm phylogeny and species–pair differentiation. Nat Plants 5(1):18–25
Collevatti RG, Telles MPC, Lima JS, Gouveia FO, Soares TN (2014) Contrasting spatial genetic structure in Annona crassiflora populations from fragmented and pristine savannas. Plant Syst Evol 300:1719–1727
Couvreur TL, Helmstetter AJ, Koenen EJ, Bethune K, Brandão RD, Little SA, Sauquet H, Erkens RH (2019) Phylogenomics of the major tropical plant family Annonaceae using targeted enrichment of nuclear genes. Front Plant Sci 9:1941
De Bie T, Cristianini N, Demuth JP, Hahn MW (2006) CAFE: a computational tool for the study of gene family evolution. Bioinformatics 22:1269–1271
Doebley JF, Gaut BS, Smith BD (2006) The molecular genetics of crop domestication. Cell 127:1309–1321
Dongen SV (2000) Graph clustering by flow simulation. Ph.D. thesis, University of Utrecht
Doyle JA, Le Thomas A (1994) Cladistic analysis and pollen evolution in Annonaceae. Acta Botanica Gallica 141(2):149–170
Doyle JA, Le Thomas A (1996) Phylogenetic analysis and character evolution in Annonaceae. Adansonia 18:279–334
Edgar RC (2004) MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinform 5(1):113
Edwards SV (2009) Is a new and general theory of molecular systematics emerging? Evolution 63:1–19
Edwards SV, Xi Z, Janke A, Faircloth BC, McCormack JE, Glenn TC, Zhong B, Wu S, Lemmon EM, Lemmon AR, Leaché AD (2016) Implementing and testing the multispecies coalescent model: a valuable paradigm for phylogenomics. Mol Phylogenet Evol 94:447–462
El-Gebali S, Mistry J, Bateman A, Eddy SR, Luciani A, Potter SC, Qureshi M, Richardson LJ, Salazar GA, Smart A, Sonnhammer EL (2019) The Pfam protein families database in 2019. Nucleic Acids Res 47(D1):D427–D432
English AC, Richards S, Han Y, Wang M, Vee V, Qu J, Qin X, Muzny DM, Reid JG, Worley KC, Gibbs RA (2012) Mind the gap: upgrading genomes with pacific biosciences RS long-read sequencing technology. PLoS ONE 7:e47768–e47768
Eyre-Walker A, Gaut RL, Hilton H, Feldman DL, Gaut BS (1998) Investigation of the bottleneck leading to the domestication of maize. Proc Natl Acad Sci 95:4441–4446
Finn RD, Clements J, Eddy SR (2011) HMMER web server: interactive sequence similarity searching. Nucleic Acids Res 39:W29–W37
Foster CSP, Sauquet H, Van Der Merwe M, McPherson H, Rossetto M, Ho SYW (2016) Evaluating the impact of genomic data and priors on Bayesian estimates of the angiosperm evolutionary timescale. Syst Biol 66:338–351
Freitas L, Mello B, Schrago CG (2018) Multispecies coalescent analysis confirms standing phylogenetic instability in Hexapoda. J Evol Biol 31:1623–1631
Gavamukulya Y, Wamunyokoli F, El-Shemy HA (2017) Annona muricata: is the natural therapy to most disease conditions including cancer growing in our backyard? A systematic review of its research history and future prospects. Asian Pac J Trop Med 10(9):835–848
Gentry AH (1993) Four neotropical rainforests. Yale University Press, New Haven
Gnerre S, MacCallum I, Przybylski D, Ribeiro FJ, Burton JN, Walker BJ, Sharpe T, Hall G, Shea TP, Sykes S, Berlin AM (2011) High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci 108(4):1513–1518
Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O (2010) New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 59:307–321
Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, White O, Buell CR, Wortman JR (2008) Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to assemble spliced alignments. Genome Biol 9(1):R7
Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, Finn RD (2009) InterPro: the integrative protein signature database. Nucleic Acids Res 37(suppl_1):D211–D215
IUCN (2020) “Annonaceae”. The IUCN Red List of Threatened Species. Version 2020-2. https://www.iucnredlist.org. Accessed on 7 Sept 2020
Jarvis ED, Mirarab S, Aberer AJ, Li B, Houde P, Li C, Ho SY, Faircloth BC, Nabholz B, Howard JT, Suh A (2014) Whole-genome analyses resolve early branches in the tree of life of modern birds. Science 346(6215):1320–1331
Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30:772–780
Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL (2013) TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14:R36–R36
Koek‐Noorman J, Westra LT, Maas PJ (1990) Studies in Annonaceae. XIII. The role of morphological characters in subsequent classifications of Annonaceae: a comparative survey. Taxon 39(1):16–32
Kohlhase M (2006) CodeML: an open markup format the content and presentation of program code. Available from: https://svn.omdoc.org/repos/codeml/doc/spec/codeml
Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA (2009) Circos: an information aesthetic for comparative genomics. Genome Res 19(9):1639–1645
Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760
Li L, Stoeckert CJ, Roos DS (2003) OrthoMCL: Identification of ortholog groups for eukaryotic genomes. Genome Res 13:2178–2189
Liu N, Yang HL, Wang P, Lu YC, Yang YJ, Wang L, Lee SC (2016) Functional proteomic analysis revels that the ethanol extract of Annona muricata L. induces liver cancer cell apoptosis through endoplasmic reticulum stress pathway. J Ethnopharmacol 189:210–217
Liu S, Hansen MM (2017) PSMC (pairwise sequentially Markovian coalescent) analysis of RAD (restriction site associated DNA) sequencing data. Mol Ecol Resour 17:631–641
Lowe TM, Eddy SR (1996) TRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25:955–964
Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J, He G, Chen Y, Pan Q, Liu Y, Tang J (2012) SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience 1(1):2047–2117
Lyons E, Pedersen B, Kane J, Alam M, Ming R, Tang H, Wang X, Bowers J, Paterson A, Lisch D, Freeling M (2008a) Finding and comparing syntenic regions among Arabidopsis and the outgroups papaya, poplar, and grape: CoGe with rosids. Plant Physiol 148(4):1772–1781
Lyons E, Pedersen B, Kane J, Freeling M (2008b) The value of nonmodel genomes and an example using SynMap within CoGe to dissect the hexaploidy that predates the rosids. Tropical Plant Biology 1(3–4):181–190
Massoni J, Couvreur TLP, Sauquet H (2015) Five major shifts of diversification through the long evolutionary history of Magnoliidae (angiosperms). BMC Evol Biol 15:1–14
Massoni J, Forest F, Sauquet H (2014) Increased sampling of both genes and taxa improves resolution of phylogenetic relationships within Magnoliidae, a large and early-diverging clade of angiosperms. Mol Phylogenet Evol 70:84–93
Mi H, Muruganujan A, Ebert D, Huang X, Thomas PD (2018) PANTHER version 14: more genomes, a new PANTHER GO-slim and improvements in enrichment analysis tools. Nucleic Acids Res 47:D419–D426
Moore MJ, Hassan N, Gitzendanner MA, Bruenn RA, Croley M, Vandeventer A, Horn JW, Dhingra A, Brockington SF, Latvis M, Ramdial J (2011) Phylogenetic analysis of the plastid inverted repeat for 244 species: insights into deeper-level angiosperm relationships from a long, slowly evolving sequence region. Int J Plant Sci 172(4):541–558
Moore MJ, Soltis PS, Bell CD, Burleigh JG, Soltis DE (2010) Phylogenetic analysis of 83 plastid genes further resolves the early diversification of eudicots. Proc Natl Acad Sci 107:4623–4628
Murat F, Armero A, Pont C, Klopp C, Salse J (2017) Reconstructing the genome of the most recent common ancestor of flowering plants. Nat Genet 49:490
Najmuddin SU, Romli MF, Hamid M, Alitheen NB, Abd Rahman NM (2016) Anti-cancer effect of Annona muricata L. leaves crude extract (AMCE) on breast cancer cell line. BMC Complement Alternative Med 16(1):311
Nawrocki EP, Burge SW, Bateman A, Daub J, Eberhardt RY, Eddy SR, Floden EW, Gardner PP, Jones TA, Tate J, Finn RD (2015) Rfam 12.0: updates to the RNA family database. Nucleic Acids Res 43(D1):D130–D137
Nawrocki EP, Eddy SR (2013) Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29:2933–2935
Nickrent DL, Soltis DE (2006) A comparison of angiosperm phylogenies from nuclear 18S rDNA and rbcL sequences. Ann Mo Bot Gard 82:208
Oliver KR, McComb JA, Greene WK (2013) Transposable elements: powerful contributors to angiosperm evolution and diversity. Genome Biol Evol 5:1886–1901
Parra G, Bradnam K, Korf I (2007) CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23:1061–1067
Pirie MD, Chatrou LW, Mols JB, Erkens RH, Oosterhof J (2006) ‘Andean-centred’ genera in the short-branch clade of Annonaceae: testing biogeographical hypotheses using phylogeny reconstruction and molecular dating. J Biogeogr 33(1):31–46
Pinto AD, Cordeiro MC, De Andrade SR, Ferreira FR, Filgueiras HD, Alves RE, Kinpara DI (2005) Annona species. International Centre for Underutilised Crops, University of Southampton, Southampton, UK
Popenoe W (1919) Batido and other Guatemalan beverages prepared from cacao. Am Anthropol 21(4):403–409
Price MN, Dehal PS, Arkin AP (2009) Fasttree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol Biol Evol 26:1641–1650
Punyasena SW, Eshel G, McElwain JC (2008) The influence of climate on the spatial patterning of Neotropical plant families. J Biogeogr 35:117–130
Qiu YL, Li L, Wang B, Xue JY, Hendry TA, Li RQ, Brown JW, Liu Y, Hudson GT, Chen ZD (2010) Angiosperm phylogeny inferred from sequences of four mitochondrial genes. J Syst Evol 43:391–425
Quevillon E, Silventoinen V, Pillai S, Harte N, Mulder N, Apweiler R, Lopez R (2005) InterProScan: protein domains identifier. Nucleic Acids Res 33(suppl_2):W116–W120
R Development Core Team (2010) R: a language and environment for statistical computing. Available from: http://www.r-project.org
Rainer H, Chatrou LW (2014) AnnonBase: World species list of Annonaceae. Available from: https://www.catalogueoflife.org/col/details/database/id/40
Ramírez-Barahona S, Sauquet H, Magallón S (2020) The delayed and geographically heterogeneous diversification of flowering plant families. Nat Ecol Evol 4:1232–1238
Richardson JE, Chatrou LW, Mols JB, Erkens RH, Pirie MD (2004) Historical biogeography of two cosmopolitan families of flowering plants: Annonaceae and Rhamnaceae. Philos Trans R Soc Lond Ser B: Biol Sci 359(1450):1495–1508
Ruhfel BR, Gitzendanner MA, Soltis PS, Soltis DE, Burleigh JG (2014) From algae to angiosperms-inferring the phylogeny of green plants (Viridiplantae) from 360 plastid genomes. BMC Evol Biol 14:23
Sauquet H, von Balthazar M, Magallón S, Doyle JA, Endress PK, Bailes EJ, de Morais EB, Bull-Hereñu K, Carrive L, Chartier M, Chomicki G (2017) The ancestral flower of angiosperms and its early diversification. Nat Commun 8(1):1–10
Shaw TI, Ruan Z, Glenn TC, Liu L (2013) STRAW: species TRee analysis web server. Nucleic Acids Res 41:W238–W241
Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM (2015) BUSCO: Assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31:3210–3212
Smit A, Hubley R, Green P (2017) RepeatMasker Open-4.0.6. Available from: http://www.repeatmasker.org
Smit AF, Hubley R (2008) RepeatModeler Open-1.0. Available from: http://www.repeatmasker.org
Smith SA, O’Meara BC (2012) TreePL: Divergence time estimation using penalized likelihood for large phylogenies. Bioinformatics 28:2689–2690
Soltis DE, Smith SA, Cellinese N, Wurdack KJ, Tank DC, Brockington SF, Refulio-Rodriguez NF, Walker JB, Moore MJ, Carlsward BS, Bell CD (2011) Angiosperm phylogeny: 17 genes, 640 taxa. Am J Bot 98(4):704–730
Soltis DE, Soltis PS (2019) Nuclear genomes of two magnoliids. Nat Plants 5:6–6
Song S, Liu L, Edwards SV, Wu S (2012) Resolving conflict in eutherian mammal phylogeny using phylogenomics and the multispecies coalescent model. Proc Natl Acad Sci 109:14942–14947
Sonké B, Couvreur T (2014) Tree diversity of the Dja Faunal Reserve, southeastern Cameroon. Biodiversity Data J 2
Strijk JS, Hinsinger DD, Zhang F, Cao K (2019a) Trochodendron aralioides, the first chromosome-level draft genome in Trochodendrales and a valuable resource for basal eudicot research. GigaScience 8(11):giz136
Strijk JS, Hinsinger DD, Roeder MM, Chatrou LW, Couvreur TL, Erkens RH, Sauquet H, Pirie MD, Thomas DC, Cao K (2019b) The soursop genome and comparative genomics of basal angiosperms provide new insights on evolutionary incongruence. BioRxiv 1:639153. https://doi.org/10.1101/639153
Strijk JS, Hinsinger DD, Roeder MM, Chatrou LW, Couvreur TL, Erkens RH, Sauquet H, Pirie MD, Thomas DC, Cao K (2021) Chromosome-level reference genome of the soursop (Annona muricata), a new resource for Magnoliid research and tropical pomology. Mol Ecol Resour. https://doi.org/10.22541/au.159103606.66673541
Tank DC, Eastman JM, Pennell MW, Soltis PS, Soltis DE, Hinchliff CE, Brown JW, Sessa EB, Harmon LJ (2015) Nested radiations and the pulse of angiosperm diversification: increased diversification rates often follow whole genome duplications. New Phytol 207:454–467
Tchouto MGP, Yemefack M, De Boer WF, De Wilde JJFE, Van Der Maesen LJG, Cleef AM (2006) Biodiversity hotspots and conservation priorities in the Campo-Ma’an rain forests, Cameroon. Biodiversity Conservation 15:1219–1252
Tiley GP, Barker MS, Burleigh JG (2018) Assessing the performance of Ks plots for detecting ancient whole genome duplications. Genome Biol Evol 10(11):2882–2898
Toussirot M, Nowik W, Hnawia E, Lebouvier N, Hay AE, De la Sayette A, Dijoux-Franca MG, Cardon D, Nour M (2014) Dyeing properties, coloring compounds and antioxidant activity of Hubera nitidissima (Dunal) Chaowasku (Annonaceae). Dyes Pigm 102:278–284
Vamosi JC, Magallón S, Mayrose I, Otto SP, Sauquet H (2018) Macroevolutionary patterns of flowering plant speciation and extinction. Annu Rev Plant Biol 69:685–706
Wei C, Yang H, Wang S, Zhao J, Liu C, Gao L, Xia E, Lu Y, Tai Y, She G, Sun J (2018) Draft genome sequence of Camellia sinensis var. sinensis provides insights into the evolution of the tea genome and tea quality. Proc Natl Acad Sci 115(18):E4151–E4158
Wickett NJ, Mirarab S, Nguyen N, Warnow T, Carpenter E, Matasci N, Ayyampalayam S, Barker MS, Burleigh JG, Gitzendanner MA, Ruhfel BR (2014) Phylotranscriptomic analysis of the origin and early diversification of land plants. Proc Natl Acad Sci 111(45):E4859–E4868
Yang Z (2007) PAML 4: Phylogenetic analysis by maximum likelihood. Mol Biol Evol 24:1586–1591
Zamir D (2001) Improving plant breeding with exotic genetic libraries. Nat Rev Genet 2:983–983
Zeng L, Zhang Q, Sun R, Kong H, Zhang N, Ma H (2014) Resolution of deep angiosperm phylogeny using conserved nuclear genes and estimates of early divergence times. Nat Commun 5:4956
Zhang C, Rabiee M, Sayyari E, Mirarab S (2018) ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinform 19:153
Zhang N, Zeng L, Shan H, Ma H (2012) Highly conserved low-copy nuclear genes as effective markers for phylogenetic analyses in angiosperms. New Phytol 195:923–937
Zhang T, Qiao Q, Novikova PY, Wang Q, Yue J, Guan Y, Ming S, Liu T, De J, Liu Y, Al-Shehbaz IA (2019) Genome of Crucihimalaya himalaica, a close relative of Arabidopsis, shows ecological adaptation to high altitude. Proc Natl Acad Sci 116(14):7137–7146
Zhu Q, Zheng X, Luo J, Gaut BS, Ge S (2007) Multilocus analysis of nucleotide variation of Oryza sativa and its wild relatives: severe bottleneck during domestication of rice. Mol Biol Evol 24:875–888
Zwaenepoel A, Van de Peer Y (2019) wgd—simple command line tools for the analysis of ancient whole-genome duplications. Bioinformatics 35(12):2153–2155

10 Underutilised Fruit Tree Genomes from Indonesia

Deden Derajat Matra, M. Adrian, and Roedhy Poerwanto

Abstract

Indonesia has enormous genetic biodiversity, especially in underutilised tropical fruits. However, little genetic information, such as genome databases, is available for these fruits. They should nevertheless serve as a source of germplasm for crop improvement and for new fruits richer in nutritional compounds. Because multi-omics technology is one of the most powerful tools in the life sciences, spanning genomics to metabolomics, and continues to develop rapidly, data can now be generated quickly and on a large scale. Multi-omics approaches such as genomic, transcriptomic, and metabolomic studies have been carried out on several underutilised fruits. RujakBase (http://rujakbase.id/) provides a repository for basic, translational, and applied research on underutilised tropical fruits. To date, RujakBase has collected only underutilised Indonesian fruits such as Menteng (Baccaurea motleyana), Nangkadak (Artocarpus nangkadak), Rambutan (Nephelium lappaceum), Salak Sidempuan (Salacca sumatrana), Gandaria (Bouea macrophylla), Lobi-lobi (Flacourtia inermis), Duku (Lansium domesticum), Matoa (Pometia pinnata), Kedondong (Spondias dulcis), Jambu Air (Syzygium samarangense), Sentul (Sandoricum koetjape), Kasturi Mango (Mangifera casturi), and Kura-kura Durian (Durio testudinarius). The RujakBase project has been registered under the DDBJ Umbrella BioProject with accession number PRJDB726. Comprehensive omics research on underutilised fruits will advance their development as potential fruits of the future that can be used for conservation, cultivation, and commercialisation.

D. D. Matra (corresponding author), M. Adrian, R. Poerwanto: Collaborative Research Group on Fruits (Fruitomics), Department of Agronomy and Horticulture, Faculty of Agriculture, IPB University, Jalan Meranti, IPB Dramaga Campus, Bogor, West Java 16680, Indonesia. e-mail: [email protected]

10.1 Overview

Indonesia has an enormous biodiversity of tropical fruits (Uji 2004). It therefore has many underutilised fruit tree species with good prospects for future development as sources of food and additional nutrition. This chapter discusses several fruits that have been studied intensively for bioprospecting. Indonesia has significant genetic diversity, especially in underutilised fruits, but genetic information on these fruits is limited. As a result, they are neglected in favour of imported fruit, which is cheaper and more attractive in terms of colour, shape, and aroma. Utilisation of the native fruits therefore needs to be intensified. Furthermore, this biodiversity should serve as a source of germplasm for crop improvement and for new fruits that are richer in nutritional compounds than existing ones. However, genome information for these underutilised fruits is minimal.

‘Omics technology, which ranges from genomics to metabolomics, is a powerful tool for the life sciences and is advancing rapidly as a frontier science (Yang et al. 2021). Continued development of the tools in each ‘omics field makes data generation fast, so the era of big data in ‘omics is one that needs to be embraced. Sources of biodiversity are essential in many respects, from biodiversity conservation to food security, and Indonesia is one of the centres of biodiversity. However, human life is likely to be affected in the future by the loss of biodiversity caused by natural disasters and the unwise destruction of tropical rainforests. The ‘omics approach is therefore expected to bring a significant leap in the collection of useful information from these genetic resources.

RujakBase (http://rujakbase.id/) is a repository that supports basic, translational, and applied research on Indonesian tropical fruits. The name combines Rujak and database. Rujak is an Indonesian salad of sliced fruit and vegetables served with a spicy palm sugar dressing. Currently, RujakBase covers only underutilised Indonesian fruits (Table 10.1). For scientific purposes, RujakBase has been registered as a DDBJ Umbrella BioProject with accession number PRJDB7265.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 M. A. Chapman (ed.), Underutilised Crop Genomes, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-00848-1_10

Table 10.1 List of underutilised fruits

| Local name | Scientific name | BioProject ID |
| --- | --- | --- |
| Menteng | Baccaurea motleyana | PRJDB7368 |
| Nangkadak | Artocarpus nangkadak | PRJDB7366 |
| Rambutan | Nephelium lappaceum | PRJDB7348 |
| Salak Sidempuan | Salacca sumatrana | PRJDB7367 |
| Gandaria | Bouea macrophylla | PRJDB8582 |
| Lobi-lobi | Flacourtia inermis | PRJDB7346 |
| Duku | Lansium domesticum | PRJDB7345 |
| Matoa | Pometia pinnata | PRJDB7344 |
| Kedondong | Spondias dulcis | PRJDB8581 |
| Jambu Air | Syzygium samarangense | PRJDB7347 |
| Sentul | Sandoricum koetjape | PRJDB10716 |
| Kasturi Mango | Mangifera casturi | PRJDB10715 |
| Kura-kura Durian | Durio testudinarius | PRJDB10717 |

10.2 Underutilised Fruits

10.2.1 Menteng; Baccaurea motleyana Müll.Arg.

Menteng, also known as rambai (Fig. 10.1), is a fruit tree that originates from Kalimantan and is planted in Peninsular Malaysia, Sumatra, Java, and Bali (Uji 2004). It belongs to the genus Baccaurea in the family Phyllanthaceae (Lim 2013). The fruit has a sweet-sour taste. Currently, the fruit is only eaten fresh and is not processed for storage, which makes it less desirable (Hasan et al. 2009).

Fig. 10.1 White (left) and reddish (right) flesh aril of Menteng

The available ‘omics data for B. motleyana are transcriptomic data from RNA-Seq (Matra et al. 2019a) and metabolomic data from GCMS and LCMS (Halim et al. 2019). The de novo assembly produced 37,077 contigs ranging from 201 to 4972 bp, with an N50 of 696 bp. Functional annotation was performed against the SwissProt (46.70%), TrEMBL (70.93%), NCBI nr (69.17%), and NCBI nt (61.26%) databases. Further analysis to find B. motleyana genes associated with sugar metabolism, such as sucrose-phosphate synthase (SPS), sucrose-phosphatase (SPP), sucrose synthase (SUS), alkaline/neutral invertase (INV), and cytosolic invertase (CINV), is underway (Nurmayani et al. 2021). In addition, untargeted GCMS analysis identified secondary metabolites specific to rambai: Decanoic acid, 1-Decene, Methyl salicylate, and Stearyl alcohol.
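The assemblies in this chapter are summarised by their N50, the contig length at which contigs of that length or longer contain at least half of the assembled bases. The following is a minimal sketch of that calculation; the function name `n50` and the toy contig lengths are illustrative assumptions and are not taken from the authors' pipeline.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    # Walk from the longest contig downwards until half of the
    # total assembled length has been covered.
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly, not data from the chapter.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(toy_contigs))  # prints 8
```

A low N50 relative to the contig length range, as reported for several fruits here, indicates that most of the assembled sequence sits in short contigs.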

10.2.2 Nangkadak; Artocarpus heterophyllus × Artocarpus integer

Nangkadak is a hybrid variety derived from cross-pollination between the mini jackfruit (Artocarpus heterophyllus Lam.) and Cempedak (Artocarpus integer Merr.) (Fig. 10.2). The hybrid was produced by Gregori Garnadi Hambali, a botanist at Mekarsari Fruit Garden, and has been registered as a variety. Nangkadak is a new hybrid that could become a commercial fruit. Trade in nangkadak seeds and seedlings is still small in scale because the fruit is not yet widely known. Consumer interest in jackfruit is also limited, so trade flow remains relatively low. Consumption and production data are minimal, and there are no official publications.

The available ‘omics data for nangkadak are genomic data from whole-genome sequencing (Matra et al. 2019b) and metabolomic data from GCMS and LCMS (Halim et al. 2019). The genome assembly with Velvet produced 255,526 contigs with an N50 of 262 bp. Untargeted GCMS analysis of three nangkadak varieties (Fig. 10.2) identified the specific secondary metabolites β-Cyclocitral, 2-Furanmethanol, and Linoleic acid. In addition, the genome size is reported to be 516.51 Mbp (Matra et al. 2019b).

Fig. 10.2 Appearance of three nangkadak varieties: Bola (left), Jumbo (centre), and Super Orange (right)

10.2.3 Rambutan; Nephelium lappaceum L.

Rambutan (Nephelium lappaceum L.; Fig. 10.3) is a tropical fruit that is very popular in Indonesia. It belongs to the family Sapindaceae and is commonly found in Asia and Africa (Mahisworo 2004). It is grown across Java, Sumatra, and Kalimantan and is one of the leading fruit commodities in Indonesia. The name rambutan refers to the hair-like appearance of the fruit skin (Fig. 10.3). Rambutan is a seasonal fruit, commonly available from December to March, which coincides with the mango and durian seasons. According to the Directorate General of Horticulture, Ministry of Agriculture, there are 22 varieties of rambutan.

The available ‘omics data for rambutan are transcriptomic data from RNA-Seq (Matra et al. 2019c) and metabolomic data from GCMS and LCMS (Halim et al. 2019). The de novo assembly produced 36,303 contigs ranging from 201 to 11,770 bp, with an N50 of 1327 bp. Functional annotation was performed against the SwissProt (59.39%), TrEMBL (83.28%), NCBI nr (81.58%), and NCBI nt (70.11%) databases. In addition, untargeted GCMS analysis identified secondary metabolites specific to rambutan: Citraconic anhydride, 3,5-Dideuteropyridine-4-carboxylic acid, Isobutyl formate, and n-Methyl-D3-Aziridine.

10.2.4 Sidempuan Snake Fruit; Salacca sumatrana Becc.

Salacca sumatrana, better known as salak Sidempuan (Fig. 10.4), is a common snake fruit species of the family Arecaceae and is native to North Sumatra. The fruit has a honey-like taste and a reddish aril. Salak has a palm growth habit, and some varieties are short or stemless. The plant can reach 6 m in height and produces fruit for 50 years on average. It usually grows in lowlands with high humidity (Zaini et al. 2013).

The available ‘omics data for Salak Sidempuan are genomic data from whole-genome sequencing (Matra et al. 2019b) and metabolomic data from GCMS and LCMS (Halim et al. 2019). The genome assembly with Velvet produced 659,362 contigs with an N50 of 501 bp. Untargeted GCMS analysis of three salak Sidempuan types (Fig. 10.4) identified the specific secondary metabolites 5-Formyl-2-furfurylmethanoate, 2-Methoxy-4-vinylphenol, and Tiglic acid. In addition, the genome size is reported to be 1336.34 Mbp (Matra et al. 2019b).

Fig. 10.3 Appearance of fourteen rambutan varieties in Mekarsari Fruit Garden, Cileungsi, West Java

Fig. 10.4 Appearance of three types of salak Sidempuan with different degrees of reddish colouring (left: no reddish colour; centre: half reddish; right: fully reddish)

10.2.5 Gandaria; Bouea macrophylla Griffith

Gandaria (Fig. 10.5) is distributed across the Malesia region, the area between the continents of Asia and Australia that includes Indonesia, Malaysia, Singapore, the Philippines, Brunei Darussalam, and Papua New Guinea (Gower et al. 2012). Gandaria (Ramania) belongs to the family Anacardiaceae. There are two types, gandaria harang and gandaria hintalu. Gandaria harang has yellow skin with black spots, a rather small size, and a sweet taste. Gandaria hintalu has large, round fruit with smooth yellow skin and a sweet taste (Saleh et al. 2007).

Fig. 10.5 Appearance of Gandaria fruit: ripe (top) and unripe (bottom)

The available ‘omics data for Gandaria are genomic data from whole-genome sequencing (unpublished, Table 10.1). The genome assembly with Ray Assembler produced 451,281 contigs with an N50 of 1588 bp. Completeness was assessed using BUSCO 3.0.2 with the embryophyta_odb9 database: 8.1% of the universal single-copy orthologs were complete, 0.7% duplicated, 7.7% fragmented, and 83.5% missing.
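The BUSCO figures reported in this chapter can be sanity-checked with simple arithmetic. The sketch below sums the four reported categories for each fruit and also reports the complete and duplicated share combined. It assumes that the "complete" figure excludes duplicated orthologs, which is an inference from the fact that the four values sum to roughly 100% and is not stated by the authors; the function name `summarise` is likewise illustrative.

```python
# BUSCO percentages as reported in this chapter, in the order
# (complete, duplicated, fragmented, missing).
BUSCO = {
    "Gandaria": (8.1, 0.7, 7.7, 83.5),
    "Kedondong": (15.2, 0.7, 8.9, 75.2),
    "Sentul": (91.4, 1.5, 2.6, 4.4),
    "Kasturi": (42.3, 2.5, 16.7, 38.4),
    "Durian Kura-kura": (45.9, 4.4, 20.7, 29.0),
}


def summarise(complete, duplicated, fragmented, missing):
    """Return the category total and the complete-plus-duplicated share."""
    total = complete + duplicated + fragmented + missing
    return {
        # Rounded to one decimal, the precision used in the text.
        "total": round(total, 1),
        # Assumption: "complete" means complete and single-copy.
        "complete_incl_duplicated": round(complete + duplicated, 1),
    }


for fruit, values in BUSCO.items():
    print(fruit, summarise(*values))
```

Totals of 99.9 rather than 100.0 are consistent with rounding in the reported values.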

10.2.6 Lobi-Lobi; Flacourtia inermis (Burm. f.) Merr.

Flacourtia inermis (Fig. 10.6) is a fruit-producing species commonly found in many Asian and tropical African regions. Lobi-lobi belongs to the genus Flacourtia in the family Salicaceae. F. inermis is known as rukam sour or lobi-lobi because of its sour taste. Flacourtia originated in Asia and then spread to the United Kingdom via Malaysia. The fruit is also favoured in the Caribbean, where it is used in drinks and desserts (Pelima 2016).

Fig. 10.6 Appearance of Lobi-lobi fruit at maturity stages 1 to 6 (unripe to overripe)

The available ‘omics data for F. inermis are genomic data from whole-genome sequencing (Matra et al. 2019b) and metabolomic data from GCMS and LCMS (Umam et al. 2022). The genome assembly with Velvet produced 694,452 contigs with an N50 of 650 bp. Based on the WGS data, further analysis to find F. inermis genes associated with sugar metabolism, such as sucrose-phosphate synthase (SPS), sucrose-phosphatase (SPP), sucrose synthase (SUS), alkaline/neutral invertase (INV), and cytosolic invertase (CINV), is underway (Silaban et al. 2021). Untargeted GCMS analysis detected a total of 34 compounds. Diethyl malate is a common compound in lobi-lobi. Each maturity stage also has a specific compound: Vaccenic acid at stage 1; 1,3,4,5-Tetrahydroxycyclohexanecarboxylic acid at stage 2; Diethyl malate at stage 3; 2-Ethylhexyl butyrate at stage 4; 2,4,4-Trimethyl-1-hexene at stage 5; and Malic acid at stage 6.
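The search for sugar-metabolism genes described for several fruits in this chapter can be illustrated with a keyword screen over functional annotation text. This is only a sketch under stated assumptions: the enzyme names and abbreviations come from the chapter, but the regular expressions, the function name `classify`, and the contig identifiers and descriptions are hypothetical and do not reproduce the cited studies' methods.

```python
import re

# Enzyme names are taken from the chapter; the patterns are illustrative
# assumptions about how annotation descriptions might be worded.
SUGAR_GENE_PATTERNS = {
    "SPS": re.compile(r"sucrose[- ]phosphate synthase", re.IGNORECASE),
    "SPP": re.compile(r"sucrose[- ]phosphatase", re.IGNORECASE),
    "SUS": re.compile(r"sucrose synthase", re.IGNORECASE),
    "INV": re.compile(r"(?:alkaline|neutral)[\w/]* invertase", re.IGNORECASE),
    "CINV": re.compile(r"cytosolic invertase", re.IGNORECASE),
}


def classify(description):
    """Return the enzyme abbreviations whose pattern matches a description."""
    return [abbr for abbr, pattern in SUGAR_GENE_PATTERNS.items()
            if pattern.search(description)]


# Hypothetical contig identifiers and descriptions, for illustration only.
annotations = {
    "contig_0001": "Probable sucrose-phosphate synthase 1",
    "contig_0002": "Sucrose synthase 2",
    "contig_0003": "Alkaline/neutral invertase CINV1",
    "contig_0004": "Ribosomal protein L3",
}

for contig_id, description in annotations.items():
    print(contig_id, classify(description))
```

A text screen of this kind only shortlists candidates; confirming gene identity would still require sequence-level evidence.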

10.2.7 Duku; Lansium domesticum (Lansium parasiticum (Osbeck) Sahni and Bennet)

Duku (Fig. 10.7) is a fruit species that has long been known in Indonesia. This horticultural commodity is thought to be native to Southeast Asia. The fruit has white flesh and a sweet taste (Badruzaman et al. 2017). Duku belongs to the family Meliaceae. There are five commercial types of duku in Indonesia: Palembang, Matesih, Sumber, Kalikajar, and Condet duku (Supriatna 2010; Efendi et al. 2022). The available ‘omics data for duku are genomic data from whole-genome sequencing (Matra et al. 2019b; Sari et al. 2020). The genome assembly with Velvet produced 455,010 contigs with an N50 of 654 bp. In addition, the genome size is reported to be 622.66 Mbp (Matra et al. 2019b).

Fig. 10.7 Appearance of duku fruit

10.2.8 Matoa; Pometia pinnata J.R. Forst. and G.Forst.

Matoa (Fig. 10.8) belongs to the family Sapindaceae and is distributed across the Andaman Islands, Sri Lanka, southern China, Vietnam, Malaysia, Indonesia, the Philippines, Papua New Guinea, and the South Pacific Islands (Lim 2013; Lely 2016). Matoa types are distinguished by skin colour: red, yellow, and green. Matoa is a fruit-producing plant that grows in Papua. The available ‘omics data for P. pinnata are genomic data from whole-genome sequencing (Matra et al. 2019b). The genome assembly with Velvet produced 176,002 contigs with an N50 of 4241 bp. Based on the WGS data, further analysis to find P. pinnata genes associated with sugar metabolism, such as sucrose-phosphate synthase (SPS), sucrose-phosphatase (SPP), sucrose synthase (SUS), alkaline/neutral invertase (INV), and cytosolic invertase (CINV), has been carried out (Agusri et al. 2021). In addition, the genome size is reported to be 468.63 Mbp (Matra et al. 2019b).

Fig. 10.8 Appearance of Matoa fruit at two developmental stages: unripe (top) and ripe (bottom). From left to right: fruit inflorescences, whole fruit, cross-section of fruit, and seed

10.2.9 Kedondong; Spondias dulcis L.

Kedondong (Spondias dulcis; Fig. 10.9) is an exotic plant of the family Anacardiaceae that is commonly found in the tropics. Kedondong was introduced to tropical regions such as Hawaii, Caroline Island, and Jamaica in 1782 (Morton 1987). The unripe fruit is green and hard, and it turns yellow when ripe. The fruit is often called the Golden Apple (Koubala et al. 2018). The available ‘omics data for Kedondong are genomic data from whole-genome sequencing (unpublished, Table 10.1). The genome assembly with Ray Assembler produced 275,439 contigs with an N50 of 3493 bp. Completeness was assessed using BUSCO 3.0.2 with the embryophyta_odb9 database: 15.2% of the universal single-copy orthologs were complete, 0.7% duplicated, 8.9% fragmented, and 75.2% missing.

Fig. 10.9 Appearance of Kedondong fruit

10.2.10 Jambu Air or Wax Apple; Syzygium samarangense (Blume) Merr. and L.M. Perry

Wax Apple or Java Apple (Fig. 10.10) belongs to the family Myrtaceae and originates from Southeast Asia (Susilo 2014). Wax Apple is highly diverse because it is cross-pollinated, either naturally or through human intervention. Morphologically, leaf and flower shapes differ strikingly within the wax apple (Anggraheni et al. 2019). The available ‘omics data for Wax Apple are genomic data from whole-genome sequencing (Matra et al. 2019b). The genome assembly with Velvet produced 521,352 contigs with an N50 of 672 bp. In addition, the genome size is reported to be 548.33 Mbp (Matra et al. 2019b).

Fig. 10.10 Appearance of Jambu Air fruit

10.2.11 Sentul or Kecapi; Sandoricum koetjape (Burm.f.) Merr.

Kecapi (Sandoricum koetjape; Fig. 10.11) belongs to the genus Sandoricum in the family Meliaceae and is an important fruit-producing plant. It originated in Indochina and Western Malaysia and was introduced to tropical Asia (Heliawati 2018). The available ‘omics data for Kecapi are genomic data from whole-genome sequencing (unpublished, Table 10.1). The genome assembly with Ray Assembler produced 31,540 contigs with an N50 of 15,769 bp. Completeness was assessed using BUSCO 3.0.2 with the embryophyta_odb9 database: 91.4% of the universal single-copy orthologs were complete, 1.5% duplicated, 2.6% fragmented, and 4.4% missing.

Fig. 10.11 Appearance of Kecapi fruit: whole fruit (top) and cross-section of fruit (bottom)

10.2.12 Kasturi or Kalimantan Mango; Mangifera casturi Kosterm.

Kalimantan mango (Mangifera casturi) is a distinctive mango whose natural habitat is South Kalimantan. Kalimantan has 31 species of the genus Mangifera, three of which are endemic (Darmawan 2015). Four varieties of Kasturi mango are known to the people of South Kalimantan: Kasturi, Cuban/Mawar, Asem Pelipisan/Palipisan, and Pinari (Matra et al. 2021; Fig. 10.12).

Fig. 10.12 Appearance of four Kasturi varieties and a related species. From left to right: Pinari, Pelipisan, Kasturi, Cuban, and Rawa-rawa (M. similis)

The available ‘omics data for Kasturi are genomic data from whole-genome sequencing (Matra et al. 2021) and metabolomic data from GCMS (Zulfina et al. 2021). The genome assembly with Ray Assembler produced 259,872 contigs with an N50 of 1440 bp. Completeness was assessed using BUSCO 3.0.2 with the embryophyta_odb9 database: 42.3% of the universal single-copy orthologs were complete, 2.5% duplicated, 16.7% fragmented, and 38.4% missing. Untargeted GCMS analysis showed that the compounds common to the five types (Fig. 10.12) are 5-hydroxymethylfurfural, ethyl palmitate, and palmitic acid. Some compounds are specific to one type; for example, Cuban contains 1,5-diazabicyclo[4.3.0]non-5-ene and benzenamine, 3-fluoro-4-methyl-.
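The distinction between compounds common to all types and compounds specific to one type can be expressed as set operations on per-type compound lists. The sketch below is illustrative only: the three shared compounds and the two Cuban-specific compounds come from the text, while every `placeholder_*` entry and both function names are assumptions introduced so that the example is self-contained.

```python
COMMON = {"5-hydroxymethylfurfural", "ethyl palmitate", "palmitic acid"}

# Hypothetical profiles. Only COMMON and the two Cuban-specific compounds
# are taken from the chapter; "placeholder_*" names are invented.
profiles = {
    "Pinari": COMMON | {"placeholder_a"},
    "Pelipisan": COMMON | {"placeholder_b"},
    "Kasturi": COMMON | {"placeholder_c"},
    "Cuban": COMMON | {
        "1,5-diazabicyclo[4.3.0]non-5-ene",
        "benzenamine, 3-fluoro-4-methyl-",
    },
    "Rawa-rawa": COMMON | {"placeholder_d"},
}


def common_compounds(profiles):
    """Compounds detected in every type (set intersection)."""
    return set.intersection(*profiles.values())


def specific_compounds(profiles, name):
    """Compounds detected in one type and in no other (set difference)."""
    others = set().union(
        *(compounds for other, compounds in profiles.items() if other != name)
    )
    return profiles[name] - others


print(sorted(common_compounds(profiles)))
print(sorted(specific_compounds(profiles, "Cuban")))
```

With real GCMS peak tables, the same two functions would apply once each type's detected compounds are collected into a set.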

10.2.13 Durian Kura-Kura; Durio testudinarius Becc.

Kura-Kura Durian (Durio testudinarius, synonym Durio testudinarum Becc.; Fig. 10.13) is a wild species of the family Malvaceae that is endemic to Borneo. It grows in the lowlands of the wet tropics, at altitudes of up to 700 m above sea level, in fertile and well-drained soil (Tropical Plants Database 2020). Of 27 durian species (Durio spp.), 19 are found in Kalimantan, 14 of which are endemic. The fruit of several species is edible (Kostermans 1958; Uji 2004; Kusmana and Hikmat 2015). The available ‘omics data for Durian Kura-kura are genomic data from whole-genome sequencing (Magandhi et al. 2021). The genome assembly with Ray Assembler produced 360,868 contigs with an N50 of 1487 bp. Completeness was assessed using BUSCO 3.0.2 with the embryophyta_odb9 database: 45.9% of the universal single-copy orthologs were complete, 4.4% duplicated, 20.7% fragmented, and 29.0% missing.
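The genome assembly statistics reported across Sect. 10.2 can be gathered into one structure for side-by-side comparison. The sketch below records the contig counts, N50 values, and assemblers stated in this chapter and ranks the assemblies by N50. The `Assembly` class and the `rank_by_n50` function are illustrative names rather than part of any published pipeline, and the two transcriptome assemblies (Menteng and Rambutan) are left out because they are not genome assemblies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assembly:
    fruit: str
    assembler: str
    contigs: int
    n50_bp: int


# Contig counts and N50 values as reported in this chapter.
ASSEMBLIES = [
    Assembly("Nangkadak", "Velvet", 255_526, 262),
    Assembly("Salak Sidempuan", "Velvet", 659_362, 501),
    Assembly("Gandaria", "Ray", 451_281, 1_588),
    Assembly("Lobi-lobi", "Velvet", 694_452, 650),
    Assembly("Duku", "Velvet", 455_010, 654),
    Assembly("Matoa", "Velvet", 176_002, 4_241),
    Assembly("Kedondong", "Ray", 275_439, 3_493),
    Assembly("Jambu Air", "Velvet", 521_352, 672),
    Assembly("Sentul", "Ray", 31_540, 15_769),
    Assembly("Kasturi Mango", "Ray", 259_872, 1_440),
    Assembly("Kura-kura Durian", "Ray", 360_868, 1_487),
]


def rank_by_n50(assemblies):
    """Return the assemblies ordered from highest to lowest N50."""
    return sorted(assemblies, key=lambda a: a.n50_bp, reverse=True)


for a in rank_by_n50(ASSEMBLIES):
    print(f"{a.fruit:<18}{a.assembler:<8}{a.contigs:>9,}{a.n50_bp:>8,}")
```

Ranked this way, the Sentul assembly has both the fewest contigs and the highest N50, which matches its high BUSCO completeness reported above.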

10.3 Conclusions

Omics research on underutilised fruits is still very limited, partly because little information is available about the benefits of these fruit trees. Genomic, transcriptomic, and metabolomic studies have been carried out on several underutilised fruits, but their scope remains limited. Furthermore, key developmental stages of some underutilised fruit plants have not been analysed because fruit availability is uncertain (for example, seasonal fruiting constrains transcriptomic and metabolomic studies of fruit stages) and because funding for omics studies is limited. Comprehensive omics research on underutilised fruits will support their development as potential fruits of the future for conservation, cultivation, and commercialisation.

Fig. 10.13 Appearance of Durian Kura-kura in the wild forest of West Kalimantan

References

Agusri et al (2021) Characterization of genes encoding key enzymes involved in sugar metabolism of matoa (Pometia pinnata). IOP Conf Ser Earth Environ Sci 694:012066
Anggraheni YGD, Adi EBM, Wibowo H, Mulyaningsih ES (2019) Analisis keragaman jambu air (Syzygium sp.) koleksi kebun plasma nutfah Cibinong berdasarkan morfologi dan RAPD. Biopropal Industri 10(2)
Badruzaman E, Soetoro S, Hardiyanto T (2017) Analisis saluran pemasaran buah duku. Jurnal Agroinfo Galuh 4(3):330–337
Darmawan ARB (2015) Review: Usaha peningkatan kualitas mangga kasturi (Mangifera casturi) dengan modifikasi budi daya tanaman. Pros Sem Nas Masy Biodiv Indon 1(4):894–899
Efendi D, Sari HP, Suwarno WB et al (2022) Genetic diversity of Lansium parasiticum (Osbeck) K. C. Sahni & Bennet accessions based on vegetative morphological characters and simple sequence repeat markers. Genet Resour Crop Evol. https://doi.org/10.1007/s10722-021-01336-9
Gower DJ, Johnson KG, Richardson JE, Rosen BR, Ruber L, Williams ST (2012) Biotic evolution and environmental change in Southeast Asia. Cambridge University Press, Cambridge
Halim HR, Hapsari DP, Junaedi A, Ritonga AW, Natawijaya A, Poerwanto R, Sobir WWD, Matra DD (2019) Metabolomics dataset of underutilized Indonesian fruits; rambai (Baccaurea motleyana), nangkadak (Artocarpus nangkadak), rambutan (Nephelium lappaceum) and Sidempuan salak (Salacca sumatrana) using GCMS and LCMS. Data Brief 23:103706. https://doi.org/10.1016/j.dib.2019.103706
Hasan S, Hossain M, Akter R, Jamila M, Mazumder M, Rahman S (2009) DPPH free radical scavenging activity of some Bangladeshi medicinal plants. J Med Plants Res 3(11):875–879
Heliawati L (2018) Kandungan Kimia dan Bioaktivitas Tanaman Kecapi. PPS UNPAK PRESS, Bogor
Kostermans AJGH (1958) The genus Durio Adans (Bombac.). Reinwardtia 4(3):387–460
Koubala BB, Kansci G, Ralet MC (2018) Ambarella: Spondias cytherea. Exotic fruits reference guide. https://doi.org/10.1016/B978-0-12-803138-4.00005-8
Kusmana C, Hikmat A (2015) Keanekaragaman Hayati Flora di Indonesia. Jurnal Pengelolaan Sumberdaya Alam Dan Lingkungan 5(2):187. https://doi.org/10.29244/JPSL.5.2.187
Lely N (2016) Efektifitas beberapa fraksi daun matoa (Pometia pinnata JR Forst. & G. Forst.) sebagai antimikroba. Jurnal Ilmiah Bakti Farmasi 1(1):51–59
Lim TK (2013) Edible medicinal and non-medicinal plants, vol 6. Springer, Dordrecht Heidelberg, London
Magandhi et al (2021) Development and characterisation of Simple Sequence Repeats (SSRs) markers in durian kura-kura (Durio testudinarius Becc.) using NGS data. IOP Conf Ser Earth Environ Sci 948:012082
Mahisworo (2004) Budidaya Rambutan. Penebar Swadaya, Jakarta
Matra DD, Fathoni MAN, Majiidu M, Wicaksono H, Sriyono A, Gunawan G, Susanti H, Sari R, Fitmawati F, Siregar IZ, Widodo WD, Poerwanto R (2021) The genetic variation and relationship among the natural hybrids of Mangifera casturi Kosterm. Sci Rep 11(1). https://doi.org/10.1038/S41598-021-99381-Y
Matra DD, Ritonga AW, Natawijaya A, Poerwanto R, Sobir WWD, Inoue E (2019a) Dataset of the first de novo transcriptome assembly of the arillode of Baccaurea motleyana. Data Brief 22:332–335. https://doi.org/10.1016/J.DIB.2018.12.031
Matra DD, Ritonga AW, Natawijaya A, Poerwanto R, Sobir SUJ, Widodo WD, Inoue E (2019b) Datasets for genome assembly of six underutilized Indonesian fruits. Data Brief 22:960–963. https://doi.org/10.1016/J.DIB.2018.12.070
Matra DD, Ritonga AW, Natawijaya A, Poerwanto R, Sobir WWD, Inoue E (2019c) Dataset from de novo transcriptome assembly of Nephelium lappaceum aril. Data Brief 22:566–569. https://doi.org/10.1016/J.DIB.2018.12.034
Morton J (1987) Fruits of warm climates. Julia F. Morton, Miami, FL
Nurmayani et al (2021) Characterization of rambai (Baccaurea motleyana) genes putatively involved in sugar metabolism. IOP Conf Ser Earth Environ Sci 694:012067
Pelima NJ (2016) Kajian pengembangan tanaman Flacourtia inermis Roxb. Jurnal Envira 1(1):34–39
Saleh M, Mawardi EW, Hatmoko D (2007) Keanekaragaman Flora dan Buah-Buahan Eksotik Lahan Rawa. Balai Penelitian Pertanian Lahan Rawa, Banjar Baru
Sari HP et al (2020) Mining and characterization of genomic-based microsatellite markers in duku (Lansium domesticum). IOP Conf Ser Earth Environ Sci 457:012083
Silaban et al (2021) Isolation and characterization genes in lobi-lobi (Flacourtia inermis) related to sugar metabolism. IOP Conf Ser Earth Environ Sci 694:012068
Supriatna S (2010) Teknologi pembibitan duku dan prospek pengembangannya. Jurnal Litbang Pertanian 29(1):19–24
Susilo J (2014) Sukses Bertanam Jambu Biji dan Jambu air. Pustaka Baru Press, Yogyakarta
Tropical Plants Database (2020) The reference manual of woody plant propagation. Tropical plants database
Uji T (2004) Keanekaragaman jenis, plasma nutfah, dan potensi buah-buahan asli Kalimantan. BioSMART 6(2):117–125
Umam MN, Poerwanto R, Matra DD (2022) Morphological and phytochemical characterisation of Lobi-lobi fruit (Flacourtia inermis) at each maturity stage (unpublished)
Yang Y, Saand MA, Huang L, Abdelaal WB, Zhang J, Wu Y, Li J, Sirohi MH, Wang F (2021) Applications of multi-omics technologies for crop improvement. Front Plant Sci 12:563953
Zaini NAM, Osman A, Hamid AA, Ebrahimpour A, Saari N (2013) Purification and characterization of membrane-bound polyphenoloxidase (mPPO) from snake fruit [Salacca zalacca (Gaertn.) Voss]. Food Chem 136(2):407–414
Zulfina et al (2021) Characterization of secondary metabolites in kasturi mango (Mangifera casturi) using gas chromatography-mass spectrometry. IOP Conf Ser Earth Environ Sci 948:012059

11 The Bambara Groundnut Genome

From the Crop to the Genome: The Progress and Constraints of Genome-Related Studies in Bambara Groundnut

Luis Salazar-Licea, Kumbirai Ivyne Mateva, Xiuqing Gao, Razlin Azman Halimi, Liliana Andrés-Hernández, Hui Hui Chai, Wai Kuan Ho, Graham J. King, Festo Massawe, and Sean Mayes

Abstract

The combined effects of climate change, an increasing world population, and dependence on a relatively small selection of crops are threatening global food security. Despite their limited promotion among farmers, seed companies, and researchers, underutilised crops could provide alternative sources of nutritionally dense foods and aid food production because of their resilience and natural adaptation to marginal environments that may be too harsh for staple crops. Bambara groundnut (Vigna subterranea (L.) Verdc.) is a protein-rich underutilised legume that has long been recognised as drought-resistant, capable of fixing atmospheric nitrogen, and able to produce yield in marginal soils. Given the rapid development and current accessibility of genomic technologies, this chapter presents the current progress in genomics using molecular tools, an overview of the genome sequence of bambara groundnut, future work incorporating next-generation sequencing technologies and bioinformatics, and an example that showcases the importance of linking trait data to the genome to benefit future breeding programmes.

L. Salazar-Licea (corresponding author) · S. Mayes
School of Biosciences, University of Nottingham, Sutton Bonington Campus, Loughborough, Leicestershire LE12 5RD, UK
K. I. Mateva · X. Gao · H. H. Chai · W. K. Ho · F. Massawe
Future Food Beacon, School of Biosciences, University of Nottingham Malaysia, Selangor Darul Ehsan, Jalan Broga, 43500 Semenyih, Malaysia
R. Azman Halimi · L. Andrés-Hernández · G. J. King
Southern Cross Plant Science, Southern Cross University, Lismore, NSW 2480, Australia
S. Mayes
Crops For the Future (UK) CIC, NIAB, 93 Lawrence Weaver Road, Cambridge CB3 0LE, UK

11.1 Introduction

The quest for food security must be pursued around the world because of the increase in the human population combined with the effects of climate change. The global population increased by 1 billion in the last 12 years to the current count of 7.9 billion (https://www.worldometers.info/world-population/). It is estimated that the global population will reach 9.7 billion by 2050, and possibly 11 billion by 2100 (James 2015). Globally, 820 million people currently suffer from chronic hunger and 2 billion are categorised as malnourished (FAO, IFAD, UNICEF, WFP and WHO 2019). Sub-Saharan Africa had the world’s highest proportion of undernourished people in 2016 (FAO, IFAD, UNICEF, WFP and WHO 2017). Climate change events are also predicted to have negative impacts on water resources and hence on crop production (Piao et al. 2010; Kole et al. 2015).

At present, there is an over-reliance on a limited number of crop species for food. Fewer than twenty of the possible 50,000 documented edible plants provide over 90% of plant-based global food energy (Esquinas-Alcazar 2005), with three staple crops (maize, wheat and rice) providing more than two-thirds of this (IPES-Food 2016). If these staple crops start to fail because of biotic and abiotic stresses, alternative species will be needed to meet the global demand for food and nutrition (Mayes et al. 2019b). The best candidates are those with valuable traits such as drought tolerance, high nutritional density, possible nutritional or agricultural complementarity with other crops, and increased genetic diversity (Varshney et al. 2010; Mayes et al. 2019b). Legumes such as bambara groundnut (Vigna subterranea (L.) Verdc.; 2n = 2x = 22) are a good alternative because they add nitrogen to the soil and provide good amounts of protein for human diets compared with cereals (Ahmad 2016; Mayes et al. 2019b).

In this chapter, we describe some of the advantages and disadvantages of bambara groundnut, as well as the current and potential importance of this underutilised crop. We also present current developments in molecular research, from the genome to molecular breeding approaches. These developing resources will give researchers the knowledge and molecular tools needed to equip potential bambara groundnut breeders to overcome the current constraints on wider adoption of this promising pulse.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 M. A. Chapman (ed.), Underutilised Crop Genomes, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-00848-1_11

11.1.1 Botanical Description and General Ecology Bambara groundnut is an indigenous African protein-rich legume that is widely cultivated by

L. Salazar-Licea et al.

subsistence and small-scale farmers in subSaharan Africa and Southeast Asia (mainly in West Java, Thailand and parts of Malaysia). Bambara groundnut is cultivated in the tropics at altitudes up to 2000 m above sea level. The crop is recognised to be tolerant to drought and is grown successfully in areas with an average annual rainfall below 500 mm, although optimum yields are obtained when rainfall is higher (900–1000 mm year−1) (Ocran et al. 1998). Bambara groundnut can also be grown in humid conditions, such as in northern Sierra Leone, where the annual rainfall exceeds 2000 mm, although is seasonal. The crop grows on any well-drained soil, but light sandy loams with a pH of 5.0–6.5 rich in phosphorus and potassium are most suitable. Peduncle penetration and subsequent peduncle expansion to form pods can be aided by light sandy loam soils. Bambara groundnut has a life cycle that ranges from 110 to 150 days, although some early maturing genotypes of bambara groundnut have also been identified in Ghana, including ‘Zebra coloured’ with a maturation period of 90 days and ‘Mottled cream’, which matures in 98– 100 days (Berchie et al., 2010a, b). Bambara groundnut germinates between 7 to 15 days in temperature of between 28.5 and 32.5 °C (Makanda et al. 2008), anthesis starts from 30 to 35 days after emergence and may continue until the end of the crop life cycle. The formation of pods takes 30–40 days after fertilisation, and most genotypes require a photoperiod of 12 h for optimal pod and seed development, although variation for this trait has been identified (Kendabie et al., 2020). In many genotypes, flowering is not affected by photoperiod; however, long photoperiod can delay or inhibit pod set and/or seed development, such as in the genotypes ‘Ankpa4’ and ‘Tiga Nicuru’ (Linnemann 1993; Kendabie et al. 2020), while other genotypes may produce increased yields under long photoperiods, with a delay in maturity date. 
The morphology of bambara groundnut is similar to that of groundnut (Arachis hypogaea). Bambara groundnut is an annual, herbaceous, intermediate legume with trifoliate leaves and

11

The Bambara Groundnut Genome


Fig. 11.1 Morphology of bambara groundnut (modified from National Research Council 2006)

erect petioles arising from short, creeping and multibranched lateral stems just above ground level (Fig. 11.1) (Heller et al. 1997). The cultivated forms of bambara groundnut have stems with a limited creeping growth habit, which gives rise to either bunch or intermediate types (Linnemann and Azam-Ali 1993). The petioles are long, stiff and grooved; they arise from the nodes, and their bases vary in colour, including green, brown and purple (Swanevelder 1998). Bambara groundnut can grow up to 30–35 cm in height, with a well-developed taproot and lateral roots (Mateva et al. 2020) that are capable of forming root nodules in association with rhizobia for nitrogen fixation (Foyer et al. 2016; Considine et al. 2017). Wild forms of bambara groundnut (Vigna subterranea var. spontanea) show a slightly different morphology, with a fully spreading growth habit, a limited number of elongated lateral stems with pentafoliate leaves and no distinct taproot (Swanevelder 1998).

The flowers are papilionaceous and are produced on long, hairy peduncles that elongate from the nodes of lateral stems (Swanevelder 1998). Flower colour changes from yellowish white in the morning to pale yellow or light brown in the evening, and flowers generally open over 24 h (Heller et al. 1997). After pollination and fertilisation, the peduncles usually elongate and penetrate the soil surface, although in some cases they stay above ground; they then form pods 1.5 to 2.5 cm in diameter (Swanevelder 1998). As the crop approaches maturity, pod colour varies with genotype and includes cream yellow, pale or dark green, and red (Massawe et al. 2003). Each pod usually contains one seed; however, some genotypes are reported to have double-seeded pods (Pasquet and Fotso 1997; Gao et al. 2020). Seed colour also depends on genotype and includes cream, yellow, brown, red and black, with or without hilum colouration and speckling (Swanevelder 1998).

11.1.2 Geographical Distribution

The name ‘Bambara groundnut’ is reputed to be derived from the Bambara, a people who today live mainly in Mali. However, no wild form of the crop has been found in Mali, and the exact centre of origin of bambara groundnut remains unclear. Studies of the centre of origin suggest that the crop originated on the African continent (Hepper 1970). As cited in Heller et al. (1997), Guillemin et al. (1832) reported the discovery of wild forms near Senegal, and Dalziel (1937) suggested that the crop originated in the region between north-eastern Nigeria (Yola Province) and northern Cameroon (Garoua City). The centre of domestication is believed to extend from the Jos Plateau and Yola Region in Nigeria to Garoua in Cameroon, and probably even to the Central African Republic (Hepper 1963; Begemann 1988). Begemann (1988) carried out a detailed analysis of seed diversity in a large collection of bambara groundnut from the International Institute of Tropical Agriculture (IITA). The results showed greater seed diversity in samples collected within 200 km of the original putative centre, between Yola and Garoua (Hepper 1963). In addition to seed traits, diversity was also observed in other traits, including number of days to maturity, pod length and number of shoots per plant (Hepper 1963). Interestingly, Somta et al. (2011) reported higher genetic diversity in accessions from Burkina Faso than in those from Cameroon/Nigeria and hypothesised that the regions around Burkina Faso could be the more likely place of domestication of bambara groundnut. However, Olukolu et al. (2012), using Diversity Array Technology (DArT) markers on 124 accessions from 25 African countries, found greater genetic diversity in the Cameroon/Nigeria region than in other regions. Based on the analysis of 363 local varieties, Rungnoi et al. (2012) also concluded that West Africa is the centre of diversity/domestication of bambara groundnut.

Bambara groundnut has been reported to be grown in tropical regions since the seventeenth century, including Nigeria, Ghana, Haute Volta, Eastern Africa and Madagascar. It is also grown in Central and South America, Oceania, Asia (including the Philippines, India, Sri Lanka, Indonesia and Malaysia) and the South Pacific, as well as in areas of northern Australia and Papua New Guinea (Linnemann and Azam-Ali 1993; Duke 1981; Baudoin and Mergeai 2001).

11.1.3 Genetic Resources, Accessibility to/from Seed Banks

There are around 6000 accessions of bambara groundnut, mainly collected from African countries, and these collections are held by international or regional seed banks (Table 11.1). The largest bambara groundnut germplasm collection is held by the International Institute of Tropical Agriculture (IITA) (Goli 1997). The collection was gathered from 25 African countries and has been characterised, evaluated and documented (Goli 1997). The crop is still largely grown as landraces, and the variation harboured by these landraces is a great asset for breeding programmes (Olukolu et al. 2012; Kendabie et al. 2015; Mayes et al. 2015; Massawe et al. 2016). Phenotypic descriptors (IPGRI 2000), biochemical markers (Pasquet et al. 1999) and molecular markers, including amplified fragment length polymorphism (AFLP) markers (Massawe et al. 2002), random amplified polymorphic DNA (RAPD) markers (Massawe et al. 2003b), simple sequence repeat (SSR) markers (Molosiwa et al. 2013; Aliyu and Massawe 2013; Redjeki et al. 2020), DArT markers (Olukolu et al. 2012) and single nucleotide polymorphism (SNP) markers (Redjeki et al. 2020), have been used to assess genetic diversity within the available germplasm of bambara groundnut.

Table 11.1 Bambara groundnut accessions held by countries or institutions (Begemann and Engels 1997; Muhammad et al. 2020)

Country/institution                                                             No. of accessions
Benin                                                                           3
Botswana                                                                        26
Botswana, Department of Agricultural Research (DAR)                             338
Burkina Faso                                                                    143
France, Office de la Recherche Scientifique et Technique d’Outre-Mer (ORSTOM)   1416
Ghana, Plant Genetic Resources Unit (PGRC)                                      166
Ghana, Plant Genetic Resources Research Institute (PGRRI)                       296
Ghana, Savanna Agricultural Research Institute (SARI)                           90
Ghana, University of Ghana                                                      80
Guinea                                                                          43
Kenya, Kakamega Regional Research Centre (KARI)                                 2
Kenya, National Genebank                                                        6
Kenya, National Museums                                                         2
Mali                                                                            70
Mozambique                                                                      12
Namibia                                                                         23
Niger                                                                           79
Nigeria, International Institute of Tropical Agriculture (IITA)                 2035
South Africa, Department of Agriculture                                         20
South Africa, Grain Crops Institute                                             198
South Africa, Institute for Veld and Forage Utilization                         117
Tanzania, National Plant Genetic Resources Committee (NPGRC)                    222
Zambia, National Plant Genetic Resources Committee (NPGRC)                      232
Zambia, University of Zambia                                                    463
Zimbabwe                                                                        129
Total                                                                           6211
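As a quick arithmetic check on Table 11.1, the following Python sketch sums the holdings and reports the share held by the largest collections. The counts are transcribed from the table; the shortened holder labels and the names `ACCESSIONS` and `holder_shares` are illustrative assumptions and do not come from any genebank software.

```python
# Accession counts transcribed from Table 11.1 (holder labels shortened for readability).
ACCESSIONS = {
    "Benin": 3,
    "Botswana": 26,
    "Botswana, DAR": 338,
    "Burkina Faso": 143,
    "France, ORSTOM": 1416,
    "Ghana, PGRC": 166,
    "Ghana, PGRRI": 296,
    "Ghana, SARI": 90,
    "Ghana, University of Ghana": 80,
    "Guinea": 43,
    "Kenya, KARI": 2,
    "Kenya, National Genebank": 6,
    "Kenya, National Museums": 2,
    "Mali": 70,
    "Mozambique": 12,
    "Namibia": 23,
    "Niger": 79,
    "Nigeria, IITA": 2035,
    "South Africa, Department of Agriculture": 20,
    "South Africa, Grain Crops Institute": 198,
    "South Africa, Institute for Veld and Forage Utilization": 117,
    "Tanzania, NPGRC": 222,
    "Zambia, NPGRC": 232,
    "Zambia, University of Zambia": 463,
    "Zimbabwe": 129,
}


def holder_shares(counts):
    """Return (total, {holder: percentage of total}) for a dict of accession counts."""
    total = sum(counts.values())
    shares = {name: 100.0 * n / total for name, n in counts.items()}
    return total, shares


total, shares = holder_shares(ACCESSIONS)

# Rank holders by their share of all accessions and keep the three largest.
top3 = sorted(shares, key=shares.get, reverse=True)[:3]
for name in top3:
    print(f"{name}: {ACCESSIONS[name]} accessions ({shares[name]:.1f}% of {total})")
```

The computed total agrees with the figure given in the table, and the ranking shows how much of the conserved diversity sits in a small number of collections.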

11.1.4 Bambara Groundnut: An Important but Underutilised Crop

Traits of importance: drought resistance

Bambara groundnut uses a combination of drought resistance mechanisms to produce yield under drought conditions. The adaptive characteristics that enable bambara groundnut to survive under drought conditions have been studied (Collinson et al. 1997; Collinson et al. 1999; Jørgensen et al. 2010; Sesay et al. 2010; Vurayai et al. 2011; Tafadzwanashe et al. 2013; Chibarabada et al. 2015; Chai et al. 2016a; Muhammad et al. 2015) and could be explored further to develop bambara groundnut varieties for drought-prone areas.

Physiological changes: above ground

As reviewed in Mayes et al. (2019a), drought resistance mechanisms in bambara groundnut


have been studied and evaluated over a period of 30 years (Collinson et al. 1997, 1999; Jørgensen et al. 2010; Vurayai et al. 2011; Laary et al. 2012; Mabhaudhi and Modi 2013; Al Shareef et al. 2014; Chibarabada et al. 2015; Berchie et al. 2016; Nautiyal et al. 2017). Because bambara groundnut tolerates drought, it may be one of the few cropping options in drylands with minimal rainfall. Several reports show that bambara groundnut responds to drought stress through stomatal regulation and osmotic adjustment (Collinson et al. 1997; Jørgensen et al. 2010; Mabhaudhi et al. 2013; Chai et al. 2016a, b). For example, the genotype S19-3, originating from Namibia, was reported to close its stomata late during drought stress (Jørgensen et al. 2010). The authors described S19-3 as a ‘water spender’, showing a slow decline in transpiration rate that enables the genotype to maximise its use of available water. This is in line with a root system study by Mateva et al. (2020), which identified S19-3 as a genotype that rapidly establishes a high root length density in the deeper soil layers compared with the topsoil layer. The benefit is increased contact between roots and soil, which enables plants to access more water at lower soil depths, as also reported by Lynch (2007) and Blum (2011).

Bambara groundnut can also escape drought through phenological plasticity. It was observed to have a shorter vegetative period, a shorter reproductive stage and an earlier final maturity date in response to drought stress (Mabhaudhi et al. 2013). For example, the landraces ‘Red’ and ‘Brown’ from Jozini, South Africa, matured significantly earlier (mean: 122.8 days after planting (DAP)) when subjected to stress at 30% of crop water use (ETa) than at 100% ETa (mean: 128 DAP; Mabhaudhi et al. 2013).
Although drought stress generally decreases the yield of most crops, bambara groundnut is still able to produce reasonable yields of up to 1.65 t ha−1 of seeds, with a range of 1.3–2.1 t ha−1 (Mwale et al. 2007a). These yields are reported to be similar to those of drought-tolerant cultivars of groundnut and higher than those of chickpea cultivars (0.3–0.5 t ha−1) under comparable drought stress conditions (Leport et al. 1999; Collino et al. 2000).

High efficiency of resource capture and conversion is believed to contribute to crop productivity under drought. Although drought stress reduced the radiation conversion coefficient (εs) of bambara groundnut from 1.51 to 1.02 g MJ−1, the εs reported by Mwale et al. (2007b) is higher than the values reported for soybean, which range from 0.52 to 0.92 g MJ−1 (Board et al. 1994; De Costa and Shanmugathasan 2002), and for cowpea (Vigna unguiculata; Craufurd and Wheeler 1999) under minimal soil moisture conditions. In addition, the efficiency with which plants convert water into dry matter (εw) is essential for yield production. The εw of bambara groundnut (1.65 g kg−1) under drought stress (Mwale et al. 2007b) was reported to be higher than that of most grain legumes grown in low rainfall Mediterranean environments, such as lentil (Lens culinaris; 1.37 g kg−1; Zhang et al. 2000) and chickpea (Cicer arietinum; 1.11 g kg−1; Siddique et al. 2001).

Root trait system variation and its contribution to drought stress resistance

Roots are among the most important organs for transporting materials from the soil and thereby control productivity (Lynch 1995). Plants can modify their root system architecture (RSA) to respond to a variety of conditions (Jovanovich et al. 2007). As an underutilised grain legume, bambara groundnut has not been intensively studied for RSA. A better adapted RSA has been linked to alleviation of drought stress through increased exploration for water in bambara groundnut genotypes (Mateva et al. unpublished). The root system of bambara groundnut, as in many dicotyledons, is characterised by a well-defined taproot system with numerous first-order lateral branches. These lateral roots branch further into second- and third-order laterals. The depth of rooting and the distribution of lateral roots are determining factors for RSA in bambara groundnut (Mateva et al. 2020).


In a comparative analysis of the RSA of eight bambara groundnut genotypes derived from landraces and sourced from several countries, natural genetic variation in RSA was reported that could be utilised to improve drought resistance (Mateva et al. unpublished). In an evaluation using lightweight polyvinyl chloride (PVC) columns, a known drought-resistant genotype (S19-3, from Namibia) showed a deeper taproot and more branching at lower soil depths (Mateva et al. 2020). Recently, the genotype DodR (sourced from Tanzania) was identified as showing a promising RSA, with extensive root length density in the 60–90 cm soil layer, and this was associated with grain yield. Mateva et al. (2020) suggested that single genotypes derived from hot, dry regions show an adaptive response for soil resource capture through an improved foraging capacity of the root system. Furthermore, genotypes that evolved in drier areas could have adapted by increasing taproot length (TRL) and reducing their branching distribution to capture deep water more efficiently. In addition, screening of a bi-parental population obtained by crossing two distinct single genotypes (S19-3 × DodR; approximately 22 lines) showed that TRL and root length density in the 60–90 cm soil layer (RLD 60–90 cm) are useful traits for selecting bambara groundnut lines for drought resistance (Mateva et al. unpublished). In this study, lines with promising TRL and RLD 60–90 cm were identified for further evaluation, with the aim of breeding more drought-resistant bambara groundnut varieties. Quantitative trait loci (QTL) mapping could be deployed to identify chromosomal regions that have a substantial impact on root system variation, particularly TRL and RLD 60–90 cm, in bambara groundnut populations to further accelerate breeding outcomes.
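Root length density is conventionally expressed as root length per unit soil volume, so RLD 60–90 cm for a plant grown in a cylindrical column can be sketched as below. This is a minimal illustration under stated assumptions: the 20 cm column diameter, the line names and every measurement are hypothetical and are not taken from Mateva et al.; only the layer limits (60–90 cm) come from the text.

```python
import math


def root_length_density(root_length_cm, column_diameter_cm, top_cm, bottom_cm):
    """Root length density (cm of root per cm^3 of soil) for one soil layer.

    The layer is the slice of a cylindrical column between depths top_cm and bottom_cm.
    """
    if bottom_cm <= top_cm:
        raise ValueError("bottom_cm must be deeper than top_cm")
    layer_volume_cm3 = math.pi * (column_diameter_cm / 2.0) ** 2 * (bottom_cm - top_cm)
    return root_length_cm / layer_volume_cm3


# Hypothetical measurements for two breeding lines (illustrative values only).
LINES = {
    "line_A": {"trl_cm": 95.0, "root_length_60_90_cm": 4200.0},
    "line_B": {"trl_cm": 70.0, "root_length_60_90_cm": 1500.0},
}
COLUMN_DIAMETER_CM = 20.0  # assumed column size, not reported in the text

rld_60_90 = {
    name: root_length_density(m["root_length_60_90_cm"], COLUMN_DIAMETER_CM, 60.0, 90.0)
    for name, m in LINES.items()
}

# Rank lines by deep-layer root length density, highest first.
ranked = sorted(rld_60_90, key=rld_60_90.get, reverse=True)
for name in ranked:
    print(f"{name}: TRL = {LINES[name]['trl_cm']:.0f} cm, RLD 60-90 cm = {rld_60_90[name]:.3f} cm cm^-3")
```

Ranking on a single trait is only a starting point; a selection index that combines TRL and RLD 60–90 cm would need weights justified by yield data.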


11.1.5 Nutritional Composition

As with most underutilised crops, access to reliable datasets for analysing trait variation within and between lines is limited. For example, determining intra- and interspecies variation for nutritional components would enable direct comparison with commodity crops (Halimi et al. 2019a; see example in Fig. 11.2). Such data could be useful at the policy-making level to recognise the role underutilised crops may play alongside staple crops for food security (Mabhaudhi et al. 2018; Pingali 2015). Comparative analysis of the available literature on the nutritional composition of bambara groundnut and four taxonomically related legume species (Halimi et al. 2019b) indicated that there is potential to develop the crop into a high protein or high oil species. The literature indicated a seed protein range of 9.6–30.7%, a larger variation than that reported for major legumes such as chickpea and cowpea.

Seed lipid from 100 lines was analysed for fatty acid composition by gas chromatography with flame ionisation detection (GC-FID), following Association of Official Analytical Chemists (AOAC) method 996.06 (Halimi et al. 2019b). Twenty-one fatty acids were detected in bambara groundnut seed lipid (Fig. 11.3; Table 11.2), a marked increase on the limited number of fatty acids reported for this species previously: oleic, linoleic, palmitic, myristic, stearic, behenic and linolenic acids (Minka and Bruneteau 2000; Mune et al. 2007; Adeleke et al. 2018). A study of a bambara groundnut landrace from Ivory Coast detected 13 fatty acids (Yao et al. 2015), and the present study has increased the knowledge base further. The predominant components were linoleic acid (18:2 n-6), which accounted for 33–45% of the fatty acids, oleic acid (18:1 n-9; 15–27%) and palmitic acid (16:0; 16–23%). The concentration of oleic acid was similar to that present in soybean lines prior to modern selection for this trait.

11.1.6 Underutilisation of Bambara Groundnut

Bambara groundnut has been reported to have unpredictable yield (with planting material consisting of landraces), lack of commercial varieties, sensitivity to long photoperiods, long


Fig. 11.2 Compositional variation in the four proximate components for raw bambara groundnut (Vigna subterranea) seeds and selected crop comparators: soybean (Glycine max), chickpea (Cicer arietinum) and cowpea (Vigna unguiculata). Green: carbohydrate; blue: protein; yellow: total fat; orange: total dietary fibre. Data are presented as calculated mean values expressed as % edible portion. The dataset for each compound for each crop was constructed from at least three data sources; dataset averages were calculated and normalised to 100%. Adapted from Halimi et al. (2019b)

Fig. 11.3 Typical GC-FID chromatogram showing separation (retention times, minutes) of fatty acid methyl esters from bambara groundnut (V. subterranea) seed lipid. Red stars indicate peaks of major fatty acids and black diamonds indicate minor fatty acids (Halimi, unpublished)


Table 11.2 Typical GC-FID area per cent report tabulating retention times (min) and composition of each fatty acid (area %) measured for bambara groundnut (V. subterranea) seed lipid

Peak #  Retention time (min)  Width (min)  Area [pA s]  Area %    Fatty acid       Fatty acid chain length
1       21.720                0.052        3.005        0.0608    Myristic         14:0
2       24.332                0.049        1.881        0.0381    Pentadecanoic    15:0
3       26.958                0.066        809.976      16.401    Palmitic         16:0
4       27.859                0.049        5.312        0.108     Palmitoleic      16:1(n-7)
5       29.230                0.053        11.497       0.233     Heptadecenoic    17:1
6       30.236                0.056        1.818        0.037     Margaric         17:0
7       31.626                0.069        352.930      7.147     Stearic          18:0
8       32.469                0.068        1181.555     23.925    Oleic            18:1(n-9)
9       32.586                0.046        57.321       1.161     cis-Vaccenic     18:1(n-11)
10      33.920                0.083        1768.274     35.806    Linoleic         18:2(n-6)
11      34.707                0.046        0.341        0.006928  γ-Linolenic      18:3(n-3)
12      35.552                0.049        106.194      2.1503    α-Linolenic      18:3(n-3)
13      35.820                0.052        112.602      2.2801    Arachidic        20:0
14      36.591                0.049        34.557       0.6997    11-Eicosenoic    20:1(n-11)
15      37.384                0.060        3.229        0.0654    Heneicosanoic    21:0
16      37.971                0.050        2.405        0.0487    Eicosadienoic    20:2
17      39.887                0.060        295.687      5.9874    Behenic          22:0
18      40.537                0.049        9.205        0.1864    Erucic           22:1(n-9)
19      41.636                0.051        8.291        0.1679    Docosadienoic    22:2(n-6)
20      43.441                0.050        136.987      2.7739    Lignoceric       24:0
21      44.194                0.048        0.779        0.0158    Nervonic         24:1(n-9)
Totals                                     4903.848     99.298

Peak area for each fatty acid is calculated by multiplying peak height (pA) by sample concentration (s)
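The area % column of Table 11.2 can be reproduced from the peak areas with a few lines of Python. The sketch below rests on one assumption that the report does not state: because the listed peaks sum to 99.298% rather than 100%, the remaining area is taken to belong to peaks not listed, and the total integrated area is inferred from the "Totals" row. The grouping into saturated, monounsaturated and polyunsaturated classes simply reads the number of double bonds from the chain notation. All names in the code are illustrative.

```python
# (fatty acid, chain notation carbons:double bonds, peak area in pA s), transcribed from Table 11.2.
PEAKS = [
    ("Myristic", "14:0", 3.005),
    ("Pentadecanoic", "15:0", 1.881),
    ("Palmitic", "16:0", 809.976),
    ("Palmitoleic", "16:1", 5.312),
    ("Heptadecenoic", "17:1", 11.497),
    ("Margaric", "17:0", 1.818),
    ("Stearic", "18:0", 352.930),
    ("Oleic", "18:1", 1181.555),
    ("cis-Vaccenic", "18:1", 57.321),
    ("Linoleic", "18:2", 1768.274),
    ("gamma-Linolenic", "18:3", 0.341),
    ("alpha-Linolenic", "18:3", 106.194),
    ("Arachidic", "20:0", 112.602),
    ("11-Eicosenoic", "20:1", 34.557),
    ("Heneicosanoic", "21:0", 3.229),
    ("Eicosadienoic", "20:2", 2.405),
    ("Behenic", "22:0", 295.687),
    ("Erucic", "22:1", 9.205),
    ("Docosadienoic", "22:2", 8.291),
    ("Lignoceric", "24:0", 136.987),
    ("Nervonic", "24:1", 0.779),
]
REPORTED_TOTAL_AREA = 4903.848  # "Totals" row, area
REPORTED_TOTAL_PCT = 99.298     # "Totals" row, area %


def area_percentages(peaks, total_area):
    """Area % of each peak relative to the total integrated area."""
    return {name: 100.0 * area / total_area for name, _, area in peaks}


def saturation_class(chain):
    """Classify a 'carbons:double_bonds' notation by its number of double bonds."""
    double_bonds = int(chain.split(":")[1])
    if double_bonds == 0:
        return "saturated"
    return "monounsaturated" if double_bonds == 1 else "polyunsaturated"


# Assumption: the ~0.7% not covered by the listed peaks belongs to unlisted peaks,
# so the total integrated area is inferred from the reported totals.
implied_total_area = REPORTED_TOTAL_AREA / (REPORTED_TOTAL_PCT / 100.0)
pct = area_percentages(PEAKS, implied_total_area)

# Sum the area % within each saturation class.
by_class = {}
for name, chain, _ in PEAKS:
    cls = saturation_class(chain)
    by_class[cls] = by_class.get(cls, 0.0) + pct[name]

for cls, value in sorted(by_class.items()):
    print(f"{cls}: {value:.2f} area %")
```

Area % describes the relative detector response only; converting it to mass fractions would require response factors or an internal standard, which the table does not provide.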

cooking time and have few value-added product opportunities (Mayes et al. 2019b). Additionally, climate change is happening too rapidly for crops, including bambara groundnut, to adapt passively, which may lead to erratic yields in the bambara groundnut landraces currently in use; hence, deliberate breeding of bambara groundnut is required.

Bambara groundnut production in Africa is estimated at approximately 0.3 million tonnes annually, with Nigeria being the largest producer (0.1 million tonnes; Hillocks et al. 2012). The average annual production of legumes in Africa has been reported by Stanton et al. (1966), Hillocks et al. (2012) and Nedumaran et al. (2015) and reviewed by Mayes et al. (2019a) (see Table 11.3). The yield of bambara groundnut in Africa varies between landraces and locations (0.5–3 t ha−1), but the crop has a yield potential of over 3 t ha−1 (Begemann 1988), and the average yield of 0.85 t ha−1 was reported to be comparable to that of other legumes (Stanton et al. 1966). Additionally, an estimated macronutrient comparison based on the findings of Halimi et al. (2019a) is presented in Table 11.3. In comparison with major legumes such as chickpea and cowpea, bambara groundnut has a higher estimated fat yield per hectare (55.3 kg/ha for bambara groundnut, 49.4 kg/ha for chickpea and 11.9 kg/ha for cowpea). Bambara groundnut also provides more protein per ha than cowpea (Table 11.3). Compiled literature values represent the current situation for unimproved material, as no commercial varieties with improved nutritional composition have been released (Halimi et al. 2019a). With bambara groundnut showing approximately half the nutritional potential of soybean, there is an opportunity to develop varieties to meet the global demand for energy and protein.

Berchie et al. (2016) reported that the time of sowing affected the yield of bambara groundnut, with higher yields observed in the dry minor rainfall season than in the major rainy season. Pod yields of up to 4 t ha−1 were obtained in some landraces in the transition agro-ecological zone in Ghana, where temperatures are higher and rainfall is lower than in the forest agro-ecological zone (Berchie et al. 2016). However, if bambara groundnut is cultivated at the appropriate time, relatively high yields can be attained in the forest agro-ecological zone in Ghana (Berchie et al. 2016).

As mentioned before, photoperiod was reported to influence the onset of flowering and podding of bambara groundnut, depending on the genotype (Linnemann and Craufurd 1994; Linnemann et al. 1995; Kendabie et al. 2020). For instance, long photoperiods of 14 and 16 h

delayed flowering in genotype ‘Ankpa4’, which produced no pods, while genotype ‘Tiga Nicuru’ showed delayed podding and fewer pods when the photoperiod was increased from 12 to 14 h (Kendabie et al. 2020). Nevertheless, significant differences amongst genotypes in response to long photoperiods have been identified (Kendabie et al. 2020), and crosses between a ‘quantitative long day’ genotype (IITA-686) and a ‘qualitative short day’ genotype (Ankpa4) allow individual lines to be generated and selected for future breeding programmes.

As in many pulses, long cooking time due to the ‘hard-to-cook’ (HTC) phenomenon, defined in terms of the amount of energy required for the legume to reach a desirable, edible texture, is recognised as a major limitation to the use of bambara groundnut. Bambara groundnut generally needs 3–4 h of boiling, which is similar to soybean (3.6 h) but significantly longer than common bean (1.5 h), cowpea (2.4 h) and mung bean (0.5 h) (Mubaiwa et al. 2017). This leads to greater fuel and water requirements and thus increases the cost of cooking in many developing countries (Adzawla et al. 2016). HTC also negatively affects the eating and nutritional qualities of bambara groundnut. Ageing of seeds, as a result of long-term storage under increased temperature and humidity, was found to be associated with the development of HTC traits, and ageing can also reduce the in vitro bioavailability of calcium and magnesium (Gwala et al. 2020). The levels of minerals including magnesium, iron and zinc (Gwala et al. 2020) and protein quality (Tuan and Philips 1992) were also observed to be affected by prolonged cooking of aged seeds. Research and investment in appropriate processing methods and machinery, particularly micromanufacture, would be necessary to minimise this limitation and increase the uptake of bambara groundnut in the market.

Although bambara groundnut has been cultivated for centuries in Africa, it remains an underutilised crop that, unlike many other crops, has not long been associated with large-scale research programmes. It has been largely ignored by the research and breeding community and has received limited support from governmental or international agencies compared with major crops such as soybean, which has received significant attention and considerable scientific and financial support since its introduction (Heller et al. 1997; Oyeyinka et al. 2015). Bambara groundnut also faces competition from groundnut (which was introduced into West Africa from Brazil), whose seeds contain significant amounts of oil and which can therefore be cultivated as an oilseed crop. Bambara groundnut (along with other seed legumes) is commonly referred to as a ‘poor man’s crop’. The perception that underutilised crops, including bambara groundnut, have lower economic potential and export value than major crops influences the exploitation of the crop (Azam-Ali et al. 2001). Although bambara groundnut can command higher market prices than other legumes, including groundnut, because of seasonal crop supply, only a limited number of value-added products have been developed for it. In addition, proper seed systems and best agronomic practices are yet to be established and shared with the bambara groundnut production community, causing the crop to remain underutilised (Hillocks et al. 2012; Feldman et al. 2019).

Table 11.3 Yield and production of a subset of legumes in Africa with estimated nutritional values (adapted from Mayes et al. 2019a, as reported in Stanton et al. 1966; Hillocks et al. 2012; Nedumaran et al. 2015; Halimi et al. 2019a)

                                                                       Estimated macronutrient values (kg/ha)
Crop               Annual production (million tonnes)  Yield (t ha−1)  Carbohydrate  Protein  Total fat  Dietary fibre
Cowpea             4.9                                 0.49            327.12        133.13   11.86      17.93
Soybean            1.4                                 1.22            327.45        509.47   223.99     159.09
Bambara groundnut  0.3                                 0.85            547.66        200.52   55.34      46.75
Chickpea           0.3                                 0.94            573.21        207.55   49.35      109.79
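The per-hectare macronutrient estimates in Table 11.3 are consistent with a simple calculation: seed yield (converted to kg ha−1) multiplied by the proportion of each proximate component, with the four components normalised to 100%. The Python sketch below, using values transcribed from the table, checks this and recovers the implied composition. The final scenario, which scales bambara groundnut to a yield of 3 t ha−1, is a hypothetical illustration that assumes composition is unchanged by yield; it is not an estimate from the cited studies.

```python
# crop -> (yield in t/ha, {component: kg/ha}), transcribed from Table 11.3.
TABLE_11_3 = {
    "Cowpea": (0.49, {"carbohydrate": 327.12, "protein": 133.13, "fat": 11.86, "fibre": 17.93}),
    "Soybean": (1.22, {"carbohydrate": 327.45, "protein": 509.47, "fat": 223.99, "fibre": 159.09}),
    "Bambara groundnut": (0.85, {"carbohydrate": 547.66, "protein": 200.52, "fat": 55.34, "fibre": 46.75}),
    "Chickpea": (0.94, {"carbohydrate": 573.21, "protein": 207.55, "fat": 49.35, "fibre": 109.79}),
}


def implied_composition(yield_t_ha, kg_per_ha):
    """Percentage of seed yield contributed by each component."""
    yield_kg_ha = yield_t_ha * 1000.0
    return {name: 100.0 * value / yield_kg_ha for name, value in kg_per_ha.items()}


def per_hectare(yield_t_ha, composition_pct):
    """Component output in kg/ha for a given yield and percentage composition."""
    return {name: yield_t_ha * 1000.0 * pct / 100.0 for name, pct in composition_pct.items()}


# Implied percentage composition for each crop.
composition = {crop: implied_composition(y, kg) for crop, (y, kg) in TABLE_11_3.items()}
for crop, comp in composition.items():
    summary = ", ".join(f"{name} {value:.1f}%" for name, value in comp.items())
    print(f"{crop}: {summary}")

# Hypothetical scenario: bambara groundnut at 3 t/ha with unchanged composition.
bambara_at_3t = per_hectare(3.0, composition["Bambara groundnut"])
```

Because the four components are normalised to 100%, their kg/ha values sum to the yield itself, which gives a convenient check on any transcription of the table.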
However, the results of specific research programmes indicate that bambara groundnut is a crop with considerable potential that could contribute to food and nutritional security, especially as a food crop in dry areas with marginal soils. As a drought-tolerant legume, bambara groundnut deserves greater attention in research and development. In particular, research is required to develop improved varieties and crop management practices that increase yield and harvest index, especially under drought conditions. For many crop species, it is often challenging to select for grain yield under drought conditions because of genotype × environment (G×E) interaction; reliable and accurate phenotyping tools are therefore important for incorporating targeted traits into molecular breeding programmes and dissecting the genes that control traits of interest (Salekdeh et al. 2009). In addition to yield traits, breeding work targeting traits such as heat tolerance, disease resistance, seed nutrient quality and palatability of the foliage would be of value, so that bambara groundnut could be used as a pasture crop as well as for seed consumption and as a cash crop by resource-poor farmers (Mayes et al. 2019a).

11.2 Molecular Tools and Their Application in Bambara Groundnut

Plant genetics research has long made use of a range of molecular markers, but falling costs and the development of technologies such as next-generation sequencing have allowed a significant increase in molecular work. In the following sections, we present a few examples of how this research has been implemented in bambara groundnut to date.

11.2.1 Molecular Markers: Development and Applications

A genetic linkage map is an important component of both fundamental research and practical application in many studies of plants, animals and microorganisms. It represents the relative order of genetic markers along a chromosome and the relative distances between them, as determined by recombination frequency (Yeboah et al. 2007; Liu et al. 1998). Understanding the genetic basis of target traits and identifying molecular markers for them are prerequisites for deploying molecular breeding to develop superior genotypes (Kullan et al. 2012).

The first genetic linkage map reported in bambara groundnut was constructed using 67 AFLP markers and one SSR marker. It consisted of 20 linkage groups, was 516 cM in length and was built from an F2 segregating population derived from a cross between a wild accession, VSSP11, and a cultivated accession, DipC (Basu et al. 2007). QTL analysis in the F2 population identified a range of QTLs associated with agronomic traits, including internode length, leaf water use efficiency (LWUE), carbon isotope discrimination (Δ13C), seed weight and testa colour (Basu et al. 2007).

The first intraspecific genetic linkage map between two cultivated accessions was constructed using 269 polymorphic markers (236 DArT and 33 SSR markers) from an F3 segregating population derived from a narrow cross between the cultivated accessions DipC and Tiga Nicuru (Ahmad et al. 2016). The genetic map consists of 21 linkage groups (LGs) with a total genetic distance of 608.3 cM, and a total of 36 significant QTLs associated with various important phenotypic traits in bambara groundnut were detected (Ahmad et al. 2016). In addition to linkage map construction, two significant QTLs were mapped, for internode length (LG4, 3.0 cM) and growth habit (LG4, 0.0 cM), explaining more than 40% of the phenotypic variation in the F3 population under controlled-environment glasshouse and field conditions (Ahmad et al. 2016).

The first gene expression marker-based genetic map (GEM map) in bambara groundnut was developed in an F5 population for QTL analysis; it used 527 markers and covered 982.7 cM across 13 linkage groups (Chai et al. 2017). QTLs associated with stomatal conductance, carbon isotope discrimination and stomatal density were largely mapped on LG2 (Chai et al. 2017). QTLs for nitrogen isotope (Δ15N) analysis (NID) mapped on LG1 and were associated with internode length, pod number per plant, pod weight per plant and seed number per plant, showing a positive relationship between nitrogen assimilation and biomass in plants (Chai et al. 2017).

A combination of population-specific and pre-selected common markers was used to construct two individual intraspecific genetic maps in bambara groundnut from two crosses. The genetic map of IITA686 × Ankpa4, derived from an F2 segregating population (n = 263), comprised 11 linkage groups with 223 DArTSeq markers and covered 1395.2 cM, while the genetic map of Tiga Nicuru × DipC, derived from an F3 segregating population (n = 71), comprised 11 linkage groups with 293 DArTSeq markers and covered 1376.7 cM (Ho et al. 2017). A significant QTL for internode length was mapped on LG2 (50.6 cM; flanking markers between 47.6 and 54.4 cM), explaining 33.4% of the phenotypic variation observed in this cross. This region was syntenic to Pv03 (38.4–39.1 Mbp; common bean), Va11 (12.5–17.4 Mbp; azuki bean) and Vr07 (39.4–43.5 Mbp; mung bean) (Ho et al. 2017).
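Map distances in centimorgans are derived from recombination frequencies through a mapping function. The Python sketch below shows the two standard choices, Haldane and Kosambi, and then computes a crude marker density (map length divided by marker number) for the maps described above. The text does not state which mapping function each study used, so the functions are shown only as textbook illustrations; the map lengths and marker counts are taken from the text, and the dictionary labels are informal.

```python
import math


def haldane_cm(r):
    """Haldane map distance (cM) for recombination fraction r, assuming no interference."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi map distance (cM) for recombination fraction r, allowing for interference."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def mean_marker_spacing(map_length_cm, n_markers):
    """Crude marker density: total map length divided by the number of markers."""
    return map_length_cm / n_markers


# (total map length in cM, number of markers) as reported in the text.
MAPS = {
    "VSSP11 x DipC, F2 (Basu et al. 2007)": (516.0, 68),
    "DipC x Tiga Nicuru, F3 (Ahmad et al. 2016)": (608.3, 269),
    "IITA686 x Ankpa4, F2 (Ho et al. 2017)": (1395.2, 223),
    "Tiga Nicuru x DipC, F3 (Ho et al. 2017)": (1376.7, 293),
}

spacing = {name: mean_marker_spacing(length, n) for name, (length, n) in MAPS.items()}
for name, value in spacing.items():
    print(f"{name}: {value:.2f} cM per marker")

# The two functions agree for tightly linked markers and diverge as r grows.
for r in (0.01, 0.10, 0.30):
    print(f"r = {r:.2f}: Haldane {haldane_cm(r):.2f} cM, Kosambi {kosambi_cm(r):.2f} cM")
```

Dividing map length by marker number ignores the number of linkage groups and uneven marker distribution, so it should be read only as a rough guide to how densely each map is populated.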

11.2.2 Microarrays Microarrays have been widely adopted in past few years to generate expression-based markers for the development of expression-based genetic map, expression quantitative trait loci (eQTL) as well as conventional QTL studies (Winzeler et al. 1998; Ronald et al. 2005; West et al. 2006; Potokina et al. 2007; Hammond et al. 2011; Chai et al. 2017). Expression-based markers, such as gene expression markers (GEMs), can be developed for map construction on the basis of significant differences in hybridisation signal strength observed between individuals when mRNA or cRNA is hybridised to microarrays, as a result of either sequence polymorphisms affecting the hybridisation efficiencies, or genuine differences in the transcript abundance (Chai et al. 2017). The potential of using microarrays developed for a major and/or model plant species to analyse less intensively studied species, such as bambara groundnut, is known as XSpecies

11

The Bambara Groundnut Genome

(cross-species) microarray approach. Proof-of-concept studies of the XSpecies microarray approach include eggplant and pepper on a tomato microarray (Moore et al. 2005), potato on a tomato microarray (Bagnaresi et al. 2008), cowpea on a soybean microarray (Das et al. 2008), banana on a rice microarray (Davey et al. 2009), sweet sorghum on a sugarcane microarray (Calvino et al. 2009) and Brassica oleracea on an Arabidopsis microarray (Hammond et al. 2005, 2011). Chai et al. (2017) also reported the generation of GEMs at the unmasked probe-pair level after bambara groundnut leaf RNA was cross-hybridised onto the Affymetrix Soybean Genome GeneChip, followed by construction of the first spaced GEM map, consisting of 13 linkage groups containing 218 GEMs and covering 982.7 cM of the bambara groundnut genome. Comprehensive QTL analysis with good genome coverage using the GEM map also demonstrated the use of the XSpecies microarray pipeline in mapping both intrinsic and drought-related QTLs in bambara groundnut, allowing targeted QTL to be identified and used for marker-assisted selection (MAS) breeding in the future (Chai et al. 2017). Transcriptomic changes in two bambara groundnut genotypes, DipC and Tiga Nicuru, in response to drought stress were also studied by cross-hybridising cDNA onto the Soybean Affymetrix GeneChip (Khan et al. 2017). According to Khan et al. (2017), this revealed different sets of transcription factors and dehydration-response genes in the two genotypes. For example, DipC displayed differential expression of the transcription factor WRKY40, while Tiga Nicuru showed differential expression of CONSTANS-LIKE 1 and MYB60. The XSpecies microarray approach has thus been shown to have potential for investigating the molecular mechanisms underlying drought-related traits of interest in bambara groundnut.
Nevertheless, the hybridisation efficiency of transcripts onto the probes could be affected by sequence divergence, leading to inaccurate abundance signals that might be an obstacle in data analysis. It could even cause the loss of signal, especially when Affymetrix technology is utilised, as the cross-hybridisation is dependent upon a set of 11 oligonucleotides, which constitute a probe-set, and each probe is only 25 nucleotides in length. Even with other microarray technologies, such as Agilent, where the probe is a 60-mer, evolutionary distance between the reference species and the target species could still be a confounding factor. The divergence time between bambara groundnut and soybean is reported to be 20 My (Cannon et al. 2009). Another complication of using the soybean microarray to study bambara groundnut is the duplication of the soybean genome (2n = 2x = 40) since the evolutionary divergence of the two species.

The XSpecies microarray approach offers an alternative, feasible route to translate information from major or model plant species to underutilised and less researched crops, especially where there is limited public access to sequence resources. It is also worth noting that the XSpecies microarray approach was a cheaper alternative to next-generation sequencing, although either method could be appropriate in different situations (Lai et al. 2014). As sequencing technologies are evolving at a rapid pace and the cost of sequencing is declining, RNA sequencing (RNAseq) offers benefits in studying the transcriptome of any species. Unlike microarrays, it does not require species- or transcript-specific probes, so novel transcripts can be detected; it also has a greater dynamic range, higher specificity and sensitivity, and allows detection of rare and low-abundance transcripts (Han et al. 2015).

11.2.2.1 RNAseq in Bambara Groundnut

Efforts have recently been made to analyse the leaf transcriptome of bambara groundnut (unpublished data). A drought experiment involving four contrasting genotypes under severe drought and well-watered conditions was performed, and leaf samples were collected for RNA isolation. The RNA was sequenced using the Illumina NovaSeq platform. Figure 11.4 shows, in a Venn diagram, the significantly differentially expressed genes within each genotype in response to drought and their overlap between genotypes. Additionally, Fig. 11.5 shows the number and statistical significance of up- and down-regulated genes in ‘Tiga Nicuru’ in response to drought (unpublished data). The RNA sequencing data will serve for future transcriptome and gene expression analyses and are also being used for genome annotation.

Fig. 11.4 Venn diagram of four paired transcriptomes (irrigated vs. drought) of four genotypes (Gresik, S19-3, DodR and Tiga Nicuru), showing the number of common and unique genes expressed in response to drought (unpublished data)
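The kind of summary shown in a Venn diagram and a volcano plot can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the analysis behind the unpublished data: the gene identifiers and values are invented, and the cut-offs (absolute log2 fold change of at least 1 and adjusted p-value below 0.05) are common conventions that the text does not specify.

```python
def classify(log2_fc, padj, lfc_cutoff=1.0, alpha=0.05):
    """Label a gene as 'up', 'down' or 'ns' (not significant).
    The cut-offs are assumed defaults, not values from the chapter."""
    if padj < alpha and log2_fc >= lfc_cutoff:
        return "up"
    if padj < alpha and log2_fc <= -lfc_cutoff:
        return "down"
    return "ns"


# Hypothetical drought-vs-irrigated results: gene -> (log2 fold change, adjusted p).
results = {
    "Gresik": {"geneA": (2.1, 0.001), "geneB": (-1.5, 0.01), "geneC": (0.3, 0.6)},
    "S19-3": {"geneA": (1.8, 0.004), "geneB": (-0.4, 0.2), "geneD": (3.0, 0.0001)},
    "DodR": {"geneA": (1.2, 0.03), "geneC": (-2.2, 0.002)},
    "Tiga Nicuru": {"geneA": (2.5, 0.0005), "geneB": (-1.1, 0.04)},
}

# Differentially expressed (DE) gene set per genotype.
de_sets = {
    genotype: {gene for gene, (lfc, padj) in table.items() if classify(lfc, padj) != "ns"}
    for genotype, table in results.items()
}

# Centre of the Venn diagram: genes DE in every genotype.
shared = set.intersection(*de_sets.values())

# Outer regions of the Venn diagram: genes DE in one genotype only.
unique = {
    genotype: genes - set.union(*(other for name, other in de_sets.items() if name != genotype))
    for genotype, genes in de_sets.items()
}

print("shared:", sorted(shared))
for genotype, genes in unique.items():
    print(genotype, "unique:", sorted(genes))
```

Genes labelled "up" or "down" correspond to the two flanks of a volcano plot, while the set operations reproduce the shared and genotype-specific regions of a Venn diagram.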

Fig. 11.5 Volcano plot showing the up- and down-regulated genes of genotype ‘Tiga Nicuru’ by level of significance in response to drought (unpublished data)

11.2.3 Bambara Groundnut Genome: Current Achievements

First glance at the genome

The genomes of a few Vigna species, such as mung bean (Vigna radiata) and adzuki bean (Vigna angularis), have been sequenced and published (Kang et al. 2014; Yang et al. 2015). However, the first attempt to generate a genome sequence for bambara groundnut was made by the African Orphan Crops Consortium (AOCC), as this crop is amongst the 101 nutritious African orphan food crops whose genomes were selected to be sequenced to act as a starting point for genetic improvement (AOCC 2020; Chang et al. 2019). Chang et al. (2019) published the first draft genome of bambara groundnut along with four other species (Lablab purpureus, Faidherbia albida, Sclerocarya birrea and Moringa oleifera) based on shotgun sequencing using the Illumina platform. For bambara groundnut, this produced an assembly of 535 Mb with an N50 of 640,666 bp (N90 = 75,271 bp). With the genome size predicted to be 550 Mb from k-mer analysis, this genome assembly is expected to cover 97.3% of the genome, despite being fragmented (65,586 scaffolds in total). From Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis, the completeness of the genome assembly has been estimated at 92.1%. With an average of 33.2%, the GC content of bambara groundnut is similar to that of other legume species, particularly soybean (Glycine max) and common bean (Phaseolus vulgaris) (Chang et al. 2019). Long terminal repeat (LTR) mobile elements were predicted to be the most abundant class (38.4%) of the

transposable elements (TEs) identified in bambara groundnut. Further characterisation in both cultivated and wild soybean found the LTR/gypsy family to be predominant (Schmutz et al. 2010; Xie et al. 2019). On the other hand, short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs), satellites and simple repeats accounted for less than 1% of the bambara groundnut genome (Chang et al. 2019). Coupled with the transcriptomic resources from different stages of leaf tissue and stem, 31,707 protein-coding genes were predicted, with 84.2% belonging to 16,307 gene families (Chang et al. 2019). Amongst these, 7541 were transcription factors, predominantly of the basic helix–loop–helix (bHLH) type, which accounted for 10.5% of them (Chang et al. 2019). A total of 83.6 and 79.4% of these genes were found to have high similarity ([…]

[…] >1800 genomic and >850 transcriptomic SSRs. It is hoped that this encourages other researchers of underutilised crops by showing that even small resources can be mined for hundreds of potential SSR markers.

Only a small number of genome and transcriptome resources are currently available for lablab. In the NCBI Sequence Read Archive (SRA, https://www.ncbi.nlm.nih.gov/sra; as of July 2021), there were only five projects related to lablab, comprising 19 HTS files (11 transcriptome and 8 genome). These five projects relate to Chapman (2015), Chang et al. (2018; see below), an effort to sequence the cpDNA genome (Li et al. 2021), an unpublished study of the pod transcriptome to investigate different pod morphologies, and a study documenting reference genome sequences for hundreds of plants from a botanic garden (Liu et al. 2019). Although details of sample and accession/variety names are lacking from the pod RNAseq study, in total this represents transcriptomes from 3–8 different accessions/varieties and genomes from three different accessions/varieties.
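To illustrate what mining sequence data for SSRs involves, the sketch below scans a DNA string for perfect tandem repeats using a regular expression. It is a simplified example rather than the pipeline used in the studies cited above: the function name, the unit sizes (2 to 6 bp) and the minimum of four repeats are assumptions chosen for illustration, and real SSR discovery tools also handle imperfect repeats and primer design.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, repeat_unit, n_repeats) for perfect tandem repeats.

    Assumed, illustrative thresholds: units of 2-6 bp repeated at least
    four times. Homopolymer runs (e.g. AAAA...) are skipped.
    """
    # Lazy quantifier: try the shortest repeat unit first, then require
    # at least (min_repeats - 1) further copies via a backreference.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # unit is a single base repeated, not a true SSR motif
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Toy sequence containing an (AT)6 and a (GAA)5 repeat.
example = "GGC" + "AT" * 6 + "CCGT" + "GAA" * 5 + "TTC"
print(find_ssrs(example))
```

Applied to assembled contigs or transcripts, a scan of this kind yields candidate loci that would still need flanking primers and laboratory validation before being used as markers.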

B. L. Maass and M. A. Chapman

Table 13.3 A summary of molecular marker studies investigating geographic and population genetic variation in lablab. Global analyses are listed first, and then chronologically. Studies of very small numbers of accessions, for example comparing a few breeding lines, are not included

Marker type¹   N markers²   Focus³        Study
RAPD           48           Global        Liu (1996)
RAPD           11           Global        Sultana et al. (2000)
AFLP           4            Global        Maass et al. (2005)
SSR            26           Global        Wang et al. (2007)
SSR*           5            Global        Robotham and Chapman (2017)
DArTseq        9320         Global        Sserumaga et al. (2021)
AFLP           3            India (+)     Venkatesha et al. (2007)
RAPD           25           India         Rai et al. (2011)
SSR            13           Asia (+)      Islam (2012)
AFLP           15           Kenya         Kimani et al. (2012)
SSR            4            Kenya         Shivachi et al. (2012)
RAPD           51           India         Saravanan et al. (2013)⁴
ISSR           15           India         Saravanan et al. (2013)⁴
SSR            10           India         Saravanan et al. (2013)⁴
SSR*           11           China/Kenya   Zhang et al. (2013)
SSR            29           India         Rai et al. (2016)
SSR            10           Kenya (+)     Kamotho et al. (2016)
SSR            8            Kenya (+)     Kamau et al. (2021)
SSR            13           Thailand      Amkul et al. (2021)

¹ AFLP, Amplified fragment length polymorphism; DArTseq, Diversity Array Technology sequencing; ISSR, Inter Simple Sequence Repeat; RAPD, Random Amplified Polymorphic DNA; SSR, Simple Sequence Repeat; * SSR markers designed specifically from lablab (as opposed to studies using markers originally developed in other species)
² The number of markers is not comparable among different marker types, for example one AFLP primer pair might amplify 10–50 markers
³ The geographic focus of the study; + indicates some other accessions outside the focus were included
⁴ Saravanan et al. (2013) analysed and compared three different marker types on the same 39 accessions

Molecular markers and genetic mapping

Understanding the genetic basis of adaptive phenotypes can be achieved through crossing divergent genotypes and observing segregation in the progeny. In lablab, this has demonstrated that some important phenotypic traits, for example photoperiod sensitivity, growth habit, and flower colour, appear to have a relatively simple genetic basis, being controlled by between one and three major loci (Keerthi et al. 2014, 2016). Traits under simple genetic control can relatively easily be selected and crossed between varieties in breeding studies, compared to traits with polygenic control.

Many agronomic traits are, however, under polygenic control, and so for these traits it is valuable to understand their genetic architecture. The underlying quantitative trait loci (QTL) can be explored by applying molecular markers to segregating populations of individuals, giving an estimate of the (minimum) number of loci that control these traits, the magnitude of effect of each QTL and their genomic locations (Mauricio 2001). This information is valuable for selecting loci for breeding programmes, as knowledge of the genomic positions of the QTL can be applied to speed up the breeding process, for example using marker-assisted selection (Collard and Mackill 2008).

13

The Lablab Genome: Recent Advances and Future Perspectives

QTL mapping requires the generation of a linkage map. Essentially, molecular markers are assayed in a segregating population, for example an F2, F3 or recombinant inbred lines (RILs), and segregation patterns are used to identify which markers are located on the same linkage group (LG) and the distance between them (in terms of recombination distance). In lablab, the first linkage map was created by Konduri et al. (2000), who mapped 127 RFLPs and 91 RAPD loci in an F2 population from a cross between cultivated and wild lablab accessions. The markers coalesced into 17 LGs, greater than the haploid chromosome number of lablab, indicating that some of the genome was not covered in this study, which left different regions of the same chromosome unlinked. Extreme segregation distortion was observed; in one genome region paternal homozygotes were completely absent, suggesting an incompatibility between genomic or cytoplasmic loci inherited from the two parents.

Linkage maps can also be used to compare the genome structure of different species if there are shared markers between the species being investigated. For example, using shared markers in tomato, eggplant, potato and Capsicum pepper, Wu and Tanksley (2010) were able to identify translocations, fusions and fissions as well as determine the approximate rate of these occurrences during the evolution of the Solanaceae. The lablab genetic map from Konduri et al. (2000) was compared to a genetic map of mung bean and revealed extensive collinearity and little evidence for large-scale genomic reorganisation (Humphry et al. 2002). This was, however, based on only 65 shared markers, and so smaller translocations and rearrangements would have been missed.

Other lablab linkage maps have been generated and used to map QTL. Yuan et al. (2009) developed a mapping population and generated a linkage map using 122 RAPD and nine morphological markers.
A range of important agronomic traits were mapped, including pod dimensions, flowering time and harvesting maturity period. All QTL had relatively small effects, with none explaining more than 10% of the phenotypic variance, which contrasts with the studies above where, for other agronomic traits, only one to three major loci were responsible (Keerthi et al. 2014, 2016). Many of the QTL were also detected in only one of the two sampling years, indicating genotype × environment (G×E) interactions. In a follow-up, the same linkage map was used to map QTL for inflorescence traits (Yuan et al. 2011). In contrast to the 2009 work, some of the QTL were of large effect, explaining >20%, and in a few cases >50%, of the phenotypic variance. A substantial number were resolved in both sampling seasons, indicating that these QTL exhibit reduced G×E effects.
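The proportion of phenotypic variance explained by a QTL, as quoted above, can be illustrated with a single-marker regression. This is a simplified sketch and not the method used by Yuan et al. (2009, 2011): it assumes an additive genotype coding (0, 1 or 2 copies of the allele from one parent) at a single marker, and the phenotype values are invented for illustration.

```python
def variance_explained(genotypes, phenotypes):
    """R-squared of a simple linear regression of phenotype on genotype.

    For one additive marker this equals the squared Pearson correlation and
    is read as the proportion of phenotypic variance explained (PVE).
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_p = sum(phenotypes) / n
    sxy = sum((g - mean_g) * (p - mean_p) for g, p in zip(genotypes, phenotypes))
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((p - mean_p) ** 2 for p in phenotypes)
    return sxy ** 2 / (sxx * syy)


# Hypothetical F2 individuals: marker genotype codes and a pod-length-like trait.
marker = [0, 0, 1, 1, 2, 2]
trait = [10.0, 12.0, 13.0, 15.0, 16.0, 18.0]

pve = variance_explained(marker, trait)
print(f"Proportion of phenotypic variance explained: {pve:.1%}")
```

In practice, QTL studies use interval or composite interval mapping across the whole linkage map and test significance by permutation, so single-marker values like this are only a first approximation.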

13.3.2 Gene Expression in Lablab

Resolving the genetic basis of adaptive phenotypes is an important step in identifying breeding targets for crop improvement. Commonly used methods include genome-wide association studies (GWAS), yet to be employed in lablab, and quantitative trait locus (QTL) mapping (see above). Another approach is to assay gene expression between important varieties (e.g. wild vs domesticated) or between treatments (e.g. drought vs control). In this way, differentially expressed genes can be identified, with the supposition that some of these are related to the phenotypic or environmental differences being investigated.

To identify genes involved in the response to water stress, Yao et al. (2013) used suppression subtractive hybridization (SSH) to isolate mRNAs differentially expressed (DE) between water-stressed roots and well-watered roots. Sequencing of these mRNAs identified 1287 transcript sequences that were expressed preferentially in either the water-stressed or the well-watered roots. Using gene ontology (GO), functional classification of the DE transcripts revealed an over-representation of the terms phenylalanine metabolism, flavonoid biosynthesis and proline metabolism, all processes that can be differentially regulated in response to water stress (Furlan et al. 2020; Ma et al. 2014). Using qRT-PCR (quantitative reverse transcription polymerase chain reaction), several genes were tested for timing of expression, resulting in some genes being identified as responding early in the stress. These would provide good breeding targets, as their manipulation could enhance the early response to water stress, rather than a later response when damage may already have occurred.

In follow-up analyses, the same group selected candidate loci from the above investigation and generated transgenic Arabidopsis to determine whether overexpression of the lablab gene could enhance the response to water stress. In both cases, one involving a glycine-rich protein (Yao et al. 2016a) and the other an R2R3-MYB transcription factor (Yao et al. 2016b), overexpression in Arabidopsis increased resilience to water and salt stress. An update to the initial SSH experiment was carried out more recently and identified further genes that are differentially regulated under well-watered and water-stress conditions (Wang et al. 2018). Over 2700 transcripts were differentially regulated, of which 338 were associated with root development and/or drought response. These genes could be important breeding targets going forward.

13.4 A Lablab Reference Genome

The current reference genome for lablab (Chang et al. 2018) was generated through short read sequencing of multiple libraries of varied insert sizes. As is common when relying on only short read sequencing, the assembly of plant genomes is hampered by the inability to assemble across repetitive DNA (Tørresen et al. 2019), for example transposons, which make up a high proportion of some genomes (Lee and Kim 2014). The lablab reference genome is therefore quite highly fragmented. The reference genome is 395.5 Mb in length, comprising 93.5% of the estimated genome. The contig N50 (the length such that contigs of this length or longer make up 50% of the assembly) is 32.2 kb and the scaffold N50 is 621.4 kb. The genome is thought to be relatively complete, with 1341 (93.2%) of the 1440 BUSCOs (Benchmarking Universal Single-Copy Orthologs; Simao et al. 2015) being present and 1258 (87.4%) being present and complete. From this reference genome, 20,946 genes are predicted, of which >98% were annotated based on similarity to related species’ proteomes (Chang et al. 2018). The average lengths of the genes and their coding sequences are 3696 bp and 1276 bp, respectively. Over 147 Mbp of repeat sequences are present, the majority being long terminal repeat (LTR) retrotransposons, making up 37.2% of the genome. Using the reference genome, many genes involved in important pathways such as nodulation/nitrogen fixation and biosynthesis of protein, starch and fatty acids have been identified based on putative orthology to these genes from soybean (Chang et al. 2018). Although this is a draft genome, and improvements in the assembly can be made in the future, it is an important resource. Chang et al. (2018) used the data from the lablab genome and other legumes to identify crop-specific and lineage-specific genes as well as genes shared across all legumes (Fig. 13.2). Development of a chromosome-scale genome sequence has recently been achieved, and a preprint has been released demonstrating an improvement in the contiguity of the genome by incorporating long read (Oxford Nanopore) sequencing (Njaci et al. 2022).
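The N50 statistic and the coverage figure quoted above can be made concrete with a short calculation. The sketch below is illustrative only: the contig lengths are toy values rather than data from Chang et al. (2018), and the function name is an assumption made for this example.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must not be empty")


# Toy contig lengths (arbitrary units), summing to 300.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)
print("Toy N50:", toy_n50)

# Estimated genome size implied by the figures in the text:
# a 395.5 Mb assembly representing 93.5% of the estimated genome.
assembly_mb = 395.5
fraction_covered = 0.935
estimated_genome_mb = assembly_mb / fraction_covered
print(f"Implied genome size: {estimated_genome_mb:.1f} Mb")
```

Because N50 depends only on the length distribution, a fragmented assembly with many short contigs can cover most of the genome yet still have a low N50, which is the situation described for the short read lablab assembly.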

13.5 Future Goals and Prospects

A clear target should be the assembly and annotation of a more complete reference genome. The quality metrics for lablab (see above) are indicative of a fragmented genome, which can hamper future endeavours to resolve the genetic basis of important traits. Using long read sequencing and other 3rd generation sequencing technologies (see other chapters) should help to assemble the genome more completely, potentially into chromosome-scale scaffolds. From this, we will be in a stronger position to carry out comparative mapping, QTL mapping and cloning, and to identify selective sweeps during domestication, all of which will lead to the identification of candidate genes for agronomic and other important traits. More intense investigation of both the 2-seeded wild and cultivated Ethiopian types is also required, as well as the Indian accessions, which have variously been described as feral or wild accessions. These groups are relatively unknown, clearly underrepresented in seedbanks, and often overlooked in population genetic studies (but see above for exceptions), yet could represent untapped gene pools for beneficial traits.

Fig. 13.2 Venn diagram demonstrating lineage-specific and shared groups of orthologs for five legume species: Lablab purpureus (L. pur), Faidherbia albida (F. alb), Glycine max (G. max), Medicago truncatula (M. tru) and Vigna subterranea (V. sub). Taken from Chang et al. (2018) under the terms of the Creative Commons CC BY license

13.6 Conclusions

Lablab is a typical underutilised crop in that it is stuck in a vicious circle: it is not included in government or research priorities, becoming further neglected at the expense of other crops, and there is slow (but promising) progress from breeding programmes, again causing it to be neglected by farmers and consumers. This further results in neglect in terms of conservation and utilisation. Concerns raised decades ago surrounding pest resistance and photo-insensitivity, for example, have not yet been addressed adequately. We are also concerned about the continued repetition of “old” knowledge, instead of stringent conclusions drawn from current research towards building a more complete story of the uses and benefits of lablab germplasm. Many lablab reviews have not advanced the field and lack comparisons to other cultivated legumes, which would offer the opportunity to advocate the benefits of lablab over others and to use the knowledge gained in other species to advance lablab research. Having reviewed the literature thoroughly (above and in the articles we cite), we hope that research avenues are clearer and future investigations can start to fill in the gaps, leading to a better understanding of this species and its potential for the future.

Acknowledgements

We would like to thank the various students and researchers in our laboratories, and the collaborators with whom we have worked or are currently working, including Anastasia Kolesnikova for providing comments on this chapter. We acknowledge several researchers who provided personal comments and unique insight. The following are the affiliations for personal communications: Dr. S. Ramesh, University of Agricultural Sciences, Bengaluru, Karnataka, India; Dr. Md. Tariqul Islam, BARI Gazipur, Bangladesh; J. Chang and Dr. M. van Zonneveld, World Vegetable Center, Taiwan; Dr. Kunyaporn Pipithsangchan, Genebank, Thailand; Dr. Prakit Somta, Kasetsart University, Nakhon Pathom, Thailand; Dr. M. G. Kinyua, Moi University, Eldoret, Kenya; Dr. P. B. Venkataramana, NM-AIST, Arusha, Tanzania, e-mail: pavithravani.venkataramana@nm-aist.ac.tz; Dr. L. Guarino, Global Crop Diversity Trust, Bonn, Germany; Dr. A. J. Clapham, UK; and Dr. P. Vidigal, University of Lisbon, Portugal.

References

Adesoji A, Oyebamiji N, Abubakar I (2020) Influence of incorporated lablab planted at various spacings on productivity of maize (Zea mays L.) varieties in northern Guinea savanna zone of Nigeria. Fudma J Sci 4(3):358–365. https://doi.org/10.33003/fjs-20200403-279
Ahmed MT, Miah MRU, Amin MR, Hossain MM (2015) Evaluation of some plant materials against pod borer infestation in country bean with reference to flower production. Ann Bangladesh Agric 19:71–78
Al-Snafi AE (2017) The pharmacology and medical importance of Dolichos lablab (Lablab purpureus): a review. IOSR J Pharmacy 7(2):22–30
Ali M, Hasan MM, Ahmad Q (2011) Karyotype analysis in lignosus bean (Dipogon lignosus) and lablab bean (Lablab purpureus). J Bangladesh Agric Univ 9(1):27–36
Amkul K, Sukbang JM, Somta P (2021) Genetic diversity and structure of landrace of lablab (Lablab purpureus (L.) Sweet) cultivars in Thailand revealed by SSR markers. Breed Sci 71(2):176–183. https://doi.org/10.1270/jsbbs.20074
Amole TA, Oduguwa BO, Shittu O, Famakinde A, Okwelum N, Ojo VOA, Dele PA, Idowu OJ, Ogunlolu B, Adebiyi AO (2013a) Herbage yield and quality of Lablab purpureus during the late dry season in western Nigeria. Slovak J Animal Sci 46(1):22–30
Amole TA, Oduguwa BO, Jolaosho AO, Arigbede MO, Olanite JA, Dele PA, Ojo VOA (2013b) Nutrient composition and forage yield, nutritive quality of silage produced from maize-lablab mixture. Malaysian J Anim Sci 16(2):45–61
Angeles JGC, Villanueva JC, Uy LYC, Mercado SMQ, Tsuchiya MCL, Lado JP, Angelia MRN, Bercansil-Clemencia MCM, Estacio MAC, Torio MAO (2021) Legumes as functional food for cardiovascular disease. Appl Sci 11(12):5475. https://doi.org/10.3390/app11125475
Ariina MMS, Warade SD, Kanaujia SP, Gadi Y, Kayia AA, Chandrakumar Singh M (2021) Seed protein profiling, an efficient method in diversity analysis of dolichos bean (Lablab purpureus L. Sweet.) from Northeast India. Chem Sci Rev Lett 10(38):261–268. https://doi.org/10.37273/chesci.cs205108201
Armstrong KL, Albrecht KA, Lauer JG, Riday H (2008) Intercropping corn with lablab bean, velvet bean, and scarlet runner bean for forage. Crop Sci 48(1):371–379. https://doi.org/10.2135/cropsci2007.04.0244
AVRDC (World Vegetable Center) (2021) How to order seed: handling fees. Shanhua, Tainan, Taiwan. https://avrdc.org/seed/seeds/
Ayuning-Tyas DW, Sjofjan O, Eka-Radiati L (2014) Evaluation protein digestibility, metabolic energy of autoclaved komak beans (Lablab purpureus L Sweet) on broiler. J World’s Poult Res 4(3):60–63
Bakari AE, Pauline NM (2020) Trade-offs of Dolichos lablab production in the context of the changing climate in semi-arid areas of Tanzania. Tanzania J Agric Sci 19(2):188–202
Bandyopadhyay B, Santra SC (2007) In situ 4C DNA content study of twenty-nine hybrid varieties of some selected taxa of tribe Phaseolae (Fabaceae). Legume Research-an Internat J 30(4):235–242
Bhardwaj HL, Hamama AA (2019) A preliminary evaluation of lablab biomass productivity in Virginia. J Agric Sci 11(13):42–47
Biswas SC (2015) Summer country bean cultivation raises farm income in Bangladesh. Feedback Field (AVRDC the World Vegetable Center) 28:3–4. https://avrdc.org/wpfb-file/f0217-pdf/
Bobos I, Fedosy I, Zavadska O, Tonha O, Olt J (2019) Optimization of plant densities of dolichos (Dolichos lablab L. var. lignosus) bean in the right-bank of forest-steppe of Ukraine. Agron Res 17(6):2195–2202. https://doi.org/10.15159/AR.19.223
Brilhante M, Varela E, P Essoh A, Fortes A, Duarte MC, Monteiro F, Maria M, Romeiras MM (2021) Tackling food insecurity in Cabo Verde Islands: the nutritional, agricultural and environmental values of the legume species. Foods 10(2):206. https://doi.org/10.3390/foods10020206
Cardona C, Kornegay J, Posso CE, Morales F, Ramirez H (1990) Comparative value of four arcelin variants in the development of dry bean lines resistant to the Mexican bean weevil. Entomol Exp Appl 56:197–206. https://doi.org/10.1111/j.1570-7458.1990.tb01397.x
Chakoma I, Gwiriri LC, Manyawu G, Dube S, Shumba M, Gora A (2016) Forage seed production and trade as a pathway out of poverty in the smallholder sector: lessons from the Zimbabwe Crop Livestock Integration for Food Security (ZimCLIFS) project. Afr J Range Forage Sci 33(3):181–184. https://doi.org/10.2989/10220119.2016.1173097
Chang Y, Liu H, Liu M, Liao X, Sahu SK, Fu Y et al (2018) The draft genomes of five agriculturally important African orphan crops. GigaScience 8(3). https://doi.org/10.1093/gigascience/giy152
Chapman MA (2015) Transcriptome sequencing and marker development for four underutilized legumes. Appl Plant Sci 3(2):1400111. https://doi.org/10.3732/apps.1400111
Chapman MA (2019) Optimizing depth and type of high-throughput sequencing data for microsatellite discovery. Appl Plant Sci 7(11):e11298. https://doi.org/10.1002/aps3.11298
Chavan SS, Shinde AK, Burondkar MM, Sawardekar SV, Gimhavnekar V (2021a) Physiological analysis for growth and yield of lablab bean (Lablab purpureus L. Sweet) grown under residual moisture. J Pharmacognosy Phytochem 10(1):2094–2098
Chavan SS, Shinde AK, Burondkar MM, Sawardekar SV, Gimhavnekar V (2021b) Identifying drought tolerant genotypes of lablab bean (Lablab purpureus L. Sweet) grown under residual moisture. J Pharmacognosy Phytochem 10(1):2598–2601
Chewaka-Tura D, Tadesse-Mosisa M (2017) Effect of processing on anti-nutritional factors and sensory qualities of ‘Hepho’, a black climbing bean (Lablab purpureus L.) flour. Food Sci Qual Manage 60:22–27. https://iiste.org/Journals/index.php/FSQM/article/view/35338
Clapham AJ (2019) The Archaeobotany of Nubia. In: Raue D (ed) Handbook of ancient Nubia, De Gruyter, Berlin/Boston, pp 83–102. https://doi.org/10.1515/9783110420388
Collard BCY, Mackill DJ (2008) Marker-assisted selection: an approach for precision plant breeding in the twenty-first century. Philos Trans Royal Soc B Biol Sci 363(1491):557–572. https://doi.org/10.1098/rstb.2007.2170
Cullen BR, Hill JO (2006) A survey of the use of lucerne, butterfly pea and lablab in ley pastures in the mixed-farming systems of northern Australia. Trop Grasslands 40(1):24–32
Ćupina B, Mikić A, Krstić Ð, Vujić S, Zorić L, Ðorđević V, Erić P (2017) Mixtures of legumes for forage production. In: Murphy-Bokern D, Stoddard FL, Watson CA (eds) Legumes in cropping systems, CABI, pp 193–208. https://doi.org/10.1079.9781780644981.0193
Davari SA, Gokhale NB, Palsande VN, Kasture MC (2018) Wal (Lablab purpureus L.): An unexploited potential food legumes. Internat J Chem Stud 6(2):946–949
D’souza MR, Devaraj VR (2011) Specific and nonspecific responses of Hyacinth bean (Dolichos lablab) to drought stress. Indian J Biotechnol 10:130–139


Deka RK, Sarkar CR (1990) Nutrient composition and antinutritional factors of Dolichos lablab L. seeds. Food Chem 38(4):239–246
Devaraj VR (ed) (2016a) Hyacinth bean: a gem among legumes: State of the art in Lablab purpureus research. Legume Perspectives 13:1–41 (July 2016). https://www.legumesociety.org/2019/12/02/legume-perspectives/
Devaraj VR (2016b) Economic importance of hyacinth bean (Lablab purpureus L.): an Indian perspective. Legume Perspectives 13:37–38 (July 2016). https://www.legumesociety.org/2019/12/02/legume-perspectives/
Dheer M, Sharma RA, Gupta VP, Punia SS (2014) Cytomorphological investigations in colchicine-induced polyploids of Lablab purpureus (L.) Sweet. Indian J Biotechnol 13:347–355
Douglas MR, Chang J, Begum K, Subramanian S, Tooker JF, Alam SN, Ramasamy S (2018) Evaluation of biorational insecticides and DNA barcoding as tools to improve insect pest management in lablab bean (Lablab purpureus) in Bangladesh. J Asia-Pacific Entomol 21(4):1326–1336
Duke JA, Kretschmer Jr AE, Reed CF, Weder JKP (1981) Lablab purpureus (L.) Sweet. In: Duke JA (ed) Handbook of legumes of world economic importance, pp 102–106. Plenum Press, New York, USA and London, UK
Dörr de Quadros P, Martin AR, Zhalnina K, Dias R, Giongo A, Fulthorpe R, Bayer C, Triplett EW, Camargo FA de O (2019) Lablab purpureus influences soil fertility and microbial diversity in a tropical maize-based no-tillage system. Soil Syst 3(3):50. https://doi.org/10.3390/soilsystems3030050
Ewansiha SU, Ogedegbe SA, Falodun EJ (2016) Utilization potentials of lablab (Lablab purpureus (L.) Sweet) and the constraints of field pests and diseases in Nigeria. Agro-Science 15(1):11–16. https://doi.org/10.4314/as.v15i1.3
FAO (2017) FAO/INFOODS Global database for pulses on dry matter basis. Version 1.0 – PulsesDM1.0. FAO, Rome, Italy. https://www.fao.org/infoods/infoods/tables-and-databases/faoinfoods-databases/en/
Furlan AL, Bianucci E, Giordano W, Castro S, Becker DF (2020) Proline metabolic dynamics and implications in drought tolerance of peanut plants. Plant Physiol Biochem 151:566–578. https://doi.org/10.1016/j.plaphy.2020.04.010
Guretzki S, Papenbrock J (2014) Characterization of Lablab purpureus regarding drought tolerance, trypsin inhibitor activity and cyanogenic potential for selection in breeding programmes. J Agron Crop Sci 200(1):24–35. https://doi.org/10.1111/jac.12043
Habib HM, Theuri SW, Kheadr EE, Mohamed FE (2017) Functional, bioactive, biochemical, and physicochemical properties of the Dolichos lablab bean. Food Funct 8(2):872–880. https://doi.org/10.1039/c6fo01162d
Haq N, Saifullah M, Chapman MA (2016) The humble lablab bean in Bangladesh: home garden to market. Agric Dev 29:13–15

248 Harouna DV, Mohammed EMI (2020) Biotic and abiotic stress responses of hyacinth bean (Lablab purpureus) and soybean (Glycine max): a mini-review. In: Amaresan N, Murugesan S, Kumar K, Sankaranarayanan A (eds) Microbial mitigation of stress response of food legumes. CRC Press, Boca Raton, pp 115–120 Hossain S, Ahmed R, Bhowmick S, Al Mamun A, Hashimoto M (2016) Proximate composition and fatty acid analysis of Lablab purpureus (L.) legume seed: implicates to both protein and essential fatty acid supplementation. SpringerPlus 5:1899. https://doi.org/ 10.1186/s40064-016-3587-1 Humphry ME, Konduri V, Lambrides CJ, Magner T, McIntyre CL, Aitken EAB et al (2002) Development of a mungbean (Vigna radiata) RFLP linkage map and its comparison with lablab (Lablab purpureus) reveals a high level of colinearity between the two genomes. Theor Appl Genet 105(1):160–166. https://doi.org/10. 1007/s00122-002-0909-1 Islam MN, Rahman MZ, Ali R, Azad AK, Sultan MK (2014) Diversity analysis and establishment of core subsets of hyacinth bean collection of Bangladesh. Pakistan J Agric Res 27(2):99–109 Islam MT (2008) Morpho-agronomic diversity of hyacinth bean (Lablab purpureus (L.) Sweet) accessions from Bangladesh. Plant Genet Resour Newsl 156:73–78 Islam MT (2012) Morpho-molecular characterization, diversity analysis and in vitro regeneration of hyacinth bean (Lablab purpureus L. Sweet). Ph.D. thesis, Department of Biotechnology, Bangladesh Agricultural University, Mymensingh, Bangladesh Janarthanan S, Suresh P, Radke G, Morgan TD, Oppert B (2008) Arcelins from an Indian wild pulse, Lablab purpureus, and insecticidal activity in storage pests. J Agric Food Chem 56(5):1676–1682. https://doi.org/ 10.1021/jf071591g Kabirizi J, Mpairwe D, Mutetikka D (2005) The effect of intercropping maize with lablab on grain and fodder production in small holder dairy farming systems in Masaka district, Uganda. 
Uganda J Agric Sci 11:51–56 Kamatchi KB, Soris PT, Mohan VR, Vadivel V (2010) Nutrient and chemical evaluation of raw seeds of five varieties of Lablab purpureus (L.) Sweet. Adv Bio Res 1(1):44–53 Kamau EM, Kinyua MG, Waturu CN, Kiplagat O, Wanjala BW, Kariba RK et al (2021) Diversity and population structure of local and exotic Lablab purpureus accessions in Kenya as revealed by microsatellite markers. Global J Mol Biol 3:8 Kamotho GN, Kinyua MG, Muasya RM, Gichuki ST, Wanjala BW, Kimani EN, Kamau EN (2016) Assessment of genetic diversity of Kenyan dolichos bean (Lablab purpureus L. Sweet) using simple sequence repeat (SSR) markers. Internat J Agric, Environ Bioresearch 1(1):26–43 Kamotho GN, Muasya RM, Kinyua MG (2017) Assessment of phenotypic diversity of Kenyan dolichos bean (Lablab purpureus L. Sweet) germplasm based on

B. L. Maass and M. A. Chapman morphological markers. Internat J Agric, Environ Bioresearch 2(6):1–22 Kankwatsa P, Muzira R (2018) Agronomic performance and sensory evaluation of lablab (Lablab purpureus L. Sweet) accessions for human consumption in Uganda. Open Access Library J 5:e4481. https://doi.org/10. 4236/oalib.1104481 Keerthi CM, Ramesh S, Byregowda M, Rao AM, Prasad BSR, Vaijayanthi PV (2014) Genetics of growth habit and photoperiodic response to flowering time in dolichos bean (Lablab purpureus (L.) Sweet). J Genet 93(1):203–206. https://doi.org/10.1007/s12041014-0336-5 Keerthi CM, Ramesh S, Byregowda M, Rao AM, Prasad BSR, Vaijayanthi PV (2016) Further evidence for the genetic basis of qualitative traits and their linkage relationships in dolichos bean (Lablab purpureus L.). J Genet 95(1):89–98. https://doi.org/10. 1007/s12041-015-0610-1 Keerthi CM, Ramesh S, Byregowda M, Rao AM, Reena GM (2018) Photo-thermal effects on time to flowering in dolichos bean (Lablab purpureus (L.) Sweet) var. lignosus. Curr Sci 115(7):1320–1327. https://doi.org/10.18520/cs/v115/i7/1320-1327 Khan AU, Choudhury MAR, Ferdous J, Islam MS, Rahman MS (2019) Varietal performances of country beans against insect pests in bean agroecosystem. Bangladesh J Entomol 29(2):27–37 Khan AU, Choudhury MAR, Talucder MSA, Hossain MS, Ali S, Akter T, Ehsanullah M (2020) Constraints and solutions of country bean (Lablab purpureus L.) production: a review. Acta Entomol Zool 1(2):37–45 Kilonzi SM (2020) Physicochemical and functional characterisation of three lablab bean (Lablab purpureus L. (Sweet) varieties grown in Kenya. Ph.D. thesis, Jomo Kenyatta University of Agriculture and Technology (JKUAT), Nairobi, Kenya. 160 pp. http:// ir.jkuat.ac.ke/handle/123456789/5409 Kilonzi SM, Makokha AO, Kenji GM (2017) Physical characteristics, proximate composition and antinutritional factors in grains of lablab bean (Lablab purpureus) genotypes from Kenya. J Appl Biosci 114:11289–11298. 
https://doi.org/10.4314/jab.v114i1.2 Kimani EN, Wachira FN, Kinyua MG (2012) Molecular diversity of Kenyan Lablab (Lablab purpureus (L.) Sweet) accessions using amplified fragment length polymorphism markers. Am J Plant Sci 3:313–321 Kimani E, Matasyoh J, Kinyua M, Wachira FN (2019) Characterisation of volatile compounds and flavour attributes of Lablab purpureus bean accessions. Afr J Biotechnol 18(24):518–530. https://doi.org/10.5897/ AJB2017.15993 Kokila S, Myrene RD, Devaraj VR (2014) Response of Lablab purpureus (Hyacinth bean) cultivars to drought stress. Asian J Plant Sci Res 4(5):48–55 Konduri V, Godwin ID, Liu CJ (2000) Genetic mapping of the Lablab purpureus genome suggests the presence of ‘cuckoo’ gene(s) in this species. Theor Appl Genet 100 (6):866–871. https://doi.org/10.1007/s001220051363

13

The Lablab Genome: Recent Advances and Future Perspectives

Lee S-I, Kim N-S (2014) Transposable elements and genome size variations in plants. Genomics Inform 12 (3):87–97. https://doi.org/10.5808/GI.2014.12.3.87 Letting FK, Venkataramana PB, Ndakidemi PA (2021) Breeding potential of lablab [Lablab purpureus (L.) Sweet]: a review on characterization and bruchid studies towards improved production and utilization in Africa. Genet Resour Crop Evol 68(8):3081–3101. https://doi.org/10.1007/s10722-021-01271-9 Li N, Bai JQ, Gao S, Yang L, Li J, Du SB et al (2021) The complete molecular sequence of chloroplast genome of Lablab purpureus (L.) Sweet. Mitochondrial DNA B Resour 6(3):758–9. https://doi.org/10.1080/ 23802359.2021.1878958 Liu CJ (1996) Genetic diversity and relationships among Lablab purpureus genotypes evaluated using RAPD as markers. Euphytica 90(1):115–119. https://doi.org/10. 1007/BF00025167 Liu H, Wei J, Yang T, Mu W, Song B, Yang T et al. (2019) Molecular digitization of a botanical garden: high-depth whole-genome sequencing of 689 vascular plant species from the Ruili Botanical Garden. GigaScience 8(4):giz007. https://doi.org/10.1093/ gigascience/giz007 Ma D, Sun D, Wang C, Li Y, Guo T (2014) Expression of flavonoid biosynthesis genes and accumulation of flavonoid in wheat leaves in response to drought stress. Plant Physiol Biochem 80:60–66. https://doi. org/10.1016/j.plaphy.2014.03.024 Maass BL (2016) Domestication, origin and global dispersal of Lablab purpureus (L.) Sweet (Fabaceae): current understanding. Legume Perspectives 13:5–8. https://www.legumesociety.org/2019/12/02/legumeperspectives/ Maass BL, Pengelly BC (2019) Tropical and subtropical forage germplasm conservation and science on their deathbed! 1 A Journey to Crisis. Outlook Agric 48 (3):198–209. https://doi.org/10.1177/0030727019 867961 Maass BL, Usongo MF (2007) Changes in seed characteristics during the domestication of the lablab bean (Lablab purpureus (L.) Sweet: Papilionoideae). Crop Pasture Sci 58(1):9–19. 
https://doi.org/10.1071/ AR05059 Maass BL, Jamnadass RH, Hanson J, Pengelly BC (2005) Determining sources of diversity in cultivated and wild Lablab purpureus related to provenance of germplasm by using amplified fragment length polymorphism. Genet Resour Crop Evol 52(6):683–695. https://doi.org/10.1007/s10722-003-6019-3 Maass BL, Knox MR, Venkatesha SC, Angessa TT, Ramme S, Pengelly BC (2010) Lablab purpureus—a crop lost for Africa? Trop Plant Biol 3(3):123–135. https://doi.org/10.1007/s12042-010-9046-1 Maass BL, Robotham O, Chapman MA (2017) Evidence for two domestication events of hyacinth bean (Lablab purpureus (L.) Sweet): a comparative analysis of population genetic data. Genet Resour Crop Evol 64

249

(6):1221–1230. https://doi.org/10.1007/s10722-0160431-y Mauricio R (2001) Mapping quantitative trait loci in plants: uses and caveats for evolutionary biology. Nat Rev Genet 2:370–381 Miller NR, Mariki W, Nord A, Snapp S (2018) Cultivar selection and management strategies for Lablab purpureus (L.) Sweet in Africa. In: Leal Filho W (ed) Handbook of Climate Change Resilience. Springer, Cham, pp 1–14. https://doi.org/10.1007/ 978-3-319-71025-9_102-1 Minde JJ, Venkataramana PB, Matemu AO (2021) Dolichos lablab—an underutilized crop with future potentials for food and nutrition security: a review. Crit Rev Food Sci Nutr 61(13):2249–2261. https://doi. org/10.1080/10408398.2020.1775173 Missanga JS, Venkataramana PB, Ndakidemi PA (2021) Recent developments in Lablab purpureus genomics: A focus on drought stress tolerance and use of genomic resources to develop stress‐resilient varieties. Legume Sci e99. https://doi.org/10.1002/leg3.99 Morris JB (2009) Morphological and reproductive characterization in hyacinth bean, Lablab purpureus (L.) Sweet germplasm with clinically proven nutraceutical and pharmaceutical traits for use as a medicinal food. J Dietary Suppl 6(3):263–279. https://doi.org/10.1080/ 19390210903070830 Morris JB, Grusak MA, Wang ML, Tonnis B, Kuang HX (2013) Mineral, flavonoid, and fatty acid concentrations in ten diverse Lablab purpureus (L.) Sweet accessions. In: Kuang HX (ed) Phytochemicals: Occurrence in nature, health effects and antioxidant properties. Nova Science, New York, pp 219–224 Mthembu BE, Everson TM, Everson CS (2018) Intercropping maize (Zea mays L.) with lablab (Lablab purpureus L.) for sustainable fodder production and quality in smallholder rural farming systems in South Africa. Agroecology Sustain Food Syst 42(4):362– 382 Naeem M, Shabbir A, Ansari AA, Aftab T, Khan MMA, Uddin M (2020) Hyacinth bean (Lablab purpureus L.) —an underutilised crop with future potential. Scientia Horticulturae 272:109551. https://doi.org/10.1016/j. 
scienta.2020.109551 Nascente AS, Dambiro J, Constantino C (2017) Effects of grain-producing cover crops on rice grain yield in Cabo Delgado, Mozambique. Revista Ceres 64:607–615. https://doi.org/10.1590/0034-737X201764060007 Nath DD, Islam MS, Akter T, Ferdousi J (2019) Morphology and yield potentials of lablab bean genotypes grown in early Kharif season. Asian J Agric Hort Res 4(4):1–5 Ngailo JA, Kaihura FBS, Baijukya FP, Kiwambo BJ (2003) Changes in land use and its impact on agricultural biodiversity in Arumeru, Tanzania. In: Kaihura F, Stocking M (eds) Agricultural biodiversity in smallholder farms of East Africa. United Nations University Press, Tokyo, pp 145–158

250 Ngure D, Kinyua M, Kiplagat O (2021a) Morphological and microsatellite characterization of improved Lablab purpureus genotypes. J Plant Breeding Crop Sci 13(2):23–34 Ngure D, Kinyua M, Kiplagat O (2021b) Evaluation of cooking time and organoleptic traits of improved Dolichos (Lablab purpureus (L.) Sweet) genotypes. Afr J Food Sci 15(5):218–225. https://doi.org/10. 5897/AJFS2021.2098 Njaci I, Waweru B, Kamal N, Shehabu Muktar M, Fisher D, Gundlach H et al (2022) Chromosome-scale assembly of the lablab genome—A model for inclusive orphan crop genomics. BioRxiv. BIORXIV/2022/ 491073; https://www.biorxiv.org/content/10.1101/ 2022.05.08.491073v2 Njarui DMG, Mureithi JG (2010) Evaluation of lablab and velvet bean fallows in a maize production system for improved livestock feed supply in semiarid tropical Kenya. Anim Prod Sci 50(3):193–202. https://doi.org/ 10.1071/AN09137 Nord A, Miller NR, Mariki W, Drinkwater L, Snapp S (2020) Investigating the diverse potential of a multipurpose legume, Lablab purpureus (L.) Sweet, for smallholder production in East Africa. PloS one 15(1):e0227739. https://doi.org/10.1371/journal.pone. 0227739 Northup BK, Rao SC (2015) Green manure and forage potential of lablab in the US Southern Plains. Agron J 107(3):1113–1118. https://doi.org/10.2134/agronj14. 0455 Nyawade SO, Gachene CK, Karanja NN, Gitari HI, Schulte-Geldermann E, Parker ML (2019) Controlling soil erosion in smallholder potato farming systems using legume intercrops. Geoderma Reg 17:e00225. https://doi.org/10.1016/j.geodrs.2019.e00225 Pandey D, Adhiguru P, Pandey A, Singh PK (2021) An underexplored diversity in “Yoksik Peron” [Lablab Purpureus (L.) Sweet] in East Siang, Arunachal Pradesh, India. Preprint, 11 pp. https://doi.org/10. 21203/rs.3.rs-713936/v1 Patil SM, Kauthale VK, Navale YP, Nalawade AD (2018) Variability study in hyacinth bean [Lablab purpureus (L.) Sweet] landraces from tribal blocks of Maharashtra, India. Crop Res 53(5&6):252–256. 
https://doi.org/ 10.31830/2454-1761.2018.0001.29 Pengelly BC, Maass BL (2001) Lablab purpureus (L.) Sweet–diversity, potential use and determination of a core collection of this multi-purpose tropical legume. Genet Resour Crop Evol 48(3):261–272. https://doi. org/10.1023/A:1011286111384 Pengelly BC, Maass BL (2019) Tropical and subtropical forage germplasm conservation and science on their deathbed! 2. Genebanks, FAO, and donors must take urgent steps to overcome the crisis. Outlook Agric 48 (3):210–219. https://doi.org/10.1177/0030727019867 955 Philip T (1982) Induced tetraploidy in Dolichos lablab Linn. Current Sci 51(19):945 Pramod C, Sudhakaran N, Harindran J (2020) Antiinflammatory effects of Lablab purpureus Linn in

B. L. Maass and M. A. Chapman polyphenolic fraction from methanolic leaf extract on experimental animal model. Pharma Innov J 9(2):338– 344 Punyalue A, Jongjaidee J, Jamjod S, Rerkasem B (2018) Legume intercropping to reduce erosion, increase soil fertility and grain yield, and stop burning in highland maize production in Northern Thailand. Chiang Mai Univ J Nat Sci 17(4):265–274. https://doi.org/10. 12982/CMUJNS.2018.0019 Purwanti E, Prihanta W, Fauzi A (2019a) The diversity of seed size and nutrient content of Lablab bean from three locations in Indonesia. Internat J Adv Eng Manage Sci 5(6):395–402. https://doi.org/10.22161/ ijaems.5.6.7 Purwanti E, Prihanta W, Fauzi A (2019b) Nutritional content characteristics of Dolichos lablab L. accessions in effort to investigate functional food source. In: 6th International Conference on Community Development (ICCD 2019b). Atlantis Press, Amsterdam, p 166–170. https://doi.org/10.2991/iccd-19.2019b.45 Raghu BR, Samuel DK, Mohan N, Ahora TS (2018) Dolichos bean: An underutilized and unexplored crop with immense potential. Internat J Recent Adv Multi Res 5(12):4338–4341 Rai N, Kumar S, Singh RK, Rai KK, Tiwari G, Kashyap SP et al (2016) Genetic diversity in Indian bean (Lablab purpureus) accessions as revealed by quantitative traits and cross-species transferable SSR markers. Indian J Agric Sci 86(9):1193–1200 Rai N, Singh PK, Rai AC, Rai VP, Singh M (2011) Genetic diversity in Indian bean (Lablab purpureus) germplasm based on morphological traits and RAPD markers. Indian J Agric Sci 81(9):801–806 Rai KK, Rai N, Pandey-Rai S (2021) Unlocking pharmacological and therapeutic potential of hyacinth bean (Lablab purpureus L.): role of OMICS based biology, biotic and abiotic elicitors. In: Legumes, Intech Open Book Series, pp 1–33. https://www. 
intechopen.com/online-first/77832 Rai KK, Rai N, Rai SP (2018a) Recent advancement in modern genomic tools for adaptation of Lablab purpureus L to biotic and abiotic stresses: present mechanisms and future adaptations. Acta Physiol Plant 40(9):1–29. https://doi.org/10.1007/s11738-018-2740-6 Rai KK, Rai N, Rai SP (2018b) Investigating the impact of high temperature on growth and yield of Lablab purpureus L. inbred lines using integrated phenotypical, physiological, biochemical and molecular approaches. Indian J Plant Physiol 23(2):209–226 Rai N, Rai KK, Tiwari G, Kumar S (2014) Nutritional and antioxidant properties and their inter-relationship with pod characters in an under-exploited vegetable, Indian bean (Lablab purpureus). Indian J Agric Sci 84 (9):1051–1055. https://doi.org/10.5897/AJPS12.059 Ram Bahadur KC, Joshi BK, Dahal SP (2016) Diversity analysis and physico-morphological characteristics of indigenous germplasm of Lablab bean. J Nepal Agric Res Counc 2:15–21 Ramesh S, Byregowda M (2016) Dolichos bean (Lablab purpureus L. Sweet var. lignosus) genetics and

13

The Lablab Genome: Recent Advances and Future Perspectives

breeding–present status and future prospects. Mysore J Agric Sci 50(3):481–500 Rana R, Sayem ASM, Sabuz AA, Rahman M, Hossain A (2021) Effect of lablab bean (Lablab purpureus L.) seed flour on the physicochemical and sensory properties of biscuits. Int J Food Sci Agric 5(1):52– 57. https://doi.org/10.26855/ijfsa.2021.03.008 Rapholo E, Odhiambo JJ, Nelson WC, Rötter RP, Ayisi K, Koch M, Hoffmann MP (2020) Maize–lablab intercropping is promising in supporting the sustainable intensification of smallholder cropping systems under high climate risk in southern Africa. Expl Agric 56(1):104–117. https://doi.org/10.1017/S001447971 9000206 Robertson CC (1997) Black, white, and red all over: Beans, women, and agricultural imperialism in twentieth-century Kenya. Agric Hist 71:259–299 Robotham O, Chapman M (2017) Population genetic analysis of hyacinth bean (Lablab purpureus (L.) Sweet, Leguminosae) indicates an East African origin and variation in drought tolerance. Genet Resour Crop Evol 64(1):139–148. https://doi.org/10.1007/s10722015-0339-y Sahay G, Shukla P (2015) Cytological investigations of Cowpea (Vigna unguiculata (L.) Walp) and Sem (Lablab purpureus (L.) Sweet) two major fodder legumes. In: Proc 23rd Internat Grassld Congr (Sustainable use of Grassland Resources for Forage Production, Biodiversity and Environmental Protection), New Delhi, India 20–24 Nov 2015. https:// uknowledge.uky.edu/cgi/viewcontent.cgi?article= 2066&context=igc Saravanan S, Shanmugasundaram P, Senthil N, Veerabadhiran P (2013) Comparison of genetic relatedness among Lablab bean (Lablab purpureus (L.) Sweet genotypes using DNA markers. Int J Integr Biol 14:23–30 Schaaffhausen RV (1963) Dolichos lablab or hyacinth bean: its uses for feed, food and soil improvement. Econ Bot 17(2):146–153. https://doi.org/10.1007/ BF02985365 Sen NK, Marimuthu KM (1960) Colchiploids of Dolichos lablab L. Caryologia 13(2):411–429. 
https://doi.org/ 10.1080/00087114.1960.10797090 Sennhenn A, Odhiambo JJO, Maass BL, Whitbread AM (2017a) Considering effects of temperature and photoperiod on growth and development of Lablab purpureus (l.) Sweet in the search of short-season accessions for smallholder farming systems. Expl Agric 53(3):375–395. https://doi.org/10.1017/S0014 479716000429 Sennhenn A, Njarui DMG, Maass BL, Whitbread AM (2017b) Understanding growth and development of three short-season grain legumes for improved adaptation in semi-arid Eastern Kenya. Crop Pasture Sci 68 (5):442–456. https://doi.org/10.1071/CP16416 Shaahu DT, Ikurior SA, Carew SN (2017) Effect of decorticating and cooking lablab seeds on performance and cost of producing table rabbits. Internat J Biotechnol Food Sci 5(2):18–22

251

Shaahu DT, Kaankuka FG, Okpanachi U (2015) Proximate, amino acid, anti-nutritional factor and mineral composition of different varieties of raw Lablab purpureus seeds. Intl J Sci Technol Res 4:157–161 She CW, Jiang XH (2015) Karyotype analysis of Lablab purpureus (L.) Sweet using fluorochrome banding and fluorescence in situ hybridisation with rDNA probes. Czech J Genetics and Plant Breeding 51(3):110–116 Shibli MM, Rasul MG, Islam AKM, Saikat MMH, Haque MM (2021) Genetic diversity of country bean (Lablab purpureus) genotypes collected from the coastal regions of Bangladesh. J Hortic and Postharvest Res 4(2):219–230. https://doi.org/10.22077/jhpr. 2020.3282.1135 Shivachi A, Kiplagat K, Kinyua G (2012) Microsatellite analysis of selected Lablab purpureus genotypes in Kenya. Rwanda J 28:39–52. https://doi.org/10.4314/rj. v28i1.3 Shivashankar G, Kulkarni RS (1989) Lablab purpureus. pp 48–50 in: van der Maesen LJG, Sadikin Somaatmadja (eds) Plant resources of South-east Asia (PROSEA), no. 1, Pulses. Pudoc, Wageningen, The Netherlands. https:// www.prota4u.org/prosea/view.aspx?id=2 Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM (2015) BUSCO: assessing genome assembly and annotation completeness with singlecopy orthologs. Bioinform 31(19):3210–3212. https:// doi.org/10.1093/bioinformatics/btv351 Singh A, Abhilash PC (2019) Varietal dataset of nutritionally important Lablab purpureus (L.) Sweet from Eastern Uttar Pradesh, India. Data Brief 24:103935. https://doi.org/10.1016/j.dib.2019.103935 Singh V, Kudesia R (2020) Review on taxonomical and pharmacological status of Dolichos lablab. Curr Trends Biotechnol Pharm 14(2):229–235. https://doi. org/10.5530/ctbp.2020.2.23 Sipahli S, Dwarka D, Amonsou E, Mellem J (2021) In vitro antioxidant and apoptotic activity of Lablab purpureus (L.) Sweet isolate and hydrolysates. Food Sci Technol (Campinas) 42:e55220. https://doi.org/10. 
1590/fst.55220 Smith GR, Rouquette FM, Pemberton IJ (2008) Registration of ‘Rio Verde ’lablab. J Plant Registrations 2 (1):15. https://doi.org/10.3198/jpr2007.03.0164crc Snapp S, Roge P, Okori P, Chikowo R, Peter B, Messina J (2019) Perennial grains for Africa: possibility or pipedream? Expl Agric 55(2):251–272. https://doi. org/10.1017/S0014479718000066 Soetan KO (2012) Comparative evaluation of phytochemicals in the raw and aqueous crude extracts from seeds of three Lablab purpureus varieties. Afr J Pl Sci 6(15):410–415. https://doi.org/10.5897/AJPS12.059 Sonali A, Manju V, Ashwin K (2015) Comparative study of Indian varieties of Lablab and field bean for phenotypic and nutritional traits. Legume Genomics Genetics 6(3):1–7. https://doi.org/10.5376/lgg.2015. 06.0003 Sserumaga JP, Kayondo SI, Kigozi A, Kiggundu M, Namazzi C, Walusimbi K, Bugeza J, Molly A,

252 Mugerwa S (2021) Genome-wide diversity and structure variation among lablab [Lablab purpureus (L.) Sweet] accessions and their implication in a forage breeding program. Genet Resour Crop Evol 68 (7):2997–3010. https://doi.org/10.1007/s10722-02101171-y Su C, Tianlong W, Heping G, Luan C (2021) Research progress on germplasm innovation and cultivation technology of lablab bean in China. Legume Perspect 21:27–30. https://www.legumesociety.org/2019/12/ 02/legume-perspectives/ Sultana N, Ozaki Y, Okubo H (2000) The use of RAPD markers in Lablab bean (Lablab purpureus (L.) Sweet) phylogeny. Bull Inst Trop Agric Kyushu Univ 23:45– 51 Susmita C, Mohan N, Aghora TS (2020) Breeding for evolution of photo-insensitive pole type vegetable dolichos (Lablab purpureus L.) varieties to suit year round cultivation. Electr J Plant Breed 11(2):633–637 Tadesse-Mosisa M, Chewaka-Tura D (2017) Effect of processing on proximate and mineral composition of hepho, a black climbing bean (Lablab purpureus L.) flour. J Food Nutr Sci 5(1):16–22. https://doi.org/10. 11648/j.jfns.20170501.13 Tamiru-Workneh S (2020) Ethnobotanical knowledge of Lablab (Lablab purpureus (L.) Sweet Fabaceae) in Konso zone and genetic diversity of collections from Ethiopia using SSR markers. MSc thesis, Addis Ababa University, Ethiopia, 112 pp. http://197.156.72.153: 8080/xmlui/handle/123456789/3395 The Kenya Gazette (2015) Crop varieties: Dolichos (Lablab purpureus) ELDO-KT Black 1, Black 2, Cream and Maridadi. The Kenya Gazette 117(2):446– 447. http://kenyalaw.org/kenya_gazette/gazette/year/ 2015/ Thormann I, Engels JM, Halewood M (2019) Are the old International Board for Plant Genetic Resources (IBPGR) base collections available through the Plant Treaty’s multilateral system of access and benefit sharing? A Review. Genet Resour Crop Evol 66 (2):291–310. 
https://doi.org/10.1007/s10722-0180715-5 Tørresen OK, Star B, Mier P, Andrade-Navarro MA, Bateman A, Jarnot P et al (2019) Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Res 47(21):10994–11006. https://doi.org/10. 1093/nar/gkz841 Vaijayanthi PV, Ramesh S, Gowda MB, Rao AM, Keerthi CM (2015) Development of core sets of dolichos bean (Lablab purpureus L. Sweet) germplasm. J Crop Improv 29(4):405–419. https://doi.org/ 10.1080/15427528.2015.1036955 Vaijayanthi PV, Chandrakant, Ramesh S (2019) Hyacinth bean (Lablab purpureus L. Sweet): Genetics, breeding and genomics. In: Al-Khayri J, Jain S, Johnson D (eds) Advances in Plant Breeding Strategies: Legumes. Springer, Cham, pp 287–318. https://doi. org/10.1007/978-3-030-23400-3_8

B. L. Maass and M. A. Chapman Venkatesha SC, Byre Gowda M, Mahadevu P, Mohan Rao A, Kim DJ, Ellis THN et al (2007) Genetic diversity within Lablab purpureus and the application of gene-specific markers from a range of legume species. Plant Genet Resour 5(3):154–171. https://doi. org/10.1017/S1479262107835659 Verdcourt B (1970) Lablab Adans. Studies in the Leguminosae-Papilionoideae III. Kew Bull 24:409– 411 Vidigal P, Duarte B, Cavaco AR, Caçador I, Figueiredo A, Matos AR, Viegas W, Monteiro F (2018) Preliminary diversity assessment of an undervalued tropical bean (Lablab purpureus (L.) Sweet) through fatty acid profiling. Plant Physiol Biochem 132:508–514. https://doi.org/10.1016/j.plaphy.2018. 10.001 Vishnu V, Radhamany P (2021) Evaluation of Lablab purpureus (L.) Sweet germplasm using yield and quality traits. Genet Resour Crop Evol (Preprint). https://doi.org/10.21203/rs.3.rs-277538/v1 Wang B, Zhao M, Yao L, Babu V, Wu T, Nguyen HT (2018) Identification of drought-inducible regulatory factors in Lablab purpureus by a comparative genomic approach. Crop Pasture Sci 69(6):632–641. https://doi.org/10.1071/CP17236 Wang ML, Morris JB, Barkley NA, Dean RE, Jenkins TM, Pederson GA (2007) Evaluation of genetic diversity of the USDA Lablab purpureus germplasm collection using simple sequence repeat markers. J Hortic Sci Biotechnol 82(4):571–578. https://doi. org/10.1080/14620316.2007.11512275 Wangila AJ, Gachuiri CK, Muthomi JW, Ojiem JO (2021) Biomass yield and quality of fodder from selected varieties of lablab (Lablab purpureus L) in Nandi South sub-county of Kenya. Online J Anim Feed Res 11(1):28–35. https://doi.org/10.51227/ojafr. 2021.6 Westphal E (1974) Dolichos lablab L. In: Pulses in Ethiopia, their taxonomy and agricultural significance. Wageningen University and Research, The Netherlands, pp 91–104. 
https://library.wur.nl/WebQuery/ wurpubs/fulltext/197905 Whitbread AM, Ayisi K, Mabapa P, Odhiambo JJ, Maluleke N, Pengelly BC (2011) Evaluating Lablab purpureus (L.) Sweet germplasm to identify shortseason accessions suitable for crop and livestock farming systems in southern Africa. Afr J Range Forage Sci 28(1):21–28. https://doi.org/10.2989/ 10220119.2011.570950 WHO (2007) Protein and amino acid requirements in human nutrition. World Health Organization technical report series (935), Geneva, Switzerland WIEWS (World Information and Early Warning System on Plant Genetic Resources for Food and Agriculture) (2020) Ex situ search. FAO, Rome, Italy. http://www. fao.org/wiews/data/ex-situ-sdg-251/search/en/ Wu F, Tanksley SD (2010) Chromosomal evolution in the plant family Solanaceae. BMC Genomics 11:182. https://doi.org/10.1186/1471-2164-11-182

13

The Lablab Genome: Recent Advances and Future Perspectives

Yao LM, Jiang YN, Lu XX, Wang B, Zhou P, Wu TL (2016a) Overexpression of a glycine-rich protein gene in Lablab purpureus improves abiotic stress tolerance. Gen Mol Res 15(4). https://doi.org/10.4238/gmr 15048063. Yao LM, Jiang YN, Lu XX, Wang B, Zhou P, Wu TL (2016b) A R2R3-MYB transcription factor from Lablab purpureus induced by drought increases tolerance to abiotic stress in Arabidopsis. Mol Biol Rep 43(10):1089–1100. https://doi.org/10.1007/s11033016-4042-7 Yao LM, Wang B, Cheng LJ, Wu TL (2013) Identification of key drought stress-related genes in the hyacinth bean. PLoS One 8(3):e58108. https://doi.org/10.1371/ journal.pone.0058108 Yao LM, Zhang LD, Hu YL, Wang B, Wu TL (2012) Characterization of novel soybean derived simple

253

sequence repeat markers and their transferability in hyacinth bean Lablab purpureus (L.) Sweet. Indian J Genet Plant Breed 72(1):46–53 Yuan J, Wang B, Wu TL (2011) Quantitative trait loci (QTL) mapping for inflorescence length traits in Lablab purpureus (L.) Sweet. Afr J Biotechnol 10 (18):3558–3566. https://doi.org/10.5897/AJB10.536 Yuan J, Yang RQ, Wu TL (2009) Bayesian mapping QTL for fruit and growth phenological traits in Lablab purpureus (L.) Sweet. Afr J Biotechnol 8(2):167–75 Zhang G, Xu S, Mao W, Gong Y, Hu Q (2013) Development of EST-SSR markers to study genetic diversity in hyacinth bean (Lablab purpureus L.). Plant Omics J 6(4):295–301

14 The Perennial Horse Gram (Macrotyloma axillare) Genome, Phylogeny, and Selection Across the Fabaceae

David Fisher, Isaac Reynolds, and Mark A. Chapman

Abstract

Identifying adaptive genetic variation in plants is important both for improving our understanding of adaptive evolution and for tackling the practical challenge of developing crops that tolerate changes in climate whilst meeting the demands of a rapidly growing human population. Underutilised crops are a potentially rich source of adaptive alleles, but their study is currently limited by a lack of genomic resources and knowledge of phylogenetic relationships. Legumes are of interest to both academic and applied endeavours due to their high economic and ecological importance. In this chapter we present a draft genome for perennial horse gram, Macrotyloma axillare (E. Meyer) Verdc., an underutilised forage legume with well documented tolerance of heat and drought. The genome of accession PI 364785 from South Africa is estimated to be 474 Mbp (1C value) and our assembly covers ca. 88% of this, with an N50 of 20.5 Kbp. After filtering out short contigs our assembly covers 74% of the genome, in ca. 50,000 contigs, with an N50 of 29.2 Kbp. In addition, for future endeavours, we assembled the chloroplast (cp) genome and identified ca. 73,000 microsatellites in the genome sequence. Using analyses of orthologous coding sequences and tests for positive selection across nine legume species, we identified candidate genes that could play roles in environmental adaptation and/or species diversification across the Fabaceae. Lineage-specific orthogroups, unique to the Dalbergioids, Phaseolinae, Glycininae and Vicioids, were identified. In addition, evidence of positive selection was detected in 103 genes shared by all nine legumes, with serine-type protease activity found to be over-represented. Serine-type proteases have putative functions in light acclimation, lateral root development and secondary metabolite biosynthesis, which makes these genes interesting candidates for follow-up work. We also used chloroplast DNA phylogenetics to infer relationships between perennial horse gram and some other Macrotyloma species, and demonstrate a sister relationship between M. axillare and M. uniflorum, another edible horse gram, but a more distant relationship between these and the geocarpic Kersting’s groundnut (M. geocarpum). The current work adds to the list of legume genomic resources available to further the study of legume genetics. The candidate adaptive genes also warrant further investigation to explore the possibility of using them to develop hardier, more productive crops for future food and sustainable agriculture.

D. Fisher, I. Reynolds, M. A. Chapman (corresponding author), Biological Sciences, University of Southampton, Southampton SO17 1BJ, UK. e-mail: [email protected]
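The assembly statistics quoted in the abstract (N50, and the fraction of the estimated genome covered before and after discarding short contigs) can be illustrated with a small sketch. This is not the authors' pipeline: the function names, the `min_length` cutoff and the toy contig lengths are assumptions made for illustration only. N50 is taken here in its usual sense, the contig length at which the cumulative length of contigs sorted from longest to shortest first reaches half of the total assembled bases.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (in bp).

    Contigs are sorted from longest to shortest and their lengths are
    accumulated; the N50 is the length of the contig at which the
    running total first reaches half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("cannot compute N50 of an empty assembly")


def assembly_summary(lengths, genome_size, min_length=0):
    """Summarise an assembly, optionally dropping contigs shorter than
    min_length (a hypothetical cutoff, not the one used in the chapter)."""
    kept = [n for n in lengths if n >= min_length]
    total = sum(kept)
    return {
        "contigs": len(kept),
        "total_bp": total,
        "coverage": total / genome_size,  # fraction of the estimated genome size
        "n50": n50(kept),
    }


if __name__ == "__main__":
    # Toy data: three long contigs and seven short ones; genome size 300 bp.
    toy = [90, 60, 30] + [10] * 7
    print(assembly_summary(toy, genome_size=300))
    print(assembly_summary(toy, genome_size=300, min_length=30))
```

On the toy data, filtering lowers the covered fraction while raising the N50, which is the same direction of change reported for the M. axillare assembly (coverage from ca. 88% to 74%, N50 from 20.5 to 29.2 Kbp).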

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 M. A. Chapman (ed.), Underutilised Crop Genomes, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-00848-1_14


D. Fisher et al.

14.1 Introduction

With the human population estimated to exceed 9 billion by 2050, combined with continuing changes to global climate and increasing levels of malnutrition, the current over-reliance on just a handful of staple crops is not sustainable (Khoury et al. 2014). In order to safeguard global food and nutrition security it will be necessary to find existing sources of adaptive phenotypic variation, and the underlying genetic variants, which can be used to improve the quality of existing crops. Beyond this, increasing agricultural diversity by developing new crops will provide potential new options for farmers (Cheng et al. 2017) with positive consequences for human health through diet diversification (Fanzo et al. 2013). There are many crop species worldwide which do not contribute to global food production but are pivotal within the local regions in which they are grown. Until recently, these underutilised crops received little attention, even though their potential utility for crop improvement has been well documented for many years (e.g. NAS 1975), often having unique and diverse nutritional profiles and useful locally adapted phenotypes (Baldermann et al. 2016; Mayes et al. 2012). However, more recently their importance is starting to be widely recognised and investigated (Cullis and Kunert 2017; Tadele 2019) and various initiatives (e.g. Gregory et al. 2019; Hendre et al. 2019) have been set up to highlight the potential for these crops to aid in the fight against food and nutrition insecurity. As a potentially rich source of new adaptive alleles, underutilised crops and non-model plants should be of greater interest for sequencing and comparative genomic studies.
Not only will a better understanding of the unique local adaptations of non-model plants contribute to the power with which comparative studies can investigate molecular evolution, but it will also enhance the search for adaptive alleles which could then be bred or transformed into widespread relatives. The genomes of widely cultivated crops have rapidly accelerated the study of plant genetics, as well as providing important information to breeding programmes aiming to improve the genetic and nutritional quality of these crops (Jackson 2016). To date, the generation of genomic resources for non-model and underutilised crops has been lacking, largely due to the cost and complexity of assembling a plant genome de novo. In the majority of cases, de novo assembly of plant genomes has combined a whole genome shotgun approach with both short and long read sequencing technologies, and in some cases the use of BAC clone libraries. Whilst long read sequencing technologies have proved important to curating high quality plant genomes (Bevan et al. 2017), their considerable cost is likely to hinder their widespread usage for studies wanting to sequence the genomes of non-model plants, for which there is likely to be less funding. The prospect of assembling a non-model plant genome de novo on a budget is not, however, an impossibility, as demonstrated in a recent benchmarking study (Paajanen et al. 2019). It also depends on the goal of the study. Being unable to assemble the repetitive fraction of the genome might not be important, and a highly fragmented genome with well assembled gene space may serve the desired purpose, for example testing for selection on gene sequences during evolution.

14.1.1 Identifying Genes Underlying Important Traits

A greater understanding regarding the evolution of adaptive traits in plants has long been sought, due to the potentially profound insights which could fuel both academic and applied endeavours. An adaptive trait can be defined as one which enhances an individual’s fitness in their local environment, driven by often complex interactions between population genetics, positive selection and existing constraints on selection such as epistasis (Olson-Manning et al. 2012), which therefore also encompasses traits

14

The Perennial Horse Gram (Macrotyloma axillare) Genome …

included in crop domestication, for example related to seed/fruit size, flavour, harvestability. Detecting genes carrying signatures of positive selection is a commonly used method for identifying candidate adaptive genes. This has been termed a “bottom-up” approach, which effectively connects inter- or intra-specific genotypic changes with fitness changes but does not identify the phenotype that is modified (Barrett and Hoekstra 2011). It compares orthologous gene sequences across related taxa, attempting to identify lineage-specific genes, or genes with evidence for positive selection. These groups of genes could be related to lineage-specific phenotypes. At the core of many widely used models that test for positive selection is the dN/dS ratio (Yang and Bielawski 2000). The dN/dS ratio is a measure of the number of non-synonymous nucleotide changes per non-synonymous site (dN), which are assumed to have evolved under selective pressures, relative to the number of synonymous changes per synonymous site (dS), which are assumed to have evolved neutrally. Non-synonymous mutations in coding sequences can translate into proteins with altered protein folding, binding and catalytic properties, which affects the functionality of the protein and in turn can alter the phenotype. Phenotypes are then subject to natural selection, which effectively selects for the modified trait (positive selection) or against it (purifying/negative selection). A dN/dS equal to one therefore implies neutrality, a dN/dS less than one implies negative or purifying selection against deleterious alleles, and a dN/dS greater than one implies positive selection for beneficial alleles. The simplicity and overall robustness of dN/dS has likely contributed to its overall utility, with models implementing dN/dS having been used to detect potential and validated adaptive genes in a range of plant lineages.
Roth and Liberles (2006) examined positive selection across the entire embryophyte lineage, identifying genes implicated in responses to pathogens, cold and general growth and development as those under positive selection. Two economically and ecologically important plant families, the Poaceae and Fabaceae, have also been examined using dN/dS tests (Li et al. 2015; Zhao et al. 2013); however, only a limited number of orthologous genes were examined (550 and 314, respectively). Analysing such a limited selection of genes is likely to produce conservative representations of the actual number and type of genes under selection, making any real inferences about the bigger picture of adaptation difficult.
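The interpretation of dN/dS described in this section can be illustrated with a minimal sketch. The function name, the tolerance and the per-gene values below are hypothetical and are not taken from this study; real analyses estimate dN and dS under codon substitution models and test significance statistically rather than thresholding a point estimate.

```python
def classify_dn_ds(dn: float, ds: float, tolerance: float = 1e-9) -> str:
    """Return the selection regime implied by a dN/dS point estimate.

    dn: non-synonymous changes per non-synonymous site
    ds: synonymous changes per synonymous site
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form a ratio")
    omega = dn / ds
    if abs(omega - 1.0) <= tolerance:
        return "neutral"
    return "positive" if omega > 1.0 else "purifying"


# Hypothetical (dN, dS) estimates, for illustration only.
examples = {
    "gene_a": (0.02, 0.10),
    "gene_b": (0.30, 0.12),
    "gene_c": (0.05, 0.05),
}

for name, (dn, ds) in examples.items():
    print(name, round(dn / ds, 2), classify_dn_ds(dn, ds))
```

A point estimate above one is only a starting point: a gene would normally be reported as a candidate only if a model allowing positive selection fits significantly better than one that does not.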

14.1.2 Perennial Horse Gram and the Fabaceae (Legumes)

The Fabaceae (legumes) contains around 19,000 species, making it the third largest plant family, and second only to the Poaceae (grasses) in agronomic importance. The edible beans and peas produced by legumes provide a major source of dietary protein across the globe, particularly in regions where meat is expensive or unavailable (Broughton et al. 2003). Soybeans are also used to produce a large proportion of the world’s vegetable oil and livestock feed, making legumes an economically favourable crop for farmers due to their diverse range of potential uses. In addition, the ecological importance of legumes cannot be overstated, owing largely to their ability to fix atmospheric nitrogen through a symbiosis with root-nodulating bacteria (Beringer et al. 1979; Westhoek et al. 2017). The legume-rhizobia symbiosis allows legumes to accumulate large amounts of nitrogenous compounds, even when grown in nutrient poor soils. Subsequent deposition of leaf litter and other organic material from legumes, therefore, plays a central role in community nitrogen cycling and improvement of local soil fertility. As a result, aside from their direct use as food and fuel crops, legumes also have the potential to be integrated into more efficient and sustainable agricultural practices. Legumes are used as cover crops and are often incorporated in intercrops, both approaches serving to improve soil fertility of arable land and subsequently improve the yield of another primary crop (Wang et al. 2020; Layek et al. 2018).


Legumes for which genomes have been sequenced include barrel medic [Medicago truncatula (Young et al. 2011)], red clover [Trifolium pratense L. (De Vega et al. 2015)], chickpea [Cicer arietinum L. (Varshney et al. 2013)], common bean [Phaseolus vulgaris L. (Schmutz et al. 2014)], cowpea [Vigna unguiculata (L.) Walp. (Lonardi et al. 2019)], peanut [Arachis hypogaea L. (Bertioli et al. 2019)], pigeon pea [Cajanus cajan L. (Varshney et al. 2012)], soybean [Glycine max (Schmutz et al. 2010)] and wild soybean [G. soja Sieb. and Zucc. (Xie et al. 2019)]. Recently, the draft genomes for three underutilised legumes were also published, namely, Bambara groundnut (Vigna subterranea (L.) Verdc.), hyacinth bean (Lablab purpureus (L.) Sweet.) and apple-ring acacia (Faidherbia albida (Delile) A. Chev.) (Chang et al. 2018). These genomes represent a reasonable mix of widely cultivated and lesser utilised legume species, which presents a good platform from which studies of adaptive evolution within the Fabaceae family can be launched. Perennial horse gram (Macrotyloma axillare (E. Mey.) Verdc.) is another underutilised legume which could provide a useful, but until now relatively unexplored, genetic resource. Perennial horse gram is heat and drought tolerant, is grown as a forage, and serves well to stabilise the soil (Blumenthal and Staples 1993; Cameron 1986). Originating in sub-Saharan Africa, M. axillare is now distributed across the Middle East, South America, Southern Asia and Northern Australia where it is normally cultivated for livestock forage (Blumenthal and Staples 1993). M. axillare can outperform other legumes in inter-cropping field trials, being able to withstand extensive drought periods and tolerate the shade conditions imposed by larger primary crops whilst also serving to enhance yield of the primary crop (Yemataw et al. 2018; Araújo et al. 2017). Further work to identify the genetic basis of adaptation to hot, dry and light-limited environments in M. axillare could be invaluable to developing tropical forages for use as intercrops. The development of perennial grain crops is also of interest for improving agricultural sustainability, further increasing the potential applications of perennial legume genomic resources (Lubofsky 2016).


Members of the Macrotyloma genus (comprising ca. 24 species) are documented for their hardiness and ability to withstand drought. In addition to perennial horse gram, two other cultivated species reside in the genus, Kersting’s groundnut (M. geocarpum) and horse gram (M. uniflorum). The former is a geocarpic species, with the pods developing underground akin to peanuts (Arachis hypogaea). The latter is primarily a forage, but is also used in some foods for human consumption in India. These three species, whilst being locally important, are underutilised and are often perceived as “food for the poor” (Morris 2008; Aditya et al. 2019). The specific properties of M. axillare also appear to be overshadowed in the existing literature by M. uniflorum (Lam.) Verdc., which has been domesticated and established as a key agricultural crop in India for many centuries (Fuller and Murphy 2018; Bhardwaj et al. 2013; Reddy et al. 2008). Based on archaeological evidence, Fuller and Murphy (2018) suggest two origins of domesticated M. uniflorum. Hybridisation attempts between M. axillare and M. uniflorum have been partly successful (Singh et al. 2013), but this also resulted in undesirable traits such as juvenile flowering which may reduce seed yield. Little is known about the relationships between these taxa except for some assumptions based on morphology and palynology (e.g. Morris 2008; Verdcourt 1982), and scant genetic resources are currently available, and only for M. uniflorum (Bhardwaj et al. 2013; Sharma et al. 2015; http://www.hillagric.ac.in:1005/database.php), which together could severely limit the potential of crop improvement strategies.

14.2 Materials and Methods

14.2.1 cpDNA Phylogeny of Macrotyloma

Twenty-six accessions of the eight species that were available from seedbanks plus an outgroup (Sphenostylis stenocarpa (Hochst. ex A.Rich.) Harms; African Yam Bean) were grown (Table 14.1) and DNA was extracted using a modified CTAB procedure (Doyle and Doyle 1990). Eight cpDNA primer pairs were tested for consistent amplification and three were selected for sequencing across all accessions [trnL-F-trnLUAA (Taberlet et al. 1991), rpL16 (Shaw et al. 2005), and psbM2-trnD-GUC (Lee and Wen 2004)]. DNA sequencing was carried out as per Chapman et al. (2008). Sequences were trimmed in Chromas (Technelysium Pty Ltd 1998–2001) and aligned in MEGA-X (Kumar et al. 2018). Trees were constructed using Maximum Parsimony with 1000 bootstraps in MEGA-X and using Bayesian analysis in MrBayes 3.2 (Ronquist et al. 2011) via the http://www.phylogeny.fr platform with the HKY85 with Gamma substitution (HKY85+G) model. Trees were sampled every 10 generations for a total of 10,000 generations of the Markov Chain (MCMC), with the first 250 trees discarded.
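The sampling scheme described above implies a particular number of retained trees. The arithmetic is sketched below; the function name is hypothetical, and the sketch assumes that the starting state of the chain is not counted as a sample (some software also records generation 0, which would add one tree).

```python
def mcmc_sample_summary(generations: int, sample_every: int, discarded: int) -> dict:
    """Summarise how many trees are sampled, discarded and retained."""
    if generations <= 0 or sample_every <= 0:
        raise ValueError("generations and sample_every must be positive")
    sampled = generations // sample_every
    if not 0 <= discarded <= sampled:
        raise ValueError("cannot discard more trees than were sampled")
    return {
        "sampled": sampled,
        "retained": sampled - discarded,
        "burn_in_fraction": discarded / sampled,
    }


# Values reported in Sect. 14.2.1: 10,000 generations, sampling every 10,
# first 250 trees discarded.
summary = mcmc_sample_summary(generations=10_000, sample_every=10, discarded=250)
print(summary)
```

Under these assumptions the discarded trees correspond to a burn-in of one quarter of the sampled trees.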

14.2.2 DNA Sequencing, De novo Genome Assembly and Annotation

Two paired end (PE) libraries, with insert sizes of 300 and 500 base pairs (bp), were prepared from DNA extracted from M. axillare accession PI 364785 from South Africa (available from the USDA-ARS) using the TruSeq DNA sample prep kit (Illumina, San Diego, California, USA) according to the manufacturer’s protocol. Illumina sequencing for 150 cycles was carried out at Novogene (Cambridge, UK). Adapters and low quality sequences were trimmed using Trimmomatic v0.39 (Bolger et al. 2014). The most suitable trimming parameters were identified through iterative testing of different minimum read length and sliding window quality combinations, run on both the 300 bp and 500 bp insert libraries (Table 14.2). A sliding window quality of 15 combined with a minimum read length of 96 was selected as the most suitable set of trimming parameters, such that low quality reads were discarded but more than 90% of the PE reads were still retained. Full Trimmomatic parameters were ILLUMINACLIP 2:30:10, LEADING:5, TRAILING:5, SLIDINGWINDOW:4:15 and MINLEN:96.

Genome size was estimated by k-mer distribution analysis of all processed sequence data using Jellyfish (Marçais and Kingsford 2011). The R (R Core Team) package “FindGSE” (Sun et al. 2017) was used to estimate haploid genome size. The 300 and 500 bp PE libraries were used as input for contig and scaffold construction using ABySS v1.9.0 (Simpson et al. 2009). In order to optimise the k-mer size (k) parameter in ABySS, nine assemblies were generated with k ranging from k = 40 to k = 120, in increments of 10. Assessment of genome assembly completeness was carried out with Benchmarking Universal Single Copy Orthologs v4.0.4 (BUSCO) using the Fabales dataset (Simão et al. 2015). The assembly generated with 80-mers was selected as the strongest assembly because it showed the best combination of contiguity (N50), total assembled length and number of complete BUSCOs (Tables 14.3 and 14.4). Scaffold gaps, regions of the k80 assembly rich in unknown nucleotides (N), were closed using Sealer (Paulino et al. 2015). The trimmed paired end reads used to generate the assembly were provided as a template for Sealer to close scaffold gaps using eight k-mer sizes, increasing in increments of 8 from k = 48 to k = 96. BUSCO was then used to reassess the completeness of the final assembly.

Putative protein coding genes were identified in the genome assembly using the MAKER de novo annotation pipeline (Cantarel et al. 2008). No expression or sequence data had been generated for M. axillare prior to this study. Validated expressed sequence tags (ESTs) and 27,518 assembled transcripts from the close relative, M. uniflorum, were used as RNA-based evidence (Bhardwaj et al. 2013; Reddy et al. 2008). Homologous protein evidence was provided in the form of 275,440 protein sequences from Arabidopsis thaliana, G. max, Medicago truncatula, P. vulgaris and V. unguiculata (downloaded from Phytozome v13). RNA and protein evidence were used to train the ab initio gene predictor SNAP (Korf 2004) following MAKER support protocol 1 (Campbell


Table 14.1 Species analysed in the cpDNA phylogeny along with accession codes, seedbank names and country of origin

Species, in table order: M. africanum, M. axillare, M. daltonii, M. ellipticum, M. geocarpum, M. stenophyllum, M. tenuiflorum, M. uniflorum. The first accession (17161) is M. africanum; the row at which each later species group begins is not preserved in this copy of the table.

| Accession | Seedbank (a) | Origin |
|---|---|---|
| 17161 | CIAT | Zimbabwe |
| APG51599 | APG | Zambia |
| NI_386 | MBG | Rwanda |
| NI_1209 | MBG | Cameroon |
| NI_219 | MBG | Democratic Republic of the Congo |
| 17136 | CIAT | Mozambique |
| 17142 | CIAT | Kenya |
| NI_1221 | MBG | Cameroon |
| NI_1198 | MBG | Cameroon |
| APG51871 | APG | Sudan |
| APG53198 | APG | Zimbabwe |
| 4861 | CIAT | Ethiopia |
| NI_385 | MBG | Zambia |
| APG51581 | APG | Zambia |
| NI_277 | MBG | Democratic Republic of the Congo |
| TKg8 | IITA | Ghana |
| TKg12 | IITA | Unknown |
| NI_1261 | MBG | Cameroon |
| NI_1251 | MBG | Cameroon |
| NI_240 | MBG | Democratic Republic of the Congo |
| NI_1267 | MBG | Cameroon |
| NI_1435 | MBG | Cameroon |
| PI_364789 | USDA | South Africa |
| 17156 | CIAT | Tanzania |
| PI_658594 | USDA | Nepal |
| 17152 | CIAT | Namibia |

(a) Seedbanks are indicated as follows: APG, Australian Pastures Genebank; CIAT, International Centre for Tropical Agriculture; IITA, International Institute of Tropical Agriculture; MBG, Meise Botanic Garden; USDA, United States Department of Agriculture

et al. 2014). The first round of MAKER gene predictions was made directly from the RNA and protein evidence (options “est2genome” and “protein2genome” both set to “1”), with simple repeats masked to prevent evidence mapping to those regions. The generated gene models were then filtered to keep only models which had a maximum annotation edit distance (AED) of 0.25 and a minimum sequence length of 50 bp. All gene models which passed the filters were used for the first round of SNAP training. The second round of MAKER was run using the trained SNAP parameters to predict genes using the RNA and protein evidence as hints (options “est2genome” and “protein2genome” both set to “0”). The models produced from the second MAKER run then seeded a second round of SNAP training. MAKER was run for a third time


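Table 14.2 The effect of different trimming parameters on the percentage of reads retained

| Insertsize | Minlen | SW_qual | Input | Output_total | Output_perc |
|---|---|---|---|---|---|
| 500 | 120 | 15 | 70,895,089 | 61,895,088 | 87.31 |
| 500 | 120 | 10 | 70,895,089 | 70,315,899 | 99.18 |
| 500 | 120 | 5 | 70,895,089 | 70,321,441 | 99.19 |
| 500 | 96 | 15 | 70,895,089 | 64,536,016 | 91.03 |
| 500 | 96 | 10 | 70,895,089 | 70,751,489 | 99.80 |
| 500 | 96 | 5 | 70,895,089 | 70,755,674 | 99.80 |
| 500 | 72 | 15 | 70,895,089 | 66,544,528 | 93.86 |
| 300 | 120 | 15 | 62,052,420 | 58,645,042 | 94.51 |
| 300 | 120 | 10 | 62,052,420 | 61,777,965 | 99.56 |
| 300 | 96 | | | | |
| 300 | 72 | | | | |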
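The trimming settings reported in Sect. 14.2.2 and the retention percentages in Table 14.2 can be sketched as follows. This is an illustration only: the helper names are hypothetical, and the adapter file name `adapters.fa` is a placeholder because the chapter does not state which adapter file was used. The step names (ILLUMINACLIP, LEADING, TRAILING, SLIDINGWINDOW, MINLEN) are standard Trimmomatic trimming steps.

```python
from typing import List


def trimmomatic_steps(adapters: str = "adapters.fa",  # placeholder file name
                      clip: str = "2:30:10",
                      leading: int = 5,
                      trailing: int = 5,
                      window: int = 4,
                      quality: int = 15,
                      minlen: int = 96) -> List[str]:
    """Build the ordered list of Trimmomatic trimming-step arguments."""
    return [
        f"ILLUMINACLIP:{adapters}:{clip}",
        f"LEADING:{leading}",
        f"TRAILING:{trailing}",
        f"SLIDINGWINDOW:{window}:{quality}",
        f"MINLEN:{minlen}",
    ]


def percent_retained(input_reads: int, output_reads: int) -> float:
    """Percentage of reads surviving trimming, rounded to two decimals."""
    if input_reads <= 0:
        raise ValueError("input_reads must be positive")
    return round(100 * output_reads / input_reads, 2)


print(" ".join(trimmomatic_steps()))
# Row from Table 14.2: Minlen 96, SW_qual 15, 70,895,089 input reads.
print(percent_retained(70_895_089, 64_536_016))
```

The second call recomputes the Output_perc column from the Input and Output_total columns, which is a quick way to check a parameter sweep against the "more than 90% retained" criterion.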

with the second set of trained SNAP parameters and simple repeats masked, and a fourth time using repbase (Bao et al. 2015) to mask repeats previously annotated from G. max. The quality of each round of gene annotations was assessed based on the proportion of gene models with an AED […] 90% of gene models scoring an AED of […]

[…] 200 DAM). The composite collection of pigeonpea, encompassing a broad range of phenotypic and genetic variability, contains mini core collection (146), mini core comparator (146), 79 cluster of accessions from core collection (236), control cultivars (4), 63 accessions of seven wild species and promising germplasm for biotic (77) and abiotic (16) stresses, 59 accessions for promising traits such as nodulation, photoperiod response and several morphological variants, 16 from released cultivars and 237 from distinct morphoagronomic traits (Upadhyaya et al. 2007; Bohra et al. 2010). Cultivated accessions form the bulk (94%) of the composite collection, whereas the remaining 6% represents wild types. Besides ICRISAT and NBPGR, the Australian Tropical Crop and Forage Genetic Resources Centre, Biloela, holds nearly 150 accessions of 13 Cajanus wild species (Rao et al. 2003). Evaluation of the wide germplasm collections including landraces and wild relatives has facilitated discovery of key agronomic traits such as resistance to major diseases and pests. Noteworthy in this context is the recent “democratisation” of next generation sequencing (NGS) technologies, which has greatly enhanced our capacity to analyse the large germplasm collections. Genome-wide approaches have elucidated significant marker-trait associations (MTAs) for deployment in breeding programmes. For instance, whole-genome resequencing (WGRS) of 292 Cajanus accessions covering breeding lines, landraces and wild relatives uncovered MTAs for a variety of agronomic traits (Varshney et al. 2017).

A. Bohra et al.

15.1.4 Benefits

Pigeonpea offers farmers a variety of benefits in terms of environment, nutrition and cultivation. According to Adjei-Nsiah (2012), pigeonpea farming in Ghana helped rescue the agricultural productivity of small farmers by restoring the fertility of soil. Pigeonpea plays a vital role in recycling nutrients and restoring nitrogen and phosphorus, which is a limiting nutrient for the soil. Owing to its drought tolerance, it does not require a heavy irrigation system and can withstand the stress, which is beneficial for poor farmers. Its deep rooting and soil holding capacity (Mahta and Dave 1931) serves as a strong barrier to prevent soil erosion (Bekele-Tessema 2007). Heuzé et al. (2017) reported that it fixed 47–90 kg N2/ha/year and 235 kg N2/ha/year either as sole crop or with mixed cropping in Africa and Florida, respectively. Pigeonpea improves long-term soil quality and fertility when used as a green manure, cover crop or alley crop. Though it does not require any inoculant, when inoculated, the effectiveness of vesicular–arbuscular mycorrhiza (VAM) fungi is found to be the highest in comparison to any other legume crop (Ahiabor and Hirata 1994). Pigeonpea can be an important component of a balanced diet, as its seed contains 20–30% protein with adequate amounts of vitamins A and C and zero cholesterol. Several of the Cajanus species are known to have important therapeutic and medicinal properties (Drabu et al. 2011) with presence of alkaloids, anthraquinones, glycosides, carotenoids, coumarins, dihydrochalcones, fatty acids, steroids, flavonoids and triterpenoids. Pigeonpea seeds also exhibit anti-tumour or anticarcinogenic properties, and some tribes have been using pigeonpea seed after boiling to decant many toxic substances from the body (Murthy and Emmannuel 2011). Treatment with Rhynchosia seeds (wild relative of pigeonpea) is found to restore the RBCs and lymphocyte count to normal level (Mallikarjuna et al. 2014). Pigeonpea has shown strong therapeutic benefits, with antidiabetic, anti-inflammatory and high antioxidant activities (Wu et al. 2009).

15

Breeding and Genomics of Pigeonpea in the Post-NGS Era

In the food industry, pigeonpea is used as a flour additive in soups, and the flour is preferred for making baked cookies, bread, etc. In the snack industry, its use is recommended to increase the nutritional value of pasta (Center for New Crops and Plants Products 2002). Pigeonpea is also used as feed and fodder for livestock; the foliage is a good, nutrient-rich fodder that is commonly used in African countries. Owing to its high nutritional value, pigeonpea serves as a cost-effective diet for broilers, pullet chicks and livestock (Odeny 2007). Notwithstanding its enormous benefits, pigeonpea is considered an “orphan legume”. Nevertheless, remarkable progress has been seen in recent years in terms of generation of large-scale genomic tools and technologies (Bohra et al. 2020c).

15.2 Traditional Breeding and Cultivar Development in Pigeonpea

Breeding of improved cultivars has caused reduction of the genetic diversity in domesticated crops. In India, following establishment of the All India Coordinated Research Project (AICRP) on pigeonpea in 1996 by ICAR, breeding based on selection, mutation, hybridization and heterosis has led to the development and release of more than 150 pigeonpea varieties/hybrids suitable for cultivation in diverse agro-ecologies (Naik et al. 2020). Major breeding methods used for cultivar development in pigeonpea include the pedigree method and hybrid breeding in addition to some alternative approaches like mutational breeding and sybrid production (Saxena et al. 2020a). Mutation breeding using mutagens like ethyl methane sulfonate, gamma rays and neutrons has created significant variability and led to the development of five commercial cultivars, viz. Co3, Co5, TT5, TT6 and TAT10 (Saxena et al. 2020a). Some of the landmark pigeonpea varieties delivered through conventional breeding include CORG 9701 (South zone), TJT 501 (Central zone), PA 291 (North-west Plain Zone, NWPZ), Phule T 0012 (Central zone), PRG 176 (Telangana state), PAU 881 (NWPZ), ICPL 8863/Maruti (South zone), ICPL 87119 (Asha) (South and Central), JKM 189 (Central zone), BSMR 736 (Maharashtra state), BSMR 853 (Central zone), BDN 711 (Central zone), Bahar (North East Plain Zone, NEPZ), NA1 (NEPZ) and IPA 203 (NEPZ). The majority of the current cultivars show resistance against Fusarium wilt (FW) and sterility mosaic disease (SMD), two major diseases of pigeonpea (Sharma et al. 2020).

Male sterility systems have been widely used for large-scale hybrid seed production in different crops (Bohra et al. 2016). In pigeonpea, genic male sterility (GMS) and cytoplasmic male sterility (CMS) have been used for hybrid production. The first pigeonpea hybrid using the GMS system was ICPH 8 (ICPH 82008), which was developed by crossing MS Pabhat DT line with ICPL 161 and released for commercial cultivation in 1991. With semi-spreading and indeterminate growth habit, the hybrid ICPH 8 matured in 142 days and offered a 41% yield advantage over the pigeonpea cultivar UPAS 120 (http://oar.icrisat.org/561/1/PMD_40.pdf). The progenies of the cross msms × Msms segregate for the male sterility trait, with the two genotypes [male sterile (msms) and male fertile (Msms)] obtained in equal proportions (1:1). Hence, maintenance of male sterility in the GMS system presents a Herculean challenge. Hybrid seed production using the GMS system suffered a major bottleneck, as discarding 50% of the plants in each generation causes a substantial increase in the seed production cost. Most importantly, the operation of discarding the fertile lot must be carried out before flowering. Presence of morphological or molecular markers tightly linked with the ms locus is of great value in selection and removal of the fertile individuals (Colombo and Galmarini 2017).

So far, nine CMS systems have been derived from various wild relatives such as C. sericeus, C. scarabaeoides, C. volubilis, C. cajanifolius, C. acutifolius, C. lineatus, C. platycarpus, C. reticulatus and recently from C. lanceolatus (Bohra et al. 2016, 2020c; Saxena et al. 2020a). Of these, CMS sources from C. scarabaeoides and C. cajanifolius are currently being used for hybrid development (Bohra et al. 2017a, b). More recently, two pigeonpea hybrids, IPH 15-03 and IPH 09-5, have been developed at ICAR-IIPR, Kanpur for cultivation in NWPZ (Bohra et al. 2020c). The two hybrids IPH 15-03 and IPH 09-5 offer 28% and 30% yield advantage, respectively, over the best check varieties, i.e. Pusa 992 and UPAS 120. CMS-based hybrid technology faces the great challenge of maintaining large isolation distances for seed production, since fields in isolation are essentially required to ensure the genetic purity of the parental lines and hybrids.
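The 1:1 segregation described for the GMS maintenance cross (msms × Msms) in this section can be reproduced by enumerating the gametes of each parent. The sketch below is a minimal single-locus illustration; the function name and the tuple representation of genotypes are assumptions made for the example.

```python
from collections import Counter
from itertools import product
from typing import Tuple


def cross(parent_a: Tuple[str, str], parent_b: Tuple[str, str]) -> Counter:
    """Count offspring genotypes from all equally likely gamete pairings."""
    offspring = Counter()
    for allele_a, allele_b in product(parent_a, parent_b):
        # Sorting puts the dominant "Ms" allele first ("M" sorts before "m").
        genotype = "".join(sorted((allele_a, allele_b)))
        offspring[genotype] += 1
    return offspring


# Male sterile (msms) female crossed with heterozygous fertile (Msms) male.
progeny = cross(("ms", "ms"), ("Ms", "ms"))
print(dict(progeny))
```

Half of the pairings give fertile Msms plants, which is why roughly half of each generation has to be rogued out before flowering when a GMS line is maintained this way.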

15.3 Advances in Pigeonpea Genomics

15.3.1 Construction of the Reference Genome Sequence

With the aim of developing large-scale genomic resources, the pigeonpea genomics initiative (PGI) was launched in 2006 under the umbrella of the Indo-US Agricultural Knowledge Initiative (AKI), and various institutes collaborated to develop suitable genomic resources in pigeonpea. The consortium worked in phases and selected Asha (ICPL 87119) as the reference genotype. When the project was initiated, there was no genetic linkage map in pigeonpea, and various DNA marker systems reported a low level of DNA polymorphism amongst cultivated Cajanus (Bohra et al. 2011; Yang et al. 2006, 2011). Bohra et al. (2011) developed the first large-scale set of SSR markers from BAC-end sequencing, and the study provided 3072 SSRs for diverse genetic studies including diversity analysis, linkage mapping and QTL discovery. Application of next generation sequencing (NGS) greatly facilitated the development of modern genomic resources in pigeonpea (Table 15.1). A timeline of key milestones in pigeonpea genomics is presented in Fig. 15.4. Using Illumina technology, Varshney et al. (2012) assembled 605.8 Mb (72.7%) of the pigeonpea genome, with a scaffold N50 of 516.6 Kb. A total of 48,680 genes were predicted in the pigeonpea genome. Of these, 96.04% (46,750) were annotated, and the remaining 3.96% remained unannotated. The study reported 309,052 SSRs in the pigeonpea genome and revealed 28,104 SNPs across 12 pigeonpea genotypes. The reference genome sequence of pigeonpea provides a novel opportunity to identify genome-wide genetic variants for genotyping and breeding applications. Another group (Singh et al. 2011) assembled 511 Mb of the pigeonpea genome with >tenfold genome coverage and mean read length of >550 bp with 454 GS FLX technology. This study revealed a total of 47,004 protein-coding genes and 12,511 transposable elements. Several genes in this assembly were predicted to have associations with physiological traits (6180), disease resistance and defence response (1213), growth and development (453) and DNA synthesis and repair (751). Recently, Mahato et al. (2018) established an improved genome assembly of pigeonpea by merging the draft 454 GS FLX assembly (Singh et al. 2011) with new Illumina sequences to provide 75.6% genome coverage.
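Two of the assembly statistics quoted in this section, N50 and the percentage of the genome assembled, can be made concrete with a short sketch. The function names and the toy scaffold lengths are hypothetical; the only figures taken from the text are the 605.8 Mb assembled length and the 72.7% coverage, from which the implied total genome size follows by simple division.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L hold at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths supplied")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def implied_genome_size(assembled_mb: float, fraction_covered: float) -> float:
    """Genome size (Mb) implied by an assembled length and the fraction it covers."""
    if not 0 < fraction_covered <= 1:
        raise ValueError("fraction_covered must be in (0, 1]")
    return assembled_mb / fraction_covered


# Toy scaffold lengths, for illustration only.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
# Figures quoted for the Illumina assembly: 605.8 Mb covering 72.7%.
print(round(implied_genome_size(605.8, 0.727), 1))
```

N50 rewards contiguity rather than completeness, so it is best read alongside the assembled fraction and gene-content measures rather than on its own.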

15.3.2 Whole-Genome Resequencing

Advances in NGS in combination with the availability of a reference genome sequence in pigeonpea have broadened the scope for cataloguing variations amongst different gene pools, landraces, breeding material and wild accessions. Resequencing diverse genomes enables better understanding of existing genetic diversity, novel structural variations and genome-wide gene-trait associations. By using the whole-genome resequencing (WGRS) and reference genome sequence information, the first HapMap of pigeonpea using 20 accessions that represented parents of recombinant inbred lines (RIL), MAGIC and NAM, introgression lines and 18 wild and two cultivated accessions has been produced (Kumar et al. 2016). A total of 791.77 million paired-end reads were obtained with ~12X depth per genotype and coverage of each accession against the


Table 15.1 Genomics resources in pigeonpea

| Genomic resources | Type | References |
|---|---|---|
| Mapping populations | Multiparent populations | Scott et al. (2020), Saxena et al. (2020b) |
| DNA markers | SNP | Kumar et al. (2016), Saxena et al. (2012, 2017a, b, d, 2018a, b) |
| DNA markers | InDel | Kumar et al. (2016), Varshney et al. (2017), Singh et al. (2017) |
| DNA markers | CNV | Kumar et al. (2016), Varshney et al. (2017) |
| DNA markers | PAV | Kumar et al. (2016), Varshney et al. (2017) |
| High-density genotyping platforms | WGRS/Skim sequencing | Kumar et al. (2016), Singh et al. (2016, 2017), Varshney et al. (2017) |
| High-density genotyping platforms | KASP | Saxena et al. (2012, 2020a) |
| High-density genotyping platforms | Genotyping-by-sequencing (GBS) | Saxena et al. (2017a, b, c, 2018a) |
| High-density genotyping platforms | Restriction site-associated DNA (RAD) sequencing | Arora et al. (2017) |
| High-density genotyping platforms | 50K Axiom Cajanus SNP array | Saxena et al. (2018a) |
| High-density genotyping platforms | 62K Axiom Cajanus SNP array | Singh et al. (2020) |
| Transcriptomic resources | Digital gene expression profiles | Dubey et al. (2011), Dutta et al. (2011), Kudapa et al. (2012), Nigam et al. (2017), Rathinam et al. (2019), Saxena et al. (2020a), Bohra et al. (2021a, b) |
| Transcriptomic resources | Gene expression atlas (28,793 genes) | Pazhamala et al. (2017) |
| Transcriptomic resources | Reference genes for expression analysis | Sinha et al. (2015b, c) |
| Reference genome sequence | 511 Mb | Singh et al. (2011) |
| Reference genome sequence | 605 Mb | Varshney et al. (2012) |
| HapMap | 20 Cajanus spp. accessions; CNV (2598); PAV (970); 4,686,422 SNPs and 779,254 InDels | Kumar et al. (2016) |
| Pangenome | Core genes (48,067 = 86.6%); variable genes (7445 = 13.41%) | Zhao et al. (2020) |
| Diagnostic DNA marker kit | 10 KASP-SNPs each for FW and SMD | Saxena et al. (2020c) |
| Superior haplotypes for haplotype-based breeding | Four haplotypes for drought response | Sinha et al. (2020) |

290

A. Bohra et al.

Fig. 15.4 Key discoveries in pigeonpea genomics

reference genome varied from 75 to 91%. In addition to 5.4 million polymorphic regions, including 4.6 million SNPs and 0.7 million indels, the study revealed larger structural variations (SVs) such as 2598 copy number variations (CNVs) and 970 presence and absence variations (PAVs). Recently, Varshney et al. (2017) resequenced the genomes of 292 lines from the reference set, with sequencing depths varying between 5 and 12X. The accessions covered landraces, breeding lines and wild relatives (C. cajanifolius, C. scarabaeoides and C. platycarpus). The WGRS data set divulged details on both smaller SVs (SNPs and indels) and large SVs including CNVs and PAVs. The SVs amongst breeding lines, landraces and wild accessions spanned lengths of 0.002–13.3 Mb, 0.001–0.2 Mb and 0.001–2 Mb, respectively. More recently, WGRS of 89 accessions (mainly from India and the Philippines) with a minimum of 9.5X coverage allowed construction of the first pigeonpea pangenome (Zhao et al. 2020). PAVs within the genes in these accessions were identified following mapping and assembly using the reference genome sequence and WGRS data. The pangenome consisted of 55,512 genes; 86.6% of these were identified as “core genes” present in all accessions, whereas 13.4% had variable presence. The number of predicted genes was increased from 48,680 in the 606 Mb assembly (Varshney et al. 2012) to 53,612 by reannotation of the genome assembly using additional transcriptome data sets along with protein and EST sequences. Growing sequence information has accelerated the progress of gene discovery and opened scope for more efficient designs for trait mapping (Varshney et al. 2021a, b). For instance, gold-standard experimental designs such as multiparent advanced generation intercross (MAGIC) and nested association mapping (NAM), which accommodate multiple founder parents and profuse genome reshuffling, have been developed in pigeonpea for mapping important traits (Bohra et al. 2020c).
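The core/variable split reported for the pangenome can be illustrated with a small presence/absence matrix. The following is only a minimal sketch: the gene names, the toy matrix and the helper `classify_genes` are hypothetical, and the only rule assumed is the one stated above, namely that a gene is "core" when it is present in every accession. The last lines recompute the percentages from the gene counts listed in Table 15.1.

```python
from typing import Dict, List, Tuple


def classify_genes(pav: Dict[str, List[int]]) -> Tuple[List[str], List[str]]:
    """Split genes into core (present in all accessions) and variable.

    `pav` maps a gene name to a list of 0/1 presence calls, one per accession.
    """
    core, variable = [], []
    for gene, calls in pav.items():
        if all(call == 1 for call in calls):
            core.append(gene)
        else:
            variable.append(gene)
    return core, variable


# Hypothetical toy matrix: three genes scored in four accessions.
toy_pav = {
    "gene_A": [1, 1, 1, 1],
    "gene_B": [1, 0, 1, 1],
    "gene_C": [0, 0, 1, 0],
}
core, variable = classify_genes(toy_pav)

# Percentages implied by the counts reported for the pigeonpea pangenome.
total_genes = 55_512
core_pct = round(100 * 48_067 / total_genes, 1)
variable_pct = round(100 * 7_445 / total_genes, 1)
print(core, variable, core_pct, variable_pct)
```

In practice the presence calls would come from read mapping and assembly of each accession, which is well beyond this toy example.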

15.3.3 Genome-Wide SNP Arrays

Advances in DNA sequencing technologies in combination with reference genome sequence information have enabled researchers to identify large-scale DNA markers for a variety of genetic studies. In the post-NGS era, SNPs are the marker of choice because of their bi-allelic nature, amenability to automation and genome-wide distribution. Rasheed et al. (2017) have reviewed a variety of SNP platforms in various plant species including fruits, crops and trees and discussed their applications in plant breeding. Several NGS-based protocols such as GBS, RAD-Seq, SLAF-Seq, exon capture and rAmpSeq are available to identify genomic regions associated with phenotypic diversity. In pigeonpea, a high-density SNP array, the Axiom Cajanus SNP Array with 56K SNPs, was constructed using ~2 million SNPs from the WGRS data set of 63 released cultivars and 40 donor germplasm lines, landraces and founder parent lines. Importantly, the SNP array also contains 1554 SNPs and 385 indels associated with different traits such as grain protein content, fertility restoration, and resistance to SMD and FW (Saxena et al. 2018a, b). The immense utility of the SNP chip was demonstrated by trait mapping (Yadav et al. 2019) and diversity studies. More recently, a genic-SNP array, “CcSNPnks”, was developed by resequencing 45 diverse genotypes; CcSNPnks encompasses a total of 62,053 SNPs from 9629 genes (Singh et al. 2020).

15.3.4 Genetic Linkage Maps

A variety of molecular marker systems have been reported in pigeonpea and employed for linkage analysis (Table 15.2). The first large set of SSR markers in pigeonpea was developed by Bohra et al. (2011) based on analysis of BAC-end sequences (BESs). These BES-SSRs facilitated the development of the first genetic map of pigeonpea, with 239 SSR loci mapped onto eleven linkage groups. The next generation of DNA markers, including SNPs and diversity arrays technology (DArT), has helped linkage analysis and elucidation of gene-trait associations. By using 1616 SNPs (known as pigeonpea KASP assay markers, PKAMs), a genetic map with 910 loci (875 PKAMs and 35 SSRs) was built from 167 lines of an F2 population (C. cajan ICP 28 × C. scarabaeoides ICPW 94) (Saxena et al. 2012). Similarly, Kumawat et al. (2012) developed a 296-loci genetic map with SSR and SNP loci that spanned 1520 cM of the pigeonpea genome. Bohra et al. (2012) built four new intra-specific genetic linkage maps comprising 59–140 SSR loci that spanned map lengths in the range of 587–882 cM. Furthermore, the authors constructed a consensus genetic map with 339 loci by merging six population-specific genetic linkage maps.


Arora et al. (2017) assayed three populations (Asha × UPAS 120, Pusa Dwarf × H2001-4 and Pusa Dwarf × HDM04-1) using RAD-Seq, and the SNP data from these populations were used to construct a consensus genetic linkage map with 932 loci. In recent years, high-density genetic linkage maps have been built in pigeonpea by assaying populations with NGS-based genotyping protocols. These genetic maps allowed delineation of genomic regions explaining significant proportions of observed phenotypic variability in various agronomic traits. For example, Saxena et al. (2017a) identified 212,464, 89,699 and 64,798 SNPs following GBS assay of three mapping populations, viz. ICPB 20097 × ICP 8863 (RIL), ICPL 20096 × ICPL 332 (RIL) and ICP 8863 × ICPL 87119 (F2), respectively. Genetic linkage analysis resulted in the development of genetic maps with 1101, 404 and 996 SNP loci for the ICPL 20096 × ICPL 332, ICPB 20097 × ICP 8863 and ICP 8863 × ICPL 87119 populations, with respective map lengths of 921.2 cM, 798.3 cM and 1597.3 cM.
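The term "high-density" can be made concrete by dividing map length by the number of mapped loci. The short sketch below does this arithmetic for the three GBS-based maps, pairing loci counts and map lengths in the order given in the text above. The helper name `mean_marker_spacing` is hypothetical, and dividing by the number of loci (rather than the number of intervals between them) is a simplifying assumption.

```python
def mean_marker_spacing(map_length_cm: float, n_loci: int) -> float:
    """Average distance between mapped loci in cM (map length / number of loci)."""
    if n_loci <= 0:
        raise ValueError("n_loci must be positive")
    return map_length_cm / n_loci


# (population, mapped SNP loci, map length in cM), as paired in the text above.
gbs_maps = [
    ("ICPL 20096 x ICPL 332", 1101, 921.2),
    ("ICPB 20097 x ICP 8863", 404, 798.3),
    ("ICP 8863 x ICPL 87119", 996, 1597.3),
]

spacing = {
    population: round(mean_marker_spacing(length, loci), 2)
    for population, loci, length in gbs_maps
}
for population, value in spacing.items():
    print(f"{population}: {value} cM per locus")
```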

15.3.5 Transcriptomics and Gene Identification

Genome-wide transcriptomic studies have provided a wide array of genomic resources in pigeonpea for functional genomics and gene discovery (Bohra et al. 2021a, b). In this context, the first set of 9468 ESTs was obtained from four SMD- and FW-responsive pigeonpea genotypes, and EST-SSR markers were developed from this data set (Raju et al. 2010). Similarly, Kumar et al. (2014) developed a set of 105 ESTs from root tissues, followed by validation of the genes coding for S-adenosylmethionine synthase, phosphoglycerate kinase, serine carboxypeptidase and methionine aminopeptidase. In another study, a transcriptome assembly was developed by combining sequence reads from two genotypes, “Asha” and “UPAS 120”, using the 454 GS FLX platform; the authors assembled a total of 43,324 transcript assembly contigs (TACs) and identified 3000 genic SSR markers. Another transcriptome


Table 15.2 Genetic linkage maps in pigeonpea

Population | Number of mapped loci | References
SSR-based linkage maps
Pusa Dwarf × HDM04-1 | 296 | Kumawat et al. (2012)
ICP 28 (C. cajan) × ICPW 94 (C. scarabaeoides) | 239 | Bohra et al. (2011)
ICP 8863 × ICPL 20097 | 120 | Bohra et al. (2012)
ICPA 2039 × ICPR 2447 | 188 | Bohra et al. (2012)
ICPA 2043 × ICPR 2671 | 188 | Bohra et al. (2012)
ICPA 2043 × ICPR 3467 | 188 | Bohra et al. (2012)
TTB 7 × ICP 7035 | 130 | Bohra et al. (2012)
ICPB 2049 × ICPL 99050 | 188 | Bohra et al. (2012)
SNP-based linkage maps
Pusa Dwarf × H2001-4 | 2078 | Singh et al. (2020)
ICP 28 × ICPW 94 | 875 | Saxena et al. (2012)
ICPL 85063 × ICPL 87119 | 996 | Saxena et al. (2017c)
 | 557 | Saxena et al. (2017a)
ICP 5529 × ICP 11605 | 787 | Saxena et al. (2017d)
ICPA 2039 × ICPL 87119 | 306 | Saxena et al. (2018a, b)
ICPL 20097 × ICP 8863 | 4867 | Saxena et al. (2020d)
 | 484 | Saxena et al. (2017c)
ICPB 2049 × ICPL 99050 | 964 | Saxena et al. (2017a)
ICPL 20096 × ICPL 332 | 1101 | Saxena et al. (2017a, c)
ICPL 99010 × ICP 5529 | 6818 | Yadav et al. (2019)
ICP 11605 × ICP 14209 | 662 | Obala et al. (2020)
ICP 8863 × ICP 11605 | 363 | Obala et al. (2020)
HPL 24 × ICP 11605 | 607 | Obala et al. (2020)
ICP 8863 × ICPL 87119 | 997 | Obala et al. (2020)
Asha × UPAS 120 | 725 | Arora et al. (2017)
Pusa Dwarf × H2001-4 | 136 | Arora et al. (2017)
Pusa Dwarf × HDM04-1 | 291 | Arora et al. (2017)
Consensus genetic maps
Six mapping populations | 339 | Bohra et al. (2012)
Three mapping populations | 932 | Arora et al. (2017)
Five mapping populations | 984 | Obala et al. (2020)

assembly, the C. cajan transcriptome assembly (CcTA v. 1), was developed from 31 various tissues of Pusa Ageti (ICP 28), producing 494,535 short transcript reads and 10,817 ESTs (Dubey et al. 2011). Transcript reads from different NGS platforms were combined into a comprehensive assembly, “CcTA v. 2”, that comprises 21,434 TACs from 16 genotypes (Kudapa et al. 2012). Further, comparative analysis of the reads with the soybean genome led to the discovery of intron spanning region (ISR) markers. Of the total 10,009 ISR markers identified from the assembly, a subset of 116 ISRs was validated in a set of eight pigeonpea genotypes. Pazhamala et al. (2017) constructed a gene expression atlas of pigeonpea (CcGEA) for the Asha genotype. The authors generated RNA-Seq data from 30 different tissues covering germination to senescence, and the CcGEA catalogues a total of 28,793 genes. Network analysis of the flowering-related genes led to the identification of highly connected genes or hub genes, viz. sucrose proton symporter 2 (C.cajan_35396), SF3 protein (C.cajan_07765) and a gene (C.cajan_28171) encoding a putative H+-symporting sucrose transporter protein, which could play an important role in pollen development and maturation. More recently, Rathinam et al. (2019) performed a comparative transcriptome analysis between cultivated pigeonpea (C. cajan) and a wild species (C. platycarpus) to elucidate differential gene expression. Furthermore, the study validated by qRT-PCR a subset of 20 differentially expressed genes (DEGs), which code for transcription factors, receptor-like kinases and genes for secondary metabolism.

15.3.6 Identification of QTL/Candidate Genes for Important Traits

Traits of agronomic importance are controlled by a number of genes or QTL having variable influence on the traits of interest, reflected in the form of phenotypic variation (PV) explained or R². Conventional QTL mapping based on biparental mapping populations is limited by poor mapping resolution, and typically identifies a broad chromosomal region that may subsequently require fine mapping before its use in crop breeding (Bohra et al. 2020b). Initially, SSR markers were used for trait mapping in pigeonpea. Gnanesh et al. (2011) mapped QTL for SMD based on analysis of phenotypic and genotypic data recorded on two F2 populations (ICP 8863 × ICPL 20097 and TTB 7 × ICP 7035), and the detected QTL had PV in the range of 8.3–24.7%. Studies have also elucidated QTL for fertility restoration in A4-CMS (Bohra et al. 2012; Saxena et al. 2018b). A DNA marker targeting a 10-bp deletion in the nad7a gene was also developed following analysis of SVs in mitochondrial genomes in combination with expression profiling of essential mitochondrial genes (Sinha et al. 2015a). Various studies in pigeonpea have found QTL/candidate genes for other important traits including determinacy, flowering and plant height. For example, the candidate gene CcTFL1 was identified in a panel of 142 pigeonpea germplasm accessions, and explained PVs for determinacy (45–96%), flowering time (45%) and plant height (77%) (Mir et al. 2012). Concerning plant growth habit and earliness, Kumawat et al. (2012) found 13 QTLs in the population Pusa Dwarf × HDM04-1. A list of QTLs related to important breeding traits of pigeonpea is presented in Table 15.3. Gene discovery in pigeonpea has been accelerated by the availability of sequence-based trait mapping approaches such as QTL-seq. These approaches harness the benefits of reference genome sequence information and the growing capacities of high-throughput genotyping/sequencing systems (Bohra et al. 2020b). Sequence-based trait mapping can be conducted in two ways: sequencing the extreme bulks/selected individuals, or sequencing the entire mapping population segregating for the traits of interest. Apart from low-coverage WGRS or skim sequencing, GBS and RAD-Seq can be used to genotype mapping individuals. Singh et al. (2016) adopted sequence-based trait mapping to find candidate genes such as C.cajan_03203 for Fusarium wilt (FW) and C.cajan_01839 for sterility mosaic disease (SMD). Similarly, Saxena et al. (2017b) have reported a set of 10 QTL controlling 3.6–34.3% of the phenotypic variation (PV) for SMD resistance. In another study, eight QTLs with PVs from 6.55% (qFW1.1) to 14.67% (qFW3.1) were detected for FW resistance (Saxena et al. 2017c). These high-density genotyping platforms have also helped in elucidating the genetic architecture of seed protein content (SPC) in pigeonpea. GBS data on five mapping populations allowed identification of both main-effect and epistatic QTL for SPC and other agronomic traits (Obala et al. 2020). Earlier, the authors showed association of four cleaved amplified polymorphic sequence (CAPS) markers with SPC in an F2 population, ICP 5529 × ICP 11605 (Obala et al. 2019).
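The quantity reported throughout this section as PV explained (R²) can be illustrated with a single-marker regression. The sketch below is a simplification and not the procedure used in the cited studies, which rely on interval-based QTL mapping software: it assumes marker genotypes coded as 0, 1 and 2 copies of one allele, uses invented phenotype values, and the function name `pv_explained` is hypothetical.

```python
from statistics import mean
from typing import Sequence


def pv_explained(genotypes: Sequence[float], phenotypes: Sequence[float]) -> float:
    """Percentage of phenotypic variation explained (R² x 100) by a single marker,
    from a simple linear regression of phenotype on genotype code."""
    if len(genotypes) != len(phenotypes):
        raise ValueError("genotypes and phenotypes must have the same length")
    g_bar, p_bar = mean(genotypes), mean(phenotypes)
    s_gp = sum((g - g_bar) * (p - p_bar) for g, p in zip(genotypes, phenotypes))
    s_gg = sum((g - g_bar) ** 2 for g in genotypes)
    s_pp = sum((p - p_bar) ** 2 for p in phenotypes)
    if s_gg == 0 or s_pp == 0:
        raise ValueError("genotypes and phenotypes must both vary")
    return 100 * s_gp ** 2 / (s_gg * s_pp)


# Hypothetical data: six F2 individuals, genotype codes and a trait value.
marker = [0, 0, 1, 1, 2, 2]
trait = [10, 12, 14, 15, 19, 20]
pv = pv_explained(marker, trait)
print(f"PV explained: {pv:.1f}%")
```

A marker with no association to the trait would give a value near zero, whereas a value such as the upper bounds in Table 15.3 indicates a major-effect locus.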


Table 15.3 Significant QTL associated with important breeding traits

Trait of interest | Number of QTL | Associated DNA marker | Range of phenotypic variation (%) | References
Fertility restoration | 4 | SSR | 14.85–24.17 | Bohra et al. (2012)
 | 3 | SNP | 8–28.5 | Saxena et al. (2018a, b)
 | 4 | SNP | 2.34–45.06 | Saxena et al. (2020a)
Fusarium wilt (FW) | 6 | SNP | 10.04–15.26 | Saxena et al. (2017a)
Sterility mosaic disease (SMD) | 6 | SSR | 8.3–24.72 | Gnanesh et al. (2011)
 | 4 | SNP | 12.99–24.2 | Saxena et al. (2017a)
Cleistogamous flower | 5 | SNP | 9.1–50.6 | Yadav et al. (2019)
Shrivelled seed | 3 | SNP | 11.8–37.2 | Yadav et al. (2019)
Seed size | 2 | SNP | 29.5–33.5 | Yadav et al. (2019)
Plant height (cm) | 2 | SNP | 27.5–28.0 | Kumawat et al. (2012)
Earliness (days to flowering and days to maturity) | 5 | SNP | 8.7–51.4 | Kumawat et al. (2012)
Seed length (mm) | 2 | SSR | 10.48–15.75 | Bohra et al. (2020a)
Seed width (mm) | 3 | SSR | 7.67–16.07 | Bohra et al. (2020a)
Seed thickness (mm) | 3 | SSR | 6.75–10.47 | Bohra et al. (2020a)
Seed weight (g) | 2 | SSR | 8.46–13.37 | Bohra et al. (2020a)
Electrical conductivity (µS cm−1 g−1) | 1 | SSR | 19.91 | Bohra et al. (2020a)
Water uptake (%) | 4 | SSR | 4.4–5 | Bohra et al. (2020a)
Seed protein content (SPC) | 11 | SNP | 0.7–91.3 | Obala et al. (2020)
100-seed weight | 5 | SNP | 3.4–46.6 | Obala et al. (2020)
Seed yield | 5 | SNP | 1.7–53.0 | Obala et al. (2020)
Days to first flowering | 5 | SNP | 2.1–47.6 | Obala et al. (2020)
Growth habit | 11 | SNP | 3.4–91.3 | Obala et al. (2020)

The CAPS markers are based on sequence variants detected initially from WGRS data of four pigeonpea genotypes followed by confirmation using Sanger sequencing.
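The bulk-based route to sequence-based trait mapping mentioned above (QTL-seq) is commonly analysed by comparing the SNP-index, i.e. the fraction of reads carrying the non-reference allele, between two extreme bulks. The chapter does not spell out this computation, so the following is a minimal sketch under stated assumptions: the read counts are invented, the 0.5 cut-off is arbitrary, and the function names are hypothetical. Published analyses additionally average over sliding windows and use statistical confidence intervals, which are omitted here.

```python
from typing import Dict, Tuple

ReadCounts = Tuple[int, int]  # (reads supporting the alternate allele, total reads)


def snp_index(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a position that carry the alternate allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def delta_snp_index(
    bulk_a: Dict[int, ReadCounts], bulk_b: Dict[int, ReadCounts]
) -> Dict[int, float]:
    """SNP-index(bulk_a) minus SNP-index(bulk_b) at positions covered in both bulks."""
    shared = sorted(set(bulk_a) & set(bulk_b))
    return {pos: snp_index(*bulk_a[pos]) - snp_index(*bulk_b[pos]) for pos in shared}


# Hypothetical read counts at three SNP positions on one pseudomolecule.
resistant_bulk = {1000: (10, 20), 2000: (18, 20), 3000: (9, 20)}
susceptible_bulk = {1000: (10, 20), 2000: (2, 20), 3000: (11, 20)}

delta = delta_snp_index(resistant_bulk, susceptible_bulk)
# Flag positions whose absolute difference exceeds an arbitrary illustrative cut-off.
candidates = [pos for pos, value in delta.items() if abs(value) >= 0.5]
print(delta, candidates)
```

Positions where the two bulks carry contrasting alleles show a difference far from zero, while unlinked positions stay close to zero.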

15.3.7 Haplotype-Based Breeding

The haplotype assembly approach was proposed by Bevan et al. (2017) as an efficient way to assemble several favourable alleles in a single genotype. Applied retrospectively to crop mega varieties, the approach could reveal the genomic regions associated with key breeding decisions. The parental lines are the target of this haplo-pheno analysis, in which a superior haplotype for a given trait can be selected. With the use of WGRS, researchers can find marker-trait associations at a higher resolution and deploy the set of superior haplotypes in breeding (Bhat et al. 2021). Mining of WGRS data of 292 pigeonpea accessions facilitated discovery of 83, 132 and 60 haplotypes for 10 drought-responsive genes in breeding lines, landraces and wild species, respectively. Furthermore, candidate gene-based association mapping and haplo-pheno analysis revealed a subset of the most promising haplotypes (C.cajan_23080-H2, C.cajan_30211-H6, C.cajan_26230-H11, C.cajan_26230-H5) showing strong association with five traits (Sinha et al. 2020). Researchers can now undertake functional evaluation of these haplotypes and examine their epistatic interactions under different conditions. Haplotype-based breeding could lead to the recovery of ideotypes by assembling a set of superior haplotypes that together carry the preferred set of genes.
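The idea behind haplo-pheno analysis can be sketched as grouping accessions by the haplotype they carry at a gene and comparing the phenotype of each group. The example below is a hypothetical illustration only: the haplotype labels and trait values are invented, the function name `superior_haplotype` is not from any published pipeline, and a real analysis would add a significance test and account for population structure.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def superior_haplotype(
    records: Iterable[Tuple[str, float]], higher_is_better: bool = True
) -> Tuple[str, Dict[str, float]]:
    """Return the haplotype with the best mean phenotype and all haplotype means.

    `records` holds one (haplotype label, phenotype value) pair per accession.
    """
    by_haplotype = defaultdict(list)
    for haplotype, value in records:
        by_haplotype[haplotype].append(value)
    if not by_haplotype:
        raise ValueError("no records supplied")
    means = {hap: mean(values) for hap, values in by_haplotype.items()}
    pick = max if higher_is_better else min
    return pick(means, key=means.get), means


# Hypothetical accessions scored for a drought-related trait (higher is better).
observations = [
    ("H1", 10.0), ("H1", 12.0),
    ("H2", 15.0), ("H2", 17.0),
    ("H3", 9.0),
]
best, haplotype_means = superior_haplotype(observations)
print(best, haplotype_means)
```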

15.4 Rapid Generation Turnover

Protocols and recipes that shorten the generation interval of a crop by manipulating growth conditions are grouped under “speed breeding” (SB). Examples include recent research by Watson et al. (2018), which relies on optimisation of photoperiod and temperature in concert with immature seed harvest to accelerate time to anthesis and shorten the breeding cycle. The SB protocols are particularly suited for long-day or day-neutral plants (Ghosh et al. 2018). The SB recipes have been optimised for various crops including chickpea, pea, barley and spring wheat, and the SB protocols allowed six generations per year. To reduce the length of the crop breeding cycle, a controlled-environment chamber can be custom-made with light emitting diodes (LEDs), supplemental lighting, and air and temperature regulators (Chiurugwi et al. 2019). Speed breeding can also be achieved by creating off-season nurseries. Research by Croser et al. (2016) involving cool season grain legumes (chickpea, pea, lentils, faba bean and lupin) demonstrated faster flowering induction across all studied crops and genotypes with decreasing R:FR (red:far red) ratio. The study highlighted the importance of light quality in flowering induction in legume crops. Mobini et al. (2015) advocated the use of plant growth regulators like cytokinins and auxins to obtain five generations in field pea within a year. Pigeonpea is a photoperiod-sensitive crop, which requires an extended period of darkness for anthesis. Speed breeding of early-maturing pigeonpea (90–120 days) could be easily done, as these genotypes were reported to be relatively photo-insensitive (Turnbull et al. 1981; Wallis et al. 1981). In pigeonpea, rapid generation turnover has been achieved with early flowering genotypes. Faster generation turnover was achieved in a greenhouse facility with natural light and evaporative air coolers using a 5 hp pump at 2870–2900 rpm, a blower wheel delivering air at 2980–9330 m³/h, a temperature regime of 28–32 °C and humidity in the range of 50–60%. These optimised conditions, in combination with early immature seed harvest (35-day-old) and the single pod descent method of breeding, caused a significant reduction in the generation time of about 3 weeks (Saxena et al. 2017b, 2019). The genotypes used for the study were ICPLs 00004, 00151, 85024 and 97093. A wider application of these protocols to other maturity groups in pigeonpea remains to be seen. The application of SB protocols to different species warrants an exhaustive survey of the photoperiod response of different genotypes (Bohra et al. 2020b, c; Samantara et al. 2022).
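As a rough illustration of what a three-week saving means for generation turnover, the sketch below converts generation length into generations per year. It assumes, purely for illustration, that one seed-to-seed cycle takes as long as the upper maturity duration quoted above (120 days) and that cycles follow one another without gaps; the function name and constants are hypothetical and do not come from the cited studies.

```python
DAYS_PER_YEAR = 365


def generations_per_year(generation_days: float) -> float:
    """Number of back-to-back generations that fit into one year."""
    if generation_days <= 0:
        raise ValueError("generation_days must be positive")
    return DAYS_PER_YEAR / generation_days


baseline_days = 120  # assumed seed-to-seed cycle for an early-maturing genotype
saving_days = 21     # about 3 weeks, as reported for immature seed harvest

baseline = generations_per_year(baseline_days)
shortened = generations_per_year(baseline_days - saving_days)
print(f"{baseline:.2f} -> {shortened:.2f} generations per year")
```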

15.5 Conclusion and Prospects

Genomic resources in pigeonpea have improved significantly over the last decade. Only 10 SSR markers had been reported in pigeonpea until 2001, and there was no genetic linkage map in this crop until 2010. However, recent advances in pigeonpea genomics, driven mainly by NGS technologies, have opened up new avenues for genomics-assisted breeding in pigeonpea. The current arsenal of genomic tools in pigeonpea includes high-density genome maps, comprehensive transcriptomic resources, a whole genome sequence, genome-wide SNP arrays, a pangenome and predictive DNA markers. Figure 15.5 illustrates a genomics-assisted approach for improving genetic gain in pigeonpea. Targeted and rapid improvement of pigeonpea mega varieties such as Bahar, UPAS 120, LRG 41, TJT 501 and BDN 711 has been undertaken using genomics-assisted approaches. The potential to reduce the length of the breeding cycle of pigeonpea has been successfully demonstrated in the case of extra-early genotypes. More efforts are needed to optimise the speed breeding recipe for pigeonpea genotypes belonging to different maturity groups. Deployment of modern genomic tools and technologies will impart greater strength to pigeonpea improvement programmes (Bohra et al. 2017c). For example, genome-wide prediction and genomic selection can be used for development of heterotic pools and identification of heterotic patterns for long-term gains in pigeonpea (Bohra et al. 2020a). Identification of superior haplotypes for a variety of important traits in pigeonpea would expedite delivery of tailor-made pigeonpea cultivars for the future.

Fig. 15.5 Genomics-assisted breeding for accelerating genetic gain in pigeonpea

References

Adjei-Nsiah S (2012) Role of pigeonpea cultivation on soil fertility and farming system sustainability in Ghana. Int J Agron 702506. http://doi.org/10.1155/2012/702506
Ahiabor BD, Hirata H (1994) Characteristic responses of three tropical legumes to the inoculation of two species of VAM fungi in Andosol soils with different fertilities. Mycorrhiza 5:63–70
Arora S, Mahato AK, Singh S, Mandal P, Bhutani S, Dutta S et al (2017) A high-density intraspecific SNP linkage map of pigeonpea (Cajanas cajan L. Millsp.). PLoS ONE 12(6):e0179747. https://doi.org/10.1371/journal.pone.0179747
Bahadur B, Rao MM, Rao KL (1981) Studies on dimorphic stamens and pollen (SEM) and its possible role in pollination biology of Cajanus cajan Millsp. Indian J Bot 4:122–129
Bekele-Tessema A (2007) Profitable agroforestry innovations for eastern Africa: experience from 10 agroclimatic zones of Ethiopia, India, Kenya, Tanzania and Uganda. World Agroforestry Centre (ICRAF), Eastern Africa Region
Bevan M, Uauy C, Wulff B, Zhou J, Krasileva K, Clark M (2017) Genomic innovation for crop improvement. Nature 543:346–354
Bhat JA, Yu D, Bohra A, Ganie SA, Varshney RK (2021) Features and applications of haplotypes in crop breeding. Commun Biol 4:1266
Bhatia GK, Gupta SC, Green JM, Sharma D, Kumble V (eds) (1981) Estimates of natural cross-pollination in Cajanus cajan (L.) Millsp. Several experimental approaches. In: Proceedings of international workshop on pigeonpeas, ICRISAT, Patancheru, India, pp 129–136
Bohra A, Mallikarjuna N, Saxena K, Upadhyaya H, Vales MI, Varshney RK (2010) Harnessing the potential of crop wild relatives through genomics tools for pigeonpea improvement. J Plant Biol 37:1–16
Bohra A, Saxena RK, Gnanesh BN et al (2012) An intraspecific consensus genetic map of pigeonpea [Cajanus cajan (L.) Millspaugh] derived from six mapping populations. Theor Appl Genet 125(6):1325–1338
Bohra A, Jha UC, Adhimoolam P et al (2016) Cytoplasmic male sterility (CMS) in hybrid breeding in field crops. Plant Cell Rep 35:967–993
Bohra A, Jha R, Pandey G, Patil PG, Saxena RK, Singh IP, Singh D, Mishra RK, Mishra A, Singh F, Varshney RK, Singh NP (2017a) New hypervariable SSR markers for diversity analysis, hybrid purity testing and trait mapping in pigeonpea [Cajanus cajan (L.) Millspaugh]. Front Plant Sci 8:1–15
Bohra A, Jha A, Singh IP, Pandey G, Pareek S, Basu PS, Chaturvedi SK, Singh NP (2017b) Novel CMS lines in pigeonpea [Cajanus cajan (L.) Millspaugh] derived from cytoplasmic substitutions, their effective restoration and deployment in hybrid breeding. Crop J 5:89–94
Bohra A, Pareek S, Jha R, Saxena RK, Singh IP, Pandey G et al (2017c) Modern genomic tools for pigeonpea improvement: status and prospects. In: Varshney RK, Saxena RK, Scott J (eds) The pigeonpea genome. Springer, Cham, pp 41–54
Bohra A, Jha R, Lamichaney A et al (2020a) Mapping QTL for important seed traits in an interspecific F2 population of pigeonpea. 3 Biotech 10(10):434. http://doi.org/10.1007/s13205-020-02423-x
Bohra A, Jha UC, Godwin ID, Varshney RK (2020b) Genomic interventions for sustainable agriculture. Plant Biotechnol J. https://doi.org/10.1111/pbi.13472


Bohra A, Prasad G, Rathore A, Saxena RK, Satheesh Naik SJ, Pareek S, Jha R, Pazhamala L, Datta D, Pandey G, Tiwari A, Maurya AK, Soren KR, Akram M, Varshney RK, Singh NP (2021a) Global gene expression analysis of pigeonpea with male sterility conditioned by A2 cytoplasm. Plant Genome 14:e20132
Bohra A, Rathore A, Gandham P, Saxena RK, Satheesh Naik SJ, Dutta D, Singh IP, Singh F, Rathore M, Varshney RK, Singh NP (2021b) Genome-wide comparative transcriptome analysis of the A4-CMS line ICPA 2043 and its maintainer ICPB 2043 during the floral bud development of pigeonpea. Funct Integr Genomics 21:251–263
Bohra A, Saxena KB, Varshney RK et al (2020c) Genomics-assisted breeding for pigeonpea improvement. Theor Appl Genet 133:1721–1737
Carney JA, Rosomoff RN (2009) In the shadow of slavery. Africa’s botanical legacy in the Atlantic world. University of California Press, Berkeley
Center for New Crops and Plants Products (2002) Cajanus cajan (L.) Millsp. Purdue University. http://www.hort.purdue.edu/newcrop/duke_energy/Cajanus_cajun.html
Chiurugwi T, Kemp S, Powell W, Hickey LT (2019) Speed breeding orphan crops. Theor Appl Genet 132(3):607–616. http://doi.org/10.1007/s00122-018-3202-7
Colombo N, Galmarini CR (2017) The use of genetic, manual and chemical methods to control pollination in vegetable hybrid seed production: a review. Plant Breed 136:287–299
Croser JS, Pazos-Navarro M, Bennett RG et al (2016) Time to flowering of temperate pulses in vivo and generation turnover in vivo–in vitro of narrow-leaf lupin accelerated by low red to far-red ratio and high intensity in the far-red region. Plant Cell Tissue Organ Cult 127:591–599
Drabu S, Chaturvedi S, Sharma M (2011) Analgesic activity of methanolic extract from aerial parts of Rhynchosia capitata DC. Int J Pharm Technol 3:3590–3600
Dubey A, Farmer A, Schlueter J, Cannon SB, Abernathy B, Tuteja R et al (2011) Defining the transcriptome assembly and its use for genome dynamics and transcriptome profiling studies in pigeonpea (Cajanus cajan L.). DNA Res 18:153–164
Dutta S, Kumawat G, Singh BP, Gupta DK, Singh S, Dogra V et al (2011) Development of genic-SSR markers by deep transcriptome sequencing in pigeonpea [Cajanus cajan (L.) Millspaugh]. BMC Plant Biol 11:17
Food and Agriculture Organization (2018) http://faostat.fao.org/database
Ghosh S, Watson A, Gonzalez-Navarro OE et al (2018) Speed breeding in growth chambers and glasshouses for crop breeding and model plant research. Nat Protoc 13:2944–3296
Gnanesh B et al (2011) Genetic mapping and quantitative trait locus analysis of resistance to sterility mosaic disease in pigeonpea [Cajanus cajan (L.) Millsp.]. Field Crops Res 123:56–61
Harlan JR, de Wet JMJ (1971) Towards a rational classification of cultivated plants. Taxon 20:509–517
Heuzé V, Thiollet H, Tran G, Delagarde R, Bastianelli D, Lebas F (2017) Pigeon pea (Cajanus cajan) forage. Feedipedia, a programme by INRAE, CIRAD, AFZ and FAO. https://www.feedipedia.org/node/22444
Howard A, Howard GL, Abdur R (1919) Studies in the pollination of Indian crops. I. Mem Dept Agric India (Bot Ser) 10:195–200
Kassa MT, Penmetsa RV, Carrasquilla-Garcia N, Sarma BK, Datta S, Upadhyaya HD, Varshney R, von Wettberg EJ, Cook DR (2012) Genetic patterns of domestication in pigeonpea (Cajanus cajan (L.) Millsp.) and wild Cajanus relatives. PLoS One 7(6):e39563. http://doi.org/10.1371/journal.pone.0039563
Kudapa H, Bharti AK, Cannon SB, Farmer AD, Mulaosmanovic B, Kramer R, Bohra A, Weeks NT, Crow JA, Tuteja R, Shah T, Dutta S, Gupta DK, Singh A, Gaikwad K, Sharma TR, May GD, Singh NK, Varshney RK (2012) A comprehensive transcriptome assembly of pigeonpea (Cajanus cajan L.) using Sanger and second-generation sequencing platforms. Mol Plant 5:1020–1028
Kumar RR, Yadav S, Joshi S et al (2014) Identification and validation of expressed sequence tags from pigeonpea (Cajanus cajan L.) root. Int J Plant Genomics 651912
Kumar V, Khan AW, Saxena RK, Garg V, Varshney RK (2016) First generation HapMap in Cajanus spp. reveals untapped variations in parental lines of mapping populations. Plant Biotechnol J 14:1673–1681
Kumawat G, Raje RS, Bhutani S et al (2012) Molecular mapping of QTLs for plant type and earliness traits in pigeonpea (Cajanus cajan L. Millsp.). BMC Genet 13:84. http://doi.org/10.1186/1471-2156-13-84
Mahato AK, Sharma AK, Sharma TR, Singh NK (2018) An improved draft of the pigeonpea (Cajanus cajan (L.) Millsp.) genome. Data Brief 16:376–380
Mahta DN, Dave BB (1931) Studies in Cajanus indicus. Mem Dept Agric India (Bot Ser) 19:1–25
Mallikarjuna N, Jadhav DR, Saxena KB, Srivastava RK (2012) Cytoplasmic male sterile systems in pigeonpea with special reference to A7 CMS. Elect J Plant Breed 3:983–986
Mallikarjuna N, Srikanth S, Sameer Kumar CV, Srivastava R, Saxena RK, Varshney RK (2014) In: Singh M et al (eds) Broadening the genetic base of grain legumes, vol 1. Springer, India Edition, pp 149–159
Mir RR, Saxena RK, Saxena KB, Upadhyaya HD, Kilian A, Cook DR, Varshney RK (2012) Whole-genome scanning for mapping determinacy in pigeonpea (Cajanus spp.). Plant Breed 132:472–478
Mobini SH, Lulsdorf M, Warkentin TD, Vandenberg A (2015) Plant growth regulators improve in vitro flowering and rapid generation advancement in lentil and faba bean. In Vitro Cell Dev Biol 51:71–79

Murthy KSR, Emmannuel S (2011) Nutritional and antinutritional properties of the unexploited wild legume Rhynchosia bracteata Benth. Bangladesh J Sci Ind Res 46:141–146
Naik SJ, Singh IP, Bohra A, Singh F, Datta D, Mishra RK, Singh NP (2020) Analyzing the genetic relatedness of pigeonpea varieties released over last 58 years in India. Indian J Genet 80(1):70–76
Nigam D, Saxena S, Ramakrishna G et al (2017) De novo assembly and characterization of Cajanus scarabaeoides (L.) Thouars transcriptome by paired-end sequencing. Front Mol Biosci 4:48
Obala J, Saxena RK, Singh V, Sameer Kumar CV, Saxena KB, Tongoona P, Sibiya J, Varshney RK (2019) Development of sequence-based markers for seed protein content in pigeonpea. Mol Gen Genom 294:57–68
Obala J, Saxena RK, Singh VK et al (2020) Seed protein content and its relationships with agronomic traits in pigeonpea is controlled by both main and epistatic effects QTLs. Sci Rep 10:214
Odeny D (2007) The potential of pigeonpea (Cajanus cajan (L.) Millsp.) in Africa. Nat Res Forum 31:297–305
Oshodi A, Olaofe O, Hall GM (1993) Amino acid, fatty acid and mineral composition of pigeon pea (Cajanus cajan). Int J Food Sci Nutr 43:187–191
Pazhamala LT, Shilp S, Saxena RK, Garg V, Krishnamurthy L, Verdier J, Varshney RK (2017) Gene expression atlas of pigeonpea and its application to gain insights into genes associated with pollen fertility implicated in seed formation. J Exp Bot 68:2037–2054
Raju NL, Gnanesh BN, Lekha P, Jayashree B, Pande S, Hiremath PJ, Byregowda M, Singh NK, Varshney RK (2010) The first set of EST resource for gene discovery and marker development in pigeonpea (Cajanus cajan L.). BMC Plant Biol 10:45. http://doi.org/10.1093/mp/ssr111
Rao MKV, Rao RG (1974) Gibberellin like substance in developing and germinating seeds of pigeonpea. Indian J Plant Physiol 17(1/2):65–72
Rao SC, Phillips WA, Mayeux HS, Phatak SC (2003) Potential grain and forage production of early maturing pigeonpea in the southern great plains. Crop Sci 43:2212–2217
Rasheed A, Hao Y, Xia X et al (2017) Crop breeding chips and genotyping platforms: progress, challenges, and perspectives. Mol Plant 10:1047–1064
Rathinam M, Mishra P, Vasudevan M et al (2019) Comparative transcriptome analysis of pigeonpea, Cajanus cajan (L.) and one of its wild relatives Cajanus platycarpus (Benth.) Maesen. PLoS One 14(7):e0218731. http://doi.org/10.1371/journal.pone.0218731
Royal Horticultural Society (Great Britain). Extracts from the proceedings of the Royal Horticultural Society. http://www.biodiversitylibrary.org/bibliography/79650
Samantara K, Bohra A, Mohapatra SR, Prihatini R, Asibe F, Singh L, Reyes VP, Tiwari A, Maurya AK, Croser JS, Wani SH, Siddique KHM, Varshney RK (2022) Breeding more crops in less time: a perspective on speed breeding. Biology 11:275
Saxena KB, Kumar RV (2010) Insect-aided natural outcrossing in four wild relatives of pigeonpea. Euphytica 173:329–335
Saxena RK, Penmetsa RV, Upadhyaya HD, Kumar A, Carrasquilla-Garcia N, Schlueter JA, Farmer A, Whaley AM, Sarma BK, May GD, Cook DR, Varshney RK (2012) Large-scale development of cost-effective single-nucleotide polymorphism marker assays for genetic mapping in pigeonpea and comparative mapping in legumes. DNA Res 19:449–461
Saxena RK, von Wettberg E, Upadhyaya HD, Sanchez V, Songok S, Saxena K, Kimurto P, Varshney RK (2014) Genetic diversity and demographic history of Cajanus spp. illustrated from genome-wide SNPs. PLoS One 9(2):e88568
Saxena RK, Singh VK, Kale SM et al (2017a) Construction of genotyping-by-sequencing based high-density genetic maps and QTL mapping for fusarium wilt resistance in pigeonpea. Sci Rep 7:1911
Saxena KB, Saxena R, Varshney R (2017b) Use of immature seed germination and single seed descent for rapid genetic gains in pigeonpea. Plant Breed 136. http://doi.org/10.1111/pbr.12538
Saxena RK, Kale SM, Kumar VPS, Joshi S, Singh VK, Garg VDRR, Sharma M, Yamini KN, Ghanta A, Rathore A, Sameer Kumar CV, Saxena KB, Varshney RK (2017c) Genotyping by sequencing of three mapping populations for identification of candidate genomic regions for resistance to sterility mosaic disease in pigeonpea. Sci Rep 7:1813. http://doi.org/10.1038/s41598-017-01535-4
Saxena RK, Obala J, Sinjushin A, Sameer Kumar CV, Saxena KB, Varshney RK (2017d) Characterization and mapping of Dt1 locus which co-segregates with CcTFL1 for growth habit in pigeonpea. Theor Appl Genet 130:1773–1784
Saxena RK, Rathore A, Bohra A et al (2018a) Development and application of high-density axiom cajanus SNP array with 56K SNPs to understand the genome architecture of released cultivars and founder genotypes. Plant Genome 11(3).
http://doi.org/10.3835/ plantgenome2018b.01.0005 Saxena RK, Patel K, Sameer Kumar CV, Tyagi K, Saxena KB, Varshney RK (2018b) Molecular mapping and inheritance of restoration of fertility (Rf) in A4 hybrid system in pigeonpea (Cajanus cajan (L.) Millsp.). Theor Appl Genet 131(8):1605–1614 Saxena KB, Saxena RK, Hickey LT, Varshney RK (2019). Can a speed breeding approach accelerate genetic gain in pigeonpea? Euphytica 215. http://doi. org/10.1007/s10681-019-2520-4 Saxena KB, Bohra A, Choudhary A, Sultana R, Sharma M, Pazhamala LT, Saxena R (2020a) The alternative breeding approaches for improving yield gains and stress response in pigeonpea. Plant Breed. http://doi.org/10.1111/pbr.12863

299

Saxena RK, Hake A, Hingane A et al (2020b) Translational pigeonpea genomics consortium for accelerating genetic gains in pigeonpea (Cajanus cajan L.). Agronomy 10:1289. http://doi.org/10.3390/agronomy 10091289 Saxena RK, Hake A, Bohra A et al (2020c) A diagnostic marker kit for fusarium wilt and sterility mosaic diseases resistance in pigeonpea. Theor Appl Genet. http://doi.org/10.1007/s00122-020-03702-0 Saxena RK, Molla J, Yadav P, Varshney RK (2020d) High resolution mapping of restoration of fertility (Rf) by combining large population and high density genetic map in pigeonpea [Cajanus cajan (L.) Millsp.]. BMC Genom 21(1):460 Scott MF, Ladejobi O, Amer S et al (2020) Multi-parent populations in crops: a toolbox integrating genomics and genetic mapping with breeding. Heredity. https:// doi.org/10.1038/s41437-020-0336-6 Sharma P et al (2020) Updates of pigeonpea breeding and genomics for yield improvement in India. In: Gosal SS, Wani SH (eds) Accelerated plant breeding, vol 3. Springer, Cham. http://doi.org/10.1007/978-3030-47306-8_4 Singh A (2016) Insect pollinators and productivity of pigeonpea. Indian J Entomol 78:163. https://doi.org/ 10.5958/0974-8172.2016.00045.6 Singh NK, Gupta DK, Jayaswal PK, Mahato AK, Dutta S, Singh S, Bhutani S, Dogra V, Singh BP, Kumawat G, Pal JK, Pandit A, Singh A, Rawal H, Kumar A, Prashat RG, Khare A, Yadav R, Raje RS, Singh MN, Datta S, Fakrudin B, Wanjari KB, Kansal R, Dash PK, Jain PK, Bhattacharya R, Gaikwad K, Mohapatra T, Srinivasan R, Sharma TR (2011) The first draft of the pigeonpea genome sequence. J Plant Biochem Biotechnol 21:98–112 Singh VK, Khan AW, Saxena RK, Kumar V, Kale SM, Sinha P, Chitikineni A, Pazhamala LT, Garg V, Sharma M, Sameer Kumar CV, Parupalli S, Vechalapu S, Patil S, Muniswamy S, Ghanta A, Yamini KN, Dharmaraj PS, Varshney RK (2016) Next-generation sequencing for identification of candidate genes for Fusarium wilt and sterility mosaic disease in pigeonpea (Cajanus cajan). 
Plant Biotechnol J 14:1183– 1194 Singh VK, Khan AW, Saxena RK et al (2017) Indel-seq: a fast-forward genetics approach for identification of trait-associated putative candidate genomic regions and its application in pigeonpea (Cajanus cajan). Plant Biotechnol J 15(7):906–914 Singh S, Mahato AK, Jayaswal PK et al (2020) A 62K genic-SNP chip array for genetic studies and breeding applications in pigeonpea (Cajanus cajan L. Millsp.). Sci Rep 10:4960. http://doi.org/10.1038/s41598-02061889-0 Sinha P, Saxena KB, Saxena RK, Singh VK, Suryanarayana V, Sameer Kumar V, Katta MAVS, Khan AW, Varshney RK (2015a) Association of nad7a gene with cytoplasmic male sterility in pigeonpea. Plant Genome 8:1–12

300 Sinha P, Singh VK, Suryanarayana V, Krishnamurthy L, Saxena RK, Varshney RK (2015b) Evaluation and validation of housekeeping genes as reference for gene expression studies in pigeonpea (Cajanus cajan) under drought stress conditions. PLoS ONE 10:e0122847. http://doi.org/10.1371/journal.pone.0122847 Sinha P, Saxena RK, Singh VK, Varshney RK (2015c) Selection and validation of housekeeping genes as reference for gene expression studies in pigeonpea (Cajanus cajan) under heat and salt stress conditions. Front Plant Sci 6:1071. http://doi.org/10.3389/fpls. 2015b.01071 Sinha P, Singh V, Saxena R, Khan A, Abbai R, Chitikineni A, Desai A, Molla J, Upadhyaya H, Kumar A, Varshney R (2020) Superior haplotypes for haplotype based breeding for drought tolerance in pigeonpea (Cajanus cajan L.). Plant Biotechnol J. http://doi.org/10:1111/pbi.13422 Turnbull LV, Whiteman PC, Byth DE (1981) Influence of temperature and photoperiod on floral development of early flowering pigeonpea. In: Proceedings of the international workshop on pigeonpeas, vol 2. ICRISAT, Patancheru, pp 217–222 Upadhyaya HD, Reddy LJ, Gowda CLL, Reddy KN, Singh S (2006) Development of a mini core for enhanced and diversified utilization of pigeonpea germplasm resources. Crop Sci 46:2127–2132 Upadhyaya H, Reddy K, Laxmipathi GC, Sube S (2007) Phenotypic diversity in the pigeonpea (Cajanus cajan) core collection. Genet Resour Crop Evol 54:1167– 1184 Varshney R, Chen W, Li Y et al (2012) Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farmers. Nat Biotechnol 30:83–89 Varshney RK, Saxena RK, Upadhyaya HD et al (2017) Whole-genome resequencing of 292 pigeonpea accessions identifies genomic regions associated with domestication and agronomic traits. Nat Genet 49 (7):1082–1088

A. Bohra et al. Varshney RK, Bohra A, Yu J, Graner A, Zhang Q, Sorrells ME (2021a) Designing future crops: genomics-assisted breeding comes of age. Trends Plant Sci 26:631–649 Varshney RK, Bohra A, Roorkiwal M, Barmukh R, Cowling W, Chitikineni A, Lam HM, Hickey LT, Croser J, Bayer P, Edwards D, Crossa J, Weckwerth W, Harvey M, Kumar A, Bevan MW, Siddique KHM (2021b) Fast-forward breeding for a foodsecure world. Trends Genet 37:1124–1136 Wallis ES, Byth DE, Saxena KB (1981) Flowering responses of thirty-seven early maturing lines of pigeonpea. In: Proceedings of the international workshop on pigeonpeas, vol 2. ICRISAT, Patancheru, pp 143–159 Watson A, Ghosh S, Williams MJ, Cuddy WS, Simmonds J, Rey MD, Hatta MAM et al (2018) Speed breeding is a powerful tool to accelerate crop research and breeding. Nat Plants 4:23–29 Wu N, Fu K, Fu Y-J, Zu Y-G, Chang F-R, Chen Y-H, Liu X-L, Kong Y, Liu W, Gu C (2009) Antioxidant activities of extracts and main components of pigeonpea [Cajanus cajan (L.) Millsp.] leaves. Molecules (Basel, Switzerland) 14:1032–1043 Yadav P, Saxena KB, Hingane A et al (2019) An “Axiom Cajanus SNP Array” based high density genetic map and QTL mapping for high-selfing flower and seed quality traits in pigeonpea. BMC Genomics 20:235 Yang S, Pang W, Ash G et al (2006) Low level of genetic diversity in cultivated pigeonpea compared to its wild relatives is revealed by diversity arrays technology. Theor Appl Genet 113(4):585–595 Yang SY, Saxena RK, Kulwal PL et al (2011) The first genetic map of pigeon pea based on diversity arrays technology (DArT) markers. J Genet 90:103–109 Zhao J, Bayer PE, Ruperao P et al (2020) Trait associations in the pan genome of pigeon pea (Cajanus cajan). Plant Biotechnol J 18(9):1946–1954

Rice Bean—An Underutilized Food Crop Emerges as Cornucopia of Micronutrients Essential for Sustainable Food and Nutritional Security

16

Tanushri Kaul, Sonia Khan Sony, Jyotsna Bharti, Rachana Verma, Mamta Nehra, Arulprakash Thangaraj, Khaled Fathy Abdel Motelb, Rashmi Kaul, and Murugesh Easwaran

Abstract

Food and nutritional security in the face of global climate change and rapid population growth poses a challenge to the scientific community. Rice bean (Vigna umbellata) is a neglected, underutilized, nutrient-dense crop of food, feed and pharmaceutical importance that is adapted to grow in diverse agricultural lands. Moreover, it is resistant to numerous biotic and abiotic stresses, including drought, aluminium toxicity, and several diseases. Despite its nutritional importance, the crop is still categorized as underutilized owing to insufficient awareness of its nutritional potential and a lack of extensive research. This potential can be tapped by bringing rice bean into the mainstream of research through genome editing and molecular breeding approaches. The Nutritional Improvement of Crops group at ICGEB has spearheaded whole genome sequencing (WGS) and transcriptome analyses of rice bean to accomplish trait improvement. This chapter highlights the origin, domestication, and diversification of rice bean germplasm. We also shed light on its nutritional potential and on sustainable strategies for improving rice bean germplasm to alleviate the global malnutrition problem.

T. Kaul (✉) · S. Khan Sony · J. Bharti · R. Verma · M. Nehra · A. Thangaraj · K. Fathy Abdel Motelb · R. Kaul · M. Easwaran
Nutritional Improvement of Crops Group, International Centre for Genetic Engineering and Biotechnology (ICGEB), New Delhi, India
e-mail: [email protected]

16.1 Introduction: A Way Forward to Nourish Future Generations

Rice bean (V. umbellata syn. Phaseolus calcarata (Roxb.); Azukia umbellata (Thunb.) Ohwi and Ohashi; Fig. 16.1) is a multipurpose legume crop belonging to Fabaceae/Leguminosae, one of the most important families of nutritionally rich grain crops (Bhardwaj et al. 2021; Dahipahle et al. 2017). The genus Vigna comprises numerous domesticated, commercially relevant species, including V. radiata (mungbean), V. angularis (adzuki bean), V. unguiculata (cowpea) and others, which makes it an economically important genus (Dahipahle et al. 2017). However, Vigna also comprises other species that are nutritionally rich and packed with bioactive molecules but remain underdogs that have not shared the limelight enjoyed by the legume family (Priyadarshini et al. 2021). Rice bean is one such underutilized, orphan and lesser-known crop that holds enormous potential, waiting to be utilized, to feed the exponentially growing population with nutritionally balanced

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 M. A. Chapman (ed.), Underutilised Crop Genomes, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-00848-1_16


Fig. 16.1 Common uses of rice bean. a Vegetative stage. b, c Reproductive stage: vine legume plants with yellow flowers and small edible beans. d Maturation stage: mature grain. The vegetative parts can be fed to livestock fresh or made into hay. The grain is generally used as dal or boiled soup for human consumption. Grain foliage is used as livestock fodder and manure

food in the future (Bhat and Karim 2009). Being a legume, rice bean fixes nitrogen biologically, which helps restore soil fertility and makes it a worthwhile intercropping option. It has been utilized as a source of livestock feed and fodder (Dahipahle et al. 2017; Khanal et al. 2009), for nutritious flour, and for culinary purposes (Tripathi et al. 2021). Interestingly, amongst rice bean varieties, VRB3 exhibited a high average yield of 1708 kg/ha at multiple locations, thereby showing the potential to establish and flourish in different soil types (Rana et al. 2014). Nonetheless, despite its commendable agricultural performance and nutritional excellence, the abandonment of rice bean by farmers is connected to landrace traits concerning unpalatability factors (Basu and Scholten 2012) and late flowering (Kaul et al. 2019a, b; Takahashi et al. 2015; Joshi et al. 2007). Further, there is insufficient information regarding its nutritional benefits and anti-nutrients, and the underlying molecular biology of this crop remains unexplored (Katoch 2020). According to the Food Security through Rice bean


Research in India and Nepal (FOSRIN) network, rice bean has been foreseen as one of the prospective crops for a nourished future (Basavaprabhu et al. 2013; Andersen 2007). Rice bean faces numerous socio-economic challenges which greatly reduce the possibilities for its genetic improvement (Pattanayak et al. 2018). However, it could be revived through the development of landraces in the future, considering its exceptional genetic diversity within a limited geographical region (Iangrai et al. 2017; Joshi et al. 2007). Furthermore, it suffers huge losses in quality and quantity at harvest owing to its vulnerability to weeds and insects (Dhillon and Tanwar 2018; Khadka and Acharya 2009). The scanty application of genomics tools for crop enhancement has held back directed crop and yield improvement. Most of the available literature provides evidence of rice bean resistance to harsh environments, comprising a few well-known biotic and abiotic stresses, as well as tolerance to metal toxicity in the soil. However, little progress has been achieved on the major issues of late flowering (Joshi et al. 2007), palatability factors (Kaul et al. 2019a), hard and coarse grain (Andersen 2012), susceptibility to shattering (Parker et al. 2021), anti-nutrients (Bajaj 2014) and disease resistance (Pandiyan et al. 2008), which has delayed realization of the full potential of this crop. The present state of global hunger and the high risk of lifestyle-related chronic disorders call for urgent crop improvement and domestication of unexplored crops like rice bean to generate new, sustainable, climate-smart varieties with enhanced yield in challenging environments and lowered anti-nutrient levels. Through this chapter, we wish to highlight the comprehensive studies done to date on nutritional excellence and on improvement through the identification of genes linked to agronomically important traits. Advances in genomics tools have given the much-needed push to carry out more comprehensive studies to decipher unexplored genetic resources.

16.2 Origin and Domestication of Rice Bean

Rice bean is a traditional crop of East, Southeast, and South Asia (Seehalak et al. 2006; Tomooka et al. 2002). Its centre of origin is believed to be South and Southwest Asia (Bisht and Singh 2013), with the centre of domestication in the Indo-China region (Doi et al. 2002). It is mostly grown in diverse soil types throughout Vietnam, Indonesia, Myanmar, Southern China, Bhutan, Laos, East Timor, Northern Thailand, and India (Pattanayak et al. 2018; Iangrai et al. 2017; Tian et al. 2013). The region from the Himalayas and Central China to Malaysia is home to the wild forms of rice bean, such as V. umbellata var. gracilis (Rejaul et al. 2016; Seehalak et al. 2006). These wild relatives are suspected to be the source of the presently domesticated crop. The wild relatives showed intermediate plant types with photoperiod sensitivity, small seeds, and free branching; such intermediate landraces are still cultivated in the northeast region of India (Iangrai et al. 2017). Amongst molecular markers, AFLP markers have been employed to study the genetic diversity between (interspecific) and within (intraspecific) populations, and the relationships of Vigna species amongst populations from Thailand and adjoining regions (Seehalak et al. 2006). Marker-assisted analysis has revealed the genetic basis of domestication traits through the generation of linkage maps and has been used to study diversity in the germplasm (Pattanayak et al. 2019). Studies of the close relatives V. umbellata and V. mungo indicated that these species co-evolved in a single domestication event in the region of Thailand or Myanmar. Rice bean is believed to have originated from V. umbellata var. gracilis, which is conspecific with the cultivated V. umbellata grown in South and Southeast Asia (Seehalak et al. 2006; Bisht et al. 2005; Tomooka et al. 1991, 2002). In areas like Sri Lanka, Indonesia, Jamaica, Ghana, Haiti, Fiji and Mexico, rice bean has been adopted as a cover


crop, whilst in Australia, the West Indies, Africa, Honduras, Brazil and the USA it is cultivated to a limited extent (Wang et al. 2015; Khadka and Acharya 2009; Rajerison 2006; Burkill 1953). The geographical distribution of the crop is guided mainly by its growth requirements: rice bean flourishes in areas with annual rainfall of 1000 to 1500 mm, although it also shows fair tolerance of dry seasons. The optimum temperature for the crop ranges from 18 to 30 °C; it can tolerate temperatures as low as 10 °C and as high as 40 °C, but is susceptible to lower temperatures and frost (Pattanayak et al. 2019). Screening rice bean germplasm for novel genomic variation opens new avenues for discovering new traits and linked genes that are agriculturally favourable for developing robust and resilient cropping systems for a drastically changing environment. Rice bean is an interesting and well-equipped option that can be added to the list of futuristic crops, but the crop requires more dedicated research to fast-track its development.
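The rainfall and temperature ranges quoted above can be read as a simple screening rule for candidate growing sites. The following is only a minimal sketch under that reading: the thresholds are those cited from Pattanayak et al. (2019), whereas the `Site` class, the `classify` function, the three labels and the example sites are hypothetical and introduced purely for illustration. Rainfall below the quoted range is treated as marginal rather than unsuitable, since the crop is said to tolerate dry seasons fairly well.

```python
from dataclasses import dataclass

# Ranges quoted in the text (Pattanayak et al. 2019).
RAINFALL_MM = (1000, 1500)     # annual rainfall in which the crop flourishes
OPTIMUM_TEMP_C = (18, 30)      # optimum temperature range
TOLERATED_TEMP_C = (10, 40)    # extremes the crop is reported to tolerate


@dataclass
class Site:
    """Hypothetical description of a candidate growing site."""
    name: str
    annual_rainfall_mm: float
    min_temp_c: float
    max_temp_c: float


def classify(site: Site) -> str:
    """Return a coarse, illustrative suitability label for a site."""
    # Temperatures outside the tolerated extremes rule the site out.
    if site.min_temp_c < TOLERATED_TEMP_C[0] or site.max_temp_c > TOLERATED_TEMP_C[1]:
        return "unsuitable"
    temp_ok = (OPTIMUM_TEMP_C[0] <= site.min_temp_c
               and site.max_temp_c <= OPTIMUM_TEMP_C[1])
    rain_ok = RAINFALL_MM[0] <= site.annual_rainfall_mm <= RAINFALL_MM[1]
    # Both optimum conditions met: favourable; otherwise tolerable but marginal.
    return "favourable" if temp_ok and rain_ok else "marginal"


if __name__ == "__main__":
    # Invented example sites, not real locations or measurements.
    for s in (Site("A", 1200, 20, 28), Site("B", 800, 15, 35), Site("C", 1200, 5, 25)):
        print(s.name, classify(s))
```

Such a rule is deliberately crude; soil type, photoperiod and frost timing, all mentioned in this chapter, would need to be added before it could guide real site selection.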

16.3 Genetic and Molecular Diversification of the Rice Bean Gene Pool

The lack of polymorphic molecular markers has caused genomic studies of rice bean to lag behind those of other economically important leguminous crops such as mung bean, soybean, common bean, and azuki bean. Thus, to date there are only a handful of reports analysing the intra- and interspecific molecular diversification of V. umbellata using SSR, AFLP, and RAPD markers (Iangrai et al. 2017; Thakur et al. 2017; Tian et al. 2013; Bajracharya et al. 2008; Muthusamy et al. 2008). A comprehensive rice bean draft assembly can furnish knowledge about the nucleotide polymorphism present in the population, which can be translated into the documentation of polymorphic SSRs useful for ascertaining marker-trait correlations. Identification of population-wide markers linked to agriculturally relevant traits, especially quantitative traits, can effectively guide the molecular breeding approach for crop improvement (Muthusamy et al. 2008). To capture the entire genetic diversity, it is imperative to include landraces from different geographical regions. In an analysis of 112 geographically diverse rice bean landraces from India and Nepal using 35 polymorphic azuki bean markers, SSR markers captured the maximum diversity (Bajracharya et al. 2008). Polymorphic markers generated for a related crop can thus be extrapolated to lesser-known sister crops to analyse diversity indices for genotypes of the target crop. The first extensive study with SSR markers, which analysed 388 cultivated and 84 wild accessions from 16 distinct geographical regions of Asia, revealed that accessions from Vietnam, Nepal, Myanmar, and India harbour the greatest genetic diversity (Tian et al. 2013). Accessions collected from Japan, Korea, Thailand, and China clustered together, with high genetic diversity amongst the 65 accessions analysed with a set of 28 SSR markers (Iangrai et al. 2017). To take advantage of the diversity inherently present in rice bean germplasm (Table 16.1) and to mitigate the pitfalls linked to the renouncement of the crop by farmers, it is imperative to analyse the germplasm thoroughly. Therefore, crucial investigations of rice bean must address essential traits that genetically influence its domestication, for instance palatability, latency, and increased flowering rate, which will make this crop more viable (Pattanayak et al. 2019). This sort of analysis enables the investigation of evolutionary elements and of the molecular and functional impact of domestication and breeding in rice bean.
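The diversity indices mentioned above are typically computed per marker from allele frequencies. As a minimal sketch, and not the procedure of any of the cited studies, the code below computes two widely used indices for a single SSR locus: Nei's gene diversity, H = 1 − Σ p_i², and the polymorphism information content of Botstein et al., PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j². The function names and the allele sizes in the example are hypothetical.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(alleles):
    """Map each allele observed at one locus to its relative frequency."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def gene_diversity(freqs):
    """Nei's gene diversity (expected heterozygosity): 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs.values())


def pic(freqs):
    """Polymorphism information content: 1 - sum(p_i^2) - sum_{i<j} 2 p_i^2 p_j^2."""
    ps = list(freqs.values())
    homozygosity = sum(p * p for p in ps)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(ps, 2))
    return 1.0 - homozygosity - cross_term


if __name__ == "__main__":
    # Invented fragment sizes (bp) scored at one SSR locus in ten accessions.
    locus = [150] * 5 + [154] * 3 + [158] * 2
    freqs = allele_frequencies(locus)
    print(f"gene diversity = {gene_diversity(freqs):.4f}")
    print(f"PIC            = {pic(freqs):.4f}")
```

Averaging these values over all markers gives a per-population summary, which is one way studies of this kind compare accessions from different regions.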

16.4 Rice Bean: Cornucopia of Potential

Providing safe and healthy nourishment for poor and undernourished populations has been a significant challenge for the developing world. Population growth is outpacing food production. Therefore, we need to boost food grain production to feed the steadily expanding population and meet the Sustainable Development


Table 16.1 List of rice bean germplasm resource centres

Country | Centre name | Total number of accessions
India | Indian Council of Agricultural Research-National Bureau of Plant Genetic Resources (ICAR-NBPGR) | 2071
USA | Agriculture Research Services-Germplasm Resources Information Network (ARS-GRIN) | 147
Japan | National Agriculture and Food Research Organization (NAFRO) | 399
Taiwan | World Vegetable Centre (WVC) | 351
Nepal | Nepal Agricultural Research Council (NARC) | 300
Philippines | National Plant Genetic Resource Laboratory (NPGRL), UPLB, Laguna | 161
China | Institute of Crop Genetic Resources (ICGR) | 1363
Indonesia | Centre of Biology, Indonesian Institute of Science Research and Development (IISRD), Bogor | 100

Goals (SDGs). Utilizing climate-smart species, including underutilized and neglected crops, will be of great importance in accomplishing the SDGs.

16.4.1 Nutritional Composition

Rice bean is emerging as a super food with several health benefits and tremendous nutritional value compared with other commercially cultivated crops. In South Asian countries, rice bean is generally consumed as dal or boiled soup, roasted as whole grain, or mixed with chickpea and wheat flour. It is a good source of dietary fibre, tannin, and slowly digestible starch, and contains considerable amounts of essential amino acids and minerals. Rice bean has a lower glycaemic index and more calcium (Ca), iron (Fe), potassium (K), zinc (Zn), thiamine (Vit-B1), riboflavin (Vit-B2), and niacin (Vit-B3) than other legumes (Katoch 2013; Katoch et al. 2014; Saharan et al. 2002; Table 16.2). It contains a significant number of bioactive molecules with hepatoprotective, antioxidative, anti-inflammatory, antihypertensive, immunity-boosting, anti-cancerous, antimicrobial, antifungal, and anti-diabetic activities, as well as inhibitory activity against mutagens and HIV-1 (Wei et al. 2015). Bioactive phytochemicals of rice bean can control and/or prevent various chronic, persistent and metabolic diseases, particularly coronary heart disease, diabetes mellitus, colon cancer, and others (Tharanathan and Mahadevamma 2003). Moreover, they hold tremendous potential for alleviating micronutrient deficiency in populations globally. The proximate nutritional properties of rice bean have been studied (Table 16.2), and their considerable benefits make the crop an excellent choice for developed and underdeveloped countries alike (Dhillon and Tanwar 2018; Katoch 2013; Parvathi and Kumar 2006). The prime component of any legume is carbohydrate, which is categorized into digestible and non-digestible forms. Rice bean contains 58.15–71.99% carbohydrate (Katoch 2013; Buergelt et al. 2009; Sadana et al. 2006) and 3.60–5.56% crude fibre (Bajaj 2014; Buergelt et al. 2009), as well as 13.0 g/100 g of Neutral Detergent Fibre (NDF) and 8.5 g/100 g of Acid Detergent Fibre (ADF) (Katoch 2013). Rice bean comprises 5.0–5.6 g/100 g of soluble sugars and 4.7–5.3 g/100 g of non-reducing sugars, and has a lower starch content (50–55 g/100 g) than other beans (Saharan et al. 2002). The levels of several oligosaccharides (which cause flatulence in the human body) are lower in rice bean than in soybean, lima bean and sword bean, i.e. raffinose: 1.56–2.58%; verbascose: 0.85–1.23%; stachyose: 0.94–1.88% (Katoch 2013). The total protein content of rice bean ranges from 14 to 26%. Moreover, rice beans exhibited


Table 16.2 Proximate nutritional components of rice bean

Category | Nutrition component | Content
Carbohydrate | Total carbohydrate (%) | 58.15–71.99
 | Crude fibre (g/100 g) | 3.60–5.56
 | Neutral detergent fibre, NDF (g/100 g) | 13
 | Acid detergent fibre, ADF (g/100 g) | 8.5
 | Soluble sugar (g/100 g) | 5.0–5.6
 | Non-reducing sugar (g/100 g) | 4.7–5.3
 | Starch (g/100 g) | 50–55
Oligosaccharides | Stachyose (%) | 0.94–1.88
 | Raffinose (%) | 1.56–2.58
 | Verbascose (%) | 0.85–1.23
Protein | Total protein (%) | 14–26
 | Globulins (%) | 13.11–15.56
 | Albumin (%) | 6.13–7.47
 | Glutelin (%) | 1.77–2.22
 | Prolamin (%) | 1.45–1.97
Amino acids | Arginine (%) | 4.32–7.12
 | Alanine (%) | 3.26–6.60
 | Glutamic acid (%) | 12.36–17.00
 | Proline (%) | 2.54–8.36
 | Aspartic acid (%) | 10.39–13.50
 | Methionine (%) | 0.90–2.88
 | Lysine (%) | 5.38–8.75
 | Glycine (%) | 2.96–4.26
 | Tryptophan (mg/100 g) | 1.23–2.00
 | Tyrosine (mg/100 g) | 2.12–3.31
 | Valine (mg/100 g) | 4.40–5.89
Minerals | Sodium (mg/100 g) | 6.00–347.40
 | Potassium (mg/100 g) | 610.40–2875.00
 | Calcium (mg/100 g) | 111.5–598.23
 | Magnesium (mg/100 g) | 73.0–356.12
 | Zinc (mg/100 g) | 2.45–10.44
 | Iron (mg/100 g) | 3.72–9.25
 | Manganese (mg/100 g) | 2.04–5.0
 | Copper (mg/100 g) | 0.68–4.97
 | Phosphorus (mg/100 g) | 124.0–567.69
Saturated fatty acids | Palmitic acid (%) | 14.23–16.88
 | Stearic acid (%) | 4.36–5.87
Unsaturated fatty acids | Linoleic acid (%) | 7.50–18.98
 | Oleic acid (%) | 15.62–17.91
 | Linolenic acid (%) | 17.24–18.98
Vitamins | Ascorbic acid: Vit. C (mg/100 g) | 15.33–28.23
 | Niacin: Vit. B3 (mg/100 g) | 3.48–4.26


higher protein digestibility (86.1–88.5%) in vitro than other legumes (Bajaj 2014; Buergelt et al. 2009). A fat content of only 1.92–3.42% has been reported, which is considerably less than in other legumes (Bepary et al. 2017; Buergelt et al. 2009; Sadana et al. 2006). Rice bean has been reported as a potential source of fatty acids, including the unsaturated oleic acid (15.6–17.91%), linolenic acid (39.89–44.36%) and linoleic acid (17.24–18.98%), together with the saturated palmitic acid (14.23–16.88%) and stearic acid (4.36–5.87%) (Katoch 2013; Pugalenthi et al. 2004). According to Katoch (2013), rice bean has higher contents of the amino acids methionine and tryptophan, whilst its tyrosine, lysine, and valine contents are comparable with those of black and green gram. Katoch (2013) found 15.33–28.23 mg/100 g of ascorbic acid and 3.48–4.26 mg/100 g of niacin in rice bean. Phytic acid (PA) is considered a major anti-nutrient in pulses; only a meagre amount (0.20–2.27%) of PA has been found in rice bean, less than in soybean, green gram and black gram (Bepary et al. 2017; Bajaj 2014). Soaking and pressure cooking can reduce anti-nutrients, for instance phytic acid, polyphenols, saponins, and trypsin inhibitor activity. Its nutritional composition may therefore offer a healthy and economical addition to the diets of underdeveloped and developing communities and help alleviate the food crisis.

16.4.2 Domestication of Rice Bean: Identification of Neoteric Genes Involved in Stress Resilience and Nutritional Composition

Because rice bean (V. umbellata) is an underutilized crop and yet an exceptional source of nutrients within the genus Vigna, it was important to decipher its genome. A draft genome assembly (NCBI SRA: SRP132447) was generated with a size of 414 Mbp. An estimated 31,276 genes were identified with high confidence by analysing 15,521 scaffolds (Kaul et al. 2019a, b). In this study a 96.08% functional coverage was


achieved with 30X coverage data generated on the PacBio and Illumina platforms (Kaul et al. 2019a, b). Analysis of the assembly revealed that the V. umbellata genome is closest to that of V. angularis, whilst also being closely related to other members of the genus Vigna, for instance V. radiata and V. unguiculata. Using collinearity block mapping, the assembled rice bean genome was aligned to 31 genomes (13 complete and 18 partial) of leguminous crop plants. The complete CDS alignment gave findings in agreement with this alignment. In addition, 18,000 genes of potential medicinal relevance were identified when rice bean locally collinear alignment block (LCB) clusters were compared with 17 therapeutically important genomes present in the National Institute of General Medical Sciences (NIH) database. In-depth comparative genomics has revealed the genetic basis of functional traits in rice bean. Notably, the study identified novel palatability-related and late-flowering genes linked to numerous metabolic pathways and to abiotic and biotic stress regulation, specifically disease resistance, photoperiod sensitivity and others. Concomitantly, the mitochondrial and chloroplast genomes were found to harbour genes for flowering potential and palatability factors. We incorporated this information into a repository for underutilized crops at www.nicg.in, built using D3.js, to support studies aimed at developing rice bean into a potential resource via molecular breeding. The introduction of desired traits such as palatability, early flowering, and determinate habit through CRISPR/Cas9-based genome editing in this underutilized legume crop has been initiated at the Nutritional Improvement of Crops Group at ICGEB. Rice bean is one of the potential future crops that can be a source of numerous novel stress resistance genes.
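The assembly figures quoted earlier in this section (a 414 Mbp draft, 30X coverage, 31,276 high-confidence genes) can be related by simple arithmetic. The sketch below assumes the usual definition of sequencing depth, namely total sequenced bases divided by genome size; the function names are hypothetical, and the derived values are back-of-the-envelope estimates rather than figures reported by Kaul et al. (2019a, b).

```python
# Figures taken from the text of this section.
GENOME_SIZE_BP = 414_000_000   # draft assembly size, 414 Mbp
DEPTH = 30                     # reported sequencing coverage, 30X
N_GENES = 31_276               # high-confidence gene models


def bases_for_depth(genome_size_bp: int, depth: int) -> int:
    """Total sequenced bases implied by a depth, assuming depth = bases / genome size."""
    return genome_size_bp * depth


def genes_per_mbp(n_genes: int, genome_size_bp: int) -> float:
    """Average gene density in genes per megabase pair."""
    return n_genes / (genome_size_bp / 1_000_000)


if __name__ == "__main__":
    total = bases_for_depth(GENOME_SIZE_BP, DEPTH)
    print(f"Sequence data implied by 30X: {total / 1e9:.2f} Gbp")
    print(f"Gene density: {genes_per_mbp(N_GENES, GENOME_SIZE_BP):.1f} genes per Mbp")
```

Under these assumptions, 30X coverage of a 414 Mbp genome corresponds to roughly 12.4 Gbp of sequence data, and the gene density works out to about 75 genes per Mbp.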
The underlying genetic information has been little explored because of the crop's limited use, its restricted geographical distribution and the few existing marketing channels, all of which have led to inadequate commercialization. The paradigm shift

308

of agriculture can be achieved via exploring its genetic resources. Till date, most of the findings were focused on the resistance capability of rice beans against grain storage pests, i.e. bruchid beetles (Callosobruchus spp.) (Somta et al. 2006; Kashiwaba et al. 2003; Tomooka et al. 2000). Additionally, rice bean displayed resistance towards mung bean yellow mosaic virus (Pandiyan et al. 2010) and bacterial leaf spot (Arora et al. 1980). By QTL analysis, numeric resistance gene sources may be identified, in a bid to harness the hidden potential of this underutilized crop. Likewise, resistance genes and various domesticationrelated traits were also identified by QTLs utilizing intra- and inter-specific mapping populations (Isemura et al. 2010). During F2 population mapping between a cross of VRM (Gg) 1 (mung bean) and TNAU RED (rice bean), a single recessive gene was discovered which showed resistance to mung bean yellow mosaic virus (Sudha et al. 2013). The compatibility of rice beans towards different soil types is due to the inherent mechanism to release citric acid and other organic acids to combat the soil acidity, which decreases the accumulation of aluminium (Fan et al. 2014). Two transporters, i.e. VuMATE1 and VuMATE2 belonging to MATE: multidrug and toxic compound extrusion family has been identified as prime organic acid efflux transporters which impart aluminium tolerance to rice bean (Liu et al. 2018; Yang et al. 2006). Kaul et al. (2019a, b) reported numerous stress-responsive genes from different families, for instance, stress enhanced protein 1 (SEP1-), heat shock transcription factor HSF-02 (GMHSF-02), universal stress protein PHOS32 precursor (PHOS32), stress enhanced protein 2 (SEP2-), stressresponsive alpha–beta barrel domain-containing protein (GSU2970) in V. umbellata, which can be targeted for biotic and abiotic stress mitigation. 
To increase yield by reducing vegetative and indeterminate growth, photoperiod-independent early flowering genes such as DEAD/DEATH box helicase domain-containing protein (PIE1), dehydration-responsive element-binding protein 2C (DREB2C), ethylene-responsive transcription factor tiny (TINY), determinate stem 1 (DT1), flowering locus T1 (Rft1), early flowering 3 (Elf3)

T. Kaul et al.

gene, and FLC transcription factors can be targeted. Moreover, anti-nutritional factors such as polyphenols, phytic acid, saponins, tannins, trypsin inhibitors, and hemagglutinating activity reduce micronutrient (specifically iron and zinc) uptake from soil. Numerous transporter genes identified for micronutrient absorption, for instance ferric reduction oxidase 2 (FRO2), iron-regulated transporter 1 (IRT1), nicotianamine synthase 1 (NAS1), natural resistance-associated macrophage protein (Nramp), iron-phytosiderophore transporter yellow stripe 1 (YS1), constitutive photomorphogenic 1 (COP1) and many more, can be targeted to improve the nutritional quality of rice bean. In addition, phytic acid residues present in rice bean form complexes with the available micronutrients, making them unavailable for absorption in the human body. Genes such as phospholipase D (PLDa) and inositol triphosphate kinase (ITPK6) can be targeted to reduce the phytic acid content of the grain. Likewise, polyphenols and tannins impart a pungent taste to rice bean, reducing the palatability index of the grain. Accordingly, genes linked to the polyphenol and tannin biosynthesis pathways, such as N-(5-phosphoribosyl) anthranilate isomerase (a chloroplastic isoform), isoliquiritigenin 2-O-methyltransferase-like (ILMT), leucoanthocyanidin dioxygenase (LDOX), spermidine hydroxycinnamoyl transferase-like (SHT) and chalcone synthase 17-like (CHS-17), can be targeted to increase the palatability of rice bean. In line with this, genome editing studies may help decipher the function of genes linked to these crucial agronomic traits, which influence the domestication of this crop.

16.5 Issues Related to Rice Bean Commercial Utilization

Despite its dietary advantages, rice bean has been classified as an underdeveloped crop due to a pervasive low level of awareness about its significant nutritional benefits. This kharif season crop has several limitations that need to be rectified by geneticists before it may be widely

16

Rice Bean—An Underutilized Food Crop Emerges as Cornucopia …

embraced for regular usage, including its anti-nutritional content, late flowering, and constraints related to weeds, diseases and insects. Weed infestation may not pose a significant hurdle for this crop; however, an initial weeding after sowing and before the flowering phase is usually recommended for better yield. Although rice bean is less susceptible to disease than some legumes, a few common diseases and insects can harm the crop (Andersen 2012). Common diseases such as rust (Uromyces appendiculatus), powdery mildew (Oidiopsis taurica) and bacterial blight (Pseudomonas) significantly reduce crop yield. To control these problems, application of Maneb (Indofil M 45), triadimefon (Bayleton 25% EC) or Bavistin has given better results. In addition, insects such as pod borers, soybean caterpillar, aphids, pod bugs, green bugs, leaf folders and others damage the pods and thereby contribute to losses in grain yield. Rice bean use is restricted by anti-nutritional substances, notably phenolic compounds, tannins, phytate, and enzyme inhibitors, although conventional procedures, including sprouting, pressure boiling, and fermentation, can reduce these and improve its nutritional adequacy (Dhillon and Tanwar 2018). Nevertheless, because of its photosensitivity and the indeterminate vegetative growth that accompanies late flowering, both of which are costly in terms of yield, it has lost traction compared with crops such as chickpea, pigeon pea, lentil, pea, and black and green gram. Rice bean varieties are reportedly unpalatable and have a lengthy vegetative growth period, and this extended time frame is a problem for the subsequent crop. Moreover, the grains remain hard and coarse even after boiling, which limits daily consumption (Andersen 2012). Manipulating flowering pathway-related genes may promote early flowering (González et al. 2016; Dhanasekar and Reddy 2014; Joshi et al. 2007).
Apart from this, rice bean can reportedly thrive in a diverse range of climatic conditions; however, its requirement for moderate rainfall and temperature, along with its sensitivity to prolonged exposure to extreme environments, limits its cultivation (Noda 1951). In addition, the poor suitability of the crop for mechanized cultivation and its lower harvest index (25.8–27.3) (Sarma et al. 1995) contrast with other commercially relevant pulses such as lentil, pea, and soybean (36.6–44.6) (Jayasundara 2015), which further reduces the possibility of including rice bean in the cropping system. Because of these major issues there is no proper trade channel, although the crop is marketed locally through unorganized trade in Thailand, Nepal, India, and Myanmar. Reportedly, 15.5 metric tonnes of rice bean were imported into Japan from India, Thailand and Myanmar in 1991 as a substitute for azuki bean (Pattanayak et al. 2019). Accordingly, continued endeavours to mitigate the agriculturally undesirable traits of the crop may pave the way for its commercialization and consumption globally.
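
The harvest index comparison above rests on a simple ratio. As a minimal sketch, assuming the conventional definition (grain yield as a percentage of total above-ground dry biomass, which the chapter does not spell out), it could be computed as follows. The function name and the plot figures are hypothetical and were chosen only so that the results fall within the ranges quoted from Sarma et al. (1995) and Jayasundara (2015).

```python
def harvest_index(grain_yield: float, total_biomass: float) -> float:
    """Harvest index in percent: grain yield / total above-ground biomass.

    Both arguments must use the same unit (e.g. kg per hectare).
    Assumes the conventional definition; the source does not state one.
    """
    if total_biomass <= 0:
        raise ValueError("total_biomass must be positive")
    if not 0 <= grain_yield <= total_biomass:
        raise ValueError("grain_yield must lie between 0 and total_biomass")
    return 100.0 * grain_yield / total_biomass


# Hypothetical plots (illustrative numbers, not measured data).
rice_bean = harvest_index(grain_yield=260, total_biomass=1000)
lentil = harvest_index(grain_yield=400, total_biomass=1000)
print(f"rice bean: {rice_bean:.1f}%  lentil: {lentil:.1f}%")
```

Read this way, a lower index means that a larger share of the plant's dry matter ends up in stems and leaves rather than grain, which is consistent with the indeterminate, late-flowering habit described earlier in the section.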

16.6 Global Approaches for Rice Bean Germplasm Improvement

Underutilized crops may be glorified as “primitive savages” among plants, since they are presented and marketed as offering economic, environmental, and dietary benefits. If they are so superior, why are they not spreading on their own? Will they lose out as agriculture becomes more commercialized and food trade more global? These obstacles can be addressed with both the conventional and the molecular strategies outlined below (Fig. 16.2).

16.6.1 Conventional Approaches

Underexploited crops are often mentioned in the context of collaborative, interdisciplinary stakeholder and partnership approaches to agricultural growth (Padulosi and Hoeschle-Zeledon 2004), and therefore fall under post-green revolution development strategies. The marketing and utilization of rice bean have been emphasized, yet both producers and consumers have little knowledge of its huge potential to deliver food and nutritional benefits. Germplasm of old varieties should be exploited to bring diversity into new varieties and flexibility into seed selection. Breeding and on-farm trials should be used for crop enhancement and propagation. To facilitate seed distribution and accessibility, seed supply systems should be properly organized. Access can be expanded by engaging broader groups with the crop and distributing seed packets at exhibitions. It is also essential to educate farmers about local seed production and seed quality, particularly germination and disease resistance. Furthermore, information about rice bean adaptability, production techniques, and utilization needs to be transmitted in a timely manner. Non-government entities can be engaged in the marketing strategy to popularize the crop at food outlet chains (Sthapit et al. 2010). Market surveys and nutritional, chemical, and sensory evaluations should be carried out at intervals to incorporate data from diverse varieties. These approaches could lead to the development of a novel resource for mass consumption, which may help attain nutritional food security and alleviate malnourishment.

Fig. 16.2 Strategies for promotion and utilization of rice bean

16.6.2 Molecular Approaches

Massively parallel sequencing is a platform for exploring crop genomes. As a high-throughput DNA sequencing approach, it is regarded as the second generation of genome sequencing technology, also known as next generation sequencing (NGS). Recent advances in genomics have resulted in high-confidence, accurately aligned data. Crop genetic improvement depends heavily on genome sequencing of the target crop. Accordingly, the use of the Illumina and PacBio platforms has opened up the latent potential of the rice bean genome. Kang et al. (2014) released the draft genome of V. radiata (approximate genome size 459 Mb), and in the following year the draft genome of V. angularis (approximate genome size 455 Mb) (Kang et al. 2015). Similarly, Muñoz-Amatriaín et al. (2017) drafted the genome of V. unguiculata (approximate genome size 607 Mb) from two genome assemblies. Whole genome sequencing aids the identification of molecular markers, and Chen et al. (2016) identified 3011 potential SSR markers in rice bean from NGS-derived transcriptome data. However, when drafting a genome, a comparative genomic analysis of the target genome is crucial for gaining confidence in the assembly and obtaining a partial functional annotation. Kaul et al. (2019a) mapped the whole genome of V. umbellata on the basis of corresponding functional features, or anchoring elements, of the genetic and functional orthologous sequences of V. unguiculata, V. radiata, and V. angularis. The orthologous sequences can be used as a reference to design the draft genome blueprint, and for gene identification and translational annotation in the target rice bean crop. Additionally, putative protein-coding genes can be identified from the assembled sequence data using the MAKER v2.31.9 pipeline (Campbell et al. 2014), which produces gene annotations with evidence-based quality values. These annotations should ultimately lead to the discovery of target genes for trait improvement and enhance the commercial value of the neglected target crop (Kaul et al. 2019a). For trait improvement, genetic engineering approaches are essential. Amongst the numerous genome editing approaches, CRISPR-based tools are the most efficacious, precise and easy to design, and have attracted considerable attention in current research. This approach makes targeted genome alteration more convenient and efficient (Kaul et al. 2019b, 2020; Zhang et al. 2018) and has been widely employed for trait discovery and the generation of high-yielding crops (Wang et al. 2019). To date, genome editing with CRISPR-based tools has given researchers an unprecedented way to improve commercial crops (Tabassum et al. 2021; Ku and Ha 2020; Schenke and Cai 2020; Zafar et al. 2020). Similarly, the genes linked to important traits of rice bean, for instance low palatability, late flowering, anti-nutrients and sensitivity to photoperiod, can be modified using CRISPR/Cas-based approaches.
This may be an efficient strategy to improve rice bean as well as remove some of the pitfalls linked to rice bean domestication.
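
Two of the computational steps mentioned in this section, SSR marker discovery and the search for CRISPR/Cas9 target sites, can be illustrated in a few lines. The following is a minimal sketch under stated assumptions, not the method used by Chen et al. (2016) or Kaul et al. (2019a): SSRs are taken to be perfect tandem repeats of a 2 to 6 nt motif, and Cas9 sites are taken to be 20 nt protospacers immediately followed by an NGG PAM (the SpCas9 convention) on the forward strand only. The function names and the toy sequences are hypothetical.

```python
import re


def find_ssrs(seq, min_repeats=4, max_motif=6):
    """Return (start, motif, repeats) for perfect tandem repeats of 2..max_motif nt."""
    seq = seq.upper()
    # Lazy motif match so the shortest repeating unit is reported (AT, not ATAT).
    pattern = re.compile(r"([ACGT]{2,%d}?)\1{%d,}" % (max_motif, min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq):
        motif = m.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAA
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits


def find_cas9_sites(seq, guide_len=20):
    """Return (start, protospacer, pam) for forward-strand sites followed by NGG."""
    seq = seq.upper()
    # Zero-width lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


# Toy sequences made up for illustration; they are not rice bean data.
ssr_demo = "TTGC" + "AG" * 5 + "CTTA"
cas9_demo = "ACGT" * 5 + "TGG"

print(find_ssrs(ssr_demo))
print(find_cas9_sites(cas9_demo))
```

A practical workflow would also scan the reverse strand, allow imperfect repeats, design flanking primers for each SSR, and score candidate guides for off-target matches across the whole genome; those steps are deliberately left out of this sketch.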

16.7 Conclusion

Nutritional diversification, together with a better understanding of the relationships between crop yields, intake, and human nutrition, is a cost-effective and long-term strategy for combating malnutrition and strengthening the food system. Rice bean is a high-protein, high-vitamin, and high-mineral food that grows well in depleted soils and in extreme climates. Its diverse applications as human food, fodder, and manure make it a beneficial and economically viable asset to farms in poor regions. Unfortunately, there are considerable shortcomings in existing rice bean research and marketing, and more effort is therefore needed. Farmers should be motivated to engage in research initiatives on local underutilized plant species such as rice bean, and policies should be devised to allow this. To promote its cultivation, scientists should develop varieties that farmers and consumers desire, and plan dissemination activities that highlight its usefulness to people in developing nations who suffer from malnutrition.

Acknowledgements The authors thank the ICAR scheme-National Agricultural Science Fund (NASF/GTR7025/2018-19) for financially supporting this research. We also thank the Department of Biotechnology (DBT) Funds (BT/PR38339/GET/119/334/2020) for additional support towards the completion of this work.

References

Andersen P (2007) Food security through rice bean research in India and Nepal (FOSRIN). Report 3. In: Hollington PA (ed) Nutritional qualities of rice bean. Department of Geography, University of Bergen; CAZS Natural Resources, College of Natural Sciences, Bangor University, Bergen, Norway; Bangor, Wales, UK. Available from http://www.ricebean.org/publication.htm
Andersen P (2012) Challenges for under-utilized crops illustrated by rice bean (Vigna umbellata) in India and Nepal. Int J Agric Sustain 10(2):164–174. https://doi.org/10.1080/14735903.2012.674401
Arora RK, Chandel PS, Joshi BS, Pant KC (1980) Rice bean: tribal pulse of Eastern India. Eco Bot 34:260–263
Bajaj M (2014) Nutrients and antinutrients in rice bean (Vigna umbellata) varieties as effected by soaking and pressure cooking. Asian J Dairy Food Res 33:71–74
Bajracharya J, Singh S, Dangol B, Hollington PA, Witcombe JR (2008) Food security through rice bean research in India and Nepal (FOSRIN). Report 2: Identification of polymorphic markers. Agriculture botany division, Nepal agriculture research council; CAZS Natural Resources, College of Natural Sciences, Bangor
Basavaprabhu NM, Niranjanmurthy NM, Asif M, Venkatesha KT, Vijay Kumar KV (2013) Genetic divergence analysis in rice bean, Vigna umbellata (L.). Int J Plant Sci 8(1):165–167
Basu P, Scholten B (2012) Technological and social dimensions of the green revolution: connecting pasts and futures. Int J Agri Sustain 10(2):109–116
Bepary RH, Wadikar DD, Neog SB, Patki PE (2017) Studies on physico-chemical and cooking characteristics of rice bean varieties grown in NE region of India. J Food Sci Technol 54(4):973–986. https://doi.org/10.1007/s13197-016-2400-z
Bhardwaj N, Kaur J, Sharma AP (2021) The beans and the peas from orphan to mainstream crops. 55–66. https://doi.org/10.1016/B978-0-12-821450-3.00012-3
Bhat R, Karim AA (2009) Exploring the nutritional potential of wild and underutilized legumes. Compr Rev Food Sci Food Saf 8:305–331. https://doi.org/10.1111/j.1541-4337.2009.00084.x
Bisht IS, Singh M (2013) Asian Vigna. In: Singh M, Upadhayay HD, Bisht IS (eds) Genetic and genomic resources of grain legume improvement. Elsevier, pp 237–267
Bisht IS, Bhat KV, Lakhanpaul S, Latha M, Jayan PK, Biswas BK, Singh AK (2005) Diversity and genetic resources of wild Vigna species in India. Genet Resour Crop Evol 52:53–68
Buergelt D, von Oppen M, Yadavendra JP (2009) Quality parameters in relation to consumer’s preferences in rice bean. Presentation at the international conference on India, Grain legumes: quality improvement, Value Addition and Trade held at IIPR, Kanpur, 14–16 Feb 2009
Campbell M, Holt C, Moore B, Yandell M (2014) Genome annotation and curation using MAKER and MAKER-P. Curr Proto Bioinfor 4.11.1–4.11.39
Chen H, Chen X, Tian J, Yang Y, Liu Z, Hao X, et al (2016) Development of gene-based SSR markers in rice bean (Vigna umbellata L.) based on transcriptome data. PLoS ONE 11(3):e0151040. https://doi.org/10.1371/journal.pone.0151040
Dahipahle AV, Kumar S, Sharma N, Singh H, Kashyap S, Meena H (2017) Rice Bean—a multipurpose, underutilized, potential nutritive fodder legume—a review. J Pure Appl Microbio 11(1):433–439
Dhanasekar P, Reddy KS (2014) A novel mutation in TFL1 homolog affecting determinacy in cowpea (Vigna unguiculata). Mol Genet Genomics 290:55–65. https://doi.org/10.1007/s00438-014-0899-0
Dhillon PK, Tanwar B (2018) Rice bean: a healthy and cost-effective alternative for crop and food diversity. Food security: the science, sociology, and economics of food production and access to food, Springer; The International Society for Plant Pathology, 10(3):525–535
Doi K, Kaga A, Tomooka N, Vaughan DA (2002) Molecular phylogeny of genus Vigna subgenus Ceratotropis based on rDNA ITS and atpB-rbcL intergenic spacer for cpDNA sequences. Genetica 114:129–145
Fan W, Lou HQ, Gong YL, Liu MY, Wang ZQ, Yang JL, Zheng SJ (2014) Identification of early Al-responsive genes in rice bean (Vigna umbellata) roots provides new clues to molecular mechanisms of Al toxicity and tolerance. Plant Cell Env 37:1586–1597
González AM, Yuste-Lisbona FJ, Saburido S, Bretones S, De Ron AM, Lozano R, Santalla M (2016) Major contribution of flowering time and vegetative growth to plant production in common bean as deduced from a comparative genetic mapping. Front Plant Sci 7:1940. https://doi.org/10.3389/fpls.2016.01940
Iangrai B, Pattanayak ADEA, Khongwi G, Pale EM, Gatphoh A, Das NK, Chrungoo (2017) Development and characterization of a new set of genomic microsatellite markers in rice bean (Vigna umbellata (Thunb.) Ohwi and Ohashi) and their utilization in genetic diversity analysis of collections from North East India. PLoS One 12(7):0179801
Isemura T, Kaga A, Tomooka N, Shimizu T, Vaughan DA (2010) The genetics of domestication of rice bean, Vigna umbellata. Ann Bot 106:927–944. https://doi.org/10.1093/aob/mcq188
Jayasundara S (2015) Harvest index of three grain legume crops grown in Canada, calculated by two approaches. https://www.researchgate.net/post/How_is_harvest_indexestimated_especially_in_indeterminate_pulses
Joshi KD, Bhandari B, Gautam R, Bajracharya J, Hollington P (2007) Rice bean: a multipurpose underutilized legume. Paper presented at the 5th international symposium on new crops and uses: their role in a rapidly changing world, The University of Southampton, Southampton, 3–4
Kang Y, Kim S, Kim M, Lestari P, Kim K, Ha B, Jun T, Hwang W, Lee T, Lee J, Shim S, Yoon M, Jang Y, Han K, Taeprayoon P, Yoon N, Somta P, Tanya P, Kim K, Gwag J, Moon J, Lee Y, Park B, Bombarely A, Doyle J, Jackson S, Schafleitner R, Srinives P, Varshney R, Lee S (2014) Genome sequence of mungbean and insights into evolution within Vigna species. Nat Commun 5(1):5443
Kang Y, Satyawan D, Shim S, Lee T, Lee J, Hwang W, Kim S, Lestari P, Laosatit K, Kim K, Ha T, Chitikineni A, Kim M, Ko J, Gwag J, Moon J, Lee Y, Park B, Varshney R, Lee S (2015) Draft genome sequence of adzuki bean, Vigna angularis. Sci Rep 5(1):8069
Kashiwaba K, Tomooka N, Kaga A, Han OK, Vaughan DA (2003) Characterization of resistance to three bruchid species (Callosobruchus spp., Coleoptera, Bruchidae) in cultivated rice bean (Vigna umbellata). J Econ Entomol 96:207–213
Katoch R (2013) Nutritional potential of rice bean (Vigna umbellata): an underutilized legume. J Food Sci 78:C8–C16
Katoch R (2020) Ricebean: exploiting the nutritional potential of an underutilized legume. Springer. https://doi.org/10.1007/978-981-15-5293-9
Katoch R, Singh SK, Thakur N, Dutt S, Yadav SK, Shukle R (2014) Cloning, characterization, expression analysis and inhibition studies of a novel gene encoding Bowman-type protease inhibitor from rice bean. Gene 546:342–351
Kaul T, Eswaran M, Thangaraj A, Meyyazhagan A, Nehra M, Raman N, Bharti J, Gayacharan BC, Balamurali B (2019a) Rice Bean (Vigna umbellata) draft genome sequence: unravelling the late flowering and unpalatability related genomic resources for efficient domestication of this underutilized crop. bioRxiv. https://doi.org/10.1101/816595
Kaul T, Raman NM, Eswaran M, Thangaraj A, Verma R, Sony SK, Sathelly KM, Kaul R, Yadava P, Agrawal PK (2019b) Data mining by pluralistic approach on CRISPR gene editing in plants. Front Plant Sci 10:801. https://doi.org/10.3389/fpls.2019.00801
Kaul T, Sony SK, Verma R, Motelb KFA, Prakash AT, Eswaran M, Bharti J, Nehra M, Kaul R (2020) Revisiting CRISPR/Cas-mediated crop improvement: special focus on nutrition. J Biosci 45:137
Khadka K, Acharya BD (2009) Cultivation practices of rice bean. Local Initiatives for Biodiversity, Research and Development (LI-BIRD). 1st edn. Pokhara, Nepal
Khanal AR, Khadka K, Poudel I, Joshi KD, Hollington P (2009) Report on farmers’ local knowledge associated with the production, utilization and diversity of ricebean (Vigna umbellata) in Nepal. In: The rice bean network: farmers indigenous knowledge of ricebean in Nepal (report N°4), EC. 6th FP, Project no. 032055, FOSRIN (Food Security through Ricebean Research in India and Nepal)
Ku HK, Ha SH (2020) Improving nutritional and functional quality by genome editing of crops: status and perspectives. Front Plant Sci 11:577313. https://doi.org/10.3389/fpls.2020.577313
Liu MY, Lou HQ, Chen WW (2018) Two citrate transporters coordinately regulate citrate secretion from rice bean root tip under aluminum stress. Plant Cell Environ 41:809–822. https://doi.org/10.1111/pce.13150
Muñoz-Amatriaín M, Mirebrahim H, Xu P, Wanamaker S, Luo M, Alhakami H, Alpert M, Atokple I, Batieno B, Boukar O, Bozdag S, Cisse N, Drabo I, Ehlers J, Farmer A, Fatokun (2017) Genome resources for climate resilient cowpea, an essential crop for food security. The Plant J 89(5):1042–1054
Muthusamy S, Kanagarajan S, Ponnusamy S (2008) Efficiency of RAPD and ISSR markers system in accessing genetic variation of rice bean (Vigna umbellata) landraces. Electronic J Biotechnol 11(3):32–41
Noda A (1951) Studies on native (primitive) varieties of crop plant: observations on a climbing variety of adzuki bean native to San’in district, Japan (1). Japan J Crop Sci 21:134–135
Padulosi S, Hoeschle-Zeledon I (2004) Underutilized plant species: what are they? LEISA Magazine, March issue, 5–6
Pandiyan M, Ramamoorthi N, Ganesh SK, Jebraj S, Pagarajan P, Balasubramanian P (2008) Broadening the genetic base and introgression of MYMY resistance and yield improvement through unexplored genes from wild relatives in mungbean. Plant Mut Rep 2:33–38
Pandiyan M, Senthil N, Ramamoorthi N, Muthiah AR, Tomooka N, Duncan V, Jayaraj T (2010) Interspecific hybridization of Vigna radiata x 13 wild Vigna species for developing MYMV donor. Electron J Plant Breed 1:600–610
Parker TA, Sassoum L, Paul G (2021) Pod shattering in grain legumes: emerging genetic and environment-related patterns. Plant Cell 33(2):179–199
Parvathi S, Kumar VJF (2006) Value added products from rice bean (Vigna umbellata). J Food Sci Technol 43(2):190–193
Pattanayak A, Ingrai B, Khongwir DEA, Gatpoh EM, Das A, Chrungoo NK (2018) Diversity analysis of rice bean (Vigna umbellata (Thunb.) Ohwi and Ohashi) collections from North Eastern India using morpho-agronomic traits. Sci Hort 242:170–180
Pattanayak A, Roy S, Sood S, Iangrai B, Banerjee A, Gupta S, Joshi D (2019) Rice bean: a lesser known pulse with well-recognized potential. Planta 250(3):873–890
Priyadarshini A, Brijesh K, Tiwari GR (2021) Sustainable food production systems: the potential of pulses. In: Tiwari BK, Gowen A, McKenna (eds) Pulse foods processing, quality and nutraceutical applications. Elsevier, London, pp 487–506
Pugalenthi M, Vadivel V, Gurumoorthi P, Janardhanan K (2004) Comparative nutritional evaluation of little known legumes, Tamarindus indica, Erythrina indica and Sesbania bispinosa. Trop Subtrop Agroeco 4(3):107–123
Rajerison R (2006) Vigna umbellata (Thunb.) Ohwi and Ohashi. In: Brink M, Belay G (eds) Cereals and pulses. PROTA1, Wageningen
Rana J, Sood SG, Arun NK, Lal H (2014) Rice bean variety VRB3 (Him Shakti). Indian J Genet Plant Breeding 74:268
Rejaul HB, Wadikar DD, Patki PE (2016) Rice bean: nutritional vibrant bean of Himalayan belt (North East India). Nutr Food Sci 46(3):412–431. https://doi.org/10.1108/NFS-08-2015-0097
Sadana B, Hira CK, Singla N, Grewal H (2006) Nutritional evaluation of rice bean (Vigna umbellata) strains. J Food Sci Technol 43(5):516–518
Saharan K, Khetarpaul N, Bishnoi S (2002) Variability in physicochemical properties and nutrient composition of newly released rice bean and faba bean cultivars. J Food Comp Analys 15(2):159–167
Sarma BK, Singh M, Gupta HS, Singh G, Srivastava LS (1995) Studies in rice bean germplasm: research bulletin no. 34. ICAR Research Complex for NEH Region, Barapani, 35
Schenke D, Cai D (2020) Applications of CRISPR/Cas to improve crop disease resistance: beyond inactivation of susceptibility factors. iScience 23(9):101478. https://doi.org/10.1016/j.isci.2020.101478
Seehalak W, Tomooka N, Waranyuwat A, Thipyapong P, Laosuwan P, Kaga A (2006) Genetic diversity of the Vigna germplasm from Thailand and neighbouring regions revealed by AFLP analysis. Genet Resour Crop Evol 53:1043–1059
Somta P, Kaga A, Tomooka N (2006) Development of an interspecific Vigna linkage map between Vigna umbellata (Thunb) Ohwi and Ohashi and V nakashimae (Ohwi) Ohwi and Ohashi and its use in analysis of bruchid resistance and comparative genomics. Plant Breed 125:77–84
Sthapit B, Padulosi S, Mal B (2010) Role of on-farm/in situ conservation and underutilized crops in the wake of climate change. Indian J Plant Genet Resour 23(2):145–156
Sudha M, Anusuya P, Ganesh NM, Karthikeyan A, Nagarajan P, Raveendran M, Senthil N, Pandiyan M, Angappan K, Balasubramanian P (2013) Molecular studies on mungbean (Vigna radiata (L.) Wilczek) and rice bean (Vigna umbellata (Thunb.)) interspecific hybridization for Mungbean yellow mosaic virus resistance and development of species-specific SCAR marker for ricebean. Arch Phytopathol Plant Prot 46:503–517
Tabassum J, Ahmad S, Hussain B, Mawia AM, Zeb A, Ju L (2021) Applications and potential of genome-editing systems in rice improvement: current and future perspectives. Agronomy 11:1359. https://doi.org/10.3390/agronomy11071359
Takahashi Y, Iseki K, Kitazawa K, Muto C, Somta P, Irie K, Naito K, Tomooka N (2015) A homoploid hybrid between wild Vigna species found in a limestone karst. Front Plant Sci 6
Thakur S, Bhardwaj N, Chahota RK (2017) Evaluation of genetic diversity in rice bean [Vigna umbellata (Thunb.) Ohwi and Ohashi] germplasm using SSR markers. Electronic J Plant Breed 8:674–679
Tharanathan RN, Mahadevamma S (2003) Grain legumes—a boon to human nutrition. Trends Food Sci Technol 14:507–518
Tian J, Isemura T, Kaga A, Vaighan DA, Tomooka N (2013) Genetic diversity of the rice bean (Vigna umbellata) as assessed by SSR markers. Genome 56:717–727
Tomooka N, Lairungreang C, Nakeeraks P, Egawa Y, Thavarasook C (1991) Mung bean and the genetic resources. The final report of the cooperative research work between Thailand and Japan submitted to the national research council of Thailand, Tropical Research Center, Tsukuba
Tomooka N, Kashiwaba K, Vaughan D, Ishimoto M, Egawa Y (2000) The effectiveness of evaluating wild species, searching for sources of resistance to bruchid beetle in the genus Vigna subgenus Ceratotropis. Euphytica 115:27–41
Tomooka N, Maxted N, Thavarasook C, Jayasuriya AHM (2002) Two new species, new species combinations and sectional designations in Vigna subgenus Ceratotropis (Piper) Verdcourt (Leguminosae, Phaseoleae). Kew Bull 57:613–624
Tripathi A, Iswarya V, Singh N, Rawson A (2021) Chemistry of pulses—micronutrients. Processing, Quality and Nutraceutical Appl 61–68. https://doi.org/10.1016/B978-0-12-818184-3.00004
Wang LX, Chen HL, Peng B, Wu JX, Wang SH, Blair MW, Cheng XZ (2015) The transferability and polymorphism of mung bean SSR markers in rice bean germplasm. Mol Breed 35:77. https://doi.org/10.1007/s11032-015-0280-y
Wang M, Wang Z, Mao Y, Lu Y, Yang R, Tao X, Zhu JK (2019) Optimizing base editors for improved efficiency and expanded editing scope in rice. Plant Biotechnol J 17(9):1697–1699. https://doi.org/10.1111/pbi.13124
Wei Y, Yan J, Long F, Lu G (2015) Vigna umbellata (Thunb.) Ohwi et Ohashi or Vigna angularis (Willd.) Ohwi et Ohashi (Chixiaodou, Rice Bean). In: Liu Y, Wang Z, Zhang J (eds) Dietary chinese herbs chemistry, pharmacology and clinical evidence. Springer, New York, pp 551–560
Yang JL, Zhang L, Li YY, You JF, Wu P, Zheng SJ (2006) Citrate transporters play a critical role in aluminium-stimulated citrate efflux in rice bean (Vigna umbellata) roots. Ann Bot 97:579–584
Zafar SA, Zaidi SS, Gaba Y, Singla-Pareek SL, Dhankher OP, Li X, Mansoor SP (2020) A engineering abiotic stress tolerance via CRISPR/Cas-mediated genome editing. J Exp Bot 71(2):470–479. https://doi.org/10.1093/jxb/erz476
Zhang J, Zhang H, Botella JR, Zhu JK (2018) Generation of new glutinous rice by CRISPR/Cas9-targeted mutagenesis of the Waxy gene in elite rice varieties. J Integr Plant Biol 60:369–375. https://doi.org/10.1111/jipb.12620

17 The Winged Bean Genome

Winged Bean—One Species Supermarket

Niki Tsoutsoura, Yuet Tian Chong, Wai Kuan Ho, Hui Hui Chai, Alberto Stefano Tanzi, Luis Salazar-Licea, Festo Massawe, John Brameld, Andrew Salter, and Sean Mayes

Abstract

Climate change, population growth and increasingly homogenised diets are a threat to food security and human nutritional status. There is an urgent need to incorporate highly nutritious crops into the human diet to provide new sources of nutrition, to diversify agriculture and to meet the challenges of climate change. Winged bean (Psophocarpus tetragonolobus (L.) DC.) is an underutilised crop with a relatively high protein content, grown in the humid tropic regions. Despite its many strengths, the crop suffers from a number of production, yield and utilisation-related constraints. In this chapter, we discuss the nutritional value of winged bean and how it can be improved by utilising genomic and transcriptomic data. We discuss the importance of identifying genes and gene functions, generating genetic linkage maps and developing molecular markers that could be used to accelerate plant breeding. Considerable genomics resources have been developed in major legumes such as soybean (Glycine max) and common bean (Phaseolus vulgaris) through transcriptome and genome sequencing. These provide opportunities for comparative genomic studies and translational research to improve minor crops such as winged bean. Winged bean genome sequencing is underway and will be published shortly. This will contribute to breeding improvement efforts. More research is needed to combine genomics, transcriptomics and metabolomics data to further improve winged bean for food and nutritional security.

N. Tsoutsoura (✉), A. S. Tanzi, L. Salazar-Licea, J. Brameld, A. Salter, S. Mayes
School of Biosciences, University of Nottingham, Sutton Bonington Campus, Loughborough, Leicestershire LE12 5RD, UK

Y. T. Chong, W. K. Ho, H. H. Chai, F. Massawe
School of Biosciences, University of Nottingham Malaysia, Jalan Broga, 43500 Semenyih, Selangor Darul Ehsan, Malaysia

S. Mayes
Crops For the Future (UK) CIC, NIAB, 93 Lawrence Weaver Road, Cambridge CB3 0LE, UK

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 M. A. Chapman (ed.), Underutilised Crop Genomes, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-00848-1_17

17.1 Introduction

Food security is under threat not only from the rising population, but also from the adverse effects of climate change. The rise in temperature and the increased frequency of extreme weather conditions directly affect crop yields and local food supplies (Policy Brief Changing Policy Concepts of Food Security 2006). About 60% of the global food consumption is based on major crop cereals: Triticum spp. (wheat), Oryza sativa (rice), Hordeum vulgare (barley) and Zea mays (maize) (FAO 2010). These major crops may not
perform well in the future, given that annual temperatures are expected to rise by approximately 1 °C in the areas where they are grown (Zhao et al. 2017). The predicted effect of climate change on the ten most widely cultivated crops globally, namely barley (Hordeum vulgare), cassava (Manihot esculenta), maize (Zea mays), oil palm (Elaeis guineensis), rapeseed (Brassica napus), rice (Oryza sativa), sorghum (Sorghum bicolor), soybean (Glycine max), sugarcane (Saccharum officinarum) and wheat (Triticum aestivum), would be mostly negative for production and yield in Europe, Southern Africa and Australia (Ray et al. 2019). Based on production changes and using average harvested area information over a 5-year period from 2003 to 2008, Ray et al. (2019) estimated that climate change would most strongly affect the global yield of oil palm, with a decrease of 13.4%, whereas, surprisingly, the global yield of soybean would be expected to increase by 3.5% per year. Overall, the global yields of barley, cassava, rice and wheat could decrease by 7.9%, 0.5%, 0.3% and 0.9%, respectively, whereas the global yields of rapeseed, sorghum and sugarcane could increase by 0.5%, 2.1% and 1.0%, respectively, with global maize yield most probably not changing (Ray et al. 2019). Yields are also expected to be more variable and unpredictable across regions (Ray et al. 2019). The impact of climate change on global food production is real and has already caused negative effects on yields of major crops (Frolov et al. 2014; Porter et al. 2015; Hertel 2016; Ray et al. 2019). The impact is predicted to be more severe in the most food-insecure countries, affecting both the availability and affordability of food for the most vulnerable groups. The world’s reliance on staple crops for food and nutrition has contributed to the loss of genetic diversity of crops in the fields as well as a decline in dietary diversity.
Although the number of edible plant species is close to 30,000, only a small fraction of these, around 200 species, are used for human nutrition (Khoury et al. 2014; Massawe et al. 2016; Voss-Fels et al. 2019). In order to adapt to the challenges of food insecurity and climate change, a broad diversity of crop plants, with their wild relatives and underutilised species, should be assessed (FAO 2010). Underutilised crops, also known as ‘neglected’ or ‘orphan’ crops, are indigenous crops often closely related to the culture and diet of the growers (Mayes et al. 2012). Underutilised crops, compared to the non-native crops, often carry resilience traits to abiotic and biotic stresses, such as tolerance to drought and extreme temperatures as well as pests and diseases (Mayes et al. 2012; Massawe et al. 2016). Two of the underutilised legume crops that have been researched in recent years are bambara groundnut (Vigna subterranea) and winged bean (Psophocarpus tetragonolobus (L.) DC.) (Ebert 2014; Tanzi et al. 2019a, b; Mohanty et al. 2020). The incorporation of underutilised crops into agroecosystems and rotations would not only enhance agrobiodiversity but also increase harvestable yields in response to climate change and global warming, thus contributing to food and nutritional security (Ebert 2014; Tanzi et al. 2019a, b). Winged bean is an underutilised legume with high nutritional value (Ochiai Yanagi 1983). Most parts of the plant are edible and highly nutritious; for instance, the immature pods and seeds are commonly stir-fried, boiled, baked or fermented into local cuisine (National Research Council 1981). Apart from the seeds and the immature pods, the tuberous roots, leaves and flowers are also edible (Amoo et al. 2006; Cheng et al. 2019). Therefore, winged bean could be an important crop, particularly in the humid tropics where the crop is predominantly grown by small-scale farmers (Lepcha et al. 2017). Like other legume crops, winged bean has the ability to fix atmospheric nitrogen, improving soil fertility and potentially contributing to sustainable production of other crops, such as rice, in tropical legume crop rotation (Rahman et al. 2014).
Analysis of nitrogen in nodules, roots, and shoots of winged bean and other legumes showed that accumulation of nitrogen depends on the Rhizobium strains in root nodules (Yoneyama et al. 1986). In summary, apart from its nutritionally rich edible parts (immature pods, seeds, tuberous roots and leaves), winged bean is a good nitrogen fixer and can be utilised in intercropping and
crop rotation systems to improve soil fertility in low-input cultivation systems. However, there are limitations to its large-scale cultivation that need to be addressed. For example, because of their architecture, the plants require staking, which limits mechanised harvesting. Winged bean seeds are rich in protein but contain antinutritional factors that lower their digestibility and palatability. These constraints could be minimised through genetic improvement of available germplasm using conventional and genome-based breeding approaches. Molecular tools and omics technologies could contribute to understanding the genes controlling traits of interest and the mechanisms of resistance in response to biotic and abiotic stress, leading to the development of improved winged bean cultivars. Genetic markers could also be developed using genetic information and utilised in marker-assisted selection (MAS) breeding. The integration of metabolomics, transcriptomics and proteomics studies enables further understanding of the complex interactions between genes, proteins and metabolites underlying a desired phenotype.

17.2 Botanical Description, Origin and Domestication

17.2.1 Taxonomy, Plant Morphology and Reproductive Development

Winged bean (2n = 2x = 18) (Fig. 17.1) is a dicotyledonous species grown mainly for its tuberous roots and unripe pods. It is classified in the Fabaceae family, Papilionoideae subfamily and genus Psophocarpus Neck. ex DC. The name Psophocarpus comes from the Greek, psophos (noise) and karpos (fruit), due to the cracking sound it produces when its mature pod bursts open. It is worth noting that the first publication of winged bean was in 1825 by De Candolle, while the first major public report appeared in the New York Times in 1975 (Brody 1975; Khan 1978; Kantha and Erdman 1984; Maxted 1990).

Fig. 17.1 Winged bean plant and tuberous roots. Winged bean plant growing on a 2 m tall net structure in a shade house at the University of Nottingham Malaysia (top) and winged bean tuberous roots from the accession A13-5 (Photo by Yuet Tian Chong, University of Nottingham Malaysia, 2021)

A taxonomic revision of the genus Psophocarpus recognised nine species: winged bean, a species domesticated in Asia, and eight species of African origin (Verdcourt and Halliday 1978). In 1990, a herbarium-based study of 126 specimens used 97 characteristics and revealed a tenth species, endemic to Africa (Maxted 1990). The same study grouped winged bean (P. tetragonolobus) with six African species (P. scandens, P. palustris, P. grandiflorus, P. lancifolius and P. lukafuensis) in a subgenus Psophocarpus, and three
African species (P. obovalis, P. monophyllus and P. lecomtei) in a subgenus Vignopsis. Research interest in winged bean was initiated by Masefield in 1973, followed by two meetings of the ad hoc advisory panel of the U.S. National Academy of Sciences (NAS) in 1974 to promote winged bean as a crop with economic potential (NRC 1975; Levy and Hymowitz 1978). Research on winged bean and its wild relatives has led to a better understanding of the phylogenetic relationships within the genus in recent years. In 2012, a cladistic analysis of morphological traits of herbarium specimens of the nine species and of species from three related genera (Vigna, Otoptera and Dysolobium) suggested that the genus Psophocarpus is monophyletic, with its species being classified into four subclades, with P. lancifolius and P. lukafuensis classified under the subgenus Vignopsis and P. obovalis, P. monophyllus and P. lecomtei grouped under the subgenus Lophostigma (Fatihah et al. 2012). More specifically, the results proposed: subgen. Psophocarpus sect. Psophocarpus (P. palustris, P. tetragonolobus and P. scandens); subgen. Psophocarpus sect. Vignopsis (P. lancifolius and P. lukafuensis); subgen. Lophostigma (P. obovalis, P. monophyllus and P. lecomtei); as well as a new subgen. Longipedunculares (P. grandiflorus), separating P. grandiflorus from the subgenus Psophocarpus (Fatihah et al. 2012). Later, a phylogenetic analysis based on chloroplast genome (cpDNA) regions and the internal transcribed spacer (ITS) partially supported these findings, although it separated the four analysed winged bean accessions from all the other species (with P. palustris and P. scandens being relatively closely related; Yang et al. 2018). Interestingly, the authors also reported the successful hybridisation of winged bean with P. scandens. Further investigations into the genus Psophocarpus will likely benefit from this and from the increasing availability of molecular tools (Mohanty et al. 2013; Vatanparast et al.
2016; Abdullah et al. 2017; Cheng et al. 2017; Wong et al. 2017). Winged bean can be grown as an annual or a perennial crop (Erskine 1979). It is grown unsupported for tuber production, or traditionally
upon vertical structures for pods and seeds, where it can reach up to 4 m (Fig. 17.1) (Mohanty et al. 2014b). Seed germination varies greatly within the same accession and between accessions, and scarification of the seed coat is recommended to achieve a more uniform seedling emergence (National Research Council 1981). Wide differences in the shape and colour of leaves, stems and flowers have been observed. Leaves are typically trifoliate, with the leaflet shape being ovate, deltoid or lanceolate. Depending on the variety, the stem colour can vary between green, purple and greenish purple, with green being the most common (Khan 1976). As in other legumes, winged bean flowers are papilionaceous, with a cleistogamous floral system. While winged bean is predominantly considered self-pollinating, flowers are often visited by bees and other insects, which could lead to a certain degree of cross-pollination (Karikari 1979; Erskine 1980). The flower colour varies from near white to blue and deep purple (Fig. 17.2) (National Research Council 1981; Eagleton 2019). Depending on the variety, planting conditions, location and photoperiod, plants usually start flowering from five weeks after planting onwards (Herath and Ormrod 1979; Raai et al. 2020). Pods take two to four months after sowing to set (Fig. 17.3) (Erskine and Khan 1980; Eagleton 2019). Pod shape is rectangular, semi-flat, flat on the sides or flat on the suture (‘winged’) (International Board for Plant Genetic Resources (IBPGR) 1982). Variation in the colour of pods and wings has been observed, with the central portion of the pod being cream, green, pink or purple and the pod wings being green or purple (Erskine and Khan 1977). In addition, seed colour can be cream, different shades of brown, deep purple, black or mottled (Khan 1976; International Board for Plant Genetic Resources (IBPGR) 1982; Eagleton 2019). Pods can reach 30–40 cm in length and contain between five and 21 seeds (Poole 1978).
After successful fertilisation, the developing seeds reach maturity (Fig. 17.2) around sixty-five to eighty-five days after flowering (Kadam et al. 1982; Higuchi et al. 1988).

Fig. 17.2 Winged bean flowers and seeds. Purple winged bean flower from the accession W103 (top left) and white winged bean flower from the accession T53 (top right). Dark brown winged bean seeds (bottom left) and light brown winged bean seeds (bottom right) (Photo of flowers by Niki Tsoutsoura, University of Nottingham, 2021; Photo of seeds by Yuet Tian Chong, University of Nottingham Malaysia, 2021)

Fig. 17.3 Winged bean pods. Purple flower and green pod of winged bean (left), photographed two months after planting (Photo by Yuet Tian Chong, University of Nottingham Malaysia, 2021). Purple winged bean pod from the accession W120 (middle) and green winged bean pod from the accession W040 (right) (Photo of flowers by Niki Tsoutsoura, University of Nottingham, 2021)

17.2.2 Origin, Distribution and Germplasm Collection

Winged bean is cultivated throughout Asia, mainly in India and Southeast Asia, as well as in the highlands of Papua New Guinea and in Africa (Verdcourt and Halliday 1978; Yang et al. 2018). Its centre of origin remains enigmatic, with two hypotheses prevailing. The first hypothesis supports Africa as the centre of origin, based on the morphological similarities between P. tetragonolobus and African species such as P. grandiflorus (Smartt 1980). Possibly, the winged bean progenitor originated on the African side of the Indian Ocean and was then transferred to the east as a wild plant, where it was domesticated (Fig. 17.4) (Lepcha et al. 2017). The second hypothesis suggests that winged bean arose in Asia from an unknown progenitor that has now become extinct (Verdcourt and Halliday 1978). The phylogenetic analysis of Yang et al. (2018) showed a large genetic distance between winged bean and its closest African relatives, the P. scandens and P. palustris group, supporting the conclusion of Verdcourt and Halliday (1978) that the wild progenitor of winged bean is now extinct, or at least unsampled. The identification of wild ancestors would not only shed light on the genetic changes related to domestication, but also provide resources for the improvement of winged bean (Prohens et al. 2017). The identification of novel genes from wild relatives and their introgression into winged bean varieties via breeding or gene editing methods could ultimately lead to the development of more resilient and efficient winged bean lines with improved yield in an uncertain and rapidly changing environment.

Most winged bean germplasm collections are kept in genebanks maintained by either national or international organisations. Yang et al. (2018) named the International Institute of Tropical Agriculture (IITA), the U.S. Department of Agriculture (USDA) and the National Agriculture and Food Research Organization (NARO, Japan) genebanks as the sources of the materials used in their study on the origin and diversification of winged bean. Approximately 51 accessions are kept by the U.S. National Plant Germplasm System (USDA), 12 by NARO and 271 by IITA in Africa.

Fig. 17.4 Global distribution of winged bean. The countries where winged bean is cultivated as a crop, mainly in South and Southeast Asia and the Pacific region, are shown in light blue. The regions where winged bean research and experiments have been carried out are shown with yellow dots, and the regions and countries where winged bean specimens have been preserved are shown with orange triangles (Global Biodiversity Information Facility; Plant For A Future; Abe and Nakamura 1987; Amoo et al. 2006; Yang and Tan 2011; Reddy and Reddy 2015; Tanzi et al. 2019a, b). (Created by Yuet Tian Chong, University of Nottingham Malaysia, 2021)

17.2.3 Genetic Diversity

Extensive genetic diversity in germplasm is important because it provides a source of variation for improving crops for higher yield and resilience (Prohens et al. 2017). Mohanty et al. (2013) analysed the genetic relationships among 24 winged bean genotypes using Random Amplified Polymorphic DNA (RAPD) and Inter Simple Sequence Repeat (ISSR) molecular markers. The varieties fell into two distinct clusters and seven sub-clusters. Overall, they detected high levels of polymorphism and large genetic distances across the varieties, which suggested a wide genetic base for the winged bean germplasm. Chen et al. (2015) used ISSR markers to evaluate the genetic distances among 45 accessions of winged bean cultivated in eight countries. However, the ISSR analysis showed little genetic variation and did not detect a significant correlation between genetic distance and the origin of the accessions. Mohanty et al. (2019) investigated the genetic diversity of 95 winged bean accessions from six countries using amplified fragment length polymorphism (AFLP) markers and the ITS of nuclear ribosomal DNA. The population structure analysis revealed five sub-populations based on estimated allele frequencies. Among the accessions, genetic diversity was at medium to low levels, with the maximum similarity detected among accessions from India, indicating the existence of a possible ancestral origin. The cluster analysis grouped the accessions into four groups, with accessions from different origins in the same sub-cluster. These results agreed with Chen et al. (2015), Yang et al. (2018) and the later results of Ojuederie et al. (2020). When assessing genetic diversity, especially in material held in genebanks across the world, relying too heavily on the declared geographic origin, or on the assumption that genetic relationships can be inferred from the phenotype, should be avoided (Tanzi et al. 2019a, b). Following on from this, Tanzi and colleagues suggested that the utilisation of high-throughput technologies, such as genotyping-by-sequencing, could provide more information on the geographic origin and genetic diversity among the germplasm held in seedbanks. This would be crucial to revamp efforts towards preserving winged bean genetic diversity and to provide access to truly diverse material for breeding programmes.
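The marker-based comparisons described in this section rest on pairwise similarity between accessions. As a minimal sketch, assuming dominant markers such as RAPD or ISSR are scored as band presence (1) or absence (0), the Jaccard coefficient is one common way to turn band profiles into genetic distances. The accession names and band scores below are hypothetical, and the cited studies may have used other coefficients or dedicated software.

```python
from itertools import combinations


def jaccard_similarity(a, b):
    """Jaccard similarity between two binary band profiles (1 = band present)."""
    if len(a) != len(b):
        raise ValueError("profiles must score the same set of bands")
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    # Two profiles with no bands at all carry no information; report 0.0.
    return shared / either if either else 0.0


def pairwise_distances(profiles):
    """Return {(name_1, name_2): 1 - Jaccard similarity} for every pair."""
    return {
        (p, q): 1.0 - jaccard_similarity(profiles[p], profiles[q])
        for p, q in combinations(sorted(profiles), 2)
    }


# Hypothetical band scores for three accessions across six marker bands.
profiles = {
    "acc_A": [1, 1, 0, 1, 0, 1],
    "acc_B": [1, 1, 0, 1, 1, 1],
    "acc_C": [0, 1, 1, 0, 1, 0],
}

distances = pairwise_distances(profiles)
for pair, distance in sorted(distances.items()):
    print(pair, round(distance, 3))
```

A distance matrix of this kind is the usual input to the cluster analyses mentioned above. The choice of coefficient matters for dominant markers, because shared band absences are ignored by Jaccard but counted by simple matching.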

17.3 Food and Nutritional Value

The seeds of winged bean are rich in protein, vitamins and minerals, as well as secondary metabolites such as phenolics and flavonoids, which act as antioxidants (Gross 1983; Kantha and Erdman 1984; Lepcha et al. 2017; Adegboyega et al. 2019; Bassal et al. 2020). However, winged bean also has a considerable content of antinutritional factors (ANFs) such as proteinase inhibitors, tannins and phytic acid. These ANFs can inhibit the absorption of various nutrients, either by preventing their release during digestion or by binding to proteins or other nutrients, so that they pass through the gastrointestinal tract unabsorbed (Adegboyega et al. 2019; Singh et al. 2019).

17.3.1 Protein

The crude protein content of mature winged bean seeds ranges from 30 to 40% (Černý et al. 1971; Claydon 1975; Kadam 1984; Adegboyega et al. 2019). Adegboyega et al. (2019) compared 25 winged bean accessions and found a crude protein content of 34–40% in processed seeds and 28.5–31% in unprocessed seeds. The processed seeds were cleaned, slightly roasted under low heat, coarse-milled and winnowed to remove the seed coat, then the decorticated grains were milled into fine powder and sieved, whereas the unprocessed seeds were cleaned and milled into fine powder directly. Significant differences were observed for the moisture, fat, crude fibre and carbohydrate contents, as well as crude protein, in both the processed and unprocessed seeds. The observed protein levels in unimproved material offer opportunities for selection to target high protein content in genetic improvement programmes. Overall, winged bean has a higher protein content than other legumes grown in the tropics, such as lentil (Lens culinaris) (24.6%), cowpea (Vigna unguiculata) (23.8%), chickpea (Cicer arietinum) (20.5%) and pigeon pea (Cajanus cajan) (21.7%) (US Department of Agriculture 2019). For comparison, soybean meal that includes parts of the hulls contains less than 47% protein, whereas high-protein soybean meal obtained from dehulled seeds contains 47–49% protein (Heuzé et al. 2020).

17.3.2 Amino Acid Composition

Winged bean seeds resemble soybean in amino acid composition. Mnembuka and Eggum (1995) compared the nutritive value of winged bean
with soybean and other legumes grown in Tanzania, including green gram (Vigna radiata), bambara groundnut, pigeon pea, field pea (Pisum sativum) and cowpea. The results showed methionine to be the most limiting amino acid in winged bean seeds, followed by tryptophan, histidine and cysteine, while winged bean seeds were rich in lysine and threonine. These results agree with other studies (Okezie and Martin 1980; Ekpenyong and Borchers 1982; King and Puwastien 1987; Wan Mohtar et al. 2014), suggesting that winged bean is a good alternative to soybean, based on the amino acid profile. The study of Prakash et al. (1987) showed variation in protein and amino acid composition between 16 strains of winged bean obtained from the Indian Institute of Horticulture Research, Bangalore, one NBRI selection and four strains from Sri Lanka which were cultivated under uniform conditions (Table 17.1).
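The comparison with soybean can be made concrete with a short sketch that uses the values reported in Table 17.1 (Prakash et al. 1987). It flags the amino acids for which the single soybean value falls outside the range observed across the winged bean strains. The names `TABLE_17_1` and `classify` are illustrative only; this is a simple range check on the published figures, not a statistical test, and the protein (%) row is left out because it is in different units.

```python
# Amino acid composition in g/100 g protein, transcribed from Table 17.1:
# (winged bean minimum, winged bean maximum, soybean value).
TABLE_17_1 = {
    "Glutamic acid": (8.2, 12.0, 9.9),
    "Aspartic acid": (7.5, 11.0, 10.0),
    "Leucine": (6.2, 7.7, 7.2),
    "Lysine": (4.2, 6.5, 5.4),
    "Arginine": (4.0, 6.2, 5.3),
    "Proline": (6.1, 8.4, 6.8),
    "Serine": (6.1, 8.4, 6.6),
    "Valine": (4.7, 6.4, 4.9),
    "Tyrosine": (3.3, 4.5, 2.2),
    "Isoleucine": (4.7, 6.7, 5.2),
    "Phenylalanine": (3.8, 5.3, 4.9),
    "Alanine": (6.2, 8.3, 6.6),
    "Threonine": (4.6, 6.9, 5.4),
    "Glycine": (6.5, 7.9, 7.3),
    "Histidine": (2.8, 4.1, 3.8),
    "1/2 Cysteine": (0.6, 1.7, 1.5),
    "Methionine": (0.1, 1.0, 1.2),
}


def classify(low, high, soybean):
    """Place the soybean value relative to the winged bean range."""
    if soybean < low:
        return "below"
    if soybean > high:
        return "above"
    return "within"


# Keep only the amino acids where soybean lies outside the winged bean range.
outside = {
    name: classify(*row)
    for name, row in TABLE_17_1.items()
    if classify(*row) != "within"
}
print(outside)  # → {'Tyrosine': 'below', 'Methionine': 'above'}
```

On these figures, soybean exceeds every winged bean strain only for methionine, which is consistent with methionine being the most limiting amino acid, while for tyrosine all winged bean strains exceed the soybean value.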

Table 17.1 Protein and amino acid composition (g/100 g protein) of winged bean and soybean seeds

Amino acids      Winged bean    Soybean
Glutamic acid    8.2–12.0       9.9
Aspartic acid    7.5–11.0       10
Leucine          6.2–7.7        7.2
Lysine           4.2–6.5        5.4
Arginine         4.0–6.2        5.3
Proline          6.1–8.4        6.8
Serine           6.1–8.4        6.6
Valine           4.7–6.4        4.9
Tyrosine         3.3–4.5        2.2
Isoleucine       4.7–6.7        5.2
Phenylalanine    3.8–5.3        4.9
Alanine          6.2–8.3        6.6
Threonine        4.6–6.9        5.4
Glycine          6.5–7.9        7.3
Histidine        2.8–4.1        3.8
½ Cysteine       0.6–1.7        1.5
Methionine       0.1–1.0        1.2
Protein (%)      38.1–45.0      43.7

Data from Prakash et al. (1987)

17.3.3 Minerals and Vitamins

The rich mineral and vitamin concentration of winged bean seeds, pods and leaves adds to its nutritional value. Winged bean seed flour could be an important source of minerals, with levels of phosphorus, calcium and magnesium similar to those of soybean (Mnembuka and Eggum 1995; Amoo et al. 2006). Okezie and Martin (1980) compared 20 winged bean varieties from Puerto Rico and the United States of America by nutrient analysis of seeds, seed hulls and fresh leaves, detecting significant differences in nutrient contents between the genotypes and between the different parts of the plant. The results showed high calcium (0.28–0.86%), phosphorus (0.36–0.72%) and iron (58–308 ppm) levels in the dry dehulled and whole seeds, with the seed hull containing the highest levels of iron. Interestingly, the calcium content in leaves (3.21–4.41%) was considerably higher than in the seeds, and higher levels of iron (126–298 ppm), potassium (0.62–1.66%) and magnesium (0.28–0.36%) were detected in leaves as well. Leaves of winged bean are used in soups and salads, making them an important source of calcium, phosphorus and iron in tropical countries. Leaves also contain high levels of carotenoids (5240–20,800 IU/100 g fresh weight), vitamin C (14.5–128 µg/100 g fresh weight) and folic acid (67 µg/100 g dry weight), while the seeds contain adequate levels of folic acid and tocopherols (de Lumen et al. 1982a, b; Kantha and Erdman 1984). Tuberous roots are also edible and rich in protein, with a protein content ranging from 12 to 19%, although sulphur-containing amino acids are limited. However, antinutritional factors such as trypsin and chymotrypsin inhibitors as well as haemagglutinins are present in the tubers (Kantha and Erdman 1984; Kortt and Caldwell 1984; Adegboyega et al. 2019).

17.3.4 Lipids

Lipids are major components of winged bean seeds and the content is dependent upon the genotype, environment, location and soil type where the cultivar is grown (Worthington et al. 1972; Garcia and Palmer 1980; King and Puwastien 1987; Lepcha et al. 2017). The oil content of mature winged bean seeds varied from 15 to 20.4%, with the saturated and unsaturated fatty acids ranging from 30 to 40% and 60 to 70%, respectively (Khor et al. 1982; Kantha and Erdman 1984). Mohanty et al. (2014a) determined the fatty acid composition in immature, mature and fully mature seeds of winged bean. They found that the immature seeds had the highest percentage of saturated fatty acids (61.3%), whereas the fully mature seeds had the highest percentage of unsaturated fatty acids (75.5%), with nearly equal proportions of mono-unsaturated fatty acids (38.6%) and poly-unsaturated fatty acids (36.9%) (Mohanty et al. 2014a).

17.3.5 Antinutritional Factors (ANFs)

Legumes are an important part of human diets due to their high nutritional value. They are rich in protein, fibre and vitamins, but they also
contain ANFs, such as phytic acid, tannins and proteinase inhibitors. These ANFs are characterised as non-nutritional or toxic compounds that can inhibit the absorption of nutrients and have deleterious effects, particularly when the pulses are consumed raw or under-cooked. In humans and animals, antinutritional factors such as lectins can cause diarrhoea, vomiting, inflammation and blood agglutination (Peumans and Van Damme 1995). In plants, these compounds act as defence mechanisms against fungi, insects and herbivores, as well as serving as an energy store that allows the plants to continue their growth under extreme environmental conditions (Bessada et al. 2019). The ANFs can be classified into two groups based on their structure: the first group contains proteins such as lectins, agglutinins, protease inhibitors and bioactive compounds, while the second group contains non-protein compounds such as phytic acid, tannins and saponins (Sánchez-Chino et al. 2015). Processing methods, such as germination, dehulling, moist heat and soaking in water or alkali solutions, are commonly used to reduce the levels or activities of ANFs in legumes and thereby eliminate their negative effects on digestion and absorption of nutrients, without compromising the nutritional value of the pulses (Samtiya et al. 2020). It is worth mentioning that when consumed in small amounts, ANFs can have a positive effect on human health. For example, studies have shown that proteinase inhibitors and isoflavones can act as anticancer agents (Messadi et al. 1986; DeClerck and Imren 1994; Gurfinkel and Rao 2003; Dong and Qin 2011; Sánchez-Chino et al. 2015). Adegboyega et al. (2019) showed significant variation in tannin content, ranging from 1.8 to 2.5% in processed seeds and from 1.3 to 3.4% in unprocessed winged bean seeds. The processing methods used not only increased the protein content of the flour, but also lowered the tannin content, probably due to the removal of the seed coat.
The high protein content of winged bean seeds, combined with the use of appropriate processing methods, suggests that winged bean flour could be added to food formulations in order to increase their protein content.

It is important to understand the genetic mechanisms and biosynthetic pathways related to the production and accumulation of antinutritional factors in winged bean seeds. For example, in soybean, where genetic improvement of seed protein content is desirable, major quantitative trait loci (QTL) for protein content have been detected and mapped on chromosomes 20 (LG-I) and 15 (LG-E) (Patil et al. 2017). Shedding light on the genes and mechanisms regulating the synthesis of seed storage proteins and ANFs in winged bean would be useful in selecting accessions or improved lines of high nutritional value and reduced levels of ANFs. Genotyping with molecular markers is a requirement for QTL mapping and genome-wide association studies (GWAS). The identification and utilisation of genetic markers would assist breeding programmes (Chapman 2015), while gene editing methods could also be a valuable tool in silencing genes related to undesirable traits, thereby contributing to the genetic improvement of winged bean.

17.3.5.1 Proteinase (Trypsin and Chymotrypsin) Inhibitors

Legumes contain serine protease inhibitors that inhibit the digestive enzymes trypsin and chymotrypsin by competitive binding. The protease inhibitors are categorised into two groups: the Kunitz trypsin inhibitors (KTI) of 20–24 kDa and the Bowman-Birk inhibitors (BBI) with a molecular weight of 7–8 kDa (Birk 1985; Wati et al. 2010; Muzquiz et al. 2012). Kortt (1983) examined 27 varieties of winged bean from six regions across Southeast Asia for proteinase inhibitor contents and found that levels of trypsin and chymotrypsin inhibitors varied between genotypes, with Malaysian and Indonesian varieties showing the lowest levels. The study also showed that the average chymotrypsin inhibitory activity ranged between 30 and 48 mg chymotrypsin inhibited per g of defatted seed and was higher than the trypsin inhibitory activity (23–36 mg trypsin inhibited per g of defatted seed). It is also important to note that the stoichiometry of inhibition for the winged bean chymotrypsin inhibitor activity was 1:2 (i.e. 1 molecule inhibits 2 enzymes), whereas it was 1:1 for trypsin inhibitor activity (Kortt 1979, 1980, 1981). Three BBI trypsin inhibitors and nine KTIs (four trypsin, four chymotrypsin and one trypsin/chymotrypsin inhibitor) have been isolated from winged bean (Shibata et al. 1986; Giri et al. 2003). Giri et al. (2003) purified seven winged bean trypsin inhibitors and showed that they had different binding potentials against gut proteinases of Helicoverpa armigera, a major bollworm pest of cotton (Gossypium), legumes and other plant species. H. armigera has evolved insecticide resistance, significantly lowering yields in countries like India, Australia, Indonesia and Thailand, resulting in annual losses of $300–500 million (Srinivas et al. 2004). The 28 kDa winged bean trypsin inhibitor showed at least a threefold higher inhibitory activity in the gut of H. armigera than the bovine version. In addition, the putative Kunitz-type chymotrypsin inhibitor genes, WCI2 and WCI5, isolated from winged bean, were shown to inhibit the gut proteinases of H. armigera larvae. These results suggest that the proteinase inhibitors of winged bean could be a fruitful target of further studies for the development of transgenic lines resistant to H. armigera, a pest that affects many important crops and rapidly develops resistance to pesticides (Giri et al. 2003; Telang et al. 2008).

17.3.5.2 Phytohaemagglutinins or Lectins

Lectins (haemagglutinins or phytohaemagglutinins) are a group of proteins found in various organisms that have the ability to bind carbohydrates. When these proteins bind specifically to known sugars and agglutinate red blood cells, they are referred to as lectins. Lectins have at least one non-catalytic domain, which can bind reversibly to specific monosaccharides or oligosaccharides (Lagarda-Diaz et al. 2017). In plants, lectins are found abundantly in the cotyledons and endosperm of legume seeds, accounting for 2–10% of the total protein (Lis and Sharon 1986). They contribute to physiological regulation, defence against microorganisms, transport of carbohydrates, mitogenic stimulation and recognition of nitrogen-fixing bacteria from the genus Rhizobium (Sharon and Lis 1990, 2004; Chrispeels and Raikhel 1991; Nasi et al. 2009). Apart from their deleterious effects in humans and animals (Peumans and Van Damme 1995), lectins can also have positive effects on human health, including antitumour (Kwan Lam and Bun Ng 2011), antifungal and antiviral activities (Sánchez-Chino et al. 2015; Lagarda-Diaz et al. 2017), so their production may have practical applications. The presence of lectins in winged bean seeds relates to their toxicity (Kortt 1983). When a 30% raw winged bean diet was fed to rats, they showed significant growth depression, morphological changes in the small intestine and 100% mortality within 10–20 days. The lethal effect was eliminated by autoclaving winged bean seeds at 120 °C for 30 min: rats fed with autoclaved winged bean seeds gained body weight comparable to that of rats fed a casein diet (Higuchi et al. 1983).

17.3.5.3 Tannins

Tannins, as well as other phenolic compounds, are secondary metabolites widely produced by plants that play an important role in defence against insects, birds and fungi. Tannins also contribute to the colour, flavour and astringency of fruits, but they appear to reduce the nutritional quality of food (Chiba 2003). Tannins can be classified into four groups depending on the structure of the monomer: proanthocyanidins (or condensed tannins), hydrolysable tannins, complex tannins and phlorotannins (Serrano et al. 2009). Hydrolysable tannins are hydrolysed by enzymes, acids or alkalis, whereas condensed tannins are resistant to hydrolysis. Condensed tannins are the major polyphenols in commonly consumed food (Salunkhe and Chavan 1990; Gilani et al. 2005). Tannins have the ability to bind and precipitate proteins, thereby reducing protein and amino acid digestibility in monogastrics, such as pigs


N. Tsoutsoura et al.

and poultry (Smulikowska et al. 2001). The complexes that tannins form with glycoproteins, together with their astringency, reduce the palatability and lower the nutritional value of pulses (Gilani et al. 2005; Bessada et al. 2019). Heat treatment is not an effective way to reduce tannin content, as tannins are heat resistant (de Lumen and Salamat 1980). Instead, soaking winged bean seeds in different salt solutions has been effective in decreasing tannin levels. As in many pulses, the seed coat of winged bean has the highest tannin levels, but its removal is difficult and is not commonly practised when cooking (de Lumen and Salamat 1980; Tan et al. 1984). The high tannin content and indigestible fibre of the seed coat are thought to be responsible for the lower metabolisable energy and the poor response of broilers fed winged beans (de Lumen et al. 1982a, b). Therefore, winged bean lines with lower tannin content in the seed coat, and improved techniques for seed coat removal during food processing, should be investigated to reduce tannin intake.

17.4 Barriers to the Greater Utilisation

A leguminous species such as winged bean has the potential to contribute in different ways, especially given the nutritional values described in the previous section, although several constraints limit its wider utilisation. Winged bean has the potential to be a cash crop with limited input requirements, but when grown as a horticultural crop it requires a vertical structure (trellising or staking) to increase pod productivity, which raises cultivation costs in the short term and also limits mechanised harvesting (Rachie and Luse 1978; Wong et al. 2015). On the other hand, winged bean could make use of structures set up for other major crops (e.g. tomatoes) in rotation systems. In addition, improving its plant architecture and yield-component traits could allow an increase in planting density, thus further improving the final harvest per unit of land (Tanzi et al. 2019a, b).

As a pulse crop, ideotypes that are early maturing, with a ‘bushy’, dwarf architecture, side branches and short or few internodes, and that produce many pods, would be desirable for increased seed production and large-scale cultivation. Kesavan and Khan (1978) sought to isolate winged bean mutants with determinate and dwarf growth habits by applying gamma rays and ethyl methanesulphonate (EMS) to a number of pure line genotypes. No determinate mutants were isolated from that experiment; however, mutants with a single cotyledon, albino types and darker foliage were reported (Jugran et al. 1986; Quan et al. 2011). The generation or identification of a dwarf variety is still considered an objective in winged bean research, probably because it would facilitate mechanised harvesting and improve the utilisation of winged bean as a pulse crop by adapting it to arable cropping. However, it will be crucial to assess the final impact that such a change may have on pod and seed productivity. Another factor limiting the greater utilisation of winged bean is its hard-to-cook seeds. The seeds contain antinutritional factors and have a thick seed coat, so cooking takes several hours, leading to a significant loss of protein quality through lower protein digestibility and bioavailability, as well as losses of minerals such as potassium and magnesium (Ekpenyong and Borchers 1980; Henry et al. 1985). However, processing methods such as seed coat removal, heating, soaking, boiling and pressure cooking can improve nutritional quality by reducing ANF concentration, thus increasing protein digestibility (Ekpenyong and Borchers 1980; Kadam et al. 1987; Gilani et al. 2012). The quantification of the ANFs and the effect of seed treatments (such as seed coat removal) should also be evaluated with respect to the quality and quantity of seed protein. Protein quality is affected by the digestibility and quantity of the essential amino acids. For example, dehulling would increase protein content and reduce the antinutritional factors contained in the seed coat, such as tannins. This would improve the palatability, taste and digestibility of pulses as well as decrease the cooking time (Bessada et al. 2019). The dehulling of winged

17

The Winged Bean Genome

bean seeds would have a positive impact on protein quality and quantity, but it is difficult in practice.

17.5 Genome

17.5.1 Genome Sequencing

Winged bean has a diploid (2n = 2x = 18) genome of around 1.22 Gbp (Vatanparast et al. 2016). Ho et al. (unpublished data) combined the Illumina (for accuracy) and Oxford Nanopore Technologies (ONT) (for long reads) platforms to generate genomic resources for winged bean. Combined with Bionano Genomics optical mapping, the current genome assembly comprises 48 hybrid scaffolds covering 536,131,541 bp in total, with an N50 of 23,875,316 bp (N90 = 6,932,124 bp). The scaffolds generated by this approach range from 122,770 to 38,637,442 bp. Two genetic maps of winged bean sharing a common paternal parent were used to place 38 hybrid scaffolds onto pseudochromosomes, resulting in a draft genome with nine pseudochromosomes encompassing 530,283,461 bp and 26,354 annotated protein-coding genes. The pseudochromosome lengths range from 24,607,972 to 85,053,349 bp (Ho et al. 2022).
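
The N50 and N90 values quoted for the assembly summarise its contiguity: the N50 is the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least 50% of the assembly, and the N90 is defined in the same way for 90%. The following is a minimal sketch of that calculation. The function name `nx` and the toy scaffold lengths are illustrative assumptions and are not taken from the winged bean assembly.

```python
def nx(lengths, percent):
    """Return the Nx statistic of a set of scaffold lengths.

    percent=50 gives the N50, percent=90 gives the N90.
    Integer arithmetic is used so that the threshold comparison is exact.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range (0, 100]")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold downwards until the requested
    # share of the total assembly length has been covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 100 >= total * percent:
            return length


# Hypothetical scaffold lengths (arbitrary units), total = 300
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print("N50:", nx(toy_lengths, 50))
print("N90:", nx(toy_lengths, 90))
```

Because the statistic depends only on the sorted lengths, a few very long scaffolds can give a high N50 even when many short scaffolds remain, which is why the N90 and the full length range are usually reported alongside it.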

17.5.2 Transcriptome Assembly and Molecular Markers

Transcriptome sequences can support genetic studies of crop origins and genetic diversity, and can be used to identify molecular markers for genetic mapping, making them useful tools for accelerating the genetic improvement of crops through breeding and/or gene editing. Chapman (2015) sequenced, assembled and annotated the transcriptomes of four underutilised crops: hyacinth bean (Lablab purpureus (L.) Sweet), grasspea (Lathyrus sativus L.), winged bean (accession 477137 from Nigeria) and bambara groundnut (Vigna subterranea (L.) Verdc.). The aim was to identify SSR markers and a conserved orthologous set of markers across the legume species that could be used to investigate genetic variation. The Illumina-based transcriptome of Chapman (2015) was used by Vatanparast et al. (2016) to identify contigs corresponding to the Kunitz trypsin inhibitor (KTI) gene family within the Psophocarpus transcriptome. Vatanparast et al. (2016) also produced a de novo transcriptome assembly and annotation of two winged bean accessions (CPP34 and CPP37) from Sri Lanka. In this study, single-end 454 pyrosequencing, which produces long reads (300–800 bp), yielded 369,820 reads (total 136,943,216 bp) for genotype CPP34 and 334,639 reads (total 92,126,948 bp) for genotype CPP37. When the independent reads from the transcripts of CPP34 and CPP37 were compared, fewer than 200 high-confidence SNPs were detected, corresponding to approximately one SNP every 150,000 bp and indicating high similarity between the two accessions. The reads from the independently sequenced accessions were combined to produce a single assembly, CPP34-7. Unassembled reads, denoted singletons after assembly, were included in the final CPP34-7 assembly, as they could be full-length mRNA transcripts, and were used in the Gene Ontology (GO) and SNP analyses. The assembled contigs from CPP34-7 were compared to protein sequence databases from chickpea (Cicer arietinum L.), pigeon pea (Cajanus cajan (L.) Huth), soybean (Glycine max (L.) Merr.), common bean (Phaseolus vulgaris L.), Medicago truncatula Gaertn. and Lotus japonicus (Regel) K. Larsen using BLASTX (translated nucleotide sequences searched against protein sequences). BLASTX revealed that 96.5% of the CPP34-7 contigs had significant sequence similarity to one or more of the protein sequences used, with most of the contigs (57.3%) being most similar to soybean.
From the GO analysis, 274 transcripts were annotated as transcription factors, and 176 putative winged bean transcription factor genes were identified and classified into at least ten different families, with an overall distribution similar to that of other legumes (Vatanparast et al. 2016). The top five categories were basic leucine zipper (bZIP; 32), Teosinte Branched1/Cycloidea/PCF (TCP; 19), MADS (17), MYB (11) and WRKY (9). The SSR analysis identified a total of 12,956 SSRs. Of these, 10,984 were perfect SSRs, consisting of repeats of a single motif; 13 were imperfect SSRs, containing a base pair not belonging to the motif between repeats; and 1959 were compound SSRs, composed of two or more adjacent individual repeats. Of the perfect SSRs, 7933 were hexamers with only two repeats and the remaining 2994 comprised 405 di-, 1288 tri-, 482 tetra-, 211 penta- and 608 hexamer SSRs. The repeat motif type (AG/GA/TC/CT)n accounted for 77.7% of all the dinucleotide repeats, whereas the motif types (AT/TA)n and (AC/CA/GT/TG)n accounted for 13.7% and 8.6%, respectively. A high-confidence set of 5190 SNPs was identified between the Sri Lankan samples and a geographically separated Nigerian winged bean genotype sequenced by Chapman (2015), with 96% being one-to-one point mutations. Of the 5190 SNPs, around 4% (211 SNPs) were length variants with one or more point mutations. The remaining 4979 one-to-one polymorphisms comprised 3433 (68.9%) transitions and 1546 (31.1%) transversions, giving a transition:transversion ratio of 2.22 (Vatanparast et al. 2016). SNPs can be used to assess variation among genotypes and can be utilised in quantitative trait loci (QTL) mapping, linkage maps and breeding studies, as noted by Vatanparast et al. (2016). However, the SNPs from this study still need to be validated. Vatanparast et al. (2016) also analysed the soybean trypsin inhibitor (STI) gene family and found similarity to winged bean KTI.
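
The SNP proportions and the transition:transversion ratio reported above follow directly from the three counts given in the text (3433 transitions, 1546 transversions and 211 length variants). The short sketch below reproduces that arithmetic; the function name `snp_summary` and the rounding choices are assumptions made for illustration and are not part of the analysis pipeline of Vatanparast et al. (2016).

```python
def snp_summary(transitions, transversions, length_variants):
    """Derive summary statistics from SNP class counts."""
    one_to_one = transitions + transversions
    total = one_to_one + length_variants
    return {
        "total": total,
        "one_to_one": one_to_one,
        # Share of all SNPs that are simple one-to-one substitutions
        "one_to_one_pct": round(100 * one_to_one / total, 1),
        # Shares within the one-to-one substitutions only
        "transition_pct": round(100 * transitions / one_to_one, 1),
        "transversion_pct": round(100 * transversions / one_to_one, 1),
        "ts_tv": round(transitions / transversions, 2),
    }


# Counts as reported in the text for the Sri Lankan vs Nigerian comparison
summary = snp_summary(transitions=3433, transversions=1546, length_variants=211)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The percentages for transitions and transversions are calculated relative to the 4979 one-to-one polymorphisms, not the full set of 5190 SNPs, which is why they sum to 100%.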
To understand the evolution and diversity of the Kunitz-type trypsin inhibitor gene family in winged bean, the STI sequences were used to generate a gene tree, which showed that 28 of the 32 putative Psophocarpus STI regions clustered with soybean (Glycine max) and that at least eight Kunitz trypsin inhibitor loci were linked within 68 kbp on chromosome 8 (positions 44,850,000–44,918,000). Lineage-specific amplification of Psophocarpus Kunitz-type trypsin inhibitor sequences was suggested, considering the conserved synteny between soybean and Psophocarpus. Identification of molecular markers linked to the trypsin inhibitor gene family in winged bean would assist genotyping and breeding. In addition, investigation of the structure and regulation of winged bean Kunitz trypsin inhibitor genes could be useful not only in breeding selection but also in gene editing. Genes, transcription factors and markers identified in these lines could be utilised in breeding to reduce the amount of antinutritional factors, such as trypsin and chymotrypsin inhibitors as well as tannins, and improve the nutritional value of the seeds. The first set of validated SSR markers in winged bean was reported by Wong et al. (2017), with 18 genic-SSRs, 9 of which were further used by Yang et al. (2018) in their population genetic analysis. Wong et al. (2017) developed a de novo transcriptome assembly from leaf, root, pod and reproductive tissues of six Malaysian winged bean accessions. Of the 198,554 contigs (with an N50 of 1462 bp), 138,958 (70.0%) were annotated. The majority of the SSR motifs identified were AAG/AGA/GAA/CTT/TCT/TTC trinucleotide repeats (4855), followed by dinucleotide repeats (4500) of the AG/GA/CT/TC and AT/TA types. These results are similar to those of Vatanparast et al. (2016) and of Jayashree et al. (2006), who reported SSRs distributed in Expressed Sequence Tags (ESTs) from soybean (Glycine max), barrel medic (Medicago truncatula) and lotus (Lotus japonicus). Wong et al. (2017) identified 18 SSR markers, 8 with dinucleotide and 10 with trinucleotide repeat motifs. The 18 SSR markers were validated as polymorphic across nine winged bean accessions originating from five countries.
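
The informativeness of polymorphic markers such as these is commonly summarised by the Polymorphism Information Content (PIC), which is calculated from the allele frequencies at each locus. The sketch below assumes the widely used definition PIC = 1 − Σ p_i² − Σ_{i<j} 2 p_i² p_j². The source does not state which formula Wong et al. (2017) applied, and the allele frequencies used here are purely illustrative, not values from that study.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphism Information Content for one locus.

    allele_freqs: iterable of allele frequencies that sum to 1.
    """
    freqs = list(allele_freqs)
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Sum of squared frequencies (expected homozygosity)
    sum_sq = sum(p * p for p in freqs)
    # Correction term over all unordered pairs of distinct alleles
    pair_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - sum_sq - pair_term


# Hypothetical loci: a balanced two-allele marker and a skewed one
print(round(pic([0.5, 0.5]), 4))
print(round(pic([0.9, 0.1]), 4))
```

Under this definition a locus with two equally frequent alleles has a PIC of 0.375, so values above this level require more than two alleles or a third allele at appreciable frequency.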
For the 18 SSR markers, the individual Polymorphism Information Content (PIC) ranged from 0.16 to 0.67. More specifically, the 8 dimer SSR markers showed an average of 2.5 alleles per locus, with an average PIC value of 0.37, and the 10 trimer SSR markers amplified an average of 2.4 alleles per locus, with an average PIC value of 0.39. The limited number of accessions used could be one of the reasons why the validation rate of the polymorphic markers was low. Wong et al. suggested that increasing the number of screened accessions from different geographical origins could lead to a higher validation rate of polymorphic markers. Yang et al. (2018) used five primer pairs from these markers to identify genetic clusters of winged bean accessions and their relation to geographic origin. In addition, Singh et al. (2017), studying two winged bean lines with high and low condensed tannin levels, found 2237 and 1618 SSRs among the sequences examined in the high and low condensed tannin lines, respectively. Of these, 881 SSRs were trinucleotide repeats in the high condensed tannin line and 663 in the low condensed tannin line. The development and validation of genic SSR markers provides greater information on the winged bean genome, and the construction of a linkage map with these molecular markers would help to identify QTLs and assist plant breeding. The study by Singh et al. (2017), which examined the leaf transcriptome of the two winged bean lines, revealed more than 1200 differentially expressed contigs. The lines were selected based on low and high condensed tannin content in leaf tissues and variable metabolite concentration in the seeds. Transcriptome and pathway analysis revealed that the anthocyanidin synthase and chalcone synthase genes were expressed at lower levels in the low condensed tannin line and at higher levels in the high condensed tannin line. Singh et al. (2017) proposed that condensed tannin biosynthesis could take place in the leaves, with the tannins then being transported to the seeds. Singh et al.
(2017) also identified genes and contigs responsible for the biosynthesis of the condensed tannins. The RNA-seq data, generated on an Illumina NextSeq 500 sequencer, yielded 102,586 contigs for the high condensed tannin line and 88,433 contigs for the low condensed tannin line. Contig generation using the same hash length resulted in 87,925 and 69,464 contigs for the high and low condensed tannin lines, respectively. Across both samples, the total number of contigs after clustering at 95% identity and query coverage was 44,972. These 44,972 contigs were annotated against Arabidopsis thaliana, Glycine max and Lycopersicon esculentum, and were assembled and mapped to the reference canonical pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG). Similarity searches against the Gene Ontology (GO) and KEGG databases showed that 5210 contigs were involved in 229 different pathways. The results also revealed 1235 contigs that were differentially expressed between the two lines. KEGG analysis showed that 10 condensed tannin biosynthesis genes, namely anthocyanidin synthase (ANS), 4-coumarate-CoA ligase (4CCL), chalcone synthase (CHS), chalcone–flavanone isomerase (CHFI), chalcone isomerase (CHI), cinnamyl alcohol dehydrogenase (CAD), dihydroflavonol 4-reductase (DFR), cinnamoyl CoA reductase (CCR), phenylalanine ammonia-lyase (PAL) and anthocyanidin 3-O-glucosyltransferase (A3GT), had lower expression in the leaves of the low condensed tannin line than in those of the high condensed tannin line. In the same study (Singh et al. 2017), the de novo assembly of contigs revealed 15 different transcription factor families, comprising 33 and 5 contigs from the high and low condensed tannin lines, respectively. Transcription factors were thus more frequent in the high condensed tannin line than in the low condensed tannin line, with the bHLH group of transcription factors present only in the high condensed tannin line. It would also be interesting to perform transcriptome sequence analysis on the growing pods of winged bean lines with high and low tannin content, to reveal more about the genes involved in the pathways of condensed tannin biosynthesis.
Tannin content in the seeds has been correlated with the colour of the seeds and flowers, as tannins are products of the flavonoid biosynthesis pathway (Klu et al. 1997; Smulikowska et al. 2001). Indirect selection on an easily distinguishable trait such as flower and seed colour could improve the selection of lines with lower levels of antinutritional factors, such as tannins, in the seeds. Klu et al. (1997) generated four mutants with altered tannin content by applying gamma radiation to the parental lines and selecting F1 and F2 seeds with altered seed coat colour. Interestingly, all the plants of the F3 generation produced only seeds with altered seed coat colour, a strong indication that a recessive mutation could be involved. Identifying genes, transcription factors and markers in high and low condensed tannin lines of winged bean could be a useful tool for creating new lines with silenced genes, or for breeding programmes that aim to reduce the antinutritional effect of tannins and improve the nutritional value of winged beans.

17.6 Future Prospects

It is important to understand the genetics underlying desirable quantitative and qualitative traits, as this could assist breeding selection and contribute to the development of winged bean varieties with desirable plant architecture and high-quality nutritional products. In the late 1970s, Erskine and Khan studied qualitative traits and found that a single gene controlled the shape of the pod and the colour of stem, calyx, pod and pod wings (Erskine and Khan 1977). They also studied the overall variability within and between landraces of winged bean collected in the Highlands of Papua New Guinea for three quantitative characters (flowering, pod length and seed weight). Significant differences between the landraces were detected for the three traits, suggesting differences in the selection pressures on the landraces, probably exerted by local farmers. These differences, especially between adjacent landraces, could be maintained by low or absent gene flow among the landraces (Erskine and Khan 1980). This is highly probable, as winged bean is a cleistogamous, self-fertilising crop with limited gene flow even in close proximity (Tanzi et al. 2019a, b). In the small sample of 14 landraces examined for variability in stem colour, pod specking, pod wing colour and pod shape, all the loci examined showed allelic polymorphism (Erskine and Khan 1980). Genetic diversity among accessions could be utilised in breeding programmes to obtain higher variability and introduce desirable traits. Next-generation sequencing technologies and high-throughput phenotyping techniques, when combined in genome-wide association studies (GWAS), have the potential to reveal genetic loci that are associated with key traits (D’Agostino and Tripodi 2017). The integration of genomic studies from other legumes, such as soybean and common bean, with the transcriptome sequences of winged bean could be useful in identifying molecular markers. Comparative genomics, the development of molecular markers, linkage maps and QTL analysis would contribute to the improvement of winged bean plant architecture, yield and nutritional value by identifying the genetic bases of desirable traits (Wong et al. 2017). The best winged bean ideotype for cultivation as a pulse crop is proposed to be early maturing, with a ‘bushy’ and dwarf architecture, producing seeds with high nutritional value and reduced antinutritional factors without the need for trellising (Klu et al. 1997). To improve the protein content of winged bean seeds, linkage analysis and GWAS should be combined to detect quantitative trait loci associated with protein content. This will help to identify genes related to protein content and closely linked markers, which would also serve as useful tools for plant breeders. As winged bean is an underutilised and largely unimproved crop, it will be important to couple genomics and marker-assisted selection with new approaches to breeding species with flat genetic structures.
Moreover, the desirable ideotype for winged bean will depend on the end use and context of cultivation, so a number of ideotypes may be required, along with a series of selection indices ranging from genomic to protein functionality and processing. Such a flat genetic structure, together with complex selection indices, may provide opportunities for Genomic Selection (GS) models, supplemented by high-throughput phenotyping (Montesinos-López et al. 2021). At the least, it is clear that multiple alleles will need to be combined to achieve the breeding objectives, so Multi-parent Advanced Generation Inter-Cross (MAGIC; Huynh et al. 2018) populations may be appropriate, combined with Nested Association Mapping (NAM; Gangurde et al. 2020) populations, to elucidate and identify the desirable alleles in a hybrid-GS approach. Once desirable ideotypes are in development, they need to be coupled with processing methods that decrease the antinutritional factors; methods such as boiling and autoclaving are widely used, with the time and temperature varying according to the amount of antinutritional factors. However, heat-resistant antinutritional factors are more difficult to reduce, and adding extra processing steps, such as seed coat removal, would raise the cost of food products. Therefore, improving the nutritional value of winged bean by identifying varieties with lower levels of antinutritional factors and understanding their genetic control would assist breeding selection, but will take longer to achieve. Breeding should focus on improving not only protein quantity but also protein quality, taking into account amino acid content as well as reduced levels of antinutritional factors. In addition, processing winged bean into fermented products such as tempe could make it more appealing to consumers. Fermentation and germination of seeds are also methods that can reduce the amount of antinutritional factors (Samtiya et al. 2020).
As winged bean protein is limited in sulphur-containing amino acids such as methionine and rich in lysine, combining winged bean flour or protein isolates with cereal flour from rice, maize or wheat, which has higher methionine and lower lysine, would increase the protein quality of the flour. The quality of the two proteins combined will be higher than that of either component, increasing the biological value through protein complementation.
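
The reasoning behind protein complementation can be illustrated with a limiting amino acid score, in which the quality of a protein is set by the essential amino acid that falls furthest below the requirement. The sketch below is a simplified illustration under stated assumptions: the function names are hypothetical, only two amino acid groups are considered, and the ratios of content to requirement are invented round numbers chosen to mirror the qualitative pattern described in the text. They are not measured values for winged bean or any cereal.

```python
def limiting_score(profile):
    """Score of a protein = its lowest content-to-requirement ratio."""
    return min(profile.values())


def blend(profile_a, profile_b, share_a):
    """Mix two proteins; share_a is the fraction of total protein from A."""
    return {
        amino_acid: share_a * profile_a[amino_acid]
        + (1 - share_a) * profile_b[amino_acid]
        for amino_acid in profile_a
    }


# Hypothetical ratios of content to requirement (1.0 = requirement met).
# The legume is rich in lysine but low in sulphur amino acids;
# the cereal shows the opposite pattern.
legume = {"lysine": 1.2, "sulphur_amino_acids": 0.6}
cereal = {"lysine": 0.6, "sulphur_amino_acids": 1.2}

mix = blend(legume, cereal, share_a=0.5)
print("legume:", limiting_score(legume))
print("cereal:", limiting_score(cereal))
print("blend: ", round(limiting_score(mix), 2))
```

In this toy case each component is limited by a different amino acid, so the blend scores higher than either component on its own, which is the effect the paragraph above describes.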


17.7 Conclusion

The domestication of crops over thousands of years has led to significant changes in their morphology, plant architecture and yield compared with their wild ancestors. Nowadays, increased food demand drives the need for higher production of quality food. Donald (1968) suggested the design of crop ideotypes that would have a predictable performance in a specific environment. Winged bean has multiple uses, and multiple parts of the plant are eaten (e.g. tubers are favoured in Thailand and immature pods in Malaysia). This is a major advantage, as tubers, pods and seeds are of high nutritional value. Winged bean has been cultivated by indigenous communities in Asia and sold in local markets. As an underutilised crop, it has received limited research attention for the improvement of its vining plant architecture and the utilisation of its high nutritional value. Recent advances in transcriptomics could assist genomic research and accelerate breeding selection using genetic markers. Winged bean could play an important role in food security, and more research is needed to explore its potential to become a new soybean for the tropics.

References

Abdullah SNA, Ho C-L, Wagstaff C (2017) Crop improvement: sustainability through leading-edge technology. Crop improvement. Springer International Publishing. http://doi.org/10.1007/978-3-319-65079-1
Abe J, Nakamura H (1987) Evaluation of winged bean in Okinawa
Adegboyega TT et al (2019) Nutrient and antinutrient composition of winged bean (Psophocarpus tetragonolobus (L.) DC.) seeds and tubers. J Food Qual 1–8. http://doi.org/10.1155/2019/3075208
Amoo IA, Adebayo OT, Oyeleye AO (2006) Chemical evaluation of winged beans (Psophocarpus tetragonolobus), Pitanga cherries (Eugenia uniflora) and orchid fruit (Orchid fruit myristica). Afr J Food Agric Nutr Dev 6(2):1–12
Bassal H et al (2020) Psophocarpus tetragonolobus: an underused species with multiple potential uses. Plants 9(12):1–11. https://doi.org/10.3390/plants9121730
Bessada SMF, Barreira JCM, Oliveira MBPP (2019) Pulses and food security: dietary protein, digestibility, bioactive and functional properties. Trends Food Sci Technol 93:53–68. https://doi.org/10.1016/j.tifs.2019.08.022
Birk Y (1985) The Bowman-Birk inhibitor. Trypsin- and chymotrypsin-inhibitor from soybeans. Int J Pept Protein Res 113–131. http://doi.org/10.1111/j.1399-3011.1985.tb02155.x
Brody (1975) Plant is praised as rich in protein. The New York Times (nytimes.com)
Černý K et al (1971) Nutritive value of the winged bean (Psophocarpus palustris Desv.). Br J Nutr 26(2):293–299. http://doi.org/10.1079/bjn19710035
Chapman MA (2015) Transcriptome sequencing and marker development for four underutilized legumes. Appl Plant Sci 3(2):1400111. https://doi.org/10.3732/apps.1400111
Chen D et al (2015) Genetic diversity evaluation of winged bean (Psophocarpus tetragonolobus (L.) DC.) using inter-simple sequence repeat (ISSR). Genet Resour Crop Evol 62(6):823–828. http://doi.org/10.1007/s10722-015-0261-3
Cheng A et al (2017) Crop improvement, pp 47–70. http://doi.org/10.1007/978-3-319-65079-1
Cheng A et al (2019) In search of alternative proteins: unlocking the potential of underutilized tropical legumes. Food Secur 1205–1215. http://doi.org/10.1007/s12571-019-00977-0
Chiba T (2003) In: Caballero B, Paul F, Luiz T (eds) Encyclopedia of food sciences and nutrition, 2nd edn. Academic Press. http://doi.org/10.1016/b0-12-227055-x/01121-4
Chrispeels MJ, Raikhel NV (1991) Lectins, lectin genes, and their role in plant defense. Plant Cell 1–9. http://doi.org/10.1105/tpc.3.1.1
Claydon A (1975) A review of the nutritional value of the winged bean Psophocarpus tetragonolobus (L.) DC. with special reference to Papua New Guinea. Sci New Guinea 103–114
D’Agostino N, Tripodi P (2017) NGS-based genotyping, high-throughput phenotyping and genome-wide association studies laid the foundations for next-generation breeding in horticultural crops. Diversity 9(3):1–20. https://doi.org/10.3390/d9030038
de Lumen BO, Salamat LA (1980) Trypsin inhibitor activity in winged bean (Psophocarpus Tetragonolobus) and the possible role of tannin. J Agric Food Chem 28(3):533–536. https://doi.org/10.1021/jf60229a042
de Lumen BO, Fiad S, Fiad S (1982a) Tocopherols of winged bean (Psophocarpus tetragonolobus) oil. J Agric Food Chem 30(1):50–53. https://doi.org/10.1021/jf00109a010
de Lumen BO, Gerpacio AL, Vohra P (1982b) Effects of winged bean (Psophocarpus tetragonolobus) meal on broiler performance. Poult Sci 61(6):1099–1106. https://doi.org/10.3382/ps.0611099
DeClerck YA, Imren S (1994) Protease inhibitors: role and potential therapeutic use in human cancer. Eur J Cancer (Oxford, England: 1990) 30A(14):2170–2180. http://doi.org/10.1016/0959-8049(94)00460-m

Donald CM (1968) The breeding of crop ideotypes. Euphytica 17(3):385–403. https://doi.org/10.1007/BF00056241
Dong JY, Qin LQ (2011) Soy isoflavones consumption and risk of breast cancer incidence or recurrence: a meta-analysis of prospective studies. Breast Cancer Res Treat 125(2):315–323. https://doi.org/10.1007/s10549-010-1270-8
Eagleton GE (2019) Prospects for developing an early maturing variety of winged bean (Psophocarpus tetragonolobus) in Bogor, Indonesia. Biodiversitas 20(11):3142–3152. https://doi.org/10.13057/biodiv/d201106
Ebert AW (2014) Potential of underutilized traditional vegetables and legume crops to contribute to food and nutritional security, income and more sustainable production systems. Sustainability (Switzerland) 6(1):319–335. https://doi.org/10.3390/su6010319
Ekpenyong TE, Borchers RL (1980) Effect of cooking on the chemical composition of winged beans (Psophocarpus tetragonolobus). J Food Sci 45(6):1559–1560. https://doi.org/10.1111/j.1365-2621.1980.tb07562.x
Ekpenyong TE, Borchers RL (1982) Amino acid profile of the seed and other parts of the winged bean. Food Chem 9(3):175–182. https://doi.org/10.1016/0308-8146(82)90095-4
Erskine W (1979) The exploitation of genetic diversity in the winged bean (Psophocarpus tetragonolobus (L.) DC.) for grain yield. Cambridge University, Cambridge
Erskine W (1980) Measurements of the cross-pollination of winged bean in Papua New Guinea. SABRAO J 12(1):11–13. Available at: https://www.cabdirect.org/cabdirect/abstract/19840214803. Accessed: 20 Feb 2021
Erskine W, Khan TN (1977) Inheritance of pigmentation and pod shape in winged bean. Euphytica 829–831. http://doi.org/10.1007/BF00021714
Erskine W, Khan TN (1980) Variation within and between land races of winged bean (Psophocarpus tetragonolobus (L.) DC.). Field Crops Res 3(C):359–364. http://doi.org/10.1016/0378-4290(80)90041-6
FAO (2010) The second report on the state of the world’s animal genetic resources for food and agriculture. http://doi.org/10.4060/i4787e
Fatihah HNN, Maxted N, Rico Arce L (2012) Cladistic analysis of Psophocarpus Neck. ex DC. (Leguminosae, Papilionoideae) based on morphological characters. S Afr J Bot 83:78–88. https://doi.org/10.1016/j.sajb.2012.07.010
Frolov AV et al (2014) Second Roshydromet assessment report on climate change and its consequences in the Russian Federation. Available at: http://cc.voeikovmgo.ru/images/dokumenty/2016/od2/resume_ob_eng.pdf
Gangurde SS et al (2020) Nested-association mapping (NAM)-based genetic dissection uncovers candidate genes for seed and pod weights in peanut (Arachis hypogaea). Plant Biotechnol J 18(6):1457–1471. http://doi.org/10.1111/pbi.13311


Garcia VV, Palmer JK (1980) Proximate analysis of five varieties of winged beans, Psophocarpus tetragonolobus (L.) DC. Int J Food Sci Technol 15(5):469–476 Gilani GS, Cockell KA, Sepehr E (2005) Effects of antinutritional factors on protein digestibility and amino acid availability in foods. J AOAC Int 88 (3):967–987. https://doi.org/10.1093/jaoac/88.3.967 Gilani GS, Xiao CW, Cockell KA (2012) Impact of antinutritional factors in food proteins on the digestibility of protein and the bioavailability of amino acids and on protein quality. Br J Nutr 108 (Suppl. 2):S315–S332. https://doi.org/10.1017/ S0007114512002371 Giri AP et al (2003) Identification of potent inhibitors of Helicoverpa armigera gut proteinases from winged bean seeds. Phytochemistry 63(5):523–532. https:// doi.org/10.1016/S0031-9422(03)00181-X Global Biodiversity Information Facility, Psophocarpus tetragonolobus (L.) DC. (gbif.org). https://www.gbif. org/species/2944683 Gross R (1983) Composition and protein quality of winged bean (Psophocarpus tetragonolobus). Qualitas Plantarum Plant Foods Hum Nutr 32(2):117–124. https://doi.org/10.1007/BF01091332 Gurfinkel DM, Rao AV (2003) Soyasaponins: the relationship between chemical structure and colon anticarcinogenic activity. Nutr Cancer 47(1):24–33. http:// doi.org/10.1207/s15327914nc4701_3 Henry CJK, Donachie PA, Rivers JPW (1985) The winged bean. Will the wonder crop be another flop? Ecol Food Nutr 16(4):331–338. https://doi.org/10. 1080/03670244.1985.9990872 Herath HMW, Ormrod DP (1979) Effects of temperature and photoperiod on winged beans [Psophocarpus tetragonolobus (L.) D.C.]. Ann Bot 43(6):729–736. https://doi.org/10.1093/oxfordjournals.aob.a085686 Hertel TW (2016) Food security under climate change. Nature Climate Change. https://doi.org/10.1038/ nclimate2834 Heuzé V, Tran G, Kaushik S (2020) Soybean meal. Feedipedia. Available at: https://feedipedia.org/node/ 674. 
Accessed: 5 May 2021 Higuchi M, Suga M, Iwai K (1983) Participation of lectin in biological effects of raw winged bean seeds on rats. Agric Biol Chem 47(8):1879–1886. https://doi.org/10. 1080/00021369.1983.10865872 Higuchi M, Fukumoto Y, Iwai K (1988) Appearance of lectin in winged bean pods during seed development after flowering. J Agric Food Chem 36(3):534–536. Available at: https://pubs.acs.org/sharingguidelines. Accessed: 23 Feb 2021 Ho WK, Tanzi A, Sang F, Tsoutsoura N, Shah N, Moore C, Wright V, Massawe F, Mayes S (2022) A genomic toolkit for ‘the soybean of the tropics’ – winged bean (Psophocarpus tetragonolobus). https:// doi.org/10.21203/rs.3.rs-1355353/v1 Huynh B-L, Ehlers JD, Huang BE, Muñoz-Amatriaín M, Lonardi S, Santos JRP, Ndeve A, Batieno BJ, Boukar O, Cisse N, Drabo I, Fatokun C, Kusi F, Agyare RY, Guo Y-N, Herniter I, Lo S,

333 Wanamaker SI, Xu S, Close TJ, Roberts PA (2018) A multi-parent advanced generation inter-cross (MAGIC) population for genetic analysis and improvement of cowpea (Vigna unguiculata L. Walp.). Plant J 93:1129–1142. https://doi.org/10. 1111/tpj.13827 International Board for Plant Genetic (IBPGR) (1982) Descriptors_winged_bean_revised.pdf. Rome, Italy Jayashree B et al (2006) A Database of simple sequence repeats from cereal and legume expressed sequence tags mined in silico: survey and evaluation. In Silico Biol 6(6):607–620 Jugran HM et al (1986) Gamma ray induced dwarf mutant of winged bean. J Nucl Agric Biol 15(3):175–178. Available at: https://inis.iaea.org/search/search.aspx? orig_q=RN:19050128. Accessed: 5 May 2021 Kadam SS (1984) Winged bean in human nutrition. C R C Crit Rev Food Sci Nutr 21(1):1–40. https://doi. org/10.1080/10408398409527395 Kadam SS et al (1982) Changes in chemical composition of winged bean (Psophocarpus tetragonolobus L.) during seed development. J Food Sci 47(6):2051–2053. https://doi.org/10.1111/j.1365-2621.1982.tb12943.x Kadam SS et al (1987) Effects of heat treatments of antinutritional factors and quality of proteins in winged bean. J Sci Food Agric 36:267–294 Kantha SS, Erdman JW (1984) The winged bean as an oil and protein source: a review. J Am Oil Chem Soc 515–525. http://doi.org/10.1007/BF02677021 Karikari SK (1979) The potential of the winged bean (Psophocarpus tetragonolobus (L). DC.) as a root crop. In: 5th international symposium on tropical root crops, pp 135–145 Kesavan V, Khan TN (1978) Induced mutations in winged bean. In: Seed protein improvement programme. FAO, Vienna (Austria). Joint FAO/IAEA Division of Atomic Energy in Food and Agriculture Engineering; Research Co-ordination Meeting of the Seed Protein Improvement Programme Engineering, 28 Mar 1977, 4 Baden (Austria). https://agris.fao.org/ agris-search/search.do?recordID=XF2016051592. 
Accessed: 5 May 2021 Khan TN (1976) Papua New Guinea: a centre of genetic diversity in winged bean. Euphytica 25(1):693–705 Khan TN (1978) Proceedings of international winged bean workshop. Los Banos, Laguna, Philippines Khor H-T, Tan N-H, Wong K-C (1982) The protein, trypsin inhibitor and lipid of the winged bean [Psophocarpus tetragonolobus (L.) DC] seeds. J Sci Food Agric 33(10):996–1000. https://doi.org/10.1002/ jsfa.2740331009 Khoury CK et al (2014) Increasing homogeneity in global food supplies and the implications for food security. Proc Nat Acad Sci USA 111(11):4001–4006. http:// doi.org/10.1073/pnas.1313490111 King RD, Puwastein P (1987) Effects of germination on the proximate composition and nutritional quality of winged bean (Psophocarpus tetragonolobus) seeds. J Food Sci 52(1):106–108. https://doi.org/10.1111/j. 1365-2621.1987.tb13982.x

334 Klu GYP, Jacobsen E, Van Harten AM (1997) Induced mutations in winged bean (Psophocarpus tetragonolobus L. DC) with low tannin content. Euphytica 98 (1–2):99–107. https://doi.org/10.1023/a:1003032408 885 Kortt AA (1979) Isolation and characterization of the trypsin inhibitors from winged bean seed (Psophocarpus tetragonolobus (L) Dc.). BBA Protein Struct 577 (2):371–382. https://doi.org/10.1016/0005-2795(79) 90040-0 Kortt AA (1980) Isolation and properties of a chymotrypsin inhibitor from winged bean seed (Psophocarpus tetragonolobus (L) Dc.). Biochim Biophys Acta 624(1):237–248. http://doi.org/10.1016/00052795(80)90243-3 Kortt AA (1981) Specificity and stability of the chymotrypsin inhibitor from winged bean seed (Psophocarpus tetragonolobus (L) Dc.). BBA Enzymol. http:// doi.org/10.1016/0005-2744(81)90145-5 Kortt AA (1983) Comparative studies on the storage proteins and anti-nutritional factors from seeds of Psophocarpus tetragonolobus (L.) DC from five South-East Asian countries. Qualitas Plantarum Plant Foods Hum Nutr 33(1):29–40. http://doi.org/10.1007/ BF01093735 Kortt AA, Caldwell JB (1984) Characteristics of the proteins of the tubers of winged bean (Psophocarpus tetragonolobus (L.) DC). J Sci Food Agric 35(3):304– 313. http://doi.org/10.1002/jsfa.2740350310 Kwan Lam S, Bun Ng T (2011) Lectins: production and practical applications. Appl Microbiol Biotechnol 89:45–55. https://doi.org/10.1007/s00253-010-2892-9 Lagarda-Diaz I, Guzman-Partida AM, Vazquez-Moreno L (2017) Molecular sciences legume lectins: proteins with diverse applications. Int J Mol Sci 2–18. http:// doi.org/10.3390/ijms18061242 Lepcha P et al (2017) A review on current status and future prospects of winged bean (Psophocarpus tetragonolobus) in tropical agriculture. Plant Foods Hum Nutr 225–235. http://doi.org/10.1007/s11130017-0627-0 Levy, Hymowitz (1978) The winged bean. In: 1st International symposium on developing the potentials of the winged bean. 
Philippine Council for Agriculture and Resources Research, Manila, Philippines Lis H, Sharon N (1986) Lectins as molecules and as tools. Ann Rev Biochem 55:35–67. Available at: www. annualreviews.org. Accessed: 1 May 2020 Massawe F, Mayes S, Cheng A (2016) Crop diversity: an unexploited treasure trove for food security. Trends Plant Sci 21(5):365–368. https://doi.org/10.1016/j. tplants.2016.02.006 Maxted N (1990) A phenetic investigation of Psophocarpus neck. Bot J Linnean Soc 102:103–122 Mayes S et al (2012) The potential for underutilized crops to improve security of food production. J Exp Bot 63 (3):1075–1079. https://doi.org/10.1093/jxb/err396 Messadi DV et al (1986) Inhibition of oral carcinogenesis by a protease inhibitor. J Nat Cancer Inst 76(3):447–

N. Tsoutsoura et al. 452. Available at: http://www.ncbi.nlm.nih.gov/ pubmed/3081747. Accessed: 7 April 2020 Mnembuka BV, Eggum BO (1995) Comparative nutritive value of winged bean (Psophocarpus tetragonolobus (L) DC) and other legumes grown in Tanzania. Plant Foods Hum Nutr 47(4):333–339. https://doi.org/10. 1007/BF01088271 Mohanty CS et al (2013) Characterization of winged bean (Psophocarpus tetragonolobus (L.) DC.) based on molecular, chemical and physiological parameters. Am J Mol Biol 03(04):187–197. http://doi.org/10. 4236/ajmb.2013.34025 Mohanty CS et al (2014a) Physicochemical analysis of Psophocarpus tetragonolobus (L.) DC seeds with fatty acids and total lipids compositions. J Food Sci Technol 52(6):3660–3670. http://doi.org/10.1007/ s13197-014-1436-1 Mohanty SC et al (2014b) Characterization of winged bean (Psophocarpus tetragonolobus (L.) DC.) based on molecular, chemical and physiological parameters. Am J Mol Biol 3:187–197. https://doi.org/10.4236/ ajmb.2013.34025 Mohanty SC et al (2019) Estimation of genetic diversity, structure and trait association of winged bean (Psophocarpus tetragonolobus (L.) DC.), genotypes through AFLP and ITS markers. Indian J Biotechnol 18:235–245. Available at: www. biodiversityinternational.org. Accessed: 25 Mar 2020 Mohanty CS, Singh V, Chapman MA (2020) Winged bean: an underutilized tropical legume on the path of improvement, to help mitigate food and nutrition security. Sci Hortic 108789. http://doi.org/10.1016/j. scienta.2019.108789 Montesinos-López OA, Montesinos-López A, PérezRodríguez P et al (2021) A review of deep learning applications for genomic selection. BMC Genomics 22:19. https://doi.org/10.1186/s12864-020-07319-x Muzquiz M et al (2012) Bioactive compounds in legumes: pronutritive and antinutritive actions. Implications Nutr Health Phytochem Rev 11(2–3):227–244. 
https://doi.org/10.1007/s11101-012-9233-9 Nasi A, Picariello G, Ferranti P (2009) Proteomic approaches to study structure, functions and toxicity of legume seeds lectins. Perspectives for the assessment of food quality and safety. J Proteomics 72 (3):527–538. http://doi.org/10.1016/j.jprot.2009.02. 001 National Research Council (1981) Winged bean: a highprotein crop for the tropics. The National Academies Press, Washington, DC. http://doi.org/10.17226/ 19754 NRC (1975) The winged bean: a high-protein crop for the tropics: report of an Ad Hoc Panel of the Advisory Committee on Technology Innovation, Board on Science and Technology for International Development, Commission on International Relations, National Academy of Sciences Ochiai Yanagi S (1983) Properties of winged bean (Psophocarpus tetragonolobus) protein in comparison


with soybean (Glycine max) and common bean (Phaseolus vulgaris) protein. Agric Biol Chem 47 (10):2273–2280. https://doi.org/10.1080/00021369. 1983.10865943 Ojuederie OB et al (2020) Genetic diversity assessment of winged bean (pso-phorcarpus tetragonolobus) accessions revealed by Inter-Simple Sequence Repeat (ISSR) markers. J Plant Biol Crop Res 3(1):1014. http://meddocsonline.org/ Okezie BO, Martin FW (1980) Chemical composition of dry seeds and fresh leaves of winged bean varieties grown in the U.S. and Puerto Rico. J Food Sci 45 (4):1045–1051. https://doi.org/10.1111/j.1365-2621. 1980.tb07509.x Patil G et al (2017) Molecular mapping and genomics of soybean seed protein: a review and perspective for the future. Theor Appl Genet 130:1975–1991. https://doi. org/10.1007/s00122-017-2955-8 Peumans WJ, Van Damme EJM (1995) Lectins as plant defense proteins. Plant Physiol 995:347–352. Available at: www.plantphysiol.org. Accessed: 1 May 2020 Plant For A Future, Psophocarpus tetragonolobus Winged Bean PFAF Plant Database. https://pfaf.org/user/Plant. aspx?LatinName=Psophocarpus+tetragonolobus Policy Brief Changing Policy Concepts of Food Security (2006) Available at: http://www.foodsecinfoaction. org/ Poole MM (1978) Pollen morphology of Psophocarpus (Leguminosae) in relation to its taxonomy. Kew Bull 34(2):211–220 Porter JR et al (2015) Food security and food production systems. In: Climate change 2014 impacts, adaptation and vulnerability: Part A: Global and sectoral aspects. https://doi.org/10.1017/CBO9781107415379.012 Prakash D, Misra PN, Misra PS (1987) Amino acid profile of winged bean (Psophocarpus tetragonolobus (L.) DC): a rich source of vegetable protein. Plant Foods Hum Nutr 37:261–264 Prohens J et al (2017) Introgressiomics: a new approach for using crop wild relatives in breeding for adaptation to climate change. Euphytica 213(7):158. https://doi. 
org/10.1007/s10681-017-1938-9 Quan M et al (2011) Identification and analysis of a new dwarf mutant of winged bean. Available at: https://en. cnki.com.cn/Article_en/CJFDTotal-ZGSC201108011. htm. Accessed: 5 May 2021 Raai MN et al (2020) Effects of shading on the growth, development and yield of winged bean (Psophocarpus tetragonolobus). Cienc Rural 50(2):1–7. https://doi. org/10.1590/0103-8478cr20190570 Rachie, Luse (1978) The winged bean. In: 1st International symposium on developing the potentials of the winged bean. Philippine Council for Agriculture and Resources Research, Manila, Philippines Rahman MM et al (2014) Agronomic and nitrogen recovery efficiency of rice under tropical conditions as affected by nitrogen fertilizer and legume crop rotation. J Animal Plant Sci 24(3):891–896 Ray DK et al (2019) Climate change has likely already affected global food production. PLoS ONE 14

335 (5):1–18. https://doi.org/10.1371/journal.pone.0217 148 Reddy PP, Reddy PP (2015) Winged bean, psophocarpus tetragonolobus. In: Plant protection in tropical root and tuber crops. Springer, India, pp 293–303. https:// doi.org/10.1007/978-81-322-2389-4_10 Salunkhe DK, Chavan JK (1990) Dietary tannins: consequences and remedies. Available at: https://books. google.co.uk/books?printsec=frontcover&vid= ISBN0849368111&redir_esc=y#v=onepage&q= condensedtannin&f=false. Accessed: 15 June 2020 Samtiya M, Aluko RE, Dhewa T (2020) Plant food antinutritional factors and their reduction strategies: an overview. Food Prod Process Nutr 2(1):1–14. https:// doi.org/10.1186/s43014-020-0020-5 Sánchez-Chino X et al (2015) Nutrient and nonnutrient components of legumes, and its chemopreventive activity: a review. Nutr Cancer 67(3):401–410. https:// doi.org/10.1080/01635581.2015.1004729 Serrano J et al (2009) Tannins: current knowledge of food sources, intake, bioavailability and biological effects. Mol Nutr Food Res 53(Suppl. 2):S310–S329. https:// doi.org/10.1002/mnfr.200900039 Sharon N, Lis H (1990) Legume lectins—a large family of homologous proteins. FASEB J 4(14):3198–3208. https://doi.org/10.1096/fasebj.4.14.2227211 Sharon N, Lis H (2004) History of lectins: from hemagglutinins to biological recognition molecules. Glycobiology 14(11):53–62. https://doi.org/10.1093/ glycob/cwh122 Shibata H et al (1986) Purification and characterization of proteinase inhibitors from winged bean (Psophocarpus tetragonolobus (L.) DC.) seeds. J Biochem 99 (4):1147–1155. https://doi.org/10.1093/ oxfordjournals.jbchem.a135578 Singh V et al (2017) De novo sequencing and comparative analysis of leaf transcriptomes of diverse condensed tannin-containing lines of underutilized Psophocarpus tetragonolobus (L.) DC. Sci Rep 7:1– 13. https://doi.org/10.1038/srep44733 Singh A et al (2019) Domesticating the undomesticated for global food and nutritional security: four steps. Agronomy 9(9):491. 
https://doi.org/10.3390/ agronomy9090491 Smartt J (1980) Some observations on the origin and evolution of the winged bean (Psophocarpus tetragonolobus). Euphytica 29(1):121–123. https://doi.org/ 10.1007/BF00037256 Smulikowska S et al (2001) Tannin content affects negatively nutritive value of pea for monogastrics. J Anim Feed Sci 10(3):511–523. https://doi.org/10. 22358/jafs/68004/2001 Srinivas R et al (2004) Identification of factors responsible for insecticide resistance in Helicoverpa armigera. Comp Biochem Physiol C Toxicol Pharmacol 137 (3):261–269. http://doi.org/10.1016/j.cca.2004.02.002 Tan N-H, Wong K-C, de Lumen BO (1984) Relationship of tannin levels and trypsin inhibitor activity with the in vitro protein digestibilities of raw and heat-treated winged bean (Psophocarpus tetragonolobus). J Agric

336 Food Chem 32:819–822. Available at: https://pubs. acs.org/sharingguidelines Tanzi AS, Ho WK et al (2019a) Development and interaction between plant architecture and yieldrelated traits in winged bean (Psophocarpus tetragonolobus (L.) DC.). Euphytica 215(2). http://doi.org/ 10.1007/s10681-019-2359-8 Tanzi AS, Eagleton GE et al (2019b) Winged bean (Psophocarpus tetragonolobus (L.) DC.) for food and nutritional security: synthesis of past research and future direction. Planta 911–931. http://doi.org/10. 1007/s00425-019-03141-2 Telang MA et al (2008) Winged bean chymotrypsin inhibitors retard growth of Helicoverpa armigera. 431:80–85. http://doi.org/10.1016/j.gene.2008.10.026 US Department of Agriculture (2019) A. R. S. 2019. N. D. L. U. F. C. (FDC). FoodData Central. Available at: https://fdc.nal.usda.gov/fdc-app.html#/food-details/ 172436/nutrients. Accessed: 31 Jan 2020 Vatanparast M et al (2016) Transcriptome sequencing and marker development in winged bean (Psophocarpus tetragonolobus; Leguminosae). Sci Rep 6:1–14. https://doi.org/10.1038/srep29070 Verdcourt B, Halliday P (1978) A revision of Psophocarpus (Leguminosae-Papilionoideae-Phaseoleae). Kew Bull 33(2):191. https://doi.org/10.2307/4109575 Voss-Fels KP, Stahl A, Hickey LT (2019) Q&A: modern crop breeding for future food security. BMC Biol. http://doi.org/10.1186/s12915-019-0638-4 Wan Mohtar WAAQI et al (2014) Preparation of bioactive peptides with high angiotensin converting enzyme inhibitory activity from winged bean [Psophocarpus tetragonolobus (L.) DC.] seed. J Food Sci Technol 51(12):3658–3668. https://doi. org/10.1007/s13197-012-0919-1 Wati RK et al (2010) Trypsin inhibitor from 3 legume seeds: fractionation and proteolytic inhibition study.

N. Tsoutsoura et al. J Food Sci 75(3):C223–C228. https://doi.org/10.1111/ j.1750-3841.2010.01515.x Wong QN, Massawe F, Mayes S (2015) Improving winged bean (Psophocarpus tetragonolobus) productivity: an analysis of the determinants of productivity. Acta Hort 1102:83–88. https://doi.org/10.17660/ ActaHortic.2015.1102.9 Wong QN et al (2017) Development of gene-based SSR markers in winged bean (Psophocarpus tetragonolobus (L.) DC.) for diversity assessment. Genes 8(3):1– 12. https://doi.org/10.3390/genes8030100 Worthington RE, Hammons RO, Allison JR (1972) Varietal differences and seasonal effects on fatty acid composition and stability of oil from 82 peanut genotypes. J Agric Food Chem 20(3):729–730. https://doi.org/10.1021/jf60181a032 Yang J, Tan H (2011) Study on winged bean milk. In: ICAE 2011 Proceedings: 2011 International conference on new technology of agricultural engineering, pp 814–817. https://doi.org/10.1109/ICAE.2011. 5943916 Yang S, Grall A, Chapman MA (2018) Origin and diversification of winged bean (Psophocarpus tetragonolobus (L.) DC.), a multipurpose underutilized legume. Am J Bot 105(5):888–897. https://doi.org/10. 1002/ajb2.1093 Yoneyama T et al (1986) Variation in natural abundance of 15N among plant parts and in 15N/14N fractionation during N2 fixation in the legume-rhizobia symbiotic system. Plant Cell Physiol 27(5):791–799. Available at: https://academic.oup.com/pcp/article/27/ 5/791/1899683. Accessed: 23 Feb 2021 Zhao C et al (2017) Temperature increase reduces global yields of major crops in four independent estimates. Proc Nat Acad Sci USA 114(35):9326–9331. https:// doi.org/10.1073/pnas.1701762114

18 Castor Bean: Recent Progress in Understanding the Genome of This Underutilized Crop

Sammy Muraguri and Aizhong Liu

Abstract

Castor bean (Ricinus communis L.) remains underutilized worldwide owing to the limited number of plantable varieties, a shortage of genetically improved seeds, and planting restrictions in some countries because of a toxic protein in the seed. However, the availability of the castor bean genome has facilitated tremendous growth in castor genomic research. The genome has provided a framework for gene discovery using orthologs, not only in castor bean but also in other members of the Euphorbiaceae family. Through genome-wide association studies, candidate genes for agronomic traits have also been identified for trait improvement. In addition, genome resequencing at the population level is providing important resources for comparative genomic studies between wild and domesticated castor varieties, to detect variants crucial in unravelling the genetic basis of key agronomic traits and domestication history. Usually, variants are linked to specific phenotypes of the individuals selected. The castor bean genome resource will also be the backbone of further research on how physiological and molecular processes are controlled to shape the phenotypic plasticity of castor bean, supporting genetic improvement and breeding to turn castor bean from an underutilized into a utilized crop.

S. Muraguri
Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China

A. Liu (corresponding author)
Southwest Forestry University, Kunming, China
e-mail: [email protected]

18.1 Introduction

18.1.1 Botanical Description

Castor bean (Ricinus communis L.), a member of the Euphorbiaceae family, is a non-edible oilseed crop. Despite its name, it is not a true bean or legume. Taxonomically, castor bean is considered monotypic, although it is phenotypically highly variable. It can be either a perennial shrub (Fig. 18.1a) in tropical zones or an annual herb (Fig. 18.1b) in temperate zones, where it cannot survive the cold winter. Typically, wild castor bean is a perennial tree reaching 7–8 m, while cultivated castor bean often attains 1.5–3 m depending on the environment. With adequate moisture in fertile soils, cultivated plants may reach 3–4 m, but they attain less than 1 m in poor soils with low moisture content (Naik 2019).

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 M. A. Chapman (ed.), Underutilised Crop Genomes, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-00848-1_18


Fig. 18.1 (a) A wild perennial castor bean tree growing in Ethiopia; (b) a cultivated annual castor bean crop growing in a field

18.1.2 Morphological Features

Stem: The castor stem is cylindrical, glabrous, often brittle, and green or red in colour (Fig. 18.2). It is multi-branched, with a sequence of primary branches giving rise to secondary branches throughout the life of the plant. The stem is solid in giant perennial tree types but generally becomes hollow with age in dwarf types. It bears well-developed nodes, from each of which a leaf arises, with internodes shortening towards the base of the stem. The stem ends in an inflorescence at the 6th to 10th node in dwarf early cultivars, at the 8th to 16th node in later-maturing cultivars, and at the 40th node or higher in tall perennial trees. As the panicle develops, 2–3 sympodial branches grow from each node below it, and these in turn terminate in inflorescences. The amount of branching varies considerably (Purseglove 1968), depending on environmental factors (Kulkarni and Ramanamurthy 1977).

Leaf: The leaves have a long petiole and are stipulate and palmate, with five to eleven lobes and conspicuous veins on the surface (Fig. 18.3). They are glossy and alternate, except for the two opposite leaves at the node above the cotyledons. Leaf colour varies from light green to dark red depending on the level of anthocyanin pigmentation (Weiss 2000). Leaf size differs among cultivars, some bearing large leaves and others smaller ones. The leaf is approximately 15–45 cm long. The petiole is round, greenish to reddish, and about 8–50 cm long. The stipules are about 1–3 cm long and are broad, conspicuous and deciduous. The leaf blade is orbicular, 10–75 cm in diameter, and has 5–11 partite lobes (segments). Leaf growth and expansion are not affected by prolonged exposure to sunlight, provided there is enough moisture for transpiration. However, prolonged water scarcity restricts leaf growth and expansion (Weiss 2000).


Fig. 18.2 Castor bean stems of different colours: green (left) and red (right)

Fig. 18.3 Shape of castor bean leaf

Root: Castor has a well-established taproot with protruding laterals that form mat-like structures (Purseglove 1968). The taproot extends up to 5 m below the soil surface and looks somewhat like an extension of the stem below the ground. The secondary roots are about 90–120 cm long and do not penetrate more than 75 cm into the soil, since they spread parallel to the ground. The plant has short tertiary roots, 30–45 cm long. Root hairs are absent. During early growth, annuals show rapid root development, whereas in perennials it is slow. Perennials have longer lateral roots that penetrate deeper than those of annuals (Kulkarni and Ramanamurthy 1977).

Inflorescence: The main shoot and branches usually end in inflorescences. The inflorescence is upright, 10–40 cm long, and terminates in a panicle of cymes (Fig. 18.4). The panicle first emerges as a bud; the time it takes for the bud to become visible depends on the cultivar. The bud takes 8–15 days to develop into a panicle. The flowers are mostly unisexual and the plant is monoecious, with pistillate and staminate flowers borne on the same inflorescence. The staminate flowers occur on the lower part (base) in 3–16-flowered cymes, while the pistillate flowers occur on the upper part in 1–7-flowered cymes (Fig. 18.4), taking up about 50–70% and 30–50% of the inflorescence, respectively (Cobley 1956; Purseglove 1968).

Fig. 18.4 Inflorescence in castor bean. The female flowers are positioned on the upper part, while the male flowers are located on the lower part

Flower: The male flower has a pedicel 0.5–1.5 cm long and no petals. It has 3–5 spreading sepals, about 5–7 mm long, which are green and membranous. There are five stamens, consisting of well spread-out filaments and anthers that are tiny, round and pale yellow. The female flower also has a pedicel, but longer (4–5 mm long), with no petals. It has 3–5 sepals, about 3–5 mm long, which split irregularly and are also green. The ovary is large and covered with soft spines, each ending in a bristle that detaches as the fruit forms. It is trilocular, with one large ovule in each locule, and has 3 short styles and 3 stigmas (Cobley 1956; Purseglove 1968).

Fruit: The fruit is a globular, spiny capsule that becomes brittle and hard when ripe. It is a schizocarp about 1.5–2.5 cm in diameter, with 3 lobes. In wild castor and some cultivars, the lobes split into separate parts at maturity and then break away explosively, dispersing the seeds. Modern cultivars, on the other hand, are indehiscent, with seeds retained in the regma for many weeks. Some castor accessions have capsules with rudimentary, soft and flexible spines. The unripe fruit is green, or red in some cultivars, and turns brown when ripe (Fig. 18.5). The panicle may be cylindrical, conical or oval, with loose, semi-compact or compact fruit arrangements (Salihu et al. 2014).

Seed: Seeds may be oval, elongated or square. They have a tick-like appearance and are compressed at the back, shiny and mottled. The seeds may be brown, dark brownish-red, dark chocolate, red or black, but usually a mixture of colours occurs as a very attractive mottle on the testa (Fig. 18.6). The seed has a small and brittle testa encapsulating a white kernel. Seeds are about 0.5–1.5 cm long and weigh 9–100 mg. The seed coat differentiates into testa and tegmen; the testa is brittle and accounts for 20% of the seed weight, while the tegmen is thin and covered by the testa. The endosperm is large while the embryo is small.
The cotyledons are also thin but poisonous (Purseglove 1968; Salihu et al. 2014).

Fig. 18.5 Mature fruits of castor bean

Fig. 18.6 Castor bean seeds of different colour and appearance

Pollination: A large number of castor plants self-pollinate owing to their floral structure and the positioning of the flowers. For example, centrally located flowers of the panicle are 30–70% self-pollinated, whereas laterally located flowers of the panicle are cross-pollinated (Hegde and Lavanya 2015). Castor bean therefore has a mixed mating system. Environmental conditions and genotype determine the level of cross-pollination (Milani and Nobrega 2013). Natural outcrossing of 5–14% takes place in castor bean, which is considered high for crop improvement programmes; castor bean is therefore regarded as a highly cross-pollinated crop (Sundararaj and Thulasidas 1976). Although castor bean is usually described as a wind-pollinated plant, its flowers are open and glands on the young leaves attached to the sympodial branches under the inflorescence produce large amounts of nectar, which suggests that insects are probably also partly involved in pollination (Purseglove 1968). In large plantations of the crop, pollen can be dispersed 2.5–3.0 km from the originating plant (Hegde and Lavanya 2015).

18.1.3 Geographic Distribution

Castor bean is believed to be native to Eastern Africa but has become naturalized across the African continent, from West Africa to North Africa, all the way to South Africa and the islands of the Indian Ocean. An alternative view holds that castor bean has polyphyletic origins in Eastern Africa, India, the Arabian Peninsula, the Iran-Afghanistan region and Palestine (Moshkin 1986). In any case, castor bean was cultivated in ancient Egypt as early as 4000 BC and was documented in China during the Tang dynasty, between 618 and 906 AD. It was introduced into the New World shortly after Columbus (Purseglove 1968). The oldest record of castor bean in North America dates from 1818 in Illinois (Weibel 1948). It is now widely cultivated in the tropical and subtropical areas of Asia and America as well as in many temperate regions of Europe (Govaerts 2014).


Based on high-throughput SNP genotyping, worldwide collections of castor bean accessions from over 45 countries showed a mixture of genotypes and a lack of phylogeographic structure (Foster et al. 2010), confirming the non-geographical patterns of genetic relatedness previously found using amplified fragment length polymorphisms (AFLPs) and simple sequence repeats (SSRs) (Allan et al. 2008). A broad geographic distribution of chloroplast genotypes was also recorded by Rivarola et al. (2011). A recent study by Fan et al. (2019) did find a geographic pattern in Chinese castor accessions, which clustered into northern, middle and southern groups, but those accessions were not representative of a worldwide collection. According to Qiu et al. (2010), it is reasonable to assume that the castor bean landraces obtained from South or North America today were introduced from Africa or West Asia in early societies through anthropogenic activities. It may therefore be difficult to determine the phylogeographic distribution of castor bean at the molecular level, since most accessions were spread across continents through human activities.
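What "lack of phylogeographic structure" means can be illustrated with a small numerical check: if accessions from the same region are genetically no closer to one another than accessions from different regions, geography explains little of the genetic variation. The sketch below is only an illustration under stated assumptions. The accession names, region labels and genotype values are invented, the distance measure is a simple allele-count difference, and none of this reproduces the analyses of Foster et al. (2010) or Allan et al. (2008).

```python
from itertools import combinations

# Hypothetical toy data (invented for illustration only): each accession has a
# region label and genotypes at six SNPs, coded as the number of copies of the
# alternate allele (0, 1 or 2).
GENOTYPES = {
    "acc_A": ("Africa", [0, 1, 2, 0, 1, 2]),
    "acc_B": ("Africa", [2, 1, 0, 2, 1, 0]),
    "acc_C": ("Asia", [0, 1, 2, 0, 1, 1]),
    "acc_D": ("Asia", [2, 1, 0, 2, 0, 0]),
    "acc_E": ("Americas", [0, 2, 2, 0, 1, 2]),
    "acc_F": ("Americas", [2, 1, 0, 1, 1, 0]),
}


def genotype_distance(g1, g2):
    """Mean absolute difference in allele counts, scaled to the range 0-1."""
    return sum(abs(a - b) for a, b in zip(g1, g2)) / (2 * len(g1))


def within_between_means(genotypes):
    """Return (mean within-region distance, mean between-region distance)."""
    within, between = [], []
    for (_, (reg1, g1)), (_, (reg2, g2)) in combinations(genotypes.items(), 2):
        d = genotype_distance(g1, g2)
        # Sort each pairwise distance by whether the two accessions share a region.
        (within if reg1 == reg2 else between).append(d)
    return sum(within) / len(within), sum(between) / len(between)


within_mean, between_mean = within_between_means(GENOTYPES)
print(f"within-region mean distance:  {within_mean:.3f}")
print(f"between-region mean distance: {between_mean:.3f}")
```

In this invented data set, accessions from the same region are not more similar to each other than to accessions from elsewhere, which is the pattern expected when germplasm has been moved repeatedly between continents. A real analysis would use many more markers and a formal test, such as a permutation test on the same statistic.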

18.1.4 Accessions in Seedbanks

An accession is a sample of seeds or germplasm that uniquely identifies a cultivar or population and is preserved in a repository (seed or germplasm bank) for conservation and use. Most castor bean storage banks are concentrated in a few countries, which cumulatively hold about 17,995 accessions (Table 18.1). The earliest castor collection can be traced back to the former USSR, between 1773 and 1976. Russia’s Vavilov Institute of Plant Industry (VIR), Research Institute of Oil Crops (VNIIMK) and Botanical Institute of the Academy of Science of the USSR (BIN) were the oldest institutes involved in the initial collection of castor germplasm in the nineteenth century. BIN and VIR now hold more than 400 samples of castor germplasm collected worldwide. India holds the largest collection, 4307 accessions at the Germplasm Maintenance Unit of the Directorate of Oilseed Research, of which 365 are exotic collections from 39 countries (Anjani 2012), while the USDA-ARS Plant Genetic Resources Conservation Unit at Griffin, GA, holds the most geographically diverse collection, with castor germplasm from 51 countries. In Ethiopia, a country believed to be the centre of origin of castor bean, very few collections are recorded despite the widespread indigenous diversity of castor in the wild. There is no consolidated record of the precise status of the castor germplasm collections conserved in the world. According to Severino et al. (2012), 6588 castor accessions are stored in different genebanks worldwide. FAO (2010), on the other hand, reports 17,995 castor accessions conserved globally, a figure that may include duplicates; therefore, assessing the

Table 18.1 Major castor bean germplasm collections in seedbanks across the globe (FAO 2010)

Country | Repository | Number of accessions
India | National Bureau of Plant Genetic Resources (NBPGR) | 4307
China | Institute of Crop Germplasm Resources, Chinese Academy of Agricultural Sciences (ICGR-CAAS) | 2111
USA | United States Department of Agriculture-Agricultural Research Service | 1390
Brazil | Centro Nacional de Pesquisa de Algodao (CNPA) | 1000
Russia | Vavilov Institute of Plant Industry (VIR) | 696
Ethiopia | Institute of Biodiversity Conservation (IBC) | 510
Others | 52 institutes | 8699

18

Castor Bean: Recent Progress in Understanding the Genome …

number of unique accessions worldwide is a tall order. Despite the available germplasm resources, castor bean has been poorly characterized and remains largely untapped for genetic improvement (Anjani 2012). Use of the available global resources by the international community could, however, be improved through coordinated characterization of accessions, unified records of available resources, free access to genebank information, and standardized collection protocols among repositories. Such improvements would permit accurate estimation of the genetic diversity in collections, unconfounded by the exchange of accessions between countries.

18.1.5 Usage

Castor oil is rich in hydroxylated fatty acids, which give it high viscosity and low freezing and melting points, and it therefore has many industrial uses (Severino et al. 2012). Through the synthesis of acyloxy castor polyol esters, castor oil has been used to produce lubricant base stocks with a low pour point (Kamalakar et al. 2015). A low pour point helps to provide full lubrication when a machine is started and makes the lubricant easy to handle at cold temperatures (Heinz 2009). In addition to its use as a car engine lubricant, a modified castor oil lubricant made of 100 parts of castor oil and 20–110 parts of a stable, low-viscosity blending fluid soluble in castor oil showed good lubricity in refrigeration systems (Gainer and Luck 1979). In the past, castor oil was also used as a vehicle brake fluid (Rudnick 2013).

Renewable monomers and polymers are synthesized from castor oil and its derivatives (Mutlu and Meier 2010). Castor oil has been polymerized and cross-linked with sulphur or diisocyanates to make vulcanized and urethane derivatives, respectively. Furthermore, castor oil-based polyurethane (PU) and epoxy have been used to make full interpenetrating polymer networks (IPNs) through sequential-mode synthesis (Raymond and Bui 1998). An IPN is a unique polymer type consisting of a blend of two polymers in which one is produced or polymerized in the presence of the other (Ajithkumar et al. 2000). IPN formulation is therefore an important method for developing products whose physicomechanical properties surpass those of conventional polyblends. One application of castor oil polymer (COP) is as a root-end filling material with sealing ability in dentistry.

Castor oil has also been used to make soap and waxes (Budai et al. 2012). Dwivedi and Sapre (2002) used castor oil to produce a total vegetable oil grease, that is, a grease in which both the gellant and the lubricant are made from vegetable oil. Coatings and paints are also made from castor oil. Dehydrated castor oil, through nonconjugated oil–maleic anhydride adducts, can be used in paint and furniture oil applications (Grummitt and Marsh 1953). Conversion of the hydroxyl functionalities of castor oil to β-ketoesters using t-butyl acetoacetate also yields coatings. Advanced surface coating materials are likewise produced from castor oil-based hyperbranched polyurethanes (HBPUs), which are highly branched macromolecules (Thakur and Karak 2013). HBPUs perform well as surface coating materials, with monoglyceride-based HBPU showing greater tensile strength than direct oil-based coatings.

The world is also seeking to reduce greenhouse gas emissions by decreasing its dependency on petroleum. Castor seed is a feasible feedstock for biodiesel production because of its high energy content and lubricity (Berman et al. 2011). Castor oil contains approximately 85% ricinoleic acid (12-hydroxy oleic acid), which makes it soluble in alcohol. It can therefore be converted into biodiesel without external heat, unlike other vegetable oils. In addition, it has 5% more oxygen, low residual levels of carbon and phosphorus, a high cetane number and no aromatic hydrocarbons (Scholz and Silva 2008), making it well suited to biodiesel production. The many uses of castor oil have increased demand for castor bean in many countries, prompting the need to breed and genetically improve castor bean (Sujatha et al. 2008).

Despite the myriad industrial uses of castor oil, castor bean seeds contain an extremely toxic protein, ricin, which greatly hampers their use. Ricin is a water-soluble type 2 ribosome-inactivating protein (RIP) that blocks protein synthesis by inactivating ribosomes. It is made up of two subunits connected by a disulphide bond: a ricin toxin A (RTA) chain, which has the ribosome-inactivating activity, and a ricin toxin B (RTB) chain, which has a galactose-binding lectin domain. RTB attaches to cell surface receptors to enable entry of the toxin into the cell. Once internalized, RTA acts on the 60S ribosomal subunit to prevent the binding of elongation factors EF-1 and EF-2, blocking protein synthesis and causing cell death (Robertus 1991). This makes ricin one of the most lethal natural poisons, whether ingested, administered intravenously, or inhaled as fine particles. Such toxicity and the relative ease of production have made ricin a potential bioweapon. The infamous “Umbrella Murder” of a Bulgarian, Georgi Markov, on 7 September 1978 (Crompton and Gall 1980) is an example in which ricin was weaponized. Since then, other incidents of bioterror activity using ricin have been reported (Roxas-Duncan and Smith 2011). For this reason, countries like the USA tend to limit production of castor oil. A better understanding of castor genomics will go a long way towards aiding the development of less toxic varieties, and towards improving public safety by tracing the origins of samples used in potential bioterror attacks.

Nevertheless, castor bean remains an underutilized crop because of limited breeding and restricted planting in many countries. Firstly, few varieties suitable for planting are available, although castor bean is cultivated in many regions; secondly, genetic improvement of varieties has been very limited, although germplasm has been widely collected by many agencies; thirdly, planting of castor bean is restricted in some countries because its seed contains a toxic protein.


18.2 Characterization of Castor Bean Genome

18.2.1 Sequencing and Characterization

Whole-genome shotgun (WGS) sequencing: In the Euphorbiaceae family, which includes economically important crops such as physic nut (Jatropha curcas), cassava (Manihot esculenta) and rubber tree (Hevea brasiliensis), castor bean was the first member to be sequenced using the whole-genome shotgun (WGS) sequencing strategy (Chan et al. 2010). Before the advent of next-generation sequencing platforms, WGS and clone-by-clone strategies were the traditional approaches for sequencing large, complex genomes. WGS involved shearing nuclear DNA (using restriction enzymes or mechanical shearing) and modifying the fragments with restriction site adaptors. The products were then cloned into vectors (plasmids) that were sequenced directly. The sequencing chemistry was the dideoxy chain termination method (Sanger et al. 1977), which requires a mixture of single-stranded DNA, DNA polymerase, the four deoxynucleotide triphosphates (dNTPs) and fluorescently labelled dideoxynucleotide triphosphates (ddNTPs). DNA polymerase synthesizes the complementary strand from dNTPs, but extension terminates once a ddNTP is incorporated (Metzker 2005). The resulting reads are then assembled into contigs on the basis of their overlaps, using computer algorithms, to reconstruct the genome sequence. By contrast, the clone-by-clone approach requires a large-insert library, such as a bacterial artificial chromosome (BAC) library (Shizuya et al. 1992), which typically holds inserts of about 100–200 kb. A physical map is then generated to identify partially overlapping BAC clones along the chromosome arms. This is achieved by fingerprinting the clones in the library, digesting each clone with restriction endonucleases. The BAC clone fingerprints are then aligned to identify a minimal set of overlapping clones expected to span the whole genome (Soderlund et al. 1997). WGS bypasses the physical mapping step, making the process faster and less expensive, which has encouraged its application.

According to Chan et al. (2010), WGS generated a draft genome sequence of the castor bean cultivar Hale spanning 350 Mb, broadly consistent with the previous flow cytometry estimate of 320 Mb (Arumuganathan and Earle 1991). Plasmid and fosmid libraries generated over 2 million dideoxy termination reads, which were assembled using the Celera Assembler software (Myers et al. 2000) into contigs that were then linked to produce about 25,800 scaffolds. The genome coverage was approximately 4.6-fold. The largest scaffold covered 4.7 Mb, and the scaffolds were built from 54,000 contigs, the longest of which was 190 kb. Considering only scaffolds longer than 2 kb, the assembly comprised 3500 scaffolds spanning 325 Mb. The scaffold N50 of this subset was 0.56 Mb, meaning that 50% of the assembled bases were present in scaffolds of 0.56 Mb or longer. GC content was about 32.5% of the total nucleotides in the genome. The average gene length is 2258 bp, with average exon and intron lengths of 251 bp and 381 bp, respectively. More than 50% of the genome consists of repetitive DNA; a third of these repeats are retrotransposons, while less than 2% are DNA transposons. Among the retrotransposons, long terminal repeat elements are the most abundant, of which 22.7% are Gypsy-type and 9.5% Copia-type (Chan 2019).

Protein-coding genes were annotated using multiple gene-prediction programs, sequence similarity searches against rice and Arabidopsis protein databases, and the cDNA spliced-alignment tool Program to Assemble Spliced Alignments (PASA) (Haas et al. 2003). In addition, 52,165 expressed sequence tags (ESTs) from five non-normalized cDNA libraries were generated to assist the annotation process. Together with other castor bean cDNA sequences from GenBank, the ESTs were aligned to 5491 known genes and 688 genomic regions not previously known to harbour genes. This permitted the construction of additional gene models. To generate consensus gene models from the predicted genes and the alignments of cDNAs, ESTs and proteins, the EvidenceModeler tool (Haas et al. 2008) was applied. In total, 31,237 gene models were obtained, of which 58.5% could be clustered into 3020 protein families using the TIGR paralogous family pipeline. Table 18.2 summarizes the assembly and gene annotation of the castor bean genome.

PacBio sequencing: The PacBio platform uses SMRT cells (Eid et al. 2009; Rhoads and Au 2015; Ardui et al. 2018), each made up of a large number of sequencing units called zero-mode waveguides (ZMWs). Each unit contains a single DNA polymerase molecule immobilized at the bottom, to which a single DNA molecule binds and starts replication by the incorporation of fluorescently labelled nucleotides. During incorporation, each of the four nucleotides produces a distinct light pulse. A “movie” of light pulses is recorded for each ZMW and interpreted to obtain base calls. Because PacBio technology produces long, contiguous reads and deep sequencing improves the consensus error rate, whole-genome sequencing and assembly have been attempted using PacBio data exclusively. The castor bean wild accession Rc039, collected from Ethiopia, was recently sequenced using PacBio technology, generating ~36.5 Gb of reads with 102-fold genome coverage. Combined with Hi-C sequencing data (~49.2 Gb of reads), these reads were assembled into a high-quality chromosome-scale genome (~336 Mb) with a contig N50 of 11.59 Mb and a scaffold N50 of 32.06 Mb (see Table 18.3 and Fig. 18.7; Xu et al. 2021). Approximately 97.4% (~328 Mb) of the assembled sequence was anchored onto 10 pseudochromosomes, which were further validated by a physical map (Fig. 18.7). Based on this newly assembled genome, approximately 53.9% of the wild castor bean genome is composed of repetitive elements (Table 18.3), comparable to the proportion in the inbred cultivar Hale (52.2% of the genome) (Chan et al. 2010).
The long terminal repeat (LTR) retrotransposons were the most abundant, making up 26.02% of the genome, with LTR/Gypsy elements making up more than half of the repetitive elements. In total, 25,826 protein-coding genes, 40,966

Table 18.2 Genome summary of the castor bean draft genome assembly and gene annotation (Chan et al. 2010)

Statistic | All scaffolds | Scaffolds longer than 2 kb
Fold genome coverage | 4.59 | 4.59
Number of scaffolds | 25,828 | 3500
Total span | 350.6 Mb | 325.5 Mb
N50 (scaffolds) | 496.5 kb | 561.4 kb
Largest scaffold | 4.7 Mb | 4.7 Mb
Average scaffold length | 14 kb | 93 kb
Number of contigs | 54,000 | 24,500
Largest contig | 190 kb | 190 kb
Average contig length | 6 kb | 13 kb

Statistic | Value
N50 (contigs) | 21.1 kb
GC content | 32.5%
Gene models | 31,237
Gene density | 11,220 bp/gene
Mean gene length | 2258.6 bp
Mean coding sequence length | 1004.2 bp
Longest gene | 15,849 bp
Mean number of exons per gene | 4.2
Mean exon length | 251 bp
Longest exon | 6590 bp
GC content in exons | 44.5%
Mean intron length | 381 bp
Longest intron | 33,291 bp
GC content in introns | 31.8%
Mean intergenic region length | 6846 bp
Longest intergenic region | 691,597 bp
GC content in intergenic regions | 30.7%

Table 18.3 Genome summary of the wild castor bean with assembly and gene annotation (Xu et al. 2021)

Statistic | Value
Genome assembly |
Genome size | 336 Mb
N50 of contig | 11.59 Mb
N50 of scaffold | 32.06 Mb
GC content | 33.21%
Chromosome number | 10
Genome completeness (complete BUSCOs) | 98%
Number of genes | 25,826
Percentage of repetitive sequence | 53.89%
Number of noncoding RNAs | 3180
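
The N50 values reported in Tables 18.2 and 18.3 follow from a list of contig or scaffold lengths. The snippet below is a minimal Python sketch of the usual definition: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the assembled bases. The function name and the toy lengths are hypothetical and are not castor bean data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    The N50 is the length L such that sequences of length >= L
    together contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)   # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (lengths in kb), for illustration only.
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # 70: the two longest scaffolds cover 150 of 300 kb
```

In practice the lengths would be read from the assembly file. A larger N50 indicates a more contiguous assembly, which is why the contig N50 of 11.59 Mb in Table 18.3 represents a marked improvement over the 21.1 kb reported in Table 18.2.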


Fig. 18.7 Genomic physical map assembled onto 10 pseudochromosomes for the wild progenitor. Taken from Xu et al. (2021) under a Creative Commons license

transcripts and 3180 noncoding RNAs were annotated in this genome. Over 92% of the predicted genes showed homology to genes with known functional annotation in public databases.

Genome resequencing: In genome resequencing, an organism’s genome is sequenced and the reads are assembled using an existing reference genome as the template, with the objective of identifying variants between genomes of the same species. Comparison between each sequenced genome and the reference yields an array of mutations specific to each individual, in the form of single nucleotide polymorphisms (SNPs) and insertions/deletions (InDels). Such variants provide crucial insight into the genetic history of the individuals and are usually related to the specific phenotypes of the individuals selected. Large rearrangements such as inversions, translocations and large copy number variations can also be detected through resequencing. The Illumina platform is the most commonly used high-throughput technology for resequencing because of its low cost and high accuracy in base calling. It is known as a short-read approach because it generates short reads of about 150–300 bp. The Illumina mechanism entails sequencing-by-synthesis reactions in which single DNA molecules attached to the solid surface of a flow cell are PCR amplified to produce distinct clusters, a process called bridge amplification. The clusters are then sequenced simultaneously through recurrent flow cycles of the four nucleotides carrying reversible dye terminators. Illumina HiSeq attains a read length of 150 bp, while the MiSeq attains a read length of 300 bp.

Xu et al. (2019) resequenced three castor accessions and analysed their genomic variation to shed light on the genetic variation and potential targets of selection during the domestication of castor bean. One wild-type line from Kenya (WT001), one landrace from China (ZB107) and one inbred cultivar from China (ZB306) were used for the study. Relative to the reference genome (Hale cultivar), the cultivated line ZB306 showed the lowest SNP density, about one SNP per 727 bp, while the WT001 and ZB107 lines contained one SNP per 442 and 399 bp, respectively. Collectively, the SNP density was 3.72 per kb, greater than that reported in poplar (2.6 per kb; Tuskan et al. 2006) and bamboo (1.0 per kb; Peng et al. 2013). Further comparison of the two cultivars (Hale and ZB306) with the wild line (WT001) revealed 722,358 and 933,549 homozygous SNPs, and 80,564 and 80,438 InDels, respectively. By contrast, comparison between the landrace ZB107 and the wild line detected only 197,732 homozygous SNPs and 23,421 InDels. Additionally, relative to the Hale genome sequence, the numbers of heterozygous SNPs in the wild (150,120) and landrace (149,444) genomes were considerably higher than in ZB306 (79,091). Xu et al. (2019) also examined copy number variations (CNVs) and discovered 1776 CNVs present in at least one of the lines studied relative to Hale. Of these CNVs, 1129 were in the wild germplasm (WT001), 1065 in ZB107, and 699 in ZB306, with roughly equal numbers of CNV gains and losses relative to cultivar Hale. Furthermore, 438 genes that could be functionally altered by the CNVs were discovered. Overall, the polymorphism analysis showed that the greatest share of polymorphism (21–44%) occurs between the wild (WT001) and landrace (ZB107) accessions, an indication that wide genetic variation remains unexploited in cultivated castor bean accessions.

Genome resequencing of 405 accessions (26 wild accessions collected from East Africa and 379 cultivated accessions collected from China) was conducted by Fan et al. (2019). This study identified low genetic diversity in the cultivated lines and distinct genetic differentiation between wild and cultivated accessions. Genome resequencing of 505 accessions (56 wild accessions from an Ethiopian population, 126 wild accessions from a Kenyan population and 323 domesticated accessions collected worldwide) was conducted by Xu et al. (2021). This study not only confirmed the genetic differentiation between wild and cultivated lines, but also highlighted the genetic variation among wild accessions. In particular, it shed light on the domestication origin and worldwide dispersal of castor bean. Muraguri et al. (2020) resequenced the plastomes of wild and cultivated castor bean lines and identified 162 chloroplast SNPs, 92 InDels and an inverted repeat (IR) contraction (a structural variation) among accessions. This study further demonstrated the genetic variation and phylogenetic relationships among wild and cultivated accessions.
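
The SNP densities from Xu et al. (2019) are quoted in two forms in this section: as an average spacing (one SNP per N bp) and as a rate (SNPs per kb). The following minimal Python sketch converts the per-line spacings reported above into SNPs per kb so that the two forms can be compared directly. The function and variable names are illustrative only.

```python
def snps_per_kb(bp_per_snp):
    """Convert an average spacing (bp per SNP) into a density (SNPs per kb)."""
    if bp_per_snp <= 0:
        raise ValueError("spacing must be positive")
    return 1000.0 / bp_per_snp


# Average spacing relative to the Hale reference, as quoted in the text.
spacing_bp = {"ZB306": 727, "WT001": 442, "ZB107": 399}

density = {line: round(snps_per_kb(bp), 2) for line, bp in spacing_bp.items()}
for line, value in density.items():
    print(f"{line}: {value} SNPs per kb")
```

On this scale the cultivar ZB306 carries roughly 1.4 SNPs per kb against about 2.3 and 2.5 for the wild line and the landrace. The collective figure of 3.72 per kb is a separate value reported in the source and is not derived from these three numbers here.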

18.2.2 Castor Bean Genome Comparison to Other Crops

Genome duplication occurs when extra copies of a genome are produced owing to nondisjunction errors during meiosis. Studies of genome duplication can uncover the evolutionary history of plants. According to Chan et al. (2010), triplicated regions were found in the castor bean genome, suggesting an ancient hexaploidization event. Comparative genomic studies showed that the diploid castor bean genome did not undergo additional whole-genome duplications (WGDs), whereas the poplar and Arabidopsis thaliana genomes underwent one and two additional duplication events, respectively. Cassava (2n = 36) and rubber tree (2n = 36) appear to share a paleotetraploidy event, as their identical chromosome numbers suggest (De Carvalho and Guerra 2002). Both species have numerous homologous gene pairs that diverged about 10 million years before speciation occurred between them (Bredeson et al. 2016).

The castor bean genome is relatively small compared with those of some other members of the Euphorbiaceae family. For example, the rubber tree and cassava genomes are 1.34 Gb (Tang et al. 2016) and 432 Mb (Wang et al. 2014), respectively, both larger than the 350 Mb castor bean genome (Chan et al. 2010). In contrast, physic nut, a non-edible oilseed crop, has a smaller genome of 320 Mb (Wu et al. 2015). In terms of assembly contiguity, the castor bean scaffold N50 of 496.5 kb surpasses only that of cassava (43 kb); rubber tree and physic nut have longer scaffold N50 lengths of 1.3 Mb and 746 kb, respectively. During genome annotation, 25,826 protein-coding genes were identified in castor bean and 27,619 genes were predicted in physic nut (Ha et al. 2019). However, in rice (389 Mb; International Rice Genome Sequencing Project and Sasaki 2005), a crop with a genome size similar to that of castor bean, 37,544 gene models were predicted, more than in castor bean.

Based on gene models, the castor bean genome shares orthologous gene groups with physic nut, cassava, rubber tree, black cottonwood (Populus trichocarpa) and flax (Linum usitatissimum), all of which belong to the order Malpighiales. A Bayesian phylogenetic tree based on 42 orthologous genes indicated that castor bean diverged 54.2 million years ago (mya), the earliest among the Euphorbiaceae family members, followed by physic nut (54.0 mya), and cassava and rubber tree (35.7 mya; Ha et al. 2019). Nonetheless, transcriptomic data showed that pathways such as fatty acid (FA) elongation, phospholipid signalling and triacylglycerol biosynthesis that are enriched in physic nut (Ha et al. 2019) are also enriched in castor bean (Chan et al. 2010), oil palm (Singh et al. 2013) and soybean (Li et al. 2014).

18.2.3 Gene Discovery in Castor Bean Using Orthologs from Other Species

Orthologs are genes in different species that arose through speciation. They generally retain similar functions in different species (Tatusov et al. 1997). Paralogs, on the other hand, are genes within the same species that arose through gene duplication and may have different functions. At the molecular level, many features are largely shared across species, and many proteins and genes have been found to possess orthologs in distantly related species (Mushegian 2010). The inference of orthologs therefore plays a key role in gene discovery and in predicting gene function in newly annotated genomes. In castor bean, many gene families have been discovered through the use of orthologs from different species. Some examples are given below.

Aquaporins (AQPs) are a group of vital membrane proteins that enable the passive movement of water and other molecules (glycerol, carbon dioxide, boric acid, urea, ammonia and hydrogen peroxide) across cell membranes (Gomes et al. 2009). According to Zou et al. (2015), homology analysis identified 37 castor bean AQP genes, 30 of which had orthologs in poplar and only 29 of which had orthologs in Arabidopsis.

WRKY transcription factors are involved in regulating plant metabolism, growth, development and responses to biotic and abiotic stress (Rushton et al. 2010). A search of the castor bean genome using WRKY orthologs from physic nut and Arabidopsis identified 58 WRKY family genes across 41 scaffolds (Zou et al. 2016). This number is comparable to the 58 present in physic nut and the 59 in grapevine (Xiong et al. 2013; Guo et al. 2014). However, Arabidopsis and poplar have more WRKY family members (72 and 105, respectively; Eulgem et al. 2000; He et al. 2012).

bZIP transcription factors are broadly involved in regulating plant growth, development and responses to abiotic stress. Jin et al. (2013) characterized the bZIP family genome-wide and identified fewer bZIP members in castor bean than in other taxa in the Euphorbiaceae, such as Jatropha curcas (Wang et al. 2021a, b) and Manihot esculenta (Hu et al. 2015). The smaller number of bZIP transcription factors was considered to be related to the smaller genome size of castor bean.

Papain-like cysteine proteases (PLCPs) are proteolytic enzymes involved in protein storage, recruitment, seed germination and stress response (Van der Hoorn 2008). Using orthologs from Arabidopsis, 23 PLCP genes have been identified in castor bean (Zou et al. 2018). New groups or subgroups absent in Arabidopsis were discovered in the RD21, CEP, XBCP3 and SAG12 subfamilies.

DNA binding with one finger (Dof) transcription factors, named for the presence of the Dof domain, regulate genes involved in numerous plant biological processes, such as flowering control, development of the vascular system, light-mediated control, flower abscission, germination, development of pollen, endosperm and seed, as well as carbon/nitrogen metabolism and hormonal responses (Noguero et al. 2013; Gupta et al. 2015). Putative Dof genes were discovered in castor bean based on their orthologs in Arabidopsis and rice, whose functions have already been determined (Jin et al. 2014; Zou and Zhang 2019).

The Nuclear Factor-Y genes are critical in the regulation of seed development. Wang et al. (2018) identified and characterized the Nuclear Factor-Y family in castor bean and discussed its functions in the regulation of castor bean seed development.
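
The ortholog-based gene discovery described in this section depends on deciding which gene in one species corresponds to which gene in another. One common heuristic for this is the reciprocal best hit: two genes are treated as putative orthologs when each is the other's highest-scoring match. The Python sketch below illustrates that heuristic only; it is an assumption for illustration and not necessarily the procedure used in the studies cited above. The gene identifiers and similarity scores are hypothetical.

```python
def best_hits(scores):
    """Map each query gene to its single highest-scoring subject gene.

    scores: dict mapping (query, subject) pairs to a similarity score,
            for example an alignment bit score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Hypothetical scores: castor bean (Rc) genes searched against Arabidopsis (At)
# genes, and the reverse search.
castor_vs_arabidopsis = {
    ("Rc_gene1", "At_gene1"): 950,
    ("Rc_gene1", "At_gene2"): 300,
    ("Rc_gene2", "At_gene1"): 900,
    ("Rc_gene2", "At_gene2"): 200,
    ("Rc_gene3", "At_gene2"): 700,
}
arabidopsis_vs_castor = {
    ("At_gene1", "Rc_gene1"): 940,
    ("At_gene1", "Rc_gene2"): 910,
    ("At_gene2", "Rc_gene3"): 690,
    ("At_gene2", "Rc_gene1"): 310,
}

pairs = reciprocal_best_hits(castor_vs_arabidopsis, arabidopsis_vs_castor)
print(pairs)  # [('Rc_gene1', 'At_gene1'), ('Rc_gene3', 'At_gene2')]
```

In this toy example Rc_gene2 is left unpaired because its best Arabidopsis match prefers Rc_gene1, which is the pattern expected for a paralog. Published family-level analyses typically add domain searches and phylogenetic trees on top of any such first pass.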

18.2.4 Identification of Candidate Genes for Agronomic Traits

Identification of candidate genes for agronomic traits is a prerequisite for genetic improvement and breeding. QTL mapping and genome-wide association studies (GWAS) are powerful approaches for detecting the candidate genes underlying a given agronomic trait. In recent years, several QTLs and candidate genes have been identified in castor bean by QTL mapping or GWAS analyses.

Seed size and weight directly determine production levels in oilseed crops. Based on the construction of a recombinant inbred line (RIL) population and genome resequencing, 16 QTLs that control seed size and weight, covering 851 candidate genes, were identified (Yu et al. 2019). A gene (LOC8275756) that is a homolog of Arabidopsis MEDIATOR 3 and is responsible for controlling seed weight in castor bean was also identified (Fan et al. 2019). Capsule dehiscence is a trait that significantly affects yield in castor bean. Based on a GWAS analysis, the genes LOC8272207 and LOC8272215, related to dehiscence and endocarp thickness, were identified (Fan et al. 2019).

Castor bean is a typical woody oilseed crop, and dwarfing is an important breeding objective because dwarf varieties often not only bring higher yields but also allow compact planting and convenient management in the field. Based on bulked segregant analysis of an F2 population generated by crossing a tall and a dwarf accession, 29 candidate genes tightly associated with plant height were identified. In particular, the candidate gene Rc5NG4-1 (29822.t000050), which encodes a putative IAA transport protein localized in the tonoplast, was functionally demonstrated to be a target gene controlling castor plant height (Wang et al. 2021a, b).

Panicle height is a yield-linked trait that is associated with branch and seed numbers in castor bean. Fan et al. (2019) highlighted 11 potential candidate genes associated with panicle height. As the apical meristem develops into a panicle, the axillary buds present in lateral organs of the stem quickly transform into new meristems to replace it, and the process repeats itself. The candidate gene LOF2 (LOC8281822) could be linked to the regulation of panicle height through the control of axillary bud development in castor bean (Fan et al. 2019). However, because castor bean lacks an efficient transformation system, the candidate genes identified for each trait still need to be functionally confirmed in future studies.
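
Bulked segregant analysis, as used above for plant height, compares allele frequencies between two pools of F2 plants with contrasting phenotypes. One widely used statistic is the difference in SNP-index between the pools, sketched below in Python. The source does not state which statistic the cited study applied, so this choice, the threshold of 0.6, the function names and the read counts are all assumptions for illustration.

```python
def snp_index(alt_reads, total_reads):
    """Fraction of reads at a site that carry the non-reference allele."""
    if total_reads <= 0:
        raise ValueError("read depth must be positive")
    return alt_reads / total_reads


def delta_snp_index(sites, threshold=0.6):
    """Return (position, delta) for sites whose pools differ strongly.

    sites: iterable of (position, dwarf_alt, dwarf_total, tall_alt, tall_total)
    threshold: hypothetical cut-off on the absolute difference in SNP-index
    """
    hits = []
    for position, d_alt, d_total, t_alt, t_total in sites:
        delta = snp_index(d_alt, d_total) - snp_index(t_alt, t_total)
        if abs(delta) >= threshold:
            hits.append((position, round(delta, 2)))
    return hits


# Hypothetical read counts at three SNP sites in the dwarf and tall bulks.
toy_sites = [
    (1000, 10, 20, 11, 20),  # similar allele frequency in both bulks
    (2000, 19, 20, 3, 20),   # alternate allele enriched in the dwarf bulk
    (3000, 12, 24, 12, 24),  # identical allele frequency in both bulks
]

print(delta_snp_index(toy_sites))  # [(2000, 0.8)]
```

Sites unlinked to the trait segregate similarly in both pools and give a difference near zero, whereas sites close to a causal locus diverge. Real analyses generally average the statistic over sliding windows and compare it with simulated confidence intervals before candidate genes are nominated.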

18.3 Future Goals and Prospects

The availability of a draft genome sequence has opened up the possibility of multiple studies in castor bean and other species in the Euphorbiaceae (spurge) family. In particular, the genome is facilitating molecular breeding of castor bean with the aim of providing varieties with high oil or seed yield for commercial production. The genome resource is spearheading changes in castor breeding by supporting studies of the relationship between genotype and phenotype through quantitative trait locus (QTL) analyses as well as GWAS. For example, Wang et al. (2021a, b) identified 29 candidate genes potentially controlling plant height, Yu et al. (2019) identified 16 QTLs regulating seed size and weight, and Fan et al. (2019) identified candidate genes associated with panicle height, dehiscence and endocarp thickness. These studies will aid future breeding programmes.

Furthermore, the castor genome is enabling the resequencing of different accessions (Xu et al. 2019; Muraguri et al. 2020) with the aim of understanding the genetic variants crucial for domestication, genome diversity and population structure, providing a platform for breeding and trait improvement. Although the genetic diversity of cultivated castor bean is relatively low, the wild progenitors, which harbour richer genetic diversity, could provide genetic resources for castor bean breeding (Xu et al. 2021). Continued exploration of wild castor accessions from Africa, combined with genomic approaches, will enhance the discovery of novel traits and alleles for increased oil production and reduced ricin content in castor bean. It is especially important to establish an efficient transformation system for castor bean in order to determine the exact functions of the candidate genes identified for key agronomic traits, thereby supporting the genetic improvement and breeding needed to turn castor bean from an underutilized into a utilized crop.

In addition, the genome resource is facilitating the determination of the epigenetic landscape of DNA and histone methylation in the genome, as well as their functions in gene regulation in castor bean. Using the castor reference genome, Han et al. (2020) determined that the gain or loss of histone methylation was closely linked to the activation or repression of gene expression in response to salt stress. More research on epigenetic modifications will provide more knowledge of how physiological processes are controlled at the molecular level and will determine whether epigenetics plays a role in the phenotypic plasticity of castor bean.

References

Ajithkumar S, Patel NK, Kansara S (2000) Sorption and diffusion of organic solvents through interpenetrating polymer networks (IPNs) based on polyurethane and unsaturated polyester. Eur Polym J 36(11):2387–2393. https://doi.org/10.1016/S0014-3057(00)00025-2
Allan G, Williams A, Rabinowicz PD, Chan AP, Ravel J, Keim P (2008) Worldwide genotyping of castor bean germplasm (Ricinus communis L.) using AFLPs and SSRs. Genet Resour Crop Evol 55:365–378. https://doi.org/10.1007/s10722-007-9244-3
Anjani K (2012) Castor genetic resources: a primary gene pool for exploitation. Ind Crops Prod 35(1):1–14. https://doi.org/10.1016/j.indcrop.2011.06.011
Ardui S, Ameur A, Vermeesch JR, Hestand MS (2018) Single molecule real-time (SMRT) sequencing comes of age: applications and utilities for medical diagnostics. Nucleic Acids Res 46:2159–2168. https://doi.org/10.1093/nar/gky066
Arumuganathan K, Earle ED (1991) Nuclear DNA content of some important plant species. Plant Mol Biol Rep 9:208–218. https://doi.org/10.1007/BF02672069
Berman P, Nisri S, Wiesman Z (2011) Castor oil biodiesel and its blends as alternative fuel. Biomass Bioenergy 35:2861–2866. https://doi.org/10.1016/j.biombioe.2011.03.024
Bredeson JV, Lyons JB, Prochnik SE, Wu GA, Ha CM, Edsinger-Gonzales E, Grimwood J, Schmutz J, Rabbi IY, Egesi C, Nauluvula P, Lebot V, Ndunguru J, Mkamilo G, Bart RS, Setter TL, Gleadow RM, Kulakow P, Ferguson ME, Rounsley S, Rokhsar DS (2016) Sequencing wild and cultivated cassava and related species reveals extensive interspecific hybridization and genetic diversity. Nat Biotechnol 34:562–570. https://doi.org/10.1038/nbt.3535
Budai L, Antal I, Klebovich I, Budai M (2012) Natural oils and waxes: studies on stick bases. J Cosmet Sci 63(2):93–101
Chan A (2019) Genome sequence of castor bean. In: Kole C, Rabinowicz P (eds) The castor bean genome, compendium of plant genomes. Springer Nature, Cham, pp 115–133. https://doi.org/10.1007/978-3-319-97280-0_7
Chan A, Crabtree J, Zhao Q, Lorenzi H, Orvis J, Puiu D, Melake-Berhan A, Jones KM, Redman J, Chen G, Cahoon EB, Gedil M, Stanke M, Haas BJ, Wortman JR, Fraser-Liggett CM, Ravel J, Rabinowicz PD (2010) Draft genome sequence of the oilseed species Ricinus communis. Nat Biotechnol 28:951–956. https://doi.org/10.1038/nbt.1674
Cobley LS (1956) An introduction to the botany of tropical crops. Orient Longmans, Calcutta
Crompton R, Gall D (1980) Georgi Markov: death in a pellet. Med Leg J 48:51–62. https://doi.org/10.1177/002581728004800203
De Carvalho R, Guerra M (2002) Cytogenetics of Manihot esculenta Crantz (cassava) and eight related species. Hereditas 136:159–168
Dwivedi MC, Sapre S (2002) Total vegetable-oil based greases prepared from castor oil. J Synth Lubr 19(3):229–241. https://doi.org/10.1002/jsl.3000190305
Eid J, Fehr A, Gray J et al (2009) Real-time DNA sequencing from single polymerase molecules. Science 323:133–138. https://doi.org/10.1126/science.1162986
Eulgem T, Rushton PJ, Robatzek S, Somssich IE (2000) The WRKY superfamily of plant transcription factors. Trends Plant Sci 5:199–206. https://doi.org/10.1016/s1360-1385(00)01600-9
Fan W, Lu J, Pan C, Tan M, Lin Q, Liu W, Li D, Wang L, Hu L, Wang L, Chen C, Wu A, Yu X, Ruan J, Yu J, Hu S, Yan X, Lü S, Cui P (2019) Sequencing of Chinese castor lines reveals genetic signatures of selection and yield-associated loci. Nat Commun 10:1. https://doi.org/10.1038/s41467-019-11228-3
FAO (2010) The second report on the state of the world’s plant genetic resources for food and agriculture. http://www.fao.org/3/i1500e/i1500e00.htm. Accessed 30 Nov 2020
Foster JT, Allan GJ, Chan AP, Rabinowicz PD, Ravel J, Jackson PJ, Keim P (2010) Single nucleotide polymorphisms for assessing genetic diversity in Castor bean (Ricinus communis). BMC Plant Biol 10:13. https://doi.org/10.1186/1471-2229-10-13
Gainer GC, Luck RM (1979) Modified castor oil lubricant for refrigerator systems employing halocarbon refrigerants
Gomes D, Agasse A, Thiébaud P, Delrot S, Gerós H, Chaumont F (2009) Aquaporins are multifunctional water and solute transporters highly divergent in living organisms. Biochim Biophys Acta 1788(6):1213–1228. https://doi.org/10.1016/j.bbamem.2009.03.009
Govaerts R (2014) Family Euphorbiaceae: world checklist of Euphorbiaceae. Royal Botanic Gardens, Kew, London. http://apps.kew.org/wcsp/. Accessed 14 Mar 2019
Grummitt O, Marsh D (1953) Alternative methods for dehydrating castor oil. J Am Oil Chem Soc 30(1):21–25
Guo C, Guo R, Xu X, Gao M, Li X, Song J, Zheng Y, Wang X (2014) Evolution and expression analysis of the grape (Vitis vinifera L.) WRKY gene family. J Exp Bot 65:1513–1528. https://doi.org/10.1093/jxb/eru007
Gupta S, Malviya N, Kushwaha H, Nasim J, Bisht NC, Singh VK, Yadav D (2015) Insights into structural and functional diversity of Dof (DNA binding with one finger) transcription factor. Planta 241:549–562. https://doi.org/10.1007/s00425-014-2239-3
Ha J, Shim S, Lee T, Kang YJ, Hwang WJ, Jeong H, Laosatit K, Lee J, Kim SK, Satyawan D, Lestari P, Yoon MY, Kim MY, Chitikineni A, Tanya P, Somta P, Srinives P, Varshney RK, Lee SH (2019) Genome sequence of Jatropha curcas L., a non-edible biodiesel plant, provides a resource to improve seed-related traits. Plant Biotechnol J 17(2):517–530. https://doi.org/10.1111/pbi.12995
Haas BJ, Delcher AL, Mount SM, Wortman JR, Smith RK, Hannick L, Maiti R, Ronning C, Rusch D, Town C, Salzberg S, White O (2003) Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res 31:5654–5666. https://doi.org/10.1093/nar/gkg770
Haas BJ, Salzberg SL, Zhu W, Pertea M, Allen JE, Orvis J, White O, Buell C, Wortman J (2008) Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol 9:1–22. https://doi.org/10.1186/gb-2008-9-1-r7
Han B, Xu W, Ahmed N, Yu A, Wang Z, Liu A (2020) Changes and associations of genomic transcription and histone methylation with salt stress in castor bean. Plant Cell Physiol 61(6):1120–1133. https://doi.org/10.1093/pcp/pcaa037
He H, Dong Q, Shao Y, Jiang H, Zhu S, Cheng B, Xiang Y (2012) Genome-wide survey and characterization of the WRKY gene family in Populus trichocarpa. Plant Cell Rep 31:1199–1217. https://doi.org/10.1007/s00299-012-1241-0
Hegde DM, Lavanya C (2015) Castor (Ricinus communis L.). In: Bharadwaj D (ed) Breeding of field crops. Agrobios, Jodhpur, pp 471–512
Heinz PB (2009) Practical lubrication for industrial facilities. Fairmont Press, New York
Hu W, Yang HB, Yan Y, Wei YX, Tie WW, Ding ZH, Zuo J, Peng M, Li KM (2015) Genome-wide characterization and analysis of bZIP transcription factor gene family related to abiotic stress in cassava. Sci Rep 6:22783. https://doi.org/10.1038/srep22783
International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436:793–800. https://doi.org/10.1038/nature03895
Jin Z, Xu W, Liu A (2013) Genomic surveys and expression analysis of bZIP gene family in castor bean (Ricinus communis L.). Planta 239(2):299–312
Jin Z, Chandrasekaran U, Liu A (2014) Genome wide analysis of the Dof transcription factors in castor bean (Ricinus communis L.). Genes Genomics 36:527–537
Kamalakar K, Mahesh G, Prasad R, Karuna M (2015) A novel methodology for the synthesis of acyloxy castor polyol esters: low pour point lubricant base stocks. J Oleo Sci 64:1283–1295. https://doi.org/10.5650/jos.ess15133
Kulkarni LG, Ramanamurthy GV (1977) Castor (Revised edition). Indian Council of Agricultural Research, New Delhi
Li YH, Zhou G, Ma J, Jiang W, Jin L, Zhang Z, Guo Y, Zhang J, Sui Y, Zheng L, Zhang S, Zuo Q, Shi X, Li Y, Zhang W, Hu Y, Kong G, Hong H, Tan B, Song J, Liu Z, Wang Y, Ruan H, Yeung C, Liu J, Wang H, Zhang L, Guan R, Wang K, Li W, Chen S, Chang R, Jiang Z, Jackson S, Li R, Qiu L (2014) De novo assembly of soybean wild relatives for pangenome analysis of diversity and agronomic traits. Nat Biotechnol 32:1045–1052. https://doi.org/10.1038/nbt.2979
Metzker ML (2005) Emerging technologies in DNA sequencing. Genome Res 15(12):1767–1776. https://doi.org/10.1101/gr.3770505
Milani M, Nobrega MB (2013) Castor breeding. In: Andersen S (ed) Plant breeding from laboratories to field. Ven Bode Andersen, Intech Open, pp 239–254. https://doi.org/10.5772/56216
Moshkin VA (1986) Castor. Amerind Publishing, New Delhi
Muraguri S, Xu W, Chapman M, Muchugi A, Oluwaniyi A, Oyebanji O, Liu A (2020) Intraspecific variation within Castor bean (Ricinus communis L.) based on chloroplast genomes. Ind Crops Prod 155:112779. https://doi.org/10.1016/j.indcrop.2020.112779
Mushegian AR (2010) Foundations of comparative genomics. Elsevier, Amsterdam
Mutlu H, Meier M (2010) Castor oil as a renewable resource for the chemical industry. Eur J Lipid Sci Technol 112(1):10–30. https://doi.org/10.1002/ejlt.200900138
Myers EW, Sutton GG, Delcher AL, Dew IM, Fasulo DP, Flanigan MJ, Kravitz SA, Mobarry CM, Reinert KH, Remington KA, Anson EL, Bolanos RA, Chou HH, Jordan CM, Halpern AL, Lonardi S, Beasley EM, Brandon RC, Chen L, Dunn PJ, Lai Z, Liang Y, Nusskern DR, Zhan M, Zhang Q, Zheng X, Rubin GM, Adams MD, Venter JC (2000) A whole-genome assembly of Drosophila. Science 287:2196–2204. https://doi.org/10.1126/science.287.5461.2196
Naik B (2019) Botanical descriptions of castor bean. In: Kole C, Rabinowicz P (eds) The castor bean genome, compendium of plant genomes. Springer Nature, Cham, pp 1–14. https://doi.org/10.1007/978-3-319-97280-0_1
Noguero M, Atif RM, Ochatt S, Thompson RD (2013) The role of the DNA-binding one zinc finger (DOF) transcription factor family in plants. Plant Sci 209:32–45. https://doi.org/10.1016/j.plantsci.2013.03.016
Peng Z, Lu Y, Li L, Zhao Q, Feng QI, Gao Z, Lu H, Hu T, Yao N, Liu K, Li Y, Fan D, Guo Y, Li W, Lu Y, Weng Q, Zhou C, Zhang L, Huang T, Zhao Y, Zhu C, Liu X, Yang X, Wang T, Miao K, Zhuang C, Cao X, Tang W, Liu G, Liu Y, Chen J, Liu Z, Yuan L, Liu Z, Huang X, Lu T, Fei B, Ning Z, Han B, Jiang Z (2013) The draft genome of the fast-growing non-timber forest species moso bamboo (Phyllostachys heterocycla). Nat Genet 45:456–461. https://doi.org/10.1038/ng.2569
Purseglove JW (1968) Tropical crops: dicotyledons. ELBS and Longman, London, pp 180–186
Qiu L, Yang C, Tian B, Yang JB, Liu A (2010) Exploiting EST databases for the development and characterization of EST-SSR markers in Castor bean (Ricinus communis L.). BMC Plant Biol 10(1):278. https://doi.org/10.1186/1471-2229-10-278
Raymond MP, Bui VT (1998) Epoxy/castor oil graft interpenetrating polymer networks. J Appl Polym Sci 70(9):1649–1659. https://doi.org/10.1002/(SICI)1097-4628(19981128)70:9%3C1649::AID-APP2%3E3.0.CO;2-A
Rhoads A, Au F (2015) PacBio sequencing and its applications. Genom Proteom Bioinf 13:278–289. https://doi.org/10.1016/j.gpb.2015.08.002
Rivarola M, Foster JT, Chan AP, Williams AL, Rice DW, Liu X, Melake-Berhan A, Creasy HH, Puiu D, Rosovitz MJ, Khouri HM, Beckstrom-Sternberg SM, Allan GJ, Keim P, Ravel J, Rabinowicz PD (2011) Castor Bean Organelle genome sequencing and worldwide genetic diversity analysis. PLoS ONE 6:7. https://doi.org/10.1371/journal.pone.0021743
Robertus J (1991) The structure and action of ricin, a cytotoxic N-glycosidase. Semin Cell Biol 2:23–30
Roxas-Duncan VI, Smith LA (2011) Of beans and beads: ricin and abrin in bioterrorism and biocrime. J Bioterror Biodef 2:002. https://doi.org/10.4172/2157-2526.s2-002
Rudnick LR (2013) Synthetics, mineral oils, and biobased lubricants: chemistry and technology. CRC Press, Boca Raton
Rushton PJ, Somssich IE, Ringler P, Shen QJ (2010) WRKY transcription factors. Trends Plant Sci 15:247–258. https://doi.org/10.1016/j.tplants.2010.02.006
Salihu BZ, Gana AK, Apuyor BO (2014) Castor oil plant (Ricinus communis L.): botany, ecology and uses. Int J Sci Res 3(5):1333–1341
Sanger F, Nicklen S, Coulson AR (1977) DNA sequencing with chain-terminating inhibitors. Proc Nat Acad Sci 74(12):5463–5467
Scholz V, Silva JN (2008) Prospects and risks of the use of castor oil as a fuel. Biomass Bioenergy 32:95–100. https://doi.org/10.1016/j.biombioe.2007.08.004
Severino LS, Auld DL, Baldanzi M, Cândido MJ, Chen G, Crosby W, Tan D, Lakshmamma HP, Lavanya C, Machado O, Mielke T, Milani M, Miller TD, Morris JB, Morse SA, Navas A, Soares DJ, Sofiatti V, Wang ML, Zanotto MD, Zieler H (2012) A review on the challenges for increased production of castor. Agron J 104:853–880. https://doi.org/10.2134/agronj2011.0210
Shizuya H, Birren B, Kim UJ, Mancino V, Slepak T, Tachiiri Y, Simon M (1992) Cloning and stable maintenance of 300-kilobase-pair fragments of human DNA in Escherichia coli using an F-factor-based vector. Proc Natl Acad Sci USA 89:8794–8797. https://doi.org/10.1073/pnas.89.18.8794
Singh R, Ong-Abdullah M, Low ETL, Manaf MAA, Rosli R, Nookiah R, Ooi SE, Chan K, Halim MA, Azizi N, Nagappan J, Bacher B, Lakey N, Smith SW, He D, Hogan M, Budiman MA, Lee EK, DeSalle R, Kudrna D, Goicoechea JL, Wing RA, Wilson RK, Fulton RS, Ordway JM, Martienssen RA, Sambanthamurthi R (2013) Oil palm genome sequence reveals divergence of interfertile species in old and new worlds. Nature 500:335–339. https://doi.org/10.1038/nature12309
Soderlund C, Longden I, Mott R (1997) FPC: a system for building contigs from restriction fingerprinted clones. Comput Appl Biosci 13:523–535. https://doi.org/10.1093/bioinformatics/13.5.523
Sujatha M, Reddy T, Mahasi MJ (2008) Role of biotechnological interventions in the improvement of castor bean (Ricinus communis L.) and Jatropha curcas L. Biotechnol Adv 26:424–435. https://doi.org/10.1016/j.biotechadv.2008.05.004
Sundararaj DD, Thulasidas G (1976) Botany of field crops. MacMillan, Delhi
Tang C, Yang M, Fang Y, Luo Y, Gao S, Xiao X, An Z, Zhou B, Zhang B, Tan X, Yeang H, Qin Y, Yang J, Lin Q, Mei H, Montoro P, Long X, Qi J, Hua Y, He Z, Sun M, Li W, Zeng X, Cheng H, Liu Y, Yang J, Tian W, Zhuang N, Zeng R, Li D, He P, Li Z, Zou Z, Li S, Li C, Wang J, Wei D, Lai C, Luo W, Yu J, Hu S, Huang H (2016) The rubber tree genome reveals new insights into rubber production and species adaptation. Nat Plants 2:16073. https://doi.org/10.1038/nplants.2016.73
Tatusov RL, Koonin EV, Lipman DJ (1997) A genomic perspective on protein families. Science 278:631–637. https://doi.org/10.1126/science.278.5338.631
Thakur S, Karak N (2013) Castor oil-based hyperbranched polyurethanes as advanced surface coating materials. Prog Org Coat 1:157–164. https://doi.org/10.1016/j.porgcoat.2012.09.001
Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N et al (2006) The genome of black cottonwood, Populus trichocarpa (Torr. and Gray). Science 313:1596–1604. https://doi.org/10.1126/science.1128691
Van der Hoorn RA (2008) Plant proteases: from phenotypes to molecular mechanisms. Annu Rev Plant Biol 59:191–223. https://doi.org/10.1146/annurev.arplant.59.032607.092835
Wang W, Feng B, Xiao J, Xia Z, Zhou X, Li P, Zhang W et al (2014) Cassava genome from a wild ancestor to cultivated varieties. Nat Commun 5:5110. https://doi.org/10.1038/ncomms6110
Wang Y, Xu W, Chen Z, Han B, Haque ME, Liu A (2018) Gene structure, expression pattern and interaction of nuclear factor-Y family in castor bean (Ricinus communis). Planta 247(3):559–572
Wang Z, Yu A, Li F, Xu W, Han B, Cheng X, Liu A (2021a) Bulked segregant analysis reveals candidate genes responsible for dwarf formation in woody oilseed crop castor bean. Sci Rep 11:6277
Wang ZJ, Zhu J, Yuan WY, Wang Y, Hu PP, Jiao CY, Xia HM, Wang DD, Cai QW, Li J (2021b) Genome-wide characterization of bZIP transcription factors and their expression patterns in response to drought and salinity stress in Jatropha curcas. Int J Biol Macromol 181:1207–1223
Weibel RO (1948) The castor-oil plant in the United States. Econ Bot 2(3):273–283
Weiss EA (2000) Oilseed crops. Blackwell Science, Oxford
Wu P, Zhou C, Cheng S, Wu Z, Lu W, Han J, Chen Y, Chen Y, Ni P, Wang Y, Xu X, Huang Y, Song C, Wang Z, Shi N, Zhang X, Fang X, Yang Q, Jiang H, Chen Y, Li M, Wang Y, Chen F, Wang J, Wu G (2015) Integrated genome sequence and linkage map of physic nut (Jatropha curcas L.), a biodiesel plant. Plant J 81:810–821. https://doi.org/10.1111/tpj.12761
Xiong W, Xu X, Zhang L, Wu P, Chen Y, Li M, Jiang H, Wu G (2013) Genome-wide analysis of the WRKY gene family in physic nut (Jatropha curcas L.). Gene 524:124–132. https://doi.org/10.1016/j.gene.2013.04.047
Xu W, Yang T, Qiu L, Chapman MA, Li DZ, Liu A (2019) Genomic analysis reveals rich genetic variation and potential targets of selection during domestication of castor bean from perennial woody tree to annual semi-woody crop. Plant Direct 3(10):1–16. https://doi.org/10.1002/pld3.173
Xu W, Wu D, Yang T, Sun C, Wang Z, Han B, Wu S, Yu A, Chapman MA, Muraguri S, Tan Q, Wang W, Bao Z, Liu A, Li D-X (2021) Genomic insights into the origin, domestication and the genetic basis of agronomic traits of castor bean. Genome Biol 22:113
Yu A, Li F, Xu W, Wang Z, Sun C, Han B, Wang Y, Wang B, Cheng X, Liu A (2019) Application of a high-resolution genetic map for chromosome-scale genome assembly and fine QTLs mapping of seed size and weight traits in castor bean. Sci Rep 9(1):1–11. https://doi.org/10.1038/s41598-019-48492-8
Zou Z, Zhang X (2019) Genome-wide identification and comparative evolutionary analysis of the Dof transcription factor family in physic nut and castor bean. Peer J 2:1–25. https://doi.org/10.7717/peerj.6354
Zou Z, Gong J, Huang Q, Mo Y, Yang L, Xie G (2015) Gene structures, evolution, classification and expression profiles of the aquaporin gene family in castor bean (Ricinus communis L.). PLoS One 10:e0141022. https://doi.org/10.1371/journal.pone.0141022
Zou Z, Yang L, Wang D, Huang Q, Mo Y, Xie G (2016) Gene structures, evolution and transcriptional profiling of the WRKY gene family in castor bean (Ricinus communis L.). PLoS One 11:e0148243. https://doi.org/10.1371/journal.pone.0148243
Zou Z, Huang QX, Xie GS, Yang LF (2018) Genome-wide comparative analysis of papain-like cysteine protease family genes in castor bean and physic nut. Sci Rep 8:331. https://doi.org/10.1038/s41598-017-18760-6

Genome Resources for Ensete ventricosum (Enset) and Related Species

19

Lakshmipriya Venkatesan, Sadik Muzemil, Filate Fiche, Murray Grant, and David J. Studholme

L. Venkatesan, D. J. Studholme (corresponding author): Biosciences, University of Exeter, Exeter, UK
S. Muzemil, M. Grant: University of Warwick, Coventry, UK
F. Fiche: Hawassa University, Hawassa, Ethiopia

Abstract

The most iconic crop plant of the Ethiopian highlands is Ensete ventricosum (Welw.) Cheesman, known colloquially in English as ‘Ensete’, ‘false banana’, and ‘Abyssinian banana’. It is a staple, orphan, and multipurpose crop, domesticated in Ethiopia about 10,000 years ago. While largely absent from widespread commercial use, Ensete species have great usage and value outside of Ethiopia in several indigenous groups in Africa and Asia. Despite its importance and value, enset has been relatively neglected by scientific research and is arguably the least-studied African crop. Breeding is technically feasible, but traditional breeding approaches cannot be applied. Enset offers a number of advantages for food security. Unlike most food crops, it is a perennial that can be harvested at almost any time during a multi-year window after reaching adequate size but before flowering. Currently (November 2020), available genome resources for E. ventricosum are limited to four draft-quality genome assemblies and unassembled sequence reads for several additional accessions of E. ventricosum plus a few other species of the genus. The size of the genome is estimated at 547 Mb. In the accession Bedadeti of Ensete ventricosum, the chloroplast DNA is 168,843 base pairs (bp) long, encompassing the IRa and IRb regions consisting of 34,285 bp each, an SSC region of 11,298 bp, and an LSC region of 79,872 bp. Substantial enset germplasm has not yet been systematically evaluated and is estimated to represent only ~40% of the landraces known to the Wolayita farming communities, implying further untapped genetic potential for enhancing yield and stress tolerance.
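As a quick arithmetic check on the chloroplast figures quoted in the abstract, the four region lengths can be summed and compared with the stated total. The sketch below is illustrative only: the variable and function names are assumptions introduced here, and the numbers are simply those given in the text. With the figures as quoted, the regions sum to 159,740 bp, which is 9,103 bp short of the stated 168,843 bp, so the values are worth confirming against the original chloroplast genome report.

```python
# Chloroplast region lengths (bp) exactly as quoted in the abstract for
# accession Bedadeti. The names below are illustrative, not from the source.
REPORTED_TOTAL = 168_843
REGIONS = {"IRa": 34_285, "IRb": 34_285, "SSC": 11_298, "LSC": 79_872}


def region_sum(regions):
    """Return the combined length of all quadripartite regions, in bp."""
    return sum(regions.values())


def discrepancy(total, regions):
    """Return the reported total minus the summed region lengths, in bp."""
    return total - region_sum(regions)


if __name__ == "__main__":
    print("Sum of regions (bp):", region_sum(REGIONS))
    print("Reported total minus sum (bp):", discrepancy(REPORTED_TOTAL, REGIONS))
```

A difference of zero would indicate that the quoted regions account for the whole circular molecule; a non-zero difference only flags that one of the quoted figures needs checking and does not say which.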

19.1 Introduction

The Ethiopian highlands are an important center for domestication of globally and locally important crop plants including wheat, barley, coffee, various pulses, teff, and enset (Harlan 1969; Hummer 2015; Tidiane Sall et al. 2019). The region also supports a wide variety of edible wild plants that are important for food security during times of war, famine, and political unrest (Sina and Degu 2015). Arguably, the most iconic plant of this region is enset. Shack’s 1966 book ‘The Gurage: A People of the Ensete Culture’ describes how this remarkable crop has dominated the thoughts and interests of many of those who cultivate it and has influenced their way of life (Peveri 2015). Botanically, this plant is Ensete ventricosum (Welw.) Cheesman and is also known colloquially in English as ‘Ensete’, ‘false banana’, and ‘Abyssinian banana’. It is a staple, orphan, and multipurpose crop, domesticated in Ethiopia about 10,000 years ago. Today, enset is highly integrated into the economic, social, and cultural life of a large part of Ethiopia, feeding around 20% of the more than 100 million inhabitants of Africa’s second most populous country (Yesuf and Hunduma 2012). Enset is more than a year-round guaranteed food source: it contributes to the social fabric and to traditional medicine, and it provides shade, fiber, building materials, cattle fodder, and packaging materials (Brandt et al. 1997; Brandt 1996). It is also remarkably drought tolerant and can survive extended seasons without water. It can tolerate freezing injury during cold seasons in which other crops fail completely (Quinlan et al. 2015). Enset is one of the major components of Ethiopian farming systems, and its cultivation contributes significantly to key ecosystem functions such as production of organic matter, erosion control, and provision of shade for intercrops (Tsegaye and Struik 2002; Dobo et al. 2017; Abebe et al. 2013). Enset is cultivated exclusively in Ethiopia, particularly in the southern and southwestern parts of the country, and is probably the most under-researched resilient starch crop given its considerable economic and social importance for millions. Enset agriculture is an indigenous African farming system, characterized by low input, resilience to stress, and sustainable farming, as well as large numbers of traditional varieties (landraces) (Brandt et al. 1997; Borrell et al. 2019). However, enset breeding is particularly challenging and not viable on the scale required for conventional breeding programs. Current landraces represent clones adapted to the diverse agro-ecologies of south and southwest Ethiopia and the foothills of the Rift Valley. The specific traits of these landraces arose through repeated selection during vegetative propagation by local peoples over hundreds of years.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 M. A. Chapman (ed.), Underutilised Crop Genomes, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-00848-1_19

Enset’s long history of cultivation has resulted in regional landraces showing significant diversity in genetics and agronomic traits due to selection based upon farming practices, taste preferences of ethnic groups, and contrasting environmental regimes (Birmeta et al. 2002; Gerura et al. 2019; Negash et al. 2002; Tobiaw 2011; Olango et al. 2015). Enset is often called ‘the tree against hunger’ (Brandt 1996; Brandt et al. 1997) or ‘tree of the poor’ (Peveri 2015), reflecting its historical importance as a year-round source of food during persistent famine and droughts and its role in subsistence farming. Food self-sufficiency is Ethiopia’s national priority goal, and the country is particularly vulnerable to climate change. Beyond the key ecosystem functions (Woldetensaye 1997; Tsegaye and Struik 2002), every part of the plant can be used, whether for food, fiber/cloth, bedding, cattle feed, or even as a plate (Brandt et al. 1997; Tsegaye and Struik 2002). The pseudostem (Fig. 19.1 panels b and f) and corm (swollen base or rhizome) (Fig. 19.1c) are the most important sources of food. These are consumed as kocho, a fermented starch obtained from the mixture of grated corm (Fig. 19.1f) and decorticated leaf sheaths (Fig. 19.1 panels h and i); bulla, a white powder produced by dehydrating squeezed sap from scraped leaf sheaths (Fig. 19.1 panel j) and grated corms (Fig. 19.1 panel f); and amicho, pieces of boiled corm eaten like other root and tuber crops (Brandt et al. 1997; Alemu and Stephen 1991; Tsegaye and Struik 2002).

19.2 Botanical Description of Enset

Enset (Ensete ventricosum (Welw.) Cheesman) is the world’s largest herbaceous perennial monocarpic plant (Fig. 19.2). It belongs to the Musaceae botanical family, but unlike banana, its fruit is not eaten (Fig. 19.2). Instead, the plant is grown for three to 12 years or more (depending on landrace, management, and ecology) before the leaf sheath and corm are harvested and processed into starchy food products. E. ventricosum is the only domesticated species of the eight accepted species in the genus, which also include E. homblei (Bequaert ex De Wild.) Cheesman, E. lasiocarpum (Franch.) Cheesman, E. livingstonianum (J. Kirk) Cheesman, E. perrieri (Claverie), E. glaucum (Roxb.) Cheesman, E. superbum (Roxb.) Cheesman and E. wilsonii (Tutcher) Cheesman (The Plant List Version 1 2010; Väre and Häkkinen 2011; Cheesman 1947; J and Trust 2016). Cultivation of E. ventricosum is concentrated in the southern and southwestern parts of Ethiopia and neighboring areas within the country (Fig. 19.3). Wild enset is distributed across eastern and southern African countries extending to South Africa (Borrell et al. 2019). Other species of the genus Ensete exist in different parts of Africa and Asia but have not been domesticated for food and agriculture (Baker and Simmonds 1953; Sethiya et al. 2016). These are further discussed below.

Fig. 19.1 Enset (Ensete ventricosum) and preparation of its food products. The entirety of the enset plant is used in making food (corm and leaf starch), fiber, animal bedding, and cattle feed, or is used as packaging. a A typical five-year-old enset plant provides shade, moisture retention, and a sheltered environment for intercropping. b In the initial stages of enset processing, soil is removed to expose the corm, and leaf sheaths are removed for starch and fiber production. c Corm removed (note leaves used for processing). d Unlike banana, enset has a single flower forming pendant thyrses covered by large pink bracts and does not bear edible fruit. e The thick leaves provide a rich source of fiber and starch, as well as the capacity to retain water, literally for years. f Corm is pulped by hand. This will be pulped, wrapped in enset leaves, buried in a soil pit, and fermented for up to six months. g Half the corm weighs ~26 kg. h and i The large, thick fibrous leaves are ‘cortified’, manually scraped, and the starch-rich pulp collected in enset leaves. j The starch-rich juice obtained from squeezing the leaf pulp is dried to make ‘Bula’ flour

19.3 Enset (Ensete ventricosum) as a Crop

Fig. 19.2 Enset plants growing in farmer’s field in Ethiopia

Fig. 19.3 Enset plants in a garden in Ethiopia

Domesticated enset is cultivated at altitudes between 1200 and 3100 m above sea level (masl) (Brandt et al. 1997) and reaches up to 3400 masl (personal observation, Sadik Muzemil), but patches of enset plants can also be found at lower elevations (Haile et al. 1996). Farming is restricted by low temperature at higher altitudes and water availability at lower altitudes. Many landraces are adapted to particular ecosystems (Yemataw et al. 2016b); however, a smallholder may grow up to 28 landraces to exploit specific traits (Yemataw et al. 2016b; Tsegaye and Struik 2002). Productivity of enset-based farming systems exceeds that of any other cropping system for the same agroecology and inputs (Brandt et al. 1997; Tsegaye and Struik 2002), trailing only cassava for its energy yield/area/year (Pijls et al. 1995). Enset’s capacity to enhance soil and provide shade enables intercropping, usually with coffee, root and tuber crops, vegetables (e.g., kale), legumes, or various cereals. Maize, introduced in the 1950s to compensate for escalating enset losses from the devastating bacterial disease enset Xanthomonas wilt (EXW), is prone to failure in dry years and requires input of expensive fertilizer, impacting sustainability and integrity of the traditional perennial enset-based systems. By contrast, intercropping with root or tuber crops supports some of the most densely populated rural areas in Ethiopia, with an average daily consumption of 0.5 kg of corm providing 70% of total energy and 20% of protein. It is also well established that enset cultivation areas support a higher human carrying capacity (>200 people per sq. km) than any other crops and cropping systems in Ethiopia. Enset cultivation is usually integrated with livestock farming: enset plants and plant parts provide feed, while animal manure fertilizes the fields, improving nutrient recycling.

Over hundreds of years, farmer selection has resulted in landraces that reflect cultural preferences, the key drivers being taste, quantity, and quality of kocho or amicho. As a consequence, the large variation in agro-ecologies suitable for enset cultivation farming systems has led to historical selection that has driven strong regional differentiation of enset genotypes (Negash and Niehof 2004; Shumbulo et al. 2012). Currently, landraces are distinguished phenotypically, by characteristics such as petiole color, mid-rib, leaf sheath, angle of leaf orientation, leaf size and color, and pseudostem circumference and length (Yemataw et al. 2014, 2016a, 2018a; Bekele and Shigeta 2010). These morphometric data along with DNA-based studies (Olango et al. 2015; Negash et al. 2002; Tsegaye 2002; Birmeta et al. 2002; Getachew et al. 2014; Gerura et al. 2019; Tobiaw 2011) have revealed that there are numerous enset landraces showing significant genetic diversity, having evolved regionally, reflecting the extent of enset cultivation, and the culture and distribution pattern of the different ethnic groups and environmental regimes present in traditional farming systems. However, defining a catalog of landraces is confounded by the complex vernacular naming systems of multiple ethno-linguistic communities, with alteration and duplication of vernacular names for the same landrace. This restricts validation, selection, and the development of high-yielding food/fiber landraces, a situation potentially resolved using molecular markers.

Edible parts of enset are primarily the processed leaf sheath and its underground corm. Kocho, bulla, and amicho are the major food types obtained from enset. Kocho, a fermented product, is obtained from a mixture of grated corm and scraped leaf sheath left in an earthen pit. Various traditional starters (Andeta et al. 2018) are used to facilitate the fermentation process, and lactic acid bacteria largely dominate this process (Andeta et al. 2018, 2019). The fermented and ready-to-prepare kocho can be stored long-term and is used to make a ready-to-serve pancake-like bread. Harvesting enset, preparing the plant parts for fermentation in an earthen pit, and managing the fermentation process are cumbersome tasks traditionally performed solely by women (Peveri 2015). While enset is traditionally processed for kocho, the starch-rich liquid from decorticated leaf sheaths is also sedimented to collect the starch granules and then dehydrated into a white powder called bulla, used to make pancakes, porridge, and dumplings. Bulla can additionally be used in the manufacture of agar and plastic (Ayenew et al. 2012; Mengesha et al. 2012). Amicho is the sweet corm of some enset landraces, usually from younger plants, eaten boiled like other root and tuber crops (reminiscent of potato or sweet potato), and is relatively rich in protein (Mohammed et al. 2013). The excellent nutrient composition of enset has recently been extensively reviewed (Yemata 2020; Borrell et al. 2020). As a multipurpose crop, enset also has various non-food values. Fiber of enset, remaining after scraping leaf sheaths for bulla production, is known for its excellent structure with a quality equivalent to that of the banana-like fiber crop abacá (Blomme et al. 2018).
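To illustrate how molecular markers might help untangle the vernacular naming problem noted earlier in this section, the following minimal sketch groups accessions by marker profile. It flags different names that share an identical profile (possible synonyms) and single names attached to more than one profile (possible homonyms). Everything in it is hypothetical: the names, zones, allele sizes, and function names are invented for illustration and do not come from any enset dataset. Exact profile matching is also a simplifying assumption, since real genotype data would need tolerance for scoring error and somatic mutation within clones.

```python
from collections import defaultdict

# Hypothetical records: (vernacular name, collection zone, marker profile).
# Names, zones, and allele sizes are invented for illustration only.
RECORDS = [
    ("name_A", "zone_1", (152, 210, 98)),
    ("name_B", "zone_2", (152, 210, 98)),   # same profile as name_A
    ("name_C", "zone_1", (148, 214, 98)),
    ("name_C", "zone_3", (150, 206, 102)),  # same name, different profile
    ("name_D", "zone_2", (160, 220, 110)),
]


def putative_synonyms(records):
    """Group distinct vernacular names that share an identical marker profile."""
    names_by_profile = defaultdict(set)
    for name, _zone, profile in records:
        names_by_profile[profile].add(name)
    return sorted(
        sorted(names) for names in names_by_profile.values() if len(names) > 1
    )


def putative_homonyms(records):
    """List vernacular names that are attached to more than one marker profile."""
    profiles_by_name = defaultdict(set)
    for name, _zone, profile in records:
        profiles_by_name[name].add(profile)
    return sorted(
        name for name, profiles in profiles_by_name.items() if len(profiles) > 1
    )


if __name__ == "__main__":
    print("Possible synonyms:", putative_synonyms(RECORDS))
    print("Possible homonyms:", putative_homonyms(RECORDS))
```

In practice, a flagged pair would only be a candidate for follow-up, for example by comparing morphological descriptors for the same accessions, rather than proof that two names refer to one clone.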

19.4 Availability of Germplasm

Despite its importance and value, enset has been relatively neglected by scientific research and is arguably the least-studied African crop. Breeding is technically feasible, but traditional breeding approaches cannot easily be applied. Enset is monocarpic, and many landraces do not flower until several years after establishment, restricting the capacity to cross. Further, seeds are often sterile.

L. Venkatesan et al.

Fig. 19.4 Germplasm collection of The Southern Agriculture Research Institute (SARI)

Thus, traditional breeding is challenging and time-consuming, and improvement is mainly based on clonal selection of landraces for specific traits tested over many years and across multiple locations. Conservation of enset genetic resources ex situ as seed in cold storage is difficult if not impossible; consequently, little effort has been made to conserve enset seeds (Guzzon and Müller 2016). For understandable reasons of avoiding iniquitous exploitation, export of germplasm outside of Ethiopia is carefully controlled. It is much easier for international researchers to obtain material from the ornamental trade, especially the characteristically red Maurelii variety; recently, it was shown that this variety is genetically close to cultivated varieties (Biswas et al. 2020), suggesting that it may be a suitable model for genomic studies. The Southern Agriculture Research Institute (SARI) (https://www.sari.gov.et/) is mandated to improve enset, and its Areka Agricultural Research Center (AARC) was established in 1986 to characterize and preserve in situ, the genetic diversity of enset landraces in Ethiopia. AARC currently holds a non-exhaustive collection of more than 800 enset accessions on two field sites from 12 major enset growing areas in Ethiopia (Fig. 19.4).

AARC’s key objective is to characterize and effectively utilize this unique germplasm resource to improve enset farming systems. This currently entails detailed assessment of morphological variation for quantitative characters among the landraces, e.g., time to maturity and kocho yield and quality (Yeshitla and Yemataw 2012). Importantly, certain landraces, such as Arkia, have been identified that better resist or tolerate EXW (Muzemil et al. 2019), the most important enset pathogen (Ashagari 1985; Welde-Michael et al. 2010; Mekuria et al. 2016).

19 Genome Resources for Ensete ventricosum (Enset) and Related Species

19.5 Enset as an Underutilized Crop

Enset offers a number of advantages for food security. Unlike most food crops, it is a perennial that can be harvested at almost any time during a multi-year window after reaching adequate size but before flowering (which is followed by death). This flexibility can help to buffer against food shortages between seasons as well as between years. Even after harvesting, the fermented food products can be stored without spoiling. With its versatility and its resilience against abiotic stress, subsistence systems based on this remarkable crop are credited for the
survival of communities during the devastating famines of the 1980s in Ethiopia (Brandt et al. 1997). Enset starch products improve with age, unlike other root and tuber crops that spoil, making them attractive for urban food supplies (Wilkin et al. 2018). This is significant as city populations expand; for example, the population of Addis Ababa is projected to increase more than tenfold between 2016 and 2100 (Peveri 2015). Various lines of evidence previously reviewed (Brandt et al. 1997) suggest that historically enset was cultivated in northern regions of Ethiopia and possibly even in ancient Egypt (Simoons 1965). One of the major ethnic groups in the region, the Sidama, in the past had a diet dominated by enset supplemented with meat and milk but since the imperialisation of the Ethiopian state, they have increasingly adopted corn, teff, potato, and tomato more characteristic of the central region of the country. Given the species’ natural distribution stretching over a much wider area of the continent, why is enset not more widely adopted? It is claimed that Ethiopians located outside of the cultivation areas see this as a foodstuff of the poor and marginal groups and even that its rejection results from a ‘cultural war’ (Peveri 2015). On the other hand, there are suggestions of renewed interest in enset by urban Ethiopians as an element of their traditional cuisine (Wilkin et al. 2018).

19.6 Ethnopharmacology of Ensete Spp. Outside of Ethiopia

While largely absent from widespread commercial use, Ensete species have great usage and value outside of Ethiopia in several indigenous groups in Africa and Asia (Sethiya et al. 2016). While it is only in Ethiopia that domesticated E. ventricosum is used as a staple or co-staple food, the Ensete species in Asia have been found to have many different roles. Although the wild Ensete species are labeled as predominantly ornamental plants (Borrell et al. 2019), ethnic groups all over Asia have also found a myriad of other uses for them. Valued for their medicinal properties,
food and feed products, and potential for handiwork, the Ensete species have both very similar and yet, very unique uses in the different ethnic groups separated by geographical location, ethnicity, and culture (Sethiya et al. 2016; Vasundharan et al. 2015). Of the seven recognized species of Ensete, two (E. superbum and E. glaucum) have been recorded in Asia (Borrell et al. 2019). Ensete superbum, endemic to India, is found in the Western Ghats and Aravalli mountain ranges in west India as well as in the Himalayan foothills in Eastern India (Vasundharan et al. 2015). In India, Ensete glaucum is only present in the hilly regions of the north-eastern states (Subbaraya 2006), from where the distribution range extends to Bangladesh, southern China, and Thailand, past the Philippines into New Guinea (Borrell et al. 2019). In India, the two species of genus Ensete found are mostly used by indigenous groups where Ensete plants grow locally in the wild. Found mostly in rocky and hilly areas, E. superbum has vernacular names in native languages that mostly translate into English as cliff banana, rock banana, or wild banana (Kumar et al. 2013; Vasundharan et al. 2015). On the Indian peninsula, while E. superbum is grown as an ornamental plant in gardens, it is chiefly utilized for its medicinal properties, as well as a source of food by the ethnic groups. In Ayurveda, a holistic, traditional treatment style, E. superbum seeds and pseudostem are used in the preparation of medicines for diabetes, kidney stones, measles, and even assisting childbirth (Sethiya et al. 2016, 2019). However, most of the uses of Ensete superbum are by the ethnic groups living in western and southern India. The different tribes and groups consume a varied mix of the flowers, unripe fruits, strips of the petiole, corm, pseudostems, and tender stems, prepared by traditional methods. 
For example, people of the Melghat forest in western India’s Maharashtra region eat boiled corm, a curry prepared with the flowers, and the inner part of the leaf base, which is considered a vegetable, all served on the large leaf, which is used as a plate. Apart from usage as food, the cliff banana leaves have other cultural
uses as well. The priests in the Madhava Temple in Kerala use the large leaves for ceremonial rituals (Sethiya et al. 2019). Ensete superbum’s fruits and flowers have been found to contain a significant amount of micronutrients, such as potassium, phosphorus, calcium, sodium, and magnesium (Sethiya et al. 2019). The various groups living in geographically and culturally diverse areas have all found different medicinal uses for all the parts of the Ensete plant. Interestingly, there are key applications common among all the indigenous groups. For example, the majority of the ethnic communities use the black, hard seeds for treating similar ailments, like kidney-related problems and leucorrhoea. And yet, these groups also have developed specific uses of the various parts of the E. superbum plant, utilizing their unique methods and ancestral medicinal knowledge to treat a vast array of illnesses (Vasundharan et al. 2015). In the ethnomedicine practices of the indigenous groups, ground-up seeds, especially the powdered endosperm, are used alone or in concoctions of milk, honey, or water to treat kidney stones, urinary problems, stomach aches, and also to improve vitality. A topical ointment made from a seed paste is applied to allay aching hips or body pain and fever in the Kadar and Malasars tribes in south India. In the Shimoga area of Karnataka, south India, local groups use the pseudostem to treat appendicitis. The juice from the pseudostem is used to cure food poisoning in the Jalgaon area in west India, while communities in Mizoram, East India, use it either as a topical ointment for snake bites and bee stings, or as an ingested medicine for dysentery and jaundice. They also use a blend made from the fruit to treat typhoid. In Kerala, the ash gathered after burning the leaves is used as a treatment for asthma. The Garasia community in North-west India mix the ash with butter and apply it as a treatment for leucoderma spots. 
In Maharashtra, a set of local tribes including the Bhils, Wanjar, and Pardhi uses the seeds, stem, leaves, and root in the treatment of dog bites and reportedly also to help cure psychosomatic diseases (Sethiya et al. 2019; Vasundharan et al. 2015).

The seeds are recognized as non-timber forest produce and contribute to the trade in indigenous products (Kumar et al. 2010). In recent years, E. superbum has been getting more recognition for its medicinal properties, with the spread of information to people in villages and towns. In the north-eastern states of India, where E. glaucum is prevalent, the pseudostem and leaf sheath are cooked and eaten as a vegetable by the different local tribes and peoples (Subbaraya 2006; Vasundharan et al. 2015). The Mizo tribes also remove the fiber from old pseudostems of E. glaucum to make handicrafts, some of which are exported out of India, bolstering the livelihood of the indigenous tribe. E. glaucum is also a valued ornamental plant, desired due to its unusual flowers and arrangement of large green leaves. It is grown in home gardens and primarily tended to by women (Subbaraya 2006). The Chepang people of Nepal traditionally consume E. glaucum fruits and use the tuber/corm as a medicine for heat sickness and urine infections. Additionally, leaves are used as plates and the whole plant contributes to animal fodder or is used in religious ceremonies (Rijal 2011). In Thailand, one of the biggest ethnic hill tribes is the Karen, who use E. glaucum to treat diarrhea-type digestive disorders (Tangjitman et al. 2015). In adjacent Myanmar, ethnic communities like the Yunnanese, Lisu, Burmese, and Kachin use E. glaucum pseudostem as animal fodder and feed, especially for pigs. The Burmese in the Shan province have the unique practice of trapping gold with the inner layer of the pseudostem, while washing it. In the neighboring country of Laos, the seeds were used for making bead ornaments and accessories. Medicines made of combinations of roots, pseudostem, and seeds are used to treat fatigue or swelling. The Ensete plants are either collected from the wild areas they grow in or are grown in home gardens for use by the local community (Ochiai 2012). 
In areas of East Java in Indonesia, E. glaucum is used mostly as an ornamental plant because of its large size and beautiful leaf arrangement (Haspar 2017).

Preliminary studies of the phytochemicals conferring the medicinal properties of Ensete have revealed that the leaves contain secondary metabolites such as phenols, the roots contain glycosides, and the corm contains alkaloids and flavonoids, among other chemicals (Kumar et al. 2013). Phenols and flavonoids have been found to provide antioxidant qualities at the concentrations in which they occur in E. superbum (Sethiya et al. 2016). Initial experiments have shown that alcohol extracts of seeds and chloroform extracts of the pseudostem improve kidney function and inhibit kidney stones. Compounds isolated from the alcohol extract of the seeds have been shown to have anti-diabetic and anti-viral properties (Sethiya et al. 2019). Such results support a scientific basis for the healing methods traditionally used by the different ethnic groups and offer new opportunities to discover novel uses of Ensete. The study of medicinal phytoactives is a growing area of interest and is becoming increasingly feasible as analytical technologies and data analysis improve. As further research is carried out into the pharmacology of the different parts of Ensete plants, more specific knowledge will be gained about their constituent phytochemicals, which have great potential to be used in modern medicine.

19.7 Genome Sequencing

19.7.1 Overview

Currently (November 2020), available genome resources for E. ventricosum are limited to four draft-quality genome assemblies (Table 19.1) and unassembled sequence reads for several additional accessions of E. ventricosum plus a few other species of Ensete. The size of the genome is estimated at 547 Mb (Harrison et al. 2014). An incomplete chloroplast genome sequence was published for E. ventricosum under GenBank accession MH603417.1 in the context of a phylogenomics study (Givnish et al. 2018), while we assembled a complete chloroplast genome from enset cultivar Bedadeti (GenBank: MT810123). In the absence of high-quality reference genome assemblies, some enset researchers have instead resorted to using banana genome sequences for genome-wide analyses such as SNP calling (Tesfamicael 2020).

19.7.2 Reference Genome Assemblies

All of the available nuclear genome assemblies are built from short Illumina sequencing reads, yielding large numbers of rather short contigs.

Table 19.1 Assembled genome sequences for Ensete ventricosum

Enset variety | GenBank accession | Total length (nucleotides) | Percentage of estimated genome size of 547 Mbp | Contig N50 (nucleotides) | Scaffold N50 (nucleotides) | References
Bedadeti | GCA_000818735.3 | 451,284,018 | 82.5 | 20,943 | 21,097 | (Yemataw et al. 2018b)
Onjamo | GCA_001884845.1 | 444,841,970 | 81.3 | 15,546 | 16,208 | (Yemataw et al. 2018b)
Derea | GCA_001884805.1 | 429,479,738 | 78.5 | 10,278 | n.d | (Yemataw et al. 2018c)
Jungle Seeds | GCA_000331365.3 | 437,268,592 | 80.0 | 11,721 | 13,866 | (Harrison et al. 2014)
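To make the assembly statistics in Table 19.1 concrete, the following minimal Python sketch shows how an N50 value and the percentage of the estimated genome size can be computed. The function names and the toy contig lengths are illustrative assumptions and are not part of the original analysis; only the 547 Mb genome size estimate and the Bedadeti assembly length are taken from the text.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def percent_of_genome(assembly_length, genome_size=547_000_000):
    """Assembly length as a percentage of the estimated genome size."""
    return 100 * assembly_length / genome_size


# Hypothetical toy contig lengths, not taken from any enset assembly.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # 70

# Bedadeti total length from Table 19.1.
print(round(percent_of_genome(451_284_018), 1))  # 82.5
```

A low N50 relative to typical gene lengths, as in these draft assemblies, is one reason why gene models are likely to be split across contigs.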

Table 19.2 Estimation of genome assembly completeness using BUSCO3

Assembly | Complete, single copy | Complete, duplicated | Fragmented | Missing | Total
Jungle Seeds (GCA_000331365.2) | 1015 | 88 | 134 | 203 | 1440
Bedadeti (GCA_000818735.2) | 1135 | 101 | 71 | 133 | 1440
Derea (GCA_001884805.1) | 1075 | 99 | 86 | 180 | 1440
Onjamo (GCA_001884845.1) | 1079 | 106 | 89 | 166 | 1440
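The counts in Table 19.2 can be converted to percentages of the 1440 expected genes with a few lines of Python. This is only a sketch: the dictionary layout and function name are assumptions made for illustration, while the counts themselves are copied from the table.

```python
# BUSCO category counts per assembly, copied from Table 19.2.
busco_counts = {
    "Jungle Seeds": {"single": 1015, "duplicated": 88, "fragmented": 134, "missing": 203},
    "Bedadeti": {"single": 1135, "duplicated": 101, "fragmented": 71, "missing": 133},
    "Derea": {"single": 1075, "duplicated": 99, "fragmented": 86, "missing": 180},
    "Onjamo": {"single": 1079, "duplicated": 106, "fragmented": 89, "missing": 166},
}


def busco_percentages(counts):
    """Express each BUSCO category as a percentage of all genes searched."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


for assembly, counts in busco_counts.items():
    print(assembly, busco_percentages(counts))
```

Ranking assemblies by the percentage of complete, single-copy genes is one simple way to compare them, although the duplicated fraction also matters when heterozygosity is suspected.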

Annotation of the nuclear genome assemblies was performed automatically using the MAKER pipeline (Cantarel et al. 2008), without manual curation; therefore, the reliability of the >228,000 predicted protein sequences is questionable. The accuracy of the gene models is further limited by the current lack of transcriptome sequencing data. The fragmented nature of the genome assemblies means that many genes may be split across two or more contigs, artificially inflating the numbers of predicted genes and reducing their apparent lengths. The currently available draft genome assemblies cover between 78.5% and 82.5% of the estimated genome size. The completeness of the gene-space in a genome assembly can be estimated by performing a census of matches to genes that are expected to occur exactly once in the genome. A common approach to quantifying this is Benchmarking Universal Single-Copy Orthologs, or BUSCO (Simão et al. 2015; Klioutchnikov et al. 2017; Seppey et al. 2019). Using version 3 of the BUSCO software, we found that the best currently available assembly is that of variety Bedadeti. Of 1440 genes that are expected to be found as single copies across the embryophytes, 78.8% were found intact and as single copies (Table 19.2). A further 7.0% were duplicated. This could indicate that the assembly needs to be de-duplicated to remove redundant haplotigs arising from heterozygosity. The remaining expected genes were either fragmented (4.9%) or missing (9.2%) from the Bedadeti assembly. Three of the genome assemblies originate from varieties cultivated in Ethiopia, namely
Bedadeti, Onjamo, and Derea (Yemataw et al. 2018b). However, the source of the first enset genome that we sequenced was a plant raised from seed purchased from the Jungle Seeds company (http://www.jungleseeds.co.uk), who specialize in the ornamentals trade. The provenance of this seed is not known, nor is the nature of its relationship with Ethiopian landraces. Subsequent analysis of the chloroplast genome revealed that the Jungle Seeds plant was genetically distant from sequenced Ethiopian accessions but very close to an enset plant from a Chinese botanical garden (Yemataw et al. 2018b). Once data from large-scale genomic surveys of wild and cultivated enset become available (Tesfamicael 2020), it should be possible to place this ornamental accession more precisely. Clearly, there is a need for an improved reference genome assembly for enset. This is a likely prospect since enset has been identified as one of the target species for sequencing by the African Orphan Crops Project (http://africanorphancrops.org/ensete-ventricosum/), and data are expected to be available shortly.

19.7.3 Whole-Genome Resequencing

Cost-effective surveying of genetic variation can be achieved by applying cheap short-read sequencing technologies (e.g., Illumina HiSeq and MiSeq) to whole-genome shotgun DNA libraries. This approach, where the aim is usually comparison against a reference genome rather than de novo genome assembly, is often called
resequencing. In the case of enset, it is possible to align against one of the four available genome assemblies. Given the fragmented and incomplete nature of these short-read assemblies, the alternative is to align enset genome reads against one of the high-quality banana genome sequence assemblies (D’Hont et al. 2012; Martin et al. 2016; Wang et al. 2019); we found that in coding regions, the degree of conservation between Musa and Ensete is high enough that reads can be aligned across the genus boundary using aligners such as BWA and Bowtie (Kim et al. 2015; Langmead et al. 2009; Li and Durbin 2009). We sequenced a total of 18 E. ventricosum accessions at between 7 and 45 X genomic coverage (Yemataw et al. 2018b). This enabled discovery of single-nucleotide polymorphisms (SNPs) at a density of about 1 per kilobase over the nuclear genome, based on aligning against the Bedadeti reference genome. Having discovered SNPs based on a fairly small number of resequenced genomes, it is then possible to implement various assays to genotype a much larger germplasm collection based on those data. To demonstrate this, we designed and implemented a set of PCR-based assays to genotype 480 enset accessions (Yemataw et al. 2018b). Another study performed genome-wide genotyping by sequencing across 230 accessions to discover 89 genes containing SNPs that separate wild from cultivated accessions (Tesfamicael et al. 2020). Other research groups have applied very shallow genome-wide sequencing, or ‘genome skimming’, to Ensete species. For example, BioProject PRJNA530661 contains genome skimming data consisting of 300-bp Illumina reads for E. glaucum and E. livingstonianum along with data for other Musaceae (https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA530661). This skimming approach successfully harvests highly abundant sequences such as those from the chloroplasts and mitochondria, facilitating phylogenetic analysis.
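The sequencing depths quoted above follow from the standard relationship: coverage = (number of reads × read length) / genome size. The sketch below applies it to the 547 Mb genome size estimate. The 150-bp read length and the function names are assumptions chosen for illustration; they are not details reported for the resequencing study.

```python
import math


def expected_coverage(n_reads, read_length, genome_size=547_000_000):
    """Average depth of coverage given read count and read length (bp)."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size=547_000_000):
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Assumed 150-bp reads; the lower and upper depths are those quoted in the text.
for depth in (7, 45):
    print(depth, reads_needed(depth, read_length=150))
```

In practice the usable depth is lower than this estimate, because some reads fail quality filters or do not align to the reference.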

19.7.4 Sequencing of Reduced-Representation Libraries

Several phylogenetic studies that include Ensete species have employed methods for capture of specific DNA targets for sequencing. For example, BioProject PRJNA302314 from the University of California, Berkeley, sequenced libraries based on capture of 494 exons from 418 genes in E. superbum and E. ventricosum (Sass et al. 2016; Givnish et al. 2018). Similarly, as part of their tree-of-life project (Eiserhardt et al. 2018), the Royal Botanic Gardens, Kew, are using a system that targets 353 nuclear genes (Johnson et al. 2019) and have used Illumina MiSeq to sequence a library from E. superbum (BioProject PRJEB38536). Furthermore, Charles University has applied MiSeq sequencing to a library enriched for 382 nuclear genes in a phylogenetic study that includes E. superbum (BioProject PRJNA4514). Other studies have applied target capture to repeat sequences rather than to genes to facilitate discovery of simple sequence repeats (SSRs) that can be used as markers of genetic diversity (Olango et al. 2015; Novák et al. 2014). Recently, a collection of 230 enset accessions, both wild and cultivated, was subjected to genotyping by sequencing to discover patterns of SNP variation across the genome. The study also applied amplified fragment length polymorphism (AFLP) analyses (Tesfamicael 2020). This revealed a clear distinction between wild and cultivated varieties involving 89 genes, among which 17 were possibly implicated in the development of flowers and seeds. This could go some way to elucidating the consequences of sustained vegetative propagation of landraces by farmers, whereas wild enset reproduces sexually via seeds. However, despite much of the literature emphasizing vegetative propagation of cultivated enset, many farmers do also propagate from seed (personal observations, Filate Fiche). Significant insights into the relationship between bananas and enset, their phenotypic divergence (e.g., enlargement of corm versus fruit), and possibly domestication may be revealed by comprehensive comparative genomics studies.
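One simple way to look for SNPs that separate wild from cultivated accessions is to compare allele frequencies between the two groups. The sketch below illustrates that general idea only; it is not the method used in the cited study, and the genotype encoding (0, 1, or 2 copies of the alternate allele, `None` for a missing call), the threshold, the SNP names, and the toy data are all hypothetical.

```python
def alt_allele_frequency(genotypes):
    """Alternate-allele frequency from diploid calls coded 0/1/2 (None = missing)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))


def differentiated_snps(wild, cultivated, threshold=0.8):
    """Return SNP ids whose allele frequency differs by at least `threshold`."""
    flagged = []
    for snp_id in wild:
        f_wild = alt_allele_frequency(wild[snp_id])
        f_cult = alt_allele_frequency(cultivated.get(snp_id, []))
        if f_wild is None or f_cult is None:
            continue  # skip SNPs with no usable calls in one group
        if abs(f_wild - f_cult) >= threshold:
            flagged.append(snp_id)
    return flagged


# Hypothetical genotype calls for four accessions per group.
wild = {"snp1": [0, 0, 0, 1], "snp2": [0, 1, 2, None], "snp3": [2, 2, 2, 2]}
cultivated = {"snp1": [2, 2, 2, 2], "snp2": [1, 1, 0, 2], "snp3": [2, 2, 1, 2]}

print(differentiated_snps(wild, cultivated))  # ['snp1']
```

Real analyses would also account for sample size, population structure, and multiple testing, which this sketch deliberately ignores.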

19.7.5 The Ensete ventricosum Chloroplast Genome

Chloroplasts are organelles in the plant cell that are involved in photosynthesis, starch storage, synthesizing amino acids, vitamins, key components of the photosynthetic machinery and phytohormones, as well as in metabolizing nitrogen and sulfur. Through retrograde signaling, they are critical for integrating environmental signals and initiating transcriptome reprogramming. While many ancient chloroplast genes are now located in the nuclear genome, chloroplasts have retained their own genetic material, a circular DNA with a distinctive quadripartite structure consisting of two segments of inverted repeats (IRs) separated by a small single-copy region (SSC) and a large single-copy region (LSC) (Bendich 2004; Martin et al. 2013; Chumley et al. 2006). In the accession Bedadeti of Ensete ventricosum, the chloroplast DNA is 168,843 base pairs (bp) long, encompassing the IRa and IRb regions consisting of 34,285 bp each, an SSC region of 11,298 bp, and an LSC region of 79,872 bp. Overall, it is predicted to encode 84 protein-coding genes, 38 transfer RNA (tRNA) genes, and eight ribosomal RNA (rRNA) genes. The protein-coding genes belong to three major categories. The first one includes the proteins for subunits of photosystems I and II, ATP synthase, cytochrome b/f complex, NADH-dehydrogenase, and a subunit of rubisco, all of which are involved in photosynthetic light reactions. There are also genes required for self-replication, involving transcription and translation, such as the genes coding for large and small subunits for ribosomes as well as RNA polymerase. The last category of genes includes a miscellaneous assortment of, for example, those encoding translation initiation factor InfA, c-type cytochrome synthesis CcsA, envelope membrane protein CemA, and a protease ClpP. The genes reflect the known
functions of the chloroplast, with over half the genes being involved in photosynthesis and the others functioning to aid self-replication or metabolic pathways. Because of the IR regions, there are two copies each of the protein-coding genes ndhA, ndhB, ndhH, rpl2, rpl23, rps12, and rps15, five tRNA genes, and four rRNA genes, one copy in IRa and one in IRb (Fig. 19.5); one copy is coded in the sense, or positive, orientation and the other in the antisense, or negative, orientation. In Fig. 19.5, three brown boxes denote the gene rps12, one in the LSC region and one each in IRa and IRb, because the rps12 gene is trans-spliced: its 5’-end exon, located in the LSC region, is joined with the exons in IRa or IRb to create a messenger RNA (mRNA) that is translated into a functional protein. Another gene, ndhA, is present at the boundary of the IR and SSC regions, with its sequence extending into both regions. The copy at the boundary of IRa and SSC is the full gene and is translated into a functional protein. However, the copy at the IRb-SSC boundary is incomplete, mirroring only the section of sequence present in IRa and not the sequence in SSC. Consequently, the incomplete gene is expected to encode a non-functional protein and is considered a pseudogene. Among chloroplast genes, ATG is overwhelmingly the most common start codon for translation, with a minority starting with GTG (Kudla et al. 1992; Neckermann et al. 1994; Hirose 1997). However, two enset chloroplast genes, rpl2 and ndhD, are predicted to start with an ‘ACG’ codon. Such an occurrence has also been recorded for rpl2 in rice and maize, where it is associated with RNA editing (Kössel et al. 1993). Similar observations are seen in bananas (Martin et al. 2013; Shetty et al. 2016).
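The start-codon observations above can be expressed as a small classification routine. The following is a minimal sketch under stated assumptions: it treats ATG as canonical, GTG as a recognized alternative, and ACG as a candidate for C-to-U RNA editing, which would yield AUG at the transcript level. The function names, category labels, and the toy sequences are hypothetical and are not taken from the enset annotation.

```python
def classify_start_codon(cds):
    """Classify the first codon of a coding sequence given in the DNA alphabet."""
    codon = cds[:3].upper()
    if len(codon) != 3:
        raise ValueError("coding sequence is shorter than one codon")
    if codon == "ATG":
        return "canonical"
    if codon == "GTG":
        return "alternative"
    if codon == "ACG":
        # A C-to-U edit at the second position would create an AUG start.
        return "candidate RNA editing site"
    return "unexpected"


def apply_c_to_u_edit(cds):
    """Return the sequence with an ACG start rewritten as ATG (DNA alphabet)."""
    if cds[:3].upper() == "ACG":
        return "ATG" + cds[3:]
    return cds


# Hypothetical toy sequences, not real enset genes.
toy_genes = {"geneA": "ATGGCTTAA", "geneB": "GTGAAATAG", "geneC": "ACGCCTTGA"}
for name, seq in toy_genes.items():
    print(name, classify_start_codon(seq), apply_c_to_u_edit(seq)[:3])
```

Flagging ACG starts rather than rejecting them keeps genes such as rpl2 and ndhD in an annotation while marking them for confirmation with transcript data.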

19.8 Future Goals and Prospects

Sequencing the first enset genome, and more recently another 17, demonstrated that these resources add remarkable value to enset improvement when the unique germplasm and phenotyping facilities of SARI are combined with the genomics, bioinformatics, and molecular pathology expertise at Exeter and, more recently, Warwick Universities. Not only could genomics improve understanding of the basis of disease resistance, but the development of marker-assisted selection will dramatically enhance current enset improvement research and help ensure the success of challenging breeding programs that struggle with infertility and the long time to maturity before key agronomic traits can be scored. Moreover, deployment of molecular markers will unambiguously address the historical confusion arising from vernacular naming errors. Furthermore, inclusion of wild enset genomes will enable the study of domestication by identifying signatures of selection, and in the longer term, high-quality assemblies can be used to compare the evolution of enset and banana.

Fig. 19.5 Chloroplast genome of Ensete ventricosum. The innermost ring shows forward repeats, represented by red arcs, and reverse repeats, shown in green. The second ring shows tandem repeats, and the third ring shows microsatellites, both denoted with blue lines. The bold black lines along the third ring show the IRa region on the right and IRb region on the left. The outermost ring shows the genes coded by the DNA, with the different colors denoting the different gene families

These goals have recently received substantial financial and technical support from the African Orphan Crop Genome Consortium, which recognizes the underlying benefits to Ethiopian food security and has agreed to support the generation of a high-quality reference genome sequence and extensive mRNA-Seq for annotation, plus resequencing of an additional 100 enset genomes. When combined with sequencing of additional landraces selected for specific agronomic traits, we will be in an unprecedented position to identify key loci and develop robust molecular markers for key traits such as corm quality, drought resistance, fiber quality, and disease resistance that can be used at AARC and in the field to make a real step change in enset farming practices. SARI have evaluated 387 accessions originating from nine different regions of Ethiopia for 15 quantitative traits to determine the extent and pattern of distribution of morphological variations (Yemataw et al. 2017). The variations among the accessions and regions were significant for all 15 traits, and significant correlations between key agronomic characters were identified. Notably, mean plant height, corm weight, and kocho yield/hectare/year revealed regional variation along an altitude gradient and across cultural differences related to landrace origin. However, a substantial part of the enset germplasm at AARC has not yet been systematically evaluated, and it is estimated to represent only ~40% of the landraces known to the Wolayita farming communities (Olango et al. 2014), implying further untapped genetic potential for enhancing yield and stress tolerance.

References

Abebe T, Amdie T, Abebe T (2013) Determinants of crop diversity and composition in Enset-coffee agroforestry homegardens of Southern Ethiopia. J Agric Rural Dev Trop Subtrop 114:29–38
Alemu K, Stephen S (1991) Enset in North Omo Region. FRP Tech Pam (Ethiopia) 1
Andeta AFF, Vandeweyer D, Woldesenbet F, Eshetu F, Hailemicael A, Woldeyes F et al (2018) Fermentation of enset (Ensete ventricosum) in the Gamo highlands of Ethiopia: Physicochemical and microbial community dynamics. Food Microbiol 73:342–350
Andeta AF, Vandeweyer D, Teffera EF, Woldesenbet F, Verreth C, Crauwels S et al (2019) Effect of fermentation system on the physicochemical and microbial community dynamics during enset (Ensete ventricosum) fermentation. J Appl Microbiol 126:842–853
Ashagari D (1985) Studies on the bacterial wilt of ensat (Ensete ventricosum) and prospects for its control. Ethiop J Agric Sci 7:1–14
Ayenew B, Mengesha A, Tadesse T, Gebremariam E (2012) Ensete ventricosum (Welw.) Cheesman: a cheap and alternative gelling agent for pineapple (Ananas comosus var. smooth cayenne) in vitro propagation. J Microbiol Biotechnol Food Sci 2:640–652
Baker RED, Simmonds NW (1953) The Genus Ensete in Africa. Kew Bull 8:405
Bekele E, Shigeta M (2010) Phylogenetic relationships between Ensete and Musa species as revealed by the trnT trnF region of cpDNA. Genet Resour Crop Evol 58:259–269
Bendich AJ (2004) Circular chloroplast chromosomes: the grand illusion. Plant Cell 16:1661–1666
Birmeta G, Nybom H, Bekele E (2002) RAPD analysis of genetic diversity among clones of the Ethiopian crop plant Ensete ventricosum. Euphytica 124:315–325
Biswas MK, Darbar JN, Borrell JS, Bagchi M, Biswas D, Nuraga GW et al (2020) The landscape of microsatellites in the enset (Ensete ventricosum) genome and web-based marker resource development. Sci Rep 10:15312
Blomme G, Yemataw Z, Tawle K, Sinohin V, Gueco L, Kebede R et al (2018) Assessing enset fibre yield and quality for a wide range of enset [Ensete ventricosum (Welw.) Cheesman] landraces in Ethiopia. Fruits 73:328–341
Borrell JS, Biswas MK, Goodwin M, Blomme G, Schwarzacher T, Heslop-Harrison JS et al (2019) Enset in Ethiopia: a poorly characterized but resilient starch staple. Ann Bot 123:747–766
Borrell JS, Goodwin M, Blomme G, Jacobsen K, Wendawek AM, Gashu D et al (2020) Enset-based agricultural systems in Ethiopia: a systematic review of production trends, agronomy, processing and the wider food security applications of a neglected banana relative. Plants, People, Planet 3:10084
Brandt SA (1996) A model for the origins and evolution of enset food production. In: Abate T, Hiebsch C, Brandt SA, Gebremariam S (eds) Ensete-based Sustain. Agric Ethiop 36–46
Brandt SA, Spring A, Hiebsch C, McCabe JT, Tabogie E, Diro M et al (1997) The “tree against hunger” enset-based agricultural systems in Ethiopia. American Association for the Advancement of Science
Cantarel BL, Korf I, Robb SMC, Parra G, Ross E, Moore B et al (2008) MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res 18:188–196
Cheesman E (1947) Classification of the Bananas: the genus Ensete Horan. Kew Bull 2:97–106
Chumley TW, Palmer JD, Mower JP, Fourcade HM, Calie PJ, Boore JL et al (2006) The complete chloroplast genome sequence of Pelargonium × hortorum: organization and evolution of the largest and most highly rearranged chloroplast genome of land plants. Mol Biol Evol 23:2175–2190
D’Hont A, Denoeud F, Aury J-M, Baurens F-C, Carreel F, Garsmeur O et al (2012) The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature 488:213–217
Dobo B, Asefa F, Asfaw Z (2017) Effect of tree-enset-coffee based agro-forestry practices on arbuscular mycorrhizal fungi (AMF) species diversity and spore density. Agrofor Syst
Eiserhardt WL, Antonelli A, Bennett DJ, Botigué LR, Burleigh JG, Dodsworth S et al (2018) A roadmap for global synthesis of the plant tree of life. Am J Bot 105:614–622
Gerura FN, Meressa BH, Martina K, Tesfaye A, Olango TM, Nasser Y (2019) Genetic diversity and population structure of enset (Ensete ventricosum Welw Cheesman) landraces of Gurage zone, Ethiopia. Genet Resour Crop Evol 66:1813–1824
Getachew S, Mekbib F, Admassu B, Kelemu S, Kidane S, Negisho K et al (2014) A look into genetic diversity of enset (Ensete ventricosum (Welw.) Cheesman) using transferable microsatellite sequences of banana in Ethiopia. J Crop Improv 28:159–183
Givnish TJ, Zuluaga A, Spalink D, Soto Gomez M, Lam VKYY, Saarela JM et al (2018) Monocot plastid phylogenomics, timeline, net rates of species diversification, the power of multi-gene analyses, and a functional model for the origin of monocots. Am J Bot 105:1888–1910
Guzzon F, Müller JV (2016) Current availability of seed material of enset (Ensete ventricosum, Musaceae) and its Sub-Saharan wild relatives. Genet Resour Crop Evol 63:185–191
Haile B, Diro M, Endale Tabogie IAR AA (Ethiopia)) (1996) Agronomic research on enset
Harlan JR (1969) Ethiopia: a center of diversity author (s). 
In: R Jack (eds) Harlan Published by: Springer on behalf of New York Botanical Garden Press Stable URL: https://www.jstor.org/stable/4253081 Ethiopia : A Center of Diversity ’. 23:309–314 Harrison J, Moore K, Paszkiewicz K, Jones T, Grant M, Ambacheew D et al (2014) A Draft genome sequence for Ensete ventricosum, the drought-tolerant “tree against hunger.” Agronomy 4:13–33 Haspar L (2017) Ethnobotanical survey of bananas (Musaceae) in six districts of East Java, Indonesia. Biodiversitas. J. Biol. Divers. 18:160–174 Hirose T (1997) Both RNA editing and RNA cleavage are required for translation of tobacco chloroplast ndhD mRNA: apossible regulatory mechanism for the expression of a chloroplast operon consisting of functionally unrelated genes. EMBO J 16:6804–6811

369

Hummer KE (2015) In the footsteps of vavilov: plant diversity then and now. HortScience 50:784–788 J CB, Trust D (2016) Genus ensete (musaceae) in India. Telopea. 19:437–439 Johnson MG, Pokorny L, Dodsworth S, Botigué LR, Cowan RS, Devault A et al (2019) A universal probe set for targeted sequencing of 353 nuclear genes from any flowering plant designed using k-medoids clustering ed. Susanne Renner. Syst Biol 68:594–606 Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12 Klioutchnikov G, Kriventseva EV, Zdobnov EM, Seppey M, Waterhouse RM, Ioannidis P et al (2017) BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol Biol Evol 35:543– 548 Kössel H, Hoch B, Igloi GL, Maier RM, Ruf S (1993) Editing creates the initiator codon of the rpl2 transcript from maize chloroplasts. In: The translational apparatus. Springer US, Boston, MA, p 609–616 Kudla J, Igloi GL, Metzlaff M, Hagemann R, Kössel H (1992) RNA editing in tobacco chloroplasts leads to the formation of a translatable psbL mRNA by a C to U substitution within the initiation codon. EMBO J 11:1099–1103 Kumar VS, Jaishanker R, Annamalai A, Iyer CSP (2010) Ensete superbum (Roxb.) Cheesman: a rare medicinal plant in urgent need of conservation. Curr Sci 98:602– 603 Kumar P, Badgujar SK, VN N (2013) Preliminary screening of different phytochemicals from Ensete superbum (roxb.) Cheesman: a highly medicinal plant of Indian origin. Int J Res Phytochem Pharmacol 3:57–60 Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25 Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760 Martin G, Baurens F-C, Cardi C, Aury J-M, D’Hont A (2013) The complete chloroplast genome of banana (Musa acuminata, Zingiberales): insight into plastid monocotyledon evolution. 
In: James G (ed) Umen. PLoS One. 8:e67350 Martin G, Baurens F-C, Droc G, Rouard M, Cenci A, Kilian A et al (2016) Improvement of the banana “Musa acuminata” reference sequence using NGS data and semi-automated bioinformatics methods. BMC Genomics 17:243 Mekuria W, Amare A, Alemayehu C (2016) Assessment of bacterial wilt (Xanthomonas campestris pv. musacearum) of enset in Southern Ethiopia. African J Agric Res 11:1724–1733 Mengesha A, Ayenew B, Gebremariam E, Tadesse T (2012) Micro-propagation of vanilla planifolia using

370 enset (Ensete ventricosum (Welw, Cheesman)) starch as a gelling agent. Curr Res J Biol Sci 4:519–525 Mohammed B, Martin G, Laila MK (2013) Nutritive values of the drought tolerant food and fodder crop enset. African J Agric Res 8:2326–2333 Muzemil S, Chala A, Tesfaye B, Studholme D, Grant M, Yemataw Z et al (2019) Evaluation of 20 enset (Ensete ventricosum) landraces for response to Xanthomonas vasicola pv. musacearum infection. bioRxiv Neckermann K, Zeltz P, Igloi GL, Kössel H, Maier RM (1994) The role of RNA editing in conservation of start codons in chloroplast genomes. Gene 146:177– 182 Negash A, Niehof A (2004) The significance of enset culture and biodiversity for rural household food and livelihood security in southwestern Ethiopia. Agric Human Values 21:61–71 Negash A, Tsegaye A, Treuren R, Visser B (2002) AFLP analysis of enset clonal diversity in south and southwestern ethiopia for conservation. Crop Sci 42:1105–1111 Novák P, Hřibová E, Neumann P, Koblížková A, Doležel J, Macas J (2014) Genome-wide analysis of repeat diversity across the family musaceae. In: A Houben (ed) PLoS One 9:e98918 Ochiai Y (2012) From forests to homegardens: a case study of Ensete glaucum in Myanmar and Laos. Tropics 21:59–66 Olango T, Tesfaye B, Catellani M, Pè M (2014) Indigenous knowledge, use and on-farm management of enset (Ensete ventricosum (Welw.) Cheesman) diversity in Wolaita, Southern Ethiopia. J Ethnobiol Ethnomed 10:41 Olango TM, Tesfaye B, Pagnotta MA, Pè ME, Catellani M (2015) Development of SSR markers and genetic diversity analysis in enset (Ensete ventricosum (Welw.) Cheesman), an orphan food security crop from Southern Ethiopia. BMC Genet 16:98 Peveri V (2015) The exquisite political fragrance of enset: silent protest in Southern Ethiopia through culinary themes and variations. Partecip. e Conflitto. 
8:555–584 Pijls LTJ, Timmer AAM, Wolde-Gebriel Z, West CE, Pijls T, J Ainoid, AM Timmer, Wolde-Gwbriel Z, West CE (1995) Cultivation, preparation and consumption of ensete (Ensete ventricosum) in Ethiopia. J Sci Food Agric 67:1–11 Quinlan RJ, Quinlan MB, Dira S, Caudell M, Sooge A, Assoma AA (2015) Vulnerability and resilience of sidama enset and maize farms in southwestern Ethiopia. J Ethnobiol 35:314–336 Rijal A (2011) Surviving on knowledge: ethnobotany of chepang community from mid-hills of Nepal. Ethnobot Res Appl 9:181 Sass C, Iles WJD, Barrett CF, Smith SY, Specht CD (2016) Revisiting the Zingiberales: using multiplexed exon capture to resolve ancient and recent phylogenetic splits in a charismatic plant lineage. PeerJ 4: e1584

L. Venkatesan et al. Seppey M, Manni M, Zdobnov EM (2019) BUSCO: assessing genome assembly and annotation completeness, 227–245 Sethiya NK, Brahmbhat K, Chauhan B, Mishra SH (2016) Pharmacognostic and phytochemical investigation of Ensete superbum (Roxb.) Cheesman pseudostem. Indian J Nat Prod Resour 7:51–58 Sethiya NK, Shekh MR, Singh PK (2019) Wild banana [Ensete superbum (Roxb.) Cheesman.]: Ethnomedicinal, phytochemical and pharmacological overview. J Ethnopharmacol 233:218–233 Shetty SM, Md Shah MU, Makale K, Mohd-Yusuf Y, Khalid N, Othman RY (2016) Complete chloroplast genome sequence of Musa balbisiana corroborates structural heterogeneity of inverted repeats in wild progenitors of cultivated bananas and plantains. Plant Genome. 9:0089 Shumbulo A, Gecho Y, Tora M (2012) Diversity, challenges and potentials of enset (Ensete ventricosum) production. In case of Offa woreda, wolaita zone. Southern Ethiopia. Food Sci Qual Manag 7:24–31 Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM (2015) BUSCO: assessing genome assembly and annotation completeness with singlecopy orthologs. Bioinformatics 31:3210–3212 Simoons FJ (1965) Some questions on the economic prehistory of Ethiopia. J Afr Hist 6:1–13 Sina B, Degu HD (2015) Knowledge and use of wild edible plants in the hula district of the sidama zone. Int J Bio-Resour Stress Manag 6:352 Subbaraya U (2006) Farmers’ knowledge of wild Musa in India 46 Tangjitman K, Wongsawad C, Kamwong K, Sukkho T, Trisonthi C (2015) Ethnomedicinal plants used for digestive system disorders by the Karen of northern Thailand. 
J Ethnobiol Ethnomed 11:27 Tesfamicael (2020) Accumulation of mutations in genes associated with sexual reproduction contributed to the domestication of a vegetatively propagated staple crop, enset 1–31 Tesfamicael KG, Gebre E, March TJ, Sznajder B, Mather DE, Rodríguez López CM (2020) Accumulation of mutations in genes associated with sexual reproduction contributed to the domestication of a vegetatively propagated staple crop, enset. Hortic Res 7 The Plant List Version 1 (2010) Available at: http://www. theplantlist.org/. Accessed 22 Sept 2020 Tidiane Sall A, Chiari T, Legesse W, Seid-Ahmed K, Ortiz R, van Ginkel M, et al (2019) Durum wheat (Triticum durum Desf.): origin, cultivation and potential expansion in sub-Saharan Africa. Agronomy 9:263 Tobiaw (2011) Analysis of genetic diversity among cultivated enset (Ensete ventricosum) populations from Essera and Kefficho, southwestern part of Ethiopia using inter simple sequence repeats (ISSRs) marker. African J Biotechnol 10:15697–15709

19

Genome Resources for Ensete ventricosum (Enset) and Related Species

Tsegaye, A. 2002. On indigenous production, genetic diversity and crop ecology of enset (Ensete ventricosum (Welw.) Cheesman) Tsegaye A, Struik PC (2002) Analysis of enset (Ensete ventricosum) indigenous production methods and farm-based biodiversity in major enset-growing regions of southern Ethiopia. Exp Agric 38:291–315 Väre H, Häkkinen M (2011) Typification and check-list of Ensete Horan. names (Musaceae) with nomenclatural notes. Adansonia 33:191–200 Vasundharan SK, Jaishanker RN, Annamalai A, Sooraj NP (2015) Ethnobotany and distribution status of Ensete superbum (Roxb.) Cheesman in India: a geo-spatial review. J Ayurvedic Herb Med 1:54–58 Wang Z, Miao H, Liu J, Xu B, Yao X, Xu C et al (2019) Musa balbisiana genome reveals subgenome evolution and functional divergence. Nat Plants Welde-Michael G, Bobosha K, Addis T, Blomme G, Mekonnen S, Mengesha T (2010) Mechanical transmission and survival of bacterial wilt on enset. African Crop Sci J 16:97–102 Wilkin P, Davis A, Demissew S, Etherington T, Goodwin M, Heslop-Harrison P et al (2018) A perspective to enhance innovative research with emphasis on varietal diversity and sustainable utilization of enset (Ensete ventricosum). Biol Soc Ethiop 17(Suppl) Woldetensaye A (1997) The ecology and production of Ensete ventricosum in Ethiopia. Swedish University Agr Sci Yemata G (2020) Ensete ventricosum: a multipurpose crop against hunger in Ethiopia. Sci World J Yemataw Z, Mohamed H, Diro M, Addis T, Blomme G (2014) Enset (Ensete ventricosum) clone selection by farmers and their cultural practices in southern Ethiopia. Genet Resour Crop Evol 61:1091–1104 Yemataw Z, Kassahun T, Tesfaye T, Tesfaye D, Sadik M, Zeritu S et al (2016a) Genetic variation for corm yield and other traits in Ethiopian enset (Ensete ventricosum (Welw.) Cheesman). J Plant Breed Crop Sci 8:150–156

371

Yemataw Z, Tesfaye K, Zeberga A, Blomme G (2016b) Exploiting indigenous knowledge of subsistence farmers’ for the management and conservation of Enset (Ensete ventricosum (Welw.) Cheesman) (musaceae family ) diversity on-farm. J Ethnobiol Ethnomed 1–25 Yemataw Z, Chala A, Ambachew D, Studholme D, Grant M, Tesfaye K (2017) Morphological variation and inter-relationships of quantitative traits in enset (Ensete ventricosum (welw.) Cheesman) Germplasm from South and South-Western Ethiopia. Plants 6:56 Yemataw Z, Bekele A, Blomme G, Muzemil S, Tesfaye K, Jacobsen K (2018a) A review of enset [Ensete ventricosum (Welw.) Cheesman] diversity and its use in Ethiopia. Fruits 73:301–309 Yemataw Z, Muzemil S, Ambachew D, Tripathi L, Tesfaye K, Chala A et al (2018b) Genome sequence data from 17 accessions of Ensete ventricosum, a staple food crop for millions in Ethiopia. Data Br 18:285–293 Yemataw Z, Tesfaye K, Grant M, Studholme DJ, Chala A (2018c) Multivariate analysis of morphological variation in enset (Ensete ventricosum (Welw.) Cheesman) reveals regional and clinal variation in germplasm from south and south western Ethiopia. Aust J Crop Sci 12:1849–1858 Yeshitla M, Yemataw Z (2012) Past research achievement and existing gaps on enset (Ensete ventricosum (Welw.) Cheesman) Breeding. In: Yesuf M, Hunduma T (eds) Enset research and development experiences in Ethiopia, Proceedings of the Enset National Workshop, Wolkite, Ethiopia, 19–20 August 2010. Addis Ababa (ETH): Ethiopian Institute of Agricultural Research (EIAR) Yesuf M, Hunduma T (eds) (2012) Enset research and development experiences in Ethiopia. In: Proceedings of Enset National Workshop 19–20 August 2010, Wolkite, Ethiopia, EIAR, Addis Abeba

20 Yam Genomics

Hana Chaïr, Gemma Arnau, and Ana Zotta Mota

Abstract

Yams are clonally propagated crops, cultivated mainly for their starchy tubers. The genus has a pantropical distribution and encompasses species with variable ploidy levels. Greater and Guinea yams are the two main edible species, whilst around ten species are of local importance. Breeding programmes mainly target the improvement of tuber quality, tuber yield and resistance to biotic stresses. Tremendous efforts have been dedicated to assessing the ploidy level and diversity of the cultivated species and their wild relatives using microsatellite markers. Only in the last ten years have yam genomics resources increased, a trend amplified by the advent of next-generation sequencing. To date, draft genomes have been produced for five species, together with eight transcriptomes from different species. However, yams still lag behind the major crops. These new resources have not yet been fully harnessed, and their use in breeding programmes remains limited.

H. Chaïr (corresponding author), G. Arnau: CIRAD, UMR AGAP Institut, 34398 Montpellier, France
A. Z. Mota: UMR AGAP Institut, Univ Montpellier, CIRAD, INRAE, Institut Agro, 34398 Montpellier, France

20.1 Crop Background

Yams (Dioscorea spp., Dioscoreaceae) are staple crops of considerable importance for food security in many countries of West Africa, the Pacific, the Caribbean and highland Asia. The genus had a pantropical distribution long before humans colonised these regions, with most species isolated by natural barriers into three continental groups: Asiatic, African and American (Hahn 1995). Most species are grown for their starchy tubers, but their medicinal properties are equally important. The most important cultivated edible species are D. cayenensis Lam. and D. rotundata Poir., which originated in West Africa; D. nummularia Lam., D. oppositifolia L. and D. transversa R. Br. from Southeast Asia and Oceania; D. alata L., originating from mainland Southeast Asia; D. trifida L. f., originating from South America (Lebot 2009); and finally D. bulbifera, which has a pantropical distribution and is found in different agrosystems, even in Florida, where it is considered an invasive species (Croxton et al. 2011). In addition to their importance as food, yams are indigenous crops closely tied to the cultures and traditions of the people of the Pacific Islands and of the so-called ‘yam belt’, which stretches from Côte d’Ivoire to western Cameroon (Chaïr et al. 2010).

Other Dioscorea species are considered minor, in spite of their great impact in the areas where they are cultivated. These species are equally important for food security in local communities and for their medicinal properties. In West and Central Africa, D. dumetorum serves as a food crop and as an important resource for traditional medical practitioners, who often use it in herbal preparations. In Asia and Oceania, even though the main cultivated species is D. alata, other species are cultivated for their starchy tubers or for the production of diosgenin and other steroids, which are used to synthesise pharmaceutically relevant derivatives such as corticosteroids and sex hormones (Lubbe and Verpoorte 2011; Epping and Laibach 2020). This is the case for D. polystachya Turcz., D. villosa L., D. zingiberensis C. H. Wright, D. composita L. f., D. glabra Roxburgh, D. hamiltonii Hook. f., D. pentaphylla L., D. pubera Blume, D. hispida Dennstedt, D. oppositifolia L., D. wallichii Hook. f., D. esculenta var. spinosa (Roxb. ex Wall.) R. Knuth and D. hastifolia Nees (Paul et al. 2018).

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 M. A. Chapman (ed.), Underutilised Crop Genomes, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-00848-1_20

20.1.1 Dioscorea spp., Underutilised Crops

Despite their importance for food security, few efforts have been dedicated to yam improvement. Over the past fifty years, breeding efforts have mainly focused on the two major species, D. rotundata (also referred to as Guinea yam) and D. alata (also referred to as greater yam or water yam). The former accounts for nearly 90% of world yam production, whilst the latter has a pantropical distribution and is better adapted to the shift from the traditional slash-and-burn agrosystems specific to D. rotundata to intensive systems for urban supply. Yams have a low yield compared with other root and tuber crops. Many species, in particular D. alata, are sensitive to pests and diseases, mainly anthracnose caused by the fungus Colletotrichum gloeosporioides (Penz.) Penz. and Sacc., which leads to enormous economic losses. As clonally propagated crops, yams are sensitive to many viruses in all growing areas, whether in the Pacific (Sukal et al. 2015), Africa (Nkere et al. 2020), the Caribbean or South America (Umber et al. 2020). Finally, the genetic determinism of tuber quality, one of the main criteria for the acceptance of varieties along the value chain (farmers, processors and consumers), has been poorly investigated. The development of breeding programmes is hampered by several challenges related to plant biology and reproduction (polyploidy, asynchronous and erratic flowering, dioecy, etc.) and by the paucity of genetic and genomic resources. Yams are vegetatively propagated, which results in an accumulation of clonal lineages within the available germplasm and in high heterozygosity (Vandenbroucke et al. 2016; Sharif et al. 2020). These biological constraints, combined more importantly with limited sources of funding, have led to yams being referred to as an orphan or underutilised crop.

20.1.2 Botanical Description

Yams belong to the order Dioscoreales, which comprises three families: Nartheciaceae, Burmanniaceae and Dioscoreaceae (Caddick et al. 2002a, b). The families of the Dioscoreales clade contain few species, except for Dioscoreaceae, which comprises four genera (Caddick et al. 2002a, b). Three are monoecious and contain relatively few species: Tacca J. R. and G. Forst., with at least ten species, and Stenomeris Planch. and Trichopus Gaertn., with two species each (Wilkin et al. 2005). Dioscorea L. is the only dioecious genus, comprising c. 640 species (Govaerts et al. 2007) historically assigned to 32–59 sections (Knuth 1924; Ayensu 1972). Occasionally, male and female flowers can be found on the same plant in some species of this genus (Lebot 2009). Dioscoreaceae have a characteristic pantropical distribution, with most taxa concentrated within intertropical latitudes (Knuth 1924; Burkill 1960; Viruel et al. 2016). The systematics of Dioscorea has been challenging for many years owing to its great morphological diversity, dioecy and small flowers. Dioscorea spp. are perennial or annual and produce tubers (Fig. 20.1). There is tremendous variation in the size, form and number of tubers per plant within and between species (Lebot 2009). The stems are vines that climb by twining either clockwise or anticlockwise, depending on the species (Bai and Ekanayake 1998). Some species produce bulbils in the leaf axils, which contribute to the clonal propagation and dispersal of the crop under natural conditions (Lebot 2009). The leaves are usually simple, cordate or acuminate, or composed of three (D. trifida) or five (D. pentaphylla) leaflets. Yam plants are dioecious, bearing either female or male flowers. In D. rotundata, monoecious plants are occasionally observed (Bai and Ekanayake 1998). The male flowers are composed of a calyx of three sepals, a corolla of three petals and usually two whorls of three stamens. The female flowers have a trilocular ovary and three stigmas. Each locule of the ovary contains two ovules. The fruits are dry, dehiscent, trilocular capsules, and theoretically each fruit produces six seeds (Lebot 2009).

Fig. 20.1 Yam morphology. a–b leaves; c male flowers; d female flowers; e tubers; f yam field

20.1.3 Ploidy Level

Dioscorea spp. are polyploid species. For a long time, the base chromosome number was considered to be 10, with tetraploid, hexaploid and octoploid species (40, 60 and 80 chromosomes, respectively). The cytogenetic status (inheritance and chromosome number) of the most widely cultivated species has recently been re-examined using standard chromosome counting and flow cytometry coupled with segregation analysis. To assess the chromosome number in D. rotundata, Scarcelli et al. (2005) analysed eight markers (two isozyme and six microsatellite loci) in the selfed progeny of the variety Gnidou. The analysis showed diploid segregation. Then, 181 samples of D. rotundata, 245 of D. abyssinica and 63 of D. praehensilis, the latter two being considered wild relatives of D. rotundata (Terauchi et al. 1992), were genotyped, and their allele distributions supported diploidy (2n = 40). Using flow cytometry, Girma et al. (2014) analysed the ploidy level of seventy-seven accessions of Guinea yams. Taking 40 chromosomes as the diploid number, they found that, amongst 44 accessions of D. rotundata, 33 were diploid whilst 11 were triploid, which contradicts the findings of Scarcelli et al. (2005). The seven D. praehensilis accessions were diploid, whereas the 21 D. cayenensis and the five D. mangenotiana accessions were triploid. In D. alata, a Bayesian approach applied to the segregation patterns of microsatellite markers in four different crosses also provided evidence of diploid segregation with 40 chromosomes. Chromosome counts in 110 accessions revealed the presence of triploid and tetraploid accessions (60 and 80 chromosomes, respectively) (Arnau et al. 2009). Using the same approach, Bousalem et al. (2006) found that D. trifida is an autopolyploid species with tetrasomic inheritance, based on the analysis of segregation patterns of eight microsatellite markers in three different crosses. The chromosome number, determined by flow cytometry in 80 accessions, was 2n = 4x = 80. For the remaining species, the cytological status is still uncertain and needs further investigation.
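The chromosome counts quoted above follow simple arithmetic once a base number is fixed. The short Python sketch below assumes a base number of x = 20, which is what a diploid count of 2n = 40 implies; the constant and function names are illustrative only and are not taken from the cited studies.

```python
# Assumption: base chromosome number x = 20, implied by the diploid
# count 2n = 40 reported for D. rotundata and D. alata.
BASE_CHROMOSOME_NUMBER = 20

PLOIDY_NAMES = {2: "diploid", 3: "triploid", 4: "tetraploid"}


def ploidy_from_count(chromosomes, base=BASE_CHROMOSOME_NUMBER):
    """Return the number of chromosome sets for a somatic chromosome count."""
    if chromosomes <= 0 or chromosomes % base != 0:
        raise ValueError(f"{chromosomes} is not a positive multiple of {base}")
    return chromosomes // base


def describe(chromosomes, base=BASE_CHROMOSOME_NUMBER):
    """Format a count in the usual cytological notation, e.g. '2n = 4x = 80'."""
    level = ploidy_from_count(chromosomes, base)
    name = PLOIDY_NAMES.get(level, f"{level}-ploid")
    return f"2n = {level}x = {chromosomes} ({name})"


for count in (40, 60, 80):
    print(describe(count))
```

With the older assumption of a base number of 10, the same counts would be read as tetraploid, hexaploid and octoploid, which is the interpretation that the segregation analyses overturned.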

20.1.4 Yam Geographical Distribution

Dioscorea spp. have a pantropical distribution. In the ‘Plants of the World’ website (http://www.e-monocot.org/?q=Dioscorea), 1366 entries for Dioscorea species are associated with a geographic distribution. The distribution may be restricted to a single country, as for the endemic species of Madagascar or Sri Lanka, or be wider, encompassing several countries and regions. Distribution mapping relies essentially on occurrence records of species identified using botanical criteria. The advent of molecular markers has allowed the geographic distribution of many species, and its consequences for genetic structure, to be refined. However, such studies have been conducted on only a few agronomically and economically important species.

The first worldwide genetic diversity study of D. alata showed a complex pattern of zymotypes shared between continents (Lebot et al. 1998). Later, eighty-nine accessions of D. alata L. (landraces and breeders’ lines) from nine countries of West and Central Africa (Benin, Congo, Côte d’Ivoire, Equatorial Guinea, Gabon, Ghana, Nigeria, Sierra Leone and Togo) were genotyped using thirteen microsatellite markers (Obidiegwu et al. 2009). Even though eight genetic groups were identified in the neighbour-joining tree, no geographical clustering was found, suggesting that clones must have been widely distributed amongst the countries investigated. Several studies were also conducted at country level, such as in Brazil (Siqueira et al. 2014), Jamaica (Asemota et al. 1996), China (Wu et al. 2014) and Vanuatu (Malapa et al. 2005; Vandenbroucke et al. 2016). Later, a study using microsatellite markers on accessions from the Caribbean, West Africa, Vanuatu and India found the highest levels of genetic diversity in India and Vanuatu, which led to the hypothesis of two centres of diversification (Arnau et al. 2017). This hypothesis was recently investigated in depth on a wider sample of 643 accessions spanning four continents, using genotyping by sequencing (GBS). Based on genetic parameters and demographic inference, evidence was found for an early divergence of the mainland Southeast Asian and Pacific gene pools, suggesting independent domestication origins. Moreover, triploids and tetraploids were found to have originated in these two regions before the westward migration of yam. The study also found evidence that the African gene pool originated from the Indian Peninsula and was subsequently introduced from Africa into the Caribbean region, probably during the Columbian exchange. Low genetic diversity and high clonality were found in farming agrosystems, which is consistent with a strong domestication bottleneck followed by thousands of years of widespread vegetative propagation (Sharif et al. 2020).

The first comprehensive genetic diversity study of Guinea yams was conducted on 95 accessions from the International Institute of Tropical Agriculture (IITA) yam genebank, collected in West African countries, using GBS coupled with ploidy level assessment (Girma et al. 2014). The maximum parsimony analysis showed clustering according to species assignment. Here again, no structuring according to geographical origin was found. More recently, whole-genome resequencing followed by demographic history analysis pointed to the Niger River basin, between eastern Ghana and western Nigeria, as the geographical origin of D. rotundata, before the expansion of African yam into neighbouring countries (Scarcelli et al. 2019).

Few studies have addressed the geographical distribution of wild yams. Using botanical descriptors, the distribution of seventeen wild species in West Africa was investigated. The species are widely distributed in the ‘yam belt’ (Hamon 1995). Perennial species were found mainly in tropical rainforests and mesophilic forests, whereas annual species were found in mesophilic forests, dry forests, gallery forests and woody savannah (Hamon 1995).
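The notion of clonality used in these diversity studies can be made concrete with a small sketch. The Python example below groups accessions that share an identical multilocus microsatellite genotype and computes the genotypic richness R = (G - 1)/(N - 1), where G is the number of distinct genotypes and N the number of accessions. It is a simplified illustration, not the procedure of any cited study: the accession names and allele sizes are hypothetical, and exact matching ignores the genotyping errors and somatic mutations that published analyses usually allow for.

```python
from collections import defaultdict


def clonal_lineages(genotypes):
    """Group accessions that share an identical multilocus genotype.

    `genotypes` maps an accession name to a list of loci; each locus is a
    tuple of allele sizes. Allele order within a locus is ignored.
    """
    groups = defaultdict(list)
    for accession, loci in genotypes.items():
        key = tuple(tuple(sorted(alleles)) for alleles in loci)
        groups[key].append(accession)
    return list(groups.values())


def genotypic_richness(genotypes):
    """Return R = (G - 1) / (N - 1): 0 for one clone, 1 if all accessions differ."""
    n = len(genotypes)
    if n < 2:
        raise ValueError("at least two accessions are required")
    g = len(clonal_lineages(genotypes))
    return (g - 1) / (n - 1)


# Hypothetical accessions scored at two microsatellite loci (allele sizes in bp).
example = {
    "acc1": [(150, 154), (201, 201)],
    "acc2": [(154, 150), (201, 201)],  # same genotype as acc1, alleles reordered
    "acc3": [(150, 150), (201, 205)],
    "acc4": [(150, 154), (201, 201)],
}

print(clonal_lineages(example))
print(genotypic_richness(example))
```

A value of R close to zero indicates a collection dominated by a few clonal lineages, the pattern described above for farming agrosystems.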

20.1.5 Yam Genetic Resources Conservation

Yam genetic resources are crucial for food and nutritional security, trait discovery and crop improvement. Considerable diversity is maintained worldwide in national and international genebanks. The accessions are conserved in the field, in vitro, or both. The IITA genebank in Africa holds the largest collection, with 5839 accessions from twelve African countries, mainly in West Africa (Benin, Burkina Faso, Congo, Côte d’Ivoire, Equatorial Guinea, Gabon, Ghana, Guinea, Nigeria, Sierra Leone, Tanzania and Togo). It encompasses eleven species native to or introduced into Africa (D. abyssinica, D. alata, D. bulbifera, D. burkilliana, D. cayenensis, D. dumetorum, D. esculenta, D. mangenotiana, D. preussii and D. rotundata). The second most important genetic resource collection is maintained by the Centre de Ressources Biologiques Plantes Tropicales (CRB-PT) in Guadeloupe, French West Indies (http://crb-tropicaux.com/Portails). It contains around 500 accessions, mainly of D. alata, D. bulbifera, D. rotundata and D. trifida. This collection was built up through field sampling on different continents since the 1970s and through the addition of newly developed breeding lines. In the Pacific, the Centre for Pacific Crops and Trees (CePaCT), based in Fiji, conserves around 329 accessions, mainly of D. alata (230), together with D. esculenta, D. rotundata, D. bulbifera, D. nummularia, D. pentaphylla, D. transversa, D. trifida and 12 unidentified species. They were collected from different islands in the Pacific (Vanuatu, Fiji, Papua New Guinea, Micronesia, Samoa and Tonga). The D. rotundata accessions are from Nigeria (https://www.genesys-pgr.org/). In addition, most yam-producing countries have their own national collections conserved in experimental fields.

20.2 Development of Yam Genomic Resources

Transcriptome and genome sequencing of yams started very late in comparison with many other crops. Most of the previous studies focused on the development of microsatellite markers for yam genetic diversity studies (Hochu et al. 2006; Tostain et al. 2006; Andris et al. 2010; Siqueira et al. 2011). In 2015, the first paper on yam transcriptomics was published on differential expression of genes associated with purple-flesh trait (Table 20.1). Transcriptomes from two cultivars were generated, one elite purple-flesh and one conventional white-flesh (Wu et al. 2015). Two separate RNA pools for the two contrasting cultivars were prepared for cDNA library construction, each comprising 15 RNA samples from 15 tubers of five plants per cultivar. The two cDNA libraries were sequenced on Illumina HiSeq2000. The high-quality filtered reads were further de novo assembled into 125,123 unigenes using Trinity program. The same year, Saski et al. (2015) generated 40,000 expressed sequence tags (EST) from two cultivars, resistant

378

H. Chaïr et al.

Table 20.1 Transcriptomes and reference genomes produced for Dioscorea spp.

| Species | Type of omic data | Number of protein-coding genes | N50 (bp unless a unit is given) | Number of reads | References |
|---|---|---|---|---|---|
| D. rotundata | Transcriptome | 30,551 | 1,626 | 308,742,095 | Sarah et al. (2017) |
| D. praehensilis | Transcriptome | - | - | 166,610,729 | Sarah et al. (2017) |
| D. abyssinica | Transcriptome | - | - | 123,450,645 | Sarah et al. (2017) |
| D. alata | Transcriptome | 26,681 | 1,525 | 143,208,831 | Sarah et al. (2017) |
| D. trifida | Transcriptome | 29,448 | 1,405 | 135,893,760 | Sarah et al. (2017) |
| D. composita | Transcriptome | 62,341 | 2,303 | 114,136,043 | Wang et al. (2015) |
| D. opposita | Transcriptome | 54,781 | 1,508 | 170,957,171 | Zhou et al. (2020) |
| D. zingiberensis | Transcriptome | 143,245 | 814 | 143,729,604 | Li et al. (2018) |
| D. alata | Transcriptome | 125,123 | 875 | 74,208,948 | Wu et al. (2015) |
| D. rotundata | Genome | 26,198 | 2.2 Mbp | NA | Tamiru et al. (2017) |
| D. rotundata | Genome | 30,344 | 0.15 Mbp | NA | Sugihara et al. (2020) |
| D. alata | Genome | 21,728 | 145.7 kb | NA | https://phytozome-next.jgi.doe.gov/info/Dalata_v1_1 |
| D. alata version 2 | Genome | 24,614 | 24.0 Mb | NA | https://phytozome-next.jgi.doe.gov/info/Dalata_v2_1 |
| D. zingiberensis | Genome | 26,022 | 44.55 Mbp | NA | Cheng et al. (2021) |
| D. dumetorum | Genome | 35,269 | 3.2 Mb | NA | Siadjeu et al. (2020) |
| T. zeylanicus | Genome | 34,452 | 433.3 kb | NA | Chellappan et al. (2019) |

and susceptible to anthracnose. The sequences were used to identify EST-SSRs for further studies. The same year, Wang et al. (2015) sequenced the Dioscorea composita transcriptome using an Illumina deep sequencing strategy. They obtained 62,341 unigenes, with an N50 of 2303 bp. From these unigenes, they found 79 genes covering almost the entire steroidal sapogenin pathway. In 2017, Sarah et al., aiming to provide reference transcriptomes for investigating crop domestication, generated the reference transcriptomes of 26 crop plants and wild relatives. Amongst these, five yam species were included (D. rotundata, D. alata, D. trifida, D. abyssinica and D. praehensilis). They used RNA samples extracted from a combination of different organs, including leaves, fruits/grains and inflorescence tissues. The pooled cDNA libraries were sequenced using the Illumina HiSeq2000 technology. The transcriptome assembly and

annotation of yams allowed the production of three de novo transcriptomes of cultivated species and reads for the two wild types D. abyssinica and D. praehensilis. The annotation of these yam transcriptomes allowed the prediction of 29,448, 26,681 and 30,551 genes in D. trifida, D. alata, and D. rotundata, respectively (Table 20.1). The annotations were later used to predict the gene structure from the genome assembly of D. rotundata and D. alata. To understand the dioscin biosynthesis pathway in D. zingiberensis, a de novo transcriptome assembly was generated (Li et al. 2018). In this case, the RNA was isolated from leaves and rhizomes and the cDNA libraries were sequenced using Illumina HiSeq2000 technology as well. After the assembly, 143,245 genes were obtained, with an N50 of 814 bp. Functional annotation identified 485 cytochrome P450 genes and 195 UDP-glycosyltransferase genes, which belong to the dioscin pathway.
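Several of the assemblies described here are summarised by their N50. As an illustration only, the sketch below computes N50 under the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together hold at least half of the assembled bases. The function name and the toy lengths are hypothetical and are not taken from any of the cited yam data sets.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    # Walk from the longest sequence down, accumulating bases until
    # at least half of the total assembly length has been covered.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly: five contigs, 1500 bp in total.
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))  # prints 400
```

Because N50 depends only on the length distribution, a transcriptome with many short unigenes (for example the 814 bp reported for D. zingiberensis) and a chromosome-scale genome assembly can differ by several orders of magnitude without either value saying anything about gene content.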

20

Yam Genomics

Zhou and collaborators produced both the mRNA and the small RNA transcriptomes of D. opposita to study the dynamics of tuber expansion. The de novo assembly of mRNA produced 54,781 transcripts with an N50 of 1508 bp, and the final set of protein-coding genes was 32,207 (Zhou et al. 2020). This set of genes was used as a reference for mapping raw reads from six RNAseq libraries and the small RNA from different developmental stages of tubers. Coexpression analysis revealed that 14 miRNA-target mRNA pairs were co-expressed in the expansion stages of tubers. Still in D. opposita, to analyse microtuber formation, Li et al. produced six RNAseq libraries. They obtained 258,849,400 reads, which after de novo assembly resulted in 181,047 transcripts with an average length of 1,297 bp. These transcripts were associated with several genes belonging to primary metabolism, showing that microtuber formation is associated with this pathway (Li et al. 2020). Finally, Wu and collaborators produced a new set of RNAseq data for D. alata to analyse bulbil growth and the regulatory genes involved in this process. Twelve libraries were produced from different stages of bulbil formation. A total of 687,709,257 reads were assembled into 199,270 transcripts, with an N50 of 2173 bp (Wu et al. 2020). These studies are very important for the discovery of new pathways in the order Dioscoreales. They pave the way to deciphering the pathways and genes involved in tuberisation and yam flesh colour. Even though they have been initiated in minor species, they can be extended to the two major species D. alata and D. rotundata. Equally important, these transcriptomes were of great value for the assembly of genomes, being used for protein prediction and gene structure annotation. The first published genome sequence of a monocotyledon was that of Oryza sativa in 2002 (Yu et al. 2002), and only 15 years later was the first genome sequence for yams published (Tamiru et al. 2017).
To date, five complete genome sequences have been produced for the Dioscoreaceae family: D. alata, D. rotundata, D. zingiberensis, D. dumetorum and Trichopus zeylanicus (Fig. 20.2). Considering the economic and nutritional importance of yam for people in West Africa, and D. rotundata being the most widely cultivated yam species, the first efforts were put into obtaining its complete genome (Tamiru et al. 2017). Using flow cytometry, the DNA content of the haploid genome of D. rotundata was estimated to be 570 Mb. To generate the genome sequence, an individual plant, TDr96_F1, from the progeny of an open-pollinated breeding line was used. Using data from paired-end (PE), mate-pair (MP) jump and BAC libraries, they generated 85.14 Gb of sequencing reads, representing ~149.4× coverage, which assembled into 4723 scaffolds (N50 = 2.12 Mb) with a final length of 594.23 Mb, longer than the genome size estimated by flow cytometry (570 Mb). These methods allowed the prediction of 26,198 genes. Then two parental lines (TDr97/00917-P1 and TDr99/02627-P2) and 150 F1 individuals were used to produce parental line-specific linkage maps in order to anchor the scaffolds, allowing 76.4% of the assembly (454 Mb) to be anchored onto 21 linkage groups (LGs), whereas 20 chromosomes were expected (Scarcelli et al. 2005). Recently, a second version of the D. rotundata genome sequence was generated using Oxford Nanopore Technologies long reads, with Illumina short reads for polishing (Sugihara et al. 2020). The total genome length was 579.41 Mb, with an N50 of 0.15 Mb and a completeness of 90.1%, as assessed by BUSCO. For gene prediction, 20 RNAseq data sets representing 15 different organs and three different flowering stages in male and female plants were used. The final set of genes on the 20 pseudochromosomes and eight scaffolds comprised 30,344 genes. The same year, the draft genome of D. alata was produced from the accession TDa 95/00328 from the IITA collection, using both Illumina PE (2 × 250 bp) and MP (with 2, 4 and 7 kb insert sizes) libraries (https://phytozome-next.jgi.doe.gov/info/Dalata_v1_1). The authors also integrated 88× depth of 2 × 100 bp PE Illumina sequences with 255 bp insert size, previously used to identify


Fig. 20.2 Timeline indicating the transcriptomes and genomes sequencing of yams species in comparison with the dates of Arabidopsis thaliana and Oryza sativa first reference genomes release

EST-microsatellites (Saski et al. 2015). To obtain the gene annotation, they integrated the D. rotundata gene structure data (Tamiru et al. 2017) and the transcriptome data from Wu et al. (2015) and Sarah et al. (2017). The total scaffold sequence represents 287.3 Mb, with 450 scaffolds longer than 145.7 kb and 7052 contigs longer than 9.0 kb, representing half of the assembly. The final set of genes comprised 21,728 protein-coding transcripts. To improve this first version of the D. alata genome, and to provide chromosome-scale information, new data were produced using the Single Molecule Real-Time (SMRT) long-read technology. In total, 21 SMRT cells were sequenced, producing 177× genome coverage. Hi-C sequencing was also added, yielding 297 million Hi-C read pairs. This second version of the D. alata reference genome was 480 Mb in length, close to the 455 ± 39 Mbp predicted from flow cytometry, with a scaffold N50 of 24 Mb and 25,189 predicted protein-coding genes, and was anchored onto a total of 20 chromosomes (Bredeson et al. 2021).

The third reference genome sequence from the Dioscoreaceae family was published in 2019 (Chellappan et al. 2019). The genome of the monoecious species T. zeylanicus, an important medicinal plant in India, was sequenced, leading to a high-quality draft assembly. Illumina PE short reads and PacBio long reads were combined to produce 22,601 scaffolds with an N50 of 433.3 kb and 34,452 protein-coding genes. Oxford Nanopore Technology (ONT) was used to generate a long-read genome sequence for D. dumetorum, a species known for its hardening after harvest, which makes the tuber inedible (Siadjeu et al. 2020). Three accessions which do not display post-harvest hardening were chosen and sequenced on four flow cells in a GridION platform, generating 66.7 Gbp of ONT reads. This represents 207× genome coverage (the predicted genome size was 322 Mbp), and the final N50 was 3.2 Mb. In total, 35,269 protein-coding genes and 9,941 non-coding RNAs were predicted.
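The coverage figures quoted for these assemblies follow from dividing the total number of sequenced bases by the estimated genome size. The short sketch below reproduces this arithmetic with the values reported in the text for D. rotundata and D. dumetorum, together with the anchored fraction of the first D. rotundata assembly. The helper name is an assumption made for illustration, and decimal units (1 Gb = 1000 Mb) are assumed.

```python
MB = 1_000_000
GB = 1_000_000_000


def fold_coverage(total_bases, genome_size):
    """Average sequencing depth: sequenced bases / haploid genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Values as reported in the text (Tamiru et al. 2017; Siadjeu et al. 2020).
rotundata = fold_coverage(85.14 * GB, 570 * MB)
dumetorum = fold_coverage(66.7 * GB, 322 * MB)

# Share of the 594.23 Mb D. rotundata assembly anchored to linkage groups.
anchored_fraction = 454 / 594.23

print(f"D. rotundata: {rotundata:.1f}x")          # prints 149.4x
print(f"D. dumetorum: {dumetorum:.1f}x")          # prints 207.1x
print(f"anchored: {anchored_fraction * 100:.1f}%")  # prints 76.4%
```

Note that the result depends on which genome size is used as the denominator: the flow cytometry estimate (570 Mb) reproduces the reported ~149.4× for D. rotundata, whereas the assembled length (594.23 Mb) would give a slightly lower figure.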


Recently, a complete genome sequence of D. zingiberensis was produced to better understand the diosgenin pathway. Illumina HiSeq short reads, PacBio SMRT and Hi-C data were combined in an assembly with 26,022 protein-coding genes and a scaffold N50 of 44.55 Mb (Cheng et al. 2021). The production of transcriptome and genome resources allows a better understanding of the evolutionary history of yams, as well as knowledge of their gene structure, which is a powerful tool for breeding and genetic transformation. Used as references for genomic comparison, these genomic resources have equally opened up opportunities for the discovery of new compounds for the pharmaceutical and medical industries.

20.3 Comparative Genomics in Dioscorea

Synteny analysis amongst the sequenced yam species was performed by Bredeson et al. (2021). They compared the collinearity of the chromosomes of D. alata, D. rotundata and D. zingiberensis. A 1:1 collinearity was observed between the genomes of D. alata and D. rotundata, which indicates their evolutionary proximity. Despite the difference in chromosome number between D. alata and D. rotundata and their more distant relative D. zingiberensis (2n = 10), a high level of collinearity could still be observed. Collinearity analyses were also performed for D. alata and D. dumetorum, but the latter was not assembled into chromosome units. Comparative genomics is a powerful approach which allows phylogenetic relationships to be inferred amongst the studied species, as well as functional comparisons for non-model species. To date, only three studies have conducted comparative genomics analyses on yam. The first one was performed by Tamiru et al. (2017), who compared the D. rotundata genome sequence to the genomes of Arabidopsis thaliana, Brachypodium distachyon and O. sativa. They identified 5557


D. rotundata orthologous genes amongst the four species analysed. However, a smaller number of genes were orthologous with more distant species such as Elaeis guineensis, Phoenix dactylifera and Musa acuminata. The phylogenetic tree based on 2381 orthologous genes showed that D. rotundata is more distant from the five monocotyledonous species, suggesting an early divergence from these taxa. No synteny conservation at the chromosome level was observed between D. rotundata and the three species analysed (O. sativa, Spirodela polyrhiza and P. dactylifera). Large proportions of the genomes were conserved at the microsyntenic level, suggesting that the D. rotundata genome has undergone many recombination events after its divergence (Tamiru et al. 2017). The second study compared the predicted proteins from the whole genome of T. zeylanicus to the genomes of E. guineensis, Asparagus officinalis, Ananas comosus and D. rotundata (Chellappan et al. 2019). They found that 46, 43, 47 and 40% of the genes in T. zeylanicus were orthologous to genes in E. guineensis, A. officinalis, A. comosus and D. rotundata, respectively. In total, these genes were divided into 7532 orthologous groups. In one-to-one comparisons, T. zeylanicus had the most orthologous groups in common with E. guineensis (10,440), followed by A. comosus (10,040), A. officinalis (9409) and D. rotundata (only 8,888). Despite the low number of common genes between T. zeylanicus and D. rotundata, these two species were closely related in the phylogenetic analysis based on aligned amino acid sequences (Chellappan et al. 2019). Recently, a comparative study using the predicted protein sequences of D. alata and the four other sequenced Dioscoreales species, D. rotundata, D. dumetorum, D. zingiberensis and Trichopus zeylanicus, showed the presence of 9306 orthologous groups with at least one gene from each Dioscoreales species. These groups were composed of 71,688 total genes.
A second comparison was conducted amongst thirteen monocot species and two outgroups, A. thaliana and Amborella trichopoda. These comparisons showed the presence of 456,577 proteins clustered into 25,425 orthologous gene clusters (Bredeson et al. 2021). In a broader comparison, using different monocotyledonous and dicotyledonous species (Fig. 20.3a), it is possible to observe that the species of the Dioscoreaceae family share most of their orthology groups with other species. However, exclusive orthology groups can be found for the five species, with T. zeylanicus having the highest number of exclusive groups (1288) and D. alata the lowest (184). These orthology comparisons show that these species have evolved or kept gene families in the course of plant evolution, besides their phylogenetic relationships (Fig. 20.3b). The development of new genomic comparisons amongst plants from different monocotyledon and dicotyledon families could shed more light on the evolutionary events in the Dioscoreales. This analysis can facilitate the prediction and annotation of gene functions, using a model plant such as Arabidopsis thaliana to infer functions in the newly produced genomic data for the Dioscoreaceae family. Comparative genomics is also a powerful tool to analyse the expansion and reduction of gene families and to select candidate genes when combined with other data (e.g. transcriptomics and polymorphism) (Mota et al. 2020).
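The shared and exclusive orthology groups summarised in Fig. 20.3a can be thought of as counts over species combinations. The sketch below shows one way such counts might be derived, assuming a simple mapping from orthogroup identifiers to the species represented in each group. The identifiers and memberships are invented toy data, and this is not the pipeline used in the cited studies.

```python
from collections import Counter


def intersection_sizes(orthogroups):
    """Count orthogroups for each exact combination of species.

    `orthogroups` maps an orthogroup ID to the set of species with at
    least one gene in that group. Keys of the result are frozensets of
    species; values are the number of groups found in exactly that set.
    """
    return Counter(frozenset(species) for species in orthogroups.values())


def exclusive_counts(orthogroups):
    """Number of orthogroups found in one species only, per species."""
    sizes = intersection_sizes(orthogroups)
    return {
        next(iter(combo)): count
        for combo, count in sizes.items()
        if len(combo) == 1
    }


# Hypothetical toy data, for illustration only.
toy_groups = {
    "OG1": {"D. alata", "D. rotundata", "T. zeylanicus"},
    "OG2": {"D. alata", "D. rotundata"},
    "OG3": {"T. zeylanicus"},
    "OG4": {"T. zeylanicus"},
    "OG5": {"D. alata"},
}

all_species = frozenset().union(*toy_groups.values())
shared_by_all = intersection_sizes(toy_groups)[all_species]
print(shared_by_all)
print(exclusive_counts(toy_groups))
```

In this representation each bar of an intersection plot corresponds to one key of the counter, and the exclusive groups reported for each species are simply the keys containing a single species.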

QTL mapping, association analysis and detection of candidate genes for key agronomic traits in yam

The first linkage maps developed for yams were produced using amplified fragment length polymorphism (AFLP) markers on D. rotundata (Mignouna et al. 2001) and on D. alata (Petro et al. 2011). None of the early linkage maps were developed using co-dominant markers such as microsatellites, reflecting the underutilised status of this crop. Only in recent years were two new D. alata genetic linkage maps developed, using EST-SSRs and SNPs. The first one was constructed from 380 EST-SSRs (Saski et al. 2015) and used for the identification of QTLs controlling anthracnose resistance. A major QTL that explained 68.5% of the total phenotypic variation was identified (Bhattacharjee et al. 2018). The second one is the first high-density reference genetic map, built from two D. alata F1 outcrossed populations using 1579 SNPs obtained with the genotyping-by-sequencing (GBS) approach and mapped onto the D. rotundata reference genome (Cormier et al. 2019). This reference genetic map served to identify a major QTL for sex determination located on pseudo-chromosome 6, involving male heterogamety (male = XY and female = XX). To investigate the genomic regions controlling different tuber traits, the same genetic map has been used (Ehounou et al.

Fig. 20.3 Comparative analysis of different plant genomes. a Intersections amongst the plants' orthology groups. The bars represent the number of groups and the dots the species that share these groups. b Phylogenetic relationships amongst the analysed species


2021). A total of 34 QTLs for six studied traits (tuber shape, regularity of tuber shape, skin texture, presence of tubercular roots, flesh colour and oxidative browning) were identified. They are distributed on eight different chromosomes and explained from 11.1 to 43.3% of the phenotypic variance. This work has been followed by an investigation of dry matter and oxidative browning, two important quality attributes that determine cultivar acceptability, also using QTL mapping populations and mapping on D. alata v2 (Bredeson et al. 2021). In this recent work, QTLs in D. alata associated with oxidative browning were found on chromosome 18. Moreover, a cluster of three peroxidases on chromosome 18 close to the identified QTLs raised the possibility that oxidation is affected by genetic variation in peroxidase activity. For dry matter, a single, minor QTL on chromosome 18 was found. The same traits were also investigated using a GWAS approach (Gatarira et al. 2020). After GBS and mapping of the reads onto D. alata v2, two significant SNPs associated with dry matter were identified on chromosomes 6 and 19, and one associated with oxidative browning on chromosome 5. Gene annotation for the significant SNP loci identified important genes associated with the process of the proteolytic modification of carbohydrates. Thereafter, using a GWAS approach and the D. rotundata reference genome, Cormier et al. (2021) identified a genetic barrier to reproduction in D. alata on chromosome 1, represented by two candidate genes, one homologous to cyclin-dependent kinase F-4 (CDKF4) of O. sativa (Zheng et al. 2014) and one to E3 ubiquitin-protein ligase SINAT2 of A. thaliana (Schmid et al. 2005; Kelley 2018). The availability of the reference genome also allowed the development of a KASPar marker for sex determination. These findings will be very useful for designing promising parental combinations in future D. alata breeding programmes. In D.
alata, the first comprehensive work on candidate gene identification was carried out by Wu et al. (2015). They investigated the flavonoid biosynthesis pathway to understand the mechanism behind yam flesh pigmentation, using transcriptome analysis of one white and one purple yam. They found unigenes encoding chalcone isomerase (CHS), flavanone 3-hydroxylase (F3H), flavonoid 3′-monooxygenase (F3′H), dihydroflavonol 4-reductase (DFR), leucoanthocyanidin dioxygenase (LDOX) and flavonol 3-O-glucosyltransferase (UF3GT) to be significantly up-regulated in the purple yam. The expression of these genes was further confirmed by quantitative real-time PCR (qRT-PCR). Despite its economic importance, D. rotundata has been the subject of fewer studies than D. alata. To understand the genetic mechanism of sex determination, a QTL-seq approach on two DNA bulks representing male and female plants was used (Tamiru et al. 2017). A single region from 0.65 Mb to 2.35 Mb on pseudo-chromosome 11 was identified, whose SNP-index values differed between the male and female bulks. The D. rotundata sex determination system was found to involve female heterogamety (male = ZZ, female = ZW). Consequently, two markers were designed, sp1 linked to the putative Z-linked region and sp6 to the W-linked region. To identify genes under selection during yam domestication, a demographic study was conducted on D. rotundata and its two wild relatives D. abyssinica and D. praehensilis (Akakpo et al. 2017), using the transcriptomic data produced previously (Sarah et al. 2017). Four methods were used to search for selection signatures: Tajima’s D, marked reduction in diversity in the cultivated samples, Fst between wild and cultivated species, and principal component analysis. The candidate contigs were compared with the annotation of the yam reference transcriptome (Sarah et al. 2017). Then, genes corresponding to putative targets of selection during yam domestication were retrieved. Finally, two genes associated with the earliest stages of starch biosynthesis and storage, sucrose synthase 4 and sucrose-phosphate synthase 1, showed evidence of selection.
An adventitious root development gene, a Scarecrow-like gene, was also selected during yam domestication. The same work, conducted with a larger sample size using whole genome resequencing, retrieved six genes, including those from the previous study: two genes involved in root development [Scarecrow-like (SCL) gene Dr11126 and argininosuccinate lyase (ASL) gene Dr04385], one in starch formation and storage [sucrose synthase (SUS) gene Dr18284], and three genes involved in stomata regulation and osmotic stress [cellulose synthase-like (CSL) genes Dr13651, Dr13652 and Dr13653] (Scarcelli et al. 2019). The availability of D. rotundata genomic data made it possible to carry out the first bioinformatic analysis of the NBS-LRR gene profiles (Zhang et al. 2020). Using the raw RNAseq data generated previously (Tamiru et al. 2017), 167 NBS-LRR genes from the D. rotundata genome were identified. One gene was assigned to the resistance to powdery mildew8 (RPW8)-NBS-LRR (RNL) subclass and the other 166 genes to the coiled-coil (CC)-NBS-LRR (CNL) subclass. Amongst them, 124 genes are located in 25 multigene clusters and 43 genes are singletons. Overall, this study provides a complete set of NBS-LRR genes for D. rotundata, which may serve as a fundamental resource for mining functional NBS-LRR genes against various pathogens. As can be seen, these recent works are at a preliminary stage. The QTLs identified so far have not been validated and are not yet implemented in breeding programmes. For now, no follow-up to the investigations carried out to identify the genes involved in certain pathways (steroidal sapogenins or diosgenin) has been published. Finally, since most of the work on genome sequencing and annotation was conducted recently, few studies have tackled the identification of candidate genes for agronomic traits using transcriptome analysis. These scarce studies did not go as far as validation. The availability of new genomic resources, accompanied by an increasingly complete annotation of genomes, should permit more investment in these research fields.
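The QTL-seq analysis of sex determination in D. rotundata mentioned in this section compares SNP-index values between two bulks. As a minimal sketch, assuming the usual QTL-seq definition (the SNP-index at a position is the fraction of reads carrying the non-reference allele, and the delta SNP-index is the difference between the two bulks), the calculation for one SNP could look as follows. The read counts are hypothetical and the function names are illustrative; they do not come from Tamiru et al. (2017).

```python
def snp_index(alt_reads, total_reads):
    """Fraction of reads at a SNP that carry the non-reference allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


def delta_snp_index(bulk_a, bulk_b):
    """Difference in SNP-index between two bulks at the same position.

    Each bulk is an (alt_reads, total_reads) tuple.
    """
    return snp_index(*bulk_a) - snp_index(*bulk_b)


# Hypothetical counts at one SNP: an allele present in about half of the
# reads of the female bulk and absent from the male bulk.
female_bulk = (24, 48)
male_bulk = (0, 50)

print(snp_index(*female_bulk))                  # prints 0.5
print(delta_snp_index(female_bulk, male_bulk))  # prints 0.5
```

Under a ZW system, an allele carried only on the W chromosome would be expected in roughly half of the reads from females (ZW) and in none from males (ZZ), which is the kind of contrast between bulks that such a scan is designed to reveal. In practice, values are averaged over sliding windows and compared with confidence intervals rather than judged at single SNPs.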


20.4 Future Goals and Prospects

The main goals for yam breeding are tuber quality, including nutritional properties and processing and after-cooking characteristics; resistance to pests and diseases, mainly viruses; resistance to anthracnose; and agronomic traits, especially yield. Thus, the genomic resources developed recently will certainly allow a better understanding of the genetic architecture of many traits of interest in the main cultivated species D. alata and D. rotundata, and of the consequences for trait transmission and breeding. These physical maps, combined with the genetic ones, will enable fine mapping of the QTLs and identification of the genes present in the identified portions of the genome. They will also facilitate comparative analysis of QTLs detected across multiple populations and environments, which is important for accumulating in new varieties the favourable alleles at the different loci identified. Cormier et al. (2021) identified a putative male-specific structural variation on chromosome 6, responsible for sex determination. Such structural variations, if they occur in regions related to traits of interest, will have major consequences for their introgression during the breeding process. Access to whole genome sequences will facilitate the identification of these structural variations and the assessment of their impact on yam evolution and breeding. In addition, as a clonal crop, yam has long been propagated vegetatively, leading to the accumulation of somatic mutations. Investigation of epigenomic variation should facilitate a better understanding of the effect of these somatic mutations on the phenotype. Recent years have seen a surge in plant pan-genome initiatives. The construction of a pan-genome enables rapid access to variable genes linked with different traits that are present in the sequenced individuals but absent from the reference genomes.
In addition, a pan-genome highlights chromosomal rearrangements between genotypes, improves short-read mapping accuracy, can provide a compact representation of polyploid genomes and allows for the quantitation of allele dosage between autopolyploid individuals (The Computational Pan-Genomics Consortium 2018). As the number of reference genomes of different species and of whole genome resequencing (WGS) data sets for D. alata and D. rotundata varieties is steadily increasing, pan-genome and super pan-genome reconstruction can be carried out with confidence on yam. Combined with the approaches detailed above, it will allow a better understanding of genome structure and the identification of a larger range of genes linked to traits of interest. At the same time, the genomes and transcriptomes developed, combined with phenotyping, will pave the way to the identification of candidate genes involved in tuber quality, pest and disease resistance and crop development (Fig. 20.4). Until now, most of the work has focused on the production of genomic resources. Some genes involved in important pathways such as those of flavonoids or diosgenin were


identified but none of them was further validated through genetic transformation approaches. Recently, a CRISPR/Cas9-based genome-editing system was developed for yam (Syombua et al. 2020), and validated by targeting the phytoene desaturase gene (PDS) in Amala, a D. rotundata variety. The combination of the genome sequence and the availability of genetic transformation methods will enable favourable genes to be targeted to validate their involvement in the phenotype. The Dioscorea phylogeny has remained elusive, with extensive gene tree discordance and taxonomic uncertainty. The availability of whole chloroplast and nuclear genomes will help untangle the phylogenetic relationships between species by taking into account the duplication processes and hybridisation between several species, resulting in gene flow and/or polyploidisation. In addition to resolving the taxonomic uncertainty, the progress made in hybridisation will facilitate the identification of introgressions occurring during yam

Fig. 20.4 Integration of genomic resources and other ‘omic’ data and their use in breeding programmes


evolution and therefore of favourable alleles selected over time in different environments. Moreover, resolving the yam phylogeny will help identify related species to be mined for new alleles and subsequently used in pre-breeding programmes. Finally, the development of more genomic resources will also accelerate the understanding of the evolutionary processes of the different yam species, their demographic history, the detection of genomic signatures of selection following domestication and their geographical adaptation. Climate change is a major challenge and an ongoing process which is threatening crop distribution and production. Therefore, dissecting the yam evolutionary process and understanding the genetic and genomic mechanisms underlying geographical adaptation will contribute to mitigating the effect of climate change on yam genetic diversity and adaptation to changing environments. In conclusion, yams stand at the dawn of the genomic era. As we write this chapter, many papers are in the process of publication on the development of new genomic resources and the investigation of genes involved in the pathways of traits of interest. Similarly, different projects are ongoing on deciphering the functional determinism of traits, the production of metabolomics and proteomics data and the development of high-throughput phenotyping methods. The knowledge being acquired should facilitate progress in understanding the genetic determinism of traits and their mode of transmission, to be implemented in yam breeding and selection programmes.

References

Akakpo R, Scarcelli N, Chair H, Dansi A, Djedatin G, Thuillet AC, Rhone B, Francois O, Alix K, Vigouroux Y (2017) Molecular basis of African yam domestication: analyses of selection point to root development, starch biosynthesis, and photosynthesis related genes. BMC Genomics 18:782. https://doi.org/10.1186/s12864-017-4143-2
Andris M, Aradottir GI, Arnau G, Audzijonyte A, Bess EC, Bonadonna F, Bourdel G, Bried J, Bugbee GJ, Burger PA, Chair H, Charruau PC, Ciampi AY, Costet L, Debarro PJ, Delatte H, Dubois MP, Eldridge MDB, England PR, Enkhbileg D, Fartek B, Gardner MG, Gray KA, Gunasekera RM, Hanley SJ, Havil N, Hereward JP, Hirase S, Hong Y, Jarne P, Qi JF, Johnson RN, Kanno M, Kijima A, Kim HC, Kim KS, Kim WJ, Larue E, Lee JW, Lee JH, Li CH, Liao MH, Lo N, Lowe AJ, Malausa T, Male PJG, Marko MD, Martin JF, Messing R, Miller KJ, Min BW, Myeong JI, Nibouche S, Noack AE, Noh JK, Orivel J, Park CJ, Petro D, Prapayotin-Riveros K, Quilichini A, Reynaud B, Riginos C, Risterucci AM, Rose HA, Sampaio I, Silbermayr K, Silva MB, Tero N, Thum RA, Vinson CC, Vorsino A, Vossbrinck CR, Walzer C, White JC, Wieczorek A, Wright M, Dev C (2010) Permanent genetic resources added to molecular ecology resources database 1 June 2010–31 July 2010. Molecul Ecol Resour 10(6):1106–1108. https://doi.org/10.1111/j.1755-0998.2010.02916.x
Arnau G, Nemorin A, Maledon E, Abraham K (2009) Revision of ploidy status of Dioscorea alata L. (Dioscoreaceae) by cytogenetic and microsatellite segregation analysis. Theor Appl Genet 118(7):1239–1249. https://doi.org/10.1007/s00122-009-0977-6
Arnau G, Bhattacharjee R, Sheela MN, Chair H, Malapa R, Lebot V, Abraham K, Perrier X, Petro D, Penet L, Pavis C (2017) Understanding the genetic diversity and population structure of yam (Dioscorea alata L.) using microsatellite markers. PLoS One 12(3):e0174150. https://doi.org/10.1371/journal.pone.0174150
Asemota HN, Ramser J, Lopez-Peralta C, Weising K, Kahl G (1996) Genetic variation and cultivar identification of Jamaican yam germplasm by random amplified polymorphic DNA analysis. Euphytica 92(3):341–351
Ayensu ES (1972) Anatomy of the monocotyledons. VI. Dioscoreales. Oxford University Press, Oxford
Bai KV, Ekanayake IJ (1998) Taxonomy, morphology and floral biology. In: Orkwor GC, Asiedu R, Ekanayake IJ (eds) Food Yams. Advances in research. IITA and NRCRI, Nigeria, pp 13–38
Bhattacharjee R, Nwadili CO, Saski CA, Paterne A, Scheffler BE, Augusto J, Lopez-Montes A, Onyeka JT, Kumar PL, Bandyopadhyay R (2018) An EST-SSR based genetic linkage map and identification of QTLs for anthracnose disease resistance in water yam (Dioscorea alata L.). PLoS One 13(10):e0197717. https://doi.org/10.1371/journal.pone.0197717
Bousalem M, Arnau G, Hochu I, Arnolin R, Viader V, Santoni S, David J (2006) Microsatellite segregation analysis and cytogenetic evidence for tetrasomic inheritance in the american yam Dioscorea trifida and a new basic chromosome number in the Dioscoreae. Theor Appl Genet 113(3):439–451
Bredeson JV, Lyons JB, Oniyinde IO, Okereke NR, Kolade O, Nnabue I, Nwadili CO, Hřibová E, Parker M, Nwogha J, Shu S, Carlson J, Kariba R, Muthemba S, Knop K, Barton GJ, Sherwood AV, Lopez-Montes A, Asiedu R, Jamnadass R, Muchugi A, Goodstein D, Egesi CN, Featherston J, Asfaw A, Simpson GG, Doležel J, Hendre PS, Van Deynze A, Kumar PL, Obidiegwu JE, Bhattacharjee R, Rokhsar DS (2021) Chromosome evolution and the genetic basis of agronomically important traits in greater yam
Burkill IH (1960) The organography and the evolution of the Dioscoreaceae, the family of the yams. Bot J Linn Soc 56(367):319–420
Caddick LR, Rudall PJ, Wilkin P, Hedderson TAJ, Chase MW (2002a) Phylogenetics of Dioscoreales based on combined analyses of morphological and molecular data. Bot J Linn Soc 138(2):123–144
Caddick LR, Wilkin P, Rudall PJ, Hedderson TAJ, Chase MW (2002b) Yams reclassified: a recircumscription of Dioscoreaceae and Dioscoreales. Taxon 51(1):103–114
Chaïr H, Cornet D, Deu M, Baco MN, Agbangla A, Duval MF, Noyer JL (2010) Impact of farmer selection on yam genetic diversity. Conserv Genet 11(6):2255–2265. https://doi.org/10.1007/s10592-010-0110-z
Chellappan B, PR S, Vijayan S, Rajan VS, Sasi A, Nair AS (2019) High quality draft genome of arogyapacha (Trichopus zeylanicus), an important medicinal plant endemic to Western Ghats of India. G3 Genes|Genomes|Genetics 9(8):2395–2404. https://doi.org/10.1534/g3.119.400164
Cheng J, Chen J, Liu X, Li X, Zhang W, Dai Z, Lu L, Zhou X, Cai J, Zhang X, Jiang H, Ma Y (2021) The origin and evolution of the diosgenin biosynthetic pathway in yam. Plant Commun 2(1):100079. https://doi.org/10.1016/j.xplc.2020.100079
Cormier F, Lawac F, Maledon E, Gravillon M-C, Nudol E, Mournet P, Vignes H, Chaïr H, Arnau G (2019) A reference high-density genetic map of greater yam (Dioscorea alata L.). Theor Appl Genet 132(6):1733–1744. https://doi.org/10.1007/s00122-019-03311-6
Cormier F, Martin G, Vignes H, Lachman L, Cornet D, Faure Y, Maledon E, Mournet P, Arnau G, Chaïr H (2021) Genetic control of flowering in greater yam (Dioscorea alata L.). BMC Plant Biol 21(1):163. https://doi.org/10.1186/s12870-021-02941-7
Croxton MD, Andreu MA, Williams DA, Overholt WA, Smith JA (2011) Geographic origins and genetic diversity of air-potato (Dioscorea bulbifera) in Florida. Invasive Plant Sci Manag 4(1):22–30. https://doi.org/10.1614/IPSM-D-10-00033.1
Ehounou AE, Cornet D, Desfontaines L, Marie-Magdeleine C, Maledon E, Nudol E, Beurier G, Rouan L, Brat P, Lechaudel M, Nous C, N’Guetta ASP, Kouakou AM, Arnau G (2021) Predicting quality, texture and chemical content of yam (Dioscorea alata L.) tubers using near infrared spectroscopy. J Near Infrared Spectrosc 29(3):128–139. https://doi.org/10.1177/09670335211007575
Epping J, Laibach N (2020) An underutilized orphan tuber crop—Chinese yam: a review. Planta 252(4):58. https://doi.org/10.1007/s00425-020-03453
Gatarira C, Agre P, Matsumoto R, Edemodu A, Adetimirin V, Bhattacharjee R, Asiedu R, Asfaw A (2020) Genome-wide association analysis for tuber dry matter and oxidative browning in water yam (Dioscorea alata L.). Plants 9(8):969. https://doi.org/10.3390/plants9080969
Girma G, Hyma KE, Asiedu R, Mitchell SE, Gedil M, Spillane C (2014) Next-generation sequencing based genotyping, cytometry and phenotyping for understanding diversity and evolution of guinea yams. Theor Appl Genet 127(8):1783–1794. https://doi.org/10.1007/s00122-014-2339-2
Govaerts R, Wilkin P, Saunders R (2007) World checklist of Dioscoreales, yams and their allies. Kew Publishing, Kew
Hahn SK (1995) Yams—Dioscorea spp. (Dioscoreaceae). Longman, London
Hamon P (ed) (1995) Les ignames sauvages d’Afrique de l’Ouest: caractéristiques morphologiques = Wild yams in West Africa: morphological characteristics. Éditions de l’Orstom, Paris
Hochu I, Santoni S, Bousalem M (2006) Isolation, characterization and cross-species amplification of microsatellite DNA loci in the tropical American yam Dioscorea trifida. Mol Ecol Notes 6(1):137–140
Kelley DR (2018) E3 ubiquitin ligases: key regulators of hormone signaling in plants. Mol Cell Proteomics 17(6):1047–1054. https://doi.org/10.1074/mcp.MR117.000476
Knuth R (1924) Dioscoreaceae. In: Engler HGA, Engelmann HR, J Cramer (ed) Das Pflanzenreich. Leipzig, pp 1–387
Lebot V (2009) Tropical root and tuber crops: cassava, sweet potato, yams, aroids. CABI
Lebot V, Trilles B, Noyer JL, Modesto J (1998) Genetic relationships between Dioscorea alata L. cultivars. Genetic Resour Crop Evolution 45(6):499–509. https://doi.org/10.1023/A:1008603303314
Li J, Liang Q, Li C, Liu M, Zhang Y (2018) Comparative transcriptome analysis identifies putative genes involved in dioscin biosynthesis in Dioscorea zingiberensis. Molecules 23(2):454. https://doi.org/10.
3390/molecules23020454 Li J, Zhao X, Dong Y, Li S, Yuan J, Li C, Zhang X, Li M (2020) Transcriptome analysis reveals key pathways and hormone activities involved in early microtuber formation of Dioscorea opposita. Biomed Res Int 2020:e8057929. https://doi.org/10.1155/2020/8057929 Lubbe A, Verpoorte R (2011) Cultivation of medicinal and aromatic plants for specialty industrial materials. Indus Crops Prod 34(1):785–801. https://doi.org/10. 1016/j.indcrop.2011.01.019 Malapa R, Arnau G, Noyer JL, Lebot V (2005) Genetic diversity of the greater yam (Dioscorea alata L.) and relatedness to Dioscorea nummularia Lam. and Dioscorea transversa Br. as revealed with AFLP markers. Genetic Resour Crop Evolution 52(7):919– 929. https://doi.org/10.1007/s10722-003-6122-5 Mignouna HD, Njukeng P, Abang MM, Asiedu R (2001) Inheritance of resistance to Yam mosaic virus, genus

388 Potyvirus, in white yam (Dioscorea rotundata). Theor Appl Genet 103(8):1196–1200 Mota APZ, Fernandez D, Arraes FBM, Petitot A-S, de Melo BP, de Sa MEL, Grynberg P, Saraiva MAP, Guimaraes PM, Brasileiro ACM, Albuquerque EVS, Danchin EGJ, Grossi-de-Sa MF (2020) Evolutionarily conserved plant genes responsive to root-knot nematodes identified by comparative genomics. Mol Genet Genomics 295(4):1063–1078. https://doi.org/10.1007/ s00438-020-01677-7 Nkere CK, Otoo E, Atiri GI, Onyeka J, Silva G, Bömer M, Seal SE, Kumar PL (2020) Assessment of Yam mild mosaic virus coat protein gene sequence diversity reveals the prevalence of cosmopolitan and African group of isolates in Ghana and Nigeria. Curr Plant Biol 23:100156. https://doi.org/10.1016/j.cpb.2020. 100156 Obidiegwu JE, Asiedu R, Ene-Obong EE, Muoneke CO, Kolesnikova-Allen M (2009) Genetic characterization of some water yam (Dioscorea alata L.) accessions in West Africa with simple sequence repeats. J Food Agri Environ 7(3–4):634–638 Paul C, Debnath A, Debnath B (2018) A documentation on Dioscorea spp. (A neglected wild tuber) with special reference to its domestication and seasonal food fecurity to the indigenous forest dwellers in Tripura, India. Ambient Sci 5(2):51–53 Petro D, Onyeka TJ, Etienne S, Rubens S (2011) An intraspecific genetic map of water yam (Dioscorea alata L.) based on AFLP markers and QTL analysis for anthracnose resistance. Euphytica 179(3):405–416. https://doi.org/10.1007/s10681-010-0338-1 Sarah G, Homa F, Pointet S, Contreras S, Sabot F, Nabholz B, Santoni S, Sauné L, Ardisson M, Chantret N, Sauvage C, Tregear J, Jourda C, Pot D, Vigouroux Y, Chair H, Scarcelli N, Billot C, Yahiaoui N, Bacilieri R, Khadari B, Boccara M, Barnaud A, Péros J-P, Labouisse J-P, Pham J-L, David J, Glémin S, Ruiz M (2017) A large set of 26 new reference transcriptomes dedicated to comparative population genomics in crops and wild relatives. Mol Ecol Resour 17(3):565–580. https://doi.org/10. 
1111/1755-0998.12587 Saski CA, Bhattacharjee R, Scheffler BE, Asiedu R (2015) Genomic resources for water yam (Dioscorea alata L.): analyses of EST-sequences, De novo sequencing and GBS libraries. PLoS One 10(7):n.p Scarcelli N, Dainou O, Agbangla C, Tostain S, Pham JL (2005) Segregation patterns of isozyme loci and microsatellite markers show the diploidy of African yam Dioscorea rotundata (2n=40). Theor Appl Genet 111(2):226–232. https://doi.org/10.1007/s00122-0052003-y Scarcelli N, Cubry P, Akakpo R, Thuillet A-C, Obidiegwu J, Baco MN, Otoo E, Sonké B, Dansi A, Djedatin G, Mariac C, Couderc M, Causse S, Alix K, Chaïr H, François O, Vigouroux Y (2019) Yam genomics supports West Africa as a major cradle of crop domestication. Sci Adv 5(5):eaaw1947. https:// doi.org/10.1126/sciadv.aaw1947

H. Chaïr et al. Schmid M, Davison TS, Henz SR, Pape UJ, Demar M, Vingron M, Schölkopf B, Weigel D, Lohmann JU (2005) A gene expression map of Arabidopsis thaliana development. Nat Genet 37(5):501–506. https://doi.org/10.1038/ng1543 Sharif BM, Burgarella C, Cormier F, Mournet P, Causse S, Van KN, Kaoh J, Rajaonah MT, Lakshan SR, Waki J, Bhattacharjee R, Badara G, Pachakkil B, Arnau G, Chaïr H (2020) Genomewide genotyping elucidates the geographical diversification and dispersal of the polyploid and clonally propagated yam (Dioscorea alata). Ann Bot 126 (6):1029–1038. https://doi.org/10.1093/aob/mcaa122 Siadjeu C, Pucker B, Viehover P, Albach DC, Weisshaar B (2020) High contiguity de novo genome sequence assembly of trifoliate yam (Dioscorea dumetorum) using long read sequencing. Genes 11 (3):274. https://doi.org/10.3390/genes11030274 Siqueira MVBM, Marconi TG, Bonatelli ML, Zucchi MI, Veasey EA (2011) New microsatellite loci for water yam (Dioscorea alata, Dioscoreaceae) and crossamplification for other Dioscorea species. Am J Bot 98(6):e144–e146. https://doi.org/10.3732/ajb.1000513 Siqueira MVBM, Bonatelli ML, Gunther T, Gawenda I, Schmid KJ, Pavinato VAC, Veasey EA (2014) Water yam (Dioscorea alata L.) diversity pattern in Brazil: an analysis with SSR and morphological markers. Genetic Resour Crop Evolution 61(3):611–624 Sugihara Y, Darkwa K, Yaegashi H, Natsume S, Shimizu M, Abe A, Hirabuchi A, Ito K, Oikawa K, Tamiru-Oli M, Ohta A, Matsumoto R, Agre P, Koeyer DD, Pachakkil B, Yamanaka S, Muranaka S, Takagi H, White B, Asiedu R, Innan H, Asfaw A, Adebola P, Terauchi R (2020) Genome analyses reveal the hybrid origin of the staple crop white Guinea yam (Dioscorea rotundata). PNAS 117(50):31987–31992. https://doi.org/10.1073/pnas.2015830117 Sukal AC, Taylor M, Tuia VS (2015) Viruses and their impact on the utilization of plant genetic resources in the Pacific. 
Acta Hort 1101:127–132 Syombua ED, Zhang Z, Tripathi JN, Ntui VO, Kang M, George OO, Edward NK, Wang K, Yang B, Tripathi L (2020) A CRISPR/Cas9‐based genome‐editing system for yam (Dioscorea spp.). Plant Biotechnol J pbi.13515. https://doi.org/10.1111/pbi.13515 Tamiru M, Natsume S, Takagi H, White B, Yaegashi H, Shimizu M, Yoshida K, Uemura A, Oikawa K, Abe A, Urasaki N, Matsumura H, Babil P, Yamanaka S, Matsumoto R, Muranaka S, Girma G, Lopez-Montes A, Gedil M, Bhattacharjee R, Abberton M, Kumar PL, Rabbi I, Tsujimura M, Terachi T, Haerty W, Corpas M, Kamoun S, Kahl G, Takagi H, Asiedu R, Terauchi R (2017) Genome sequencing of the staple food crop white Guinea yam enables the development of a molecular marker for sex determination. BMC Biol 15(1):86. https://doi.org/10.1186/s12915-0170419-x Terauchi R, Chikaleke VA, Thottappilly G, Hahn SK (1992) Origin and phylogeny of Guinea yams as revealed by RFLP analysis of chloroplast DNA and

20

Yam Genomics

nuclear ribosomal DNA. Theoret Appl Genetics 83–83 (6–7):743–751. https://doi.org/10.1007/BF00226693 The Computational Pan-Genomics Consortium (2018) Computational pan-genomics: status, promises and challenges. Brief Bioinform 19(1):118–135. https:// doi.org/10.1093/bib/bbw089 Tostain S, Scarcelli N, Brottier P, Marchand JL, Pham JL, Noyer JL (2006) Development of DNA microsatellite markers in tropical yam (Dioscorea sp.). Molecul Ecol Notes 6(1):173–175. https://doi.org/10.1111/j.14718286.2005.01182.x Umber M, Filloux D, Gelabale S, Gomez RM, Marais A, Gallet S, Gamiette F, Pavis C, Teycheney PY (2020) molecular viral diagnosis and sanitation of yam genetic resources: implications for safe yam germplasm exchange. Viruses-Basel 12(10):1101. https:// doi.org/10.3390/v12101101 Vandenbroucke H, Mournet P, Vignes H, Chaïr H, Malapa R, Duval MF, Lebot V (2016) Somaclonal variants of taro (Colocasia esculenta Schott) and yam (Dioscorea alata L.) are incorporated into farmers’ varietal portfolios in Vanuatu. Genetic Resour Crop Evolution 63(3):495–511 Viruel J, Segarra-Moragues JG, Raz L, Forest F, Wilkin P, Sanmartin I, Catalan P (2016) Late cretaceous-early eocene origin of yams (Dioscorea, Dioscoreaceae) in the Laurasian Palaearctic and their subsequent Oligocene-Miocene diversification. J Biogeogr 43(4):750–762 Wang X, Chen DJ, Wang YQ, Xie J (2015) De novo transcriptome assembly and the putative biosynthetic pathway of steroidal sapogenins of dioscorea composita. PLoS One 10(4):n.p. Wilkin P, Schols P, Chase MW, Chayamarit K, Furness CA, Huysmans S, Rakotonasolo F, Smets E, Thapyai C (2005) A plastid gene phylogeny of the yam genus, Dioscorea: Roots, fruits and Madagascar. Syst Bot 30(4):736–749. https://doi.org/10.1600/ 036364405775097879 Wu ZG, Li XX, Lin XC, Jiang W, Tao ZM, Mantri N, Fan CY, Bao XQ (2014) Genetic diversity analysis of yams (Dioscorea spp.) cultivated in China using ISSR and SRAP markers. 
Genetic Resour Crop Evolution 61(3):639–650 Wu ZG, Jiang W, Mantri N, Bao XQ, Chen SL, Tao ZM (2015) Transciptome analysis reveals flavonoid

389 biosynthesis regulation and simple sequence repeats in yam (Dioscorea alata L.) tubers. BMC Genomics 16:346. https://doi.org/10.1186/s12864-015-1547-8 Wu Z-G, Jiang W, Tao Z-M, Pan X-J, Yu W-H, Huang H-L (2020) Morphological and stage-specific transcriptome analyses reveal distinct regulatory programs underlying yam (Dioscorea alata L.) bulbil growth. J Exp Bot 71(6):1899–1914. https://doi.org/10.1093/ jxb/erz552 Yu J, Hu S, Wang J, Wong GK-S, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, Cao M, Liu J, Sun J, Tang J, Chen Y, Huang X, Lin W, Ye C, Tong W, Cong L, Geng J, Han Y, Li L, Li W, Hu G, Huang X, Li W, Li J, Liu Z, Li L, Liu J, Qi Q, Liu J, Li L, Li T, Wang X, Lu H, Wu T, Zhu M, Ni P, Han H, Dong W, Ren X, Feng X, Cui P, Li X, Wang H, Xu X, Zhai W, Xu Z, Zhang J, He S, Zhang J, Xu J, Zhang K, Zheng X, Dong J, Zeng W, Tao L, Ye J, Tan J, Ren X, Chen X, He J, Liu D, Tian W, Tian C, Xia H, Bao Q, Li G, Gao H, Cao T, Wang J, Zhao W, Li P, Chen W, Wang X, Zhang Y, Hu J, Wang J, Liu S, Yang J, Zhang G, Xiong Y, Li Z, Mao L, Zhou C, Zhu Z, Chen R, Hao B, Zheng W, Chen S, Guo W, Li G, Liu S, Tao M, Wang J, Zhu L, Yuan L, Yang H (2002) A Draft Sequence of the Rice Genome (Oryza sativa L. ssp. indica). Science 296(5565):79–92. https://doi. org/10.1126/science.1068037 Zhang YM, Chen M, Sun L, Wang Y, Yin JM, Liu J, Sun XQ, Hang YY (2020) Genome-Wide identification and evolutionary analysis of NBS-lrr genes from Dioscorea rotundata. Front Genet 11:484. https://doi. org/10.3389/fgene.2020.00484 Zheng T, Nibau C, Phillips DW, Jenkins G, Armstrong SJ, Doonan JH (2014) CDKG1 protein kinase is essential for synapsis and male meiosis at high ambient temperature in Arabidopsis thaliana. Proc Natl Acad Sci USA 111(6):2182–2187. https://doi.org/10.1073/ pnas.1318460111 Zhou YY, Luo SZ, Hameed S, Xiao D, Zhan J, Wang AQ, He LF (2020) Integrated mRNA and miRNA transcriptome analysis reveals a regulatory network for tuber expansion in Chinese yam (Dioscorea opposita). 
BMC Genomics 21(1):117. https:// doi.org/10.1186/s12864-020-6492-5

21 The African Eggplant

Susan M. Moenga and Damaris Achieng Odeny

Abstract

The African eggplant (Solanum aethiopicum L.) is a promising, nutritious African vegetable that is also grown in South and Central America and in certain parts of Italy and France. There are four known cultivar groups of the African eggplant, which together with its progenitor, Solanum anguivi, form the hypervariable scarlet eggplant complex. Despite its importance as food, medicine, and a source of disease resistance genes, there has been limited research investment in the improvement of the African eggplant, and it remains an orphan crop. We review the botanical description of the cultivar groups, the available genetic and genomic resources, and the germplasm conservation efforts within the primary and secondary genepools. We present the recently published draft genome sequence and make detailed comparisons of the genome with other genomes within the Solanaceae family. We further demonstrate the immediate utility of the draft genome for gene discovery by retrieving orthologous seed dormancy candidate genes that can be characterized to improve this trait in the African eggplant. Finally, we provide evidence of why the African eggplant is underutilized and make recommendations for future breeding, research investment, and marketing efforts that promise to enhance its utilization.

S. M. Moenga: The Plant Pathology Department, University of California Davis, Davis, CA 95616, USA
D. Achieng Odeny (corresponding author): The International Crops Research Institute for the Semi-Arid Tropics, P.O. Box 39063, Nairobi 00623, Kenya

21.1 Background

The African eggplant, also known as scarlet eggplant (Solanum aethiopicum, 2n = 2x = 24), is a nutritious cultivated vegetable of the Solanaceae family. It is one of only three cultivated species of the Leptostemonum clade, together with the gboma (S. macrocarpon) and brinjal (S. melongena) eggplants (Table 21.1). Although the African eggplant is globally the second most important eggplant after the brinjal eggplant (Gramazio et al. 2016), it is the most important in West and Central Africa, excluding Benin and certain rain forest regions of coastal Africa and the Congo River Basin, where the gboma eggplant is most important (Maundu et al. 2009; Plazas et al. 2014; Schippers 2000). It is well adapted to diverse ecologies, ranging from deserts to swampy areas with variable temperatures (Kamenya et al. 2021). Unlike many vegetables or fresh produce, the African eggplant can yield over an extended period, transports well, and is shelf-stable for up to three months (National Research Council 2006), which adds to its economic significance in the tropical African climate. It serves an important role as a cover crop (National Research Council 2006; Ofori and Gamedoagbao 2005) given its elevated shade tolerance, which facilitates intercropping with taller crops.

There are four recognized, completely interfertile cultivar groups of the African eggplant, namely Shum, Gilo, Kumba, and Aculeatum, with Gilo considered the most important group (Acquadro et al. 2017; Lester 1986; Lester et al. 1986; Lester and Daunay 2003). Cytological studies suggest that the Gilo group is likely to have evolved from the Shum group through hybridization and selection (Anaso 1991). The group distinctions reflect specific plant growth characteristics, uses, and geographical distribution (Sakhanokho et al. 2014). The Shum group is used for its leaves; Kumba for both fruits and leaves; Gilo for its fruits; and Aculeatum as an ornamental (Kamenya et al. 2021; Lester 1986; Lester and Daunay 2003; Schippers 2000).

Different tissues of the African eggplant are also used for their medicinal value: the roots and fruits are extensively used in different parts of Africa to treat colic and high blood pressure and as a sedative (Food and Agriculture Organization of the United Nations 2021). Whereas crushed fruit is used as an enema, leaf-derived juice is used to treat uterine complications and miscarriage-associated tetanus and as an anti-emetic (Adeniji et al. 2012; Food and Agriculture Organization of the United Nations 2021; Lester and Seck 2004).

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 M. A. Chapman (ed.), Underutilised Crop Genomes, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-00848-1_21

21.1.1 Botanical Description

Significant variation in morphological traits has been reported within and among the four African eggplant cultivar groups (Plazas et al. 2014), which are often distinguished using morphological descriptors including leaf shape, prickliness, fruit color, and pubescence. The leaf margins range from strongly to weakly lobed (Fig. 21.1a) (Adeniji et al. 2013; Plazas et al. 2014), with up to three lobes on each side (Fig. 21.1a, b). The mature leaves of both Shum and Kumba groups are glabrous (Fig. 21.1b) with diminutive glandular hairs that are distinct from the stellate hairs on the leaf surfaces of Gilo and Aculeatum groups. The leaves have 3–6 pairs of primary veins (Fig. 21.1b), with a petiole that is ~1/4 the length of the leaf (Fig. 21.1a) (POWO 2019). Short, porrect, or sessile trichomes may characterize the abaxial pubescence of Gilo, Kumba, and Aculeatum, with fewer rays and midpoints as well as thick stalks on the adaxial surface (Lester and Seck 2004). Both Kumba and Shum leaves are edible, with the latter relatively smaller in size (Plazas et al. 2014). Plants of Gilo are generally taller than those of the Kumba group, while the Aculeatum group has higher anthocyanin content and prickliness than the other groups (Plazas et al. 2014). The Shum group has relatively more branches than the rest (Lester and Seck 2004; Taher et al. 2017). The African eggplant is characterized by an unbranched inflorescence with up to ten perfect flowers and more than five perianth lobes on short, thick pedicels (Fig. 21.1c, d). The calyx (3.5–9 mm long) and corolla (0.8–1.8 cm) have deltate lobes, while the ovary is glabrous or stellate pubescent and the stamens are equal (POWO 2019). Shum and Gilo flowers have five petals, which are white and pink-purple, respectively (Fig. 21.1c, d) (Adeniji et al. 2013). Kumba has comparatively larger flowers with more flower parts than Shum (Plazas et al. 2014). Aculeatum has more flowers per inflorescence than the Gilo, Kumba, and Shum groups.

Table 21.1 Differences in morpho-physiological characteristics among cultivated eggplants

Trait descriptor                  S. aethiopicum     S. melongena      S. macrocarpon
Fruit diameter (cm)               1.5–8              3–4               5–12
Fruit weight (g)                  1–351              30–116            22–177
Prickliness                       None to high       None to moderate  None to slight
Bitterness                        None to very high  None to moderate  Slight to high
Toxicity (glycoalkaloid content)  Safe range         Safe range        5–10x > safety threshold
Plant height (cm)                 74–208             60–120            50–129

Source: Sánchez-Mata et al. (2010); Plazas et al. (2014); Naroui Rad et al. (2017); Taher et al. (2017)

Fig. 21.1 Leaf and floral features of select African eggplants. a Herbarium tissues with both weakly and strongly lobed margins as shown with arrows. b Adaxial side of a glabrous Shum leaf blade with 6 pairs of primary veins. c Shum flowers with white petals. d Gilo flowers with purple petals. Herbarium image (a) adapted from POWO (2019) and Gilo flower image (d) adapted from http://www.blendingflavours.in/2016/07/solanum-Gilo/

The fruits, which vary in color, shape, and size (Yang and Ojiewo 2013), are perhaps the most distinguishing feature of the African eggplant cultivar groups. Each infructescence bears 1–4 berries that are green or striped when immature and later turn white, cream, orange, violet, lavender, dusky black, burgundy, plum, pink, lime, or red, depending on the cultivar group (Fig. 21.2) (National Research Council 2006). The Gilo group has fleshy edible fruits with stellate hairs that are white, creamy white, pale green, dark green, brown, or purple when mature (Fig. 21.2a, b) (Lester and Seck 2004). Gilo fruits (Fig. 21.2c) can be further classified into three distinct classes based on fruit shape, color, and size (Kouassi et al. 2014). Mature Shum fruits turn orange and subsequently red (Fig. 21.2d). Relative to other groups, both Aculeatum and Kumba fruits are more flattened, broad, and deeply grooved (Plazas et al. 2014), with Kumba fruits known to be sweet and edible. In addition to the morphological characteristics, several fruit compositional traits such as carbohydrates, starch, vitamin C, total phenolics, and sugars have been shown to be highly variable among the four S. aethiopicum cultivar groups, with discernible breeding potential (San José et al. 2016). The seeds are ~3.5 mm in breadth (Fig. 21.2e) (Page et al. 2019).

21.1.2 Geographic Distribution

The African eggplant belongs to the section Oliganthes (Dunal) Bitter of the species-rich subclade Leptostemonum, which includes several cultivated and wild eggplants that are native to the Old World (Africa, Australia, and Asia) (Lester and Daunay 2003; Acquadro et al. 2017). In spite of its morphological hyper-diversity (Vorontsova et al. 2013), several phylogenetic reconstructions of the Leptostemonum clade have long led to the conclusion that it is monophyletic (Levin et al. 2006; Särkinen et al. 2013; Stern et al. 2011; Vorontsova et al. 2013). A recent study has, however, departed from this understanding based on greater sampling of tropical Asian species belonging to this clade. Aubriot et al. (2016) supported three independent introductions from the New World that form distinct lineages in the Old World. In agreement with this placement, the African eggplant belongs to the Anguivi clade, with S. anguivi as its immediate wild progenitor (Knapp et al. 2019; Lester and Niakan 1986).

The domestication of the African eggplant most likely occurred in West Africa from its wild progenitor, S. anguivi (Lester and Niakan 1986; Sakata and Lester 1997), perhaps via the semi-domesticated S. distichum (Lester and Seck 2004). This theory is supported by cytological studies (Anaso 1991), the fully fertile nature of the crosses between S. aethiopicum and S. anguivi (Lester and Niakan 1986; Plazas et al. 2014; Taher et al. 2017), the wide cultivation of the African eggplant in West Africa, and the existence of a rich diversity of both cultivated and wild relatives in West Africa (Sękara et al. 2007).

Cultivation of the African eggplant is largely restricted to Africa, although it is also grown in the Caribbean and Brazil, where it is thought to have been introduced by slaves (Schippers 2000; Sunseri et al. 2010). There is occasional cultivation of the African eggplant in the southern parts of Italy and France (Lester and Seck 2004), with further evidence of an early introduction into the United Kingdom in the late 1500s by British traders (National Research Council 2006). The Gilo group, which is consumed raw or cooked as an immature or ripe fruit, is the most widely grown African eggplant and is cultivated in Senegal, Nigeria, Ivory Coast, Angola, Zimbabwe, Mozambique, and Central and East African countries (Kouassi et al. 2014; National Research Council 2006). Gilo was presumably introduced into Madagascar, where it is used as medicine and occasionally as food (D’Arcy 1992). The Gilo group is the most common African eggplant in Brazil, where it is referred to as jiló (Miamoto et al. 2020) and is cultivated by small-scale farmers, mainly in Southeastern Brazil (Suaste-Dzul et al. 2021). The Shum group is distributed across Central Africa, Nigeria, Togo, Benin, Ghana, and Uganda. In Uganda, both Shum and Gilo have grown in popularity, especially in central Uganda, where they are known by the local names Nakati (Shum) and Entula (Gilo) and are used as either main or side dishes. The Kumba group, also known as the orange African eggplant, is restricted to the sub-Sahelian region, extending across the old Mali empire and reflecting an arid agroecology (Yang and Ojiewo 2013). There are records indicating that some of the African eggplants maintained at the USDA Plant Genetic Resources Conservation Unit (Georgia, USA) originated from Yugoslavia (Furini and Wunder 2004), although no record of their cultivation in Yugoslavia could be verified.

Fig. 21.2 The extent of fruit color diversity in the African eggplant cultivar groups. a Mature fruit colors ranging from white and creamy white to purple, orange, and red. b Immature and c mature cream Gilo fruits. d Yellow and red Shum fruits on the same stalk. e Seeds derived from Shum group fruits. Panels (a) and (d) are adapted from Plazas et al. (2014) and Kamenya et al. (2021), respectively, which are published under a Creative Commons License

21.1.3 Genetic Resources

The primary genepool of the African eggplant comprises the four cultivar groups, S. anguivi, and their intermediate forms (Fig. 21.3). All members of the primary genepool are inter-fertile (Lester and Niakan 1986; Taher et al. 2017) with regular gene flow between the species (Acquadro et al. 2017). This primary genepool is also commonly referred to as the scarlet eggplant complex (Plazas et al. 2014) and constitutes a genetic resource that is critical for African eggplant improvement. The secondary genepool (Fig. 21.3) consists of gboma and brinjal eggplants, together with their respective ancestors, S. dasyphyllum and S. insanum, and other Anguivi clade wild species, all of which can cross with the African eggplant, resulting in hybrids with intermediate fertility (Daunay et al. 1991; Prohens et al. 2012; Plazas et al. 2014; Afful et al. 2019). All other accessions within the Leptostemonum sub-clade make up the tertiary genepool.

Fig. 21.3 Schematic representations of the phylogenies of cultivated eggplants and their wild relatives. The primary and secondary genepools are highlighted in light green (a) and orange (b), respectively. This figure, which is not drawn to scale, has been modified from Taher et al. (2017), which was published under a Creative Commons License

There are currently ~794 and ~82 accessions of S. aethiopicum and S. anguivi, respectively, in various genebanks globally (GENESYS 2021; Taher et al. 2017), the majority of which are maintained at the World Vegetable Centre in Taiwan (Table 21.2). There is a backup of 175 accessions of the African eggplant from 30 countries at the Svalbard Global Seed Vault (SGSV 2021). Additional collections exist in India and Japan (31 accessions) (Sękara et al. 2007) and at the Institute of Vegetables and Flowers in China (Taher et al. 2017) that may not have been recorded in GENESYS (https://www.genesys-pgr.org/). Given the hyper-variability that exists within the scarlet eggplant complex, the number of accessions in genebanks likely under-represents the germplasm and urgently needs to be increased. The brinjal eggplant germplasm is better conserved in global genebanks (~5665 accessions), unlike that of the gboma eggplant (~169 accessions) and the wild relatives of both gboma and brinjal eggplants (1221 accessions) (Taher et al. 2017).

Table 21.2 A summary of the scarlet eggplant complex germplasm collections in global genebanks

S. aethiopicum                                       S. anguivi
Country                           No. of accessions  Country         No. of accessions
Taiwan ROC                        480                United Kingdom  29
USA                               84                 Taiwan          24
Brazil                            82                 Kenya           8
Netherlands                       79                 Netherlands     6
Spain                             42                 USA             5
Germany                           10                 Spain           4
Armenia                           9                  Germany         2
Kenya                             5                  Ethiopia        2
Others (Austria, Italy, Ukraine)  3                  Uganda          2
Total                             794                Total           82

Source: Summarized from GENESYS (2021)

Other than the detailed genetic characterization of brinjal eggplant germplasm (Liu et al. 2018; Miyatake et al. 2019), the true genetic diversity of the rest of the collections and the extent of intraspecific variation are yet to be systematically captured. There have been a few disparate efforts at the characterization of some accessions within the scarlet eggplant complex (Kouassi et al. 2014; Sakhanokho et al. 2014), which sometimes included a few accessions of S. anguivi (Kouassi et al. 2014; Sakhanokho et al. 2014; San José et al. 2016), but nothing significant for gboma and other wild eggplant germplasm within the secondary genepool. The few morphological and fruit composition characterization reports in gboma eggplant reveal great variation (San José et al. 2016; Adeniji et al. 2018) that should be exploited further for overall improvement of eggplants.

The conservation of the African eggplant genetic resources will be useful not only for future food security but also for medicinal and agronomic use. Accessions of the scarlet eggplant complex have been reported to have ethnomedicinal value (Elekofehinti et al. 2013) and are good sources of disease resistance for other Solanaceae crops (Schippers 2000; Collonnier et al. 2001; Toppino et al. 2008). The high number of fruits per inflorescence in S. anguivi (Bukenya-Ziraba 2004; Osei et al. 2010; Afful et al. 2018) can be exploited to improve fruit output in other fruit-edible eggplants and also within the broader Solanaceae family. The African eggplant is also a valued candidate for biotic stress resilience (Lester et al. 1990).
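The accession counts in Table 21.2 can be checked against the totals quoted in the text with a few lines of Python. This is only an illustrative sketch: the dictionaries are transcribed from the table, and the helper name `summarize` is hypothetical rather than part of any GENESYS tooling.

```python
# Accession counts transcribed from Table 21.2 (summarized from GENESYS 2021).
S_AETHIOPICUM = {
    "Taiwan ROC": 480, "USA": 84, "Brazil": 82, "Netherlands": 79,
    "Spain": 42, "Germany": 10, "Armenia": 9, "Kenya": 5,
    "Others (Austria, Italy, Ukraine)": 3,
}
S_ANGUIVI = {
    "United Kingdom": 29, "Taiwan": 24, "Kenya": 8, "Netherlands": 6,
    "USA": 5, "Spain": 4, "Germany": 2, "Ethiopia": 2, "Uganda": 2,
}


def summarize(counts):
    """Return (total accessions, largest holder, share held by that holder)."""
    total = sum(counts.values())
    holder, held = max(counts.items(), key=lambda item: item[1])
    return total, holder, held / total


if __name__ == "__main__":
    for species, counts in (("S. aethiopicum", S_AETHIOPICUM),
                            ("S. anguivi", S_ANGUIVI)):
        total, holder, share = summarize(counts)
        print(f"{species}: {total} accessions; largest holder {holder} ({share:.0%})")
```

The column sums reproduce the totals of 794 and 82 given in the table, and the Taiwan collection accounts for about 60% of the listed S. aethiopicum accessions.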

21.2 S. aethiopicum Genetics and Genomics

All four cultivar groups of the African eggplant appear to be diploid with a basic chromosome number of x = 12 (Anaso 1991; Sakhanokho and Islam-Faridi 2014). Spontaneous tetraploidy, resulting in plants that are shorter with broader leaves and small fruits, has also been reported (Sakhanokho and Islam-Faridi 2014). African eggplants are predominantly self-pollinating, with up to 30% cross-pollination (Adeniji et al. 2012). The genome size has been estimated using both flow cytometry (1.312–1.538 pg/1C) (Sakhanokho et al. 2014) and k-mer analysis (1.17 Gb) (Song et al. 2019). Molecular characterization within the scarlet eggplant complex has been reported using SSRs (Prohens et al. 2012), RAPDs (Aguoru et al. 2015), AFLPs (Furini and Wunder 2004), and SNPs (Acquadro et al. 2017; Song et al. 2019). There are currently no known linkage maps, and there are no reports of association mapping studies within the scarlet eggplant complex. However, the recent availability of a draft whole genome sequence (Song et al. 2019) creates many opportunities for genetic and genomic characterization.
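The two genome size estimates above are given in different units (pg per 1C and Gb). The sketch below puts them on a common scale. It assumes the widely used conversion of 1 pg of DNA to 0.978 × 10^9 bp, which is not stated in this chapter, and the function name `pg_to_gb` is hypothetical.

```python
# Sketch: convert flow-cytometry DNA content (pg/1C) to genome size (Gb).
# Assumption (not from the chapter): 1 pg of DNA corresponds to 0.978e9 bp.
BP_PER_PG = 0.978e9


def pg_to_gb(picograms: float) -> float:
    """Return the genome size in gigabases for a 1C DNA content in picograms."""
    return picograms * BP_PER_PG / 1e9


# Values quoted in the text.
FLOW_RANGE_PG = (1.312, 1.538)  # flow cytometry, pg/1C
KMER_ESTIMATE_GB = 1.17         # k-mer based estimate, Gb

flow_range_gb = tuple(pg_to_gb(pg) for pg in FLOW_RANGE_PG)

if __name__ == "__main__":
    low, high = flow_range_gb
    print(f"Flow cytometry: {low:.2f} to {high:.2f} Gb")
    print(f"k-mer estimate: {KMER_ESTIMATE_GB:.2f} Gb")
```

Under this assumption the flow cytometry range corresponds to roughly 1.28–1.50 Gb, somewhat above the k-mer estimate of 1.17 Gb.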

21.2.1 Whole Genome Sequencing, Statistics, and Strategy

The draft reference genome was sequenced (Song et al. 2019) through the effort of scientists from Uganda, Kenya, USA, and China under the African Orphan Crops Consortium (AOCC) initiative (Jamnadass et al. 2020). This was the first


S. M. Moenga and D. Achieng Odeny

and only genome sequence for this crop and was generated from a homozygous Shum cultivar (303) originating from Uganda. Approximately 94% of the estimated 1.17 Gb genome was assembled from short-insert (250 and 500 bp) and mate-pair (2, 6, 10, and 20 kb) Illumina libraries using the Platanus assembler (Kajitani et al. 2019; Song et al. 2019). The proportion of the reference genome assembled (1.02 Gb; 94%) compares well with those of other orphan crops that have been sequenced using Illumina technology exclusively (Table 21.3). The genome was annotated using transcript reads from leaves, flowers, stems, fruits, and roots of the same Shum genotype (303), as well as from a Gilo genotype (Song et al. 2019). An estimated 34,906 protein-coding genes were predicted using a combination of homology search and ab initio prediction (Song et al. 2019). More details on how the genome was assembled and annotated are provided in Song et al. (2019). The availability of a draft reference genome is a great achievement for the African eggplant breeding community and will provide opportunities for marker and candidate gene discovery, linkage and association mapping, and genomic selection. However, more advanced breeding applications will require long-read, third-generation sequencing such as PacBio (Rhoads and Au 2015), together with chromosome conformation capture (Hi-C) (van Berkum et al. 2010), to improve the current assembly to pseudomolecule level.

21.2.2 Genome Comparison with Other Crops

The African and brinjal eggplants have comparable genome sizes and numbers of predicted genes (Table 21.4). The two eggplant genomes are both larger than the genomes of potato (Solanum tuberosum L.) and tomato (Solanum lycopersicum L.) but smaller than those of hot pepper (Capsicum annuum L.) and tobacco (Nicotiana tabacum L.) (Table 21.4). The African eggplant has one of the highest proportions of repetitive elements among economically important Solanaceae plants at 78.9% (Table 21.4). Long terminal repeat–retrotransposons (LTR-Rs) are the most abundant repetitive elements in both African (69.9%) and brinjal eggplants (65.8%) (Song et al. 2019; Wei et al. 2020). Although the hot pepper genome is about three times larger than those of the eggplants, its number of predicted genes is similar to those of the two eggplants (Table 21.4). Phylogenetic studies reveal a more recent divergence (~2.5 Mya) of the African and brinjal eggplant genomes from a common ancestor than that between potato and tomato (~6.6 Mya) (Song et al. 2019; Wei et al. 2020) (Fig. 21.4). Although whole genome sequence data are not available for gboma eggplant, recent phylogenetic analysis using SNP data suggests that brinjal eggplant might be genetically more

Table 21.3 A comparison of the assembled African eggplant genome with other “Illumina only”-sequenced orphan crops

Species | Ploidy | Estimated genome size (Mbp) | Proportion assembled (%) | N50 (kbp) | References
Solanum aethiopicum | 2X | 1170 | 94 | 516 | Song et al. (2019)
Vigna subterranea | 2X | 535 | 97 | 641 | Chang et al. (2018)
Lablab purpureus | 2X | 395 | 93 | 621 | Chang et al. (2018)
Dioscorea rotundata | 2X | 579 | 97 | 2120 | Tamiru et al. (2017)
Lupinus angustifolius | 2X | 609 | 64 | 14 | Hane et al. (2017)
Fagopyrum esculentum | 2X | 1177 | 88 | 25 | Yasui et al. (2016)
Cucurbita maxima | 4X | 271.4 | 70 | 3717 | Sun et al. (2017)
Momordica charantia | 2X | 285.5 | 98 | 1100 | Cui et al. (2020)
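The N50 column in Table 21.3 summarizes assembly contiguity. As a reminder of how the statistic is usually defined, the minimal sketch below computes N50 from a list of scaffold lengths. The input lengths are made-up toy values, not data from any of the assemblies in the table, and the function name is hypothetical.

```python
def n50(lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that scaffolds of length >= L together
    cover at least half of the total assembled length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy scaffold lengths in kbp (hypothetical values for illustration only)
toy_scaffolds_kbp = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_scaffolds_kbp)
print(toy_n50)
```

In this toy example the total length is 300 kbp, and the two longest scaffolds (80 + 70 kbp) already reach half of it, so the N50 is 70 kbp.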

21

The African Eggplant


Table 21.4 Genome statistics of S. aethiopicum alongside select cultivated Solanaceae crops

Solanaceae crop | Estimated genome size (Gb) | Predicted genes | Repetitive elements (%) | References
Potato | 0.84 | 38,492 | 62.2 | The Potato Genome Sequencing Consortium et al. (2011)
Tomato | 0.90 | 34,725 | 71.8 | The Tomato Genome Consortium et al. (2012), Hosmani et al. (2019)
The African eggplant | 1.17 | 34,906 | 78.9 | Song et al. (2019)
Brinjal eggplant | 1.21 | 34,916 | 73.0 | Barchi et al. (2019)
Hot pepper | 3.48 | 34,899 | 76.4 | Kim et al. (2014)
Tobacco | 4.41 | 69,500 | 72–79 | Sierro et al. (2014), Edwards et al. (2017)

closely related to gboma than to the African eggplant (Acquadro et al. 2017). One of the notable differences between the African and brinjal eggplant genomes is GC content, which is ~33% (Song et al. 2019) and ~36% (Hirakawa et al. 2014; Wei et al. 2020), respectively. Two whole genome duplication (WGD) events have been reported in the African eggplant, one of which is shared by asterids and rosids, while the second is shared by other Solanaceae crops (Song et al. 2019).

Fig. 21.4 The clustering of S. aethiopicum with other economically important crops within the Solanaceae family. a Phylogenetic analysis done by Song et al. (2019) using single-copy gene families. b Phylogenetic tree constructed by Wei et al. (2020) using maximum likelihood algorithm and 1000 bootstrap replicates. Both figures reproduced with permission from Song et al. (2019) and Wei et al. (2020), which are published under a Creative Commons License

21.2.3 Disease Resistance Genes

Despite the lack of genomic resources in the past and the poor understanding of the genome, the African eggplant and its progenitor, S. anguivi, have been used as sources of disease resistance for other Solanaceae crops (Schippers 2000; Collonnier et al. 2001; Rizza et al. 2002; Toppino et al. 2008). The rootstocks of the African eggplant have been used to improve disease resistance in tomato (Nkansah et al. 2013). Toppino et al. (2008) studied the inheritance of resistance to Fusarium wilt (Fusarium oxysporum f. sp. melongenae) in brinjal eggplant that had previously been introgressed from the African eggplant (Rizza et al. 2002) and reported the likely involvement of a single major gene. Barchi et al. (2018) later mapped this locus on chromosome E02 of S. melongena. There is now evidence that the African eggplant genome harbors more disease resistance genes relative to other Solanaceae crops (Wei et al. 2020). A total of 436 nucleotide binding site (NBS)-encoding genes (Fig. 21.5) were identified in the African eggplant genome in comparison with 223 in tomato and 250–301 in S. melongena (Wei et al. 2020). Of the 436, 219 lacked either LRR or TIR domains and were therefore classified as NBS genes (Fig. 21.5). Disease resistance genes have been reported to occur in clusters (Meyers et al. 2003; Richard et al. 2017), which may arise through tandem duplication, unequal recombination, and diversifying selection (Graham et al. 2008). Song et al. (2019) also observed significant enrichment of disease resistance GO terms within the LTR-R–captured genes, suggesting that the abundant LTR-Rs in the African eggplant genome play a role in the elevated levels of resistance to diseases. We searched the current draft of the African eggplant genome for any obvious clusters of the 436 NBS-encoding genes and observed their occurrence in duplicates and/or clusters, with the largest clusters having 14 genes. The two largest clusters were found in scaffold83105 and scaffold2427, each of which contained 6 NBS-LRR and 8 NBS-encoding R genes. While a further characterization of these R genes will be necessary, the 34,171 SNPs identified by Song et al. (2019) from within disease resistance genes in the African eggplant genome are available for immediate validation and use by the eggplant research and breeding community.
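The chapter does not describe how the cluster search was carried out. One simple way such a search could be done is sketched below: genes are grouped by scaffold and chained into a cluster whenever the gap to the previous gene is below a threshold. The 200 kb threshold, the function name, and the gene coordinates are all hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def find_clusters(genes, max_gap=200_000, min_size=2):
    """Group genes on the same scaffold into positional clusters.

    genes: iterable of (gene_id, scaffold, start, end) tuples.
    Consecutive genes join the same cluster when the gap between them is
    at most max_gap bp (a hypothetical threshold chosen for illustration).
    Returns (scaffold, [gene_ids]) tuples, largest cluster first.
    """
    by_scaffold = defaultdict(list)
    for gene_id, scaffold, start, end in genes:
        by_scaffold[scaffold].append((start, end, gene_id))

    clusters = []
    for scaffold, items in by_scaffold.items():
        items.sort()  # order genes by start coordinate
        members, cluster_end = [], None
        for start, end, gene_id in items:
            # Close the current cluster when the next gene is too far away
            if members and start - cluster_end > max_gap:
                if len(members) >= min_size:
                    clusters.append((scaffold, members))
                members, cluster_end = [], None
            members.append(gene_id)
            cluster_end = end if cluster_end is None else max(cluster_end, end)
        if len(members) >= min_size:
            clusters.append((scaffold, members))
    return sorted(clusters, key=lambda c: len(c[1]), reverse=True)


# Made-up gene models; identifiers and coordinates are placeholders.
toy_genes = [
    ("g1", "scaffold_1", 1_000, 4_000),
    ("g2", "scaffold_1", 9_000, 12_000),
    ("g3", "scaffold_1", 900_000, 903_000),
    ("g4", "scaffold_2", 500, 2_500),
    ("g5", "scaffold_2", 3_000, 6_000),
    ("g6", "scaffold_2", 7_000, 9_500),
]
toy_clusters = find_clusters(toy_genes)
for scaffold, members in toy_clusters:
    print(scaffold, len(members), members)
```

On a fragmented draft assembly, a scaffold-based search of this kind can only find clusters that lie within a single scaffold, which is one reason a chromosome-level assembly would allow a more complete picture.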

Fig. 21.5 Different classes of disease resistance genes identified in the African eggplant genome. The figure was drawn using data from Song et al. (2019) and Wei et al. (2020)


21.2.4 Re-sequencing

Sixty African eggplant accessions, including 38 Shum and 22 Gilo types, have been re-sequenced alongside five S. anguivi accessions at an average coverage of 60× (Song et al. 2019). This effort led to the generation of a pan-genome with an additional 7,069 unique genes, and the identification of 18.6 M SNPs, 2 M indels, and 1.2 M structural variations (SVs) (Song et al. 2019). A phylogenetic analysis of the 65 accessions distinguished Shum and Gilo accessions into two major groups, with the five S. anguivi accessions clustering randomly across the two groups (Song et al. 2019). The lack of distinct clustering of S. anguivi away from Shum or Gilo groups is not surprising, given the frequent hybridization between the African eggplant and S. anguivi accessions that has resulted in several intermediate forms (Plazas et al. 2014). Acquadro et al. (2017) also reported a similar non-distinct clustering pattern of S. anguivi among different African eggplant groups, highlighting the frequent gene flow between the two species. There were fewer SNPs and indels shared between Shum and S. anguivi as compared to those shared between Gilo and S. anguivi. This finding was consistent with earlier studies reporting that the Gilo group is likely to have evolved from the Shum group through hybridization and selection (Anaso 1991), though it is still unclear whether Shum was the African eggplant group that was domesticated from S. anguivi. The accessory gene sets for Shum (29,389), Gilo (23,726), and S. anguivi (12,829) that were generated from the re-sequencing effort will further facilitate studies of evolution and genetics of the scarlet eggplant complex.
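The comparison of shared SNPs and indels between groups amounts to a set intersection over variant calls. The sketch below illustrates the idea, assuming each variant is keyed by scaffold, position, reference allele, and alternate allele. All variants shown are invented for illustration and are not taken from Song et al. (2019); the function name is hypothetical.

```python
def shared_variants(group_a, group_b):
    """Return the variants that are present in both groups."""
    return set(group_a) & set(group_b)


# Each variant is a (scaffold, position, ref, alt) tuple.
# These toy call sets are placeholders, not real data.
shum = {
    ("scaffold_1", 101, "A", "G"),
    ("scaffold_1", 250, "C", "T"),
    ("scaffold_2", 77, "G", "A"),
}
gilo = {
    ("scaffold_1", 101, "A", "G"),
    ("scaffold_2", 77, "G", "A"),
    ("scaffold_3", 12, "T", "C"),
}
anguivi = {
    ("scaffold_2", 77, "G", "A"),
    ("scaffold_3", 12, "T", "C"),
    ("scaffold_4", 900, "A", "C"),
}

shared_counts = {
    name: len(shared_variants(group, anguivi))
    for name, group in (("Shum", shum), ("Gilo", gilo))
}
for name, count in shared_counts.items():
    print(f"{name} variants shared with S. anguivi: {count}")
```

With real data, the call sets would typically be read from per-group VCF files, and group sizes would need to be taken into account before comparing raw counts.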

21.2.5 Orthologous Candidate Genes for Seed Dormancy

One of the most important breeding objectives in the African eggplant is the enhancement of domestication traits such as reduced seed dormancy (Mshida 2014). Seed dormancy in plants can be physiological or physical. Physiological dormancy is the most common and is caused largely by plant hormonal interactions (see Finch-Savage and Leubner-Metzger 2006 for detailed classification) but could also arise from growth inhibitor production in fleshy fruits (Kathpalia and Bhatla 2018). Physical dormancy is reported to be common mainly in legumes and is caused by the physical characteristics of the seed that prevent water uptake (Soltani et al. 2021). The true mechanism of seed dormancy in the African eggplant is not known, and no candidate genes have been reported. There are indications that both primary and secondary physiological dormancy could be at play in the African eggplant. Light and alternating temperatures, as well as gibberellic acid (GA3) and potassium nitrate (KNO3), have been shown to improve germination of some dormant African eggplant genotypes (Mshida 2014). This same study also reported an improvement in germination when seeds were harvested at physiological maturity. Under primary physiological dormancy, four transcription factors, ABSCISIC ACID INSENSITIVE 3 (ABI3), FUSCA 3 (FUS3), LEAFY COTYLEDON 1 (LEC1), and LEC2, have been implicated as central regulators (Holdsworth et al. 2008). Indirect regulators such as VIVIPAROUS 8 (VP8) in maize (Zea mays L.) (Suzuki et al. 2008), PLASTOCHRON 3/GOLIATH (PLA3/GO) in rice (Oryza sativa L.) (Kawakatsu et al. 2009), and ALTERED MERISTEM PROGRAM 1 (AMP1) in Arabidopsis (Griffiths et al. 2011) have also been reported (Graeber et al. 2012). Other dormancy-specific genes have been cloned from QTL studies, including DELAY OF GERMINATION 1 (DOG1), SEED DORMANCY 4 (SDR4), DESPIERTO (DEP), and ATHB20, among others that regulate dormancy at the chromatin level (Bentsink et al. 2006, 2010; Barrero et al. 2010; Sugimoto et al. 2010; Liu et al. 2011). There is a recent report of a candidate gene, PECTIN ACETYLESTERASE 8, controlling physical dormancy in common bean (Soltani et al. 2021). We retrieved these genes from public databases and used them as queries to identify orthologous candidate genes in the African eggplant using the blastp option of NCBI local BLAST (Altschul et al. 1990). We retrieved 49 orthologs from the African eggplant genome that had at least 50% identity with the query and an E-value cutoff of 1e-10 (Table 21.5).
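The two thresholds described above (at least 50% identity and an E-value of at most 1e-10) can be applied to tabular blastp output with a short filter. The sketch below assumes the standard 12-column tabular format produced by BLAST+ with `-outfmt 6`; whether the search was run and filtered in exactly this way is an assumption, and the query and subject names in the example rows are placeholders.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6)
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(handle, min_identity=50.0, max_evalue=1e-10):
    """Yield (query, subject, identity, evalue) for hits passing both cutoffs."""
    reader = csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        identity = float(row["pident"])
        evalue = float(row["evalue"])
        if identity >= min_identity and evalue <= max_evalue:
            yield row["qseqid"], row["sseqid"], identity, evalue


# Made-up example rows: one passes, one fails on identity, one on E-value.
example = io.StringIO(
    "queryA\tsubject1\t72.5\t300\t80\t2\t1\t300\t5\t304\t1e-120\t410\n"
    "queryA\tsubject2\t41.0\t280\t160\t5\t10\t290\t3\t282\t2e-30\t120\n"
    "queryB\tsubject3\t55.3\t90\t40\t1\t1\t90\t1\t90\t3e-05\t45.1\n"
)
hits = list(filter_hits(example))
for hit in hits:
    print(hit)
```

Sequence similarity alone does not establish orthology, so candidates retained by a filter of this kind would still benefit from phylogenetic or reciprocal-best-hit confirmation.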

21.3 Why is the African Eggplant Underutilized?

21.3.1 Undesirable Traits

The African eggplant is still largely semi-domesticated or grown with little to no improvement among the communities that grow it. Some of the domestication traits that are still under selection within the African eggplant complex include prickliness, small seed size, seed dormancy, and fruit bitterness. Wang et al. (2008) observed that primitive eggplants were bitter with small fruits in comparison with more advanced cultivars. Larger seeds are easier to sow, harvest, and process, while small seeds are often associated with reduced fitness (Zhang et al. 2015). Reports in brinjal eggplants and other species reveal that some of these domestication traits, including prickliness, are controlled by a few major genes (Doganlar et al. 2002). Such reports suggest that these undesirable traits would be easy to improve. For example, the development of a pure line with reduced prickliness is possible to achieve in backcross generations (Hurtado et al. 2014). Better-structured and well-resourced breeding programs are needed to enhance these domestication traits and expand the utilization of the African eggplant.

Table 21.5 Candidate orthologous genes for seed dormancy retrieved from S. aethiopicum-annotated draft genome

Gene | S. aethiopicum ortholog | Putative function | References
ABI3 | Soaet10041614; Soaet10041613 | Seed development regulator | Tian et al. (2020)
LEC1 | Soaet10013374; Soaet10009072; Soaet10020004; Soaet10023771; Soaet10007523; Soaet10007525; Soaet10036449; Soaet10011094; Soaet10005103; Soaet10021413; Soaet10008414; Soaet10023770; Soaet10004833; Soaet10006568; Soaet10002119; Soaet10032863; Soaet10020322; Soaet10030573 | Embryo and seed development | Jo et al. (2019)
VP8; PLA3/GO; AMP1 | Soaet10031855; Soaet10031857; Soaet10031858 | Embryo and endosperm development | Suzuki et al. (2008), Kawakatsu et al. (2009), Griffiths et al. (2011)
DOG1 | Soaet10004210; Soaet10023421; Soaet10026319; Soaet10016738; Soaet10031650; Soaet10023422; Soaet10023423; Soaet10039919; Soaet10023420; Soaet10023425 | Regulator of seed dormancy | Bentsink et al. (2006), Soppe and Bentsink (2020)
ATHB10 | Soaet10018968; Soaet10018084; Soaet10021078; Soaet10026529; Soaet10013784 | Involved in ABA sensitivity during germination | Barrero et al. (2010)
PECTIN ACETYLESTERASE 8 | Soaet10037428; Soaet10030404; Soaet10013589; Soaet10007206; Soaet10007048; Soaet10005598; Soaet10030407; Soaet10016986; Soaet10009746; Soaet10007205; Soaet10005761 | Controls physical seed dormancy | Soltani et al. (2021)

21.3.2 Introduction of Exotic, Higher Yielding Vegetable Species

The introduction of improved foreign vegetables such as those of the Brassicaceae and Amaranthaceae families has led to a major decline in the use of local indigenous vegetables (Kedling et al. 2008; Ekesa et al. 2009) such as the African eggplant. This is despite the local vegetables being better adapted to the arid and semi-arid conditions of Africa and requiring very little input for production (Stöber et al. 2017). The exotic vegetables may demand high inputs, but they are preferred because of optimized agronomic packages that allow for mechanization and ease of large-scale production. Although the situation is slowly changing for local vegetables in a few African countries, where traditional vegetables can now be sold in supermarkets and high-end restaurants, there is a need for more genetic, agronomic, and marketing research to make them competitive (Cernansky 2015). The African eggplant’s adaptability to varied climates, fast maturity, and tolerance to shade further make it an important candidate for urban farming, an emerging source of income across the African continent.

21.3.3 Low Research Investments

Like many orphan crops with discrete distribution across the African landscape, this species has received little research attention and little monetary investment overall by relevant bodies and governments (Hendre et al. 2019). This continued neglect by the research and development community has resulted in limited improvement, a subsequent lack of improved varieties, and low demand from farmers and consumers. Most African governments and the majority of donors use priority-setting methodologies for agricultural research investment that rely on areas of production and numbers of beneficiaries (Kamenya et al. 2021). Such a methodology automatically excludes important crops such as the African eggplant. A different framework for prioritizing agricultural research investment needs to be considered in order to capture nutritious and climate-resilient vegetables such as the African eggplant.

21.3.4 Non-existent Extension Services and Seed Systems

There are no known extension services for the African eggplant in most of the African countries where this crop is grown. This makes it difficult for the farmers to enhance their technical and managerial farm skills (Danso-Abbeam et al. 2018). The seed systems are fully informal, except in a few cases where the World Vegetable Centre has provided some support. Most farmers recycle or exchange seeds locally, resulting in poor quality seeds with poor germination, weak seedlings, and, subsequently, poor harvests. Extension services can be easily digitized and made available even to the remotest villages as long as the farmers have access to mobile phones. Without improved extension services and established seed systems, the African eggplant will remain underutilized and its nutritional benefits unexploited.

21.3.5 Lack of Structured Marketing Channels

The African eggplant’s value chain is highly underdeveloped, resulting in poor quality products that are likely to be undesirable to consumers. Post-harvest storage of both leafy and fruity African eggplant is not yet well established in most countries, and there are hardly any processed products in the market. The increasing demand for healthy products in the West and among the growing middle and upper class in Africa has the potential to drive the demand for this crop beyond the local small-scale consumption in rural areas. More investment will be needed to improve the complete value chain and make the crop profitable for farmers. The release of improved varieties such as MShumaa (released in Tanzania), Soxna, and L10 (both released in Mali) that are sweet and/or less bitter and high yielding (Taher et al. 2017) is a great start toward developing relevant product profiles for the African eggplant.

21.3.6 Lack of Genetic and Genomic Resources

Availability of genomic resources has the potential to accelerate gene discovery and crop improvement in the African eggplant. Yet to date, there is no linkage map for the African eggplant, and crop improvement is fully conventional. Until recently, there was no genome sequence, and genetic characterization was mostly morphological. As a result, the process of cultivar development has been extremely slow and random. Given the high variation and the frequent gene flow within the scarlet eggplant complex, morphological characterization is not enough to enable full characterization and exploitation of the desirable traits. There is now renewed hope for advanced breeding in the African eggplant with the availability of the draft reference genome discussed in Sect. 21.2.

21.4 Future Goals and Prospects

The African eggplant is increasingly gaining importance, both within Africa and outside the continent, as an alternative healthy, nutritious, and resilient vegetable. The increasing demand needs to be matched with a robust and efficient breeding program that can lead to the release of varieties with the desired product profiles. The existing unstructured and largely conventional breeding programs that focus on selection of landraces will need to be replaced with modern and efficient breeding structures. The availability of a well-annotated draft genome presents a great opportunity for crop improvement and comparative genetic studies with other closely related species. Breeders will need to embark on an immediate effort to develop relevant mapping populations that will enable the full anchoring of the draft genome and enhance the process of gene discovery and characterization. Modern breeding techniques such as speed breeding (Chiurugwi et al. 2019) and high-throughput phenotyping (Mir et al. 2019) in combination with genomic selection (Goddard 2009) promise to increase the genetic gain in several crops, but especially in orphan crops such as the African eggplant where no previous efforts exist. Genome editing (Zhang et al. 2017) alongside mutation breeding and TILLING (Szarejko et al. 2017) will go a long way in helping enhance domestication traits such as seed dormancy, small seed size in leafy cultivar groups, and fruit bitterness in edible fruity African eggplant cultivars. However, genome editing will be best accomplished with a fully anchored genome sequence and an efficient regeneration protocol (Gisbert et al. 2006) that is within reach for breeders. The current draft genome will enable better characterization of novel traits (such as R genes) from the African eggplant for use by other closely related species. Here, we have demonstrated the power of the reference genome in the identification of candidate orthologous genes for seed dormancy. The same procedure can be deployed for other traits of interest and should be coupled with proper functional characterization. In order for the proposed activities to be successful, respective governments, especially in Africa, will need to prioritize the improvement of the African eggplant and many other local, nutritious, and resilient vegetables by allocating appropriate research funds to the responsible institutions.

References

Acquadro A, Barchi L, Gramazio P et al (2017) Coding SNPs analysis highlights genetic relationships and evolution pattern in eggplant complexes. PLoS ONE 12(7):e0180774. https://doi.org/10.1371/journal.pone.0180774
Adeniji OT, Kusolwa PM, Reuben SO (2012) Genetic diversity among accessions of Solanum aethiopicum L. groups based on morpho-agronomic traits. Plant Genet Resour 110(3):177
Adeniji OT, Kusolwa PM, Reuben SO (2013) Morphological descriptors and micro satellite diversity among scarlet eggplant groups. Afr Crop Sci J 21(1):37–49
Adeniji OT, Kusolwa PM, Reuben SW (2018) Genetic diversity based on agronomic and fruit quality traits in gboma eggplant (Solanum macrocarpon L.). Bangladesh J Agric Res 43(1):25–38
Afful NT, Nyadanu D, Akromah R et al (2018) Evaluation of crossability studies between selected eggplant accessions with wild relatives S. torvum, S. anguivi and S. aethopicum (Shum group). J Plant Breed Crop Sci 10(1):1–2
Afful NT, Nyadanu D, Akromah R et al (2019) Nutritional and antioxidant composition of eggplant accessions in Ghana. African Crop Sci J 27(2):193–211
Aguoru CU, Omoigui LO, Olasan JO (2015) Molecular characterization of Solanum species (Solanum aethiopicum complex; Solanum macrocarpon and Solanum anguivi) using multiplex RAPD primers. J Plant Stud 4(1):27
Altschul SF, Gish W, Miller W et al (1990) Basic local alignment search tool. J Mol Biol 215:403–410
Anaso HU (1991) Comparative cytological study of Solanum aethiopicum Gilo group, Solanum aethiopicum Shum group and Solanum anguivi. Euphytica 53:81–85. https://doi.org/10.1007/BF00023786
Aubriot X, Singh P, Knapp S (2016) Tropical Asian species show that the old world clade of ‘Spiny solanums’ (Solanum subgenus Leptostemonum pro parte: Solanaceae) is not monophyletic. Botan J Linn Soc 181:199–223. https://doi.org/10.1111/boj.12412
Barchi L, Pietrella M, Venturini L et al (2019) A chromosome-anchored eggplant genome sequence reveals key events in Solanaceae evolution. Sci Rep 9:11769. https://doi.org/10.1038/s41598-019-47985-w
Barchi L, Toppino L, Valentino D et al (2018) QTL analysis reveals new eggplant loci involved in resistance to fungal wilts. Euphytica 214(2):1–15
Barrero JM, Millar AA, Griffiths J et al (2010) Gene expression profiling identifies two regulatory genes controlling dormancy and ABA sensitivity in Arabidopsis seeds. Plant J 61:611–622. https://doi.org/10.1111/j.1365-313X.2009.04088.x
Bentsink L, Hanson J, Hanhart CJ et al (2010) Natural variation for seed dormancy in Arabidopsis is regulated by additive genetic and molecular pathways. Proc Nat Acad Sci USA 107:4264–4269. https://doi.org/10.1073/pnas.1000410107
Bentsink L, Jowett J, Hanhart CJ et al (2006) Cloning of DOG1, a quantitative trait locus controlling seed dormancy in Arabidopsis. Proc Nat Acad Sci USA 103:17042–17047. https://doi.org/10.1073/pnas.0607877103
Bukenya-Ziraba R (2004) Solanum anguivi Lam. In: Grubben GJH, Denton OA (ed) Protabase. PROTA (Plant Resources of Tropical Africa), Wageningen, The Netherlands. http://database.prota.org
Cernansky R (2015) The rise of Africa’s super vegetables. Nature News 522(7555):146
Chang Y, Liu H, Liu M, Liao X et al (2018) The draft genomes of five agriculturally important African orphan crops. Gigascience 8:1–16. https://doi.org/10.1093/gigascience/giy152
Chiurugwi T, Kemp S, Powell W et al (2019) Speed breeding orphan crops. Theor Appl Genet 132:607–616. https://doi.org/10.1007/s00122-018-3202-7
Collonnier C, Mulya K, Fock I et al (2001) Source of resistance against Ralstonia solanacearum in fertile somatic hybrids of eggplant (Solanum melongena L.) with Solanum aethiopicum L. Plant Sci 160:301–313. https://doi.org/10.1016/S0168-9452(00)00394-0
Cui J, Yang Y, Luo S et al (2020) Whole-genome sequencing provides insights into the genetic diversity and domestication of bitter gourd (Momordica spp.). Hortic Res 7:85. https://doi.org/10.1038/s41438-020-0305-5
Danso-Abbeam G, Ehiakpor DS, Aidoo R (2018) Agricultural extension and its effects on farm productivity and income: insight from Northern Ghana. Agric Food Secur 7:74. https://doi.org/10.1186/s40066-018-0225-x
D’Arcy WG (1992) Solanaceae of Madagascar: form and geography. Ann Mo Bot Gard 1:29–45
Daunay MC, Lester RN, Laterrot HE (1991) The use of wild species for the genetic improvement of brinjal egg-plant (Solanum melongena) and tomato (Lycopersicon esculentum). The R Bot Gardens
Doganlar S, Frary A, Daunay MC et al (2002) Conservation of gene function in the Solanaceae as revealed by comparative mapping of domestication traits in eggplant. Genetics 161:1713–1726
Edwards KD, Fernandez-Pozo N, Drake-Stowe K et al (2017) A reference genome for Nicotiana tabacum enables map-based cloning of homeologous loci implicated in nitrogen utilization efficiency. BMC Genomics 18:448. https://doi.org/10.1186/s12864-017-3791-6
Ekesa BN, Walingo MK, Onyango MO (2009) Accessibility to and consumption of indigenous vegetables and fruits by rural households in Matungu division, western Kenya. African J Food Agric Nutr Dev 9:8
Elekofehinti OO, Kamdem JP, Boligon AA et al (2013) African eggplant (Solanum anguivi Lam) fruits with bioactive polyphenolic compounds exert in vitro antioxidant properties and inhibit Ca2+ induced mitochondrial swelling. Asian Pac J Trop Biomed 3:757–766. https://doi.org/10.1016/S2221-1691(13)60152-5
Finch-Savage WE, Leubner-Metzger G (2006) Seed dormancy and the control of germination. New Phytol 171:501–523. https://doi.org/10.1111/j.1469-8137.2006.01787.x
Food and Agriculture Organization of the United Nations (2021) African garden eggplant. In: African garden eggplant. Food and Agriculture Organization of the United Nations. http://www.fao.org/traditional-crops/africangardenegg/en/. Accessed 6 Mar 2021
Furini A, Wunder J (2004) Analysis of eggplant (Solanum melongena)-related germplasm: morphological and AFLP data contribute to phylogenetic interpretations and germplasm utilization. Theoret Appl Genet 108(2):197–208. https://doi.org/10.1007/s00122-003-1439-1
GENESYS (2021) The global gateway to genetic resources. Available online at: https://www.genesys-pgr.org. Accessed 2 April 2021
Gisbert C, Prohens J, Nuez F (2006) Efficient regeneration in two potential new crops for subtropical climates, the scarlet (Solanum aethiopicum) and gboma (S. macrocarpon) eggplants. New Zealand J Crop Hortic Sci 1:55–62. https://doi.org/10.1080/01140671.2006.9514388
Gramazio P, Blanca J, Ziarsolo P et al (2016) Transcriptome analysis and molecular marker discovery in Solanum incanum and S. aethiopicum, two close relatives of the common eggplant (Solanum melongena) with interest for breeding. BMC Genomics 17:300. https://doi.org/10.1186/s12864-016-2631-4
Goddard M (2009) Genomic selection: prediction of accuracy and maximisation of long term response. Genetica 136:245–257. https://doi.org/10.1007/s10709-008-9308-0
Graeber K, Nakabayashi K, Miatton E, Leubner-Metzger G et al (2012) Molecular mechanisms of seed dormancy. Plant Cell Env 35:1769–1786. https://doi.org/10.1111/j.1365-3040.2012.02542.x
Graham MA, Silverstein KAT, VandenBosch KA (2008) Defensin-like genes: genomic perspectives on a diverse superfamily in plants. Crop Sci 48:S-3–S-11
Griffiths J, Barrero JM, Taylor J et al (2011) ALTERED MERISTEM PROGRAM 1 is involved in development of seed dormancy in Arabidopsis. PLoS ONE 6:e20408. https://doi.org/10.1371/journal.pone.0020408
Hane JK, Ming Y, Kamphuis LG et al (2017) A comprehensive draft genome sequence for lupin (Lupinus angustifolius), an emerging health food: insights into plant–microbe interactions and legume evolution. Plant Biotec J 15(3):318–330. https://doi.org/10.1111/pbi.12615
Hendre PS, Muthemba S, Kariba R et al (2019) African Orphan Crops Consortium (AOCC): status of developing genomic resources for African orphan crops. Planta 250:989–1003. https://doi.org/10.1007/s00425-019-03156-9
Hirakawa H, Shirasawa K, Miyatake K et al (2014) Draft genome sequence of eggplant (Solanum melongena L.): the representative Solanum species indigenous to the old world. DNA Res 21:649–660. https://doi.org/10.1093/dnares/dsu027
Holdsworth MJ, Finch-Savage WE, Grappin P et al (2008) Post-genomics dissection of seed dormancy and germination. Trends Plant Sci 1:7–13. https://doi.org/10.1016/j.tplants.2007.11.002
Hosmani PS, Flores-Gonzalez M, van de Geest H et al (2019) An improved de novo assembly and annotation of the tomato reference genome using single-molecule sequencing, Hi-C proximity ligation and optical maps. bioRxiv [Preprint] https://doi.org/10.1101/767764
Hurtado M, Vilanova S, Plazas M et al (2014) Enhancing conservation and use of local vegetable landraces: the Almagro eggplant (Solanum melongena L.) case study. Genet Resour Crop Evol 61:787–795. https://doi.org/10.1007/s10722-013-0073-2
Jamnadass R, Mumm RH, Hale I et al (2020) Enhancing African orphan crops with genomics. Nat Genet 52:356–360. https://doi.org/10.1038/s41588-020-0601-x
Jo L, Pelletier JM, Harada JJ (2019) Central role of the LEAFY COTYLEDON1 transcription factor in seed development. J Integr Plant Biol 61:564–580. https://doi.org/10.1111/jipb.12806
Kajitani R, Yoshimura D, Okuno M et al (2019) Platanus-allee is a de novo haplotype assembler enabling a comprehensive access to divergent heterozygous regions. Nat Commun 10:1702. https://doi.org/10.1038/s41467-019-09575-2
Kamenya SN, Mikwa EO, Song B et al (2021) Genetics and breeding for climate change in orphan crops. Theoret Appl Genet 23:1–29. https://doi.org/10.1007/s00122-020-03755-1
Kathpalia R, Bhatla SC (2018) Seed dormancy and germination. In: Plant physiology, development and metabolism. Springer, Singapore, pp 885–906. https://doi.org/10.1007/978-981-13-2023-1_28
Kawakatsu T, Taramino G, Itoh J et al (2009) PLASTOCHRON3/GOLIATH encodes a glutamate carboxypeptidase required for proper development in rice. Plant J 58:1028–1040. https://doi.org/10.1111/j.1365-313X.2009.03841.x
Kedling G, Swai IS, Virchow D (2008) Traditional versus exotic vegetables in Tanzania. In: Smartt J, Haq N (eds) New crops and uses: their role in a rapidly changing world. Centre for Underutilised Crops, Southampton, pp 150–166
Knapp S, Aubriot X, Prohens J (2019) Eggplant (Solanum melongena L.): taxonomy and relationships. In: Chapman M (ed) The eggplant genome. Compendium of plant genomes. Springer, Cham. https://doi.org/10.1007/978-3-319-99208-2_2
Kim S, Park M, Yeom SI et al (2014) Genome sequence of the hot pepper provides insights into the evolution of pungency in Capsicum species. Nat Genet 46:270–278. https://doi.org/10.1038/ng.2877
Kouassi A, Béli-Sika E, Tian-Bi TY-N et al (2014) Identification of three distinct eggplant subgroups within the Solanum aethiopicum Gilo group from Côte d’Ivoire by morpho-agronomic characterization. Agriculture 4:260–273. https://doi.org/10.3390/agriculture4040260
Lester RN (1986) Taxonomy of scarlet eggplants, Solanum aethiopicum L. Acta Hortic 123–132. https://doi.org/10.17660/ActaHortic.1986.182.15
Lester RN, Daunay MC (2003) Diversity of African vegetable Solanum species and its implications for a better understanding of plant domestication. Schriften Zu Genetischen Ressourcen 22(2003):137–152
Lester RN, Hakiza JJH, Stavropoulos N, Teixiera MM (1986) Variation patterns in the African scarlet eggplant, Solanum aethiopicum L. In: Styles BT (ed) Infraspecific classification of wild and cultivated plants. Oxford, UK, pp 283–307
Lester RN, Jaeger PM, Bleijendaal S et al (1990) African eggplants: a review of collecting in West Africa. Plant Genet Resour Newslett 81:17–26
Lester RN, Niakan L (1986) Origin and domestication of the scarlet eggplant, Solanum aethiopicum, from S. anguivi in Africa. Columbia University Press
Lester RN, Seck A (2004) Solanum aethiopicum L. In: Protabase. PROTA (Plant Resources of Tropical Africa/Ressources végétales de l’Afrique tropicale). Available online at: http://www.prota.org/search.htm. Accessed 2 April 2021
Levin RA, Myers NR, Bohs L (2006) Phylogenetic relationships among the “Spiny solanums” (Solanum subgenus Leptostemonum, Solanaceae). Am J Bot 93:157–169. https://doi.org/10.3732/ajb.93.1.157
Liu J, Yang Y, Zhou X et al (2018) Genetic diversity and population structure of worldwide eggplant (Solanum melongena L.) germplasm using SSR markers. Genet Resour Crop Evol 65:1663–1670. https://doi.org/10.1007/s10722-018-0643-4
Liu Y, Geyer R, van Zanten M et al (2011) Identification of the Arabidopsis REDUCED DORMANCY 2 gene uncovers a role for the polymerase associated factor 1 complex in seed dormancy. PLoS ONE 6:e22241. https://doi.org/10.1371/journal.pone.0022241
Maundu P, Achigan-Dako E, Morimoto Y (2009) Biodiversity of African vegetables. In: Shackleton CM, Pasquini MW, Drescher AW (eds) African indigenous vegetables in urban agriculture. Earthscan, London, pp 65–104

21

The African Eggplant

Meyers BC, Kozik A, Griego A et al (2003) Genomewide analysis of NBS-LRR–encoding genes in Arabidopsis. Plant Cell 15:809–834. https://doi.org/10. 1105/tpc.009308 Miamoto JBM, Aazza S, Ruas NR et al (2020) Optimization of the extraction of polyphenols and antioxidant capacities from two types of Solanum gilo raddi using response surface methodology. J Appl Res Med and Aromat Plants 16:100238. https://doi.org/10. 1016/j.jarmap.2019.100238 Mir RR, Reynolds M, Pinto F et al (2019) Highthroughput phenotyping for crop improvement in the genomics era. Plant Sci 282:60–72. https://doi.org/10. 1016/j.plantsci.2019.01.007 Miyatake K, Shinmura Y, Matsunaga H et al (2019) Construction of a core collection of eggplant (Solanum melongena L.) based on genome-wide SNP and SSR genotypes. Breeding Sci 69(3):498–502. https://doi. org/10.1270/jsbbs.18202 Mshida DA (2014) Enhancement of seed germination in the African eggplant (Solanum aethiopicum L.). University of Eldoret, Kenya National Research Council (2006) Lost crops of Africa. Vegetables, vol II. The National Academies Press, Washington, DC. https://doi.org/10.17226/11763 Naroui Rad MR, Ghalandarzehi A, Koohpaygani JA (2017) Predicting eggplant individual fruit weight using an artificial neural network. Int J Veg Sci 23 (4):331–339 Nkansah GO, Ahwireng AK, Amoatey C et al (2013) Grafting onto African eggplant enhances growth, yield and fruit quality of tomatoes in tropical forest ecozones. J Appl Hortic 15:16–20. https://doi.org/10. 37855/jah.2013.v15i01.03 Ofori K, Gamedoagbao DK (2005) Yield of scarlet eggplant (Solanum aethiopicum L.) as influenced by planting date of companion cowpea. Sci Hortic (amsterdam) 105:305–312. https://doi.org/10.1016/j. scienta.2005.02.003 Osei MK, Oluoch MO, Osei CK et al (2010) Morphological characterisation of African eggplant (Solanum spp.) germplasm in some African countries. 
Agric Innov Sust Dev 163 Page AML, Daunay M-C, Aubriot X, Chapman MA (2019) Domestication of eggplants: a phenotypic and genomic insight. In: Chapman MA (ed) The eggplant genome. Springer, Cham, pp 193–212 Plazas M, Andújar I, Vilanova S et al (2014) Conventional and phenomics characterization provides insight into the diversity and relationships of hypervariable scarlet (Solanum aethiopicum L.) and gboma (S. macrocarpon L.) eggplant complexes. Front Plant Sci 5:318. https://doi.org/10.3389/fpls.2014.00318 POWO (2019) Plants of the world online. Kew Royal Botanic Gardens. Available online at. http://www. plantsoftheworldonline.org/taxon/urn:lsid:ipni.org: names:818158-1. Accessed 23 Feb 2021 Prohens J, Plazas M, Raigón MD et al (2012) Characterization of interspecific hybrids and first backcross generations from crosses between two cultivated

407 eggplants (Solanum melongena and S. aethiopicum Kumba group) and implications for eggplant breeding. Euphytica 186:517–538. https://doi.org/10.1007/ s10681-012-0652-x Richard MMS, Thareau V, Chen NWG et al (2017) What is present at common bean subtelomeres? Large resistance gene clusters, Knobs and Khipu Satellite DNA. In: Pérez de la Vega M, Santalla M, Marsolais F (eds) The common bean genome. Compendium plant genomes. Springer, Cham. https://doi. org/10.1007/978-3-319-63526-2_9 Rizza F, Mennella G, Collonnier C et al (2002) Androgenic dihaploids from somatic hybrids between Solanum melongena and S. aethiopicum group gilo as a source of resistance to Fusarium oxysporum f. sp. melongenae. Plant Cell Rep 20:1022–1032. https:// doi.org/10.1007/s00299-001-0429-5 Rhoads A, Au KF (2015) PacBio sequencing and its applications. Genomics Proteomics Bioinf 13:278– 289. https://doi.org/10.1016/j.gpb.2015.08.002 Sakata Y, Lester RN (1997) Chloroplast DNA diversity in brinjal eggplant (Solanum melongena L.) and related species. Euphytica 97:295–301. https://doi.org/10. 1023/A:1003000612441 Sakhanokho HF, Islam-Faridi MN (2014) Spontaneous autotetraploidy and its impact on morphological traits and pollen viability in Solanum aethiopicum. HortScience 49(8):997–1002. https://doi.org/10. 21273/HORTSCI.49.8.997 Sakhanokho HF, Islam-Faridi MN, Blythe EK et al (2014) Morphological and cytomolecular assessment of intraspecific variability in scarlet eggplant (Solanum aethiopicum L.). J Crop Imp 28(4):437–53. https://doi. org/10.1080/15427528.2014.913280 San José R, Plazas M, Sánchez-Mata MC et al (2016) Diversity in composition of scarlet (S. aethiopicum) and gboma (S. macrocarpon) eggplants and of interspecific hybrids between S. aethiopicum and common eggplant (S. melongena). J Food Compos Anal 45:130–140. https://doi.org/10.1016/j.jfca.2015. 
10.009 Sánchez-Mata M-C, Yokoyama WE, Hong Y-J, Prohens J (2010) Alpha-solasonine and alpha-solamargine contents of gboma (Solanum macrocarpon L.) and scarlet (Solanum aethiopicum L.) eggplants. J Agric Food Chem 58:5502–5508. https://doi.org/10.1021/jf100709g Särkinen T, Bohs L, Olmstead RG, Knapp S (2013) A phylogenetic framework for evolutionary study of the nightshades (Solanaceae): a dated 1000-tip tree. BMC Evol Biol 13:214. https://doi.org/10.1186/1471-214813-214 Sierro N, Battey J, Ouadi S et al (2014) (2014) The tobacco genome sequence and its comparison with those of tomato and potato. Nat Commun 5:3833. https://doi.org/10.1038/ncomms4833 Schippers RR (2000) African indigenous vegetables: an overview of the cultivated species. University of Greenwich, Natural Resources Institute, London, p 222 Soltani A, Walter KA, Wiersma AT et al (2021) The genetics and physiology of seed dormancy, a crucial

408 trait in common bean domestication. BMC Plant Biol 21:58. https://doi.org/10.1186/s12870-021-02837-6 Song B, Song Y, Fu Y et al (2019) Draft genome sequence of Solanum aethiopicum provides insights into disease resistance, drought tolerance, and the evolution of the genome. Gigascience 8:1–16. https:// doi.org/10.1093/gigascience/giz115 Soppe WJJ, Bentsink L (2020) Seed dormancy back on track; its definition and regulation by DOG1. New Phytol 228:816–819. https://doi.org/10.1111/nph.16592 Sękara A, Cebula S, Kunicki E (2007) Cultivated eggplants–origin, breeding objectives and genetic resources, a review. Folia Horticulturae 19:97–114 Stern S, Agra M de F, Bohs L (2011) Molecular delimitation of clades within new world species of the “Spiny solanums” (Solanum subg. Leptostemonum). Taxon 60:1429–1441. https://doi.org/10. 1002/tax.605018 Stöber S, Chepkoech W, Neubert S et al (2017) Adaptation pathways for African indigenous vegetables’ value chains. In: Filho WL, Belay S, Kalangu J et al (eds) Climate change adaptation in Africa. Springer, Cham, pp 413–433 Suaste-Dzul A, Veloso JS, Reis A (2021) First report of Verticillium dahliae race 2 on scarlet eggplant in Brazil. J Plant Pathol. https://doi.org/10.1007/s42161021-00754-z Sugimoto K, Takeuchi Y, Ebana K et al (2010) Molecular cloning of Sdr4, a regulator involved in seed dormancy and domestication of rice. Proc Natl Acad Sci USA 107:5792–5797 Sun H, Wu S, Zhang G et al (2017) Karyotype stability and unbiased fractionation in the paleo-allotetraploid Cucurbita genomes. Mol Plant 10:1293. https://doi. org/10.1016/j.molp.2017.09.003 Sunseri F, Polignano GB, Alba V et al (2010) Genetic diversity and characterization of African eggplant germplasm collection. African J Plant Sci 4:231–241. 
https://doi.org/10.5897/AJPS.9000128 Suzuki M, Latshaw S, Sato Y et al (2008) The maize Viviparous8 locus, encoding a putative ALTERED MERISTEM PROGRAM1-like peptidase, regulates abscisic acid accumulation and coordinates embryo and endosperm development. Plant Physiol 146:1193– 1206. https://doi.org/10.1104/pp.107.114108 Svalbard Global Seed Vault (2021) Seed Portal. http:// www.nordgen.org/sgsv/. Accessed 22 Mar 2021 Szarejko I, Szurman-Zubrycka M, Navrot M et al (2017) Biotechnologies for plant mutation breeding: protocols. In: Jankowicz-Cieslak J, Tai TH, Kumlehn J, Till BJ (eds) Creation of a tilling population in barley after chemical mutagenesis with sodium Azide and MNU. Springer, Chem, pp 91–111 Taher D, Solberg SØ, Prohens J et al (2017) World vegetable center eggplant collection: origin, composition, seed dissemination and utilization in breeding. Front Plant Sci 8:1484. https://doi.org/10.3389/fpls. 2017.01484 Tamiru M, Natsume S, Takagi H, White B et al (2017) Genome sequencing of the staple food crop white

S. M. Moenga and D. Achieng Odeny Guinea yam enables the development of a molecular marker for sex determination. BMC Biol 15:1–20. https://doi.org/10.1186/s12915-017-0419-x The Potato Genome Sequencing Consortium, BGIShenzhen, et al (2011) Genome sequence and analysis of the tuber crop potato. Nature 475:189–195. https:// doi.org/10.1038/nature10158 The Tomato Genome Consortium, Kazusa DNA Research Institute, Sato S et al (2012) The tomato genome sequence provides insights into fleshy fruit evolution. Nature 485:635–641. https://doi.org/10.1038/nature11119 Tian R, Wang F, Zheng Q et al (2020) Direct and indirect targets of the arabidopsis seed transcription factor ABSCISIC ACID INSENSITIVE3. Plant J 103:1679– 1694. https://doi.org/10.1111/tpj.14854 Toppino L, Valè G, Rotino GL (2008) Inheritance of Fusarium wilt resistance introgressed from Solanum aethiopicum Gilo and Aculeatum groups into cultivated eggplant (S. melongena) and development of associated PCR-based markers. Mol Breeding 22:237– 250. https://doi.org/10.1007/s11032-008-9170-x van Berkum NL, Lieberman-Aiden E, Williams L et al (2010) Hi-C: A method to study the three-dimensional architecture of genomes. JoVE J vis Exp 6(39):e1869. https://doi.org/10.3791/1869 Vorontsova MS, Stern S, Bohs L, Knapp S (2013) African spiny Solanum (subgenus Leptostemonum, Solanaceae): a thorny phylogenetic tangle. Bot J Linn Soc 173:176–193. https://doi.org/10.1111/boj.12053 Wang J-X, Gao T-G, Knapp S (2008) Ancient Chinese literature reveals pathways of eggplant domestication. Annals Bot 102:891–897. https://doi.org/10.1093/aob/ mcn179 Wei Q, Wang J, Wang W et al (2020) A high-quality chromosome-level genome assembly reveals genetics for important traits in eggplant. Hortic Res 7:153. https://doi.org/10.1038/s41438-020-00391-0 Yang R-Y, Ojiewo C (2013) African nightshades and african eggplants: taxonomy, crop management, utilization, and phytonutrients. 
In: Juliani HR, Simon JE, Ho C-T (eds) African natural plant products, vol II. discoveries and challenges in chemistry, health, and nutrition. American Chemical Society, Washington, DC, pp 137–165 Yasui Y, Hirakawa H, Ueno M et al (2016) Assembly of the draft genome of buckwheat and its applications in identifying agronomically useful genes. DNA Res 23 (3):215–224.https://doi.org/10.1093/dnares/dsw012 Zhang D, Li J, Compton RO et al (2015) Comparative genetics of seed size traits in divergent cereal lineages represented by sorghum (Panicoidae) and rice (Oryzoidae). Genes Genomes Genet 5:1117–1128. https:// doi.org/10.1534/g3.115.017590 Zhang H, Zhang J, Lang Z et al (2017) Genome editing— principles and applications for functional genomics research and crop improvement. Crit Rev Plant Sci 36:291–309. https://doi.org/10.1080/07352689.2017. 1402989

Sequencing of the Bottle Gourd Genomes Enhances Understanding of the Ancient Orphan Crop

22

Ying Wang, Arun K. Pandey, Guojing Li, and Pei Xu

Abstract

Bottle gourd (Lagenaria siceraria) is an important vegetable crop as well as a rootstock for other cucurbit crops. It belongs to the Cucurbitaceae family, which contains many other crops, including cucumber, melon, and watermelon. Despite its long history of cultivation, genomic research on bottle gourd started very late. Two bottle gourd reference genomes assembled from Illumina short reads have been released, one for the rootstock type, an Indian-origin accession “USVL1VR-Ls”, and the other for the food type, a Chinese landrace “Hangzhou Gourd”. To overcome the disadvantages of short reads, including low […]

[…] (11,000 years ago) with a global distribution during pre-Columbian times (Heiser 1979; Kistler et al. 2014). Although it originated in Africa, the bottle gourd was in use by humans in East Asia, the Americas, Europe, and the South Pacific (Erickson et al. 2005; Schlumbaum and Vandorpe 2012; Kistler et al. 2014). Erickson et al. (2005) suggested that domesticated bottle gourd in the Americas was introduced from Asia. Further analysis indicated that the American domesticated bottle gourds very likely arrived in the New World by transoceanic drift (Kistler et al. 2014). Bottle gourd populations consistently group by morphological features such as fruit shape rather than by geographical origin, whether in Turkish (Yildiz et al. 2015), Chinese (Xu et al. 2011), or Serbian (Mladenovic et al. 2012) bottle gourd.

22.1.3 Accessible in Seed Banks

Natural diversity in bottle gourd offers a route to increased genetic gain. Bottle gourd germplasm is preserved in several seed banks and is used for research by institutions across the world. Approximately 1575 bottle gourd accessions are currently deposited in ~13 major gene banks around the world (Table 22.1).
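The total of approximately 1575 accessions can be checked against the per-institution counts listed in Table 22.1. The short Python sketch below simply tallies those counts; the variable and function names are illustrative and not part of the original chapter, and institution names are abbreviated for readability.

```python
# Accession counts as listed in Table 22.1 (abbreviated institution label, count).
# Labels are shortened for illustration; the counts are taken from the table.
TABLE_22_1 = [
    ("NPGS, USA", 74),
    ("IPGRI, Kenya", 425),
    ("NBPGR Regional Station, India", 54),
    ("USDA-ARS, USA", 234),
    ("Leibniz-Institute genebank, Germany", 117),
    ("University of Abobo-Adjame, Cote d'Ivoire", 30),
    ("BARI, Bangladesh", 31),
    ("Zhejiang Academy of Agricultural Sciences, China", 80),
    ("University of Novi Sad, Serbia", 44),
    ("NBPGR, India", 42),
    ("Erciyes University, Turkey", 362),
    ("University of KwaZulu-Natal, South Africa", 67),
    ("CPCT, IARI, New Delhi", 15),
]


def total_accessions(rows):
    """Sum the accession counts over all listed gene banks."""
    return sum(count for _, count in rows)


if __name__ == "__main__":
    print(len(TABLE_22_1), "gene banks,", total_accessions(TABLE_22_1), "accessions")
```

The tally covers 13 institutions and reproduces the figure of 1575 accessions quoted in the text.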

22.1.4 Why is It Underutilized?

Bottle gourd is considered an underutilized crop, especially in Sub-Saharan Africa, its center of origin, where it is commonly grown from unimproved landraces by the rural poor to provide food and micronutrients and to sustain livelihoods (Padulosi et al. 2002; Schonfeldt and Pretorius 2011; Mashilo et al. 2017). Extreme weather and low agricultural productivity contribute to malnutrition and poverty in Sub-Saharan Africa, which leads to the neglect and underutilization of indigenous crops such as bottle gourd. In most cases, the neglect of research and development, as well as of conservation, leads to the underutilization of species (Massawe et al. 2007). The limited genomic research and genetic improvement available for bottle gourd hinder the cultivation and utilization of this crop (Jain and Gupta 2013). Scientific evaluation, conservation, strategic breeding, and clear research priorities are now needed to promote the utilization of underutilized crops such as bottle gourd. Collection, characterization, and conservation of genetic resources are major components of crop improvement plans, especially for underutilized crops (Massawe et al. 2007).

Table 22.1 Bottle gourd germplasm resources maintained by various institutions globally

| S. No. | Gene bank | No. of accessions mentioned | Country | References |
|---|---|---|---|---|
| 1 | National Plant Germplasm System (NPGS) | 74 | USA | Decker-Walters et al. (2001) |
| 2 | International Plant Genetic Resource Institute (IPGRI) | 425 | Kenya | Morimoto et al. (2005) |
| 3 | Regional Station of the National Bureau of Plant Genetic Resources (NBPGR) | 54 | India | Sivaraj and Pandravada (2005) |
| 4 | United States Department of Agriculture-Agricultural Research Service (USDA-ARS) | 234 | USA | Kousik et al. (2008) |
| 5 | Genbank of the Leibniz-Institute of Plant Genetics and Crop Research | 117 | Germany | Achigan-Dako et al. (2008) |
| 6 | University of Abobo-Adjamé | 30 | Côte d’Ivoire | Koffi et al. (2009) |
| 7 | Bangladesh Agricultural Research Institute (BARI) | 31 | Bangladesh | Husna et al. (2011) |
| 8 | Zhejiang Academy of Agricultural Sciences | 80 | China | Xu et al. (2011, 2014) |
| 9 | University of Novi Sad | 44 | Serbia | Mladenovic et al. (2012) |
| 10 | National Bureau of Plant Genetic Resources (NBPGR) | 42 | India | Bhawna et al. (2014, 2015, 2016) |
| 11 | Erciyes University | 362 | Turkey | Gurcan et al. (2015) |
| 12 | University of KwaZulu-Natal | 67 | South Africa | Mashilo et al. (2015, 2016) |
| 13 | Center for Protected Cultivation Technology (CPCT), IARI | 15 | India (New Delhi) | Kalyan et al. (2016) |

22.1.5 What Qualities Does It Bring?

In some areas, bottle gourd is used as a medicine, with reported health benefits that include diuretic, cardioprotective, and anticancer effects (Ghule et al. 2007). Some bottle gourds are grown exclusively for the essential amino acids and oil found in the seeds (Achigan-Dako et al. 2008). Young, tender bottle gourd is used as a common vegetable because of its umami flavor when cooked, which contributes to its market value. Wu et al. (2017) reported that free glutamate content is a key factor conferring umami taste in the bottle gourd. Fruit bitterness is another taste trait in some landraces of bottle gourd, where it acts as a defense against insects and herbivores (Balkema-Boomstra et al. 2003). Fruit bitterness not only reduces the economic value of the bottle gourd but can also cause severe food poisoning symptoms (Zhang 1981; Wu et al. 2017). Bitterness in the bottle gourd was considered to be controlled by a pair of complementary genes (Zhang 1981). The bitterness genes were recently mapped, uncovering their relationships with known bitterness genes in related cucurbits (Wu et al. 2017).
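The statement that fruit bitterness is controlled by a pair of complementary genes can be illustrated with a small enumeration. The Python sketch below is a textbook model rather than the analysis of Zhang (1981) or Wu et al. (2017): it assumes two unlinked loci (labelled A and B here purely for illustration), assumes that a fruit is bitter only when at least one dominant allele is present at both loci, and counts the genotypes expected in an F2 derived from a doubly heterozygous F1.

```python
from fractions import Fraction
from itertools import product


def f2_bitter_fraction():
    """Expected fraction of bitter F2 plants under a two-locus complementary model.

    Assumptions (illustrative, not taken from the chapter): loci A and B are
    unlinked, the F1 is AaBb, and bitterness requires a dominant allele at
    both loci.
    """
    # Each F1 parent contributes one allele per locus; the four gamete types
    # (AB, Ab, aB, ab) are equally likely under independent assortment.
    gametes = list(product("Aa", "Bb"))
    bitter = 0
    total = 0
    for (a1, b1), (a2, b2) in product(gametes, repeat=2):
        total += 1
        has_dominant_a = "A" in (a1, a2)
        has_dominant_b = "B" in (b1, b2)
        if has_dominant_a and has_dominant_b:
            bitter += 1
    return Fraction(bitter, total)


if __name__ == "__main__":
    frac = f2_bitter_fraction()
    print("bitter : non-bitter =", frac.numerator, ":", frac.denominator - frac.numerator)
```

Under these assumptions the enumeration gives 9 bitter to 7 non-bitter genotype combinations out of 16, the classical segregation ratio for complementary gene action.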

22.2 Genome Sequencing

22.2.1 Materials Used for Sequencing

Two bottle gourd reference genomes were recently released (Wu et al. 2017; Wang et al. 2018): one for “USVL1VR-Ls” (hereafter, “USV”), an Indian-origin accession of the rootstock type, and the other for the Chinese landrace “Hangzhou Gourd” (hereafter, “HZ”), which is of the food type. More recently, a new genome sequence of the inbred HZ bottle gourd was published (Xu et al. 2021). In parallel, for fruit shape mapping, an F2 population of 150 individuals derived from a cross between HZ and YD-4, an inbred line bearing round fruits, was used.

22.2.2 Strategies and Tools for Sequencing

The first bottle gourd genome was generated from “USV” (Wu et al. 2017). Illumina (HiSeq 2500) paired-end and mate-pair libraries provided cleaned sequences representing 395× genome coverage, based on an estimated genome size of 334 Mb. To assess assembly quality, Wu et al. (2017) first mapped genomic DNA and RNA-Seq data to the genome. Between 89.0 and 92.2% of the RNA-Seq reads from five different tissues (flower, fruit, leaf, stem, and root) could be mapped to the genome in a proper paired-end relationship, and 94.7% of the reads from the 500-bp insert library could be mapped back in a proper paired-end relationship. The Benchmarking Universal Single-Copy Orthologs (BUSCO) software was used to assess assembly completeness in terms of gene content (Simão et al. 2015). The assembled bottle gourd genome contained around 96.9% of the highly conserved plant orthologues, with 95.4% found complete (Wu et al. 2017).

A combination of second- and third-generation sequencing technologies was applied to draft the HZ-v1 genome. We obtained 314.6× Illumina shotgun reads and 12.8× PacBio single-molecule long reads from genomic DNA of the “HZ” inbred line. We assembled 1735 scaffolds (N50 = 1,534,212 bp) with a total length of 327.4 Mb (311.3 Mb after subtracting N calls), accounting for 97.7% of the genome size. Approximately 99.8% (326.8 Mb/327.4 Mb) of the assembled genome consisted of 695 scaffolds larger than 1 kb (mean length = 923.6 kb). The pseudo-chromosomes, assembled from 320 scaffolds (234 oriented, 86 unoriented) with a length of 295.5 Mb, constituted 90.3% of the assembled genome. Based on the discovery rate of the conserved core eukaryotic genes (CEGs) and of the universal single-copy orthologs in BUSCO’s plantae benchmark set (Simão et al. 2015), the completeness of the assembled genome was estimated to be 96.37% and 94.03%, respectively. The HZ-v1 genome was not published but has been integrated into GourdBase (http://www.gourdbase.cn/), a genome-centered multi-omics database for the bottle gourd (Wang et al. 2018).

The above published reference genomes were assembled mostly from Illumina short reads, resulting in low […]
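The assembly metrics quoted for HZ-v1 (scaffold N50 and the percentages of the assembly in large scaffolds and in pseudo-chromosomes) follow from simple arithmetic. The Python sketch below shows one common way to compute N50 and re-derives the quoted percentages from the figures in the text. The function names and the toy scaffold lengths are hypothetical, and this is not the pipeline used by the authors.

```python
def n50(lengths):
    """Return the N50 of a set of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembled bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


if __name__ == "__main__":
    # Toy scaffold lengths (hypothetical, not from the HZ-v1 assembly).
    toy_scaffolds = [10, 8, 6, 4, 2]
    print("toy N50:", n50(toy_scaffolds))

    # Figures quoted in the text, in Mb.
    assembled_mb = 327.4
    print("scaffolds > 1 kb:", percent(326.8, assembled_mb), "%")
    print("pseudo-chromosomes:", percent(295.5, assembled_mb), "%")
    # Genome size implied by "327.4 Mb accounts for 97.7% of the genome size".
    print("implied genome size (Mb):", round(assembled_mb / 0.977, 1))
```

With the figures from the text, the two ratios reproduce the quoted 99.8% and 90.3%, and the implied genome size is about 335 Mb.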

PREDICTED: transcription repressor OFP6-like [Cucumis melo]

PREDICTED: transcription repressor OFP1 [Cucumis melo]

PREDICTED: transcription repressor OFP15-like [Cucumis sativus] > gi|700,193,086|gb| KGN48290.1| hypothetical protein Csa_6G454380 [Cucumis sativus]

transcription repressor OFP6like [Cucurbita moschata]

probable transcription repressor OFP9 [Momordica charantia]

DNA polymerase III gammatau subunit [Cucumis melo subsp. melo]

PREDICTED: transcription repressor OFP13-like [Cucumis melo]

PREDICTED: transcription repressor OFP12-like [Cucumis melo]

LsOFP3

28,062,254

MELO3C006531

1E-79

HG_GLEAN_10015982

Chr02

23,417,667

Annotation_NR

LsOFP2

23,417,053

E-value

HG_GLEAN_10015606

Chr01

Melon homolog

LsOFP1

E-value

HG_GLEAN_10012688

Tomato homolog

Suggested gene name

Bottle gourd gene model

Position end

Table 22.3 Composition and chromosomal location of the bottle gourd SUN, OVATE, YABBY, and WOX family genes Position start

Sequencing of the Bottle Gourd Genomes Enhances Understanding …

Chr

22 417

Suggested gene name

LsOFP10

LsOFP11

LsOFP12

LsOFP13

LsOFP14

LsOFP15

LsOFP16

LsOFP17

LsOFP18

LsOFP19

Bottle gourd gene model

HG_GLEAN_10020458

HG_GLEAN_10022831

HG_GLEAN_10023338

HG_GLEAN_10023346

HG_GLEAN_10010244

HG_GLEAN_10006715

HG_GLEAN_10004119

HG_GLEAN_10004872

HG_GLEAN_10001585

HG_GLEAN_10007946

Table 22.3 (continued)

Chr10

Chr09

chr08

chr08

Chr07

Chr06

chr05

chr05

chr05

Chr04

Chr

17,697,582

18,363,582

21,090,898

13,885,568

21,324,837

20,205,559

33,258,189

33,196,658

28,760,144

32,000,706

Position start

17,698,373

18,364,110

21,091,833

13,886,230

21,325,556

20,206,137

33,258,818

33,197,443

28,760,947

32,001,644

Position end

SlOFP31

SlOFP13

SlOFP12

SlOFP19

SlOFP8

SlOFP7

SlOFP18

Tomato homolog

1E-23

4E-29

1E-13

1E-23

3E-21

2E-49

3E-22

E-value

MELO3C025343

MELO3C026874

MELO3C010932

MELO3C019910

MELO3C004557

MELO3C024242

MELO3C024232

MELO3C015818

MELO3C009113

Melon homolog

1E-73

4E-49

2E-98

9E-90

3E-45

4E-102

5E-101

2E-82

4E-83

E-value

(continued)

PREDICTED: transcription repressor OFP8-like [Cucumis melo]

PREDICTED: transcription repressor OFP17 [Cucumis melo]

transcription repressor OFP7like [Cucurbita moschata]

PREDICTED: transcription repressor OFP13-like [Cucumis sativus] > gi|700,190,848|gb| KGN46052.1| hypothetical protein Csa_6G046300 [Cucumis sativus]

PREDICTED: transcription repressor OFP8-like [Cucumis melo]

PREDICTED: transcription repressor OFP12-like [Cucumis melo]

PREDICTED: uncharacterized protein LOC107991904 [Cucumis melo]

PREDICTED: transcription repressor OFP14 [Cucumis melo]

PREDICTED: transcription repressor OFP12-like [Cucumis melo]

transcription repressor OFP1like [Cucurbita maxima]

Annotation_NR

418 Y. Wang et al.

LsSun4

LsSun5

HG_GLEAN_10014262

Chr02

Chr02

8,963,322

6,473,846

23,626,824

8,965,117

6,477,097

23,628,297

SlSUN22

SlSUN26

SlSUN8

3E-80

5E-99

7E-47

MELO3C016880

MELO3C003773

MELO3C006504

MELO3C006884

0

0

0

0

(continued)

PREDICTED: protein IQDOMAIN 1 [Cucumis sativus] > gi|700,200,186|gb| KGN55344.1| hypothetical protein Csa_4G646180 [Cucumis sativus]

PREDICTED: protein IQDOMAIN 32 isoform X1 [Cucumis melo]

PREDICTED: protein IQDOMAIN 14 [Cucumis sativus] > gi|700,201,814|gb| KGN56947.1| hypothetical protein Csa_3G146330 [Cucumis sativus]

PREDICTED: protein IQDOMAIN 14 [Cucumis sativus] > gi|700,202,219|gb| KGN57352.1| hypothetical protein Csa_3G180400 [Cucumis sativus]

PREDICTED: protein IQDOMAIN 1 isoform X2 [Cucumis sativus] > gi| 700,202,531|gb|KGN57664.1| hypothetical protein Csa_3G239280 [Cucumis sativus]

transcription repressor OFP7 [Momordica charantia]

HG_GLEAN_10013965

Chr01

20,215,398

9E-74

7E-41

LsSun3

20,212,061

MELO3C008499

MELO3C017554

HG_GLEAN_10012715

Chr01

1,747,411

1E-34

LsSun2

1,743,949

OVATE

Annotation_NR

HG_GLEAN_10012339

Chr01

2,190,179

E-value

LsSun1

2,189,256

Melon homolog

HG_GLEAN_10011027

chr11

E-value

LsOFP20

Tomato homolog

HG_GLEAN_10001965

Position end

Suggested gene name

Bottle gourd gene model

Position start

Sequencing of the Bottle Gourd Genomes Enhances Understanding …

Table 22.3 (continued)

Chr

22 419

Suggested gene name

LsSun6

LsSun7

LsSun8

LsSun9

LsSun10

LsSun11

LsSun12

Bottle gourd gene model

HG_GLEAN_10016028

HG_GLEAN_10017105

HG_GLEAN_10019768

HG_GLEAN_10020794

HG_GLEAN_10022272

HG_GLEAN_10023516

HG_GLEAN_10008892

Table 22.3 (continued)

Chr06

chr05

chr05

chr05

Chr04

Chr03

Chr03

Chr

523,829

34,893,364

22,517,054

2,483,013

25,306,930

11,054,880

2,249,227

Position start

526,814

34,895,189

22,520,348

2,498,437

25,309,468

11,059,996

2,253,427

Position end

SlSUN10

SlSUN13

SlSUN7

SlSUN12

SlSUN25

SlSUN5

Tomato homolog

8E-43

3E-92

3E-90

1E-48

3E-105

2E-74

E-value

MELO3C005888

MELO3C024434

MELO3C012442

MELO3C002201

MELO3C009812

MELO3C014258

MELO3C007235

Melon homolog

2E-128

1E-163

2E-96

0

1E-166

0

7E-145

E-value

(continued)

PREDICTED: protein IQDOMAIN 1-like [Cucumis sativus] > gi|700,194,684|gb| KGN49861.1| hypothetical protein Csa_5G139390 [Cucumis sativus]

hypothetical protein Csa_7G452320 [Cucumis sativus]

PREDICTED: protein IQDOMAIN 1-like [Cucumis melo]

hypothetical protein Csa_1G045470 [Cucumis sativus]

PREDICTED: protein IQDOMAIN 14 [Cucumis sativus] > gi|700,204,370|gb| KGN59503.1| hypothetical protein Csa_3G823020 [Cucumis sativus]

PREDICTED: protein IQDOMAIN 1 [Cucumis melo]

PREDICTED: protein IQDOMAIN 1 [Cucumis melo] > gi|659,078,541|ref| XP_008439777.1| PREDICTED: protein IQDOMAIN 1 [Cucumis melo] > gi|659,078,545|ref| XP_008439779.1| PREDICTED: protein IQDOMAIN 1 [Cucumis melo]

Annotation_NR

420 Y. Wang et al.

LsSun17

LsSun18

LsSun19

HG_GLEAN_10004794

HG_GLEAN_10007188

Chr10

chr08

Chr06

2,346,498

20,468,884

25,781,355

23,699,179

2,347,946

20,479,568

25,786,154

23,702,046

SlSUN18

SlSUN14

SlSUN21

SlSUN24

5E-93

1E-129

1E-112

5E-72

MELO3C015418

MELO3C010997

MELO3C022423

MELO3C022253

MELO3C014290

MELO3C005137

3E-175

0

6E-162

0

0

0

(continued)

PREDICTED: protein IQDOMAIN 14-like [Cucumis melo]

PREDICTED: protein IQDOMAIN 32-like [Cucumis melo] > gi|659,088,067|ref| XP_008444784.1| PREDICTED: protein IQDOMAIN 32-like [Cucumis melo]

protein IQ-DOMAIN 14-like [Momordica charantia]

PREDICTED: protein IQDOMAIN 1 isoform X2 [Cucumis melo]

protein IQ-DOMAIN 14 [Momordica charantia]

PREDICTED: protein IQDOMAIN 14 [Cucumis sativus] > gi|778,702,757|ref| XP_011655257.1| PREDICTED: protein IQDOMAIN 14 [Cucumis sativus] > gi|700,195,922|gb| KGN51099.1| hypothetical protein Csa_5G440130 [Cucumis sativus]

PREDICTED: protein IQDOMAIN 14 isoform X1 [Cucumis melo]

HG_GLEAN_10010776

Chr06

20,057,922

5E-76

8E-120

LsSun16

20,056,269

SlSUN3

MELO3C005636

HG_GLEAN_10010591

Chr06

8,854,809

4E-49

LsSun15

8,851,837

SlSUN23

Annotation_NR

HG_GLEAN_10010224

Chr06

3,194,478

E-value

LsSun14

3,191,623

Melon homolog

HG_GLEAN_10009682

Chr06

E-value

LsSun13

Tomato homolog

HG_GLEAN_10009178

Position end

Suggested gene name

Bottle gourd gene model

Position start

Sequencing of the Bottle Gourd Genomes Enhances Understanding …

Table 22.3 (continued)

Chr

22 421

Suggested gene name

LsSun20

LsSun21

LsSun22

LsWOX1

LsWOX2

Bottle gourd gene model

HG_GLEAN_10007640

HG_GLEAN_10002109

HG_GLEAN_10002296

HG_GLEAN_10013889

HG_GLEAN_10016037

Table 22.3 (continued)

Chr03

Chr02

chr11

chr11

Chr10

Chr

2,309,766

5,728,713

5,336,964

3,483,549

9,058,805

Position start

2,311,336

5,729,890

5,338,184

3,486,032

9,067,023

Position end

Solyc02g077390

SlSUN31

Tomato homolog

2E-42

3E-91

E-value

MELO3C007244

MELO3C017032

MELO3C013004

MELO3C017768

MELO3C009991

Melon homolog

6E-155

5E-124

2E-147

0

3E-167

E-value

(continued)

PREDICTED: WUSCHELrelated homeobox 9 isoform X1 [Cucumis sativus] > gi| 700,194,068|gb|KGN49272.1| hypothetical protein Csa_6G518270 [Cucumis sativus]

PREDICTED: WUSCHELrelated homeobox 8-like [Cucumis sativus] > gi| 700,200,353|gb|KGN55511.1| hypothetical protein Csa_4G663700 [Cucumis sativus]

PREDICTED: protein IQDOMAIN 14-like [Cucumis sativus] > gi|778,681,275|ref| XP_011651481.1| PREDICTED: protein IQDOMAIN 14-like [Cucumis sativus] > gi|778,681,280|ref| XP_011651482.1| PREDICTED: protein IQDOMAIN 14-like [Cucumis sativus] > gi|700,202,892|gb| …

PREDICTED: protein IQDOMAIN 14 isoform X2 [Cucumis melo]

PREDICTED: protein IQDOMAIN 1 [Cucumis melo] > gi|659,085,828|ref| XP_008443630.1| PREDICTED: protein IQDOMAIN 1 [Cucumis melo]

Annotation_NR

422 Y. Wang et al.

LsWOX7

LsWOX8

LsWOX9

HG_GLEAN_10004960

HG_GLEAN_10001423

Chr09

chr08

chr05

16,997,132

21,766,418

18,587,874

1,847,521

16,998,750

21,767,315

18,589,128

1,850,384

Solyc11g072790.1.1:1..210

Solyc04g078650

Solyc06g076000.1.1:1..241

Solyc03g118770.2.1:1..394

3E-29

5E-65

2E-45

1E-56

MELO3C025714

MELO3C010841

MELO3C002129

MELO3C009677

MELO3C008284

1E-67

1E-106

1E-122

7E-60

7E-83

(continued)

putative WUSCHEL-related homeobox 2 [Cucurbita moschata]

PREDICTED: WUSCHELrelated homeobox 4 [Cucumis melo]

PREDICTED: WUSCHELrelated homeobox 2 [Cucumis sativus] > gi|700,210,612|gb| KGN65708.1| hypothetical protein Csa_1G505930 [Cucumis sativus]

hypothetical protein Csa_1G042780 [Cucumis sativus]

PREDICTED: WUSCHELrelated homeobox 11 isoform X1 [Cucumis sativus] > gi| 700,204,220|gb|KGN59353.1| hypothetical protein Csa_3G812740 [Cucumis sativus]

PREDICTED: WUSCHELrelated homeobox 5 [Cucumis sativus]

PREDICTED: protein WUSCHEL [Cucumis sativus] > gi|700,193,704|gb| KGN48908.1| hypothetical protein Csa_6G505860 [Cucumis sativus]

HG_GLEAN_10021934

chr05

26,918,975

2E-38

8E-29

LsWOX6

26,916,485

Solyc03g096300

MELO3C007898

HG_GLEAN_10020722

Chr04

4,083,393

1E-53

LsWOX5

4,082,545

Solyc02g083950.2.1

Annotation_NR

HG_GLEAN_10019918

Chr04

4,603,994

E-value

LsWOX4

4,602,026

Melon homolog

HG_GLEAN_10018428

Chr03

E-value

LsWOX3

Tomato homolog

HG_GLEAN_10016376

Position end

Suggested gene name

Bottle gourd gene model

Position start


Table 22.3 (continued)

Chr


Suggested gene name

LsWOX10

LsYABBY1

LsYABBY2

LsYABBY3

LsYABBY4

LsYABBY5

Bottle gourd gene model

HG_GLEAN_10002882

HG_GLEAN_10015224

HG_GLEAN_10009272

HG_GLEAN_10006646

HG_GLEAN_10001232

HG_GLEAN_10003168

Table 22.3 (continued)

chr11

Chr09

Chr07

Chr06

Chr02

chr11

Chr

18,301,478

15,198,643

20,681,074

4,223,841

24,971,197

15,023,063

Position start

18,305,237

15,203,514

20,684,212

4,226,287

24,972,616

15,025,074

Position end

SlYABBY5a

SlYABBY1b

SlINO

Solyc02g082670.2.1

Tomato homolog

2E-69

5E-56

5E-45

1E-84

E-value

MELO3C013970

MELO3C014527

MELO3C010330

MELO3C013727

Melon homolog

4E-80

2E-50

2E-42

4E-143

E-value

PREDICTED: protein YABBY 4-like [Cucumis melo] > gi| 1,035,405,079|ref| XP_016900570.1| PREDICTED: protein YABBY 4-like [Cucumis melo]

PREDICTED: putative axial regulator YABBY 2 [Cucumis melo] > gi|1,035,406,592|ref| XP_016900784.1| PREDICTED: putative axial regulator YABBY 2 [Cucumis melo]

PREDICTED: axial regulator YABBY 5-like [Cucumis sativus]

PREDICTED: axial regulator YABBY 5-like [Cucumis sativus] > gi|700,195,041|gb| KGN50218.1| Axial regulator YABBY1 [Cucumis sativus]

PREDICTED: axial regulator YABBY 4-like [Cucumis sativus]

WUSCHEL-related homeobox 13-like isoform X1 [Cucurbita moschata]

Annotation_NR


Fig. 22.3 Circos display of the distribution of SNPs and indels on each bottle gourd chromosome (Xu et al. 2021). The outermost circle represents the 11 bottle gourd chromosomes (Chr01–Chr11), and the two inner circles represent SNP (in yellow) and indel (in blue) distributions (100 Kb window size), respectively. The densities of SNPs and indels are indicated by the bar heights

22.3 Resequencing of Germplasm

Heterografting has long been used to enhance the chilling tolerance of watermelon, and YZ is a bottle gourd rootstock variety showing excellent tolerance to low temperature. YZ was resequenced to rectify both SNPs and indels between the experimental and reference genomes so that a universal pipeline could be built for a watermelon–bottle gourd heterografting system. Genome shotgun sequencing of YZ generated about 74,566,000 reads, with 35.9× genome coverage. A total of 49,830,767 paired-end reads from YZ were mapped to the HZ gourd reference genome with a coverage of 77.53% (Wang et al. 2020). Recently, a single nucleotide polymorphism (SNP)- and indel-based variation map of the genome was constructed based on resequencing of 50 inbred lines that included both food and rootstock types (Xu et al. 2021) (Fig. 22.3). This map contains 2,004,276 SNPs and 399,664 indels in total across the genome. At the population level, transitions (A/G and C/T; Ts = 1,096,413) were somewhat more frequent than transversions (A/C, A/T, G/C, and G/T; Tv = 907,863), giving a Ts/Tv ratio of about 1.21. The average SNP and indel densities per 100 Kb were 20.04 and 3.99, respectively. Extremely high-density SNP regions on Chr2, Chr3, and Chr11 were identified, as well as extremely high-density indel regions on Chr8, which provides insight into the genome diversity of the bottle gourd (Xu et al. 2021).
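The transition and transversion counts above can be checked with a few lines of Python. The sketch below is illustrative only: the function names and the toy SNP list are assumptions made for this example and are not part of the pipeline used by Xu et al. (2021). Only the two reported counts are taken from the text.

```python
# Minimal sketch: classify biallelic SNPs and compute a Ts/Tv ratio.
# Function names and the toy data are hypothetical; the reported counts
# (Ts = 1,096,413; Tv = 907,863) are the only values taken from the chapter.

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref: str, alt: str) -> bool:
    """Return True for purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T) changes."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_counts(snps):
    """Count transitions and transversions in (ref, alt) pairs with ref != alt."""
    ts = sum(1 for ref, alt in snps if is_transition(ref, alt))
    tv = len(snps) - ts
    return ts, tv


# Toy input (hypothetical): two transitions and one transversion.
toy_snps = [("A", "G"), ("C", "T"), ("A", "C")]
toy_ts, toy_tv = ts_tv_counts(toy_snps)

# Counts reported in the text.
reported_ts, reported_tv = 1_096_413, 907_863
total_snps = reported_ts + reported_tv
ts_tv_ratio = reported_ts / reported_tv

print(f"total SNPs = {total_snps:,}; Ts/Tv = {ts_tv_ratio:.2f}")
```

The two reported classes sum to the 2,004,276 SNPs given for the variation map, which is a quick consistency check on the published figures.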

22.4 Future Goals and Prospects

Recent advances in genome sequencing have allowed the precise selection of desirable traits and have opened opportunities for the genetic engineering of plants. The shift from Illumina short-read sequencing to long-read technologies such as PacBio single-molecule real-time (SMRT) sequencing has significantly improved gene discovery and has paved the way for the construction of better reference genomes and the robust identification of SNPs and indels. In the last decade, the combination of sequencing with other omics technologies has not only uncovered a plethora of known and novel genes but has also allowed us to determine their specific contributions to key plant attributes such as quality, yield, and shelf life. To this end, high-throughput sequencing and resequencing platforms based on NGS technology have provided a cost-effective means to elucidate the genetic architecture of various important traits. Newly created avenues such as genome-wide association studies (GWAS), genomic selection (GS), and epigenome-wide association studies (EWAS), which allow efficient integration of genotyping data, could provide a great impetus to genomics-assisted breeding.

22.4.1 What Does a Reference Genome Add to the Research of This Crop?

To date, few genes or QTLs have been cloned in bottle gourd, and functional studies are even rarer. The draft genome of the bottle gourd will aid the discovery of important candidate genes for agronomic trait improvement, as well as comparative genomic analysis with other cucurbit crop genomes and genome-wide transcript analysis. In addition, the bottle gourd genome will enable the exploitation and utilization of germplasm resources through molecular-assisted breeding. In the long run, the published draft genomes of bottle gourd will also support the establishment of a genetic transformation system for future research in bottle gourd.

References

Achigan-Dako EG, Fuchs J, Ahanchede A, Blattnerv FR (2008) Flow cytometric analysis in Lagenaria siceraria (Cucurbitaceae) indicates correlation of genome size with usage types and growing elevation. Plant Syst Evol 276:9–19
Balkema-Boomstra AG, Zijlstra S, Verstappen FW (2003) Role of cucurbitacin C in resistance to spider mite (Tetranychus urticae) in cucumber (Cucumis sativus L.). J Chem Ecol 29:225–235
Beevy SS, Kuriachan P (1996) Chromosome numbers of south Indian Cucurbitaceae and a note on the cytological evolution in the family. J Cytol Genet 31:65–71
Bhawna AMZ, Arya L, Saha D, Sureja AK, Pandey C, Verma M (2014) Population structure and genetic diversity in bottle gourd [Lagenaria siceraria (Mol.) Standl.] germplasm from India assessed by ISSR markers. Plant Syst Evol 300:767–773
Bhawna AMZ, Arya L, Verma M (2015) Transferability of cucumber microsatellite markers used for phylogenetic analysis and population structure study in bottle gourd (Lagenaria siceraria (Mol.) Standl.). Appl Biochem Biotechnol 175:2206–2223
Bhawna AMZ, Arya L, Verma M (2016) Use of ScoT markers to assess the gene flow and population structure among two different population of bottle gourd. Plant Gene 9:80–86
D’Amore R, Johnson J, Haldenby S et al (2017) SMRT gate: a method for validation of synthetic constructs on Pacific biosciences sequencing platforms. BioTechniques 63(1):13–20
Decker-Walters D, Staub J, López-Sesé A et al (2001) Diversity in landraces and cultivars of bottle gourd (Lagenaria siceraria; Cucurbitaceae) as assessed by random amplified polymorphic DNA. Genet Resour Crop Ev 48(4):369–380
Du HL, Liang CZ (2019) Assembly of chromosome-scale contigs by efficiently resolving repetitive sequences with long reads. Nat Commun 10(1):5360
Erickson DL, Smith BD, Clarke AC, Sandweiss DH, Tuross N (2005) An Asian origin for a 10,000-year-old domesticated plant in the Americas. Proc Natl Acad Sci USA 102:18315–18320
Garcia-Mas J, Benjak A, Sanseverino W, Bourgeois M, Mir G, Gonzalez VM et al (2012) The genome of melon (Cucumis melo L.). Proc Natl Acad Sci U S A 109:11872–11877
Ghule BV, Ghanti MH, Yeole PG, Saoji AN (2007) Diuretic activity of Lagenaria siceraria fruit extracts in rats. Ind J Pharm Sci 69(6):817–819
Guo S, Zhang J, Sun H, Salse J, Lucas WG, Zhang H et al (2013) The draft genome of watermelon (Citrullus lanatus) and resequencing of 20 diverse accessions. Nat Genet 45(1):51–58
Gurcan K, Say A, Yesitir H (2015) A study of genetic diversity in bottle gourd [Lagenaria siceraria (Molina) Standl.] population: and implication for the historical origins on bottle gourds in Turkey. Genet Resour Crop Evol 62:321–333
Haas BJ, Salzberg SL, Wei Z et al (2008) Automated eukaryotic gene structure annotation using evidence modeler and the program to assemble spliced alignments. Genome Biol 9(1):R7
Heiser CB (1979) The gourd book: a thorough and fascinating account of gourds from throughout the world. University of Oklahoma Press, Oklahoma, Norman
Huang S, Li R, Zhang Z, Li L, Gu X, Fan W et al (2009) The genome of the cucumber, Cucumis sativus L. Nat Genet 41:1275–1281
Husna A, Mahmud F, Islam MR, Mahmud MAA, Ratna M (2011) Genetic variability, correlation and path co-efficient analysis in bottle gourd (Lagenaria siceraria L.). Adv Biol Res 5:323–327
Jain S, Gupta S (2013) Biotechnology of neglected and underutilized crops. Springer, Netherlands, Dordrecht
Kalyan RAO, Tomar BS, Singh B, Aher B (2016) Morphological characterization of parental lines and cultivated genotypes of bottle gourd (Lagenaria siceraria). Ind J Agric Sci 86:65–70
Karlova R, Rosin FM, Busscher-Lange J, Parapunova V, Do PT, Fernie AR, Fraser PD, Baxter C, Angenent GC, de Maagd RA (2011) Transcriptome and metabolite profiling show that APETALA2a is a major regulator of tomato fruit ripening. Plant Cell 23(3):923–941
Kistler L, Montenegro A, Smith BD, Grifford JA, Green RE, Newsom LA, Shapiro B (2014) Transoceanic drift and the domestication of African bottle gourds in the Americas. Proc Nat Acad Sci U S A 111:2397–2941
Koffi KK, Anzera GK, Malice M, Dje Y, Bertin P, Baudoin JP, Zoro-Bi IA (2009) Morphological and allozyme variation in a collection of Lagenaria siceraria (Molina) Standl from Cote D’Ivoire. Agron Soc Environ 13:257–270
Kousik CS, Levi A, Ling KS, Wechter P (2008) Potential sources of resistance to cucurbit powdery mildew in U.S. plant introductions of bottle gourd. HortScience 43:1359–1364
Lin X, Zhang Y, Kuang H, Chen J (2013) Frequent loss of lineages and deficient duplications accounted for low copy number of disease resistance genes in Cucurbitaceae. BMC Genom 14:335
Liu J, Van Eck J, Cong B et al (2002) A new class of regulatory genes underlying the cause of pear-shaped tomato fruit. Proc Nat Acad Sci U S A 99(20):13302–13306
Mashilo J, Shimelis H, Odindo A (2015) Genetic diversity of bottle gourd (Lagenaria siceraria (Molina) Standl.) landraces of South Africa assessed by morphological traits and simple sequence repeat markers. S Afr J Plant Soil 33(2), 113–124
Mashilo J, Shimelis H, Odindo A, Amelework B (2016) Genetic diversity of South African bottle gourd [Lagenaria siceraria (Molina) Standl.] landraces revealed by simple sequence repeat markers. HortScience 51:120–126
Mashilo J, Shimelis H, Odindo A, Amelework B (2017) Genetic differentiation of bottle gourd [Lagenaria siceraria (Molina) Standl.] landraces assessed by fruit qualitative traits and simple sequence repeat markers. Sci Hortic 216:1–11
Massawe FJ, Mwale SS, Azam-Ali SN, Roberts JA (2007) Towards genetic improvement of bambara groundnut [Vigna subterranea (L.) Verdc.]. In: Ochatt, S, Jain SM (eds) Breeding of neglected and underutilized crops: spices and herbs. Science Publishers, p 468
Mladenovic E, Berenji J, Ognjanov JBV, Ljubojevic M, Cukanovic J (2012) Genetic variability of bottle gourd (Lagenaria siceraria (Mol.) Standley and its morphological characterization by multivariate analysis. Arch Biol Sci (Belgrade), 64(2):573–583
Monforte AJ, Diaz A, Caño-Delgado A, van der Knaap E (2014) The genetic basis of fruit morphology in horticultural crops: lessons from tomato and melon. J Exp Bot 65(16):4625–4637
Morimoto Y, Mvere B (2004) Lagenaria siceraria. In: Grubben GJH, Denton OA (eds) Vegetables plant resources of tropical Africa 2. Wageningen/Leiden: Backhuys Publishers/CTA, pp 353–358
Morimoto Y, Maundu P, Fujimaki H, Morishima H (2005) Diversity of landraces of the white-flowered gourd (Lagenaria siceraria) and its wild relatives in Kenya: fruit and seed morphology. Genet Resour Crop Evol 52:737–747
Padulosi S, Hodgkin T, Williams JT, Haq N (2002) Underutilized crops: trends, challenges and opportunities in the 21st century. In: Engels JMM, Rao VM, Brown AHD, Jackson AHD (eds) Managing plant genetic diversity. CABI/IPGRI, Rome, pp 323–338
Rhie A, Walenz BP, Koren S et al (2020) Merqury: Reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol 21(1):245
Schlumbaum A, Vandorpe P (2012) A short history of Lagenaria siceraria (bottle gourd) in the Roman provinces: morphotypes and archaeogenetics. Veg Hist Archaeobot 21:499–509
Schonfeldt HC, Pretorius B (2011) The nutrient content of five traditional South African dark green leafy vegetables-A preliminary study. J Food Comp Anal 24(8):1141–1146
Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM (2015) BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31(19):3210–3212
Sivaraj N, Pandravada SR (2005) Morphological diversity for fruit characters in bottle gourd germplasm from Tribal pockets of Telangana region of Andhra Pradesh, India. Asian Agri-Hist 9:305–310
van der Knaap E, Chakrabarti M, Chu YH et al (2014) What lies beyond the eye: the molecular mechanisms regulating tomato fruit weight and shape. Front Plant Sc 5:227
Wang Y, Xu P, Wu X, Wu X, Wang B, Huang Y, Hu Y, Lin J, Lu Z, Li G et al (2018) GourdBase: a genome-centered multi-omics database for the bottle gourd (Lagenaria siceraria), an economically important cucurbit crop. Sci Rep 8:3604
Wang Y, Wang L, Xing N et al (2020) A universal pipeline for mobile mRNA detection and insights into heterografting advantages under chilling stress. Hortic Res 7:13
Wu S, Shamimuzzaman M, Sun H, Salse J, Sui X, Wilder A, Wu Z, Levi A, Xu Y, Ling KS et al (2017) The bottle gourd genome provides insights into Cucurbitaceae evolution and facilitates mapping of a Papaya ring-spot virus resistance locus. Plant J 92:963–975
Wu S, Xiao H, Cabrera A et al (2011) SUN regulates vegetative and reproductive organ shape by changing cell division patterns. Plant Physiol 157(3):1175–1186
Xiao H, Jiang N, Schaffner E et al (2008) A retrotransposon-mediated gene duplication underlies morphological variation of tomato fruit. Science 319(5869):1527–1530
Xu P et al (2011) Partial sequencing of the bottle gourd genome reveals markers useful for phylogenetic analysis and breeding. BMC Genomics 12:467
Xu P et al (2014) Population genomic analyses from low-coverage RAD-seq data: a case study on the non-model cucurbit bottle gourd. Plant J 77:430–442
Xu P et al (2021) Long-read genome assembly and genetic architecture of fruit shape in the bottle gourd. Plant J. https://doi.org/10.1111/TPJ.15358
Yetisir H, Sari N (2003) Effect of different rootstock on plant growth, yield and quality of watermelon. Aust J Exp Agr 43:1269–1274
Yetişir H, Şakar M, Serçe S (2008) Collection and morphological characterization of Lagenaria siceraria germplasm from the Mediterranean region of Turkey. Genet Resour Crop Evol 55:1257–1266
Yildiz M, Cuevas HE, Sensoy S, Erdinc C, Baloch FS (2015) Transferability of cucurbita SSR markers for genetic diversity assessment of Turkish bottle gourd (Lagenaria siceraria) genetic resources. Biochem Syst Ecol 59:45–53
Zhang G (1981) Interaction of genes and the expression of bitterness in Lagenaria Sicereria. Acta Hortic Sin 8(4):43–48
Zhao JY, Jiang L, Che G, Pan Y, Li Y, Hou Y, Zhao W, Zhong Y, Ding L, Yan S (2020) A functional allele of CsFUL1 regulates fruit length through inhibiting CsSUP and auxin transport in cucumber. Plant Cell 32(6):2048–2055

23 Advances and Prospects in Genomic and Functional Studies of the Aquatic Crop, Sacred Lotus

Tao Shi, Zhiyan Gao, Yue Zhang, and Jinming Chen

T. Shi (✉) · Z. Gao · Y. Zhang · J. Chen
CAS Key Laboratory of Aquatic Botany and Watershed Ecology, Wuhan Botanical Garden, Chinese Academy of Sciences, Wuhan 430074, China

T. Shi · Z. Gao · Y. Zhang · J. Chen
Center of Conservation Biology, Core Botanical Gardens, Wuhan, China

Z. Gao · Y. Zhang
University of Chinese Academy of Sciences, Beijing, China

Abstract

In this chapter, we review genomic progress in sacred lotus, a widely consumed aquatic vegetable and medicinal food in Asia. We summarize current genomic, population, and functional gene studies of sacred lotus and conclude by discussing areas that remain unexplored.

23.1 Overview

Although the aquatic plant lotus, or sacred lotus (Nelumbo nucifera Gaertn.), has been commonly known for thousands of years as a traditional ornamental flower with symbolic meanings in Buddhism and Hinduism, its status in the human diet cannot be ignored, particularly in Asia. Lotus rhizome, which is called ‘Ou’ in China and ‘Nadru’ in India, is a popular vegetable with a history of thousands of years in Asia (Guo 2008). Lotus seed, with a longevity of more than a thousand years, is a starch-rich and popular snack that has been consumed as a medicinal food and as an ingredient in many desserts such as ‘mooncake’ (Guo 2008; Mukherjee et al. 2009; Shen-Miller et al. 1995). The nutritious and therapeutic phytochemicals from different parts of lotus, including alkaloids, flavonoids, glycosides, triterpenoids, and vitamins, have also been recognized in the West (Mukherjee et al. 2009). These valuable lotus traits are the result of natural selection together with long-term artificial selection and cultivation to meet different needs, including agricultural and ornamental purposes. Given such differences in utility, cultivated lotus is primarily classified into three categories, namely rhizome lotus, seed lotus, and flower lotus. Here, we review some of the most relevant genomic and functional progress in this aquatic crop and discuss how these studies will further benefit its molecular breeding.

23.2 Genome Assembly and Whole-Genome Resequencing Analysis

The genome of sacred lotus ‘China Antique’ was first sequenced using Illumina technology and assembled with supporting data from 454 sequencing and a genetic map; it was later upgraded to a chromosome-level assembly using PacBio long reads and high-throughput chromosome conformation capture (Hi-C) (Table 23.1) (Ming et al. 2013; Shi et al. 2020). The genome assembly and annotations of ‘China Antique’, together with gene expression profiles of different tissues and single nucleotide polymorphism (SNP) variants of 88 lotus cultivars, are collected in the Nelumbo genome database (Li et al. 2021b). This genome further revealed that lotus has the most conserved genome among all sequenced eudicot plants, with traces of only a single whole-genome duplication (Shi et al. 2020). The lotus reference genome thus provides a framework for population and functional genomic studies. Identification of SNP variants by mapping whole-genome resequencing reads of different lotus accessions to the reference genome uncovered the genetic divergence between temperate and tropical lotus, the relationships among flower lotus, seed lotus, rhizome lotus, and wild lotus, and genes under directional selection during domestication (Huang et al. 2018; Liu et al. 2020).

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 M. A. Chapman (ed.), Underutilised Crop Genomes, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-00848-1_23

Table 23.1 Summary of the ‘China Antique’ chromosomal-level assembly and genome annotation

                                      N. nucifera
Assembled genome size                 807.6 Mb
Number of contigs                     4709
Contig N50                            484.3 kb
Eight pseudochromosomes size          813.2 Mb
Number of unanchored contigs          456
Unanchored contigs size               8.0 Mb
GC content                            38.97%
Ratio of repetitive sequences         58.5%
Number of protein-coding genes        32,124
Complete BUSCOs                       94.60%
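For readers unfamiliar with the contig N50 statistic listed in Table 23.1, the following Python sketch shows how it is commonly defined: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the assembly size. The function name and the toy contig lengths are assumptions made for illustration; they are not the lotus contigs, and the assembly tools used by Shi et al. (2020) may report the statistic through their own routines.

```python
# Minimal sketch of the usual N50 definition (hypothetical helper and data).

def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    account for at least half of the total assembly length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths in kb (hypothetical, not taken from the lotus assembly).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)
print(f"N50 = {toy_n50} kb")
```

In this toy set the total length is 300 kb, and the two longest contigs (80 kb and 70 kb) already reach 150 kb, so the N50 is 70 kb.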

23.3 Genome Sequencing, Genetics, and Evolution

The markedly distinct rhizome phenotypes of temperate and tropical lotus make lotus an ideal material for understanding the evolutionary divergence of plant ecotypes. Based on the transcriptomes of four lotus cultivars, conserved and ecotype-specific SNP sites were identified between enlarged-rhizome lotus and thin-rhizome lotus (Yang et al. 2015a). Further, analysis of 18 lotus resequencing datasets provided a comprehensive comparison of the genetic divergence and variation between temperate and tropical lotus (Huang et al. 2018). See the following sections (23.4–23.7) for detailed information on lotus genomics and evolution.

23.4 Transcriptomics in Sacred Lotus

Important agronomic traits of lotus with nutritional and ornamental value have been investigated by high-throughput transcriptome sequencing, with reads mapped to the high-quality genome assembly. Herein, we focus on transcriptomic studies that have illustrated the complex regulatory mechanisms of gene expression and clarified the important role of spatiotemporally specific gene expression in contributing to phenotypic variation in lotus. The central dogma describes only a portion of gene regulation, because the regulation of gene expression is a multi-layer mechanism involving additional processes such as alternative splicing (AS) of pre-mRNAs and microRNA (miRNA) regulation. AS can dramatically increase the diversity of the transcriptome and proteome, as well as regulatory robustness, by generating multiple isoforms from the same gene, thereby regulating distinct physiological and developmental processes in eukaryotes (Barash et al. 2010). An early transcriptomic study of four lotus cultivars revealed a total of 177,540 AS events in 64% of the expressed genes of lotus and proposed that AS contributed to the difference in expressed genes between leaves and rhizomes (Yang et al. 2015a). However, the next-generation sequencing approach used in that study is technically limited in the accurate identification of isoforms because of its short reads. An updated AS dataset based on full-length transcriptome sequencing of 12 tissues with PacBio Sequel therefore provided a more comprehensive view of AS events in lotus (Zhang et al. 2019). By combining the transcriptome datasets of these tissues, tissue-specific AS events and differentially expressed isoforms were identified, further highlighting the vital role of AS in regulating gene expression during the growth and organ specification of lotus (Zhang et al. 2019).

miRNAs are a group of small non-coding RNAs that interact with their target genes at the RNA level and inhibit gene expression. An evolutionary study of lotus miRNAs showed that different miRNAs arose at different stages of plant evolution and that older miRNA families tend to have more target genes (Shi et al. 2017). Different miRNAs are also involved in the development of different lotus organs. Differentially expressed miRNAs regulated adventitious root formation in lotus by targeting genes involved in plant hormone metabolism (Libao et al. 2019). In addition, seed germination in lotus was positively regulated by miR160 through a decrease in ABA levels in the germinating seeds (Hu et al. 2016). Intriguingly, some isoforms of the same gene have divergent mRNA structures in lotus, including the presence or absence of miRNA binding sites, suggesting that the divergence of gene expression is increased by both AS and miRNA regulation (Zhang et al. 2020).

Fig. 23.1 The important plant morphological traits in Nelumbo. a Flowers and rhizomes of the tropical and temperate ecotypes of N. nucifera. b Flowers at the bud and opening stages of N. nucifera and N. lutea

23.5 Studies of Phenotypic Variation of Rhizome in the Two Lotus Ecotypes

The rhizome of lotus evolved into two distinct phenotypes, enlarged and whip-like rhizomes, representing adaptation to different latitudes and climate conditions. The enlarged rhizome contains abundant starch, proteins, and vitamins, making it a popular vegetable in eastern Asia. In contrast, the whip-like and thin rhizome continues to grow and branch for almost the entire year in the tropical zone (Fig. 23.1a).


Since the enlarged rhizome is a storage organ that allows lotus to survive winter at high latitudes, similar to tuberization in potato, the development and enlargement of the rhizome have received much attention. Gene expression during rhizome formation in temperate and tropical lotus was analyzed by RNA-Seq, which identified many differentially expressed candidate genes participating in rhizome enlargement (Yang et al. 2015b). In addition to gene expression, methylation marks also differ between the two ecotypes. Using bisulfite sequencing, the DNA methylation potentially involved in rhizome morphological variation was investigated, and higher global DNA methylation was found in temperate lotus than in tropical lotus (Li et al. 2021a). Quantitative trait locus (QTL) analysis of an F2 population derived from hybrids between temperate and tropical cultivars identified 22 QTLs related to rhizome enlargement (Huang et al. 2021). Further, quantitative proteomic analysis of the rhizome at three stages identified 302 stage-specific proteins and 172 differentially expressed proteins and suggested that the second messenger Ca²⁺ helps transduce light and auxin signals during rhizome enlargement (Cao et al. 2019). Phenolic compounds play a critical role in plant growth and development, and transcriptional profiles from six stages of lotus rhizome formation revealed complex crosstalk in the gene network involved in the biosynthesis of phenolic compounds in lotus (Min et al. 2019). In addition, comparative transcriptomic analysis showed that adventitious root formation in lotus is cooperatively regulated by sucrose and IAA (Libao et al. 2020).
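To make the notion of a "global DNA methylation level" from bisulfite sequencing more concrete, the sketch below uses the common convention that, at each cytosine site, reads retaining a C are counted as methylated and reads converted to T as unmethylated. The exact weighting used by Li et al. (2021a) is not stated in this chapter, so the coverage-weighted average shown here is an assumption, and the function names and read counts are hypothetical.

```python
# Minimal sketch: per-site and coverage-weighted global methylation levels.
# All names and numbers are hypothetical illustrations.

def site_methylation_level(methylated: int, unmethylated: int) -> float:
    """Fraction of reads supporting methylation at one cytosine site."""
    total = methylated + unmethylated
    if total == 0:
        raise ValueError("site has no read coverage")
    return methylated / total


def global_methylation_level(sites) -> float:
    """Coverage-weighted level: all methylated reads / all reads over the sites.

    `sites` is an iterable of (methylated_reads, unmethylated_reads) pairs.
    """
    sites = list(sites)
    methylated = sum(m for m, _ in sites)
    total = sum(m + u for m, u in sites)
    if total == 0:
        raise ValueError("no read coverage at any site")
    return methylated / total


# Toy read counts (hypothetical) for three cytosine sites.
toy_sites = [(8, 2), (0, 10), (5, 5)]
toy_global = global_methylation_level(toy_sites)
print(f"global methylation level = {toy_global:.3f}")
```

Comparing such a value between two samples, for example a temperate and a tropical accession, is one simple way to express the kind of ecotype difference described above, provided both samples are processed with the same coverage filters.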

23.6 Longevity and Yield of Lotus Seed

The lotus seed is also well known in Asian countries for its sweet taste and abundant micronutrients. Hence, increasing the yield of lotus seed production can create more economic value. Comparative transcriptomic analysis of two lotus cultivars with contrasting phenotypes in both seed size and seed number per seedpod identified candidate genes crucial for lotus seed yield (Li et al. 2018). Intriguingly, lotus seeds can remain biologically active for hundreds of years, holding the world record for seed longevity. In addition, the hard, dehydrated lotus seed is highly tolerant of stress conditions such as high temperature and radiation (Chu et al. 2012; Shen-Miller et al. 2002). The levels of reactive oxygen species release and macromolecular oxidation during seed storage are often closely related to seed longevity. Consistent with this, ectopic expression of the lotus 1-cysteine peroxiredoxin antioxidant gene in Arabidopsis enhanced seed longevity and stress tolerance, shedding light on the molecular mechanism of lotus seed longevity (Chen et al. 2016). However, the genetic mechanisms underlying lotus seed yield need further genomic and population studies.

23.7 Diversity of Lotus Flower

Plant flowers are highly diversified in shape and color and are important for pollinator attraction (Endress and Matthews 2006; Soltis and Soltis 2014). Lotus flowers vary in color, shape, and flowering time owing to natural variation and breeding. Nelumbo contains only one allopatric species pair, separated by the Pacific Ocean: Nelumbo nucifera Gaertn., with red/pink to white petals, and Nelumbo lutea Willd. (yellow lotus or American lotus), with only yellow petals (Li et al. 2014; Lin et al. 2019c) (Fig. 23.1b). The red color in petals of the Asian lotus is determined by the amount and type of anthocyanin (Chen et al. 2013; Deng et al. 2013; Sun et al. 2016). The yellow color of the American lotus petal derives mainly from non-anthocyanin flavonoids (Chen et al. 2013; Zhu et al. 2019) and carotenoids (Katori et al. 2002). Mutations in genes of the anthocyanin biosynthesis pathway have been found in lotus. Low methylation of the promoter region of anthocyanidin synthase (ANS) was found to promote anthocyanin accumulation in red lotus (Deng et al. 2015). In yellow lotus, high expression of chalcone isomerase (CHI) and negligible expression of dihydroflavonol 4-reductase (DFR), ANS, and three UDP-glucose:flavonoid 3-O-glucosyltransferases (UFGTs) can inhibit the accumulation of colored anthocyanins and increase the level of anthoxanthins associated with yellow color (Sun et al. 2016; Wang et al. 2016). In the non-anthocyanin flavonoid biosynthesis pathway, the substrate specificity of flavonol synthase (FLS) and the high expression of O-methyltransferases (OMTs) in yellow lotus indicate that kaempferol and isorhamnetin contribute to coloration (Zhu et al. 2019). Premature termination codons (PTCs) within MYB5 and repression of a glutathione S-transferase gene inhibited anthocyanin accumulation in yellow lotus (Sun et al. 2016). Carotenoids, which are typically responsible for coloration ranging from red to yellow, have rarely been studied in lotus.

Variations in floral shape have also been studied. During lotus development, petaloidy can turn a few-petalled flower into more complex types, including double-petalled, duplicate-petalled, and all-double-petalled flowers (Guo 2008; Lin et al. 2019a, b; Liu et al. 2020). Petaloid differentiation in lotus might result from stamen petaloidy and carpel petaloidy, which depart from the classic flower structure (Lin et al. 2018, 2019a). Guided by the conserved ‘ABCE’ model (Krizek and Fletcher 2005; Wellmer et al. 2014), the molecular mechanism underlying stamen petaloidy in lotus was found to be likely associated with several hormone-related genes and transcription factors (TFs), identified by comparative transcriptomic analysis (Lin et al. 2018), and with a high DNA methylation level, identified by whole-genome bisulfite sequencing (WGBS) analysis (Lin et al. 2019b).

Flowering results from the transition from vegetative to reproductive growth during the life cycle of flowering plants and is modulated by a series of flowering pathways that respond to endogenous and exogenous signals to optimize plant adaptation (Srikanth and Schmid 2011). Lotus can be divided into two ecotypes adapted to different climatic regions: temperate lotus, which flowers late, and tropical lotus, which flowers early (Li et al. 2021a; Yang et al. 2014). A transgenic study in Arabidopsis recently showed that the lotus FT-INTERACTING PROTEIN 1 (FTIP1) has a conserved function and mediates the transport of FT1 in lotus, and that these two genes, which have similar expression patterns and subcellular localization in leaves and the shoot apical meristem, affect flowering directly (Zhang et al. 2021). Through transcriptomic analysis, flowering-related genes in other pathways were also predicted to participate in the floral transition during lotus development, such as VERNALIZATION INSENSITIVE 3 (VIN3) in the vernalization pathway and GIBBERELLIC ACID INSENSITIVE (GAI) in the gibberellic acid pathway (Yang et al. 2014).

23.8 Conclusion

Although there has been much genomic and functional progress in cultivated lotus, wild lotus has received limited study. Future investigation of wild lotus populations through biogeographic, genomic, and epigenomic studies of key adaptive traits will facilitate molecular breeding by introducing favorable alleles into lotus cultivars. Similarly, the sister species of sacred lotus, N. lutea, has hardly been studied in terms of genome structure, regulatory divergence, and functional evolution. Thus, future genomic studies of N. lutea will also benefit lotus breeding by providing genetic materials for introgression.

References

Barash Y et al (2010) Deciphering the splicing code. Nature 465(7294):53–59. https://doi.org/10.1038/nature09000
Cao D et al (2019) Proteomic analysis showing the signaling pathways involved in the rhizome enlargement process in Nelumbo nucifera. BMC Genomics 20(1):766. https://doi.org/10.1186/s12864-019-6151-x
Chen HH et al (2016) Ectopic expression of NnPER1, a Nelumbo nucifera 1-cysteine peroxiredoxin antioxidant, enhances seed longevity and stress tolerance in Arabidopsis. Plant J 88(4):608–619. https://doi.org/10.1111/tpj.13286
Chen S, Xiang Y, Deng J, Liu Y, Li S (2013) Simultaneous analysis of anthocyanin and nonanthocyanin flavonoid in various tissues of different lotus (Nelumbo) cultivars by HPLC-DAD-ESI-MS(n). PLoS ONE 8(4):e62291. https://doi.org/10.1371/journal.pone.0062291
Chu P et al (2012) Proteomic and functional analyses of Nelumbo nucifera annexins involved in seed thermotolerance and germination vigor. Planta 235(6):1271–1288. https://doi.org/10.1007/s00425-011-1573-y
Deng J et al (2013) Systematic qualitative and quantitative assessment of anthocyanins, flavones and flavonols in the petals of 108 lotus (Nelumbo nucifera) cultivars. Food Chem 139(1–4):307–312. https://doi.org/10.1016/j.foodchem.2013.02.010
Deng J et al (2015) Proteomic and epigenetic analyses of lotus (Nelumbo nucifera) petals between red and white cultivars. Plant Cell Physiol 56(8):1546–1555. https://doi.org/10.1093/pcp/pcv077
Endress P, Matthews M (2006) Elaborate petals and staminodes in eudicots: diversity, function, and evolution. Org Divers Evol 6(4):257–293. https://doi.org/10.1016/j.ode.2005.09.005
Guo HB (2008) Cultivation of lotus (Nelumbo nucifera Gaertn. ssp. nucifera) and its utilization in China. Genet Resour Crop Evol 56(3):323–330. https://doi.org/10.1007/s10722-008-9366-2
Hu J, Jin J, Qian Q, Huang K, Ding Y (2016) Small RNA and degradome profiling reveals miRNA regulation in the seed germination of ancient eudicot Nelumbo nucifera. BMC Genomics 17(1):684. https://doi.org/10.1186/s12864-016-3032-4
Huang L, Li M, Cao D, Yang P (2021) Genetic dissection of rhizome yield-related traits in Nelumbo nucifera through genetic linkage map construction and QTL mapping. Plant Physiol Biochem 160:155–165. https://doi.org/10.1016/j.plaphy.2021.01.020
Huang L et al (2018) Whole genome re-sequencing reveals evolutionary patterns of sacred lotus (Nelumbo nucifera). J Integr Plant Biol 60(1):2–15. https://doi.org/10.1111/jipb.12606
Katori M, Watanabe K, Nomura K, Yoneda K (2002) Cultivar differences in anthocyanin and carotenoid pigments in the petals of the flowering lotus (Nelumbo spp.). Engei Gakkai zasshi 71(6):812–817. https://doi.org/10.2503/jjshs.71.812
Krizek BA, Fletcher JC (2005) Molecular mechanisms of flower development: an armchair guide. Nat Rev Genet 6(9):688–698. https://doi.org/10.1038/nrg1675
Li H, Yang X, Wang Q, Chen J, Shi T (2021a) Distinct methylome patterns contribute to ecotypic differentiation in the growth of the storage organ of a flowering plant (sacred lotus). Mol Ecol 30(12):2831–2845. https://doi.org/10.1111/mec.15933
Li H et al (2021b) Nelumbo genome database, an integrative resource for gene expression and variants of Nelumbo nucifera. Sci Data 8(1):38. https://doi.org/10.1038/s41597-021-00828-8
Li J et al (2018) Systematic transcriptomic analysis provides insights into lotus (Nelumbo nucifera) seed development. Plant Growth Regul 86(3):339–350. https://doi.org/10.1007/s10725-018-0433-1
Li Y et al (2014) Paleobiogeography of the lotus plant (Nelumbonaceae: Nelumbo) and its bearing on the paleoclimatic changes. Palaeogeogr Palaeoclimatol Palaeoecol 399:284–293. https://doi.org/10.1016/j.palaeo.2014.01.022
Libao C, Huiying L, Yuyan H, Shuyan L (2019) Transcriptome analysis of miRNAs expression reveals novel insights into adventitious root formation in lotus (Nelumbo nucifera Gaertn.). Mol Biol Rep 46(3):2893–2905. https://doi.org/10.1007/s11033-019-04749-z
Libao C, Minrong Z, Zhubing H, Huiying L, Shuyan L (2020) Comparative transcriptome analysis revealed the cooperative regulation of sucrose and IAA on adventitious root formation in lotus (Nelumbo nucifera Gaertn). BMC Genomics 21(1):653. https://doi.org/10.1186/s12864-020-07046-3
Lin Z, Damaris RN, Shi T, Li J, Yang P (2018) Transcriptomic analysis identifies the key genes involved in stamen petaloid in lotus (Nelumbo nucifera). BMC Genomics 19(1):554. https://doi.org/10.1186/s12864-018-4950-0
Lin Z et al (2019a) Genome-wide DNA methylation profiling in the lotus (Nelumbo nucifera) flower showing its contribution to the stamen petaloid. Plants (Basel) 8(5). https://doi.org/10.3390/plants8050135
Lin Z et al (2019b) Genome-wide DNA methylation profiling in the lotus (Nelumbo nucifera) flower showing its contribution to the stamen petaloid. Plants (Basel) 8(5):135. https://doi.org/10.3390/plants8050135
Lin Z, Zhang C, Cao D, Damaris RN, Yang P (2019c) The latest studies on lotus (Nelumbo nucifera)-an emerging horticultural model plant. Int J Mol Sci 20(15):3680. https://doi.org/10.3390/ijms20153680
Liu Z et al (2020) Resequencing of 296 cultivated and wild lotus accessions unravels its evolution and breeding history. Plant J 104(6):1673–1684. https://doi.org/10.1111/tpj.15029
Min T et al (2019) Transcription profiles reveal the regulatory synthesis of phenols during the development of lotus rhizome (Nelumbo nucifera Gaertn). Int J Mol Sci 20(11):2735. https://doi.org/10.3390/ijms20112735
Ming R et al (2013) Genome of the long-living sacred lotus (Nelumbo nucifera Gaertn.). Genome Biol 14(5):R41. https://doi.org/10.1186/gb-2013-14-5-r41
Mukherjee PK, Mukherjee D, Maji AK, Rai S, Heinrich M (2009) The sacred lotus (Nelumbo nucifera) - phytochemical and therapeutic profile. J Pharm Pharmacol 61(4):407–422. https://doi.org/10.1211/jpp/61.04.0001
Shen-Miller J, Mudgett MB, Schopf JW, Clarke S, Berger R (1995) Exceptional seed longevity and robust growth: ancient sacred lotus from China. Am J Bot 82(11):1367–1380. https://doi.org/10.1002/j.1537-2197.1995.tb12673.x
Shen-Miller J et al (2002) Long-living lotus: germination and soil γ-irradiation of centuries-old fruits, and cultivation, growth, and phenotypic abnormalities of offspring. Am J Bot 89(2):236–247. https://doi.org/10.3732/ajb.89.2.236
Shi T et al (2020) Distinct expression and methylation patterns for genes with different fates following a single whole-genome duplication in flowering plants. Mol Biol Evol 37(8):2394–2413. https://doi.org/10.1093/molbev/msaa105
Shi T, Wang K, Yang P (2017) The evolution of plant microRNAs: insights from a basal eudicot sacred lotus. Plant J 89(3):442–457. https://doi.org/10.1111/tpj.13394
Soltis PS, Soltis DE (2014) Flower diversity and angiosperm diversification. Methods Mol Biol 1110:85–102. https://doi.org/10.1007/978-1-4614-9408-9_4
Srikanth A, Schmid M (2011) Regulation of flowering time: all roads lead to Rome. Cell Mol Life Sci 68(12):2013–2037. https://doi.org/10.1007/s00018-011-0673-y
Sun SS, Gugger PF, Wang QF, Chen JM (2016) Identification of a R2R3-MYB gene regulating anthocyanin biosynthesis and relationships between its variation and flower color difference in lotus (Nelumbo Adans.). PeerJ 4:e2369. https://doi.org/10.7717/peerj.2369
Wang Y et al (2016) Flower color diversity revealed by differential expression of flavonoid biosynthetic genes in sacred lotus. J Am Soc Hortic Sci 141(6):573–582. https://doi.org/10.21273/jashs03848-16
Wellmer F et al (2014) Flower development: open questions and future directions. Methods Mol Biol 1110:103–124. https://doi.org/10.1007/978-1-4614-9408-9_5
Yang M, Xu L, Liu Y, Yang P (2015a) RNA-Seq uncovers SNPs and alternative splicing events in Asian lotus (Nelumbo nucifera). PLoS ONE 10(4):e0125702. https://doi.org/10.1371/journal.pone.0125702
Yang M et al (2015b) Transcriptomic analysis of the regulation of rhizome formation in temperate and tropical lotus (Nelumbo nucifera). Sci Rep 5:13059. https://doi.org/10.1038/srep13059
Yang M, Zhu L, Xu L, Pan C, Liu Y (2014) Comparative transcriptomic analysis of the regulation of flowering in temperate and tropical lotus (Nelumbo nucifera) by RNA-Seq. Ann Appl Biol 165(1):73–95. https://doi.org/10.1111/aab.12119
Zhang L et al (2021) The lotus NnFTIP1 and NnFT1 regulate flowering time in Arabidopsis. Plant Sci 302:110677. https://doi.org/10.1016/j.plantsci.2020.110677
Zhang Y, Nyong AT, Shi T, Yang P (2019) The complexity of alternative splicing and landscape of tissue-specific expression in lotus (Nelumbo nucifera) unveiled by Illumina- and single-molecule real-time-based RNA-sequencing. DNA Res 26(4):301–311. https://doi.org/10.1093/dnares/dsz010
Zhang Y, Rahmani RS, Yang X, Chen J, Shi T (2020) Integrative expression network analysis of microRNA and gene isoforms in sacred lotus. BMC Genomics 21(1):429. https://doi.org/10.1186/s12864-020-06853-y
Zhu HH et al (2019) Differences in flavonoid pathway metabolites and transcripts affect yellow petal colouration in the aquatic plant Nelumbo nucifera. BMC Plant Biol 19(1):277. https://doi.org/10.1186/s12870-019-1886-8

24 Utilising Public Resources for Fundamental Work in Underutilised and Orphan Crops

Mark A. Chapman and David Fisher

M. A. Chapman (corresponding author), D. Fisher: Biological Sciences, University of Southampton, Southampton SO17 1BJ, UK

Abstract

A starting point to understanding more about underutilised and orphan crops is to investigate their genetic diversity and origins. Whilst whole genome analyses will give a wealth of information in this regard, this is often impractical in terms of cost and excessive for these initial goals. Therefore, molecular markers are used to establish the partitioning of genetic variation, the location of domestication, and relationships between these crops and their wild relatives. All of this will generate the information required to begin improving these crops and to identify adaptive variation. A convenient and high throughput way to generate molecular marker information is to mine a genome for markers such as Simple Sequence Repeat markers (SSRs) or others. In this analysis we assembled draft genomes for three underutilised crops (slender amaranth [Amaranthus viridis L.], lychee [Litchi chinensis Sonn.] and velvet bean [Mucuna pruriens (L.) DC.]) using publicly available data. We show that, whilst the assemblies could be improved considerably in terms of completeness and duplication, they contain thousands of potential SSR markers. In addition, the chloroplast DNA (cpDNA) was assembled from the same sequencing reads and contained ca. 50–60 potential SSR markers. We also identified 10–11 million potential SNP markers by comparing three genomes for lychee. The approach we use relies solely on publicly available software and can be modified or adapted by others with ease.

24.1 Introduction

24.1.1 Underutilised and Orphan Crops

The goal of this edited book is to highlight recent progress in the genomics of underutilised and orphan crops. These crops are typically underresearched compared to others, and lack international recognition, yet since the 1970s several have been touted for their interesting and novel phenotypes and nutritional properties, as well as for their tolerance of extreme environments (Haq 1983; NAS 1975; Williams and Haq 2000). Whilst the reasons for most of these crops not receiving substantial national and international recognition and funding are diverse, they can generally be summed up as follows: the crops are relatively low yielding, contain anti-nutrient factors, and/or have a long generation time or are otherwise difficult to grow without constant human intervention.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 M. A. Chapman (ed.), Underutilised Crop Genomes, Compendium of Plant Genomes, https://doi.org/10.1007/978-3-031-00848-1_24


There is no strict definition of an underutilised crop, nor of the point at which a crop has received sufficient research to move out of this category. In the last few decades some previously underutilised crops have clearly become so widely grown and researched that they should no longer be considered underutilised, for example chickpeas and quinoa. For others, the status is equivocal (for example pigeon pea). But the dozens to hundreds of others which have been identified as potential crops remain underresearched.

24.1.1.1 Underutilised Crops and Climate Resilience

In the last decade the realisation that climate change and a growing population are putting extreme stresses on the food system, in terms of both quantity and quality of food (Campbell et al. 2016; Godfray et al. 2010), has reached the forefront of political agendas. One approach to solving this is to increase yields of existing and staple crops or to maintain yields under changing conditions (Mickelbart et al. 2015). In fact, until recently crop breeding approaches have allowed yield to be maintained even in the face of increasing temperatures and altered precipitation patterns. However, in the last few years, a series of successive very hot and dry summers in some temperate parts of the world has impacted the yield of staple crops (Asseng et al. 2015; Zhao et al. 2017). Another strategy is to focus on underutilised crops, to breed higher yielding cultivars of these, and to remove or reduce antinutrients (Li et al. 2020; Siddique et al. 2021). Important factors expediting this are the generation of a genome sequence, the development of mapping populations, and the assessment of genetic diversity using molecular markers.

This book represents the status quo of several underutilised crops for which international efforts are being made to promote their use, to develop cultivars with reduced negative attributes, and to understand the genetic basis of adaptive phenotypes and variation amongst varieties and accessions. The extent of genomic resources varies between these crops; for some, comprehensive resources are available including multiple genome sequences, marker resources and mapping populations, whereas for others the genome may be in draft form, or a ‘work in progress’.

Despite the significantly reduced costs of high throughput sequencing (HTS) in recent years, going from deep sequencing to an assembled genome still represents a significant bioinformatic challenge. The highly repetitive nature of even small genomes (Charlesworth et al. 1994; Tørresen et al. 2019) makes assembly difficult, often requiring long read sequencing and mapping populations, both of which take time and money (Li et al. 2017). For polyploid species or those with larger genomes, the challenge is greater still.

24.1.1.2 Establishing Resources for Underutilised Crops

It is important to understand where crops originated and how their genetic variation is partitioned. The location of a crop’s domestication often (but not always) marks the geographic region with greatest genetic diversity (Zeven and de Wet 1982) and can give clues as to the wild progenitor and other related species (Burger et al. 2008; Doebley et al. 2006). Genetic diversity is often lost during domestication, and this can include adaptive alleles for stress resilience and resistance to pests and pathogens. In addition, wild relatives can hold novel genetic variation for traits not found in the crop (Dempewolf et al. 2017). Therefore, identifying the location of a crop’s origin is key to finding adaptive variation that can be utilised in developing and breeding cultivars with important traits (Hawkes 1991). Recent phylogenetic work has identified close relatives of some underutilised crops, for example winged bean (Yang et al. 2018), horsegram (Chap. 14), and several underutilised Vigna species (Takahashi et al. 2016).

Beyond this, to expedite breeding and allow transgenic approaches for crop improvement, linking genetic variation to phenotypic variation is key, i.e. finding the genes that confer traits such as disease resistance, yield, taste, and (anti-)nutrient content. This can be achieved through quantitative trait locus (QTL) type approaches, wherein a population of individuals segregating for the trait(s) under investigation is mapped with molecular markers, or through genome-wide association studies (GWAS), in which hundreds of varieties/populations are phenotyped and genotyped to identify marker-trait associations (Burke et al. 2007).

Developing molecular markers is often the first step in understanding crop origins and marker-trait associations. These markers often include simple sequence repeat markers (SSRs; microsatellites). SSRs are polymorphic, codominant, reproducible, and relatively low-cost markers, and hence are ideal for the early stages of investigations in underutilised crops where the funding or resources might not be available. They can be developed from the large-scale sequencing of RNA, either as expressed sequence tags or RNAseq, or by sequencing the genome (Chapman 2019; Ellis and Burke 2007). SSR resources developed in this manner are available for several underutilised crops (Chapman 2015; Bellis et al. 2016; Deletre et al. 2013; Sharma et al. 2015; Tangphatsornruang et al. 2009; Wong et al. 2017).

24.1.1.3 Generating SSR Resources for Underutilised Crops

HTS and assembly of the reads from the genome or transcriptome can yield hundreds or thousands of SSRs, even without very deep sequencing. Several hundred SSRs can be identified from an assembly of a very small number of HTS reads (2 million) from the genome and transcriptome, with greater sequencing affording even more markers and likely a more accurate assembly (Chapman 2019). Libraries of genomic reads usually contain chloroplast DNA (cpDNA) reads, and hence a cpDNA genome can sometimes be assembled, and SSR markers within it identified. Once markers are identified they need to be trialled to ensure they can be amplified across the species under investigation.

Transcriptome and genome sequencing reads required to identify molecular markers can be generated with relative ease and at low cost using HTS, or may be available from public data sources, for example the sequence read archive (SRA) at the National Center for Biotechnology Information (https://www.ncbi.nlm.nih.gov/sra). In addition, the data for a closely related taxon can be used instead, and a portion of the identified markers should still be useable, although as the genetic distance between the focal taxon and the sequenced taxon increases, marker transferability is expected to decrease (Chapman 2019).

In this chapter we show how publicly available data can be used to generate molecular markers (specifically SSRs), which could be used to establish the origins of underutilised crops and the partitioning of genetic variation, and for QTL mapping or GWAS approaches.

24.1.2 Target Species

The genomes of several underutilised crops were sequenced using HTS in a recent investigation with the goal of sequencing an entire botanic garden (Liu et al. 2019). Of these, we focussed on three.

24.1.2.1 Amaranthus viridis L. (Amaranthaceae): Slender Amaranth

Several amaranth species are cultivated, either for their seeds or their leaves (National Research Council 2006). The slender amaranth is found throughout the world and is cultivated in some parts, especially India, parts of Africa, and Greece, for its leaves, which are boiled much like spinach. It has a high protein content and is relatively easy to cultivate, with a high yield per area under favourable conditions. Of note, it is considered a weed in some parts of the world, growing in the fields of many other crops (https://www.cabi.org/isc/datasheet/4654). Hence, it is of interest both as an underutilised crop and for understanding more about its potential (bio)control. Little has been published on the origin and genetic diversity of A. viridis. It was included in a phylogenetic analysis of the genus and is closely related to A. muricatus, A. deflexus, and A. vulgatissimus (Waselkov et al. 2018), none of which are known to be consumed by humans.

24.1.2.2 Litchi chinensis Sonn. (Sapindaceae): Lychee

Lychee (Litchi chinensis) is the sole member of the genus Litchi. Lychee is native to Southeastern China, and cultivation extends throughout China, parts of India and Southeast Asia. The trees are cultivated for their fruits, which are high in vitamin C. Around 200 cultivars exist, but there is confusion over the relationships between these, and only a few are grown commercially (Menzel 2002). The fruits (especially under-ripe ones) contain methylenecyclopropylglycine, which can cause encephalitis if consumed in large amounts and on an empty stomach (Shrivastava et al. 2017). Whilst the genome of lychee has not been published, work examining the transcriptome has taken place (Li et al. 2013), and analysis of unpublished data has identified Single Nucleotide Polymorphism markers (SNPs) and assessed genetic diversity in Chinese germplasm (Liu et al. 2015). Twelve SSR primer pairs are available for lychee (Viruel and Hormaza 2004), a subset of which transfer to another underutilised crop, ackee (Blighia sapida; Ekué et al. 2009).

24.1.2.3 Mucuna pruriens (L.) DC. (Fabaceae): Velvet Bean

Velvet bean is a forage and cover crop, and is sometimes eaten by indigenous communities in India and Africa. It grows well with low water input and on poor soils. More recently it has achieved prominence because of its production of L-Dopa, a precursor of dopamine that is used as an anti-Parkinson’s drug (reviewed in Sathyanarayana et al. 2017). The pods are covered in hair that is unpleasant to touch (Buckles 1995). SSR markers have been identified from the transcriptome of M. pruriens, and have been used to assess genetic diversity in a panel of Indian germplasm (Sathyanarayana et al. 2017).


24.2 Methods

24.2.1 Raw Data Download and QC

The data used in this chapter were generated by Liu et al. (2019) and were downloaded from the NCBI SRA. All data are BGISEQ-500 data, paired end 100 bp reads with insert size 200 bp. Reads were downloaded with fastq-dump within biobuilds ver. 2017.11 (https://l7informatics.com/resources/biobuilds-2017-11/). NCBI accession numbers for the reads are as follows: Amaranthus viridis (SRR7121710), Litchi chinensis (SRR7121549) and Mucuna pruriens (SRR7121922). The number of reads downloaded was between 322 and 356 M. Whilst writing this chapter, resequencing data for lychee were made available on NCBI SRA; therefore polymorphism between lychee accessions was assessed using three samples: YNYL04 (SRR15214222) and YNYL06 (SRR15219642), both wild samples, and the cultivar Feizixiao (SRR15214261). For these, 40–50 M reads were downloaded. Poor quality reads and reads containing adapter fragments were identified and removed using Trimmomatic ver. 0.32 (Bolger et al. 2014) and the settings LEADING:5 TRAILING:5 SLIDINGWINDOW:4:15. Reads shorter than 72 bp were removed.
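As an illustration of the trimming step, the following minimal sketch builds (but does not run) a paired-end Trimmomatic command line using the settings stated above. The jar path, the adapter file, the ILLUMINACLIP parameters and the input and output file names are all assumptions made for this example; the chapter does not report them.

```python
from typing import List


def trimmomatic_pe_command(
    fwd: str,
    rev: str,
    prefix: str,
    jar: str = "trimmomatic-0.32.jar",  # hypothetical path to the Trimmomatic jar
    adapters: str = "adapters.fa",      # hypothetical adapter FASTA file
) -> List[str]:
    """Build (not run) a paired-end Trimmomatic command as an argument list."""
    return [
        "java", "-jar", jar, "PE",
        fwd, rev,
        f"{prefix}_1P.fq.gz", f"{prefix}_1U.fq.gz",  # forward paired / unpaired
        f"{prefix}_2P.fq.gz", f"{prefix}_2U.fq.gz",  # reverse paired / unpaired
        f"ILLUMINACLIP:{adapters}:2:30:10",  # assumed adapter-clipping parameters
        "LEADING:5", "TRAILING:5", "SLIDINGWINDOW:4:15",  # settings given in the text
        "MINLEN:72",  # discard reads shorter than 72 bp after trimming
    ]


# File names below are hypothetical; only the accession number is from the text.
cmd = trimmomatic_pe_command(
    "SRR7121710_1.fastq.gz", "SRR7121710_2.fastq.gz", "A_viridis"
)
print(" ".join(cmd))
```

Returning the command as a list keeps it suitable for passing to `subprocess.run` without shell quoting issues, should one wish to execute it.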

24.2.2 Genome Assembly

We used abyss-pe (Simpson et al. 2009) with a range of kmer sizes from 70 to 90 in steps of 5 to assemble the genomes. To compare the outputs, we used the number of contigs, the N50 and the sum of total contig length. The assembly with the longest N50 was selected for downstream analysis (although we acknowledge there is a range of other metrics for assessing genome quality). Benchmarking Universal Single Copy Orthologs (BUSCO; Simao et al. 2015), with the 1440 viridiplantae_odb10 genes, was used to determine genome completeness.
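For readers unfamiliar with the N50 statistic used to rank assemblies, a minimal sketch of how it can be computed from a list of contig lengths is given below. Whether the reported N50 values were calculated on all contigs or only on those longer than 500 bp is not stated, so the 500 bp filter in `summarise` is an assumption, and the contig lengths are toy values.

```python
def n50(lengths):
    """Return the N50: the contig length L such that contigs of length >= L
    together contain at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def summarise(lengths, min_len=500):
    """Summary metrics of the kind used to compare assemblies (toy version)."""
    kept = [x for x in lengths if x > min_len]  # assumed filter: contigs > 500 bp
    return {
        "n_contigs": len(lengths),
        "n_contigs_kept": len(kept),
        "n50": n50(kept),
        "max_bp": max(kept, default=0),
        "sum_bp": sum(kept),
    }


toy_contigs = [400, 600, 1000, 2000, 2400]  # hypothetical contig lengths in bp
stats = summarise(toy_contigs)
print(stats)
```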


24.2.3 cpDNA

NOVOPlasty ver. 4.1 (Dierckxsens et al. 2016) was used to assemble cpDNA genomes from the three species. Kmer was set at 33, and the minimum and maximum sizes of the cpDNA genome were set to 120 and 200 kb, respectively. The sequence of the rbcL gene from each species (or a closely related species) was used to seed the assembly. Because cpDNA reads are likely to be present at high copy number in a genomic DNA library, a subset of 25 M reads was used. cpDNA genomes were visualised using OGDRAW (Greiner et al. 2019).

24.2.4 Mining for SSR Markers

The nuclear genome assembly selected (trimmed to contigs >500 bp; see below) and the cpDNA genome were mined for SSRs using Misa.pl (Thiel et al. 2003) (http://pgrc.ipk-gatersleben.de/misa/). The minimum number of di-, tri-, tetra-, penta-, and hexanucleotide repeats was set to 8, 6, 4, 4, and 4 for the nuclear genome, and the minimum number of mono-, di-, tri-, and tetranucleotide repeats was set to 10, 8, 6, and 4 for the cpDNA genome.
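To make the repeat thresholds concrete, the sketch below finds perfect SSRs in a sequence with regular expressions, using the nuclear-genome minimum repeat numbers stated above. This is not Misa.pl and makes no claim to reproduce its output; in particular, the rule that skips motifs which are themselves repeats of a shorter unit is an assumption of this example.

```python
import re

# Minimum repeat counts per motif length for the nuclear genome, as given in the text.
NUCLEAR_MIN_REPEATS = {2: 8, 3: 6, 4: 4, 5: 4, 6: 4}


def is_repeat_of_shorter_unit(motif):
    """True if the motif is a shorter unit repeated, e.g. 'ATAT' = 'AT' x 2."""
    n = len(motif)
    return any(
        n % size == 0 and motif == motif[:size] * (n // size)
        for size in range(1, n)
    )


def find_ssrs(seq, min_repeats=NUCLEAR_MIN_REPEATS):
    """Return sorted (start, motif, n_repeats) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One motif of unit_len bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_repeat_of_shorter_unit(motif):
                continue  # reported (if at all) under the shorter motif length
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


example = "GGG" + "AT" * 8 + "CCC"  # toy sequence with one (AT)8 repeat
print(find_ssrs(example))
```

Passing a different dictionary, for example `{1: 10, 2: 8, 3: 6, 4: 4}`, would apply the cpDNA thresholds instead.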

24.2.5 SNP Polymorphism for Litchi

Using resequencing data for three individuals, we identified SNP markers. Reads were quality trimmed as above and mapped to the genome we assembled (after trimming to contigs of 5 kb or larger) using bowtie2 (Langmead and Salzberg 2012) with the ‘very-sensitive-local’ settings. Outputs in sam format were converted to bam format using samtools ver. 1.1 (Li et al. 2009) and then to coordinate-sorted bam files using picard ver. 2.8.3 (https://broadinstitute.github.io/picard/). Sorted bam files were then merged using bcftools mpileup (within biobuilds 2017.11), SNPs were called using bcftools call to generate a vcf file, and the SNPs were then filtered for depth and quality using bcftools filter with the flags -i'QUAL > 20 and DP > 10' and -i'QUAL > 13 and DP > 5' to give high confidence and medium confidence sets of SNPs, respectively. These were then thinned to a maximum of 1 SNP per 1 kb and per 10 kb using vcftools ver. 0.1.14, as thinned sets may be more useful for genetic mapping and GWAS approaches where tightly linked SNPs are usually avoided.
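The filtering and thinning logic can be sketched in a few lines. The example below applies the two QUAL/DP thresholds described above and then thins sites with a simple greedy rule (keep a site only if it lies at least the minimum distance from the previously kept site on the same contig). The greedy rule is an assumption about how thinning behaves, not a reimplementation of vcftools, and the records are invented toy data.

```python
def passes(qual, depth, min_qual, min_depth):
    """Mirror of a 'QUAL > x and DP > y' style filter."""
    return qual > min_qual and depth > min_depth


def thin(sites, min_gap):
    """Greedy thinning of (contig, position) sites sorted by contig and position."""
    kept = []
    last_kept = {}  # contig -> position of the most recently kept site
    for contig, pos in sites:
        if contig not in last_kept or pos - last_kept[contig] >= min_gap:
            kept.append((contig, pos))
            last_kept[contig] = pos
    return kept


# Toy records: (contig, position, QUAL, DP); all values are hypothetical.
records = [
    ("c1", 100, 50.0, 20),
    ("c1", 600, 30.0, 12),
    ("c1", 1500, 15.0, 8),
    ("c1", 2200, 25.0, 11),
    ("c2", 50, 14.0, 6),
    ("c2", 900, 40.0, 30),
]

high = [(c, p) for c, p, q, d in records if passes(q, d, 20, 10)]
medium = [(c, p) for c, p, q, d in records if passes(q, d, 13, 5)]
high_1kb = thin(high, 1000)
medium_1kb = thin(medium, 1000)
print(len(high), len(medium), high_1kb, medium_1kb)
```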

24.3 Results

24.3.1 Raw Data

Quality trimming removed 4.4–6.8% of the genome reads, leaving 303.7–340.7 M reads for the three assemblies (Table 24.1). Trimming the three Litchi resequencing samples removed 3.6–8.0% of the reads, leaving 40.5–48.2 M reads (Table 24.1).
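The percentages quoted above follow directly from the read counts in Table 24.1, as the short check below illustrates. The read counts are copied from the table; the dictionary layout and names are just a convenience for this example.

```python
# (raw reads, reads after trimming), in millions, taken from Table 24.1.
READS = {
    "genome": {
        "A. viridis": (332.7, 310.1),
        "L. chinensis": (322.7, 303.7),
        "M. pruriens": (356.2, 340.7),
    },
    "resequencing": {
        "Feizixiao": (50.0, 48.2),
        "YNYL06": (44.0, 40.5),
        "YNYL04": (50.0, 46.1),
    },
}


def percent_removed(raw, kept):
    """Percentage of reads discarded by quality trimming."""
    return 100.0 * (raw - kept) / raw


ranges = {}
for group, samples in READS.items():
    pct = [round(percent_removed(raw, kept), 1) for raw, kept in samples.values()]
    ranges[group] = (min(pct), max(pct))
    print(group, ranges[group])
```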

24.3.2 Assembly

Assembly statistics for the five different kmer settings are given in Table 24.2 and visualised in Fig. 24.1. The assembly with the greatest N50 was used for further analyses (kmer = 75 for Amaranthus and kmer = 80 for Litchi and Mucuna). As expected, all assemblies were fragmented, with contig N50 ranging from 7.6 kb (Litchi) to 22.2 kb (Amaranthus). Total assembly size (after trimming to contigs >500 bp) for Amaranthus and Mucuna represented about 66 and 57% of the estimated genome size, whereas for Litchi this represented 131% of the genome size, potentially indicating a heterozygous accession wherein divergent alleles had not assembled correctly. We used BUSCO to determine the percentage of universal single copy orthologs present, fragmented, duplicated, and missing in each of the three selected genomes (Table 24.3). The metrics were reasonable for Amaranthus, with 85% of BUSCOs present and in single copy. For Mucuna the percentage was lower, with an increased number of duplicated genes, and for Litchi the percentage of duplicated BUSCOs was as high as the percentage of single copy. This backs up our prediction, based on the genome assembly being >30% larger than the expected genome size (above), that regions of the genome with divergent alleles had not assembled correctly, leading to a high level of redundancy.

Table 24.1 Number of reads before and after quality trimming, and their sources

| Species | Cultivar | SRA accession | Raw reads (millions) | Input reads (millions) |
|---|---|---|---|---|
| A. viridis | not given | SRR7121710 | 332.7 | 310.1 |
| L. chinensis | not given | SRR7121549 | 322.7 | 303.7 |
| M. pruriens | not given | SRR7121922 | 356.2 | 340.7 |
| L. chinensis | Feizixiao | SRR15214261 | 50.0 | 48.2 |
| L. chinensis (wild) | YNYL06 | SRR15219642 | 44.0 | 40.5 |
| L. chinensis (wild) | YNYL04 | SRR15214222 | 50.0 | 46.1 |

Table 24.2 Assembly metrics for five different kmer settings for three underutilised crops. N, number; assemblies marked with * were chosen for downstream analyses

| Species | kmer | N contigs | N contigs >500 bp | N50 (bp) | Max contig length (bp) | Sum contig length (>500 bp contigs) (Mb) |
|---|---|---|---|---|---|---|
| A. viridis | k70 | 448,682 | 46,271 | 21,656 | 193,880 | 354 |
| A. viridis | k75* | 372,578 | 46,267 | 22,205 | 193,913 | 358 |
| A. viridis | k80 | 321,725 | 46,951 | 22,058 | 190,521 | 361 |
| A. viridis | k85 | 273,914 | 48,313 | 21,175 | 176,087 | 363 |
| A. viridis | k90 | 237,892 | 51,001 | 19,313 | 172,083 | 363 |
| L. chinensis | k70 | 929,722 | 215,643 | 7,121 | 178,999 | 863 |
| L. chinensis | k75 | 762,497 | 211,528 | 7,579 | 178,954 | 888 |
| L. chinensis | k80* | 644,986 | 212,514 | 7,658 | 188,582 | 899 |
| L. chinensis | k85 | 598,800 | 235,436 | 6,345 | 178,826 | 884 |
| L. chinensis | k90 | 698,422 | 330,414 | 3,585 | 161,704 | 822 |
| M. pruriens | k70 | 1,551,522 | 184,991 | 7,367 | 200,108 | 652 |
| M. pruriens | k75 | 1,233,874 | 196,532 | 7,372 | 132,425 | 713 |
| M. pruriens | k80* | 937,112 | 194,718 | 7,705 | 134,221 | 756 |
| M. pruriens | k85 | 748,409 | 200,623 | 7,321 | 151,174 | 762 |
| M. pruriens | k90 | 642,079 | 211,563 | 6,187 | 151,446 | 684 |
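The choice of kmer for each species can be reproduced from the N50 column of Table 24.2. The snippet below is a small illustration of that selection rule (largest N50 wins); the N50 values are copied from the table and the variable names are arbitrary.

```python
# N50 (bp) for each kmer setting, copied from Table 24.2.
N50_BY_KMER = {
    "A. viridis": {70: 21656, 75: 22205, 80: 22058, 85: 21175, 90: 19313},
    "L. chinensis": {70: 7121, 75: 7579, 80: 7658, 85: 6345, 90: 3585},
    "M. pruriens": {70: 7367, 75: 7372, 80: 7705, 85: 7321, 90: 6187},
}

# For each species, pick the kmer whose assembly has the largest N50.
best_kmer = {
    species: max(n50_by_k, key=n50_by_k.get)
    for species, n50_by_k in N50_BY_KMER.items()
}
print(best_kmer)
```

As the text notes, N50 is only one of several possible quality metrics, so a different criterion (for example BUSCO completeness) could favour a different kmer.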

24.3.3 cpDNA

Between 1.66 and 3.67 M of the 25 M reads were cpDNA reads, giving coverage of 1067–2475×. Chloroplast genome sizes were 150,310 bp for Amaranthus, 162,524 bp for Litchi and 154,834 bp for Mucuna (Table 24.4; Fig. 24.2).

24.3.4 SSRs

Due to potential problems designing primers for SSRs near the ends of contigs (Chapman 2019), we identified SSRs in both the full genome and in an assembly from which smaller contigs (<500 bp) were removed […] > 1 kb and > 10 kb between SNPs, which reduced the number of SNPs to 300–306 K and 65–66 K, respectively (Table 24.6).
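As a rough consistency check, the SSR densities reported in Table 24.5 for the >500 bp assemblies can be approximated by dividing the SSR counts by the summed contig lengths in Table 24.2. This is a sketch under the assumption that density was calculated in this way; because the contig sums are rounded to the nearest Mb, agreement to the third decimal place is not expected in every case.

```python
# (total SSRs in contigs >500 bp, summed contig length in Mb, reported SSRs per kb),
# taken from Tables 24.5 and 24.2 for the selected assemblies.
SELECTED = {
    "A. viridis": (31129, 358, 0.087),
    "L. chinensis": (162254, 899, 0.180),
    "M. pruriens": (85979, 756, 0.113),
}


def ssrs_per_kb(n_ssrs, assembly_mb):
    """SSR density, assuming 1 Mb = 1000 kb of assembled sequence."""
    return n_ssrs / (assembly_mb * 1000)


density = {
    species: ssrs_per_kb(n_ssrs, size_mb)
    for species, (n_ssrs, size_mb, _reported) in SELECTED.items()
}
for species, value in density.items():
    print(f"{species}: {value:.3f} SSRs per kb (reported {SELECTED[species][2]})")
```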

24.4 Discussion

Underutilised and orphan crop species traditionally suffer from a lack of international recognition and funding; however, more recently are being identified as having traits which could make them increasingly important with climate change (Li et al. 2020; Siddique et al. 2021). It is important at this stage to start to investigate these crops, to learn more about their origins and genetic diversity, and to pave the way for more in-depth analyses of select species which offer the most attributes in a changing climate with a growing population. The starting point for understanding underutilised and orphan crops is often determining the partitioning of genetic variation in the crop, the location of the crop’s domestication, and to identify closely related species. Whilst the latter can be carried out sequencing universal phylogenetic markers, for example conserved cpDNA regions and the nuclear internal transcribed spacer (ITS) (see e.g. Yang et al. ( 2018) for an example in the underutilised crop winged bean), the former require markers to be developed for the focal

445

species or a very close relative. This is because cpDNA and ITS are not expected to be highly variable within a species; instead, highly polymorphic markers are needed, and the most common markers are typically simple sequence repeat markers (SSRs; aka microsatellite markers). These are markers of choice because they are relatively low-cost, medium-throughput, reproducible, and co-dominant (Ellis and Burke 2007). Before high throughput sequencing (HTS) was commonplace, SSR markers were typically mined from SSR-enriched libraries, for example in the underutilised crops pigeon pea (Saxena et al. 2010) and Bambara groundnut (Basu et al. 2007), and from expressed sequence tag (EST) libraries, for example in horse gram (Sharma et al. 2015) and safflower (Chapman et al. 2009). It is now quite routine to develop SSRs using HTS (Hodel et al. 2016). HTS is a cheap (per base pair) method to sequence plant genomes and transcriptomes, and markers can be developed from the assembly, even if the assembly is not optimised, which can be bioinformatically challenging and require more than a small amount of HTS. Recently, Chapman (2019) showed that as few as 2 million HTS reads can generate an assembly with dozens to a few hundred SSR markers. Moderate sequencing depth, up to 20 million reads, resulted in the identification of ca. 7000–27,000 potential SSRs across Arabidopsis, soybean, tomato, rice, and Lablab. SSRs have been mined from transcriptome and genome sequences for several underutilised crops. To demonstrate the utility of these markers,

Table 24.5 SSR abundance in the genome assemblies of three underutilised crops. Di-, tri-, tetra-, penta-, and hexa- refer to SSR motifs

Species       kmer            Total N SSRs   di-      tri-     tetra-   penta-   hexa-   SSRs per kb
A. viridis    k75             34,216         10,701   14,615   5277     1834     1789    0.085
A. viridis    k75 (>500 bp)   31,129         8862     13,725   5089     1794     1659    0.087
L. chinensis  k80             171,866        96,665   37,574   26,211   8576     2840    0.177
L. chinensis  k80 (>500 bp)   162,254        91,236   34,897   25,159   8232     2730    0.180
M. pruriens   k80             98,760         45,740   24,544   18,280   6215     3981    0.113
M. pruriens   k80 (>500 bp)   85,979         40,643   19,239   16,995   5748     3354    0.113
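The counts in Table 24.5 come from scanning assembled contigs for perfect tandem repeats. As an illustration only, the sketch below counts di- to hexanucleotide SSRs with a regular expression and reports SSRs per kb, optionally ignoring short contigs. The minimum repeat numbers in `MIN_REPEATS` and the function names are assumptions made for this example; they are not the thresholds or the software used for the table.

```python
import re
from collections import Counter

# Assumed minimum repeat counts per motif length; placeholders only,
# not the thresholds used to build Table 24.5.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
MOTIF_CLASS = {2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, n_repeats) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # A k-base motif followed by at least n-1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs built from a shorter unit ("AA", "ATAT") so that each
            # repeat is counted only under its shortest motif length.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


def summarise(contigs, min_len=0):
    """Count SSRs per motif class and per kb over contigs longer than min_len bp."""
    counts, total_bp = Counter(), 0
    for seq in contigs.values():
        if len(seq) <= min_len:
            continue
        total_bp += len(seq)
        for _, _, motif, _ in find_ssrs(seq):
            counts[MOTIF_CLASS[len(motif)]] += 1
    per_kb = sum(counts.values()) / (total_bp / 1000) if total_bp else 0.0
    return dict(counts), per_kb


if __name__ == "__main__":
    # Toy contigs, invented for the example.
    toy = {"c1": "GG" + "AT" * 7 + "CCG" + "GAA" * 5 + "TT", "c2": "ACGTACGTAC"}
    print(summarise(toy))              # all contigs
    print(summarise(toy, min_len=10))  # only contigs longer than 10 bp
```

Calling `summarise(contigs, min_len=500)` on a real assembly would mirror the ">500 bp" rows of the table, under the assumed repeat thresholds.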


M. A. Chapman and D. Fisher

Fig. 24.3 Numbers of SSRs of different sizes in the assembled genomes of Amaranthus, Litchi and Mucuna. Results are given for the full assembly as well as after removing short contigs (see text for details)

Table 24.6 Number of SNPs in the alignment of three Litchi accessions. Two different thresholds were used to filter the SNPs (QUAL and DP) and then the data were trimmed (column ‘thin’) to max 1 SNP per kb and max 1 SNP per 10 kb. QUAL, quality score; DP, depth; N, number

QUAL   DP   Thin (kb)   N SNPs
20     10   none        10,345,598
20     10   1           300,111
20     10   10          65,355
13     5    none        11,059,351
13     5    1           306,516
13     5    10          66,062
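The filtering and thinning summarised in Table 24.6 can be illustrated with a short sketch. It assumes that each SNP is available as a `(contig, position, QUAL, DP)` tuple, that thresholds are inclusive (a SNP passes if its QUAL and DP are at least the cut-off), and that thinning is greedy, keeping a SNP only if it lies at least the thinning distance from the last SNP kept on the same contig. These choices and the function name are assumptions for illustration and may differ from the tool and settings actually used.

```python
def filter_and_thin(snps, min_qual, min_dp, thin_bp=None):
    """Filter SNPs on QUAL and DP, then optionally thin them by distance.

    snps    : iterable of (contig, position, qual, depth) tuples
    thin_bp : minimum spacing in bp between kept SNPs on a contig, or None
    """
    kept = []
    last_pos = {}  # position of the last SNP kept on each contig
    for contig, pos, qual, dp in sorted(snps, key=lambda s: (s[0], s[1])):
        if qual < min_qual or dp < min_dp:
            continue  # fails the quality or depth threshold
        if thin_bp is not None and contig in last_pos and pos - last_pos[contig] < thin_bp:
            continue  # too close to the previously kept SNP
        kept.append((contig, pos, qual, dp))
        last_pos[contig] = pos
    return kept


if __name__ == "__main__":
    # Invented records, for illustration only.
    example = [
        ("c1", 100, 30, 12), ("c1", 600, 25, 11), ("c1", 1200, 50, 20),
        ("c1", 1300, 10, 30), ("c2", 50, 22, 6), ("c2", 5000, 40, 15),
    ]
    for qual, dp in [(20, 10), (13, 5)]:
        for thin in [None, 1000, 10000]:
            n = len(filter_and_thin(example, qual, dp, thin))
            print(qual, dp, thin, n)
```

As in the table, relaxing the QUAL and DP thresholds changes the unthinned count far more than the thinned counts, because thinning is limited by the physical spacing of SNPs rather than by their total number.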

Chapman (2015) developed SSR markers for four underutilised legumes, and for two (lablab and winged bean), subsequent population genetic analyses have been carried out to discern crop origins and the partitioning of genetic diversity (Robotham and Chapman 2015; Yang et al. 2018). SSRs can often be mined from one organism and transferred to a congener; hence, working on an underutilised crop closely related to a sequenced organism means that HTS resources are not necessarily needed for the focal species. Chapman (2019) also examined transferability across species, using SSRs identified from tomato. SSRs from genome libraries were less transferable across species than SSRs from the transcriptome but were more variable. Other underutilised crops have benefitted from

HTS resources and SSRs from a related taxon. For example, population genetic analyses of zombi pea (Vigna vexillata (L.) A. Rich) and Bambara groundnut (Vigna subterranea (L.) Verdc.) have been carried out using markers developed from adzuki bean, mung bean and cowpea (Dachapak et al. 2017; Somta et al. 2011). Similarly, ca. 40% of 134 SSR markers from a variety of legumes amplified and were polymorphic in Lablab (Rai et al. 2016), and about 20% of SSRs from Satsuma mandarin amplified and were polymorphic in ca. 20 other Citrus species, including underutilised species (Dixit et al. 2010). In our study, draft genomes were assembled for three underutilised crop species using publicly available data generated for another purpose (Liu et al. 2019). Whilst the assemblies were

24

Utilising Public Resources for Fundamental …

clearly fragmented and in at least one case contained a high degree of duplication, it was not the goal of the chapter to generate a polished assembly. Indeed, the purpose is to show that resources can be developed relatively simply, with little cost (or no cost if the data are already available), and without the need to generate an optimised assembly. All programmes and scripts are freely available and do not require significant bioinformatic experience, and these or similar trimming and assembly programmes are free to use on the Galaxy server (https://usegalaxy.org/). Ultimately, the assemblies contained 34,000–172,000 potential SSR markers and therefore represent rich resources for population genetic analyses, genetic mapping and understanding crop origins and relationships. Whilst it is assumed that a subset will fail to amplify well or will lack polymorphism, as is typical for SSRs derived from any source, these draft genomes are clearly significant resources. It is also pertinent to note uses of these markers in other taxa and for reasons other than population genetic analysis. First, there are other species of economic or ecological importance related to the target taxa here which could benefit from these markers after testing for cross-species amplification. Amaranthus viridis is related to other horticultural and edible amaranths, Litchi chinensis is related to rambutan (same subfamily) and to longan and ackee (same family), and several Mucuna species are named velvet bean, not only Mucuna pruriens, our focal taxon. We would anticipate that a portion of these markers can be used in these related taxa. As an example, it was previously shown that of 12 Litchi SSR markers, four were transferable and polymorphic in ackee (Ekué et al. 2009). Second, polymorphic markers such as SSRs can be used in molecular breeding approaches.
Whilst this needs phenotype data and/or other sources (Varshney and Dubey 2009), the SSRs we have identified can be used in molecular breeding if they are in the vicinity of QTL for adaptive traits which would be of interest to breeders.
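Testing markers for cross-species amplification, as suggested above, reduces to a simple tally once each marker has been screened in the target taxon. The sketch below is one possible way to record such a screen; the `MarkerScreen` fields and the rule that a marker is polymorphic when more than one allele is scored are assumptions for illustration. Only the overall figure of four transferable and polymorphic markers out of 12 is taken from the text; the split of the remaining eight markers in the example is invented.

```python
from dataclasses import dataclass


@dataclass
class MarkerScreen:
    name: str        # marker identifier (hypothetical)
    amplified: bool  # did the primer pair give a product in the target species?
    n_alleles: int   # distinct alleles scored across the individuals tested


def transferability(screens):
    """Summarise how many markers amplified and were polymorphic."""
    tested = len(screens)
    amplified = [s for s in screens if s.amplified]
    # Assumption: a marker counts as polymorphic if more than one allele is seen.
    polymorphic = [s for s in amplified if s.n_alleles > 1]
    return {
        "tested": tested,
        "amplified": len(amplified),
        "polymorphic": len(polymorphic),
        "prop_polymorphic": len(polymorphic) / tested if tested else 0.0,
    }


if __name__ == "__main__":
    # 12 placeholder markers: 4 polymorphic, 3 monomorphic, 5 failed (invented split).
    screens = (
        [MarkerScreen(f"poly{i}", True, 3) for i in range(4)]
        + [MarkerScreen(f"mono{i}", True, 1) for i in range(3)]
        + [MarkerScreen(f"fail{i}", False, 0) for i in range(5)]
    )
    print(transferability(screens))
```

By the same arithmetic, the ca. 40% of 134 legume markers reported as polymorphic in Lablab corresponds to roughly 54 markers.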


Also unexplored in this chapter is the use of a draft genome to identify candidate genes underlying agronomic traits, based on data from other species. There are several examples of orthologous loci controlling agronomic phenotypes in diverse crops. For example, a YABBY transcription factor, Shattering1, controls grain shattering in sorghum, rice, and maize (Lin et al. 2012), and based on QTL mapping it seems that ca. 40% of QTL controlling fruit weight, shape, and colour are present in multiple Solanaceous crops (e.g. tomato, eggplant, pepper), suggesting orthologous loci are involved (Doganlar et al. 2002). Knowing how a trait is controlled in one species may therefore be an effective way to fast-track the testing of candidate loci in another species. Using this candidate gene approach, it has been shown in legumes that orthologues of the Arabidopsis TERMINAL FLOWER1 gene confer determinacy (Repinski et al. 2012; Liu et al. 2010). For the species investigated here, we could mine their genomes for loci orthologous to those from related species that control traits which are current breeding targets. For example, leaf and seed size in Medicago and soybean are partially controlled by the plant-specific transcription regulator BIG SEEDS1 (Ge et al. 2016); this could be identified in the draft genome of Mucuna and an association analysis conducted to compare organ size with alleles at this locus. In amaranths, breeding targets include increased seed size, reduction in seed shattering, and increased seed protein, amongst others (Kauffman 1992), and candidate genes for all of these are present in other crops (Meyer and Purugganan 2013). Again, these candidates can be identified in a draft genome and tested for association with the phenotype of interest. Finally, one should remember that short read data is often used to supplement long read data in genome assemblies as it usually covers the genome at greater depth than the long read data.
Hence, any data generated for identifying molecular markers could well be of value to future endeavours to assemble chromosome-scale genome sequences. Storing the data in its raw format, and even storing the DNA and/or the tissue from which the genome sequence was generated, may be of use in the future, and highlights how research teams can work together to reach similar goals. If researchers would like to use the SSR resources developed here, or would like assistance in generating the same for their species of interest, we would welcome their contact.
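The candidate-gene association step outlined above, for example comparing organ size between alleles at a locus such as BIG SEEDS1, could be prototyped with a permutation test on the difference in group means. This is a minimal sketch under simple assumptions: two allele classes, one phenotype value per individual, and no correction for population structure, which a real association analysis would normally need. The function name and the seed weights are invented for illustration.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_perm=2000, seed=1):
    """Two-sided permutation test on the difference between two group means.

    Returns the observed absolute difference and an estimated p-value.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        # Randomly reassign phenotypes to the two allele classes.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical 100-seed weights (g) for individuals carrying allele 1 or allele 2.
    allele_1 = [10.1, 10.4, 9.8, 10.6, 10.2, 10.0]
    allele_2 = [12.0, 12.3, 11.8, 12.5, 12.1, 11.9]
    diff, p_value = permutation_test(allele_1, allele_2)
    print(f"difference in means = {diff:.2f}, permutation p = {p_value:.4f}")
```

A permutation test is used here because it makes no distributional assumption about the trait; a linear or mixed model would be the more usual choice once kinship or population structure has to be accounted for.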

References

Asseng S, Ewert F, Martre P, Rotter RP, Lobell DB, Cammarano D et al (2015) Rising temperatures reduce global wheat production. Nat Clim Chang 5(2):143–147. https://doi.org/10.1038/nclimate2470
Basu S, Roberts JA, Azam-Ali SN, Mayes S (2007) Development of microsatellite markers for bambara groundnut (Vigna subterranea L. Verdc.) – an underutilized African legume crop species. Mol Ecol Notes 7(6):1326–1328. https://doi.org/10.1111/j.1471-8286.2007.01870.x
Bolger A, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30:2114–2120
Buckles D (1995) Velvetbean: a “new” plant with a history. Econ Bot 49(1):13–25. https://doi.org/10.1007/BF02862271
Burger JC, Chapman MA, Burke JM (2008) Molecular insights into the evolution of crop plants. Am J Bot 95:113–122
Burke JM, Burger JC, Chapman MA (2007) Crop evolution: from genetics to genomics. Curr Opin Genet Dev 17(6):525–532
Campbell BM, Vermeulen SJ, Aggarwal PK, Corner-Dolloff C, Girvetz E, Loboguerrero AM et al (2016) Reducing risks to food security from climate change. Glob Food Sec 11:34–43. https://doi.org/10.1016/j.gfs.2016.06.002
Chapman MA (2015) Transcriptome sequencing and marker development for four underutilized legumes. Appl Plant Sci 3(2):1400111
Chapman MA (2019) Optimizing depth and type of high-throughput sequencing data for microsatellite discovery. Appl Plant Sci 7(11):e11298. https://doi.org/10.1002/aps3.11298
Chapman MA, Hvala J, Strever J, Matvienko M, Kozik A, Michelmore RW et al (2009) Development, polymorphism, and cross-taxon utility of EST-SSR markers from safflower (Carthamus tinctorius L.). Theor Appl Genet 120(1):85–91. https://doi.org/10.1007/s00122-009-1161-8

Charlesworth B, Sniegowski P, Stephan W (1994) The evolutionary dynamics of repetitive DNA in Eukaryotes. Nature 371(6494):215–220
Dachapak S, Somta P, Poonchaivilaisak S, Yimram T, Srinives P (2017) Genetic diversity and structure of the zombi pea (Vigna vexillata (L.) A. Rich) gene pool based on SSR marker analysis. Genetica 145(2):189–200. https://doi.org/10.1007/s10709-017-9957-y
De Bellis F, Malapa R, Kagy V, Lebegin S, Billot C, Labouisse JP (2016) New development and validation of 50 SSR markers in breadfruit (Artocarpus altilis, Moraceae) by next-generation sequencing. Appl Plant Sci 4(8). https://doi.org/10.3732/apps.1600021
Deletre M, Soengas B, Utge J, Lambourdiere J, Sorensen M (2013) Microsatellite markers for the yam bean Pachyrhizus (Fabaceae). Appl Plant Sci 1(7). https://doi.org/10.3732/apps.1200551
Dempewolf H, Baute G, Anderson J, Kilian B, Smith C, Guarino L (2017) Past and future use of wild relatives in crop breeding. Crop Sci 57(3):1070–1082. https://doi.org/10.2135/cropsci2016.10.0885
Dierckxsens N, Mardulyn P, Smits G (2016) NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Res 45(4):e18. https://doi.org/10.1093/nar/gkw955
Dixit A, Chung JW, Zhao WG, Lee GA, Lee DH, Ma KH et al (2010) Development of new microsatellite markers for molecular diversity analysis of Citrus species. J Hortic Sci Biotechnol 85(6):521–527. https://doi.org/10.1080/14620316.2010.11512708
Doebley JF, Gaut BS, Smith BD (2006) The molecular genetics of crop domestication. Cell 127(7):1309–1321
Doganlar S, Frary A, Daunay MC, Lester RN, Tanksley SD (2002) Conservation of gene function in the Solanaceae as revealed by comparative mapping of domestication traits in eggplant. Genetics 161(4):1713–1726
Ekué MR, Gailing O, Finkeldey R (2009) Transferability of Simple Sequence Repeat (SSR) markers developed in Litchi chinensis to Blighia sapida (Sapindaceae). Plant Mol Biol Report 27(4):570–574. https://doi.org/10.1007/s11105-009-0115-2
Ellis JR, Burke JM (2007) EST-SSRs as a resource for population genetic analyses. Heredity 99:125–132
Ge L, Yu J, Wang H, Luth D, Bai G, Wang K et al (2016) Increasing seed size and quality by manipulating BIG SEEDS1 in legume species. Proc Natl Acad Sci 113(44):12414–12419. https://doi.org/10.1073/pnas.1611763113
Godfray HCJ, Beddington JR, Crute IR, Haddad L, Lawrence D, Muir JF et al (2010) Food security: the challenge of feeding 9 billion people. Science 327(5967):812–818. https://doi.org/10.1126/science.1185383
Greiner S, Lehwark P, Bock R (2019) OrganellarGenomeDRAW (OGDRAW) version 1.3.1: expanded toolkit for the graphical visualization of organellar genomes. Nucleic Acids Res 47(W1):W59–W64. https://doi.org/10.1093/nar/gkz238


Haq N (1983) New food legume crops for the tropics. Ciba Found Symp 97:144–160
Hawkes JG (1991) 1. The importance of genetic resources in plant breeding. Biol J Linn Soc 43(1):3–10. https://doi.org/10.1111/j.1095-8312.1991.tb00578.x
Hodel RGJ, Gitzendanner MA, Germain-Aubrey CC, Liu XX, Crowl AA, Sun M et al (2016) A new resource for the development of SSR markers: millions of loci from a thousand plant transcriptomes. Appl Plant Sci 4(6):1600024. https://doi.org/10.3732/apps.1600024
Kauffman CS (1992) Realizing the potential of grain amaranth. Food Rev Int 8(1):5–21. https://doi.org/10.1080/87559129209540927
Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9(4):357–359. https://doi.org/10.1038/nmeth.1923
Li C, Lin F, An D, Wang W, Huang R (2017) Genome sequencing and assembly by long reads in plants. Genes 9(1):6. https://doi.org/10.3390/genes9010006
Li C, Wang Y, Huang X, Li J, Wang H, Li J (2013) De novo assembly and characterization of fruit transcriptome in Litchi chinensis Sonn and analysis of differentially regulated genes in fruit in response to shading. BMC Genomics 14(1):552. https://doi.org/10.1186/1471-2164-14-552
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N et al (2009) The sequence alignment/Map format and SAMtools. Bioinformatics 25(16):2078–2079. https://doi.org/10.1093/bioinformatics/btp352
Li X, Yadav R, Siddique KHM (2020) Neglected and underutilized crop species: the key to improving dietary diversity and fighting hunger and malnutrition in Asia and the Pacific. Front Nutr 7:593711. https://doi.org/10.3389/fnut.2020.593711
Lin Z, Li X, Shannon LM, Yeh C-T, Wang ML, Bai G et al (2012) Parallel domestication of the Shattering1 genes in cereals. Nat Genet 44(6):720–724. https://doi.org/10.1038/ng.2281
Liu B, Watanabe S, Uchiyama T, Kong F, Kanazawa A, Xia Z et al (2010) The soybean stem growth habit gene Dt1 is an ortholog of Arabidopsis TERMINAL FLOWER1. Plant Physiol 153(1):198–210. https://doi.org/10.1104/pp.109.150607
Liu H, Wei J, Yang T, Mu W, Song B, Yang T et al (2019) Molecular digitization of a botanical garden: high-depth whole-genome sequencing of 689 vascular plant species from the Ruili Botanical Garden. GigaScience 8(4):giz007. https://doi.org/10.1093/gigascience/giz007
Liu W, Xiao Z, Bao X, Yang X, Fang J, Xiang X (2015) Identifying Litchi (Litchi chinensis Sonn.) Cultivars and their genetic relationships using single nucleotide polymorphism (SNP) markers. Plos One 10(8):e0135390. https://doi.org/10.1371/journal.pone.0135390
Menzel C (2002) The lychee crop in Asia and the Pacific. Bangkok, Thailand: FAO Regional Office for Asia and the Pacific

Meyer RS, Purugganan MD (2013) Evolution of crop species: genetics of domestication and diversification. Nat Rev Genet 14(12):840–852
Mickelbart MV, Hasegawa PM, Bailey-Serres J (2015) Genetic mechanisms of abiotic stress tolerance that translate to crop yield stability. Nat Rev Genet 16(4):237–251. https://doi.org/10.1038/nrg3901
National Research Council (2006) Lost crops of Africa: volume II: vegetables. The National Academies Press, Washington, DC
NAS (1975) Underexploited tropical plants with promising economic value. National Academy of Sciences, Washington, D.C., USA
Rai N, Kumar S, Singh RK, Rai K, Tiwari G, Kashyap SP et al (2016) Genetic diversity in Indian bean (Lablab purpureus) accessions as revealed by quantitative traits and cross-species transferable SSR markers. Indian J Agric Sci 86:1193–1200
Repinski SL, Kwak M, Gepts P (2012) The common bean growth habit gene PvTFL1y is a functional homolog of Arabidopsis TFL1. Theor Appl Genet 124(8):1539–1547. https://doi.org/10.1007/s00122-012-1808-8
Robotham O, Chapman MA (2015) Population genetic analysis of hyacinth bean (Lablab purpureus (L.) Sweet, Leguminosae) indicates an East African origin and variation in drought tolerance. Genet Resour Crop Evol 64:139–148
Sathyanarayana N, Pittala RK, Tripathi PK, Chopra R, Singh HR, Belamkar V et al (2017) Transcriptomic resources for the medicinal legume Mucuna pruriens: de novo transcriptome assembly, annotation, identification and validation of EST-SSR markers. BMC Genomics 18:409. https://doi.org/10.1186/s12864-017-3780-9
Saxena RK, Prathima C, Saxena KB, Hoisington DA, Singh NK, Varshney RK (2010) Novel SSR Markers for Polymorphism Detection in Pigeonpea (Cajanus spp.). Plant Breed 129(2):142–148. https://doi.org/10.1111/j.1439-0523.2009.01680.x
Sharma V, Rana M, Katoch M, Sharma PK, Ghani M, Rana JC et al (2015) Development of SSR and ILP markers in horsegram (Macrotyloma uniflorum), their characterization, cross-transferability and relevance for mapping. Mol Breed 35(4):102. https://doi.org/10.1007/s11032-015-0297-2
Shrivastava A, Kumar A, Thomas JD, Laserson KF, Bhushan G, Carter MD et al (2017) Association of acute toxic encephalopathy with litchi consumption in an outbreak in Muzaffarpur, India, 2014: a case-control study. Lancet Glob Health 5(4):e458–e466. https://doi.org/10.1016/S2214-109X(17)30035-9
Siddique KHM, Li X, Gruber K (2021) Rediscovering Asia’s forgotten crops to fight chronic and hidden hunger. Nat Plants 7(2):116–122. https://doi.org/10.1038/s41477-021-00850-z
Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM (2015) BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31(19):3210–3212. https://doi.org/10.1093/bioinformatics/btv351

Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJM, Birol I (2009) ABySS: a parallel assembler for short read sequence data. Genome Res 19(6):1117–1123. https://doi.org/10.1101/gr.089532.108
Somta P, Chankaew S, Rungnoi O, Srinives P (2011) Genetic diversity of the Bambara groundnut (Vigna subterranea (L.) Verdc.) as assessed by SSR markers. Genome 54(11):898–910. https://doi.org/10.1139/g11-056
Takahashi Y, Somta P, Muto C, Iseki K, Naito K, Pandiyan M et al (2016) Novel genetic resources in the genus Vigna unveiled from gene bank accessions. Plos One 11(1):e0147568. https://doi.org/10.1371/journal.pone.0147568
Tangphatsornruang S, Somta P, Uthaipaisanwong P, Chanprasert J, Sangsrakru D, Seehalak W et al (2009) Characterization of microsatellites and gene contents from genome shotgun sequences of mungbean (Vigna radiata (L.) Wilczek). BMC Plant Biol 9:137
Thiel T, Michalek W, Varshney RK, Graner A (2003) Exploiting EST databases for the development and characterization of gene-derived SSR-markers in barley (Hordeum vulgare L.). Theor Appl Genet 106(3):411–422
Tørresen OK, Star B, Mier P, Andrade-Navarro MA, Bateman A, Jarnot P et al (2019) Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases. Nucleic Acids Res 47(21):10994–11006. https://doi.org/10.1093/nar/gkz841
Varshney RK, Dubey A (2009) Novel genomic tools and modern genetic and breeding approaches for crop improvement. J Plant Biochem Biotechnol 18(2):127–138
Viruel MA, Hormaza JI (2004) Development, characterization and variability analysis of microsatellites in lychee (Litchi chinensis Sonn., Sapindaceae). Theor Appl Genet 108(5):896–902. https://doi.org/10.1007/s00122-003-1497-4
Waselkov KE, Boleda A, Olsen K (2018) A Phylogeny of the genus Amaranthus (Amaranthaceae) based on several low-copy nuclear loci and chloroplast regions. Syst Bot 43:439–458
Williams JT, Haq N (2000) Global research on underutilized crops. An assessment of current activities and proposals for enhanced cooperation
Wong QN, Tanzi AS, Ho WK, Malla S, Blythe M, Karunaratne A et al (2017) Development of Gene-Based SSR Markers in Winged Bean (Psophocarpus tetragonolobus (L.) DC.) for Diversity Assessment. Genes 8(3). https://doi.org/10.3390/genes8030100
Yang S, Grall A, Chapman MA (2018) Origin and diversification of winged bean (Psophocarpus tetragonolobus (L.) DC.; Fabaceae) a multi-purpose underutilised legume. Am J Bot 105:888–897
Zeven AC, de Wet JMJ (1982) Dictionary of cultivated plants and their regions of diversity, 2nd edn. Centre for Agricultural Publishing and Documentation, Wageningen, Netherlands
Zhao C, Liu B, Piao S, Wang X, Lobell DB, Huang Y et al (2017) Temperature increase reduces global yields of major crops in four independent estimates. Proc Natl Acad Sci USA 114(35):9326–9331. https://doi.org/10.1073/pnas.1701762114